UBC Theses and Dissertations

Designing a portable natural language database interface Booth, Allan David 1983

DESIGNING A PORTABLE NATURAL LANGUAGE DATABASE INTERFACE

by

ALLAN DAVID BOOTH
B.Sc., University of British Columbia, 1979

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF COMPUTER SCIENCE

We accept this thesis as conforming to the required standard.

THE UNIVERSITY OF BRITISH COLUMBIA
March, 1983
© Allan David Booth, 1983

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3
Date
DE-6 (3/81)

Abstract

Allowing a user to inquire into a database in his native language is becoming an increasingly desirable feature. Consequently, there have been a number of attempts to attach a natural language front end to an existing database management system. However, few database systems today are competent in processing natural language queries. The main stumbling block is the adaptation of an existing natural language front end to a new domain of discourse. The development of a domain independent natural language interface to an existing database management system is discussed here.
The underlying domain independent features of natural language are examined and combined into one linguistic core. The domain specific information is gathered into an information dictionary for the linguistic core to process. Finally, the interface to the database handling routines is modularized with standard inputs and outputs.

Table of Contents

1 Introduction
2 The Development of Question Answering Systems
  2.1 Syntax Without Semantics
  2.2 Semantics Without Syntax
  2.3 Coexistence
  2.4 Summary
3 Concepts in Natural Language Portability
  3.1 The Language Structure
    3.1.1 Building The Sentence Structure
    3.1.2 Using The Sentence Structure
    3.1.3 Relaxation of Grammatical Rules
  3.2 Vocabulary
    3.2.1 Morphing
    3.2.2 Idioms and Jargon
    3.2.3 Verbs
    3.2.4 Nouns
    3.2.5 Adjectives and Adverbs
    3.2.6 Prepositions
  3.3 Sophistication of Design
    3.3.1 Metaquestions
    3.3.2 User Interaction and Communication
    3.3.3 Spelling Correction
    3.3.4 Knowledge Acquisition
    3.3.5 Making Assumptions
    3.3.6 Answer Generation
  3.4 Summary
4 System Design: Part I - The Linguistic Core
  4.1 The NL Parser
    4.1.1 The ATN Parser
    4.1.2 The ATN Grammar
    4.1.3 Scanning
      4.1.3.1 The Morpher
      4.1.3.2 Compound Words
      4.1.3.3 Abbreviations and Synonyms
    4.1.4 Semantic Routines
      4.1.4.1 Adding a Noun Phrase
      4.1.4.2 Adding a Verb Phrase
      4.1.4.3 Adding Noun Phrase Modifiers
      4.1.4.4 Prepositional Phrases
      4.1.4.5 Finding Pronoun Antecedents
      4.1.4.6 Conjunctions
    4.1.5 Local Communication
  4.2 Syntactic Dictionary
    4.2.1 Noun Definition
    4.2.2 Verb Definition
    4.2.3 Adjective Definition
    4.2.4 Quantifier Definition
    4.2.5 Preposition Definition
    4.2.6 Synonyms and Abbreviations
    4.2.7 Compound Word Definition
  4.3 Knowledge Acquisition
    4.3.1 Spelling Correction
    4.3.2 Database Search
    4.3.3 Ignoring Words
    4.3.4 Entering New Words
  4.4 Internal Communication
  4.5 Building the Standard Sentence Representation (SSR)
    4.5.1 Reducing the SSR
    4.5.2 Default Search Fields
    4.5.3 Counting Database Items - *NUMBER
    4.5.4 Using an Auxiliary Verb as a Main Verb
    4.5.5 Verb Phrase Ellipsis
    4.5.6 Pronoun Reference - *REF
    4.5.7 Embedded Noun Phrases
  4.6 Answer Generation
  4.7 Summary
5 System Design: Part II - The Application Interfaces
  5.1 Domain Definition
    5.1.1 Domain Dictionary
      5.1.1.1 Actions
      5.1.1.2 The Fields
      5.1.1.3 Terms and Jargon
      5.1.1.4 The Field Elements
    5.1.2 The Case List
    5.1.3 Inverted Database
  5.2 The Database Interface
    5.2.1 Database Format Routines
    5.2.2 Data Format Routines
  5.3 Summary
6 A Change of Domains
  6.1 The Definition Process: A Guide to the Perplexed
    6.1.1 Constructing an Inverted Database
    6.1.2 Database Field Definitions
    6.1.3 Action Definitions
    6.1.4 Abbreviations, Synonyms and Jargon
  6.2 The Restaurant Domain
  6.3 Adaptation to the Bibliography Domain
  6.4 The Conference Domain
  6.5 Summary
7 Conclusions
  7.1 Open Issues
    7.1.1 Text Retrieval by Content
    7.1.2 Value Judgements
    7.1.3 Multi-Field Answers
    7.1.4 Complex Conjunctions
    7.1.5 Pronoun Reference
    7.1.6 Clarification Dialogue
    7.1.7 Sample Sentence Generation
  7.2 Problems for Future Work
    7.2.1 Extensions to the Syntactic Component
    7.2.2 Extensions to the Semantic Component
    7.2.3 Adaptation to a New Database System
    7.2.4 Computational Optimization
  7.3 Summary
Bibliography
Appendix A: Transition Network Grammar
Appendix B: Case List
Appendix C: Partial Definition of the Syntactic Dictionary
Appendix D: Partial Definition of the Restaurant Domain
  D.1 The Domain Dictionary
  D.2 The Case List
  D.3 The Inverted Database
Appendix E: Sample Session

Figures

2.1 Sentence Deep Structure
3.1 Current Question Answering Systems
3.2 The AMOUNT ATN Network
3.3 Proposed Natural Language System
4.1 The Linguistic Core
4.2 The Natural Language Parser
4.3 The Quantifier ATN Network
4.4 The Suffix Table
4.5 The Standard Sentence Representation
5.1 Proposed Natural Language System: A Review
5.2 The Domain Definition Module
5.3 The Database Interface Module
6.1 The Fields in the Restaurants Database
6.2 The Fields in the Bibliography Database
6.3 The Fields in the Conference Database

Acknowledgements

I would like to extend my thanks to Dr. Richard Rosenberg who initially created and continually increased my interest in the study of natural language understanding. I would especially like to thank my parents and friends for their constant support - without it this thesis would undoubtedly never have been finished.

Chapter 1

Introduction

One might have thought that the use of large database systems would be widespread by today. People in every walk of life could benefit from the day to day use of such systems. But the development has failed to reach this expected level. One of the major reasons for this failure has been an inability to provide the database with a suitable natural language (NL) interface. Casual users of a database with no formal training in the use of computers invariably balk at having to learn a somewhat artificial language in which to communicate with the machine.

Branches of Artificial Intelligence (AI) have been concerned with this problem for some time. They probe into all aspects of natural language understanding from answering database queries to automatic translation and paraphrasing. The question answering (Q/A) paradigm has some rather strong restrictions on the inputs allowed and these can help to simplify the task. For example, when answering questions to a database, a system need not worry about declarative sentences. Because information in the database will be associated with one particular topic, both the number of words which must be understood and the possible meanings of those words will be limited. This reduces the occurrence of ambiguity in the subset of the language which is being processed.

But even with all of these assumptions which limit the language processing requirements of a Q/A system, there are still few commercially viable natural language database interfaces on the market. However, this does not mean that the technology to produce such an interface does not exist. Many examples of adequate NL systems can be found in the current literature (Harris 1977a; Sacerdoti 1977; Waltz et al 1976; Woods et al 1972). The major stumbling block in applying this technology seems to be in the start-up costs of transferring a reasonable portion of any developed NL system to a new domain or database system. These start-up costs are usually comparable to the initial development cost of the entire system.

By examining branches of Computer Science which have already dealt with the issues of portability (such as compiler design and operating systems research), it becomes clear that it is possible to apply present AI techniques to develop Q/A systems into useful tools. Unfortunately one cannot simply take the current systems and modify them to suit the needs. The issue of portability must be built in at the ground level if a system is to remain structurally sound.

Indeed, the problem that will be addressed here is the issue of portability. How can a NL interface be designed so that it can be transferred to a different system with minimal effort and still retain a reasonably high standard of question answering capability? The portability issue will be viewed from two different perspectives: domain portability and database portability. Domain portability refers to the problems of applying the NL interface to a new domain of discourse. The issue of database portability deals with a change in the actual physical structure and data accessing methods of the database management system. All NL systems contain some procedures for dealing with linguistic features of the language regardless of the domain or database structure. It is these concepts we wish to exploit in the design of the system.

The method of achieving portability here has been to abstract all components of current systems which are domain independent and combine them together into one "linguistic core" (Rosenberg 1980). This component consists not only of the natural language parsing procedures but also of the user interaction and answer generation components. The linguistic core consults with a "domain definition" to retrieve information about the particular domain in which it is working and communicates with the database through a "database interface".

In Chapter 2 we will review some past and current systems with a goal toward pointing out some of their achievements as well as their shortcomings. Next is an overview of the important concepts of natural language portability in Chapter 3. A working system embodying these ideas is presented in some detail in chapters 4 and 5. Chapter 6 contains a description of the domain modification process which was necessary to change the domain from the initial restaurant information database to a bibliography database and then to a conference registration database. The last chapter attempts to summarize the ideas presented in this thesis as well as provide some directions for further research.

Chapter 2

The Development of Question Answering Systems

The developmental path of natural language question answering (Q/A) systems has taken many twists and turns.
Influenced greatly by Chomsky's research in transformational grammars (Chomsky 1965), researchers initially expected that linguistics was going to play a major role in the field. Early systems were developed within the transformationalist approach; that is, they adhered to a linear paradigm (Rosenberg 1980) in which the structural description of a query was first obtained before any semantic processing was initiated (Woods 1967; Woods et al 1972). However, since a syntactically directed parse tends to generate a large number of ambiguous parses from a natural language query, many researchers attempted to find a more data directed method of parsing (Schank 1973; Waltz et al 1976; Marcus 1979).

It was soon realized that, at least in the limited Q/A paradigm, the "meaning" of a query could be extracted by tailoring a system towards the semantics and paying only little attention to syntax (Brown et al 1974; Waltz et al 1976; Sacerdoti 1977). The major disadvantage of these systems was that they were built entirely around the vocabulary of the domain in which they were working and to change the domain meant to rewrite most of the code.

Today researchers are exploring the middle ground where both aspects of natural language understanding coexist. Systems are being built which retain the framework of the purely syntactic parse but include intermediate interaction with the semantic component to determine meaning and weed out unwanted parses early (Harris 1977a; Bobrow and Webber 1980).

2.1 Syntax Without Semantics (or Structure Without Meaning)

Work on syntax directed parsing has, for the most part, been based on the transformational approach of linguists such as Chomsky (Chomsky 1965), Katz and Postal (Katz and Postal 1964). An input sentence was transformed from its input "surface structure" into a syntactic "deep structure" before any semantic interpretation was attempted. The deep structure of a sentence is the level at which the meaning can be obtained, under the 1965 "Aspects" (Chomsky 1965) theory. The surface structure, on the other hand, is the uttered form of the sentence. In general, one deep structure could correspond to many different surface structures. For example, the two queries:

    Which restaurants take reservations ?

and

    Reservations are taken by which restaurants ?

although they have different surface structures, have the same deep structure (Figure 2.1).

                  Sentence
               /     |      \
              Q  Noun Phrase  Verb Phrase
                     |         /       \
                   Noun     Verb    Noun Phrase
                     |        |          |
                restaurants  take      Noun
                                         |
                                    reservations

    Figure 2.1: Sentence Deep Structure

The Augmented Transition Network (ATN) parser (Woods 1970) is the prime example of work in this direction. The ATN itself was modelled on the finite state transition graph, which is a network of nodes representing states and directed, labelled arcs governing the conditions for transition from one state to another. For the purpose of natural language understanding, the transitions were based on syntactic categories and a variety of conditions.
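
The finite state transition graph described above can be sketched in a few lines of Python. This is a minimal illustration only, not Woods' ATN: it omits the registers, arbitrary tests and recursive sub-networks that make a transition network "augmented". The names LEXICON, NETWORK, FINAL_STATES and accepts, the category labels, and the arc layout are all assumptions made for this example.

```python
# Hypothetical lexicon: word -> syntactic category (illustrative only).
LEXICON = {
    "which": "QDET",
    "restaurants": "NOUN",
    "reservations": "NOUN",
    "take": "VERB",
    "serve": "VERB",
}

# Hypothetical network: state -> list of (category labelling the arc, next state).
NETWORK = {
    "S0": [("QDET", "S1")],
    "S1": [("NOUN", "S2")],
    "S2": [("VERB", "S3")],
    "S3": [("NOUN", "S4")],
    "S4": [],
}
FINAL_STATES = {"S4"}


def accepts(words, state="S0"):
    """Return True if the word list drives the network to a final state."""
    if not words:
        return state in FINAL_STATES
    category = LEXICON.get(words[0].lower())
    # Try every arc whose label matches the word's category;
    # a failed choice is abandoned and the next arc is tried (backup).
    for label, next_state in NETWORK[state]:
        if label == category and accepts(words[1:], next_state):
            return True
    return False
```

With this network, accepts("Which restaurants take reservations".split()) succeeds, while the passive form "Reservations are taken by which restaurants" is rejected unless further arcs are added for it. A purely surface-level network therefore needs separate paths for each surface form, which is one motivation for mapping both forms to a single deep structure.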

The reason for producing the deep structure was to capture as many of the regularities of a natural language as possible, thereby reducing the number of possible structures which the semantic component would have to consider. After producing its deep structure, control would be passed to the semantic processor and the "meaning" extracted. Although there were distinct advantages to this method, a few major disadvantages developed.

Because the syntactic processor (usually the ATN parser) incorporated no semantic information, it could only generate pure syntactic parses which had no semantic grounding on which to determine sensibility. Since, even when employing semantic information, there is still ambiguity in the English language, without it, it became an impossible task to generate the one correct parse. In fact, many spurious parses were usually generated. Consequently, a "generate and test" strategy was adopted by some. All possible deep structures were generated and presented, in turn, to some decision component (Harris 1977a) until the intended one was found. For example, the query:

    Find a car with a trailer which is red.

is ambiguous because there is no way to decide from syntax alone whether "red" is the colour of the car or of the trailer.

An even more important problem was that syntactically sound queries which had no possibility of success semantically had to be completely parsed before the semantic decision component could be consulted. An example from the PLANES system (Waltz et al 1976) is:

    How many engine repairs required maintenance in May ?

If parsed by a purely syntactic processor there would be no way to discover the subtle fact that "engine repairs" could never require maintenance.

Additionally, since there were so many possible parses, it became necessary to show the user that the system had indeed selected the correct one. This either required the implementation of a paraphraser (Harris 1977a) or, more often, required the user to have a working knowledge of the system's internal syntactic sentence representation (Woods et al 1972).

A definite advantage of the "two pass" (syntactic/semantic) system, although never fully realized, was the ability of the initial processor, using only syntactic knowledge, to remain relatively independent of the domain in which it was working. This means that to change the domain would merely require modifications to the second, semantic processor. But because the deep structure produced by the syntactic processor had to contain all of the syntactic information, it was a complex representation. And because the representation was complex, the semantic processor had to be complex to "understand" it. Also, the representation created by the syntactic parse contained little information which would prove helpful in extracting the meaning of the original query. Consequently, most of the "useful" processing had to be done in the semantic phase of the program.
Therefore, the idea that the syntactic phase of the program should never have to be rewritten was obscured by the fact that the semantic phase was itself incredibly large and complex (Woods 1967; Woods et al 1972). Even an aspect such as anaphora which should ideally be included within the domain independent syntactic portion could not be because of its need for semantic interpretation.

For the naive user to be comfortable with a computer system it must be fairly flexible. The creation of completely grammatical queries is both verbose and difficult - especially "on the fly". Using a strict syntactic parser for the first pass demanded that the grammar be reasonably inflexible since the only information which the parser could employ was syntactic. If the constructs were not strictly adhered to, many ambiguities might be introduced and the parser would become unable to parse the query. One method adopted which helped to partially alleviate this problem was to try to parse the query as a "noun phrase utterance" (Woods et al 1972) or a "sentence fragment" (Harris 1977a) if it could not be parsed as a complete query. This, like all "add-on" fixtures, only minimally reduced problems which were caused by the basic design of the method. Furthermore, the problem of non-grammatical inputs was still not addressed.

Some of the parsers which used complete backup (e.g. the ATN parser) would construct a sentence component such as a noun phrase and then, if the parse failed, the component would be dissolved. Later, at a different stage of the same parse, the same component might have to be reconstructed.
This is clearly a waste of time and energy and again some work has been done to remedy this situation (Bobrow and Webber 1980).

The semantic portion - the second pass of the two pass system - has been handled in a number of ways. The most common is the "procedural" semantics (Woods 1967; Woods et al 1972) where patterns in the deep structure trigger the use of certain procedures. Unfortunately, to update or add a new construct was a complex task in itself.

2.2 Semantics Without Syntax (or Meaning Without Structure)

Semantically oriented systems are typically data directed, one pass systems. Much of the justification for this has come from introspection to find the methods which we ourselves use for natural language understanding (Schank 1973). The term data directed (as opposed to syntax directed) means that rather than searching for, say, a prepositional phrase at each stage of a parse, one would only be looked for after a preposition has first been found. The concept of one pass means that the semantic content of each word or phrase is introduced immediately as the word is parsed and, therefore, after just one parse, the meaning of the entire query should have been found.

The main thrust of the semantically oriented Q/A systems has been in the area of so called semantic grammars. Semantic grammars use semantic concepts as the basic building blocks for developing a sentence representation rather than syntactic ones.
As examples, the SOPHIE system (Brown et al 1974) parses concepts such as voltages and resistor types, the PLANES system (Waltz et al 1976) can understand plane types and damage types and SRI's Naval database called LADDER (Sacerdoti 1977; Hendrix et al 1978) deals competently with ship types and parts. Since the semantic concepts which semantic grammars look for tend to be relatively unambiguous within a particular domain, many of the natural ambiguities in the natural language can be ignored. This type of system has demonstrated a high proficiency in simple domains (Brown et al 1974) and even simple sentence fragments and non-grammatical input can be handled.

When the semantic units have been formed, they are usually fitted into slots in a predefined pattern to determine acceptability (Waltz et al 1976; Brown et al 1974). These patterns define the range of questions which the system can answer and any deviation from them will usually result in an unanswerable query. This is not necessarily worse than the problems arising with misunderstandings during a syntactic parse because at least the system did not generate unwanted parses; however, the system still did not really have a grasp of where it went wrong and therefore, could rarely provide the user with any guidance as to why the error may have occurred.

Semantic grammars themselves are a method of representing the domain-specific knowledge in a Q/A system.
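
The slot-filling step described above can be illustrated with a small Python sketch. It is not taken from SOPHIE, PLANES or LADDER; the restaurant vocabulary, the concept names (ENTITY, CUISINE, LOCATION), the pattern list and the function interpret are all hypothetical, chosen only to show how semantic units are collected and then checked against predefined patterns.

```python
# Hypothetical semantic lexicon for a restaurant domain:
# word -> (semantic concept, normalized value).
SEMANTIC_LEXICON = {
    "restaurant": ("ENTITY", "restaurant"),
    "restaurants": ("ENTITY", "restaurant"),
    "chinese": ("CUISINE", "chinese"),
    "italian": ("CUISINE", "italian"),
    "downtown": ("LOCATION", "downtown"),
}

# Hypothetical patterns: each set names the slots of one answerable question type.
PATTERNS = [
    {"ENTITY", "CUISINE"},
    {"ENTITY", "LOCATION"},
    {"ENTITY", "CUISINE", "LOCATION"},
]


def interpret(query):
    """Fill slots from known words; return them only if a pattern matches."""
    slots = {}
    for word in query.lower().replace("?", " ").split():
        if word in SEMANTIC_LEXICON:
            concept, value = SEMANTIC_LEXICON[word]
            slots[concept] = value
    # Words outside the lexicon are ignored, so fragments and
    # non-grammatical input can still be accepted.
    if set(slots) in PATTERNS:
        return slots
    return None  # no pattern fits: the query is unanswerable
```

Under these assumptions, interpret("Which Chinese restaurants are downtown?") and the fragment "downtown restaurants chinese" fill the same three slots, while "downtown chinese" matches no pattern and is rejected without any indication of what was missing. Note also that every entry in both tables is tied to the restaurant domain.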
Unfortunately, to create or modify semantic grammars, as is the case with any coded or "procedural" semantics (Woods 1967), one requires either a complete knowledge of the intimate details of the system's workings or the use of a semantic grammar "generator". LIFER (Hendrix 1977), used in conjunction with the LADDER (Sacerdoti 1977) system at SRI, is an example of such a generator. NETEDI (Waltz et al 1976) is used in the PLANES system to modify already created semantic grammars. However, even with the help of the grammar generator, it is still an unstructured and somewhat ad hoc process to insert new grammatical structures into the original code.

Since little notice is paid to syntax in a semantically driven system, some of the regularities in English (or any natural language) cannot be easily captured. Without the benefit of syntactic structures, the semantic grammar can degenerate to the level of having to specify or expect every possible query.

Regardless of the complexity of the representation, however, a major flaw in the philosophy of semantically oriented systems is that, being developed around the semantics of a particular domain, they must be virtually rewritten when applied to a new situation.

2.3 Coexistence

Some current Q/A systems try to take advantage of syntactic and semantic processing simultaneously. A basically syntactic parse is performed with intermediate calls to the semantic routines to determine acceptability and extract meaning (Bobrow and Webber 1980).
The units formed are then combined into the semantic sentence representation. In this way, the systems gain the syntactic framework of the syntactically oriented systems as well as the "single pass" advantage of the semantically oriented ones.

Rather than having a distinct syntactic-semantic processing split, there is a syntactic-semantic knowledge split. The syntactic knowledge is that which is always true for all domains, being based on the syntax of the natural language, whereas the semantic knowledge depends on the domain involved.

The important problem with these systems lies in the search for an adequate and easily modifiable representation for the specification of the domain dependent knowledge. As mentioned before, both semantic grammars and procedural semantics are means of representing semantic knowledge. The major disadvantage of these, along with Woods' cascaded ATNs (Woods 1980), is that they are basically program code and to alter them usually requires unstructured programming changes. Clearly, it would be simpler to modify information that was structured in a dictionary-type, declarative format.

The Graceful Interaction (GI) system (Hayes and Reddy 1979) is an attempt to segregate many different levels of knowledge. The domain specific (or task specific (Ball and Hayes 1980)) knowledge is represented in a "schema" which is patterned after Minsky's frame structures (Minsky 1975).
These schema capture the idea that the domain independent information in a system should not only be separable from the overall system, but also be formally definable. The GI schema reduce the definition of a new database to the level of slot filling, thereby reducing the effort involved in the process while also minimizing the chance of error. However, because these schema have been designed to describe computer programs and not real world situations, they do not deal with incomplete descriptions.

Most database systems are forced to deal with incomplete world definitions. Problems usually occur when one tries to define the possible database elements since in many instances a field can contain virtually any value. This makes the possible range of values infinite and, consequently, hard to define. The ROBOT system (Harris 1977a) uses an inverted index of the database as the world knowledge for the system. It is treated as an extension of the dictionary and so addition of a new element to the database simply requires an update of the inverted index. This method allows actual values in the database to take on meanings even if the elements, possibly jargon, may not be found in a real dictionary.

In capturing and exploiting the regularities in a natural language like English, no system seems to have more potential than the case driven systems (Taylor and Rosenberg 1975).
A case grammar (Fillmore 1968) attempts to capture the purpose of a word or phrase in a sentence by determining its role in terms of a system of cases. For example in the query:

    Who serves chicken?

"who" is the agent of the action, "serves" is the action and "chicken" is the patient (or the thing being acted upon). Case systems combine both syntactic structures and semantic knowledge into one unit which can be specified in a concise, understandable and easily modifiable way. In addition, there is a structure imposed by the case "frames" which provides a basis for a formal, structured change to the semantic knowledge.

The case system proposed by Fillmore in 1968 did not define the actual number of cases needed to specify a natural language but attempts have yielded numbers ranging from a mere five (Celce-Murcia 1979) on up. The actual number of cases is irrelevant, however, because it is the ability to specify a domain simply by specifying the cases that gives the system its power and elegance.

2.4 Summary

There currently does not appear to be a truly domain or database independent natural language question answering system. It does appear that the closest thing to one is the ROBOT† (Harris 1977a) system, mainly because of its use of an inverted index of the database as the basic semantic knowledge for the domain. The RUS parser attached to the PSI-KLONE system (Bobrow and Webber 1980) at BBN appears to be making headway by departing from both the syntactically oriented and the semantically oriented parsing methods to combine the two into one general parse. But even these systems still appear to try to keep the syntactic and semantic knowledge somewhat separate.
A remedy for this seems to lie in the case driven systems which allow the combination of syntax and semantics not only at the processing level but also at the knowledge level.

† The ROBOT (Harris 1977a) system is now being marketed by the Artificial Intelligence Corp. under the name Intellect. For further details see Johnson (1981).

Chapter 3

Concepts in Natural Language Portability

Certainly the idea of a transportable computer program is not a new one. Researchers in compiler design and operating systems have studied this problem for some time now (Johnson and Ritchie 1978; Richards 1969). These studies have shown that to make a system transportable, the changeable part must be separated from the system core. The two disciplines do not, however, split up their projects in the same way. The compiler design split follows the flow of the program. Usually the "code generation" phase is separate, following the syntactic and semantic processing phases. The operating systems split, on the other hand, is one of functional interfaces. The system dependent routines may be called upon at any time during the operating system's processing.

Attempts in AI to follow this simple separation idea have met with modest success. Earlier we examined systems which follow the compiler design portability method of "flow separation" where syntactic processing of a question is completed before any semantic processing is started.
This is not necessary. Just as in the operating system where system dependent routines may play an important role at any time, domain and database dependent routines may play an important role at any time during NL processing. The important feature of this type of separation is the structured, well-defined interface between the system dependent and independent routines.

In the past, NL systems which attempted to incorporate any transportability features have been split into different phases (Figure 3.1). Each phase of the program, syntactic, semantic and retrieval, was forced to operate on its own. Since the syntactic structural description had to contain any information which the later phases might require, the structure developed became extremely complex and convoluted. Furthermore, because the semantic component had to process this relatively complex structure it had to be fairly complex in design.

The review of various natural language question answering systems has shown the similarities of their tasks. Whether the domain involved was for moon rocks or a company payroll, there were always a number of common functions to perform. This chapter will examine some of these functions and try to show the need for simultaneous syntactic and semantic processing. Additionally it will be shown that each of the tasks contains a domain dependent as well as a domain independent portion.
It is this difference which is important in a portable question answering system since it is only the domain dependent information which need be modified when changing the domain of discourse.

[Figure 3.1: Current Question Answering Systems†. The flow shown is: user query -> syntactic parser -> structural description -> semantic translator -> query language statement -> data retriever (which accesses the database) -> answer.]

† taken from Rosenberg (1980) p. 5.

3.1 The Language Structure

From the beginning researchers have noticed that structural features of natural languages help to categorize sentences and limit the total number of different variations possible. However, even though the English language imposes a reasonably strict structure on sentence components and the roles they must play in a query, there is still much ambiguity to be found among these components. Solving this ambiguity to determine the correct sentence structure turns out to be a difficult task. Efficiently using the structure after it has been built is again difficult. However, using the structure as it is developed during the building process to provide clues for future additions will make both processes simpler.

3.1.1 Building The Sentence Structure

The syntax of the language can provide many clues as to the meaning of a sentence without even taking into account the actual meanings of the individual words. Examining a typical question answering session might produce a number of queries of the form:

    How many warchen blinges are there?
    Which is the best frugle?
    Where is the ploon?
    When did the muddel frump?
    Find all of the blintogs which have punded.

In order to build the sentence structure, it is not necessary to know what the warchen blinges, frugles, ploons, muddels or blintogs are or even to know what it means to frump or to pund. It is, however, mandatory to know which are the nouns and verbs, and which is the subject and object. This information is, for the most part, either positional (dependent upon the location of the word in the sentence) or morphological (dependent upon the structure of the word).

It is during the examination of more complex cases that problems arise. In the sentence:

    Have the blintogs punded the frugle with the ploon?

it is impossible to immediately decide on the correct sentence structure without involving the meanings of the individual words. For example with one set of meanings:

    Have the bullies hit the boy with the brick?

the prepositional phrase "with the brick" probably modifies the verb phrase "hit", while in:

    Have the bullies hit the boy with the glasses?

the assumption would be that the prepositional phrase "with the glasses" is modifying the noun phrase "the boy".

This problem with prepositional phrase modification causes much ambiguity in the English language. Actually there is no way to determine which is the correct interpretation from the one sentence alone and possibly none even when the entire dialogue is taken into account. Either interpretation is possible.
However, humans would usually prefer one interpretation over the other and this preference should somehow be taken into account.

Regardless of which interpretation the individual words have, the components noun phrase, verb phrase and prepositional phrase can be joined into separate units for further processing. The ambiguity lies not within the bounds of the individual components but rather in the relationships among them. The semantics of these units can be used to fit the final sentence structure together. If information gathered earlier in the sentence structure building process is used along with the semantic interpretation of the current component to help resolve the ambiguities, there is a large probability that the correct overall interpretation of the sentence will be made without the need for backup. Even if some backup is required, the components can be reorganized without any need to dissolve them.

3.1.2 Using The Sentence Structure

Once the sentence structure has been built, methods developed in linguistics can be used to find pronoun antecedents (Hirst 1979). Simple verb phrase ellipsis can also be quite easily handled once there is a comprehensive sentence structure to work with (Hendrix et al 1978).

There exist universal concepts (not domain dependent) which can be recognized by their structure. For these particular linguistic constructs, the idea behind "semantic" grammars may be helpful. These grammars try to recognize specific constructs rather than general ones.
One of the semantic grammars in the PLANES (Waltz et al 1976) system is used to recognize "amounts" (Figure 3.2). It has been designed to recognize constructs such as "more than three", "more than three but less than five", and "three or fewer times". If a system tries to recognize "amounts" (as in the PLANES (Waltz et al 1976) system) or "quantifiers" (as in this system), as concepts which are universal in nature and not tied to any domain, the power of the semantic grammar can be obtained without having to take along with it its inherent domain specificity.

    (defatn AMOUNT
      ((*AMOUNT
         (wrd (any some) t (setr rel '>) (setr # 0) (to AM:END))
         (wrd between t (setr rel '<>) (to AM:<>))
         (cat comp (not (wrd between))
              (setr rel (selectq * (atleast '>=)
                                   (atmost '<=)
                                   (lessthan '<)
                                   (greaterthan '>)
                                   (exactly '=)
                                   nil))
              (to AM:REL))
         (jump AM:REL t (setr rel '=)))
       (AM:<>
         (cat integer t (setr # *) (to AM:<>:1)))
       (AM:<>:1
         (wrd and t (to AM:<>:2)))
       (AM:<>:2
         (cat integer t (setr rel '<) (setr # (max (list $# *))) (to AM:END)))
       (AM:REL
         (cat integer t (setr # *) (to AM:#)))
       (AM:#
         (wrd (time times) t (to AM:#))
         (cat conj t (eq $rel '=) (to AM:CONJ))
         (jump AM:AMT))
       (AM:AMT
         (cat conj t (to AM:AMT1))
         (jump AM:END))
       (AM:AMT1
         (push *AMOUNT t (setr pred (list *)) (to AM:END)))
       (AM:CONJ
         (wrd (fewer less) t (setr rel '<=) (to AM:END))
         (wrd more t (setr rel '>=) (to AM:END)))
       (AM:END
         (wrd (time times) t (to AM:END))
         (pop (append (list (buildq (+ +) rel #)) $pred)))))

Figure 3.2: The AMOUNT ATN Network†

† taken from Waltz et al (1976) p. 116. The code has been somewhat abbreviated from its original form.

3.1.3 Relaxation of Grammatical Rules

Of course when working with humans, one must remember that they are fallible. For this reason it is quite important that all of the grammatical and structural rules be relaxed when they are not absolutely necessary. For example, lack of number agreement can usually be accommodated by a human in the process of understanding the sentence. An error such as:

    Find a books about natural language understanding.

should be able to be parsed even though there is a certain amount of ambiguity. Similarly, sentence fragments such as:

    Books by Chomsky.

should be handled.

Simple verb phrase ellipsis would be quite common in a question answering system. An example of this is:

    Who serves spaghetti?
    THE "OLD SPAGHETTI FACTORY".
    steak?

At this point the system should be able to infer that the question was really:

    Who serves steak?

and act accordingly. If this relaxation is not done, the strict grammatical rules will remove any freedom the user once had in specifying his query - possibly to less than that of an "artificial" query language.

In the situations where the system makes allowance for a grammatical error, there should be some way to tell the user which interpretation of the sentence was being used. A common method has been to return a corrected version of the input sentence to the user for verification. A preferred way would be to develop an answer generation component which would somehow incorporate the original question into the final answer.
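The "steak?" example of verb phrase ellipsis can be sketched in code. What follows is a minimal illustration only, not the mechanism of any system discussed here. The Query record, its three slots and the function resolve_ellipsis are hypothetical names, and the sketch assumes the fragment has already been parsed far enough to know which slot it fills.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Query:
    """Hypothetical case-style record of a parsed question."""
    agent: Optional[str]    # e.g. "who"
    action: Optional[str]   # e.g. "serve"
    patient: Optional[str]  # e.g. "spaghetti"


def resolve_ellipsis(previous: Query, fragment: Query) -> Query:
    """Fill every slot the fragment leaves empty from the previous query."""
    return Query(
        agent=fragment.agent or previous.agent,
        action=fragment.action or previous.action,
        patient=fragment.patient or previous.patient,
    )


# "Who serves spaghetti?" followed by the fragment "steak?"
previous = Query(agent="who", action="serve", patient="spaghetti")
fragment = Query(agent=None, action=None, patient="steak")
resolved = resolve_ellipsis(previous, fragment)
print(resolved)
```

Under these assumptions the fragment inherits the agent and action of the preceding question, giving the reading "Who serves steak?". Nothing in the function refers to restaurants, which is the sense in which this kind of handling can stay domain independent.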
3.2 Vocabulary

As a result of reviewing the semantically driven systems it can be seen that they performed quite well considering that they were primarily keying on the meanings of specific words and almost completely ignoring the sentence structure. The vocabulary used in any one particular domain is an extremely important source of knowledge which cannot be ignored. There are many facets to the structure and meaning of words and word groups within any one domain from the simple meanings we attach to proper nouns to the inferred meanings of idiomatic phrases.

3.2.1 Morphing

Word morphology is the study of the structure of words. In a natural language parser a "morpher" usually refers to a routine which systematically removes prefixes and suffixes to find the root of a given word. Because of the structure of English words, this can be done usually by knowing the syntactic category and use of the word and ignoring the meaning. Then with the combination of root word meaning and the function of the prefixes and suffixes, the meaning of the entire word can be determined. This process allows a system to understand many words with only a limited dictionary of regular and irregular words.

3.2.2 Idioms and Jargon

In contrast, aspects of natural language such as idioms and jargon are totally semantically oriented. Some idioms are too complex to handle even with current AI technology but most become complicated only if they are parsed with the conventional methods.
Therefore, it would probably be advantageous to consider these as semantic concepts and handle them before any normal parsing is applied. To do this, access to the semantic domain knowledge must be provided during the parse.

3.2.3 Verbs

In the Q/A paradigm, the verb performs many different functions. The first, and most obvious, is to designate the action of the query. For example, in the sentence:

    Who serves chicken?

the case frame of the verb "serve" provides slots for the component noun phrases.

The second common function of the verb is to designate the operation which is to be performed by the system. When parsing the input:

    Find a cheap Japanese place.

the verb "find" is interpreted as a command to return the name of a restaurant which satisfies the constraints "cheap" and "Japanese". In the sentence:

    Total the salaries of the managers.

"total" is taken to be a command to the system. Obviously, to process this particular command properly, the system must have the capability of "totalling" a field, either within the database system or within the NL interface itself. This function of the verb will be dependent only on the functions available in the database and not on the domain.

The other major use of the verb is strictly syntactic. In the sentence:

    Where is White Spot?

the verb "be" is used to designate the constraint. Auxiliary verbs are used at the beginning of a sentence to indicate that the query requires a yes-no response.
This can be shown in the example:

    Does Yangtzee open on Thursdays?

where the main verb is "open" and the auxiliary verb "do" is used to designate the type of answer desired.

3.2.4 Nouns

In this type of system, nouns are usually tied to the domain in some way. Proper nouns will almost always be found as values in the database whereas common nouns will be found not only as database values but also as general domain jargon. In the query:

    What is on the menu at White Spot?

the proper noun "White Spot" will be found in the database but the common noun "menu" probably will not. However, both of these nouns form a part of the domain specific information.

3.2.5 Adjectives and Adverbs

Many adjectives and adverbs appear, on the surface, to be domain independent but, in reality, are not. In the query:

    Which is the cheapest Greek restaurant?

the adjective "cheapest" would have a mixture of properties which would include a domain independent as well as a domain dependent part. In a Q/A system, a superlative would usually indicate that the greatest or least value of a field was desired. This would be the domain independent portion of the meaning. The domain dependency comes in deciding which field is to be examined and which ordering of field values will be used.

When the adjective or adverb is not designating a field, as in:

    Find at least 4 . . .

then the word can be assumed to have only the domain independent portion of the meaning.
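The two portions of a superlative's meaning can be sketched as follows. This is an illustrative sketch under stated assumptions, not the representation used in this system. The table DOMAIN_ADJECTIVES, the function apply_superlative and the restaurant records (including their names and prices) are all invented for the example.

```python
from typing import Dict, List, Tuple

# Domain dependent portion (hypothetical table): which field an adjective
# refers to, and whether its superlative wants the largest value.
DOMAIN_ADJECTIVES: Dict[str, Tuple[str, bool]] = {
    "cheap": ("price", False),      # "cheapest" -> smallest price
    "expensive": ("price", True),   # "most expensive" -> largest price
}


def apply_superlative(adjective: str, records: List[dict]) -> dict:
    """Domain independent portion: pick the record with the extreme value."""
    field, want_largest = DOMAIN_ADJECTIVES[adjective]
    chooser = max if want_largest else min
    return chooser(records, key=lambda record: record[field])


# Invented data standing in for database records.
restaurants = [
    {"name": "Restaurant A", "cuisine": "Greek", "price": 18},
    {"name": "Restaurant B", "cuisine": "Greek", "price": 11},
    {"name": "Restaurant C", "cuisine": "Japanese", "price": 9},
]

# "Which is the cheapest Greek restaurant?"
greek = [r for r in restaurants if r["cuisine"] == "Greek"]
cheapest_greek = apply_superlative("cheap", greek)
print(cheapest_greek["name"])
```

Moving to a new domain would, in this sketch, mean replacing only the table, while apply_superlative stays unchanged.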
Therefore, some adjectives and adverbs can reside in the syntactic dictionary while others must form a part of the domain specific information.

3.2.6 Prepositions

Prepositions play an extremely important role in English, especially when a case theory is being implemented. By using these words as flags it is possible to determine a limited number of uses of a phrase before actually examining the entire phrase - in some domains there may be only one reasonable meaning. For example, if we have a LOCATION field in our domain but no TIME field, then:

    at the . . .

will most likely be designating some form of location. Obviously the process is not as simple as this one example illustrates because in general there are many different "flags" for any one idea (or case) as well as many different ideas for any one flag. However this method will, at the very least, provide a starting point for determining the correct interpretation.

3.3 Sophistication of Design

To design a working system is not difficult. To design a usable system, however, is. Among what are classed as aspects of design sophistication are such elements as spelling correction, general user interaction, metaquestions and knowledge acquisition.

3.3.1 Metaquestions

A major feature of any robust system is its ability to handle questions and explanations about itself, i.e. "metaquestions". If asked:

    How many records are there?

it would be unreasonable for a system to retrieve every record from the database and then count them, but it should instead simply have a count ready.
Similarly if asked:

    What do you know about?

the system should not dump the contents of the database. These kinds of questions should be recognized and processed with little or no database interaction. The component which identifies these questions should be domain independent because the questions themselves will be the same regardless of which domain the system is working in. The answers to the questions are, however, both domain and database dependent. But answers to these questions will probably not be found in the database directly and therefore they must be considered part of the domain dependent information.

3.3.2 User Interaction and Communication

Any system which is designed to communicate with even partially naive users must have some way to inform the user when it is confused or needs additional information. This component of the system need only use the relevant portions of the semantic meanings of the domain dependent words in its attempts to extract the required information from the user. This type of user interaction can be guided by the system and restrict the possible user answers so that it will obtain the information it is seeking quickly.

Assume a system contains inventory information for a bookstore and has never been told that "purple" is a colour. The question:

    Are there any purple pens?

might produce a confusion in the system and a reasonable response would be:

    I don't understand the meaning of "purple".
    Is it a:
      1. quality
      2. colour
      3. manufacturer
      4. something else

This pattern could be generated by knowing the possible fields in which the word can belong or by making a prediction based on the part of the sentence processed so far. For example it might be used if the word could only be found in a few, equally probable fields and none of the fields had been chosen as the default. This answer could then be stored in the dictionary for later reference.

This type of menu driven dialogue has been shown to work well enough to allow the user to see where the system is confused, and to allow the user to give a correct, understandable and concise response without forcing the user to understand the system's internal representation for queries. Additionally, the menu driven method gives the user some guidance and assistance in deciding what an appropriate answer would be. In contrast, a question such as:

    What does "purple" mean?

would provide no guidance for the user at all.

Most of this menu driven dialogue can be generated by the component requiring the answer and then passed to a "user interaction" component to extract the answer from the user. This allows the user to see one consistent interface regardless of which part of the system needs the information. The controlling user interaction component, again, is not dependent upon domain.

3.3.3 Spelling Correction

The art of spelling correction is still a very ad hoc, time consuming and unreliable process. Nevertheless, it should be done if at all possible in a reasonable (not noticeable to the user) amount of time.
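As one illustration of how inexpensive a simple corrector can be, the sketch below matches an unknown word against the known vocabulary by string similarity, using the get_close_matches function from Python's standard difflib module. This technique is an assumption made here for illustration and is not one proposed in the text. The vocabulary list, the function name correct_spelling and the 0.8 similarity cutoff are likewise arbitrary choices.

```python
import difflib
from typing import List, Optional

# Hypothetical domain vocabulary; in practice this would come from the
# dictionary and from the values stored in the database.
VOCABULARY: List[str] = ["restaurant", "menu", "steak", "chicken", "spaghetti"]


def correct_spelling(word: str,
                     vocabulary: List[str] = VOCABULARY,
                     cutoff: float = 0.8) -> Optional[str]:
    """Return the known word closest to `word`, or None if nothing is close."""
    lowered = word.lower()
    if lowered in vocabulary:
        return lowered
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(correct_spelling("resturant"))
print(correct_spelling("chiken"))
print(correct_spelling("xyzzy"))
```

A word with no sufficiently close match returns None, at which point a system of the kind described here would fall back on asking the user rather than guessing.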
Many software systems now employ some form of spelling correction procedure in their makeup, from simple text processing systems to complex "programmer's workbench" systems. The algorithms range from simple lookup of common spelling errors to complicated procedures where a user's typical mistakes are "remembered" by the system.

3.3.4 Knowledge Acquisition

There are many levels of knowledge acquisition, even in a simple question answering system. Some of the new knowledge comes from within the system, such as when a new word is broken apart and subsequently "understood", while other knowledge comes directly from the user, such as when a new term is defined. Still other information can be derived from the dialogue. Some work is being done in building a psychological model of the user as the dialogue progresses. However, all of this learning - whether simple or complex - requires some use of a dynamic knowledge acquisition component which may be involved at any stage of the dialogue. A simple "add-on" feature is not enough.

In our previous example, once the system has found an answer to its question and now knows what "purple" is, it should be able to save this information to use at a later date. Any future references to "purple" should not have to result in a query to the user. A reasonable system should learn from its mistakes and thereby never make the same mistake twice. This requires modification of either some part of the program or the data. The simpler solution is to allow the program to modify its world definition.
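The idea of a program that modifies its own world definition can be sketched as below. This is a hypothetical illustration rather than the design adopted in this thesis. The WorldDefinition class, its lookup method and the scripted_user stand-in for the menu dialogue are all assumed names, and the "world definition" is reduced to a single word-to-field table.

```python
from typing import Callable, Dict, List


class WorldDefinition:
    """Hypothetical store recording which field each known word belongs to."""

    def __init__(self) -> None:
        self.field_of: Dict[str, str] = {}

    def lookup(self, word: str, candidate_fields: List[str],
               ask: Callable[[str, List[str]], str]) -> str:
        """Return the field for `word`, asking the user only the first time."""
        if word not in self.field_of:
            # Unknown word: consult the user once and remember the answer.
            self.field_of[word] = ask(word, candidate_fields)
        return self.field_of[word]


questions_asked: List[str] = []


def scripted_user(word: str, candidate_fields: List[str]) -> str:
    """Stand-in for the menu dialogue; this scripted user always picks 'colour'."""
    questions_asked.append(word)
    return "colour"


world = WorldDefinition()
fields = ["quality", "colour", "manufacturer"]
first = world.lookup("purple", fields, scripted_user)
second = world.lookup("purple", fields, scripted_user)
print(first, second, questions_asked)
```

The second reference to "purple" is answered from the stored definition, so the user is queried only once. In a real system the table would also have to be written back to permanent storage, which this sketch omits.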
In the above example, the knowledge that purple is a colour should be easily stored in this world definition.

Since many of the "meanings" of words will be found in the database itself, it makes sense to allow the system to query the database if confused about a term. If the Q/A component is interfaced to a sufficiently fast database system, and there is only a narrow range of possible meanings of the term, this strategy could be adopted. This method has been shown to work when coupled with an inverted index of the database as discussed earlier (Harris 1977a) but with the current level of database management system technology, it would be too slow to use as the sole source of semantic knowledge.

3.3.5 Making Assumptions

To allow the use of the system with a minimal amount of effort, assumptions must be made. Pronouns and idioms which people frequently use when communicating with each other - usually without thinking much about it - must be handled if the system is to be robust. Overall, the system must make a number of assumptions so that the user does not get bogged down by the unnatural restrictions which computer systems usually impose on their human users. Since the computer can not currently make these assumptions on its own, they must be somehow predetermined.

Finding pronoun antecedents is a general enough task that it can be contained in the domain independent portion of a program. However, the interpretation of an idiom is usually tied quite closely to the domain.

3.3.6 Answer Generation

By correcting spelling errors, allowing loose and improper grammar and generally making unconfirmed assumptions, a system might suffer from one obvious problem.
It is possible that the system will answer a question different from the one that was originally asked. For this reason, the original question (or what the system believes the question to be) must somehow be incorporated into the answer. If the user has asked:

    How many purple pens are there?

then, rather than a response of:

    42.

a preferable answer would be:

    There are 42 purple pens.

This may seem a trivial point with this example but the importance can be seen more readily when the system does not know the answer. A response of:

    None.

could mean:

    There are no purple pens.

but it could also mean:

    I don't have any information about "purple".

or:

    I don't have any information about "pens".

or even:

    I don't have any information about "purple" or "pens".

All of these latter answers would be more informative by telling the user exactly what the system does or does not know.

Actually the process of answer generation is a far more complex one than this description might portray. Some work is being done in this area but it is not at all clear how one determines the correct words or phrases to use in the answer. However, the specialized answers generated for question answering systems and the required simplicity of them limit the task to an almost manageable one.

3.4 Summary

Most of the system components examined require little knowledge of the domain in which they are working in order to function. They do, however, all require a large amount of time to develop and this effort should not have to be repeated each time a new system is constructed.
The handling of loose grammar, pronoun reference and verb phrase ellipsis should have little interaction with the domain specific information. The controlling portions of the user interaction, spelling correction and learning components need only use the domain dependent information as slot fillers. Likewise the answer generation component need only use the information returned from the database as slot fillers in the generated answer.

All of these components can be combined together in one domain and database independent "linguistic core" (Figure 3.3). Since these components are virtually domain independent, they should never have to be rewritten when the system is adapted to a new domain.

[Figure 3.3: Proposed Natural Language System. Diagram elements: user; NL query; NL answer; domain dictionary; inverted database; case list; NL parser; answer generator; linguistic core; standard sentence representation; standard data representation; domain definition; SSR analyser; data formatter; database interface; database query; raw data; database.]

While it may seem a little unconventional to suggest that the first phase of processing (parsing) and the last phase (answer generation) are combined in the same unit while an intermediate phase such as database retrieval is not, there are reasons to support such a structure. It is desirable to have the information structure which is passed from the parser to the retrieval routines be as well defined as possible.
At the same time, in order to allow informative answer generation, a large amount of information both from the original sentence and from the previous dialogue must be accessible. To combine these two goals in a conventional system, the structure developed by the parser would have to be very complex indeed and the answer generator would have to be very complex to decipher it. Instead, by combining the natural language parser and the answer generator, these modules can communicate freely while a strictly defined interface between this component and the database retrieval portion is maintained.

Although parsing many universal constructs (such as quantifiers) requires little domain knowledge, it usually does require some. Furthermore, constructs such as simple idioms which definitely require semantic information can be processed much more easily if there is access to this information during the parse. The domain oriented information such as the general vocabulary, idioms, and specific jargon should also be kept in one unit. To make this module easily accessible as well as easily modifiable, a case structured declarative format is suggested. An inverted index of the database would be most effective as a simple definition of all the world knowledge residing in the database.

To allow the system to be transferred with a minimal effort from one database management system (DBMS) to another, it should be designed to query an "idealized database".
An idealized database is one which has a good, basic set of functions and can be adapted easily to any "real" DBMS. This idealized database should contain only the essential functions, thereby reducing the effort needed to design the interface between the idealized database and the real database.

There are two separate functions which must be performed in the "database interface". Firstly, the output from the natural language parser must be translated into a legal database query. Secondly, the raw data returned by the database must be formatted into the structure expected by the answer generator.

Chapter 4

System Design; Part I - The Linguistic Core

In an attempt to lend credence to the concept of a domain and database independent natural language (NL) interface, a prototype question answering (Q/A) system has been constructed. It has been logically, if not physically, divided into three completely separate modules.

The user interface or linguistic core incorporates most of the features now seen in conventional Q/A systems. It is intended to form an application independent framework for database queries to which information concerning the current domain and database system can be attached. Components for NL parsing, knowledge acquisition, and answer generation are all included in this module. Whereas a structured interface exists between this linguistic core and the other modules, internal communication has been left fairly unstructured.
This internal communication method, which basically consists of a number of "registers" and associated values, can be easily added to or modified to allow as much flexibility as possible.

The second logical unit is the domain definition. This module contains the definition of the particular domain in which we are working. It is a small, easily modifiable unit, smaller in size than the linguistic core, but central to the ideas reflected in this thesis. The interface between the domain definition and the linguistic core must be strictly maintained since this definition will have to be changed and updated constantly, and without any alterations to the core itself.

The last module is also small in size compared with the linguistic core. This is the database interface module. Its purpose is to hide the real, physical database structure from the linguistic core and provide an idealized structure. Unlike the domain interface, which is based on a declarative format, this module does contain code. Fortunately though, this module should only have to be modified when adapting the system to a new database system, not when changing the domain. A formal structure has been defined which provides the basis for communication between the linguistic core and the database interface.

In this chapter we will concern ourselves with the design of the linguistic core (Figure 4.1).
The goal is to form a general purpose Q/A system which receives queries from the user, translates them into a standard sentence representation (SSR), consults the database (through the database interface) and formulates an appropriate answer, all with no notion of the domain in which it is working save for what information it can extract from the domain interface. The general design of the core can be thought of as a three-phase process: NL parsing, SSR building and answer generation. In addition, there is a knowledge acquisition component and a syntactic dictionary which can be accessed from any of the other modules. A system of registers is used as an internal communication method.

[Figure 4.1: The Linguistic Core. Diagram elements: from user; to user; SSR formatter; knowledge acquisition; NL parser; syntactic dictionary; global registers; world registers; answer generator; to database interface; from database interface.]

4.1 The NL Parser

[Figure 4.2: The Natural Language Parser. Diagram elements: from user; word scanner; ATN grammar; ATN parser; semantic routines; local registers; to SSR formatter.]

The job of the NL parser (Figure 4.2) is to convert an input sentence in natural language into some internal representation, while retaining as much of the original meaning as possible. The method of parsing used here can be described as "component parsing".
A small, basic component such as a noun phrase or verb phrase is first combined together on a syntactic level and then added into the total internal sentence representation using the semantic interpretation of the component. In this way, the sentence representation is built up as the sentence is parsed. This may cause problems when it is found, part way through a parse, that a wrong decision has been made about the function of a component. However, it does alleviate the problem of having to keep all ambiguous versions around until the end of the parse. When the parser has finally finished with the sentence, there will be at most one parse generated.

Another benefit of this parsing strategy is that individual components, once they have been formed, will not be split up unless they fail some semantic test. They may, however, be switched with other components until an acceptable structure is found. This makes the parser more efficient because the components themselves will usually be correctly formed on the first attempt, even though their function in the sentence may not be known.

The NL parser is composed of a general augmented transition network (ATN) grammar parser (Woods 1970), the ATN grammar, and many small, specialized routines which handle tasks ranging from the lexical analysis of an input word to the modification of the current sentence representation to accommodate a new prepositional phrase. These components will now be examined in detail.
4.1.1 The ATN Parser

The function of the ATN parser is to produce a structural representation of the input sentence according to the ATN grammar. During this process, functions designated by the grammar are invoked to build this representation. The state of the parser is saved when a particular transition in the grammar is chosen, thus allowing for complete backup when an error is detected.

The ATN parser used was originally written in LISP by Dr. R. Reiter (Reiter 1978) and has since been only slightly modified.

4.1.2 The ATN Grammar

The grammar used in this system is basically a syntactic grammar, augmented by calls to the semantic routines. It attempts to represent a sentence in terms of its component syntactic structures. Therefore, portions of the grammar are devoted to recognizing constructs such as noun phrases, verb phrases, determiners and quantifiers. The semantic routines are used both to verify certain semantic tests on the components and to combine them together to form the internal sentence representation.

Currently the grammar is a small, basic version; however, it should be possible to develop it independently of the rest of the system to some degree. Development of the grammar is an ongoing process and whenever it is modified, since only linguistic knowledge is represented, the modifications should benefit all systems currently using it. Transition network diagrams for the grammar used here can be found in Appendix A.
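
The backup behaviour described in Section 4.1.1 can be illustrated with a minimal sketch. It is not the LISP parser used in this system: it is a plain transition network with only cat, jump and pop arcs, it has no registers, push arcs or semantic routines, and the lexicon, network and function name are all hypothetical. The parser state (network state, word position and the words accepted so far) is held in the arguments of each recursive call, so a failed arc simply returns to that saved state and the next arc is tried.

```python
# Hypothetical toy lexicon: "old" may be an adjective or a noun.
LEXICON = {
    "the": {"det"},
    "old": {"adj", "noun"},
    "man": {"noun"},
}

# Hypothetical noun-phrase network: state -> ordered list of arcs.
# Each arc is (kind, category, target state).
NETWORK = {
    "np": [("cat", "det", "np/det"), ("jump", None, "np/det")],
    "np/det": [("cat", "noun", "np/head"), ("cat", "adj", "np/det")],
    "np/head": [("pop", None, None)],
}


def parse(words, state="np", pos=0, trail=()):
    """Return the (category, word) pairs of the first full parse, or None."""
    for kind, label, target in NETWORK[state]:
        if kind == "pop":
            # A pop arc succeeds only when every word has been consumed.
            if pos == len(words):
                return list(trail)
            continue
        if kind == "jump":
            # Move to the target state without consuming a word.
            result = parse(words, target, pos, trail)
        elif pos < len(words) and label in LEXICON.get(words[pos], set()):
            # cat arc: consume one word of the required category.
            result = parse(words, target, pos + 1, trail + ((label, words[pos]),))
        else:
            result = None
        if result is not None:
            return result
        # Otherwise back up to the saved (state, pos, trail) and try the next arc.
    return None
```

For the input "the old man", the first attempt treats "old" as the head noun, fails at the pop arc because "man" is left over, backs up, and then succeeds with "old" as an adjective.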
Some portions of the grammar are modelled on the semantic grammar concept that constructs are parsed by looking at specific words and phrases rather than general syntactic categories. However, unlike true semantic grammars, the constructs being parsed here are linguistic in nature (e.g. quantifier) rather than semantic (e.g. planetype (Waltz et al 1976)). The quantifier network in this system (Figure 4.3) can recognize such constructs as "at least four" and "more than three but less than 5".

    (quant   (wrd at t (to q/super))
             (mem (not no) t (setr qneg t) (to q/comp))
             (jump q/comp t)
             (tst (qvalue *)
                  (add-quant (getr qconj) (qvalue *) nil (nvalue *))
                  (to q/conj))
             (jump q/num t (setr q (qvalue 'exact))))
    (q/comp  (cat adv (getf comparative)
                  (setr q (qvalue *))
                  (to q/than)))
    (q/than  (wrd than t (to q/num)))
    (q/super (cat adv (getf superlative)
                  (setr qneg t)
                  (setr q (qvalue *))
                  (to q/num)))
    (q/num   (push number t
                  (add-quant (getr qconj) (getr q) (getr qneg) *)
                  (to q/conj)))
    (q/conj  (cat conj t
                  (setr qconj *)
                  (setr q nil)
                  (setr qneg nil)
                  (to quant))
             (wrd of t (to q/reset))
             (jump q/reset t))
    (q/reset (jump q/acc t
                  (setr quant (get-g (current-g 'quant)))
                  (reinit-g 'quant)))
    (q/acc   (pop (getr quant) (getr quant)))

Figure 4.3: The Quantifier ATN Network

Although the semantic grammar idea is useful in this situation, the database implementor should not be required to develop new code when he or she defines a new database.
For this reason, semantic grammars have only been allowed in the linguistic core portion of the program. The non-programming techniques of setting up a new domain will be discussed later, but it is sufficient to say now that no modification of the grammar should be necessary when a new domain is defined.

4.1.3 Scanning

A large part of any NL system is devoted to the identification of the basic units (or words) of the input sentence. This component is concerned with identifying root words, compound words, abbreviations, synonyms and even database elements. In most cases, this is not a deterministic process, especially if it is done before the parse begins. In this system, the scanning is done during the parse in order to allow as much information as possible to be used in word identification. The various scanning procedures will now be examined.

4.1.3.1 The Morpher

The function of the morpher is to strip prefixes and suffixes from an input word in a systematic fashion to produce the root form. The rationale for including one in a NL parser is to reduce the actual number of words needed in the dictionary. For any word, we should be able to determine its meaning from the combined meanings of its root and the prefixes and suffixes attached to it. Unfortunately, this means that the morphological information must somehow be included with the word in the dictionary. For most words this is quite a simple process but for some it can become rather complex (see Section 4.2).

The root-finding method used in this system is quite simple and straightforward.
In turn, each possible suffix is removed from the candidate word. If the root is found to be in the dictionary and the word category agrees with that expected, the new word is entered into the dictionary. A table of some of the regular suffixes which are examined is in Figure 4.4.

This method works well for regularly inflected words whose morphological information is easy to store in the dictionary. For example, the morphological information needed for a verb in this system consists of the suffixes to add to form the present and past tenses. They are stored in the syntactic dictionary as:

    (SERVE V S-D)

For a noun, the information required is the pluralizing suffix:

    (DATUM N A)

and for an adjective, the suffixes required to form the comparative and superlative forms must be available:

    (NEW ADJ ER-EST)

    Ending to   New ending   Word category      Features of new word
    remove      to add
    ---------   ----------   ----------------   --------------------------------------
    s                        s-noun             plural
    s                        s-d-verb           present tense & 3rd person singular
    s                        s-ed-verb          present tense & 3rd person singular
    es                       es-noun            plural
    es                       es-ed-verb         present tense & 3rd person singular
    ies         y            es-noun            plural
    ies         y            es-ed-verb         present tense & 3rd person singular
    's                       s-noun             possessive
    's                       es-noun            possessive
    's                       proper-noun        possessive
    's                       pronoun            possessive
    s'          s            proper-noun        possessive
    s'          s            es-noun            possessive
    s'          s            s-noun             possessive
    ied         y            es-ed-verb         past participle & 2nd person plural
    ed                       es-ed-verb         past participle & singular-plural
    ed                       s-ed-verb          past participle & singular-plural
    d                        s-d-verb           past participle & singular-plural
    ing                      s-d-verb           present participle
    ing                      s-ed-verb          present participle
    ing                      es-ed-verb         present participle
    ing                      irr-verb           present participle
    ing         e            s-d-verb           present participle
    **ing       *            s-ed-verb          present participle
    **ed        *            s-ed-verb          singular-plural past participle
    est                      er-est-adjective   superlative
    **est       *            er-est-adjective   superlative
    iest        y            er-est-adjective   superlative
    st                       r-st-adjective     superlative
    er                       er-est-adjective   comparative
    **er        *            er-est-adjective   comparative
    ier         y            er-est-adjective   comparative
    r                        r-st-adjective     comparative
    ices        ex           es-noun            plural
    a           um           a-noun             plural

Figure 4.4: The Suffix Table

Since these words are regularly inflected, certain linguistic rules can also be applied. One such rule is to double the "n" in "run" before adding "ing" to form the participle. The root of an irregular word cannot usually be found with a word morpher. Therefore, these words must be initially stored in the dictionary along with all of their inflections.

4.1.3.2 Compound Words

Compound words are those which, although separate lexical items, function as a single unit. For these words it appears to be more beneficial to treat them as a single unit rather than as separate parts. However, most of the individual words have meanings of their own and so the system must allow for combination errors. The strategy adopted here to allow both compound words and the individual parts to exist simultaneously is to first join the longest string which exists in the dictionary.
If the parse subsequently fails, the scanning routines back up one level and attempt to use the next longest compound. For example, the name:

    University of Illinois Chicago Circle

would first be parsed in the full form and then, if the parse fails, the successively smaller chunks:

    1. University of Illinois Chicago
    2. University of Illinois
    3. University of

would be tried until finally the one word "university" would be attempted. In this example, "University of Illinois Chicago" and "University of" would probably not be found in the dictionary and so they would not be accepted as valid compound words.

4.1.3.3 Abbreviations and Synonyms

An abbreviation is considered to be a substitution of one word for another at the lexical level. Therefore, if the word "can't" is defined as an abbreviation of "can not", the substitution will occur before the word "can't" is ever morphed. A synonym, on the other hand, is considered to be a substitution at the root word level. If the verb "display" is defined as a synonym for "show", then "displaying" will be considered synonymous with "showing".

4.1.4 Semantic Routines

Specialized semantic routines are invoked by the grammar to build an internal representation of the original input sentence. This internal representation is nothing more than a set of values for the global registers in the linguistic core. These values are subsequently used to format the standard sentence representation (SSR) (see Section 4.5).
In addition to processing the sentence components such as noun and verb phrases, routines are included which handle conjunctions and find pronoun antecedents.

4.1.4.1 Adding a Noun Phrase

After the noun phrase (NP) has been syntactically determined, a semantic routine is called to integrate it into the existing sentence representation. Depending on the characteristics of the NP and the current representation, a number of things can happen. The first step is to determine whether it is the nominative, dative or accusative case. One of the rules used in this determination is that it will be assumed to be nominative if it is the first element of the sentence. Later, this assumption may have to be revoked owing to the influences of subsequent components. In the sentence:

    I can be served chicken at which restaurants.

the first NP found is composed of the single pronoun "I". After determining this, and having no information to the contrary, the NP will be assumed to be the agent of the sentence. The first noun phrase in the sentence will also be saved for future pronoun antecedent determination (see Section 4.1.4.5).

Next, the verb phrase (VP) "can be served" will be constructed and the sentence will be found to be passive (see Section 4.1.4.2). At this point, the system will notice that it has made a judgement error about the role of the first NP and will have to modify the structure which it has built. The actual role of the NP "I" in the sentence is in the recipient case.

The next NP to be constructed will consist of the noun "chicken".
In determining its role, the case filler restrictions of the verb will be taken into account. Since the verb "serve" can take a food type as the patient but not as the agent, "chicken" must fill the patient case. The NP constraints in the final sentence representation will be:

    RECIPIENT: I

and:

    PATIENT: (FOOD = CHICKEN)

The handling of the prepositional phrase:

    at which restaurants

is discussed in Section 4.1.4.4.

4.1.4.2 Adding a Verb Phrase

When a verb is defined in a particular domain, the cases the verb allows and the relevant fields which can fill each case must be specified (see Section 4.2.2). When the parser tries to add a verb to the current sentence representation, the major task is to see that the noun and prepositional phrase units which have been found so far fit into the designated cases of the verb. This allows disambiguation of a noun element which may be found in more than one field.

If the sentence turns out to be passive, the role of the initial noun phrase (NP) must be redetermined. For instance, in the previous example:

    I can be served chicken at which restaurants.

the NP "I" has been found before the VP "can be served". Because of this, the NP "I" was initially assumed to be the agent of the sentence. When the sentence is deemed to be passive, the system must find out what the real role of this NP is.

When adding the main verb to the sentence representation, the properties of each NP determined are checked to see that they can indeed fill the case slot of the action to which they have been designated. If one cannot, a number of things may happen.
Sometimes routines are called which will switch the case filler components until the structure is valid but, usually, the possible roles of a given NP are severely limited and, when this happens, the parse will usually fail.

Auxiliary verbs contribute little to the overall sentence representation. Their main functions here are to designate a YES-NO question when they are found at the beginning of a sentence as in:

    Does White Spot serve chicken?

and to make a sentence passive when found in conjunction with a main verb:

    Chicken is served by which restaurants.

Relative clauses are sometimes introduced by a verb participle:

    Find a restaurant serving chicken or steak.

When this happens, routines are called which suspend structure building at the current level and continue at a lower level. When this lower level processing is completed, the processing of the original structure is resumed.

Sometimes a situation will occur where the defined format of a verb should be overridden. Assume that the definition of the verb "serve", in the restaurant database, takes as a patient case the fields "food" or "meals". Then if the parser came across the unexpected sentence:

    Who serves Hastings Street?

it should be able to override the definition of serve and generate the correct parse:

    (AND (NAME = ?) (ADDRESS = HASTINGS STREET))

4.1.4.3 Adding Noun Phrase Modifiers

Many things fall into the category of noun phrase modifiers and they are all handled similarly by the system.
Some of these are adjectives, ordinals and quantifiers. They are saved in local registers when found and added into the sentence representation when the head noun is determined. For example, in the sentence:

    Find all of Schank's recent books.

the possessive "Schank's" would be stored in the register NPMOD1 as:

    NPMOD1: (AUTHOR = SCHANK)

Next, when "recent"† is found, the structure associated with NPMOD2 will be:

    NPMOD2: (DATE > 1980)

After the head noun "books" is finally found, all of the NP modifiers will be combined into one general modifier as:

    NPMOD: (AND (AUTHOR = SCHANK) (DATE > 1980))

and this modifier will then be added into the global register sentence representation. By creating a new modifier register for each NP modifier encountered, the system can handle a virtually infinite number of NP modifiers.

† The definition of recent used here (later than 1980) is arbitrary. It would be defined by the database administrator and found in the domain dictionary (see Section 5.1.1).

4.1.4.4 Prepositional Phrases

The prepositional phrase (PP) is very important in the Q/A paradigm. It is with these that many of the query constraints are determined. In a case-driven system such as this, the preposition is used to designate the possible cases which the associated NP can fill. Then, using this information along with the current sentence representation, the system can determine the actual function of the attached noun phrase.
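
The case selection just described can be sketched as a pair of table lookups. This is a minimal illustration under stated assumptions, not the thesis implementation: the tables PREPOSITION_CASES and CASE_ACCEPTS, their contents and the function resolve_case are all hypothetical, and the real definitions belong to the syntactic dictionary and the domain definition.

```python
# Hypothetical table: the cases each preposition may introduce, in preference order.
PREPOSITION_CASES = {
    "on": ["location", "time"],
    "by": ["agent", "location"],
}

# Hypothetical table: the database fields whose values may fill each case.
CASE_ACCEPTS = {
    "location": {"address"},
    "time": {"day", "hour"},
    "agent": {"name"},
}


def resolve_case(preposition, np_field, filled=()):
    """Pick the first case allowed by the preposition that the NP's field can fill.

    filled lists the cases already occupied in the current sentence
    representation; those cases are skipped. Returns None if nothing fits.
    """
    for case in PREPOSITION_CASES.get(preposition, []):
        if case in filled:
            continue
        if np_field in CASE_ACCEPTS[case]:
            return case
    return None
```

Under these assumed tables, a noun phrase drawn from an address field after "on" selects the location case, while one drawn from a day field selects the time case.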
The definition of prepositional information will be discussed in the section on the syntactic dictionary (Section 4.2.5). For example, if the system has defined the preposition "on" to handle the location and the time cases, then in the sentence:

    Which restaurants are on Granville Street?

the system has the choice of filling either the location or the time case. When the NP "Granville Street" is found to designate a place, the disambiguation can be done. After the PP parse is finished, the constraint:

    LOCATION: (ADDRESS = GRANVILLE STREET)

will be added into the sentence representation.

4.1.4.5 Finding Pronoun Antecedents

Only extremely simple pronoun reference is currently handled by the system. Specific pronouns such as "it" and "them" are taken to refer to the last item retrieved by the database routines. Although this is an extremely naive view of pronoun reference, the methods used here can be expanded to include more complex cases. The reason for adding this component at all was that, even with only limited capabilities, it can help the user enormously. This simple solution can handle such constructions as:

    How many restaurants serve chicken?
    THERE ARE 2 REFERENCES.
    Who are they?
    THEY ARE "STEER AND STEIN" AND "WHITE SPOT".

The simplicity of this system is not inherent in its design, but rather is a function of the time and effort allotted to the development of the individual components.

4.1.4.6 Conjunctions

Conjunctions cause some of the ambiguity of natural language.
However, they can be used unambiguously and, at least in this form, must be allowed even for a simple NL system. For example, the conjunction "and" in:

    Find some place which serves steak and lobster.

would cause no ambiguity, generating a query to satisfy the constraints:

    (AND (FOOD = STEAK) (FOOD = LOBSTER))

On the other hand, the query:

    How many people are coming from CMU and SRI?

is a little harder to process. Rather than generating the set intersection constraint:

    (AND (INSTITUTION = CMU) (INSTITUTION = SRI))

which would try to find the people who come from both CMU and SRI, the user really wants to generate the set union constraint:

    (OR (INSTITUTION = CMU) (INSTITUTION = SRI))

which should find people who are coming from either CMU or SRI. This subtle fact should somehow be recognized by the system. In simple cases this can be handled by changing an "and" to an "or" if the field being processed can have only one value at a time. In the first case, the FOOD field could have more than one entry because a restaurant can obviously serve more than one type of food. However, in the second case, the INSTITUTION field would be single-valued because a person would (usually) come from only one institution.

Simple conjunctions are handled by combining all conjuncted components under one of the two categories AND or OR. These two "functions" are represented in the internal sentence representation (and also in the SSR) by *AND and *OR respectively and have a syntax of:

    (*AND constraint1 constraint2 constraint3 . . .)
    (*OR constraint1 constraint2 constraint3 . . .)
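
The "and" to "or" rule for single-valued fields can be illustrated with a short sketch. This is not the thesis implementation (which is Lisp-based); it is a minimal Python illustration in which the function name combine_conjuncts, the tuple encoding of constraints and the SINGLE_VALUED table are all assumptions introduced here.

```python
# Hypothetical field metadata: fields that hold at most one value per record.
# In the thesis this knowledge would come from the domain definition.
SINGLE_VALUED = {"INSTITUTION"}


def combine_conjuncts(conjuncts, single_valued=SINGLE_VALUED):
    """Combine constraints joined by "and" into one *AND or *OR constraint.

    Each conjunct is a (field, relation, value) tuple (an assumed encoding).
    """
    fields = {field for field, _relation, _value in conjuncts}
    # An intersection over a single single-valued field can never be
    # satisfied, so the "and" is read as a set union instead.
    if len(fields) == 1 and fields <= single_valued:
        return ("*OR", *conjuncts)
    return ("*AND", *conjuncts)


food = combine_conjuncts([("FOOD", "=", "STEAK"), ("FOOD", "=", "LOBSTER")])
people = combine_conjuncts(
    [("INSTITUTION", "=", "CMU"), ("INSTITUTION", "=", "SRI")]
)
```

Under these assumptions the FOOD query keeps its *AND, while the INSTITUTION query is rewritten to *OR, matching the two cases discussed above.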

4.1.5 Local Communication

While building the internal sentence representation, any values which will be needed by another part of the parser are put into local registers. Later, the routine needing this information can easily retrieve the contents of the register. The use of these registers closely parallels that of the global and world registers used to communicate between various parts of the linguistic core. For further details refer to the section describing the function of these registers (Section 4.4).

4.2 Syntactic Dictionary

The syntactic, as compared to the semantic or domain, dictionary contains information relating to the syntactic and morphological properties of the words. Words relating specifically to one database will not be found here. Database values will probably be found in the inverted index (see Section 5.1.2) and domain specific verbs and nouns will be found in the domain dictionary (see Section 5.1.1). Most of the syntactic dictionary is taken up with common words such as determiners, pronouns, quantifiers and conjunctions. A large part of this dictionary is devoted to the definition of prepositions since they play an important role in most case-driven Q/A systems.

The morphological information included varies with each word category but usually designates suffixes which might be added to the root word to form regular conjugations. The kind of syntactic information present also depends on the word category.

Irregularly inflected words pose quite a different problem. Any word which will be used often (such as "be") will be initially stored in the syntactic dictionary along with all of its conjugations. However, some words are not common enough to be initially put into the dictionary and some may simply be new to the system. This is a problem which one might think time would overcome. Surely, sooner or later all necessary words would have been entered in the dictionary. Unfortunately this is not the case and if our system expects this information, it must be present. To aid in this task, a limited knowledge acquisition component (see Section 4.3) has been included. This component allows new words to be entered and spelling errors to be corrected by the user during the parse.

To see exactly what type of information is included in the syntactic dictionary, the definition of nouns, verbs and prepositions as well as synonyms, abbreviations and compound words will be discussed.

4.2.1 Noun Definition

Included in the category of nouns are common nouns, proper nouns and pronouns. The morphological information required for a common noun is the suffix which must be added to form the plural. Examples of these are:

    (NUMBER N S)
    (BOX N ES)
    (INFORMATION N MASS)
    (DATUM N A)

Proper nouns and pronouns are not commonly pluralized and so no morphological information is stored with them. However, the morpher has been designed to allow reasonable proper noun pluralizations such as in:

    How many McCarthys are coming to the conference?

The semantic information included depends upon the actual words. Domain specific words are discussed in Section 5.1.1. Any domain independent common nouns currently have no semantic information associated with them and so they are effectively ignored by the parser. Pronouns such as he, she, everybody and anybody have, along with their morphological information, semantic information which includes both their category (general, question or relative) and any cases which they may designate. Examples of pronoun definitions are:

    (ITS PRO (IT POSS))
    (SOMEWHERE PRO * PRO* (GENERAL (CASES (LOCATION))))
    (THAT PRO * PRO* (RELATIVE))

There are currently no domain independent proper nouns in the system.

4.2.2 Verb Definition

There are three classes of verbs in this system: auxiliaries, commands and actions. The definition of an auxiliary verb includes its root form and any semantic features such as the tense and modal characteristics. Some examples of auxiliary verb definitions are:

    (AM V (BE (TNS PRESENT) (PNCODE 3SG)))
    (DONE V (DO (TNS PASTPART)))
    (CAN V * V* ((TNS PRESENT)(PNCODE ANY)(AUX MODAL)))

Commands are used to designate possible database functions. When used in a sentence as an imperative, the verb takes the command definition. If we had a system routine DRAW-GRAPH which we wanted to invoke with the command "graph", it would be defined as:

    (GRAPH COMMAND DRAW-GRAPH)

Actions are usually found only in the domain dictionary (Section 5.1.1), but some have been included here as examples. An action is defined by its morphological features as well as its semantic case frame.
The morphological features are the endings to add to form the present and past tenses:

    (SERVE V S-D)
    (EAT V IRR)
    (ATE V (EAT (TNS PAST)))

The case frame is implemented here as a list of possible cases of the verb. Not all possible cases need be included but, rather, only the ones which are important in the domain. For example, in the restaurant database "serve" and "eat" are defined as:

    (SERVE ACTION (AG NAME PA (FOOD MEALS) RE *HUMAN))
    (EAT ACTION (AG *HUMAN PA (FOOD MEALS)))

The order in which the fields for each case are listed is used to determine a default priority ordering on them. For example, if the question asked was:

    What does White Spot serve?

then, because of the ordering, the constraint which would be generated would be:

    (AND (NAME = WHITE SPOT) (FOOD = ?))

If the type of meals was desired, the query would have to be:

    What meals does White Spot serve?

4.2.3 Adjective Definition

As is the case with both nouns and verbs, adjectives are usually domain specific. The morphological information which must be supplied is the suffixes required to form the comparative and superlative forms. Many times the semantic information associated with an adjective is arbitrary. In the bibliography database, "recent" is defined as:

    (DATE > 1980)

and in the restaurant database, "good" is defined as:

    (STARS > 3)

4.2.4 Quantifier Definition

Quantifiers are sufficiently general to be found in the syntactic dictionary.
They usually have a quantifier value (QVALUE) and/or a numeric value (NVALUE) associated with them. Some examples of QVALUEs are EXACT, MORE, and LESS. Examples of NVALUEs are 0, 1, 2, 3 and ALL. Some examples of quantifier definitions in this system are:

    (COUPLE QVALUE *EXACT NVALUE 2)
    (FEW QVALUE *MORE NVALUE 2)
    (NONE QVALUE *EXACT NVALUE 0)

4.2.5 Preposition Definition

Prepositions play an important role in this system. However, their definition is rather simple. The main part of their definition is a list of which cases they can refer to. For example, the prepositions "at" and "to" are defined as:

    (AT PREP* ((CASES LOC TIME)))
    (TO PREP* ((CASES REC DEST BEN PURP)))

These cases are used to fill slots in the definition of the main verb in the sentence. See Appendix B for the list of cases supplied. Appendix C contains a sample syntactic dictionary with some prepositions and the cases they flag.

4.2.6 Synonyms and Abbreviations

Synonyms and abbreviations both perform the similar function of allowing the substitution of one word (or a group of words) in a sentence for another. The main distinction made between the two in this implementation is that a synonym substitution occurs at the root word level while an abbreviation substitution occurs at the lexical level. These concepts are very important because they allow a small, carefully defined core of information to be expanded simply into a large subset of natural language.

Synonyms are primarily used to inform the parser that two different words have the same meaning.
For example, in most Q/A systems, the meanings of the commands "find", "show", "display", "print" and "list" would be the same. The definition of these can be made by defining only one (say "find") completely and then defining the others as synonyms:

    (FIND . complete definition)
    (SHOW SYNONYM FIND)
    (DISPLAY SYNONYM FIND)
    (PRINT SYNONYM FIND)
    (LIST SYNONYM FIND)

As well as allowing different verbs to appear the same, the synonym feature can also be used to allow different meanings of the same verb. For example, suppose that in this system, there exist three different meanings of the verb "take". These could all be defined by:

    TAKE SYNONYM (TAKE1 TAKE2 TAKE3)
    TAKE1 . . . first meaning
    TAKE2 . . . second meaning
    TAKE3 . . . third meaning

Here the synonym feature is used to show that the verbs TAKE1, TAKE2 and TAKE3 are all really the verb "take". If the input is:

    Who is taking reservations?

then, when the parser is trying to understand "taking", these steps will be followed. First the root word "take" will be found. Next, the system will discover that the word is a synonym of TAKE1, TAKE2 and TAKE3. The morpher will then return the definition of the word TAKE1 to the parser. It is not until the parse fails using this definition that TAKE2 will be considered. This means that the definitions of TAKE1 through TAKE3 should be sorted by plausibility so that the correct one will be found as soon as possible. The actual meaning definition of these verbs will be found in the section on verb definitions (Section 4.2.2).
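
The lookup order described above (follow synonym links, then try each sense in turn until the parse succeeds) can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the function names senses and first_workable_sense, and the parse_succeeds callback are hypothetical and do not come from the thesis.

```python
# Hypothetical dictionary layout: a list value means "synonym of these
# entries, in plausibility order"; any other value is a full definition.
DICTIONARY = {
    "FIND": {"class": "COMMAND"},
    "SHOW": ["FIND"],
    "TAKE": ["TAKE1", "TAKE2", "TAKE3"],
    "TAKE1": {"class": "ACTION", "sense": "first meaning"},
    "TAKE2": {"class": "ACTION", "sense": "second meaning"},
    "TAKE3": {"class": "ACTION", "sense": "third meaning"},
}


def senses(word, dictionary):
    """Yield (name, definition) pairs for a root word, in listed order."""
    entry = dictionary[word]
    if isinstance(entry, list):
        # Synonym entry: follow each target in the order it was listed.
        for name in entry:
            yield from senses(name, dictionary)
    else:
        yield word, entry


def first_workable_sense(word, dictionary, parse_succeeds):
    """Return the first sense whose definition lets the parse succeed.

    parse_succeeds stands in for the parser; a later sense is tried only
    after the earlier ones have failed.
    """
    for name, definition in senses(word, dictionary):
        if parse_succeeds(definition):
            return name
    return None
```

Because the senses are generated lazily, TAKE2 is examined only if the attempt with TAKE1 fails, which is why the ordering by plausibility matters.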

Abbreviations, since they are processed before any normal parsing is initiated, can be used to define simple lexical idioms. But the most important use of abbreviations is to define jargon common to the domain (see Section 5.1.1).

4.2.7 Compound Word Definition

Each compound word is defined in the dictionary as a list of words forming the compound. For example, the restaurant name "White Spot" might be defined as:

    ((WHITE SPOT) NPR *)

The system prefers to manipulate the determiner "how many" as a single unit and so it has been defined as:

    ((HOW MANY) DET . . .)

4.3 Knowledge Acquisition

The major knowledge acquisition component in this system is involved with learning new words. There are several situations when this will happen. When a word is broken apart by the morphological routines and its properties are determined, this new word is then entered into the dictionary so that subsequent references to the word are found more efficiently. If the word cannot be analyzed by the system, then the user is asked to clarify it. If this is successful, the new word is entered into the dictionary with this definition. The third way for the system to "learn" a new word is by querying the database. After finding the previously unknown term in the database, it will be entered into the domain definition for future reference.

This sample dialogue from the restaurants database will show the route taken to determine the meaning of an unknown word:

    Who serves artichokes?
    I CANNOT FIND ' ARTICHOKES ' IN THE DICTIONARY.
    DO YOU WANT ME TO STOP PROCESSING THE QUERY?
    no
    DID YOU MISSPELL ' ARTICHOKES '?
    no
    WOULD I FIND ' ARTICHOKES ' IN THE DATABASE?
    no
    WOULD IT BE SAFE TO IGNORE THE WORD ' ARTICHOKES '?
    no
    DO YOU WISH TO ENTER ' ARTICHOKES ' INTO THE DICTIONARY?
    no
    ERROR >> ' ARTICHOKES ' CANNOT BE MORPHED.

An important benefit of keeping all of the user interaction in one unit, besides the obvious one that it is easier to modify, is that the user is facing a consistent interface and should know what response is expected from a particular question. Each of the attempts to learn a new word will now be examined in detail.

4.3.1 Spelling Correction

If, when asked:

    DID YOU MISSPELL ' ARTICHOKES '?

the user had typed in "yes", he would have been prompted for a replacement. This replacement would have then been used throughout the rest of the parse. In the current system, there is no attempt at automatic spelling correction, but this should certainly be a part of any real-world NL system.

4.3.2 Database Search

If there exists an inverted index of the database (Section 5.1.2), it is possible that no intermediate database searches will be required. However, sometimes the inverted index has not been kept up-to-date. Then, if the word is not in the inverted index, it will be necessary to look in the database for it to make sure that it has not been just recently added. This can be done automatically by the NL system if there are some clues as to the field in which the unknown word might be found. In this system, if the word cannot be found in the inverted index, the system will ask:

    WOULD I FIND ' ARTICHOKES ' IN THE DATABASE?
If the user responds with "yes" then the system will ask for the expected field and then search this field for the value.

4.3.3 Ignoring Words

When processing any natural language sentence, there occur many words which could be safely ignored without affecting the meaning of the entire sentence. This assumes that the word conveys no useful information to the processing of the sentence. Words like "please" and "thank you" can usually be ignored wherever they appear in the sentence. Others can only be ignored at certain points in the parse. It is important to remember, however, that no word can safely be ignored if it cannot first be identified by the system. Therefore, if the system does not know the meaning of a word, it must, if all else fails, ask the user either for a definition or whether the word can be safely ignored. The use of such a procedure can be shown when the user enters the query:

    How many books did Noam Chomsky write?

Since the system has no information on first names of authors, it cannot identify the word "Noam". It then asks the user:

    WOULD IT BE SAFE TO IGNORE THE WORD ' NOAM ' ?

which, when the user agrees, will initiate a query satisfying the constraint:

    (AUTHOR = CHOMSKY)

Furthermore, the user is given the added information that the system has no information on "Noam". If it were the case that first names were in the database but that "Noam" was not, the initial query would have been effectively answered and the processing could be stopped.
While this method requires more user interaction, it seems superior to simply producing an answer such as "none" which would not convey the same information.

4.3.4 Entering New Words

To enter a new word, one must currently give the complete dictionary definition (Section 4.2) for the new word. However, a complete NL system should allow for a smoother user interaction for the definition.

4.4 Internal Communication

The communication between parts of the parser and between the parser and other parts of the linguistic core is managed through three sets of registers. The local registers hold values for a short time and are used primarily within the NL parser to gather information about a particular construct (such as a noun phrase or a verb phrase) before the component is added to the internal sentence representation. For example, when parsing the noun phrase:

    a fast food place

after the determiner "a" has been found, the knowledge that the noun phrase is singular can be stored in the local register NUMBER by:

    (SETR NUMBER * SG)

The global registers are used to store information about the portion of the sentence which has already been parsed. For example, in the sentence:

    Who is open for lunch and serves Chinese food?

the information that the sentence is in the present tense can be stored in a global register after the first verb phrase has been added to the internal sentence structure. This is done by:

    (SETR-G TENSE 'PRESENT)

The stored information can be retrieved and verified when the second verb is being parsed by the function call:

    (GETR-G TENSE)

which would return the value "present".
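
The difference in lifetime between local and global registers can be made concrete with a small sketch. The class below is an assumption for illustration only: the method names mirror SETR, SETR-G and GETR-G from the text, but the Python class, the getr method and the end_construct method (which models the short life of local registers by clearing them) are hypothetical and are not taken from the thesis.

```python
class Registers:
    """Toy model of the local and global register sets (illustrative only)."""

    def __init__(self):
        self.local = {}    # information about the construct being parsed
        self.global_ = {}  # information about the sentence parsed so far

    def setr(self, name, value):
        """Store a value in a local register, as SETR does."""
        self.local[name] = value

    def getr(self, name):
        """Read a local register; None if it has not been set."""
        return self.local.get(name)

    def setr_g(self, name, value):
        """Store a value in a global register, as SETR-G does."""
        self.global_[name] = value

    def getr_g(self, name):
        """Read a global register, as GETR-G does."""
        return self.global_.get(name)

    def end_construct(self):
        """Assumed behaviour: local registers are discarded once the
        construct has been added to the sentence representation."""
        self.local.clear()


registers = Registers()
registers.setr("NUMBER", "SG")        # noun phrase is singular
registers.setr_g("TENSE", "PRESENT")  # sentence is in the present tense
registers.end_construct()             # noun phrase finished
```

After end_construct, the NUMBER register is gone while TENSE is still available to the routine that parses the second verb.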

The third class of registers consists of the world registers. These represent the long term memory and contain information about the continuing dialogue. This information is currently only used for finding pronoun antecedents but could also be used for building a "model" of the user to aid in providing an answer more tailored to his needs. An example of what information might be stored in a world register is:

    (SETR-W AGENT (GETR-G AGENT))

which would save the current agent for future reference by copying it from a global to a world register.

4.5 Building the Standard Sentence Representation (SSR)

The basic units upon which the standard sentence representation (SSR) is built are the cases. Each component is assigned a particular case to fill (or function to perform) in the current structure. Each filled case then becomes a constraint in the query to the database. The cases are designated when the verbs of the system are defined (see Section 4.2.2 for the definition of verbs).

After the parsing routines have developed an internal sentence representation of the query, the SSR is produced. This new structure provides a strictly defined (Figure 4.5) communication path between the linguistic core and the database interface.

    SSR               ::=  ( STYPE CONSTRAINT )
    STYPE             ::=  whfind | yes-no
    CONSTRAINT        ::=  SIMPLECONSTRAINT
                        |  ( *and CONSTRAINT* )
                        |  ( *or CONSTRAINT* )
                        |  ( *not CONSTRAINT )
    SIMPLECONSTRAINT  ::=  ( FIELD RELATION ELEMENT )
    FIELD             ::=  fieldname | *number | *ref
    RELATION          ::=  = | - | < | <= | > | >=
    ELEMENT           ::=  elementvalue | ? | *

    Figure 4.5: The Standard Sentence Representation
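
To make the grammar of Figure 4.5 concrete, the following sketch checks whether a nested Python tuple is a well-formed SSR. It is an illustration only, under several assumptions: SSRs are encoded as tuples, keywords are written in upper case as in the example SSRs, element values may be strings or integers, and the relation set is limited to the relations that are legible in the figure and used in the examples. None of the function names come from the thesis.

```python
STYPES = {"WHFIND", "YES-NO"}
OPERATORS = {"*AND", "*OR", "*NOT"}
# Assumption: only the relations that are clearly legible in Figure 4.5.
RELATIONS = {"=", "<", "<=", ">", ">="}


def is_simple_constraint(c):
    """( FIELD RELATION ELEMENT )"""
    return (
        isinstance(c, tuple)
        and len(c) == 3
        and isinstance(c[0], str)
        and c[0] not in OPERATORS
        and isinstance(c[1], str)
        and c[1] in RELATIONS
        and isinstance(c[2], (str, int))
    )


def is_constraint(c):
    """SIMPLECONSTRAINT, or *AND / *OR / *NOT over nested constraints."""
    if is_simple_constraint(c):
        return True
    if not isinstance(c, tuple) or not c:
        return False
    head, rest = c[0], c[1:]
    if head in ("*AND", "*OR"):
        return all(is_constraint(child) for child in rest)
    if head == "*NOT":
        # *NOT takes exactly one constraint in the grammar.
        return len(rest) == 1 and is_constraint(rest[0])
    return False


def is_ssr(s):
    """( STYPE CONSTRAINT )"""
    return (
        isinstance(s, tuple)
        and len(s) == 2
        and s[0] in STYPES
        and is_constraint(s[1])
    )


example = ("WHFIND", ("*AND", ("NAME", "=", "?"), ("FOOD", "=", "CHICKEN")))
```

A check of this kind on the database side would reject any structure outside the agreed format, which is the point of defining the communication path strictly.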

The SSR attempts to capture the portion of a query's "meaning" which is relevant for extracting the answer from the database. Some information is lost in this structure because attempts are made to make it as simple as possible for the database interface to interpret, and so the possibility always remains of an incomplete or erroneous answer.

To build the SSR, the relevant portions of the current internal representation of the query are selected and formatted according to the definition. By using this two step method of internal representation and SSR, the internal representation can be modified simply without modification to the database interface routines. Additionally, all information needed for informative answer generation and finding a pronoun antecedent can be retained in the internal representation without cluttering the SSR and without forcing the database routines to understand, or even simply ignore, this extra information. Some example SSRs are:

    1) Who serves chicken?
       (WHFIND (*AND (NAME = ?) (FOOD = CHICKEN)))

    2) Where is the Empress of China?
       (WHFIND (*AND (ADDRESS = ?) (NAME = EMPRESS OF CHINA)))

    3) Find 4 restaurants that serve Japanese food.
       (WHFIND (*AND (NAME = ?) (*NUMBER = 4) (FOOD = JAPANESE)))

    4) What is on the menu at White Spot?
       (WHFIND (*AND (FOOD = ?) (NAME = WHITE SPOT)))

The SSR formatting component retrieves its information from the registers of the short and long term memory (global and world registers).
The "data" for the final SSR is taken from the case definition registers and the "control information" is taken from other, currently somewhat ad hoc, registers in memory. Only information which provides a constraint for the query is extracted from the registers and used in the SSR. For example, in the query:

    Which restaurants will serve me chicken?

the recipient case register will be filled by "me" (in fact it really contains the filler *HUMAN). Since this concept will supply no extra information to the query, it is ignored when creating the SSR:

    (WHFIND (*AND (NAME = ?) (FOOD = CHICKEN)))

4.5.1 Reducing the SSR

After the SSR has been built, the formatting routines reduce it to a canonical form, removing any unnecessary conjunctions. For example, for the query:

    Who serves steak and lobster?

the initial SSR created will be:

    (WHFIND (*AND (NAME = ?) (*AND (FOOD = STEAK) (FOOD = LOBSTER))))

and the reduced version will be:

    (WHFIND (*AND (NAME = ?) (FOOD = STEAK) (FOOD = LOBSTER)))

4.5.2 Default Search Fields

During SSR formatting, a check is made to determine the default for any ambiguous field. In the query:

    What does White Spot serve?

it is unclear from the definition of the verb "serve", which is:

    (SERVE V S-D ACTION (AG NAME PA (FOOD MEALS) RE *HUMAN))

whether the field designating a type of food or the field designating a type of meal should be searched. Because of the ordering of the field list at definition time, the "food" field is taken as default. The SSR produced is:

    (WHFIND (*AND (NAME = WHITE SPOT) (FOOD = ?)))

4.5.3 Counting Database Items - *NUMBER

Providing a count of items in a certain field is used so often in any database query language that it must somehow be handled by the overall system. Because many database management systems will process this type of request faster than complete retrieval, it has been added as a part of the SSR definition, thereby allowing the DBMS to know that no actual retrieval of the records is necessary. The method used here to handle this feature was to use the imaginary field *NUMBER when an item count is to be returned rather than the actual items themselves. For example, in:

    How many foods does the Yangtzee have?

the SSR generated will be:

    (WHFIND (*AND (FOOD = ?) (*NUMBER = ?) (NAME = YANGTZEE)))

The reason for handling the count as an imaginary field rather than as a separate sentence type (e.g. WHCOUNT) can be seen in the SSR for†:

    What is the address and number of dishes of Yangtzee?

which would be:

    (WHFIND (*AND (ADDRESS = ?) (*AND (FOOD = ?) (*NUMBER = ?)) (NAME = YANGTZEE)))

which should return a count of the different foods available as well as the restaurant's location.

† Sentences such as this cannot as yet be handled by the system even though the SSR allows for them.

The *NUMBER field is also used to limit the number of answers printed. For example, the SSR for:

    Find at least 4 and not more than 6 Greek restaurants.

will be:

    (WHFIND (*AND (NAME = ?) (*NUMBER >= 4) (*NUMBER <= 6) (FOOD = GREEK)))

4.5.4 Using an Auxiliary Verb as a Main Verb

The check for use of an auxiliary verb as the main verb of a sentence is done here. At the end of the parse of the query:

    Where is the Seven Seas?

the system is left expecting a main verb.
The formatter determines whether the verb "be" is being used as a main or an auxiliary verb and generates the SSR:

    (WHFIND (*AND (ADDRESS = ?) (NAME = SEVEN SEAS)))

4.5.5 Verb Phrase Ellipsis

Limited verb phrase ellipsis handling is done by the SSR formatter. The world registers are used to hold information from one query to the next. After a sentence such as:

    Who serves steak?

the main verb "serve" is stored in one of the world registers. If the next query entered was simply:

    Steak?

then the system would infer the query to be:

    Who serves steak?

and produce the SSR:

    (WHFIND (*AND (NAME = ?) (FOOD = STEAK)))

4.5.6 Pronoun Reference - *REF

If the antecedent for any pronoun has not already been found before SSR formatting, it will be found at this time. For this, another imaginary field has been included in the SSR definition - *REF. The only pronoun reference currently handled by this system is in using the previous result from a search. If the user has asked:

    How many restaurants serve Chinese food?
    THERE ARE 14 REFERENCES.

then the next query might be:

    What are their names and locations?

which would produce the SSR:

    (WHFIND (*AND (*REF = *) (NAME = ?) (ADDRESS = ?)))

The "constraint":

    (*REF = *)

informs the database interface to use the previous result in the current search.

4.5.7 Embedded Noun Phrases

A problem occurs when parsing sentences such as:

    What is the White Spot on Granville's phone number?

Although the SSR generated is:

    (WHFIND (*AND (NAME = WHITE SPOT) (*AND (ADDRESS = GRANVILLE STREET) (PHONE = ?))))

we can see that the constraint:

    (PHONE = ?)

has been attached to the SSR in the wrong place.
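
The reduction step of Section 4.5.1, applied to an SSR such as this one, can be sketched as a recursive flattening of nested conjunctions. The sketch is illustrative rather than the thesis code: SSRs are encoded as nested Python tuples, the function names are hypothetical, and the splicing of a nested *OR into an enclosing *OR is an assumption made by analogy with the *AND case shown in the text.

```python
def reduce_constraint(constraint):
    """Splice a nested *AND into its parent *AND (likewise for *OR)."""
    head = constraint[0]
    if head not in ("*AND", "*OR", "*NOT"):
        return constraint  # simple constraint: (FIELD RELATION ELEMENT)
    reduced = []
    for child in constraint[1:]:
        child = reduce_constraint(child)
        if head != "*NOT" and child[0] == head:
            # Same operator as the parent: the inner conjunction is
            # unnecessary, so its members move up one level.
            reduced.extend(child[1:])
        else:
            reduced.append(child)
    return (head, *reduced)


def reduce_ssr(ssr):
    """Reduce the constraint part of an SSR, keeping its sentence type."""
    stype, constraint = ssr
    return (stype, reduce_constraint(constraint))


embedded = (
    "WHFIND",
    ("*AND",
     ("NAME", "=", "WHITE SPOT"),
     ("*AND", ("ADDRESS", "=", "GRANVILLE STREET"), ("PHONE", "=", "?"))),
)
flat = reduce_ssr(embedded)
```

Flattening of this kind leaves the set of constraints unchanged; it only removes the extra level of nesting.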
Although the reduced form will be:

    (WHFIND (*AND (NAME = WHITE SPOT) (ADDRESS = GRANVILLE STREET) (PHONE = ?)))

and this, when passed to the database, will generate the correct answer, the original attachment of an NP which is embedded within another is still rather limited in capability.

4.6 Answer Generation

The function of the answer generator is to build and return a meaningful answer to the original question. To do this, it cannot rely entirely upon the information stored in the SSR. The communication link between the NL parser and answer generator is buried deep within the registers which form the long and short term memories. This does not make the system any less flexible, however, because the entire linguistic core still remains separate from the application (domain and database) interfaces. An important difference which this system displays from previous NL systems is that the information used in forming the answer comes from the parser and not from the structure passed to the database. By this method the SSR can remain simple while the system, as a whole, can still provide informative answers.

The first step of the answer generation mechanism is to extract the answer from the database. It does this by consulting the database interface. The answer generator passes the SSR to the database interface and receives a list of the appropriate answers (see Section 5.2). Information relating to the type of question asked is extracted from the SSR and used to decipher the returned information.
Next, the answer generator uses control information from the short term memory to limit or expand the answer to be returned to the user. Words that are to be included in the returned message are selected and the proper inflected form is determined. Finally, the answer is formed and returned to the user.

For example, after parsing the input:

    Who serves chicken?

the NL parser will produce the SSR:

    (WHFIND (*AND (NAME = ?) (FOOD = CHICKEN)))

which is then passed to the database interface (DBI). Returned by the DBI is the list of answers:

    ("WHITE SPOT" "STEER AND STEIN")

By using only the information returned by the DBI and that contained in the SSR, there exists only enough information to produce this same list of answers. However, by retrieving the verb from the short term memory (global registers), the system produces the response:

    "WHITE SPOT" AND "STEER AND STEIN" SERVE CHICKEN.

4.7 Summary

The linguistic core forms the heart of the NL system. It has been designed to function as an independent unit with intermediate calls to the domain definition, database and possibly even the user to aid in processing the query. The core is made up of the NL parser, the answer generator and a communication path of registers between them. By using this structure, queries to the database are treated as simply one small step in the total process and the information structure passed to the database does not have to contain all of the information which the answer generator will eventually need.
In addition, the parser can pose queries to the database interface whenever the need arises during the parse and does not have to wait until the parse has been completed.

The NL parser processes a syntactic grammar to exploit the regularities of the English language while at the same time providing for intermediate calls to semantic verification and structure building routines to identify impossible interpretations early. In this way the general, domain independent portions of previous syntactically oriented systems can be captured without the drawback of generating large numbers of semantically unreasonable parses.

Definition of individual words in the linguistic section is primarily concerned with their morphological features. There are few completely domain independent nouns and even fewer such verbs. Verbs are defined in a case frame structure to allow the greatest ease of both definition and use. The case frame definition for a verb is fairly straightforward, being simply a list of the cases which the verb takes and a list of possible fields which can fill each case. To take advantage of these case frame definitions, the NL parser attempts to fill the case slots with information extracted from the query.

The SSR definition is currently quite limited in scope; however, this is not a severe limitation at present as it still allows a reasonable variety of questions to be answered by the system. Further research should result in a more comprehensive representation.
Next we will look at the application interfaces - the domain definition and the database interface.

Chapter 5

System Design: Part II - The Application Interfaces (Domain Definition and Database Interface)

In Chapter 3 we saw that it would be beneficial to design a natural language question answering system so that the domain and database specific knowledge was distinctly separate from the linguistic core (Figure 5.1). In Chapter 4 we discussed the functions which were sufficiently domain and database independent to form a linguistic core. We will now turn our attention to the application interfaces - the domain definition and the database interface.

5.1 Domain Definition

The main thesis behind this work has been to attempt to remove as much domain specific information as possible from the system and isolate it in a "domain definition" (Figure 5.2). To make the changes to this definition simply and correctly, a declarative format has been used (see Appendix D for an annotated, sample domain definition). With this format it is hoped that any changes made to the domain will be reduced to the level of "slot filling" or "form filling". By removing the need for programming, the changes become understandable, even to a relative novice.
The domain definition is broken up into three logical sections. These are the domain dictionary, the case list, and the inverted index of the database.

[Figure 5.1: Proposed Natural Language System: A Review. The diagram shows the user exchanging NL queries and NL answers with the linguistic core (NL parser and answer generator); the domain definition (domain dictionary, inverted database, case list) consulted by the core; and the database interface (SSR analyser and data formatter), which receives the standard sentence representation, returns the standard data representation, and exchanges database queries and raw data with the database.]

It was initially hoped that the domain definition could have remained totally separate from both the linguistic core and the database interface. Keeping the definition separate from the linguistic core turned out to be a logical step because of its declarative structure as compared with the procedural structure of the linguistic core. However, it became obvious that any changes to the physical characteristics of the database could not help but be reflected, at least to some degree, in the domain definition. Consequently, separating the domain definition from the database interface became a more difficult task.

[Figure 5.2: The Domain Definition Module. The domain dictionary, inverted index and case list feed the linguistic core.]

What resulted was that the domain definition now contains a domain view of the database. This does not mean a definition of the entire database, but rather the parts of the database which will change when the domain changes.
Such parts are the fields and field elements, but not the functions or data accessing methods.

5.1.1 Domain Dictionary

The domain dictionary contains the definitions of the actions, fields, jargon and even some field elements allowed within the particular domain. Much as in the syntactic dictionary, all individually defined terms in the domain dictionary must have morphological and syntactic information stored with them. Additionally, each category has associated with it some information which may change according to the domain.

To simplify the definition of "meaning" of the domain specific terms, a case structure has been employed. The verbs of the system are defined over a range of cases and the nouns are placed into one of the case categories.

5.1.1.1 Actions

The verbs in the domain have an "action" definition which specifies a "conceptual pattern" to be interpreted by the grammar. Any relevant cases for the verb, usually at least agent (AG), patient (PA) and recipient (RE) cases, are defined by the fields which can fill them. For example:

    (SERVE ACTION (AG NAME PA (FOOD MEALS)))

defines the verb "serve". The definition is taken from the restaurants database and means:

    (a) that a restaurant can serve something
    (b) that a type of food (e.g. Chinese food) or a type of meal (e.g. breakfast) can be served

5.1.1.2 The Fields

The fields in the domain describe the general category of things to look for. They specify where to look in the database but they do not supply special values of what to look for.
For example, some of the fields in the restaurant domain are COST, FOOD, RESERVATIONS and ADDRESS.

The marker DBFIELD is used to identify the real, database name of a field. Since the name of the address field in the restaurant database is really LOC, it is defined in the domain dictionary as:

    (ADDRESS DBFIELD LOC)

Another semantic marker (DBCAT) is used to specify the morphological properties of elements in the database. By specifying all entries in the COST field as adjectives by:

    (COST DBCAT ADJ)

any of the database entries in that field can be inflected to the comparative or superlative. This "master field" method for specifying all elements of a particular field has the benefit of allowing a simpler and smaller definition of the inverted database. It does, however, introduce some problems into the scanning and morphing routines. Usually all elements of a particular field would not have the same morphological features. Take for example the two elements of the FOOD field - "Chinese" and "chicken". Not only are the morphological entries for these words completely different, but the words do not even perform the same linguistic function. The word "Chinese", when referring to "Chinese food", is acting as an adjective while the word "chicken" is definitely a noun. A small change to the linguistic component, however, adds enough leniency that the system will now make allowances for these "master fields".

One optional marker which can be given to a field is one which designates ordering of a field. There are two general ordering types.
The first is a simple numeric or lexical ordering and can be either ascending or descending. This has been used in the "date" field in the bibliography database and defined as:

    (DATE ORDER *ASCENDING)

Subsequently, questions of "before" and "after" can be answered. As an example, assume that the dates in the database were B.C. dates. The only change which would have to be made to the domain definition would be:

    (DATE ORDER *DESCENDING)

Questions involving both B.C. and A.D. dates in the same field (perhaps flagged by an entry in another field) have not been addressed here.

The other major type of ordering is not so easy to deal with. In the restaurants database the "cost" field is a finite-valued field containing only the values "expensive", "moderate" and "inexpensive". If the words could have been chosen differently then it could be left at a simple lexical ordering but this rarely occurs. In this field, the query:

    Find a cheap Japanese restaurant.

would have no method of order reference. The word "cheap" would be defined to the system as:

    (CHEAP INDF COST INDR *LESS INDE MODERATE)

but without some ordering on the field itself, this definition would be of little use. If either *ASCENDING or *DESCENDING order were used then surely the system would return a faulty answer. For finite-valued fields (currently there is no way to handle infinite-valued fields), the definition would be:

    (COST ORDER (INEXPENSIVE MODERATE EXPENSIVE))

Another optional marker defines the range of values allowed in a particular field.
Simply because a word does not presently appear in a certain field in a database does not usually mean that the word can never appear there. This is where the distinction between INFINITE-VALUED and FINITE-VALUED fields comes into play. There are fields which allow only a limited number of different values to be present. Such a field is the STARS field in the restaurant database where the only possible values are 0, 1, 2, 3, 4 and 5. The main benefit in making this distinction is that when the inverted database handler has looked for a value and cannot find it, if processing a FINITE-VALUED field it can return immediately to the parser without invoking a futile database search.

5.1.1.3 Terms and Jargon

Many of the domain specific terms and jargon will indicate a specific field. Again from the restaurant domain, the noun "menu" would provide a reference to the FOOD field. There are usually many nouns which indicate the same field. For example, "cost", "expensive", "cheap" and "price" could all indicate a processing of the COST field. The field indicators are usually proper nouns, common nouns or adjectives.
In addition to the necessary morphological definition of all words, the domain definition of these particular nouns and adjectives has three extra components (if relevant):

    INDF - indicated field
    INDR - relation between field and field element
    INDE - indicated field element

For example, the definition:

    (CHEAP INDF COST INDR *LESS INDE MODERATE)

means that any record with an entry in the COST field less than "moderate" will be considered to be "cheap".

An important feature that the system needs is the power to recognize non-database elements as database elements. There are many times when words which a user may use as jargon may not actually be in the database and therefore not in the index. However, to make the system usable, it must be able to identify these terms for what they are. The abbreviation mechanism is used to handle this problem. An example would be if we wanted to use MIT as an abbreviation for "Massachusetts Institute of Technology". The definition allowing this would be included in the domain dictionary or the inverted database simply as:

    (MIT ABBREV "MASSACHUSETTS INSTITUTE OF TECHNOLOGY")

5.1.1.4 The Field Elements

Most field elements are defined simply by their presence in the inverted database. However, some provide more information to the query processing than simply a reference to their name and, therefore, would be found in the domain definition itself. For example, in the above definition of the adjective "cheap", a "cheap restaurant" would mean more than just a restaurant in the database with the value "cheap" in the COST field.
Actually it would mean any restaurant with a value in the COST field less than "moderate".

5.1.2 The Case List

The case list is used for internal manipulation of the case structures which form the basis of the internal semantic representation of the query. All prepositions and many of the general, domain independent adverbs are defined in terms of the case list. For example, "at" has been defined as relating to the "time" and "location" cases. The question adverbs "when" and "where" are also defined in terms of these cases and so, to make these adverbs functional in a new domain, only the definition of the "time" and "location" cases must be provided.

The definition of the case list simply requires the designation of which database fields fall into which case category. A list of possible cases has been provided to help guide the domain implementor when defining the case list but it is in no way meant to be exhaustive. The possibility exists for the domain implementor to add to the case list; however, since the prepositions have been defined in terms of this particular list, any changes to it would have to be reflected in the syntactic dictionary. The case list provided has been modified only slightly from the case list found in Taylor and Rosenberg (1975). The complete list of defined cases can be found in Appendix B.

5.1.3 Inverted Database

As in the ROBOT system (Harris 1977a), the inverted index of the database is used to "define" all of the real world knowledge of the system.
This is the set of terms found in the database itself.

The use of an inverted database in this system is not absolutely necessary. The gains made by incorporating it into the design of the system are in query processing time. With the index, the system does not have to examine the database for the "meaning" of every database element.

There are times when it is useful to make a quick check in the database to see if an element is present. If there is an up to date inverted index it should only be necessary to search the index but, more often than not, if there is an index at all, it is probably out of date. In any large database system, updates to the database are made continually while updates to the inverted index would be done rarely.

There are other times when it would not help to search the database. If there is a precisely defined field such as COLOUR with "red", "green" or "blue" entries only, then no amount of database updates will change the fact that red, green and blue are the only colours allowed. Here we do not want to search the database (see Section 5.1.1.2). Another reason for not querying the database, but rather querying the user instead, is if the database system is slow in responding. This has probably been the assumption made by most NL system designers until recently as they usually try to make only one call to the database. Actually, if the database system is reasonably fast, the natural language parser can retrieve an enormous amount of information from it through intermediate calls. Still another situation when the parser might not want to search the database is when there is no information to indicate the field to search.
Clearly it would be ridiculous to search every field of the database to find the element.

The ideal situation would be if the database itself could be used at the base level of the inverted index. Indeed some database languages may provide this facility, but the system used here provides no such link. If the database language will not provide an index, it must be built by the database implementor. Fortunately, this is a task which can be readily automated.

Sometimes, however, building an inverted database requires more space than the system is allowed to use. In this case, we must fall back on the database search method. The decision here becomes a classic one of space versus time and is usually based on machine limits. Since the machine underlying this particular system has few space problems, time was seen to be the crucial quantity.

Since the construction of an inverted index for a database is both a time consuming and menial process, a program was designed to generate an inverted index automatically. The program was written in the MTS Edit Procedure sublanguage (Hogg 1980) and is designed to take SPIRES database output and create a LISP dictionary with entries of the general form:

    (element ELEMENT-OF field)

A sample inverted database can be found in Appendix D.

In addition to the dictionary of actual database elements, the inverted index requires the power to identify different elements in the database which are synonymous.
Frequently the database will contain synonyms and it is only through identification of these synonyms that meaningful answers to queries can be produced. Take, for example, the case of the three database elements "burgers", "hamburgers" and "cheeseburgers". If the query was:

    Who serves burgers?

all places with "FOOD = BURGER" as well as all places with "FOOD = HAMBURGER" and "FOOD = CHEESEBURGER" would be expected to be found. To handle this feature, a new semantic marker was created. It is simple to use, requiring a list of all synonyms found in the database, but must be entered by the domain implementor. The format to define the above case would be:

    (BURGER FOOD+ (HAMBURGER CHEESEBURGER))

5.2 The Database Interface

The database interface (Figure 5.3) is designed to provide an idealized database to which the system can pose questions. Not only will there be the one query to find the data to answer the user query, but also possibly many intermediate queries to find information needed to continue processing at any time. All of the information relevant to a particular database query language must somehow be incorporated in the interface. Its purpose is to hide the actual physical characteristics of the particular database from the linguistic core.

The database interface is composed of two completely separate sections - the database format routines, which handle input to the database, and the data format routines, which handle the database output. In changing the underlying database these routines would have to be rewritten but the rest of the system should not have to be modified.
The philosophy behind the entire database interface is simplicity. Since it is not clear which functions any database query language may or may not provide, assumptions have been kept to the bare minimum. In this way it should be simple to adapt this system to any and all database systems.

[Figure 5.3: The Database Interface Module. Requests arrive from the linguistic core at a selector, which dispatches to the format query, format YES/NO and format WHFIND routines; queries are sent to the database, data is received from the database, and the saved answer is returned to the linguistic core.]

The ideal method of communication with the database, for both the database format and data format routines, is to pass messages directly through low level function calls. Unfortunately the database system linked to in this system allows no such communication. Because of this, a rather roundabout route has to be taken. Two separate tasks have to be initiated, one running the NL interface and one running the database management system (DBMS). The two tasks communicate through a shared file with the database format routines generating "user" queries and the data format routines interpreting the DBMS responses. This method is extremely awkward and poses more problems than should normally be expected but queries can still be handled in a reasonably short time.

5.2.1 Database Format Routines

The database format routines transform the standard sentence representation (SSR) into a query in the database query language. As mentioned earlier, there are two obvious methods of approaching this problem.
One is to communicate directly with the database through low level function calls; the other is to generate a "user" query to the database. In this particular system the low level functions were not available, so the database format routines had to generate a "user" query. If a particular function is called for in the SSR, then the routines should find and call the appropriate database routine.

Also among the responsibilities of these routines is the "faking" of any functions which the database should provide but doesn't. A typical example would be if the database were expected to provide a list of currently searchable fields upon request. Since this is a common function of many databases, it would not be an unwarranted expectation and if the database language we are communicating with does not provide this function, then these routines must.

Currently there are only three low level functions which the database language is expected to handle. These could be expanded but it should be remembered that if an ideal interface is to be provided, one to which virtually any database language could adapt, then they should include only the very common functions. The three low level functions which would have to be implemented before a new database could be attached are:

    DB-SELECT - which selects the appropriate database
    DB-EXIST? - which returns the number of elements satisfying a certain query
    DB-FIND   - which returns the elements specified by the constraints

The database format routines for this system were written in LISP.
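To show how small this interface is, the three functions can be sketched in Python over an in-memory stand-in for the database. This is an illustrative sketch only, not the LISP routines of the thesis: the class name `InMemoryDatabaseInterface`, the method signatures, the record layout and the sample records are all assumptions introduced for the example.

```python
class InMemoryDatabaseInterface:
    """Toy stand-in for a database interface (hypothetical, for illustration)."""

    def __init__(self, databases):
        # databases: name -> list of records; each record maps a field
        # name to the set of values stored in that field.
        self._databases = databases
        self._records = []

    def db_select(self, name):
        """DB-SELECT: choose the database that later calls will search."""
        self._records = self._databases[name]

    def _matching(self, constraints):
        # A record matches when every (field, value) constraint holds.
        return [
            record for record in self._records
            if all(value in record.get(field, set())
                   for field, value in constraints)
        ]

    def db_exist(self, constraints):
        """DB-EXIST?: number of records satisfying the constraints."""
        return len(self._matching(constraints))

    def db_find(self, wanted_field, constraints):
        """DB-FIND: values of wanted_field in the matching records."""
        found = []
        for record in self._matching(constraints):
            found.extend(sorted(record.get(wanted_field, set())))
        return found


# Made-up sample records, loosely based on the restaurant examples.
RESTAURANTS = [
    {"NAME": {"WHITE SPOT"}, "FOOD": {"CHICKEN", "BURGER"}},
    {"NAME": {"STEER AND STEIN"}, "FOOD": {"CHICKEN", "STEAK"}},
    {"NAME": {"SEVEN SEAS"}, "FOOD": {"SEAFOOD"}},
]

dbi = InMemoryDatabaseInterface({"RESTAURANTS": RESTAURANTS})
dbi.db_select("RESTAURANTS")
chicken_count = dbi.db_exist([("FOOD", "CHICKEN")])
chicken_places = dbi.db_find("NAME", [("FOOD", "CHICKEN")])
```

Attaching a different database would, under this sketch, mean writing another class with the same three methods; the code that calls them would not change.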
5.2.2 Data Format Routines

These routines work on the output of the database interface. Their function is to take the output data from the database as returned by the query language or low level database functions and return to the Q/A system the portion of the answer it requires. The standard format that this system expects is a list of the elements found. The basic structure is:

    (FIELD = ELEMENT)

If more than one piece of information is to be returned, it will be returned as a list of lists:

    ((FIELD1 ELEMENT1)
     (FIELD2 ELEMENT2)
     (FIELD3 ELEMENT3))

The data format routines in this system have been implemented partially in LISP, but mostly in the SPIRES Protocol sublanguage (Buckland 1981).

5.3 Summary

The database interface has been kept as small as possible. There have been no complex functions such as the identification and handling of metaquestions (see Section 3.3.1) included in it. Through this simple interface it should be straightforward to attach a new database to the NL system; however, as the questions from the NL system become more involved, the database interface will undoubtedly have to become more complex itself.

The domain definition contains three separate components: the domain dictionary, the case list and the inverted index for the database. Together they attempt to provide an information bank which the NL parser can query to retrieve domain dependent information. All parts of the domain definition have been structured in a declarative format to facilitate quick and easy modification.
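To make the inverted index component concrete, the sketch below builds a small index of the (element ELEMENT-OF field) kind and expands a synonym entry such as (BURGER FOOD+ (HAMBURGER CHEESEBURGER)). It is a Python illustration under stated assumptions, not the MTS Edit Procedure program described in Section 5.1.3: the function names, the dictionary layouts and the sample records are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each database element to the set of fields it occurs in."""
    index = defaultdict(set)
    for record in records:
        for field, values in record.items():
            for value in values:
                index[value].add(field)
    return dict(index)


def element_of(index, element):
    """Fields in which the element appears (empty list if unknown)."""
    return sorted(index.get(element, set()))


# Assumed encoding of (BURGER FOOD+ (HAMBURGER CHEESEBURGER)).
SYNONYMS = {("BURGER", "FOOD"): ["HAMBURGER", "CHEESEBURGER"]}


def expand_synonyms(element, field):
    """Return the element followed by any synonyms declared for it."""
    return [element] + SYNONYMS.get((element, field), [])


# Made-up sample records for illustration.
records = [
    {"NAME": ["WHITE SPOT"], "FOOD": ["BURGER", "CHICKEN"]},
    {"NAME": ["STEER AND STEIN"], "FOOD": ["HAMBURGER", "CHICKEN"]},
]
index = build_inverted_index(records)
burger_terms = expand_synonyms("BURGER", "FOOD")
```

With such an index the parser can learn that CHICKEN belongs to the FOOD field without touching the database, and a query about burgers can be widened to every declared synonym before the database is searched.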

To determine whether or not the domain definition process was both sufficient and simple, the NL system was transferred to a new domain of discourse. The next chapter provides a discussion of this process.

Chapter 6
A Change of Domains

In order to determine whether or not the natural language database interface created in Chapters 4 and 5 was indeed domain independent, a test was performed. The test was to adapt the interface to a new domain of discourse. After developing the interface to interact adequately in the initial restaurant domain, it was adapted to an AI bibliography domain. Then, after revisions based on the results of the domain change, the interface was transferred to a conference domain (see Appendix E for a sample session).

6.1 The Definition Process: A Guide to the Perplexed

The definition process is made up of a few tasks which must all be performed by someone familiar with the domain and database system being used (e.g. the database administrator (DBA)). It would be helpful if this person had some knowledge of the NL system but hopefully it has been designed in such a way that this is not really necessary. The tasks to be performed are:

1) construct an inverted database (if none exists already)
2) define the database fields to be used
3) define the actions which will be allowed
4) define any abbreviations, synonyms and jargon to be used

Appendix D contains a sample domain definition.

6.1.1 Constructing an Inverted Database

Sometimes a database system will provide fast access to an inverted index of the database.
More often it will be slow or non-existent. In these cases it is beneficial to build one external to the NL and database systems. As discussed in Chapter 5, the SPIRES DBMS to which the interface was attached contained no hooks for an inverted database. However, since the information required in the inverted database is only the field values and the names of the field(s) in which they are located, the construction was easy to automate. Building the inverted database for all of the domains tested was done automatically on the DBMS output by a procedure written in the MTS Edit Procedure sublanguage.

6.1.2 Database Field Definitions

All searchable and non-searchable fields in the database must be defined to the NL parser by the system administrator. This definition process is currently very simple as few general features have been implemented. However, the definition should be able to be extended when any new feature is desired. The field definitions inform the system what the properties of the particular field are; both mandatory and optional properties must be defined. Currently there are two mandatory properties (DBFIELD and DBCAT) and one optional property (ORDER) designating the ordering of the field (see Section 5.1.1.2).

6.1.3 Action Definitions

The actions are defined in a case frame structure as was discussed in Section 5.1.1.1.
Deciding which actions need to be defined was done, for each domain, by generating a list of sample questions and then extracting from this list the domain specific verbs. The cases to define for each action were taken from the case list provided - a copy of which can be found in Appendix B.

6.1.4 Abbreviations, Synonyms and Jargon

The definition of domain specific terms and jargon can be found in Section 5.1.1.3. Many of the terms defined were actually an extension of the inverted database. In some cases, common abbreviations (such as "UBC" for "University of British Columbia") may not be found in the database. To facilitate the use of these abbreviations, they must be added either to the domain dictionary or to the inverted database. Sometimes non-standard abbreviations are used by the trade (even if a standard exists). Whereas "Comm. ACM" is the standard abbreviation for "Communications of the Association for Computing Machinery", "CACM" is also widely used. By making both of them abbreviations to the NL system, the user does not have to remember which is the standard.

6.2 The Restaurant Domain

The initial database around which the demonstration system was built holds information concerning restaurants. It is the type of database which might soon be found on a television information network (e.g. Telidon). Included here are data concerning the local eating establishments; the types of dishes they serve, their location, hours of operation, quality of food and relative prices. Both searchable and non-searchable fields are included.

The restaurant database used here was developed by the UBC Computing Centre to demonstrate the SPIRES DBMS. During demonstrations, new users are encouraged to add data to the SPIRES subfile and so the resulting database is a little unreliable and inconsistent in naming conventions. The fields of the restaurant subfile used are shown in Figure 6.1.

Fieldname   Description                     Searchable
name        restaurant name                 yes
location    address                         yes
phone       phone number                    no
cost        approx. cost of a meal for 2    yes
food        types of food served            yes
stars       quality of the restaurant       yes
meals       when is it open                 yes
comments    anything else                   no

Figure 6.1: The Fields in the Restaurants Database

Some example queries which prospective diners might have for such a system are:

What are some Italian restaurants?
Can you find me a cheap Japanese place?
Which are the best restaurants?
What is on the menu at White Spot?
How many Chinese food places are there?
When does the Yangtze open?
Is there a Turkish place which is open for lunch?
What is the White Spot on Granville's phone number?

Since this was the initial domain attached to the NL system, its design was tailored towards answering these questions.

6.3 Adaptation to the Bibliography Domain

After the system was able to function adequately in the restaurant domain, it was time to turn to another. An Artificial Intelligence (AI) bibliography database was chosen. The vocabulary of this new domain was dissimilar enough to cause a potential portability problem even though the structure of the questions remained similar.
Some of the fields involved in this database were the author, book, subject, publisher, date and abstract, again including both searchable and non-searchable fields.

The AI bibliography database has been developed by the UBC Department of Computer Science primarily as a research aid. The additions to this database are made in a more uniform and controlled manner than those to the restaurants database and it therefore presented a more reliable information base. The actual fields used in this database are shown in Figure 6.2.

Fieldname   Description                     Searchable
author      author                          yes
title       title of the work               yes
date        date it was written             yes
type        what type of article            yes
abstract    abstract of the article         no
location    where the book is physically    yes
keywords    associated topics               yes
pub         publisher of the book           yes
inst        what institution put it out     yes

Figure 6.2: The Fields in the Bibliography Database

Some of the questions which were put to this system are:

Who wrote Aspects?
How many papers has Schank written?
How many vision books were written before 1978?
Find at least 4 papers by Minsky.

The major difference between the databases came in the area of vocabulary. There seemed to be no general way to define words and special terms so that they could apply to all databases simultaneously. For example, whereas in the bibliography database the mention of the word "name" brings about conflicts between the publisher, author and book title, in the restaurant database, the word is virtually unambiguous (signifying the restaurant name).
Conversely, the question:

Where is Schank and Colby's book?

to the bibliography database involves no ambiguity (there was only one "location" field) whereas the query:

Where is a good steak place?

to the restaurant database does because it could refer to the name of the restaurant or the address. Other words play major roles in one database (such as "serve" in restaurants, "write" in bibliography and "register" in conference) but never appear in the others.

The database elements were defined by inverting the database. Although fully automated, the process did require a substantial amount of CPU time and disk space due to the size of the database. Additions to the database in terms of abbreviations and synonyms were made by a manual pass over the inverted database in an editor.

Definition of the fields, both in the domain dictionary and in the case list, was fairly straightforward and quick since the database had under 10 fields to define. Next some sample sentences were generated to determine the domain specific actions and jargon to define. This was the most time consuming process since there was no formal procedure to follow. The actual definition of these domain specific terms (again using the pre-defined case list) was relatively quick.

One shortcoming which was uncovered during the domain changing exercise was the omission of an important universal feature from the initial version. Since there was no numeric ordering of elements in the restaurant system, it had no way to handle questions relating to "before" and "after".
It was not possible simply to add to the current definition; the parser itself had to be modified. The problem was solved by allowing the values *ASCENDING and *DESCENDING to appear on the marker ORDER. This turned out to require only a simple extension to the field definition to define it and only a slight modification to the linguistic core to handle it. Problems caused by oversight are bound to happen in any system and no system will ever actually be complete; however, this oversight was due more to a severely limited testing stage of the restaurant system than to the design of the NL parser as a whole.

6.4 The Conference Domain

The third database hooked up to the natural language parser was a conference registration database. It was originally designed by the Computer Science Department at UBC to contain information on the participants at the 7th International Joint Conference on Artificial Intelligence (IJCAI) held at UBC in 1981. The actual fields used in this database are shown in Figure 6.3.

Fieldname   Description                         Searchable
name        name of the participant             yes
institute   institution the person came from    yes
a-time      arrival time                        yes
o-country   country the person came from        yes
type        how the person registered           yes

Figure 6.3: The Fields in the Conference Database

Some of the questions which were handled by this system are:

Who is coming from SRI?
Has John McCarthy registered?
Find all the people who have registered as an early-student.
When did Minsky register?
How many people are coming from MIT?
When is Schank coming?
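For every domain tested, the inverted database records only the field values and the names of the fields in which they occur (Section 6.1.1). A minimal Python sketch of that inversion is given below. It is not the MTS Edit procedure actually used; the function name, the lower-casing of values and the sample records are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def build_inverted_index(records: Iterable[Mapping[str, str]]) -> Dict[str, List[str]]:
    """Map each field value to the sorted names of the fields it occurs in."""
    index = defaultdict(set)
    for record in records:
        for field, value in record.items():
            # Values are normalised so that lookups ignore case and padding.
            index[value.strip().lower()].add(field)
    return {value: sorted(fields) for value, fields in index.items()}


# Invented records that reuse the field names of the conference database.
sample_records = [
    {"name": "J. Smith", "institute": "MIT", "o-country": "USA", "type": "early-student"},
    {"name": "M. Jones", "institute": "SRI", "o-country": "USA", "type": "regular"},
]
index = build_inverted_index(sample_records)
```

With an index of this kind, a word such as "MIT" in a question can be tied to the field "institute" without a search of the database itself.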

As in the definition of the AI Bibliography domain, the creation of the inverted database was done automatically. Again a set of questions was generated in order to extract the domain dependent actions and jargon. Because there were no new concepts to handle for this domain, the entire definition process was completed within a few hours.

6.5 Summary

After the linguistic core had been brought up to a level of competence where it could handle the simple questions posed, the definition of a new domain became a straightforward and quick process. However, the different domains used in this test were all residing under the same database system and this undoubtedly played a role in limiting the structure of questions which could be asked or answered.

Chapter 7
Conclusions

The achievements of this system lie in the ease with which it can be adapted to a new domain of discourse. The structure of the domain dependent information allows a great deal of question answering capability to be defined easily and quickly. Of course, there remain issues of adequacy and extendability which have not been dealt with satisfactorily. It has never been expected that the methods and structures developed here could be transferred, as is, to a more complex world of general discourse; however, in the more limited question answering paradigm they do appear to be reasonably acceptable. The strategy of separating the knowledge base from the linguistic component does seem useful enough to be a necessary feature in many domains of discourse.
With techniques such as these it should be possible to develop large natural language database interfaces which are general enough that the domain of discourse can be altered without requiring significant modifications of the entire system.

During the development of this particular system there have been a number of issues raised which, for some reason or another, could not be adequately addressed in the current context. Frequently these problems were set aside because of time constraints but others were just beyond the scope of this thesis. Next we will briefly consider some of these issues.

7.1 Open Issues

Some of the issues which have not been resolved in this system are the handling of text, value judgements, multi-field answers, complex conjunctions, pronoun reference, clarification dialogue and sample sentence generation.

7.1.1 Text Retrieval by Content

The whole subject of retrieving text by content is much too difficult for the current system. This became an issue in the restaurant domain while attempting to process the "comments" field and again in the AI bibliography domain when processing the field "abstract". However, this is a problem which has not yet been adequately addressed by researchers in general. There are few, if any, current systems which can properly process text.

7.1.2 Value Judgements

An added benefit of a natural language database query system would be its ability to make some types of value judgements. An example of what is meant here is the following:

Which is the best restaurant in town?

Of course, methods of answering this will be different in each database system. In some, the following steps might have to be performed:

1. Select STARS
2. Sort into descending sequence
3. Return the first record

while in another it might be done more simply. In this example, our semantic (retrieval) component must be able to handle multi-level commands to the database and this adds complexity. In this system, "best" has been defined as any entry with the highest number of stars. The structure passed to the retrieval component will be

(FIND (NAME = ?) (STARS >= *ANY))

Currently, while this structure can be defined and processed by the linguistic core, it cannot be handled by the database interface.

7.1.3 Multi-Field Answers

In some databases an answer may involve entries in more than one field. An example of this might be found in a telephone directory system. Assume that the area code was not explicitly stored in the database but could be determined by the province and city fields together. A query such as:

What is John Smith's phone number?

would have to do some reasonably complex calculations to determine the answer. Another type of multi-field answer would arise when the values of one field depended upon the values of another. This might happen in an accounting database where one field is an absolute amount and another is a code signifying a debit or a credit.

7.1.4 Complex Conjunctions

The processing of simple conjunctions was discussed in Section 4.1.4.6. When many conjunctions are strung together it becomes difficult to give any general rules to process them.
For example, in:

Have you seen a dog and a bone or a cat?

the tendency is to join "the dog" and "the bone", while in:

Have you seen a lady and a boy or a girl?

the grouping is not quite as obvious. Humans use both context and semantics to decide the grouping and we cannot expect a program to handle these types of conjoined phrases until it can deal competently with these concepts.

7.1.5 Pronoun Reference

Only simple pronoun reference has been dealt with. This was only partially because of time constraints. The question of complete pronoun reference (at the human level) is far beyond the ability of most current systems. Fortunately the design of the syntactic-semantic interface (general registers) allows for a great deal of flexibility.

7.1.6 Clarification Dialogue

When the system fails in some part of its processing it can either give up or enter into a clarification dialogue with the user. This problem has been addressed superficially in Chapters 3 and 4; however, it is an important issue which must be explored more fully (Codd et al 1978).

7.1.7 Sample Sentence Generation

The problem here is how to find out which words should be defined in a new domain, and it is a problem which has been glossed over during the development not only of this system but also of most previous systems. The problem is far from trivial if we are to adapt a system to a new domain quickly. In the domain change undergone to test this system, sample sentence generation was one of the more time consuming portions of the process.
In a real world application, professionals in the field would be called upon to generate the sample sentences and then the system would be adjusted to handle these particular questions.

7.2 Problems for Future Work

There are many problems on which more work must be done. Some of these are extensions to both the syntactic and semantic portions of the system, adaptation of the system to a new database system, and computational optimization.

7.2.1 Extensions to the Syntactic Component

The syntactic component of this system has been left incomplete, for obvious reasons. Many additional features of natural language should be able to be implemented as part of the current syntactic grammar. Expansion of the syntactic structure building can be done with little or no modification to the parser.

7.2.2 Extensions to the Semantic Component

Some of the open issues discussed previously could probably be resolved with an extension to the semantic component. The internal register structure allows for a great deal of information to be stored and retrieved at any point of the parse. Because of this, a new feature can be added or modified without affecting the entire system.

7.2.3 Adaptation to a New Database System

Although discussed briefly in Chapter 5 and although hooks for this have been implemented, a change of database systems was never implemented. This stemmed from the fact that there were no other database systems available on the MTS system at UBC. However, we expect that it should be relatively easy to carry out such an implementation.
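To give a feeling for the size of such a change, the sketch below re-codes the three low level functions of the database interface (DB-SELECT, DB-EXIST? and DB-FIND) for a different database system. SQLite, reached through Python's standard sqlite3 module, is used purely as a convenient stand-in; it is not a system considered in this work, and the class name, method names, table layout and sample rows are all assumptions made for the example.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple


class SqliteInterface:
    """Hypothetical database interface built on SQLite (illustration only)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._table: Optional[str] = None

    def db_select(self, table: str) -> None:
        """DB-SELECT: choose the table to query, after checking it exists."""
        found = self._conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if found is None:
            raise KeyError(f"no such database: {table}")
        self._table = table

    def _where(self, constraints: Dict[str, str]) -> Tuple[str, List[str]]:
        """Build a parameterised WHERE clause from field = value constraints."""
        if self._table is None:
            raise RuntimeError("db_select must be called first")
        # Column names cannot be bound as parameters, so they are checked
        # against the table's real columns before entering the SQL text.
        rows = self._conn.execute(f'PRAGMA table_info("{self._table}")')
        columns = {row[1] for row in rows}
        unknown = set(constraints) - columns
        if unknown:
            raise KeyError(f"unknown fields: {sorted(unknown)}")
        if not constraints:
            return "", []
        clause = " AND ".join(f'"{field}" = ?' for field in constraints)
        return " WHERE " + clause, list(constraints.values())

    def db_exist(self, constraints: Dict[str, str]) -> int:
        """DB-EXIST?: return the number of elements satisfying the query."""
        where, params = self._where(constraints)
        sql = f'SELECT COUNT(*) FROM "{self._table}"{where}'
        return self._conn.execute(sql, params).fetchone()[0]

    def db_find(self, constraints: Dict[str, str]) -> List[tuple]:
        """DB-FIND: return the elements specified by the constraints."""
        where, params = self._where(constraints)
        sql = f'SELECT * FROM "{self._table}"{where}'
        return self._conn.execute(sql, params).fetchall()


# Invented data in an in-memory database, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conference (name TEXT, institute TEXT)")
conn.executemany(
    "INSERT INTO conference VALUES (?, ?)",
    [("J. Smith", "MIT"), ("M. Jones", "SRI"), ("P. Lee", "MIT")],
)
interface = SqliteInterface(conn)
interface.db_select("conference")
```

Only exact-match constraints are handled here; relations such as "before" and "after" would need a richer constraint format than this sketch assumes.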
The major advantages of the design of this system with respect to a database system change are in the modularity of the system as a whole and of the database interface in particular. For example, to define a new database interface would require the coding of only 3 functions.

7.2.4 Computational Optimization

This system has been built with little regard for either time or space efficiency - not surprising in an experimental system. Consequently there are many areas of the program which could be optimized. For example, the use of an inverted index reduces the amount of CPU time required. It does this by cutting down on the database searches (which are costly) but increases the space requirements if not implemented as part of the original database.

7.3 Summary

The NL system developed has been split into 3 separate parts. The linguistic core contains all of the domain independent components seen in recent natural language question answering systems. It parses queries, consults the database interface for the data and formulates the appropriate response. The domain definition is a collection of all of the domain dependent terms and database values. It has been designed in such a way as to facilitate definition and modification. The linguistic core consults the domain definition during a parse to retrieve the domain dependent information it needs to process the query. The database interface provides an idealized, well-defined interface to the real database.
Because of the simplicity of the functions required in this interface, it should be able to be rewritten for a new database with a minimum of effort.

Bibliography

Ball, Eugene and Hayes, Phil (1980), Representation of Task-Specific Knowledge in a Gracefully Interacting User Interface, Technical Report, Computer Science Department, Carnegie-Mellon University, Pittsburgh, P.A.

Bobrow, Robert J., and Webber, Bonnie L. (1980), PSI-KLONE: Parsing and Semantic Interpretation in the BBN Natural Language Understanding System, Proceedings 3rd National CSCSI/SCEIO Conference, Victoria, B.C., pp. 131-142.

Brown, J. S., Burton, R. R. and Bell, A. G. (1974), SOPHIE: A Sophisticated Instructional Environment for Teaching Electronic Troubleshooting, Bolt Beranek and Newman, Report No. 2790, Cambridge, Mass.

Buckland, Tony (1981), An Introduction to SPIRES, University of British Columbia Computing Centre, Vancouver, B.C.

Celce-Murcia, M. (1979), Paradigms for Sentence Recognition, System Development Corp., Final Report No. HRT-15092/7907.

Chomsky, Noam (1965), Aspects of the Theory of Syntax, MIT Press, Cambridge Mass.

Codd, E. F., Arnold, R. S., Cadiou, J-M., Chang, C. L., and Roussopoulos, N. (1978), RENDEZVOUS Version 1: An Experimental English-Language Query Formulation System for Casual Users of Relational Databases, IBM Technical Report RJ2144, IBM Research Laboratory, San Jose, Ca.

Fillmore, C. J. (1968), The case for case, Universals in Linguistic Theory, N.Y., Holt, Rinehart and Winston, pp. 1-90.

Harris, Larry R.
(1977a), ROBOT: A High Performance Language Interface for Database Query, Technical Report TR77-1, Mathematics Department, Dartmouth College, N.H.

Harris, Larry R. (1977b), Natural Language Data Base Query: Using the data base itself as the definition of world knowledge and as an extension of the dictionary, Technical Report TR77-2, Mathematics Department, Dartmouth College, N.H.

Hayes, Phil, and Reddy, Raj (1979), An Anatomy of Graceful Interaction in Spoken and Written Man-Machine Communication, Technical Report, Computer Science Department, Carnegie-Mellon University, Pittsburgh, P.A.

Hendrix, G. G. (1977), Human engineering for applied natural language processing, Fifth Int. Jt. Conf. on Artificial Intelligence, MIT., pp. 183-191.

Hendrix, G. G., Sacerdoti, E. D., Sagalowicz, D., and Slocum, J. (1978), Developing a natural language interface to complex data, ACM Transactions on Database Systems, 3(2), pp. 105-147.

Hirst, Graeme (1979), Anaphora in natural language understanding: A survey, Technical Report 79-2, Department of Computer Science, University of British Columbia.

Hogg, John (1980), UBC Edit: The Line File Editor, University of British Columbia Computing Centre, Vancouver, B.C.

Johnson, Jan (1981), Intellect on Demand, Datamation, 27(12), pp. 73-78.

Johnson, S. C., and Ritchie, D. M. (1978), Portability of C Programs and the UNIX System, The Bell System Technical Journal, 57(6), pp. 2021-2048.

Katz, J. J., and Postal, P. (1964), An Integrated Theory of Linguistic Description, MIT Press, Cambridge Mass.

Marcus, Mitchell P. (1979), A Theory of Syntactic Recognition for Natural Language, Artificial Intelligence: An MIT Perspective, Vol 1, eds. P. H. Winston and R. H. Brown, MIT Press, Cambridge Mass.

Minsky, M. (1975), A Framework for Representing Knowledge, The Psychology of Computer Vision, ed. P. H. Winston, McGraw Hill, pp. 211-277.

Rosenberg, Richard S. (1980), Approaching Discourse Computationally: A Review, Representation and Processing of Natural Language, Carl Hanser Verlag, pp. 10-83.

Reiter, Ray (1978), The Woods Augmented Transition Network Parser, Technical Note 78-3, Department of Computer Science, University of British Columbia.

Richards, M. (1969), BCPL: A Tool For Compiler Writing and System Programming, Proc. Spring Joint Computer Conf., pp. 557-566.

Sacerdoti, E. D. (1977), Language access to distributed data with error recovery, Fifth Int. Jt. Conf. on Artificial Intelligence, MIT., pp. 196-202.

Schank, Roger C. (1972), Conceptual Dependency: A Theory of Natural Language Understanding, Cognitive Psychology, 3(4), pp. 552-631.

Schank, Roger C. (1973), Conceptualizations Underlying Natural Language, Computer Models of Thought and Language, eds. R. C. Schank and K. M. Colby, San Francisco, Freeman and Co.

Taylor, B. H. and Rosenberg, R. S. (1975), A case-driven parser for natural language, American Journal for Computational Linguistics, AJCL Microfiche 31.

Waltz, D. L., Finin, T., Green, F., Conrad, F., Goodman, B., and Hadden, G.
(1976), The PLANES system: natural language access to a large data base, Technical Report T-34, Coordinated Science Lab., University of Illinois, Urbana.

Waltz, David L. (1978), An English Language Question Answering System for a Large Relational Database, Comm. ACM, 21(7), pp. 526-539.

Woods, W. A. (1967), Semantics for a Question Answering System, Ph.D. thesis, Report NSF-19, Aiken Computational Lab., Harvard University, Cambridge, Mass.

Woods, W. A. (1970), Transition Network Grammars for Natural Language Analysis, Comm. ACM, 13(10), pp. 591-606.

Woods, W. A., Kaplan, R. M. and Nash-Webber, B. (1972), The Lunar Sciences Natural Language Information System: Final Report, Bolt Beranek and Newman, Report No. 2378, Cambridge, Mass.

Woods, W. A. (1980), Cascaded ATN Grammars, American Journal for Computational Linguistics, 6(1), pp. 1-12.

Appendix A
Transition Network Grammar

This appendix consists of transition network diagrams for the grammar described in Section 4.1.2.

[Transition network diagrams not reproduced: SENTENCE, PREPOSITIONAL PHRASE, NOUN PHRASE, VERB PHRASE]

Appendix B
Case List

This appendix consists of a list of cases supplied to the NL system to simplify the definition process. It is neither an exhaustive nor completely defined list. The case list used here is a slightly modified version of the one found in Taylor and Rosenberg (1975).
AG - the AGENT of an action - the one who acts
BEN - the BENEFICIARY of an action - the one who receives an advantage
CAUS - the CAUSE - the agent which produces an effect or result
COAG - the COAGENT of an action - the one who acts with the agent
DEST - the DESTINATION - where something is directed
EN - to ENABLE - to make possible
EX - to EXCHANGE - to give and receive
INST - the INSTRUMENT of an action - what it was done with
LOC - the LOCATION of an action - where it took place
MAN - the MANNER of an action - how it was done
MOT - the MOTIVE behind an action - why it was done
PATH - the PATH - the course of action
PA - the PATIENT of an action - the one which is acted upon
PURP - the PURPOSE - the reason for carrying out the action
QUAN - the QUANTITY - the amount
RE - the RECIPIENT of an action - the one who receives something from the action
SOU - the SOURCE of the action - the origin
TIME - the TIME of the action - when it took place
TOP - the TOPIC of the action - what it is about

Appendix C

Partial Definition of the Syntactic Dictionary

This appendix attempts to give a flavor of the entries in the syntactic dictionary (see Section 4.2). Included here are the common, domain independent words: the determiners, quantifiers, prepositions and even some adjectives and adverbs. Along with the entries is a brief explanation of the semantic markers used in their definition.
Any line beginning with ";" is a comment and not part of the word definition.

(","
   ; the comma is treated as a conjunction to stop parts
   ; of two different compound words being joined together
   CONJ *)
   ; the '*' just means that the conjunction will be
   ; treated the same as the next non-* conjunction found;
   ; for example, in "Bill, John or Mary" the comma is
   ; treated as an "or" while in "Bill, John and Mary" it
   ; is treated as an "and"
(A
   DET *
   ; signifies that "a" is a determiner
   DET* ((NUMBER SG) (ARTICLE INDEF)))
   ; the properties that this determiner gives to the NP
   ; following it are "singular" and "indefinite"
(AFTER
   PREP *
   ; this word is a preposition
   PREP* ((CASES (TIME)))
   ; it indicates the "time" case
   INDR *MORE)
   ; it indicates the relation ">";
   ; for example, "after 1970" means "> 1970"
(ALSO
   CONJ AND)
   ; this conjunction is the same as "and"
   ; the ABBREV marker could also have been used here
(AM
   V (BE (TNS PRESENT) (PNCODE 1SG)))
   ; this definition says that "am" is a verb whose root
   ; is "be"
   ; TNS PRESENT informs the parser that the verb is in the
   ; present tense and
   ; PNCODE 1SG says that it is first person singular
(AN DET (A))
(AND CONJ *)
(ANY
   QUANT *
   NVALUE 0
   QVALUE *MORE)
   ; the NVALUE and QVALUE markers define "any" to
   ; mean "> 0"
(ANYTHING
   PRO *
   PRO* (GENERAL)
   ; defines it to be a general pronoun
   QVALUE *ONE)
(ARE V (BE (TNS PRESENT) (PNCODE X13SG)))
(AREN'T ABBREV (ARE NOT))
   ; this is how abbreviations are added to the dictionary
(AS ADV *)
(BE
   V *
   ; the '*' signifies that the verb is irregular and so
   ; all of its conjugations must be prestored in the
   ; dictionary
   V* (COPULA (AUX PASSIVE)))
(BEEN V (BE (TNS PASTPART)))
(BEFORE
   PREP *
   PREP* ((CASES (LOC TIME BEN)))
   ; the cases indicated by this preposition are "location",
   ; "time" and "beneficiary"
   INDR *LESS)
(BEING V (BE (TNS PRESPART)))
(BEST ADJ (GOOD SUPERLATIVE))
(BETTER ADJ (GOOD COMPARATIVE))
(BOTH
   QUANT *
   QVALUE *ALL)
(BUT CONJ (AND))
(CAN
   V *
   V* ((TNS PRESENT) (PNCODE ANY) (AUX MODAL)))
(COUPLE
   QUANT *
   NVALUE 2)
(DATUM N A)
   ; morphological information to derive the root from
   ; the plural
(DO
   V *
   V* ((AUX TNS)))
(EACH
   DET *
   QVALUE *ALL)
(EARLY ADJ ER-EST)
(FOR
   PREP *
   PREP* ((CASES (EX BEN))))
   ; the cases indicated by this preposition are the
   ; "exchange" and "beneficiary" cases
(FROM
   PREP *
   PREP* ((CASES (SOU METH))))
   ; this indicates the cases "source" and "method"
(GOOD ADJ *)
   ; the actual definition of "good" must be supplied in
   ; the domain definition since it will change from domain
   ; to domain
(HANDFUL
   QUANT *
   NVALUE 3)
(HOW
   ADV *
   ADV* (QUEST (CASES (MAN))))
((HOW MANY)
   ; the words are in parentheses to inform the parser
   ; to treat them as a single entry
   DET *
   DET* (QUEST)
   PRO *
   PRO* (QUEST)
   INDF *NUMBER)
(IN
   PREP *
   PREP* ((CASES (LOC TIME MAN DESC))))
   ; this indicates the cases "location", "time", "manner"
   ; and "description"
(LEAST ADJ (LITTLE SUPERLATIVE))
(LITTLE
   ADJ *
   ADV *
   QVALUE *LESS)
(NONE
   QUANT *
   NVALUE 0)
(ON
   PREP *
   PREP* ((CASES (LOC TIME))))
   ; the cases indicated by this preposition are "location"
   ; and "time"
(SECOND
   ORDINAL *
   NVALUE 2)
(SOME
   QUANT *
   NVALUE 3
   QVALUE *MORE)
(THE
   DET *
   DET* ((NUMBER SG-PL) (ARTICLE DEFINITE)))
(THEY
   PRO *
   PRO* (SUBJ (NUMBER PL) (PNCODE 3PL)))
(WHAT
   DET *
   DET* (QUEST)
   PRO *
   PRO* (QUEST))
(WHEN
   ADV *
   ADV* (QUEST (CASES (TIME))))
   ; this definition allows "when" to indicate any word
   ; filling the "time" case
(WHERE
   ADV *
   ADV* (QUEST (CASES (LOC)))
   ; this allows "where" to indicate any word filling the
   ; "location" case
   PRO *
   PRO* (QUEST RELATIVE))

Appendix D

Partial Definition of the Restaurant Domain

Each domain definition (see Section 5.1) is composed of a domain dictionary, case list and an inverted database. Samples of these are given here along with a brief explanation of their use in the NL system.

D.1 The Domain Dictionary

The domain dictionary contains a definition of the fields in the database as well as all of the jargon common to the domain. Items which will be found in the database itself will not usually be found here.

Any line beginning with ";" is a comment and not part of the domain definition.

(ADDRESS
   N ES
   ; morphological information
   DBFIELD LOC
   ; signifies that the name of the address field in the
   ; database is LOC
   DBCAT N)
   ; directs the system to treat entries in the database
   ; field LOC as nouns
(BAD
   ; morphological information is already in the common
   ; dictionary so need not be repeated here
   INDF STARS
   INDR *LESS
   INDE 2)
   ; the 3 tags INDF, INDR and INDE define a restaurant to
   ; be "bad" if the condition "STARS < 2" holds
(CHEAP
   ADJ *
   INDF COST
   INDR *LESS
   INDE MODERATE)
   ; the condition for a "cheap" restaurant is
   ; "COST < MODERATE"
(COST
   N S
   DBFIELD COST
   DBCAT ADJ
   ; elements of this field are treated as adjectives
   ORDER (INEXPENSIVE MODERATE EXPENSIVE))
   ; an ordering is placed on the COST field where
   ; INEXPENSIVE is lower than MODERATE which is lower than
   ; EXPENSIVE
   ; the orderings are used in answering comparative
   ; questions - in this case questions relating to
   ; "cheaper" and "more expensive"
(DRINK
   N S
   INDF WINE-LIST
   V IRR
   ACTION (AG *HUMAN PA WINE-LIST))
   ; this ACTION definition says that "humans drink what is
   ; on the wine list"
(EAT
   V IRR
   ACTION (AG *HUMAN PA (FOOD MEALS) RE *HUMAN))
   ; humans eat both food (e.g. chicken) and meals
   ; (e.g. lunch)
(FOOD
   N S
   DBFIELD FOOD
   DBCAT N)
(GET SYNONYM EAT)
   ; the SYNONYM feature allows us to quickly define many
   ; words which mean the same thing
(GOOD
   INDF STARS
   INDR *MORE
   INDE 3)
   ; definition of "STARS > 3" as being "good" is subjective
   ; as are all of the adjective definitions in the domain
   ; dictionary
(HAVE SYNONYM SERVE)
(LIQUOR
   N MASS
   INDF WINE-LIST)
(LOCATION
   N S
   SYNONYM ADDRESS)
(MEAL
   N S
   DBFIELD MEALS
   DBCAT N)
   ; since there are only a few MEALS, the order could be
   ; defined here (as in COST) to facilitate answering
   ; questions concerning "earlier" and "later"
(MENU
   N S
   INDF FOOD)
(NAME
   N S
   DBFIELD NAME
   DBCAT NPR)
(NUMBER
   N S
   INDF PHONE)
(OPEN
   V S-ED
   ACTION (AG NAME PA MEALS))
   ; this ACTION definition means that restaurants are open
   ; for meals (e.g. lunch)
(ORDER
   V S-ED
   SYNONYM EAT)
(PHONE
   N S
   DBFIELD PHONE
   DBCAT N
   V S-D
   ACTION (AG *HUMAN PA PHONE))
((PHONE NUMBER)
   ; the parentheses around the dictionary entry mean that
   ; the two words "phone" and "number" are to be treated as
   ; a single entry
   N S
   INDF PHONE)
(PLACE
   N S
   INDF NAME)
(PROVIDE
   V S-D
   SYNONYM SERVE)
(QUALITY
   N S
   INDF STARS)
(RATE
   V S-D
   ACTION (AG *HUMAN PA STARS RE NAME))
(RATING
   N S
   INDF STARS)
(RESERVATION
   N S
   DBFIELD RESERVATIONS
   DBCAT N)
(RESTAURANT
   N S
   INDF NAME)
(SERVE
   V S-D
   ACTION (AG NAME PA (FOOD MEALS) RE *HUMAN))
(STAR
   N S
   DBFIELD STARS
   DBCAT N)
(STREET
   N S
   INDF ADDRESS)
(WHO INDF NAME)
((WINE LIST)
   N S
   SYNONYM WINE-LIST)
(WINE-LIST
   N S
   DBFIELD WINE-LIST
   DBCAT N)

D.2 The Case List

There are only a few cases defined for the restaurant domain. The cases are the basis for determining the function of a prepositional phrase as well as general adverbial questions such as "when" and "where". The cases defined are:

(LOCATION (NAME ADDRESS))
(TIME MEALS)

D.3 The Inverted Database

Most of this inverted database was produced automatically by a program written in the MTS Edit Sublanguage (Hogg 1980). Additions were then made to it to include abbreviations and synonyms.
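The automatic step can be pictured as walking every database record and emitting one (VALUE ELEMENT-OF FIELD) entry per distinct field value. The following is only a sketch of that idea, written in Python rather than the MTS Edit Sublanguage actually used. The two sample records, the field contents and the names `invert_database` and `format_value` are hypothetical; the only thing taken from this appendix is the entry notation, including the parentheses around multi-word values.

```python
from typing import Dict, List

# Hypothetical sample records; the layout of the real restaurant database
# is an assumption made for this illustration.
RECORDS: List[Dict[str, List[str]]] = [
    {"NAME": ["WHITE SPOT"], "FOOD": ["AMERICAN", "CHICKEN"], "MEALS": ["LUNCH", "DINNER"]},
    {"NAME": ["GAZEBO"], "FOOD": ["CHICKEN"], "MEALS": ["DINNER"]},
]


def format_value(value: str) -> str:
    """Wrap multi-word values in parentheses, as compound entries are written."""
    return f"({value})" if " " in value else value


def invert_database(records: List[Dict[str, List[str]]]) -> List[str]:
    """Emit one '(VALUE ELEMENT-OF FIELD)' entry per distinct (value, field) pair."""
    seen = set()
    entries: List[str] = []
    for record in records:
        for field, values in record.items():
            for value in values:
                if (value, field) not in seen:
                    seen.add((value, field))
                    entries.append(f"({format_value(value)} ELEMENT-OF {field})")
    return entries


if __name__ == "__main__":
    for entry in invert_database(RECORDS):
        print(entry)
```

Abbreviations and synonyms do not come out of such a pass, which is consistent with their having been added by hand afterwards.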
- Restaurant Names -
(ACROPOL ELEMENT-OF NAME)
   ; means that ACROPOL can be found in the NAME field
((AKI JAPANESE RESTAURANT NO 2) ELEMENT-OF NAME)
(AKI SYNONYM "AKI JAPANESE RESTAURANT NO 2")
   ; means that "Aki" is a synonym for "Aki Japanese
   ; Restaurant No 2"
((CANYON GARDENS) ELEMENT-OF NAME)
(GAZEBO ELEMENT-OF NAME)
((IL GIARDINO) ELEMENT-OF NAME)
((LAS TAPAS) ELEMENT-OF NAME)
((SALMON HOUSE ON THE HILL) ELEMENT-OF NAME)
((SEVEN SEAS) ELEMENT-OF NAME)
((WHITE SPOT) ELEMENT-OF NAME)
((WILLIAM TELL) ELEMENT-OF NAME)
((YANGTZE KITCHEN) ELEMENT-OF NAME)
(YANGTZE SYNONYM YANGTZE KITCHEN)

- Types of Food -
(AFGHAN ELEMENT-OF FOOD)
(AMERICAN ELEMENT-OF FOOD)
(AMERICAN FOOD+ (BURGER "HOT-DOG" CHICKEN STEAK))
   ; the FOOD+ designator means that any search for American
   ; food will also look for "burger", "hot dog", "chicken"
   ; and "steak"
(BURGER SYNONYM (HAMBURGER CHEESEBURGER))
   ; directs the database interface to search for "hamburger"
   ; and "cheeseburger" whenever "burger" is requested
   ; the combined definitions of "American" and "burger" will
   ; cause any request for "American food" to produce a query
   ; looking for any of "American", "burger", "hamburger",
   ; "cheeseburger", "hot dog", "chicken" or "steak"
(CHINESE ELEMENT-OF FOOD)
(CURRY ELEMENT-OF FOOD)
(HAMBURGER ELEMENT-OF FOOD)
(HAMBURGER FOOD+ BURGER)
(ITALIAN ELEMENT-OF FOOD)
(JAPANESE ELEMENT-OF FOOD)
(LASAGNE ELEMENT-OF FOOD)
(LOBSTER ELEMENT-OF FOOD)
(SCHNITZEL ELEMENT-OF FOOD)
(SEAFOOD ELEMENT-OF FOOD)
(STEAK ELEMENT-OF FOOD)

- Types of Meals -
(BREAKFAST ELEMENT-OF MEALS)
(DINNER ELEMENT-OF MEALS)
(DINNER MEALS+ SUPPER)
((LATE NIGHT) ABBREV LATE-NIGHT)
   ; ABBREV is different from SYNONYM in that an abbreviation
   ; will occur at the lexical level and a synonym will occur
   ; at the root word level
(LATE-NIGHT ELEMENT-OF MEALS)
(LUNCH ELEMENT-OF MEALS)
(TEA ELEMENT-OF MEALS)
(SUPPER SYNONYM DINNER)

- Types of Costs -
(CHEAP SYNONYM INEXPENSIVE)
(EXPENSIVE ELEMENT-OF COST)
(EXPENSIVE COST+ PROHIBITIVE)
(INEXPENSIVE ELEMENT-OF COST)
(INEXPENSIVE COST+ (CHEAP REASONABLE))
(MODERATE ELEMENT-OF COST)
(REASONABLE SYNONYM INEXPENSIVE)

- Locations -
(BC ABBREV BRITISH COLUMBIA)
(BOULEVARD ELEMENT-OF LOC)
((BRITISH COLUMBIA) ELEMENT-OF LOC)
(GASTOWN ELEMENT-OF LOC)
((PACIFIC CENTRE) ELEMENT-OF LOC)
((PARK ROYAL) ELEMENT-OF LOC)
(VANCOUVER ELEMENT-OF LOC)

- Possible Stars -
(0 ELEMENT-OF STARS)
   ; the rating goes from a 0 star restaurant to a 5 star
   ; restaurant
(1 ELEMENT-OF STARS)
(2 ELEMENT-OF STARS)
(3 ELEMENT-OF STARS)
(4 ELEMENT-OF STARS)
(5 ELEMENT-OF STARS)

Appendix E

Sample Session

This is a sample session of the NL system. The questions here are indicative of ones which can be answered by the system and are not intended to show the system limits. Lines prefixed with the character ">" were output by the system and lines prefixed with "*" were input by the user. The indentation of the PARSE listings was lost in scanning, so their elements are shown one per line in their original order.

> Which database do you want to look at?
>
> 1. Restaurants
> 2. AI Bibliography
> 3. Conference
* 1
>
> You are currently examining the RESTAURANTS database.
>
> All of your queries should be entered in upper case and
> should be terminated by either a '?', '.' or '!'.
> Compound names (e.g. the title of a book) should be
> entered with surrounding double quotes '"' and
> apostrophes may be used to signify possession. No other
> punctuation should be used.
>
> To leave the system type STOP., to get more information
> type HELP, and to get this message again type INFO.
>
>
>
> HELLO THERE
>
* WHO SERVES CHICKEN?
>
> PARSE:
> WHFIND
> *AND
> NAME = ?
> FOOD = CHICKEN
>
> "STEER AND STEIN" AND "WHITE SPOT" SERVE CHICKEN.
>
* WHAT ARE SOME ITALIAN RESTAURANTS?
>
> PARSE:
> WHFIND
> *AND
> *NUMBER = 3
> *OR
> FOOD = ITALIAN
>
> THEY ARE "CAFFE DE MEDICI", "FRANCESCO ALONGI" AND
> "CORSI TRATTORIA".
>
* CAN YOU FIND ME A CHEAP JAPANESE PLACE?
>
> PARSE:
> WHFIND
> *AND
> *AND
> *OR
> COST = INEXPENSIVE
> COST = CHEAP
> COST = REASONABLE
> *NUMBER = 1
> FOOD = JAPANESE
>
> NONE CAN BE FOUND.
>
* WHICH IS THE MOST EXPENSIVE RESTAURANT IN VANCOUVER?
>
> PARSE:
> WHFIND
> *AND
> *AND
> *OR
> COST = EXPENSIVE
> COST = IMMODERATE
> COST = PROHIBITIVE
> NAME = ?
> *OR
> LOC = VANCOUVER
> LOC = VANC
> LOC = VAN
>
> THERE ARE 19 REFERENCES.
> DO YOU WANT THEM PRINTED?
* NO
>
* WHAT IS ON THE MENU AT STEER AND STEIN?
>
> PARSE:
> WHFIND
> *AND
> NAME = WHITE SPOT
> FOOD = ?
>
> THEY ARE STEAK, SALADS, CHICKEN AND FISH.
>
* IS THERE A SPANISH PLACE WHICH IS OPEN FOR LUNCH?
>
> PARSE:
> WHFIND
> *AND
> *AND
> *NUMBER = 1
> FOOD = SPANISH
> MEALS = LUNCH
>
> IT IS "LAS TAPAS RESTAURANTE".
>
* WHO SERVES CHINESE FOOD?
>
> PARSE:
> WHFIND
> *AND
> NAME = ?
> *OR
> FOOD = CHINESE
> FOOD = INDO-CHINESE
>
> THERE ARE 21 REFERENCES.
> DO YOU WANT THEM PRINTED?
* OK
> "THE JASMINE INN", "DAI KEE",
> "SHANGHAI PALACE RESTAURANT", "NEW DIAMOND RESTAURANT",
> "HO INN CHOP SUEY", "SNOW GARDEN", "KAM YUEN RESTAURANT",
> "PINK PEARL", "VARSITY GRILL", PENINSULA,
> "KEN WONG VILLAGE", "YANGTZE KITCHEN",
> "MY TAN RESTAURANT", "DRAGON INN", "CHINA KITCHEN",
> "CANTON KITCHEN", "SHANGHAI HOUSE, MANDARIN RESTAURANT",
> "HO HO", "THE CATHAYAN", "MY LAI GARDEN RESTAURANT", AND
> "NEW WORLD CHOP SUEY" SERVE CHINESE.
>
* DO THEY SERVE CHICKEN?
>
> PARSE:
> YES-NO
> *AND
> *REF = *
> FOOD = CHICKEN
>
> NO, THERE ARE NO REFERENCES.
>
* IS THERE A FRENCH PLACE IN RICHMOND WHICH IS OPEN FOR LUNCH?
>
> PARSE:
> WHFIND
> *AND
> *AND
> *NUMBER = 1
> FOOD = FRENCH
> *OR
> LOC = RICHMOND
> MEALS = LUNCH
>
> IT IS "LE CERCLE".
>
* WHO SERVES DINNER?
>
> PARSE:
> WHFIND
> *AND
> NAME = ?
> *OR
> MEALS = DINNER
> MEALS = SUPPER
>
> THERE ARE 109 REFERENCES.
> DO YOU WANT THEM PRINTED?
* NO
>
* WHERE IS KEN WONG VILLAGE?
>
> PARSE:
> WHFIND
> *AND
> NAME = KEN WONG VILLAGE
> LOC = ?
>
> IT IS "1262 BURRARD STREET AND 555 W HASTINGS, VAN. B.C.".
>
* WHEN IS IT OPEN?
>
> PARSE:
> WHFIND
> *AND
> *REF = *
> MEALS = ?
>
> IT IS "LUNCH AND DINNER".
>
* WHAT DOES WHITE SPOT SERVE?
>
> PARSE:
> WHFIND
> *AND
> NAME = WHITE SPOT
> FOOD = ?
>
> WHITE SPOT SERVES AMERICAN, HAMBURGERS AND CHICKEN.
>
* WHAT MEALS DOES WHITE SPOT SERVE?
>
> PARSE:
> WHFIND
> *AND
> NAME = WHITE SPOT
> MEALS = ?
>
> WHITE SPOT SERVES BREAKFAST, LUNCH AND DINNER.
>
* FIND AT LEAST 4 RESTAURANTS THAT HAVE SWISS FOOD.
>
> PARSE:
> WHFIND
> *AND
> *NUMBER = 4
> NAME = ?
> FOOD = SWISS
>
> THERE ARE ONLY 3 RESTAURANTS THAT FIT THE CONSTRAINTS.
> "WILLIAM TELL", "LA RACLETTE" AND "GIZELLA SWISS CHALET"
> SERVE SWISS.
>
* STOP.

> Which database do you want to look at?
>
> 1. Restaurants
> 2. AI Bibliography
> 3. Conference
* 2
>
> You are currently examining the AI BIBLIOGRAPHY database.
>
> To leave the system type STOP., to get more information
> type HELP, and to get this message again type INFO.
>
> HELLO THERE
>
* WHO WROTE APSECTS?
>
> PARSE:
> WHFIND
> *AND
> TITLE = ASPECTS
> AUTHOR = ?
>
> "CHOMSKY N" WROTE APSECTS.
>
* HOW MANY BOOKS HAS MCCARTHY WRITTEN?
>
> PARSE:
> WHFIND
> *AND
> AUTHOR = MCCARTHY
> *AND
> *NUMBER = ?
> TITLE = ?
>
> THERE ARE 9 BOOKS.
>
* FIND A BOOK WRITTEN BY MINSKY BEFORE 1978.
>
> PARSE:
> WHFIND
> *AND
> *AND
> *NUMBER = 1
> TITLE = ?
> *AND
> AUTHOR = MINSKY
> *AND
> DATE < 1978
>
> IT IS PERCEPTRONS.
>
* STOP.

> Which database do you want to look at?
>
> 1. Restaurants
> 2. AI Bibliography
> 3. Conference
* 3
>
> You are currently examining the IJCAI-81 database.
>
> To leave the system type STOP., to get more information
> type HELP, and to get this message again type INFO.
>
> HELLO THERE
>
* WHO IS COMING FROM MIT?
>
> PARSE:
> WHFIND
> *AND
> NAME = ?
> INST = MIT
>
> THERE ARE 13 REFERENCES.
> DO YOU WANT THEM PRINTED?
* YES
> THEY ARE "GLASS, BRIAN", "MCALLESTER, DAVID",
> "SUSSMAN, GERALD J.", "WHITE, BARBARA", "DAVIS, RANDALL",
> "HEWITT, CARL", "OGILVIE, WILLIAM", "HAWKINSON, LOWELL B",
> "HAMSCHER, WALTER", "PITMAN, KENT", "FRY, CHRISTOPHER",
> "WATERS, RICHARD C" AND "LESCANE, PIERRE".
>
* WHEN DID "LAM, MONICA" REGISTER?
>
> PARSE:
> WHFIND
> *AND
> NAME = "LAM, MONICA"
> TYPE = ?
>
> LAM, MONICA REGISTERS EARLY-STUDENT.
>
* HAS ROSENBERG REGISTERED YET?
> I cannot find ' YET ' in the dictionary.
> Do you wish to stop processing this query?
* NO
> Did you misspell ' YET ' ?
* NO
> Would I find ' YET ' in the database ?
* NO
> Would it be safe to ignore the word ' YET '?
* YES
>
> PARSE:
> YES-NO
> NAME = ROSENBERG
>
> YES, THERE ARE 2 REFERENCES.
>
* WHAT ARE THEIR NAMES?
>
> PARSE:
> WHFIND
> *AND
> *REF = *
> NAME = ?
>
> THEY ARE "ROSENBERG, RICHARD" AND "ROSENBERG, STEVEN".
>
* STOP.

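The *OR groups in the session parses (for example MEALS = DINNER with MEALS = SUPPER) correspond to the FOOD+, MEALS+ and SYNONYM entries of Appendix D.3, where one requested term widens into every term linked to it. A minimal Python sketch of that widening is given below. It is an illustration under stated assumptions and not the thesis implementation: the table `EXPANSIONS` is hand-transcribed from the D.3 entries, the function names `expand_term` and `or_group` are hypothetical, and treating `+` designators and list-valued SYNONYM entries uniformly as "also search for" links is a simplification.

```python
from typing import Dict, List

# Links transcribed by hand from Appendix D.3 (an assumption about how they
# would be stored): each key is also searched under the listed terms.
EXPANSIONS: Dict[str, List[str]] = {
    "AMERICAN": ["BURGER", "HOT-DOG", "CHICKEN", "STEAK"],
    "BURGER": ["HAMBURGER", "CHEESEBURGER"],
    "HAMBURGER": ["BURGER"],
    "DINNER": ["SUPPER"],
}


def expand_term(term: str, expansions: Dict[str, List[str]]) -> List[str]:
    """Return the term plus everything reachable through expansion links.

    Depth-first, in first-seen order; the membership check stops the
    HAMBURGER <-> BURGER cycle from looping forever.
    """
    result: List[str] = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in result:
            continue
        result.append(current)
        # Push in reverse so that links are visited in their listed order.
        stack.extend(reversed(expansions.get(current, [])))
    return result


def or_group(field: str, term: str, expansions: Dict[str, List[str]]) -> List[str]:
    """Render the expansion as the lines of an *OR group in a parse listing."""
    return ["*OR"] + [f"{field} = {t}" for t in expand_term(term, expansions)]


if __name__ == "__main__":
    print("\n".join(or_group("MEALS", "DINNER", EXPANSIONS)))
    print(expand_term("AMERICAN", EXPANSIONS))
```

With this table, expanding AMERICAN yields the same seven terms that the comment in Appendix D.3 lists for a request for "American food".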