AN APPROACH TO THE ORGANIZATION OF TAXONOMIES B.Sc. University of B r i t i s h Columbia A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES Department of Computer Science We accept this thesis as conforming THE UNIVERSITY OF BRITISH COLUMBIA CRAIG DAVID BISHOP by to the required standard March, 1981 © Craig David Bishop, 1981 In presenting this thesis in p a r t i a l f u l f i l l m e n t of the requirements for an advanced degree at the University of B r i t i s h Columbia, I agree that the Library s h a l l make i t fr e e l y available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my department or his representative. It i s understood that copying or publication of this thesis for f i n a n c i a l gain s h a l l not be allowed without my written permission. Craig Bishop PAGE i i Abstract The ISA hierarchy used by present day i n f e r e n t i a l database systems is d e f i -cient i n that i t does not represent a variety of domain relationships (type r e l a t i o n s h i p s ) . Such hierarchies associate e x p l i c i t l y defined sets with the leaves of a tree. Non-leaf nodes in the tree i n h e r i t members from their c h i l d nodes. In keeping with the use of sets, this paper gives motivation for having type relationships other than subset. Included are union, i n t e r -section and a disjointness condition. A formalism for the typed database (TDB) i s given, using the monadic f i r s t order predicate calculus as i t s theoretical basis. Non-unit formulae repre-sent intensional (general) information about the world being represented and unit ground clauses represent extensional ( s p e c i f i c ) information. Predi-cates represent types, and constant symbols represent set members. This is connected to the set concept via predicate extensions. The extension is that set of constant symbols which are provable as arguments to the given predicate. Given the so-called concreteness condition and consistency of the TDB, the desired set theoretic relationships (union, intersection and disjointness) of predicate extensions follow. This strengthens the l i n k between the for-malism of the predicate calculus and the more natural set representation. A canonical form of a TDB is shown that admits an appropriate machine repre-sentation. Using this is can be determined i f a constant symbol as an PAGE i i i argument to a given predicate is provable (domain membership) i n constant time. An update algorithm is developed and is shown to be correct i n that i t maintains concreteness and consistency. Thus the TDB i s shown to be a p r a c t i c a l generalization of the ISA hierarchy but is considerably more expressive. PAGE i v . Table of Contents Page # Section A Introduction 1 Section B Motivation 4 Section C Formal Preliminaries 7 The TDB 7 Extension 9 A c y c l i c i t y of a TDB 9 ALPHA, BETA and TAU 9 Diagram C.l - Sample TDB Intension 11 Lemma C.l - Generalization . . . . . . . . . . . . . 1 2 Co r o l l a r y C.l - Containment 13 Lemma C.2 - Inte r s e c t i o n 13 Section D The D e f i n i t e Component of a TDB .15 The D e f i n i t e Component /\(T) 15 Concreteness 17 The P r o p o s i t i o n a l Component PROPT 17 Lemma D.l - Pro p o s i t i o n a l Equivalence 18 Dead Clauses DEADT 19 Lemma D.2 - Dropping of Dead clauses 19 Theorem D.l - T -/\(T) Equivalence 21 Lemma D.3 - Union 24 Lemma D.4 - Disjointness 25 Coroll a r y D.l - D i s j o i n t Union 25 PAGE v Table of Contents Page # Section E Transformation of a TDB to i t s Reduced Form 26 The Horn Component H(T) 26 Algorithm E . l - DOMAIN^ 27 Sample Run of Algorithm E . l 28 Theorem E . l - Algorithm E . l Correctness 29 Corollary E . l - A(T) - DOMAIN^ Equivalence 32 The Reduced or Canonical TDB, R(T) 32 Lemma E . l - Spanning 33 Theorem E.2 - T - R(T) Equivalence 33 Corollary E.2 . . 34 Corollary E.3 - Concreteness of R(T) 34 Corollary E.4 - Consistency of R(T) 34 Section F Updating a TDB 36 Algorithm F . l - Update 36 Theorem F . l - Algorithm F . l Correctness 37 Sample Run of Algorithm F . l 38 Summary and Conclusions 41 Bibliography 42 Appendix A Dictionary of Symbols 44 PAGE v i Acknowledgements I wish to thank Dr. Raymond Reiter of the University of B r i t i s h Columbia for his perceptive and c r i t i c a l analysis of the ideas that went into this paper. His dedication and insight into i n f e r e n t i a l database theory has been a continuing source of enlightenment. I would also l i k e to thank B e l l Northern Research of Ottawa for the use of their word processing f a c i l i t i e s in the preparation of this manuscript. PAGE 1 Section A Introduction In recent works on r e l a t i o n a l database systems, espe c i a l l y as they arise i n a r t i f i c i a l i n t e l l i g e n c e , the need for a database of domains (using Codd's d e f i n i t i o n of Domain^) has become apparent. In simple data process-ing applications, the acceptance or rejection of data i n a domain i s based on a decision procedure for that domain such as correct alpha/numeric form or membership in a predefined set. This makes the primary assumption that each domain i s independant of a l l others. This may be r e a l i s t i c for domains i n which each entry of one domain is more or less unrelated to other domains ( l i k e numeric ones). However, work in a r t i f i c i a l i n t e l l i g e n c e and i n f e r -e n t i a l database systems acknowledges the dependancies amoung domains. Winograd led the way by using an ISA hierarchy to relate domains of objects and their properties, eg. a MOVEABLE-OBJECT ISA PHYSICAL-OBJECTt 1 5^. McSkimin and Minker formalize the ISA hierarchy by defining a semantic graph as a labelled tree in which the root node i s the universal category (domain) and the leaves are primitive categories [7 ,8 ,9 ]^ j n t h e i r formalism the elements, which are associated with the primitive categories, are inherited by the parent categories. As can be seen i n their examples these trees tend to be "leafy" with many categories, some with unnatural names (such as non/human/male/mam). There i s much duplication of category names but no mechanism i s given for insuring that duplicate categories contain the same elements. PAGE 2 Also, McSkimin and Minker define a pa r t i t i o n i n g (disjointness condition between categories) without defining an update algorithm that w i l l maintain the p a r t i t i o n . In short, though they define the ISA heirarchy, they do not include a mechanism for enforcing data consistency. Reiter has adopted the idea of a typed database formalized i n the monadic predicate calculus ^ ' 1 1 ^. He replaces the idea of an element being a member of a domain with the proof theoretic notion of pr o v a b i l i t y . Here the domain is a monadic predicate i n which the element is i t s argument. The structure, mathematics and algorithms for this typed database are beyond the scope of his paper. It is the purpose of this paper to propose a structure for Reiter's typed database and develop the necessary mathematics and algorithms. A domain in Codd's terminology w i l l hereafter be referred to as a TYPE. An element w i l l be referred to as an INSTANCE. For example, HUMAN i s a TYPE for which JOHN i s an INSTANCE. JOHN i s said to have the property HUMAN. A type database (TDB) i s then a l o g i c a l theory i n which groups of TYPEs (and hence INSTANCES) s a t i s f y certain mathematical relationships c a l l e d type r e l a t i o n s h i p s . In order for a TDB to be p r a c t i c a l i t must s a t i s f y certain conditions: 1) It must be des c r i p t i v e l y r i c h , capturing type relationships other than simple set containment. PAGE 3 2) It must be structured so as to encourage only the most natural information content. 3) It must admit algorithms that are computationally e f f i c i e n t in both time and space. 4) It must admit updates in such a way as to prevent a user from v i o l a t i n g a type relationship. ( i e . enforce i n t e g r i t y con-s t r a i n t s ) Section B motivates the inclu s i o n of type relationships other than set con-tainment. Section C then defines a TDB. Condition 2 i s addressed i n section C which defines an ac y c l i c TDB and introduces a graphic notation. Structuring i s completed i n section D by imposing the so-called concreteness condition on TDBs. With this and consistency a l l the desired set theoretic properties of the TDB follow. Section E i s devoted to an algorithm for reducing a given TDB to one which is e f f i c i e n t i n both time and space. The f i n a l section d e t a i l s an update algorithm which ensures that concreteness and consistency of the TDB are maintained during updates. PAGE 4 Section B Motivation To see why a more robust set of type relationships than simple set contain-ment i s important consider the following statements: 1) JANE i s a WOMAN; hence JANE i s a FEMALE (ISA) 2) JANE i s a FEMALE; hence JANE is a WOMAN or JANE i s a GIRL (but not both) 3) a) JOHN i s a BACHELOR; hence JOHN i s a MAN (ISA) b) JOHN i s a BACHELOR; hence JOHN i s SINGLE (ISA) 4) JOHN i s a MAN and JOHN i s SINGLE; hence JOHN i s a BACHELOR Statements 3a and 3b bear a resemblance to statement 1 i n that both display set containment. However, in view of statements 2 and 4 we see that some-how FEMALE i s composed of WOMAN and GIRL - WOMAN and GIRL being d i s j o i n t TYPEs. In contrast MAN and SINGLE intersect to give BACHELOR. The standard ISA hierarchy does not provide the formal equivalent of statements 2 and 4. An inference of type 2 i s important i n that we would not want JANE to have the property FEMALE without her having one of the properties WOMAN or GIRL. Si m i l a r l y we would not want JANE to have both the properties WOMAN and GIRL. As seen i n the following sections, "concreteness" s a t i s f i e s the former con-d i t i o n , consistency the l a t t e r . The type relationship of type 4 i s important when we want to define a TYPE as having INSTANCES which are common to a set of TYPEs. In an a i r l i n e reservation system JOHN might have the properties NON-SMOKING, COACH, PAGE 5 MOVIE-WATCHING and 747 from which the system could i n f e r that JOHN i s a member of CABIN-B. The way the above statements pair suggests the need for only two classes of type r e l a t i o n s h i p s : A) The TYPE FEMALE i s equivalent to the d i s j o i n t union of the TYPEs WOMAN and GIRL B) The TYPE BACHELOR i s equivalent to the i n t e r s e c t i o n of the TYPEs MAN and SINGLE The union i n A need not be d i s j o i n t , for example: C) The TYPE PEOPLE i s equivalent to the union of the TYPEs LEFT— HANDED and RIGHT-HANDED (allowing for the ambidextrous) Graphically these are represented as: A) • B) C) FEMALE MAN SINGLE PEOPLE WOMAN GIRL BACHELOR RIGHT- LEFT-HANDED HANDED Set t h e o r e t i c a l l y these can be represented as: A) FEMALE = WOMAN # GIRL B) BACHELOR = MAN n SINGLE C) PEOPLE = RIGHT-HANDED U LEFT-HANDED PAGE 6 In the graphical representation, the more r e s t r i c t e d TYPE i s drawn below the TYPE of which i t is a subset. The four o r i g i n a l statements can now be written in terms of sets: 1) JANE £ WOMAN implies JANE £ FEMALE 2) JANE £ FEMALE implies JANE £ WOMAN Q GIRL 3) JOHN £ BACHELOR implies JOHN £ MAN 0 SINGLE 4) JOHN £ MAN fl SINGLE implies JOHN £. BACHELOR PAGE 7 Section C Formal Preliminaries The set notation of the previous section is convenient for v i s u a l i z i n g the TYPE re l a t i o n s h i p s . The f i r s t order predicate calculus, however, offers a r o 131 well developed proof theory 1 » J . For this reason we s h a l l develop our results within a f i r s t order framework. Refer to appendix A for the int e r p r e t a t i o n given to special symbols used i n what follows. D e f i n i t i o n TYPE Database (TDB) A TDB is a 4-tuple ( C , P, I , E ) s a t i s f y i n g : 1) C i s a f i n i t e set of constants. 2) P i s a f i n i t e set of unary predicate signs, c a l l e d TYPES. 3) E i s a f i n i t e set of atomic formulae of the form Pc where P £ P and c £ C . E i s called the extension of the TDB. 4) I i s a f i n i t e set of well formed formulae of the form: i ) (x) Px = Pjx A. P 2x A ... P nx or_ i i ) (x) Px = Pj^x v P 2x v ... P nx where [P, P ^ P 2, ... P n] C P. In the event that I contains a formula of the form ( i i ) , i t may also contain a l l of the formulae: i i i ) (x) ~ (P.jX A P J X ) for each 1 <= i < j <= n. In which case the formula ( i i ) is said to be exclusive. I i s called the intension of the TDB. PAGE 8 For Example: (C, P, I, E) i s a TDB where: C = [John, mary] (INSTANCES) P = [FEMALE, WOMAN, GIRL, MAN, SINGLE, BACHELOR] (TYPES) I = [(x) BACHELOR x = MAN x A SINGLE x, (x) FEMALE x = WOMAN x v GIRL x, ( x ) ~ (WOMAN x A GIRL x) ] (intension) E = [WOMAN mary, MAN John, SINGLE John] (extension) Notation: 1) For brevity, the variable 'x' w i l l be dropped where understood. 2) (C, P, I, E) w i l l be referred to as T's constants, predicates, intension and extension where c l a r i f i c a t i o n i s required. 3) Members of C w i l l be written i n lower case, members of P i n upper case. 4) For reasons of notational convenience we sometimes write W £ I T131 where W is actually a clause 1 J« 5) A TDB w i l l be said to be consistent whenever I U E i s . 6) When we write T \— W we w i l l mean I U E \~ W for any formula W. 7) R (U, V) w i l l refer to the resolvents'- 1"^ of the formulae U and V. With this d e f i n i t i o n of a TDB we can replace the notion of an INSTANCE being a member of a TYPE (John £ BACHELOR) with the predicate calculus notion of pr o v a b i l i t y (I U E \~ BACHELOR John). However, we would s t i l l l i k e to view a type as a set with certain set theoretic relationships. PAGE 9 D e f i n i t i o n Extension The extension of P £ P written ||P||^ i s defined: ||P|| t = [c | c £ C, I U E f- Pc] The subscript on ||P||-J. w i l l be dropped where i m p l i c i t . Hence we can write c £ ||p|| for T f~ Pc. The set theoretic relationships between extensions of the l a s t section do not automatically follow. We w i l l give conditions under which each of the relationships given do follow. A c y c l i c i t y of a TDB The idea of a TYPE viewed as an ISA hierarchy i s firmly related to the ideas of generalization and r e s t r i c t i o n . L o g i c a l l y we would l i k e to think of one type containing another ( i e . ||P|| C ||Q ||). This leads to the notion of a hierarchy of TYPES, motivating the following a c y c l i c i t y d e f i n i t i o n ^ . D efinitions ALPHA, BETA and TAU The relations ALPHAT, BETAT and TAU T are defined on a TDB, T, over P x P: 1) P ALPHAT Q i f f Q E Pj v P 2 v ... P n £ I such that P = P^ for some i , 1 <= i <= n 2) P BETAT Q i f f P EE Qx A Q 2 A ... Q n £ I such that Q = Q i for some i , 1 <= i <= n 3) TAU T = ALPHAT U BETAT The subscripts on ALPHA^, BETA^, and TAU^ , w i l l be dropped where impli-c i t . A TDB i s said to be a c y c l i c i f f TAU"1" i s i r r e f l e x i v e . i e . PAGE 10 P Q T A U Pl and Pj T A U P 2 and ... P n - 1 T A U P. implies not P n T A U P Q (n > 0) The graphical representation of Section B can be adopted with P l P2 * ' * P n representing (x) Px — PjX A P 2x A ... P nx (4i) and with P r l r2 * * * r n representing (x) Px = PjX v P 2x v ... P Rx ( 4 i i ) and f i n a l l y P P 2 . . . P n representing (x) Px = Pjx v P 2x v ... P nx, ( x ) ~ ( P i x A PjX) ( 4 i i and 4 i i i ) , Diagram C.l gives an example of this graphical representation of I. By convention, i f P TAU Q then P is d i r e c t l y below Q in the graph. Thus a TDB i s a c y c l i c when and only when i t s graph is a c y c l i c . Throughout this paper the a c y c l i c i t y of a l l TDBs i s assumed. The following lemma gives us the property of set containment desired: PERSON PAGE 12 Lemma C.l Generalization If T i s a TDB, [P, Q] C P and P TAU+ Q then T (- (x) Px 3 Qx Proof: By induction on n i t is s u f f i c i e n t to prove P TAU Q implies T |~ (x) Px =3 Qx Case n = 1 P TAU Q Subcase a P ALPHA Q then Q = P 1 v P 2 v ... P n £ I such that P = P i, 1 <= i <= n In clausal form P. v Q £ I or (x) Px D Qx £ I Subcase b P BETA Q then P = Q 1 A Q 2 A ... Q n £ I such that Q = Q±, 1 <= i <= n In clausal form P v Q i £ I or (x) Px 3 Qx £ I Case n > 1 P TAU n Q Induction assumption: P TAU n _ 1 Ql implies T |~ (x) Px 3 Q{x for any Q : £ P Since P TAU n Q then for some Q-^ £ P, P TAU n _ 1 Q : and Qx TAU Q Then T \~ (x) Px 3 Q^x by induction assumption and T (- (x) Qjx 3 Qx by case n = 1 Therefore T (- (x) Px Z? Qx QED PAGE 13 Corollary C.l Containment If T i s a TDB, [P, Q] C P and P TAU+ Q then ||P||C HQII In terms of the graph of I, we are assured that i f a path exists between P and Q i n a ' v e r t i c a l ' d i r e c t i o n then every INSTANCE of the TYPE P is an INSTANCE of the TYPE Q. We can see now why a c y c l i c i t y is the only p r a c t i c a l choice for i f P TAU+ Q and Q TAU P then ||p|| C ||Q|| C ||P|| or ||p|| = | | Q | | . This would c e r t a i n l y be of no use in a p r a c t i c a l TDB. The following lemma establishes the relationship between set intersection and part 4i i n the d e f i n i t i o n of a TDB. Lemma C.2 Intersection For a TDB, T, i f P = ?l A P 2 A •.. P n £ I then ||P|| = H p j n ||p2|| n ... ||Pn|| Proof: Case 1 c £ ||P|| (or T f-Pc) In clausal form Px v P^x £ I for each i , 1 <= i <= n By resolution T \~ P^c implying c £ ||p J| or c £ IIPJI n ||P 2 || n . . . ||pn|| P A G E 14 ^ s e 2 c e Hi?!II n l|p 2 l l n . . . | | P n | | (or T f- P ic for each i , 1 <= i <= n) In clausal form Px v P^x v P 2x v ... P nx £. I By resolution T |— Pc implying c £. ||p|| Section D develops a p a r a l l e l result r e l a t i n g the union of extensions to the formula P E P , v P~ v ... P . i z n PAGE 15 Section D The Definite Component of a TDB It has been recognized by Reiter' 1 1-' and Minker'^ that r e s t r i c t i n g inten-sional information to Horn clauses (clauses containing at most one pos i t i v e l i t e r a l ) e f f e c t i v e l y controls the size of the proof search space without overly constraining the expressiveness of the formalism. This can be seen by viewing Horn clauses as procedures'^ i n a non-deterministic programming language similar to MICRO-PLANNER^14^. Ie., the formula, P 1 A P 2 A ... P n D P, could be programmed, THE_CONSEQUENT_OF ( P ^ P 2, ... P ) IS P; and the clause, P[ v P^ v ... P^, could be programmed, THE_CONSEQUENT_OF ( P p ? 2 , ... P ) IS INCONSISTENCY. Proofs done i n this way are reasonably e f f i c i e n t as only unit consequents are ever generated. Thus the maximum number of procedure f i r i n g s is bounded by the number of elements i n P. In this section we w i l l consider clauses of the former form ( d e f i n i t e clauses) to develop the desired result on union of extensions. D e f i n i t i o n Definite Component Given a TDB, T = (C, P, I, E), the d e f i n i t e component of T, A ( T ) , i s defined as the TDB, (C, P, [W £. I | W i s a de f i n i t e clause], E) PAGE 16 Note that E contains only d e f i n i t e clauses. We can now see exactly what parts of a TDB are d e f i n i t e : W £ I P EE P 1 A P 2 A ... P n P EE P^ v P 2 v ... P n ~- (P A Q) We would l i k e a result p a r a l l e l to lemma C.2 but r e l a t i n g unions of exten-sions to the formula P EE Pj v P 2 v ... P . However, as the following example shows, i t would not necessarily hold: FEMALE C = [jane] P = [FEMALE, GIRL, WOMAN] I = [FEMALE EE WOMAN v GIRL, \^ ~ (WOMAN A GIRL) ] GIRL E = [FEMALE jane] | | F E M A L E | | = [jane] and || WOMAN || = ||GIRL|| = [ ] since T \h WOMAN jane and T \h GIRL jane. Clausal Form Definite? P v P l s P v P 2, ... P v P n yes P x v P 2 v ... P n v P yes ? l v P, P 2 v P, ... P n v P yes P v Pj^ v P 2 v ... P n no P v Q no WOMAN So not only do we not have ||FEMALE || = 11WOMAN|| U ||GIRL|| but we have a TDB in which unnatural information i s contained. It is hard to conceptualize jane as a "WOMAN or GIRL" without her being either a WOMAN or a GIRL. Actually, PAGE 17 jane i s one of the two, we just can't determine which. For these reasons we i n s i s t on a TDB being "concrete". D e f i n i t i o n Concreteness 1) The formula P EE P^ v P2 v ••• p n ^ * i s concrete i f f for every c £ C , A ( T ) \~ Pc implies there i s an i such that A ( T ) (— PjC. 2) A TDB i s concrete i f f every formula of the above form is concrete. We could have defined concreteness i n terms of p r o v a b i l i t y from T rather than A(T). Unfortunately as we s h a l l see, i t turns out that at update time i t would then be excessively d i f f i c u l t to maintain concreteness. The disadvantage in the approach taken is that equivalence must be' estab-lished between T and A(T) before the result on union of extensions can be attained. Theorem D.l accomplished t h i s . The Davis and Putnam r e s u l t u s e d i n the proof of this theorem applies only for the propositional calculus. The following d e f i n i t i o n and lemma r e c t i f y t h i s . D e f i n i t i o n Propositional Component Given a TDB, T, and c £ C we define the propositional component of T with respect to c, PROP^ ,c as: PROPTc = [P | Pc L E] U [W | (x)Wx t I] ((x)Wx represents any formula in I with free variable x) PAGE 18 Using the last example: PROPTJane = [FEMALE, FEMALE = WOMAN v GIRL, ~ (WOMAN A GIRL) ] Lemma D.l Propositional Equivalence Given a consistent TDB, T, c £ C and P £ P then T f~ Pc i f f PROPTc (- P Proof: Part 1) Assume E U I U [Pc] i s u n s a t i s f i a b l e . By Herbrand's theorem, i f we replace each clause i n I with the set of clauses obtained by replacing the free variable 'x' with each member of C then the remaining set is u n s a t i s f i a b l e . C a l l this the ground form of I . Since I U E was o r i g i n a l l y consistent, the ground form of I U E i s . [Pc] cannot resolve against any clause which does not have c's as i t s ground constant. Therefore a l l the constants i n C except c are irrelevant to the inconsistency. If we construct the clause set [Q | Qc £ E ] U [W | Wc £ ground form of I ] U [7] then we have a propositionally u n s a t i s f i a b l e set. Therefore, PROPTc \~ P. Part 2) Assume PROPTc U [P] i s un s a t i s f i a b l e . Clearly [Wc | W £ PROPTc U [P]] i s un s a t i s f i a b l e . This is a ground substitution of E U I U [Pc]. Therefore T (- Pc. PAGE 19 Theorem D.l also requires that one factor out certain non-definite clauses. These are the ones that allow jane to be a "WOMAN or GIRL" without the ne-cessity of her being one of WOMAN or GIRL. D e f i n i t i o n Dead Clauses Given a TDB, T and c £ C then the dead clauses of T with respect to c, DEAD^c i s defined: DEADTc = [P 3 ?1 v P 2 v . .. P n £ PROPTc | PROP^ T)C \~ P, n > 1] Lemma D.2 Dropping of Dead Clauses Given a concrete TDB, T, c £ C and W £ DEADTc then PROPTc ~ DEADTc (~ W. Proof: W = P O Pj v P 2 v ... P n £ DEADTc and PROP^^c \~ P. By concreteness for some i , 1 <= i <= n, PROP A ( T )c t - P r P i subsumes W so PROP^ T^c f~ W. Now DEADTc contains no de f i n i t e clauses so PR 0 P ^ T ) C C PR0PTc ~DEAD Tc giving PROPTc ~ DEADTc (- W. QED PAGE 20 We w i l l now prove equivalence of T and A(T). The strategy w i l l be to cate-gorize the set of clauses produced by resolution within PR0P Tc ~ DEADTc and show that i f a unit Qc i s provable from T then Q w i l l be in that set. PAGE 21 Theorem D.l T ~A(T) Equivalence Given a consistent concrete TDB, T, c £ C and Q £ P then T |- Qc i f f A(T) \~ Qc. Proof: Assume PROPTc \~ Q. Then PROPTc ~ DEADTc |~ Q by Lemma D.2. So PROPTc ~DEAD Tc U [Q] i s u n s a t i s f i a b l e . Define the following sets i t e r a t i v e l y : T Q = PROPTc ~ DEADTc U [Q] Ti+1 = T i w h e n Q £• T i ££ when T^ contains no positive units, otherwise = T ± ~ [X v P v Y £ T i] -~ [X v 7 v Y £ T ±] U [X v Y | X v 7 v Y £ T i] where ? i s the lexic o g r a p h i c a l l y f i r s t positive unit i n T^ and X and Y are negative predicate and positive predicate disjunctions, (possibly n u l l ) The following i s an invariant statement on T^: I f W £ T. then A) W = P such that PROP^ T^c (- P for some P £ P or B) W = Pj such that P = ? l v P 2 v ... P n £ I which i s exclusive and PROP^ ( T )c f~ P ± ( i 4 j) or PAGE 22 C) W = Qj v Q 2 v ... Q v P (ra >= 1) such that P = \°l A P 2 A ... P n £ I and [Qp Q 2 , ... Q m] C [p x, P 2, ... P n] and PROP^ T)C r~ W or D) W £ T Q The invariant statement is proved by the following subproof: Subproof: By induction on i Case i = 0 Clear by condition D Case i > 0 If W £ T i + 1 then W £ T ± or W = R (U, V) for [U, V] C T ± with U as the lexic o g r a p h i c a l l y f i r s t unit in T^. PROP^ T^ (— U by A and D above. V i s of the form B, C or D above: B) V = Pj such that P = ? l v P 2 v ... P n £ I which i s exclusive and PROP A ( T )c |- P i ( i ¥ j) U = Pj and PROP^ ( T ) (~ P j . But P. v Pj £. I by exclusiveness. This violates the consistency of T, so V ^ P j . C) V = Ql v Q 2 v ... Q m v P (m >= 1) P =E P, A P~ A ... P £ 1 and 1 Z n [ Q L F Q 2, . . . Q F F L ] C [ P l f P 2, . . . P n] and PROP^ T)C (- V. PAGE 23 U = for some i , 1 <= i <= m. Cle a r l y W has the same form as C with PROP^.-^ (— W unless m = 1 i n which case W = P with PROP^^ \— P which i s in the form A. D) V £ PROPTc ~DEAD Tc U [ Q ] Case 1 V = P x v P 2 £ I, t r i v i a l l y PROP^ T ) C f - V . U = ? l so PROP^ T) f - P 2 which is of form A. Case 2 V = pT v PT v ... F v P £ I i z n See C above. Case 3 V = p T v i T ( i < j) i J where P EE P, v P~ v ... P„ £ I which i s exclusive. 1 Z n Assume without loss of generality that U = P^. Then W = P.. which is in the form B. Case 4 V = P v Pj v P 2 v ... P U = P. Since PROP^ T^c {- P then V £ DEAD^c. But then V t T Q which is a contradiction. Case 5 V = " Q But then U = Q was the lexicographically f i r s t unit in T^ S O by construction T^ +^ = T^. This completes the subproof of the invariant statement on T^. As a corollary, T^ contains no non-unit positive clauses. By the results of Davis and Putnam'^, T i + 1 i s unsatisfiable whenever T i i s . Since T Q i s un s a t i s f i a b l e , by induction for a l l i , is u n s a t i s f i a b l e . PAGE 24 Also, T\ contains a positive unit for every i , for i f not then the i n t e r -pretation i n which every predicate is taken as false is a model for T^. Note that T^ +^ has at least one less predicate symbol than (the le x i c o -graphically f i r s t unit in T^) unless T i + j = T^. Since T i always contains a po s i t i v e unit and since P i s f i n i t e , then T^ +^ = for some i . At this point Q £ T^ since contains a positive unit. By the invariant statement on T., P R O P ^ . - Q C (~ Q QED. We can now relate part 4 i i in the d e f i n i t i o n of a TDB to the union of exten-sions. Lemma D.3 Union Given a consistent concrete TDB, T and a formula P E P, v P 0 v ... P £ I l z n then ||P|| = ||Pl|| U ||P2|| U ... | | P J . Proof: Part 1 c £ ||p|| or T (~ Pc. By theorem D.l, A(T) I -Pc. By concreteness, for some i , \~ P^c giving T f - P^c or c £ | | P i | | . Part 2) c £ ||p±|j or T f- P ic for some i . In clausal form P^ v P £ I. By resolution T |— Pc or c £ ||p||. PAGE 25 U n t i l now we have not d i f f e r e n t i a t e d between d i s j o i n t and non-disjoint union of extensions of the form P EE P^ v P 2 v ... P n £ I . However, in most (but not a l l ) applications we would l i k e disjointness s t r i c t l y enforced. Lemma D.4 Disjointness Given a consistent TDB, T, and an exclusive formula P EE P^ v P 2 v ... P n £ I then for each i , j , 1 <= i < j <= n, ||pj f) ||Pj|| = [ ]. Proof: Let c £ ||p.|| f| ||p..||. Then T (~ P ^ and T |- P^c. But P i v Pj £ I . This violates consistency, so no such c e x i s t s . Corollary D.l D i s j o i n t Union Given a consistent concrete TDB, T, and an exclusive formula P EE Pj v P 2 v ... P n £. I then ||P|| = HPJI • ||P2|| • ... ||PJ. In summary: 1) For conjunctions ||p|| = H P J f l ||P2II H ... | | P J Furthermore i f T i s consistent and concrete: 2) For disjunctions ||p|| = HPJI U ||P2|| U ... ||PN|| 3) For exclusive disjunctions ||P|| = ||p j || « ||P2|| 9 ... ||Pn|| PAGE 26 Section E Transformation of a TDB to i t s Reduced Form One use of a TDB i s to answer the question "Is T I - Pc". Reiter for one re-quires this for his "typed u n i f i c a t i o n algorithm" as described i n [10,11]^ Algorithm E . l below computes [P | T f - Pc] given c £ C , for a concrete TDB, and detects any inconsistency i n PROPTc. As [P | T (~ Pc] may contain a large number of P f o r any given c, we define a reduced form of T, R(T), which contains only the Pc's necessary to span E . This corresponds to associating INSTANCES with primitive semantic categories in [7,8,9]^ -j^us, we can determine i f c £ ||p|| by looking at the primitive categories below P. We have shown that formulae of the form P v P , v P ~ v . . . P £ 1 can be re-1 z n moved with no effect when T is both concrete and consistent. This motivates computing on the Horn component of concrete TDBs. D e f i n i t i o n Horn Component Given a TDB, T = ( C , P, I , E ) , define the Horn component of T, H(T), as H(T) = ( C , P, [W £ I | W i s a Horn clause], E ) Note that E contains only Horn clauses. Looking at this case by case: W £ I Clausal Form P E P , A P o A ... P „ P v P, , P v P 9 , ... P v P„ 1 z n I z' n P, v P 0 v ... P„ v P 1 z n P IE P 1 v P 2 v ... P n ? 1 v P, P 2 v P, ... P R v P P v Pj v P 2 v ... P n ~ (P A Q) P v Q Definite? Horn? yes yes yes yes yes yes no no no yes The following algorithm computes [P | T (— Pc] given c £ C PAGE 27 Algorithm E . l For computing DOMAIN^c given c £ C T Q < - PROP H ( T )c s 0 < - [ ] FOR i = 0 to i n f i n i t y DO BEGIN IF contains no positive unit THEN DOMAIN^ ,c E E S±, k < — i , EXIT P < — the lexicographically f i r s t positive unit in T^ IF 7 £. T. THEN PR0PTc i s inconsistent, D0MAINTc i s undefined, EXIT s 1 + 1 <- S ± U [P] T 1 + 1 <— T i ~ [X v P v Y £ T ±] ~ [X v P v Y £ T ±] U [X v Y | X v P v Y £ T i] where X and Y are disjunctions of negative and posi-t i v e predicates respectively (possibly n u l l ) . END Example: Using the I of diagram C.l and l e t t i n g : E = [SINGLE John, MALE John] and c = John, the algorithm w i l l produce the following values: PAGE 28 0 PERSON SINGLE I T . ^ T 1 + 1 (dropped) MALE MALE MAN 3 MALE BOY 3 MALE MALE 3 PERSON — (MALE A FEMALE) MALE A ADULT 3 MAN MALE A CHILD 3 BOY PERSON FEMALE 3 PERSON ADULT 3 PERSON CHILD 3 PERSON SINGLE BACHELOR 3 SINGLE MAIDEN 3 SINGLE SINGLE 3 ADULT ^(S I N G L E A MARRIED) MAN A SINGLE 3 BACHELOR SINGLE A WOMAN 3 MAIDEN ADULT MAN 3 ADULT WOMAN 3 ADULT MARRIED 3 ADULT — (ADULT A CHILD) ADULT 3 MAN FEMALE A ADULT 3 WOMAN MAN BACHELOR 3 MAN MAN 3 BACHELOR BACHELOR BACHELOR T 1 + 1 — T. (added) ADULT MAN PERSON FEMALE ADULT 3 MAN CHILD 3 BOY ADULT MARRIED MAN 3 BACHELOR WOMAN 3 MAIDEN CHILD MAN FEMALE 3 WOMAN BACHELOR PAGE 29 contains no positive unit. Thus: DOMAIN^John = [MALE, PERSON, SINGLE, ADULT, MAN, BACHELOR] The algorithm can be implemented by hash indexing predicate symbols into a copy of the formulae i n I i n which they appear (only one copy is made for each formula). For e f f i c i e n c y , p ositive units are kept i n a separate l i s t . When a formula of the form X v P v Y £ I i s to be dropped i t i s tagged as deleted. When a formula of the form X v P v Y i s to be replaced by X v Y the predicate symbol P i s deleted from the copy. If a positive unit results i t i s added to the positive unit l i s t . If the n u l l clause r e s u l t s , DOMAINTc is declared to be undefined and the o r i g i n a l clause P v Q that caused the contradiction can be printed for inspection. I f we assume that the time taken to delete the symbol P from X v P v Y i s constant, the time taken to add a symbol to the positive unit l i s t i s con-stant and the hash indexing i s performed in constant time (by design) then the following i s a bound on the time c o m p l e x i t y ^ of the algorithm: 0 # predicate symbols in W JH(T) r Theorem E.1 Algorithm E . l Correctness Given a concrete TDB, T, and c £• C If PROPTc i s consistent then DOMAINTc = [P | T (~ Pc] otherwise DOMAINTc i s undefined Proof: F i r s t , note that the class of Horn clauses i s closed under resolution PAGE 30 ( i e . R(Wp W2) i s Horn whenever and W2 are). It follows that the only p o s i t i v e clauses i n for a l l i are units. This is needed to apply the Davis and P u t n a m t h e o r e m below. Case A PROP^c i s consistent Subcase 1 P £ DOMAIN^c. For some i , P £ T± giving P R 0 P H ( T ) C \~ P and hence PR0P Tc (~ P Subcase 2 T (- Pc Assume P $ 7± for a l l i Define H*(T) = H(T) U [7] Sta r t i n g the algorithm with H'(T) rather than H(T) we produce rather than Note that when defined, T^ = T. U [7] The algorithm w i l l terminate i n one of two ways: 1) [P, 7] C T^ for some i ([Q, Q] <£ T[ when Q £ P and Q ^ P since Tg i s consistent) But then P £ T^ co n t r a d i c t i o n 2) T^ contains no p o s i t i v e unit; but then the i n t e r p r e t a t i o n i n which every predicate symbol i s f a l s e i s a model f o r T^. Since T (— Pc by theorem D.l, A(T) (— Pc giving H(T) (~ Pc or Tg U [P] = Tg i s u n s a t i s f i a b l e . By Davis and Putnam^, T^ i s unsatisf i a b l e which i s a contradic-t i o n . Therefore P £ T^ for some i . The algorithm must terminate since T^ +^ contains at least one fewer predicate symbol then T i. Case B PROP^c i s u n s a t i s f i a b l e Statement: PR0P H^ T^c is u n s a t i s f i a b l e Subproof: By lemma D.2 PR0P Tc ~ DEADTc i s u n s a t i s f i a b l e . If PAGE 31 we define T Q as PROPTc ~ DEADTc and l e t the algorithm define for each i , then the invariant statements of theorem D.l hold with one addition: W £. may be the n u l l clause. The algorithm w i l l terminate i n one of two ways: 1) [P, P] C for some i where P i s the lexicographically f i r s t unit i n T^. By statements A and B of theorem D.l, PROP A( j.)C \~ P and for some Q £. P, P v Q £ 1 and PROP^^^c |- Q. Since P v Q £ H(T) and A(T) C H(T), PROP R (- T )c i s inconsis-tent. 2) By the results of Davis and Putnam^, for each i , T^ i s u n s a t i s f i a b l e . Since T^ +^ contains at least one less l i -t e r a l than T^, for some k, T^ contains no positive l i t e r -a l s . The interpretation i n which every predicate symbol i s false is a model for T^ unless T^ contains the n u l l clause. Since T^ i s unsa t i s f i a b l e i t must contain the n u l l clause. T Q doesn't contain the n u l l clause so for some i , [P, P] C T i where P i s the lexicographically f i r s t unit T^. In this case though, the algorithm w i l l have terminated as in case A. This ends the subproof. As in the subproof, the algorithm w i l l not terminate with T^ con-taining no units (Starting with T Q = PROP^^^c unsatisf i a b l e ) . Therefore, [P, P] C T i for some i . The algorithm w i l l stop with DOMAINTc undefined. QED This ends the correctness proof of algorithm E . l PAGE 32 The following c o r o l l a r y i s used i n the proof of the update algorithm. Corollary E . l A(T) - DOMAIN^ Equivalence Given a TDB, T (not necessarily concrete or consistent) and c £. C 1) I f P £ DOMAINTc then A(T) H p c 2) If A(T) h P c t h e n p & DOMAINTc or DOMAINTc i s undefined Proof: Part 1 P £ D0MAINTc The invariant statements of theorem D.l hold with the addition that W may be the n u l l clause. Therefore A(T) \~ Pc Part 2 A(T) h Pc Hence H(T) (- Pc. If T i s consistent then by theorem E . l , Case A, subcase 2, P £• T^ for some i (we do not need theorem D.l here) or P £ DOMAINTc. If T i s un s a t i s f i a b l e then by theorem E . l , Case B, H(T) i s uns a t i s f i a b l e and DOMAINTc i s undefined. QED As a TDB, T, becomes large, [DOMAINTc | c £ C] may become very large. The following allows us to save only the Pc that span E. Ie., store only the 'lowest' extensions i n the graph. D e f i n i t i o n The Reduced or Canonical TDB Given a consistent concrete TDB, T = (C, P, I, E), the reduced TDB, R(T), i s defined: R(T) = (C, P, I, [Pc | P £ DOMAINTc and i f Q £ DOMAINTc then not Q TAU+ P]) PAGE 33 Using the example of diagram C . l : [P | Pc £ E] = [SINGLE, MALE] DOMAINTc = [MALE, PERSON, SINGLE, ADULT, MAN, BACHELOR] but [P | Pc £ R(T)*s extension] = [BACHELOR] This i s a good space reduction. In fa c t , i n no case i s R(T)'s extension larger than T's. Note that we cannot assume that the concreteness or consistency of R(T) follows from that of T. Lemma E . l Spanning Given a consistent concrete TDB, T, i f P £ DOMAINTc then for some Q £ P with Qc £ R(T)'s extension, Q TAU* P Proof: Assume for each n >= 0 with Qc £ R(T)'s extension, that not Q TAU n P By d e f i n i t i o n of R(T)'s extension, Pc £ R(T)'s extension Choosing n = 0, Q = P we have a contradiction for Q TAU^ P. R(T)'s extension i s said to span [Pc | T |— Pc]. Getting back to the o r i g i n a l problem, "Is T |— Pc?" we see that we need only scan R(T)'s extension for Qc such that Q TAU* P. Theorem E.2 T - R(T) Equivalence Given a consistent concrete TDB, T, then i f Pc £ R(T)'s extension then T (- Pc and i f Pc £ E then R(T) (- Pc Proof: Part 1) Assume Pc £ R(T)'s extension. P £ D0MAINTc so by theorem E . l , T (~ Pc PAGE 34 Part 2) Assume Pc £ E or P £ DOMAINTc. By the spanning lemma (E.l) f o r some Q £ P, such that Qc £ R(T)'s extension, Q TAU* P. By the generalization lemma ( C . l ) , R(T) J— Pc QED Corollary E.2 Given a consistent concrete TDB, T, c £ C, P £ P, T f- Pc i f f R(T) f~ Pc Corollary E.3 Concreteness of R(T) Given a consistent concrete TDB, T, R(T) i s concrete Proof: Let P E P , v P, v ... P £ 1 l l n Assume for some c £ C , A ( R ( T ) ) \~ Pc Then T f— Pc by theorem E.2 a n d A X T ) \~ P c by theorem D.l a n d A X T ) (— P^c by T's concreteness (for some i , 1 <= i <= n) This gives P ^ £ DOMAINTc by theorem D.l By the spanning lemma ( E . l ) , for some Qc £ R(T)'s extension, Q TAU* P i By the generalization lemma (C.l) (which resolves only d e f i n i t e clauses), A ( R ( T ) ) ( - p i c Q E D Corollary E.4 Consistency of R(T) Given a consistent concrete TDB, T, R(T) i s consistent Proof: Let S = [Pc | P £ DOMAINTc] U [Pc | P £ P and P I DOMAINTc] S i s a model for H ( T ) [ 1 2 ^ . PAGE 35 If i t is not a model for T then for some P = Pj_ v P 2 v ... Pfl £ I, P £ DOMAINTc and each P ± $ DOMAINTc, 1 <= i <= n This contradicts concreteness, so S i s a model for T. R(T) i s concrete so s i m i l a r l y S i s a model for R(T). QED Thus R(T) i s a suitable replacement for a consistent TDB, T. PAGE 36 Section F Updating a TDB Thus far we have shown that a TDB w i l l have the desired set theoretic pro-perties i f concreteness and consistency are maintained. We have given a canonical (reduced) form of the TDB that can be e f f i c i e n t l y computed while conserving both space and time i n answering "Is T (— Pc?". We must now spe-c i f y an algorithm that w i l l update the canonical TDB while guaranteeing con-creteness and consistency. Given a concrete and consistent TDB, T, and c £. C the algorithm accepts additions to and deletions from DOMAINTc so as to leave the resulting TDB concrete and consistent. It i s assumed that o r i g i n a l l y T = R(T). The algorithm w i l l produce l ' = R ( T ' ) . Algorithm F . l For updating a canonical TDB T, given c £ C UPDATETc: BEGIN WRITE (c, "has the root properties:", [P | Pc £ E]) S: INPUT (A C P (additions) and D C P (deletions) ) IF [Pc | P £ D] <£ E THEN WRITE (D ~ [P | Pc £ E], "cannot be deleted, t r y again") GOTO S L: T* <— (C, P, I, E U [Pc | P £ A] — [Pc | P £ D]) Calculate DOMAINTc CONCRETE <-- TRUE E: IF DOMAIN^ic i s undefined (because P v Q £ I) THEN WRITE (c, "cannot be both a", P, "and a", Q, "try again") GOTO S PAGE 37 C: FOR EACH P EE P, v P 9 v ... P £ I such that P £ DOMAIN^,tc but no DOMAINTc WRITE ("A", P, "must also be one of", P j , P 2, ... P , "add one of them") CONCRETE <— FALSE IF NOT CONCRETE THEN INPUT (A* C p (additions) ) A <— A U A* GOTO L Calculate R ( T ' ) given DOMAINTc and T ' * T * < — R ( T ' ) . WRITE ("update accepted") END Note that i n the step marked *, DOMAINTtc^ for c-^ 4 c need not be recalculated as DOMAIN^tc^ = DOMAINTc^. The following theorem gives the update properties desired. Theorem F . l Algorithm F . l Correctness Given a consistent concrete and canonical TDB, T and c £ C then the resultant canonical TDB, T ' , i s consistent and concrete. Proof: Part 1 Concreteness Let P E P j v P 2 v ... P n £ I and /\(T*) (~ Pc By c o r o l l a r y E . l , since DOMAINTc i s defined i n order for algorithm termination, P £ DOMAIN^tc. Since the algorithm terminated (CONCRETE i s TRUE), for some i , ? ± £ DOMAINTc. Corollary E . l gives A(T') H p i c ' PAGE 38 Part 2 Consistency If T 1 i s not consistent then since T i s consistent, PROP^tc must be u n s a t i s f i a b l e . Theorem E . l applies since T* i s concrete saying DOMAIN^ic i s undefined. This con-tr a d i c t s termination of the algorithm. Therefore T 1 i s con-s i s t e n t . By c o r o l l a r i e s E.3 and E.4, R ( T ' ) i s concrete and consistent. QED The following i s a Input - Update John - A=[ADULT] - A=[MALE,SINGLE] - Update John - A=[MARRIED] and D=[SINGLE] - A=[MARRIED] A=[MARRIED] and D=[BACHELOR] A=[MALE] sample update using the I Response - John has the root properties [ ] - An ADULT must be one of MARRIED or SINGLE, add one. - Update accepted of diagram C.l and an empty E . E or E * E = [ ] - John has the root properties [BACHELOR] - [SINGLE] cannot be deleted, try again - John cannot be both MARRIED and SINGLE, try again - An ADULT must be one of MALE or FEMALE, add one - Update accepted E ' = [ADULTJohn] E ' = [ A D U L T J o h n , M A L E J o h n , S I N G L E J o h n ] E ' = [ B A C H E L O R J o h n ] E = [ B A C H E L O R J o h n ] E' = [ M A R R I E D J o h n , B A C H E L O R J o h n ] E ' = [ M A R R I E D J o h n ] E* = [ M A R R I E D J o h n , M A L E J o h n ] PAGE 39 It may seem redundant that we must "Add MALE" when we did not e x p l i c i t l y de-lete MALE. However, when BACHELOR i s deleted, the algorithm has no way i n general of determining i f MAN and/or SINGLE are to remain i n the TDB. Ra-ther than trying to second guess the user, the algorithm asks him to correct any dropped information. Before the f i r s t update, when E = [ ], for the update theorem to apply, T must be concrete and consistent. It i s c e r t a i n l y concrete. In order for i t to be consistent, I must be consistent. The interpretation i n which a l l pre-dicate symbols are false under the interpretation i s a model for I, so i t is consistent. Thus by induction on the update theorem we are assured that i f we start with E = [ ] then T w i l l always be concrete and consistent a f t e r the update. Although there are no inconsistencies i n I, there may be constraints that prevent c e r t a i n kinds of updates from taking place. By this we mean that given some P £ P, for no c £ C i s Pc £ E while T i s consistent. Ie. ||p|| = [ ] always. Therefore i f the TDB i s to be kept consistent, ||P 1 f| = [ ] always. This i s c l e a r l y a case of bad structuring of I. It can be avoided by ensuring that PAGE 40 for a l l P £ P, I U [Px] i s consistent. This can be guaranteed by l e t t i n g E = [Pc] for any c £ C and calculating DOMAINTc. If undefined, then I i s badly structured above P. PAGE 41 Summary and Conclusions A generalized structure has been proposed for Reiter's typed database to replace the more r e s t r i c t e d notion of the ISA hierarchy. It provides a r i c h structuring f a c i l i t y and imposes only the natural requirements of so-called concreteness and consistency. Results have been given to ensure that the database has the desired set theoretic properties. An update algorithm has been specified that guarantees concreteness and consistency along with a technique for testing i f a s p e c i f i c element is a member of a given domain. A means for testing a non-atomic formula against the database was not included and i s an open topic. While t r a d i t i o n a l r e l a t i o n a l databases suggest no need for a typed database, work in a r t i f i c i a l i n t e l l i g e n c e and i n f e r e n t i a l database systems often are d e f i c i e n t because of the need for a more interdependant domain structure. Since the TDB provides the most rudimentary l e v e l of inferencing i t must be well structured and mathematically complete i n order that inferencing mechanisms b u i l t on i t stand on firm ground. It must not l i m i t the descrip-t i v e power of a host i n f e r e n t i a l database system, thereby reducing i t s a p p l i c a b i l i t y . We suggest that the proposed structure and algorithms for a typed database are adequate for managing the domain structure of i n f e r e n t i a l database systems. PAGE 42 Bibliography 1. Aho, A.V., Hopcroft, J.E., Ullman, J.D. (1974). The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974. 2. Chang, C.L., and Lee, R.C.T. (1973). Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York, 1973. 3. Date, C.J. (1975). An Introduction to Data Base Systems. Addison-Wesley, Reading, Mass., 1975. 4. Davis, M., and Putnam, H. (1959). A Computational Proof Procedure. Rensselaer Polytechnical I n s t i t u t i o n , Troy, New York, AFOSR TR 59-124. 5. G a l l a i r e , H., Minker, J . , and Nicolas, J.M. (1978). An Overview and Introduction to Logic and Data Bases. Logic and Data Bases, Eds. H. G a l l a i r e and J . Minker, Plenum Press, New York, 1978, P.12. 6. Gries, D. (1971). Compiler Construction for D i g i t a l Computers. John Wiley and Sons, Inc., 1971. 7. McSkimin, J.R. (1976). The Use of Semantic Information i n Deductive Question-Answering Systems. Ph.D. Thesis, Dept. of Computer Science, Univ. of Maryland, College Park, Md., 1976. 8. McSkimin, J.R., and Minker, J . (1977). The Use of a Semantic Network i n a Deductive Question-Answering System. Dept. of Computer Science, Univ. of Maryland Tech. Report TR-506, 1977. PAGE 43 "9. Minker, J . (1978). An Experimental Relational Data Base System Based on Logic. Logic and Data Bases, Eds. H. G a l l a i r e and J . Minker, Plenum Press, New York, 1978. 10. Reiter, R. (1977). An Approach to Deductive Question-Answering. Tech-n i c a l report 3649, Bolt Beranek and Newman Inc., Cambridge, Mass., Sept. 1977. 11. Reiter, R. (1978a). Deductive Question-Answering on Relational Data Bases. Logic and Data Bases, Eds. H. G a l l a i r e and J. Minker, Plenum Press, New York, 1978. 12. Reiter, R. (1978b). On closed world data bases. Logic and data bases, Eds. H. G a l l a i r e and J. Minker, Plenum Press, New York, 1978. 13. Robinson, J.A. (1965). A Machine Oriented Logic Based on the Resolution P r i n c i p l e . J . ACM 12 (Jan. 1965), 25-41. 14. Sussman, G., Winograd, T., and Charniak, E. (1970). MICRO-PLANNER Reference Manual. A.I. MEMO no. 203, M.I.T., Cambridge, Mass., 1970. 15. Winograd, T. (1972). Understanding Natural Language. Academic Press, 1972. PAGE 44 Appendix A Dictionary of Symbols Sets: [ ] - Used in set d e f i n i t i o n s as a replacement for £ , € ~ Set membership and non-membership C , <£. - Set i n c l u s i o n and non-inclusion n, U - Set i n t e r s e c t i o n and union $ - D i s j o i n t union - Set difference | - Set q u a l i f i c a t i o n (read "such that") ||p|| - Extension of a predicate, (see section C) Predicate Calculus Formulae: A - Conjunction symbol v - Disjunction symbol >^ - Implication EE - Equivalence -"-^ - Negation P - Also negation (x) - Universal quantificaion |—, |/- - Symbols for provabity and non-provability R(Wj, W2) - The resolvents of formulae W1 and W2l J PAGE 45 Mathematical R e l a t i o n s : ^ ALPHA, BETA and TAU - Relations defined in section C R n - Composition of r e l a t i o n R, n times R + - Tr a n s i t i v e closure of r e l a t i o n R R - Reflexive t r a n s i t i v e closure of r e l a t i o n R General Usage: EE - read " i s defined as", i n de f i n i t i o n s and algorithms 0 - computational complexity of an algorithm (order complexity)^^
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- An approach to the organization of taxonomies
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
An approach to the organization of taxonomies Bishop, Graig David 1981
pdf
Page Metadata
Item Metadata
Title | An approach to the organization of taxonomies |
Creator |
Bishop, Graig David |
Date Issued | 1981 |
Description | The ISA hierarchy used by present day inferential database systems is deficient in that it does not represent a variety of domain relationships (type relationships). Such hierarchies associate explicitly defined sets with the leaves of a tree. Non-leaf nodes in the tree inherit members from their child nodes. In keeping with the use of sets, this paper gives motivation for having type relationships other than subset. Included are union, intersection and a disjointness condition. A formalism for the typed database (TDB) is given, using the monadic first order predicate calculus as its theoretical basis. Non-unit formulae represent intensional (general) information about the world being represented and unit ground clauses represent extensional (specific) information. Predicates represent types, and constant symbols represent set members. This is connected to the set concept via predicate extensions. The extension is that set of constant symbols which are provable as arguments to the given predicate. Given the so-called concreteness condition and consistency of the TDB, the desired set theoretic relationships (union, intersection and disjointness) of predicate extensions follow. This strengthens the link between the formalism of the predicate calculus and the more natural set representation. A canonical form of a TDB is shown that admits an appropriate machine representation. Using this is can be determined if a constant symbol as an argument to a given predicate is provable (domain membership) in constant time. An update algorithm is developed and is shown to be correct in that it maintains concreteness and consistency. Thus the TDB is shown to be a practical generalization of the ISA hierarchy but is considerably more expressive. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-03-29 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051816 |
URI | http://hdl.handle.net/2429/22854 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1981_A6_7 B58.pdf [ 1.96MB ]
- Metadata
- JSON: 831-1.0051816.json
- JSON-LD: 831-1.0051816-ld.json
- RDF/XML (Pretty): 831-1.0051816-rdf.xml
- RDF/JSON: 831-1.0051816-rdf.json
- Turtle: 831-1.0051816-turtle.txt
- N-Triples: 831-1.0051816-rdf-ntriples.txt
- Original Record: 831-1.0051816-source.json
- Full Text
- 831-1.0051816-fulltext.txt
- Citation
- 831-1.0051816.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051816/manifest