OPTIMIZATIONS FOR MODEL COMPUTATION BASED ON PARTIAL INSTANTIATION

By Xiaomei Tian
B. Sc. (Computer Science) Tsinghua University
M. Sc. (Computer Science) Tsinghua University

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (COMPUTER SCIENCE)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1994
© Xiaomei Tian, 1994

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Computer Science
The University of British Columbia
2366 Main Mall
Vancouver, Canada
V6T 1Z4

Abstract

Various methods have been presented for efficiently evaluating deductive databases and logic programs. It has been shown that mixed integer programming methods can effectively support minimal model, stable model and well-founded model semantics for ground deductive databases. However, the "groundness" requirement is a huge drawback because the ground version of a logic program can be very large when compared to the original logic program. A novel approach, called partial instantiation, has been developed recently which, when integrated with mixed integer programming methods, can handle non-ground logic programs. The goal of this thesis is to explore how this integrated framework based on partial instantiation can be optimized. In particular, we have developed an incremental algorithm that minimizes repetitive computations for reducing the size of a program.
We have also developed several optimization techniques to further enhance the efficiency of our incremental algorithm, to further reduce the size of a logic program, and to avoid redundant node expansion in the partial instantiation tree. Experimental results have shown that our algorithm and optimization techniques can bring about significant improvement in run-time performance. Last but not least, we have implemented the integrated framework of partial instantiation under the UNIX environment.

Table of Contents

Abstract
List of Figures
Acknowledgments
1 Introduction
1.1 Deductive Databases
1.2 Query Evaluation and Optimization
1.3 Partial Instantiation and Thesis Contributions
2 Related Works
2.1 Query Evaluation
2.1.1 Naive Method
2.1.2 Semi-Naive Method
2.1.3 Backward Chaining
2.1.4 Linear Programming
2.2 Query Optimization
2.2.1 Magic Sets
2.2.2 Counting Methods
2.2.3 Partial Instantiation
2.3 Incremental Algorithms
3 An Incremental Algorithm
3.1 Preliminaries
3.1.1 Basic Definitions
3.1.2 Partial Instantiation
3.1.3 Algorithm SizeOpt
3.2 Incremental Algorithm
3.2.1 Graphs for Maintaining Deleted Clauses
3.2.2 Self-sustaining Cycles
3.2.3 Algorithm Incr
3.3 Correctness Proof: Incrementality of Algorithm Incr
3.3.1 Supporting Lemmas
3.3.2 Rank: Bridge Between Incr and SizeOpt
3.3.3 Proof of Incrementality
4 Optimizations of the Incremental Algorithm and Partial Instantiation
4.1 Algorithm IncrOpt
4.2 Heuristics: Ordering Clauses to Be Inserted
4.2.1 Complexity Analysis
4.2.2 Clauses Arbitrarily Ordered
4.2.3 Clauses Fully Ordered
4.2.4 Only Facts Ordered
4.3 Rules for Cutting Redundant Nodes
5 Experimental Results
5.1 Experiment Overview of the Incremental Algorithms
5.2 IncrOptFact vs IncrOptOrder vs IncrOptArb
5.3 Same Number of Disjunctive Clauses: IncrOptFact vs. SizeOpt
5.4 Same Number of Definite Clauses: IncrOptFact vs. SizeOpt
5.5 Partial Instantiation Trees: IncrOptFact vs.
SizeOpt
5.6 Partial Instantiation Trees: With and Without Applying Lemma 7
6 Implementation Details of Partial Instantiation
6.1 General Picture
6.2 Data Structures
6.2.1 Term and Term Table
6.2.2 Atom and Atom Table
6.2.3 Clause and Clause Table
6.2.4 Tree
6.3 Generating New Clauses
6.4 Least/Minimal Model Solver
6.5 Unification
6.5.1 Algorithm UNIFY: an Efficient Unification Algorithm
6.5.2 Implementation of Algorithm UNIFY
6.6 Cutting Unifiers
7 Conclusions
7.1 Thesis Summary
7.2 Future Work
7.2.1 Cutting Redundant Nodes
7.2.2 Order for Node Expansion
7.2.3 Tree Maintenance
7.2.4 Partial Tree
Bibliography

List of Figures

3.1 Partial Instantiation Tree
3.2 Incremental Maintenance
3.3 DC-Graph G1
3.4 Applying Algorithm Incr to Add Clauses 1, 2, 3 and 4
3.5 Applying Algorithm Incr to Add Clauses 4, 3, 2 and 1
3.6 Applying Algorithm Incr to Add Clauses 5 and 6
4.1 Cutting Redundant Branches
4.2 Reduced Partial Instantiation Tree
6.1 The Structure of Term f(a, g(X, Y))

Acknowledgments

I would like to express my sincere thanks to my supervisor Dr. Raymond Ng for many things, especially for suggesting the thesis topic, for fruitful discussions and most helpful guidance, and for being a continuous source of advice and encouragement throughout my research work.

Thanks to the Department of Computer Science at the University of British Columbia for providing the research environment. Thanks to Dr. Ng and the department for providing financial support in the form of teaching and research assistantships during my graduate studies.

Thanks to Dr. Craig Boutilier for being the second reader of this thesis. His helpful comments are highly appreciated.

Chapter 1
Introduction

1.1 Deductive Databases

Over the last two decades there has been a lot of research done in the area of deductive database systems [1].
Deductive databases extend conventional relational databases with deductive power by adding clauses into databases from which new facts can be deduced. This extension is based on a very sound theoretical foundation, namely first-order logic. A deductive database can also be considered as a logic program with no function symbols. Knowledge of the basic theory of logic programming and deductive databases can be found in [2] [3]. Here we give the basic concepts needed throughout.

Definition. A Horn clause is a clause of the form

A ← B1 ∧ B2 ∧ ... ∧ Bm,  m ≥ 0,

where A and B1, B2, ..., Bm are atoms. A is called the head of the clause. B1 ∧ B2 ∧ ... ∧ Bm is a conjunction of atoms and is called the body of the clause. □

Definition. A definite database is a finite set of Horn clauses. □

For a given deductive database there are many models. The nice property of a definite database is that among all these models there is a unique least model M, i.e., if I is a model of the same program, then I ⊇ M. This least model is the one we choose as the model of the definite database.

The expressive power of Horn clauses is limited, as Horn clauses are incapable of supporting negations and disjunctions. Much effort has been devoted to extending the expressive power of Horn logic [2] [4]. The two most common extensions are:

i) adding negations to the right side of the clause, and
ii) allowing more than one atom in the left side of the clause.

Definition. A disjunctive clause is a clause of the form

A1 ∨ A2 ∨ ... ∨ An ← B1 ∧ B2 ∧ ... ∧ Bm,  n ≥ 1, m ≥ 0,

where A1 ∨ ... ∨ An is a disjunction of atoms and B1 ∧ ... ∧ Bm is a conjunction of atoms. □

Definition. A disjunctive database is a finite set of disjunctive clauses. □

Unlike Horn databases, a disjunctive database may not have a unique least model. Instead, a disjunctive database may very often have multiple minimal models.
M is a minimal model of a disjunctive database P if

i) M is a model of P, and
ii) there is no model I of P such that I ⊂ M.

As far as the computation of minimal models is concerned, an atom with negation in the body of a clause can be moved to become a positive atom in the head; hereafter we only consider clauses with disjunctive heads and no negations in the bodies.

1.2 Query Evaluation and Optimization

Query evaluation and optimization is a crucial area of deductive database systems. Its aim is to characterize complete answers in terms of minimal Herbrand models. Various strategies have been presented in the literature; [5] gives a complete survey of these strategies. Algorithms such as magic sets and counting methods have proven to be very successful for definite and stratified deductive databases. During the past few years, several new semantics for disjunctive programs and programs with negations, such as minimal models, stable models and well-founded models, have been proposed and widely studied [6] [7] [8]. Recently, it has been shown that mixed integer programming methods can be used to provide a general and rather effective computational paradigm for those semantics [9] [10] [11].

Compared with other methods, mixed integer programming methods have many advantages in addition to effectively supporting disjunctive semantics. By translating a symbolic logic problem into a linear programming problem, they take advantage of all the theorems, algorithms, and software packages that have been developed in the Operations Research community. More importantly, linear programming methods compute minimal models at compile-time, thus improving the run-time performance of deductive databases. Last but not least, linear programming methods are incremental.
Adding a new clause to the deductive database amounts to adding a new set of constraints to the associated linear program.

However, like other methods that use linear or integer programming for logic deduction [12] [13], the paradigm proposed in [9] [10] [11] is in effect "propositional", and can only deal with the ground versions of deductive databases. The "groundness" requirement is a huge drawback because the ground version of a logic program can be very large when compared to the original logic program. To solve this problem, [14] and [15] have proposed a novel approach, called partial instantiation, which combines unification with mixed integer programming (or with any other propositional deduction technique), and which can directly solve a non-ground version of a program. Equally importantly, the approach can handle function symbols, thus making it a true logic programming computational paradigm.

1.3 Partial Instantiation and Thesis Contributions

While we will discuss partial instantiation in greater detail in Chapter 3, the general strategy of partial instantiation is to alternate iteratively between two phases:

evaluate (propositional program) → partial instantiation → evaluate → ...

More specifically, the initial step begins with evaluating a given non-ground logic program P, which may contain disjunctive heads (and negations in the bodies), as a propositional program using mixed integer programming. This generates a set of true propositional atoms and a set of false propositional atoms. The partial instantiation phase then begins by checking whether unification or "conflict resolution" is possible between atoms in the two sets. If A is an atom in the true set and B an atom in the false set, the most general unifier for A and B is called a conflict-set unifier. Then for each conflict-set unifier θ (there can be multiple), clauses in P are instantiated with θ and added to P for further evaluation.
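This alternation can be sketched as a simple driver loop. In the sketch below, `evaluate`, `conflict_unifiers` and `instantiate` are hypothetical stand-ins for the phases just described (the thesis realizes them with mixed integer programming and unification); only the control flow is intended to be faithful.

```python
def partial_instantiation(program, evaluate, conflict_unifiers, instantiate,
                          max_rounds=100):
    """Alternate between an evaluation phase and a partial instantiation phase.

    program: a frozenset of clauses (representation left abstract here).
    evaluate(P) -> (true_set, false_set) of propositional atoms.
    conflict_unifiers(T, F) -> list of conflict-set unifiers (substitutions).
    instantiate(clause, theta) -> the clause instantiated with theta.
    """
    for _ in range(max_rounds):  # stand-in for a time limit
        true_set, false_set = evaluate(program)
        thetas = conflict_unifiers(true_set, false_set)
        if not thetas:  # no more conflict-set unifiers: done
            return program, (true_set, false_set)
        for theta in thetas:
            # the next iteration evaluates P ∪ Pθ
            program = program | frozenset(instantiate(c, theta)
                                          for c in program)
    return program, None  # "time limit" exceeded
```

The loop terminates either when no conflict-set unifier is found or when the round bound (standing in for the time limit mentioned above) is reached.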
In other words, in the next iteration, the (propositional) program to be evaluated is P ∪ Pθ. This process continues until either no more conflict-set unifiers are found, or the time taken has gone beyond a certain time limit.

The goal of this thesis is to explore how the integrated framework of partial instantiation can be optimized. In particular, our work concentrates on the following two aspects:

(i) optimizing the run-time performance of the evaluation phase, and
(ii) avoiding redundant computations in the partial instantiation phase.

In the evaluation phase, two steps are needed (see [9] [10] [11]). The first step, called sizeopt(P), is to reduce the size of the program P; the second step is to actually find the models by the mixed integer programming method. The first step, sizeopt(P), is highly beneficial to the subsequent step since the size of the program P is reduced. Suppose θ1, θ2, ..., θn are all the conflict-set unifiers for P; then in partial instantiation, sizeopt(P ∪ Pθ1), sizeopt(P ∪ Pθ2), ..., sizeopt(P ∪ Pθn) will be carried out eventually. This leads us to the idea of computing sizeopt(P ∪ Pθi), 1 ≤ i ≤ n, incrementally. That is, we try to optimize the evaluation phase by reusing sizeopt(P) to compute sizeopt(P ∪ Pθi).

With regard to the partial instantiation phase, we have noticed that a lot of the computations involved in this phase are actually redundant. The main reason for this is that some unifiers in the conflict set only lead to redundant information. In order to optimize the partial instantiation phase, we try to cut off those useless unifiers.

More specifically, the principal contributions of this thesis are:

• We have developed an algorithm, called Incr, which has been formally proved to be incremental with regard to the operation sizeopt.
• We have developed several optimizations which may further reduce the size of a logic program and save time in computing minimal models.
For definite programs, it directly gives the solution without computing minimal models.
• We have developed three rules which may reduce the size of the set of conflict-set unifiers, and thus avoid redundant node expansions in the integrated framework.
• We have implemented these algorithms and optimizations. Experimental evidence has shown that these methods can lead to significant improvement in run-time efficiency.
• We have implemented the whole integrated framework: given a program, the system generates the partial instantiation tree and returns all the facts. Though it only handles definite databases at the current stage, it can easily be extended to disjunctive databases.

The outline of the thesis is as follows. Chapter 2 reviews the related works. Chapter 3 presents the incremental algorithm Incr and proves that it is indeed incremental. Chapter 4 develops several optimizations to further improve the performance of Incr and of partial instantiation. Chapter 5 presents experimental results showing the effectiveness of the algorithms and optimizations. Chapter 6 gives the implementation details of the whole framework of partial instantiation. Finally, Chapter 7 concludes this thesis by providing a summary of contributions and an outline of future work.

Chapter 2
Related Works

As shown in the first chapter, a major goal of this thesis is the development of an incremental algorithm which optimizes partial instantiation. In this chapter we will review the related works. Specifically, we will first discuss different methods for query evaluation and optimization, and then discuss incremental algorithms for database maintenance.

2.1 Query Evaluation

Evaluating a logic program means generating the actual set of tuples which satisfy a given user query under the given clauses. Generally there are two evaluation paradigms:

1. bottom-up: starting from the existing facts, inferring new facts, and proceeding towards the goals;
2.
top-down: starting from the goals, verifying the premises which are needed in order for the goals to hold, and stopping when the premises can be supported by facts.

The top-down paradigm is particularly useful in cases where a goal G is specified and we are only interested in those facts which are instances of G. On the other hand, the bottom-up paradigm is very useful in cases where multiple queries are to be answered.

The following are some popular evaluation methods presented in the literature. The naive, semi-naive and linear programming methods are bottom-up approaches, while the backward chaining method is a top-down approach.

2.1.1 Naive Method

The naive method starts with the facts of a deductive database and then evaluates each clause as follows. When the body of a clause is proven to be true, the head of the clause can be inferred to be true. When all clauses have been evaluated, we can repeat this process and perform deduction with the clauses using the original facts and the derived facts. The process is continued until no new fact is generated.

Naive evaluation is the most widely described method in the literature. It has been presented in a number of papers under different forms [16] [17] [18] [19] [20], etc.

2.1.2 Semi-Naive Method

The semi-naive method uses the same approach as the naive method, but tries to eliminate redundancy in the evaluation of tuples at different iterations. It evaluates the differential of a relation instead of the entire relation at each step. The idea of semi-naive evaluation underlies many papers; a complete description of the method is given in [21] and [22].

The naive and semi-naive methods are quite simple, but they compute a lot of useless results because they do not know what query they are evaluating. In contrast, the backward chaining method below only generates facts that may be relevant to the goal.

2.1.3 Backward Chaining

Backward chaining starts from a goal clause and works towards the facts of a deductive database.
The initial goal is unified with the left-hand side of some clause, generating subgoals corresponding to the right-hand-side atoms of that clause. This process is continued until the subgoals can be supported by the facts of the database. In this case only facts that are related to the goal are involved in the computation; many of the facts which are not useful for producing the result are disregarded automatically. Prolog is a form of backward chaining.

The backward chaining method is often more efficient than the naive and semi-naive methods because it takes advantage of the existence of bound arguments in the goal predicate. However, it performs all deductions at run-time, and is thus not suitable for applications where run-time performance is critical. The following method solves this problem.

2.1.4 Linear Programming

The idea and implementation of the linear programming method can be found in [9] [10] [11]. In the linear programming method, deductive databases are translated into sets of linear constraints. A solution to the derived linear program corresponds to a model of the deductive database.

As mentioned in the previous chapter, compared with the other methods, the linear programming method has many advantages, such as supporting the semantics of minimal models, stable models and well-founded models, and performing most deductions at compile-time, thus improving run-time efficiency. Its main problem has been the infamous "grounding" problem. The optimization method of partial instantiation solves this problem by providing an "instantiate-by-need" theory. The optimization of partial instantiation is the goal of this thesis.

2.2 Query Optimization

Query optimization transforms a program into another program which is written in the same formalism, but which yields a more efficient computation when one applies an evaluation method to it. These methods are different from the pure evaluation methods
which propose effective evaluation strategies.

2.2.1 Magic Sets

The idea of the magic sets optimization is to cut down the number of potentially relevant facts by simulating the sideways passing of bindings. In order to simulate the binding-passing strategy of top-down methods, magic sets introduce constraints into the program by means of additional subgoals added to the right-hand sides of the original clauses, and additional clauses defining these subgoals added to the program. The additional clauses constrain the program variables to satisfy other predicates (called "magic" predicates). Thus, during bottom-up computation, the variables assume only some values instead of all possible ones. In most cases, this makes the new program more efficient. The idea of the magic set strategy was presented in [23] and the precise algorithm is described in [24].

2.2.2 Counting Methods

The counting method is a rewriting method based on knowledge of the goal bindings; the method includes the computation of the magic set, but each element of the magic set is complemented by additional information expressing its "distance" from the goal element. Instead of using the entire magic set, we only use the tuples of the correct level, thus minimizing the set of relevant tuples. The idea of counting was presented in [23]. Variations and extensions can be found in [25].

Magic sets and counting methods are very successful for definite and stratified databases. However, they do not support the newly developed semantics such as minimal models and stable models. The following method, partial instantiation, is aimed at optimizing the linear programming evaluation method, which supports these new semantics as well.

2.2.3 Partial Instantiation

The idea of partial instantiation is presented in [14] [15]. As we already know, query processing in a ground deductive database corresponds precisely to a linear programming problem.
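For intuition, here is one common 0/1 encoding in this spirit; it is a sketch, not necessarily the exact translation used in [9] [10] [11]. A ground clause A1 ∨ ... ∨ An ← B1 ∧ ... ∧ Bm becomes the constraint x_A1 + ... + x_An ≥ x_B1 + ... + x_Bm − (m − 1) over 0/1 variables: if every body atom is true the right-hand side is 1, forcing some head atom true; otherwise the constraint is trivially satisfied. Minimal models are found here by brute force; a real system would call an integer programming solver instead.

```python
from itertools import product

def clause_constraint(heads, body, v):
    """0/1 encoding of A1 v ... v An <- B1 ^ ... ^ Bm:
    sum(heads) >= sum(body) - (len(body) - 1)."""
    return sum(v[a] for a in heads) >= sum(v[b] for b in body) - (len(body) - 1)

def models(clauses, atoms):
    """All 0/1 assignments satisfying every clause constraint."""
    out = []
    for bits in product([0, 1], repeat=len(atoms)):
        v = dict(zip(atoms, bits))
        if all(clause_constraint(h, b, v) for h, b in clauses):
            out.append(v)
    return out

def minimal_models(clauses, atoms):
    """Models whose set of true atoms is not a strict superset of another's."""
    ms = models(clauses, atoms)
    def leq(m1, m2):  # true atoms of m1 contained in those of m2
        return all(m1[a] <= m2[a] for a in atoms)
    return [m for m in ms if not any(m2 != m and leq(m2, m) for m2 in ms)]
```

For example, the ground program {A ∨ B ←, C ← A, C ← B} has the two minimal models {A, C} and {B, C} under this encoding.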
However, the "groundness" requirement is a huge drawback to using linear programming techniques for logic computation because the ground version of a logic program can be very large when compared to the original logic program. Partial instantiation is based on the theory of "instantiate-by-need", that is, performing instantiation (not necessarily ground instantiation) only when needed.

The goal of this thesis is to develop optimization methods for partial instantiation. A detailed discussion of partial instantiation will be given in Chapter 3.

2.3 Incremental Algorithms

Excellent work has been done on incremental maintenance for relational, active and deductive databases [26] [27] [28] [29] [30] [31] [32] [33]. In the following we will first briefly introduce these works and then point out the fundamental differences between them and the work reported in this thesis.

[30] has proposed two algorithms: a counting algorithm for nonrecursive views, and a delete-and-re-derive algorithm (DRed) for recursive views. The counting algorithm tracks the number of alternative derivations (counts) for each derived tuple in a view. The DRed algorithm computes an over-estimate of the tuples that need to be deleted, and then re-derives some of them.

[32] has proposed a mechanism which dynamically maintains a class of views. It quickly updates the views in response to database changes by minimizing the amount of work caused by updates.

[27] is concerned with how to update views efficiently. It first detects and removes those database updates that cannot possibly affect the view. For the remaining database updates, a differential algorithm is given to re-evaluate the view expression.

[33] deals with incremental evaluation of clauses in deductive databases. Its major concern is how to efficiently incorporate changes to the facts of a deductive database into the inference procedures.
An algorithm called INCRUPDATE is presented to maintain the inference database incrementally.

Other related works include [31], which deals with recursive views, and [29], whose concern is right-linear chains.

All the proposals listed above are concerned with changes (insertions, deletions and/or updates) to the external database predicates or the base relations. As such, there are two main differences between the work presented here and the existing works mentioned above. First, the algorithms in this thesis focus on handling inserted or deleted clauses. Second, the operation under consideration in this thesis is not logic deduction, i.e., deducing heads from the bodies of clauses. Rather, as will be discussed in greater detail in Chapter 3, the operation sizeopt takes a set of clauses P as input, and returns a subset P′ ⊆ P by deleting clauses that will not be useful in subsequent model computations.

In the following chapter we will discuss in detail the incremental algorithm we have developed, and then give a formal proof of its correctness.

Chapter 3
An Incremental Algorithm

In this chapter we will first formalize partial instantiation and the optimization algorithm sizeopt for evaluation in greater detail. We will then present the most important algorithm of this thesis, Algorithm Incr, and will prove that Incr is indeed incremental with regard to sizeopt.

3.1 Preliminaries

3.1.1 Basic Definitions

Definition (Unifier [34]) Two atoms (or expressions) A and B are unifiable if there is a substitution θ such that Aθ = Bθ. The substitution θ is called a unifier for A and B. □

Definition (Most General Unifier [34]) A unifier θ for two atoms A and B is called a most general unifier (mgu) if for each unifier η for A and B there exists a substitution γ such that η = θγ. □

Definition (Disagreement Set [14]) Suppose T and F are two sets of atoms. The disagreement set of T and F, denoted DIS(T, F), is the set {θ | there exist atoms A1 ∈ T and A2 ∈ F such that A1 and A2 are unifiable via mgu θ}. □
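These definitions can be made concrete with a much-simplified unifier, restricted to flat, function-free atoms (the thesis's Algorithm UNIFY in Chapter 6 handles general terms with nested functions). In this sketch, atoms are tuples (predicate, arg1, ...), and by a hypothetical convention strings beginning with an upper-case letter are variables; there is no occurs-check because terms cannot nest.

```python
def is_var(t):
    return isinstance(t, str) and t[0].isupper()  # convention: vars uppercase

def mgu(a1, a2):
    """Most general unifier of two flat atoms; None if not unifiable.
    Simplified: no nested function terms, so no occurs-check is needed."""
    if a1[0] != a2[0] or len(a1) != len(a2):
        return None
    theta = {}
    def walk(t):                     # follow variable bindings
        while is_var(t) and t in theta:
            t = theta[t]
        return t
    for s, t in zip(a1[1:], a2[1:]):
        s, t = walk(s), walk(t)
        if s == t:
            continue
        if is_var(s):
            theta[s] = t
        elif is_var(t):
            theta[t] = s
        else:
            return None              # two distinct constants
    return theta

def disagreement_set(T, F):
    """DIS(T, F): all mgus between an atom in T and an atom in F."""
    out = []
    for a1 in T:
        for a2 in F:
            th = mgu(a1, a2)
            if th is not None:
                out.append(th)
    return out
```

On the true and false sets of Example 1 below (T = {q(a, Y2), q(X2, b)}, F = {p(X1, Y1), q(X1, Y1)}), this yields two unifiers, matching the two conflict-set unifiers discussed there (up to the direction in which variables are bound).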
3.1.2 Partial Instantiation

Partial instantiation is aimed at solving the "groundness" problem of the linear programming method. As described in [14] [15], partial instantiation computes minimal models of logic programs by expanding and processing nodes of partial instantiation trees; a logic program (not necessarily ground) is evaluated as a propositional program at each tree node.

Definition (Partial Instantiation Tree [15]) Given a logic program P (definite or disjunctive), we define the normal partial instantiation tree, NPIT(P), associated with P as follows:

1. Each node N in NPIT(P) is labeled with a pair (PN, SN) where:
(a) PN is a logic program (definite or disjunctive).
(b) SN is a set of pairs of the form (TNj, FNj) where:
i. For each TNj, there is a minimal model Mj of PN such that TNj = {A | A ∈ Mj}. (Note that PN is treated propositionally here, and the minimal models of PN can be computed using various methods such as the integer linear programming algorithm described in [9].)
ii. FNj = {A | A is an atom occurring in PN such that A is not in TNj}.

2. The root of NPIT(P) is labeled with the pair (P, S), where P is the original logic program. All subsumed clauses in P are deleted.

3. If node N is labeled with the pair (PN, SN), N is a leaf node and has no children if either of the following conditions holds:
(a) for all (TNj, FNj) ∈ SN, the disagreement sets of TNj and FNj are empty; or
(b) N is an exact copy of some other node on the path from the root of the tree to N.

4. Otherwise, there exists a pair (TNj, FNj) ∈ SN such that the disagreement set of TNj and FNj is not empty. For each such j and each substitution θ in the disagreement set, node N has a child Nj labeled with the pair (Pj, Sj) defined below:
(a) Pj is constructed as follows: Pj = PN ∪ {Cθ | C ∈ PN}.
(b) Pj uniquely determines Sj. □
The following example shows how partial instantiation works.

Example 1 Let P be the program consisting of the clauses:

p(X1, Y1) ← q(X1, Y1)
q(a, Y2) ←
q(X2, b) ←

At the root node, P is considered as the propositional program

A ← B
C ←
D ←

where A, B, C, D denote p(X1, Y1), q(X1, Y1), q(a, Y2), q(X2, b) respectively. For this propositional program, the set of true atoms is T = {C, D}, and the set of false atoms is F = {A, B}. "Conflict resolution" then looks for unification between an atom in T and an atom in F. For our example, there are two conflict-set unifiers:

(a) θ1 = {X1 = a, Y1 = Y2}
(b) θ2 = {X1 = X2, Y1 = b}

Now for each conflict-set unifier θ, a child node is created which is responsible for the processing of the instantiated program P ∪ Pθ. Thus, for our example, the root node has two child nodes. One corresponds to the program P1 = P ∪ {p(a, Y2) ← q(a, Y2)}. The other child node corresponds to P2 = P ∪ {p(X2, b) ← q(X2, b)}. In the evaluation phase of P2, P2 again is treated as a propositional program, whose true and false sets are T2 = {q(a, Y2), q(X2, b), p(X2, b)} and F2 = F. For T2 and F2, there are two conflict-set unifiers, which are identical to θ1 and θ2. Thus, the node for P2 has two child nodes. Similarly, it is not difficult to verify that the node for P1 also has two child nodes. This process of expanding child nodes, alternating between evaluation and partial instantiation, continues. A node is a leaf node if its true and false sets of atoms cannot be unified, or if it is an exact copy of a previous node. For our example, the partial instantiation tree is finite and has 11 nodes in total, as shown in Figure 3.1. □

3.1.3 Algorithm SizeOpt

As far as computing minimal models is concerned, some clauses in a logic program are not useful and thus can be deleted. Suppose there is an atom A that does not appear in the head of any clause of a program P; it is easy to see that A cannot be in any minimal model of P.
Thus, those clauses with A in their bodies are never useful, and can therefore be thrown away. The following algorithm, first developed in [9], is intended to reduce the size of a given program by deleting clauses whose bodies cannot possibly be satisfied.

Algorithm SizeOpt [9]
Input: P, a ground disjunctive program, and S0, the set of atoms that do not appear in the head of any clause in P.
Output: Q, the set of retained clauses, and Qd, the set of deleted clauses.

1. Initialize Q to P, Qd to ∅ and i to 0.
2. Set R to ∅ (R is used to collect the heads of the deleted clauses at each iteration).
3. For each clause Cl: A1 ∨ ... ∨ Am ← B1 ∧ ... ∧ Bn in Q such that Bj ∈ Si for some Bj:
(a) delete Cl from Q;
(b) add Cl to Qd; and
(c) add A1, ..., Am to R.
4. Increment i by 1, and set Si to R.
5. For all A in Si, if A occurs in the head of some clause in Q, delete A from Si.
6. If Si is empty, then return Q and Qd, and halt. Otherwise, go back to Step 2. □

Figure 3.1: Partial Instantiation Tree (root node P0 = P with T0 = {q(a, Y2), q(X2, b)} and F0 = {p(X1, Y1), q(X1, Y1)}; children generated by θ1 = {X1 = a, Y1 = Y2} and θ2 = {X1 = X2, Y1 = b}; 11 nodes in total)

Hereafter, we use the notation sizeopt(P) = (Q, Qd) to denote the application of the above algorithm to P, where Q is the set of retained clauses and Qd is the set of deleted clauses.

Example 2 Let P be the following program:

A ← B ∧ C (3.1)
B ∨ D ← A ∧ E (3.2)
B ← E ∧ F (3.3)
D ← A (3.4)

Initially, S0 is the set {C, E, F}. After Step 3 in the first iteration of Algorithm SizeOpt, Qd consists of Clauses 1, 2 and 3, and the only clause remaining in Q is Clause 4. After Step 5, S1 is {A, B}.
In the second iteration of Algorithm SizeOpt, the clause D ← A is deleted from Q and added to Qd in Step 3. S2 is the set {D}. In the third iteration of Algorithm SizeOpt, execution halts as Q becomes empty. □

Example 3 Let P′ be the program obtained by adding the following two clauses to the program P introduced in the previous example:

C ∨ G ← (3.5)
E ← C (3.6)

When Algorithm SizeOpt is applied to P′, the situation changes drastically. S0 is now {F}. In the first iteration, Clause 3 is the only clause added to Qd, and S1 is empty after Step 5. Thus, the algorithm halts in Step 6 without another iteration. □

The above example demonstrates that Algorithm SizeOpt is not monotonic, i.e., P1 ⊆ P2 does not necessarily imply Qd,1 ⊆ Qd,2, where sizeopt(P1) = (−, Qd,1) and sizeopt(P2) = (−, Qd,2). It is also easy to see that Algorithm SizeOpt is not anti-monotonic either (i.e., P1 ⊆ P2 does not imply Qd,2 ⊆ Qd,1). The following lemma, proved in [9], shows that Algorithm SizeOpt preserves minimal models.

Lemma 1 [9] Let P be a disjunctive deductive database such that sizeopt(P) = (Q, Qd). M is a minimal model of P if M is a minimal model of Q.

3.2 Incremental Algorithm

As shown in [9] [10] [11], the operation sizeopt(P) is highly beneficial to the subsequent operation of finding the models for P since the size of the program P is reduced. Suppose P is the program considered in a node N of a partial instantiation tree, and θ1, ..., θm are all the conflict-set unifiers. Node N has m children, the j-th of which corresponds to the instantiated program P ∪ Pθj (where 1 ≤ j ≤ m). As described above, Algorithm SizeOpt can be applied to P ∪ Pθj to reduce the number of clauses that need to be processed. However, this approach of applying Algorithm SizeOpt directly may lead to a lot of repeated computations, as Algorithm SizeOpt has already been applied to P in node N (and similarly, to the programs in the ancestors of N).
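For concreteness, the direct (non-incremental) Algorithm SizeOpt can be sketched in a few lines of Python; this mirrors the steps stated in Section 3.1.3 (it is not the thesis's actual implementation), with propositional clauses represented as (heads, body) tuples of atom names.

```python
def sizeopt(P):
    """Algorithm SizeOpt (sketch): repeatedly delete clauses whose bodies
    mention an atom that cannot appear in any minimal model.
    P: list of (heads, body) tuples.  Returns (Q, Qd)."""
    def head_atoms(clauses):
        return {a for heads, _ in clauses for a in heads}

    Q, Qd = list(P), []
    body_atoms = {a for _, body in P for a in body}
    S = body_atoms - head_atoms(Q)            # S0: atoms in no head
    while S:
        R = set()                             # heads deleted this round
        for cl in [c for c in Q if set(c[1]) & S]:
            Q.remove(cl)                      # Step 3(a)
            Qd.append(cl)                     # Step 3(b)
            R.update(cl[0])                   # Step 3(c)
        S = R - head_atoms(Q)                 # Steps 4 and 5
    return Q, Qd
```

On Example 2 this deletes all four clauses; on the program P′ of Example 3 only Clause 3 is deleted, matching the behavior described above.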
To avoid repetitive computations as much as possible, we develop Algorithm Incr, which reuses sizeopt(P) to produce sizeopt(P ∪ Pθ), as shown in Figure 3.2.

Figure 3.2: Incremental Maintenance

As shown in Example 3, sizeopt(P) is not a monotonic operation, i.e., compared with sizeopt(P), sizeopt(P ∪ Pθ) may delete either more clauses or fewer clauses. This complicates our incremental algorithm.

3.2.1 Graphs for Maintaining Deleted Clauses

Recall from Section 3.1.2 that sizeopt(P) produces the pair (Q, Qd), where Q consists of clauses retained in P and Qd consists of clauses deleted from P. To facilitate incremental processing, Algorithm Incr uses a directed graph G, called a DC-Graph, to organize the deleted clauses in Qd. The intended properties of a DC-Graph are as follows:
• Nodes represent atoms that do not appear in the head of any clause in Q.
• If there is an arc from node B to node A, then the arc is labeled by a clause Cl ∈ Qd such that A appears in the head of Cl and B occurs in the body of Cl.
In other words, every arc (clause) in G is from Qd, and every node (atom) in G either appears only in the bodies of clauses in Q or does not appear in Q at all. The only exceptions to the above properties are the special root node and the arcs originating from this root node. As will be shown later, the root node is the place where a graph traversal begins. It corresponds to the "level 0" iteration in SizeOpt. Arcs originating from the root node are not labeled, because those arcs do not correspond to any clause in Qd.

Intuitively, a DC-Graph can be explained this way: if there is an arc from B to A labeled Cl in the DC-Graph, then Cl is a clause deleted from the original program, B is the reason why Cl should be deleted, and A is a new atom that might cause more clauses to be deleted in the next iteration.

Example 4 Consider the program P discussed in Example 2.
Qd consists of all 4 clauses in P. Figure 3.3 shows the DC-Graph G1 corresponding to Qd. For convenience, arcs are labeled by the clause numbers used in Example 2. Furthermore, the label 2,3 on the arc from E to B is a shorthand notation that represents two arcs from E to B with labels 2 and 3 respectively. Notice that G1 contains a cycle between A and B. □

Figure 3.3: DC-Graph G1
(The root points to C, E and F. The arcs are: from C and B to A, both labeled 1; from A and E to B, labeled 2; from E and F to B, labeled 3; from A to D, labeled 2 and 4; and from E to D, labeled 2.)

This example only illustrates what DC-Graph G1 looks like. We will show in Example 5 how G1 can be constructed, after Algorithm Incr has been presented. However, before we can present the algorithm, we need the following concept.

3.2.2 Self-sustaining Cycles

Definition (Self-sustaining cycle). Let A1 →Cl1 A2 →Cl2 ... →Cln−1 An →Cln A1 be a cycle in a DC-Graph G, where A →Cl B denotes an arc from A to B with label Cl. If there does not exist any arc from outside the cycle to some Ai with label Cli−1 (i.e. there is no B such that B is outside {A1, ..., An} and B →Cli−1 Ai), then the cycle is called self-sustaining. □

As shown in the above example, G1 contains the cycle A →2 B →1 A. This cycle is not self-sustaining because of the arc C →1 A (or the arc E →2 B). The existence of this arc justifies why Clause 1 should be deleted, and why A should remain as a node in the graph. On the other hand, if the arcs C →1 A and E →2 B were removed, the cycle A →2 B →1 A would become self-sustaining. Then, for the sake of achieving the kind of incrementality depicted in Figure 3.2, Clause 2 should be restored (i.e. no longer kept in Qd). This would cause node B to disappear from the graph, which in turn leads to the restoration of Clause 1 and the disappearance of node A. Example 6 below will give further details as to why all these actions are necessary. In general, if there exists a self-sustaining cycle in a DC-Graph, all the clauses involved in the cycle need to be restored, and all the nodes of the cycle need to be removed. We are now in a position to present Algorithm Incr.
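Before doing so, the self-sustaining test just defined can be sketched in Python. The representation of arcs as a dict mapping (source, target) pairs to sets of clause labels is our own illustrative choice.

```python
# A sketch of the self-sustaining test on one candidate cycle of a
# DC-Graph. Arcs are stored as a dict mapping (src, dst) pairs to sets
# of clause labels; this representation is illustrative.

def is_self_sustaining(arcs, cycle):
    """cycle is [(A1, Cl1), ..., (An, Cln)], denoting the cycle
    A1 -Cl1-> A2 -Cl2-> ... -Cln-> A1."""
    nodes = {atom for atom, _ in cycle}
    n = len(cycle)
    for i in range(n):
        label = cycle[i][1]             # Cl_i labels the arc into A_{i+1}
        target = cycle[(i + 1) % n][0]  # A_{i+1}
        for (src, dst), labels in arcs.items():
            # An arc from outside the cycle into it, carrying the same
            # label, sustains the deletion from outside, so the cycle
            # is not self-sustaining.
            if dst == target and src not in nodes and label in labels:
                return False
    return True
```

On the graph G1 of Example 4, the cycle between A and B is not self-sustaining because of the arc from C to A labeled 1 (or from E to B labeled 2); once the arcs leaving C and E are gone, the same cycle tests as self-sustaining, matching the discussion above.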
Notice that the DC-Graph G is used by Incr to judge whether or not a new clause Cl should be deleted from the original program.

3.2.3 Algorithm Incr

Algorithm Incr
Input: P = (Q, Qd), the DC-Graph G corresponding to Qd, and a clause Cl ≡ A1 ∨ ... ∨ Am ← B1 ∧ ... ∧ Bn to be added to P.
Output: updated Q, Qd and G.
1. For each Bi that does not appear in Q and Qd (i.e. appearing for the first time), add to graph G a node Bi and an arc from the root to node Bi.
2. For each Bi that is a node in G,
(a) For each Aj, where 1 ≤ j ≤ m,
(i) If Aj does not appear in Q and Qd (i.e., appearing for the first time), add node Aj to G.
(ii) If there is a node Aj in G, add an arc from node Bi to node Aj labeled Cl. If there is originally an arc from the root to node Aj, remove that arc.
(b) Add Cl to Qd.
3. If there is no such Bi in the previous step,
(a) Add Cl to Q.
(b) For each Aj that appears as a node in G, where 1 ≤ j ≤ m, call Subroutine Remove(Aj).
4. For each self-sustaining cycle in G, call Subroutine Remove(D), where D is some atom in the cycle. □

Subroutine Remove
Input: atom (node) A.
1. Remove from graph G node A and all the arcs pointing to A.
2. For each arc originally originating from A in G (i.e. A →Cl B),
(a) Remove the arc from G.
(b) If there does not exist another arc pointing to B with label Cl (i.e. there is no D such that D →Cl B),
(i) Remove Cl from Qd, and add it to Q.
(ii) Call Subroutine Remove(B) recursively.
3. For each clause Cl in Qd such that A appears in the body of Cl, if no atom in the body of Cl remains as a node in G, remove Cl from Qd, and add it to Q.
□

Hereafter, we use the notation incr((Q, Qd, G), Cl) = (Qout, Qdout, Gout) to denote the input and output of Algorithm Incr, where:
Q — the original set of retained clauses
Qd — the original set of deleted clauses
G — the DC-Graph corresponding to Qd
Cl — the clause to be inserted
Qout — the new set of retained clauses
Qdout — the new set of deleted clauses
Gout — the new DC-Graph
Moreover, we abuse notation by using ∅ to denote an empty DC-Graph, i.e., the DC-Graph with the root node only.

Example 5 Apply Algorithm Incr to the 4 clauses in the program P discussed in Example 2. In Figure 3.4, the first DC-Graph (labeled (i)) is graph Gr1, where incr((∅, ∅, ∅), Cl1) = (∅, {Cl1}, Gr1). This is the case because nodes B and C are added in Step 1 of Algorithm Incr, and node A and the two arcs pointing to A are added in Step 2a. Steps 3 and 4 are not needed in this case.

Similarly, the second graph in Figure 3.4 is DC-Graph Gr2, where incr((∅, {Cl1}, Gr1), Cl2) = (∅, {Cl1, Cl2}, Gr2). This time, node E is added in Step 1 of Algorithm Incr, and the four arcs pointing from A and E to B and D are added in Step 2a. Notice that even though there is a cycle in Gr2, the cycle is not self-sustaining. It is also not difficult to verify that sizeopt({Cl1, Cl2}) = (∅, {Cl1, Cl2}).

Similarly, the third graph in Figure 3.4 is produced by applying Algorithm Incr to add Cl3 to Gr2, and the fourth one (called G1 in Example 4) is produced by applying Incr to Cl4 and the third graph.

Figure 3.4: Applying Algorithm Incr to Add Clauses 1, 2, 3 and 4

Figure 3.5: Applying Algorithm Incr to Add Clauses 4, 3, 2 and 1

Figure 3.4 shows how DC-Graphs are constructed when Clauses 1, 2, 3 and 4 are inserted. The graphs in Figure 3.5 show the DC-Graphs obtained by inserting the 4 clauses in the reverse order. As expected, the fourth DC-Graphs in Figure 3.4 and Figure 3.5 are the same. Later we will show that inserting the clauses in different orders gives identical results. □
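The bookkeeping of Steps 1 to 3 of Algorithm Incr and of Subroutine Remove can be sketched as follows. Step 4, the removal of self-sustaining cycles, is omitted for brevity, and the clause and graph representations are our own illustrative choices, not taken from the thesis.

```python
# A sketch of Steps 1-3 of Algorithm Incr and of Subroutine Remove
# (Step 4, the self-sustaining-cycle removal, is omitted). Clauses are
# (id, head_atoms, body_atoms) triples; representation is illustrative.

class DCGraph:
    def __init__(self):
        self.nodes = set()
        self.root = set()       # atoms with an unlabeled arc from the root
        self.arcs = {}          # (src, dst) -> set of clause ids

def remove(atom, Q, Qd, G):
    # Step 1: delete the node and every arc pointing to it.
    G.nodes.discard(atom)
    G.root.discard(atom)
    for arc in [a for a in G.arcs if a[1] == atom]:
        del G.arcs[arc]
    # Step 2: for each arc leaving the atom, restore the labeling clause
    # if no other arc into the same target carries the same label.
    for (src, dst) in [a for a in list(G.arcs) if a[0] == atom]:
        labels = G.arcs.pop((src, dst), set())
        for cid in labels:
            if not any(d == dst and cid in lbls
                       for (s, d), lbls in G.arcs.items()):
                for cl in [c for c in Qd if c[0] == cid]:
                    Qd.remove(cl)
                    Q.append(cl)
                remove(dst, Q, Qd, G)
    # Step 3: restore clauses whose deletion no longer has a witness.
    for cl in [c for c in Qd if atom in c[2]]:
        if not (cl[2] & G.nodes):
            Qd.remove(cl)
            Q.append(cl)

def incr(Q, Qd, G, clause):
    cid, heads, body = clause
    seen = {a for _, h, b in Q + Qd for a in h | b}
    # Step 1: brand-new body atoms hang off the root.
    for b in body:
        if b not in seen and b not in G.nodes:
            G.nodes.add(b)
            G.root.add(b)
    live = [b for b in body if b in G.nodes]
    if live:
        # Step 2: some body atom is a node, so the clause is deleted.
        for a in heads:
            if a not in seen and a not in G.nodes:
                G.nodes.add(a)
        for b in live:
            for a in heads:
                if a in G.nodes:
                    G.arcs.setdefault((b, a), set()).add(cid)
                    G.root.discard(a)
        Qd.append(clause)
    else:
        # Step 3: the clause is retained; its head atoms leave the graph.
        Q.append(clause)
        for a in heads:
            if a in G.nodes:
                remove(a, Q, Qd, G)
```

Inserting Clauses 1 to 4 of Example 2 reproduces the graph G1 of Example 4; then inserting Clause 5 retains it alone while node C disappears, as in the first part of Example 6.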
The above example only demonstrates the situation where an inserted clause ends up being added to the set Qd (i.e. Qd keeps growing). Obviously, this is not always the case, as an inserted clause may instead end up being added to the set Q. This addition may trigger a series of node removals and the shrinkage of Qd.

Example 6 Now consider the program P' obtained by adding Clauses 5 and 6 discussed in Example 3. Let us add Clause 5 first. Steps 1 and 2 of Algorithm Incr are not invoked. But in Step 3a, the clause is added to Q, and Subroutine Remove(C) is called. In Step 1 of Subroutine Remove, node C and the arc from the root to C are removed. As for the arc from C to A labeled Cl1, this arc is removed. But because of the existence of the arc from B to A labeled Cl1, Subroutine Remove is not called recursively. Furthermore, Step 3 of Remove does not cause any change, and control returns to Algorithm Incr. As for Step 4 of Algorithm Incr, even though there is a cycle between A and B, this cycle is not self-sustaining because of the arc from E to B with label Cl2. Thus, Algorithm Incr halts. In functional terms, we have incr((∅, {Cl1, ..., Cl4}, G1), Cl5) = ({Cl5}, {Cl1, ..., Cl4}, Gr5), where Gr5 is the first DC-Graph shown in Figure 3.6. Before we proceed, note that it is not difficult to verify that sizeopt({Cl1, ..., Cl5}) = ({Cl5}, {Cl1, ..., Cl4}).

Figure 3.6: Applying Algorithm Incr to Add Clauses 5 and 6

Now let us add Clause 6. Steps 1 and 2 of Algorithm Incr are not invoked. But in Step 3a, the clause is added to Q, and Subroutine Remove(E) is called. In Step 1 of Subroutine Remove, node E and the arc from the root to E are removed. As for the arc from E to B labeled Cl2, this arc is removed. But because of the existence of the arc from A to B labeled Cl2, Subroutine Remove is not called recursively.
Similarly, the arc from E to B labeled Cl3 and the arc from E to D labeled Cl2 are deleted without recursively calling Remove. Furthermore, Step 3 of Remove does not cause any change, and control returns to Algorithm Incr. The second DC-Graph in Figure 3.6 shows the situation at this point.

However, unlike the situation above for Clause 5, this time the cycle between A and B is self-sustaining. Thus, in Step 4 of Algorithm Incr, Subroutine Remove(B) is called. Step 1 of Remove(B) causes node B and the two arcs from F and A to B to be deleted. In Step 2, the arc from B to A is also removed; Clause 1 is moved from Qd to Q; and this time Subroutine Remove(A) is invoked recursively. In Step 1 of Remove(A), node A is erased. In Step 2, the arcs from A to D are removed; Clauses 2 and 4 are moved from Qd to Q; and Subroutine Remove(D) is called recursively.

Step 1 of Remove(D) erases node D, and Step 3 causes no change. Control now returns to Step 3 of Remove(A). As there is no longer any clause in Qd with A in the body, control returns to Step 3 of Remove(B). Again, as there is no longer any clause in Qd with B in the body, control returns to Algorithm Incr. In functional terms, we have incr(({Cl5}, {Cl1, ..., Cl4}, Gr5), Cl6) = ({Cl1, Cl2, Cl4, Cl5, Cl6}, {Cl3}, Gr6), where Gr6 is the last DC-Graph shown in Figure 3.6. □

As shown in Example 3, we have sizeopt({Cl1, ..., Cl6}) = ({Cl1, Cl2, Cl4, Cl5, Cl6}, {Cl3}), verifying once again the incremental nature of Algorithm Incr. As detailed above, this is due largely to Step 4, without which the final situation would be as shown in the second DC-Graph of Figure 3.6, but not as in the third graph.

Example 7 Thus far, we have not seen a situation in which Step 3 of Subroutine Remove is needed. But given the third graph in Figure 3.6, let us consider adding the clause F ← A to the existing program. Since A appears in the head of a clause in Q (and hence not as a node in G), Step 3 of Algorithm Incr adds the clause to Q and calls Remove(F).
Now, in Step 3 of Remove(F), Clause 3, which is in Qd but does not appear as a label in G, is correctly moved from Qd into Q. □

3.3 Correctness Proof: Incrementality of Algorithm Incr

In the remainder of this section, we will present one of the key results of this thesis: the theorem proving the incremental property of Algorithm Incr (cf. Theorem 1). This property has been verified several times in the previous examples. Before we can prove the theorem, we need the following lemmas.

3.3.1 Supporting Lemmas

Lemma 2 Let P be the set {Cl1, ..., Cln}. Then:
1. Let sizeopt(P) = (Q, Qd). It is the case that Q ∪ Qd = P and Q ∩ Qd = ∅.
2. Let incr(...incr((∅, ∅, ∅), Cl1), ..., Cln) = (Pn, Pn,d, Gn). It is the case that Pn ∪ Pn,d = P and Pn ∩ Pn,d = ∅.

Proof For Part 1, as shown in Algorithm SizeOpt, Q is initialized to P, and Qd to ∅, in Step 1. Afterwards, the only place where a clause is removed is in Step 3. More specifically, as shown in Steps 3a and 3b, whenever a clause is removed from Q, that clause is added to Qd. Thus, it is obvious that Part 1 of the lemma is true.

For Part 2, let us prove by induction on n. When n = 1, it is obvious that Subroutine Remove is not invoked in Algorithm Incr. If Cl1 is of the form A1 ∨ ... ∨ Am ←, then by Step 3, P1 = {Cl1} and P1,d = ∅. Otherwise, Cl1 is of the form A1 ∨ ... ∨ Am ← B1 ∧ ... ∧ Bn. Then by Step 2, P1 = ∅ and P1,d = {Cl1}. Hence, in both cases, P1 ∪ P1,d = {Cl1} and P1 ∩ P1,d = ∅.

Now assume that Part 2 of the lemma is true for n = k − 1. There are two cases. First, consider the case when Subroutine Remove is not called. Then Steps 2 and 3 are the only places where a clause is added to either Pk or Pk,d. Notice that the conditions of Steps 2 and 3 are mutually exclusive. Thus, given the induction assumption that Pk−1 ∪ Pk−1,d = {Cl1, ..., Clk−1} and Pk−1 ∩ Pk−1,d = ∅, it is the case that Pk ∪ Pk,d = {Cl1, ..., Clk} and Pk ∩ Pk,d = ∅.

Second, consider the case when Subroutine Remove is invoked.
The two places in Remove where a clause is moved around are Steps 2b and 3. More specifically, whenever a clause is deleted from Pk−1,d, it is immediately added to Pk. Thus, given the induction assumption, regardless of how many times Remove is invoked, Pk ∪ Pk,d = {Cl1, ..., Clk} and Pk ∩ Pk,d = ∅. □

The lemma above shows that for both Algorithm SizeOpt and Algorithm Incr, the set of retained clauses and the set of deleted clauses partition the original program P. The lemma below shows that a node A appears in a DC-Graph if and only if all clauses with A in the head have already been deleted.

Lemma 3 Let incr(...incr((∅, ∅, ∅), Cl1), ..., Cln) = (Pn, Pn,d, Gn). Then for any atom A, A appears as a node in Gn if and only if there does not exist any clause in Pn with A in the head.

Proof Prove by induction on n. When n = 1, it is obvious that Subroutine Remove is not invoked in Algorithm Incr. If node A appears in the DC-Graph, the node must have been added in Step 2a. Then by Step 2b, Cl1 is added to P1,d, and is not in P1. Conversely, if Cl1 appears in P1, then it must have been added to P1 in Step 3a. In that case, Step 2a is not executed, and A does not appear in the DC-Graph. Now assume that the lemma is true for n = k − 1. There are two cases.

Case 1 Subroutine Remove is not called. For any atom A, there are two subcases.

Case 1.1 A does not appear in the head of Clk.
If A does not appear in the body of Clk, then A appears as a node in Gk if and only if A appears as a node in Gk−1, as Subroutine Remove is not invoked. By the induction assumption, A appears in Gk if and only if there does not exist any clause in Pk−1 with A in the head. Since A is not the head of Clk, it follows that there does not exist any clause in Pk with A in the head.

Now consider the case when A appears in the body of Clk. If A appears in either Pk−1 or Pk−1,d, then A appears as a node in Gk if and only if A appears as a node in Gk−1. The situation is exactly the same as the one considered in the previous paragraph.
Otherwise, if A appears for the first time, then node A is added to Gk in Step 1. But obviously Pk still does not contain any clause with A in the head.

Case 1.2 A appears in the head of Clk.
There are two subcases, depending on whether Step 2 or 3 is executed. If Step 3 is executed, then Clk is in Pk by Step 3a. But then Step 3b guarantees that Gk does not contain node A. On the other hand, if Step 2 is executed instead, there are two more subcases. If A appears in either Pk−1 or Pk−1,d, then A appears as a node in Gk if and only if A appears as a node in Gk−1. The situation is then similar to the one considered in the first paragraph of Case 1.1. Otherwise, if A appears for the first time, then node A is added to Gk in Step 2a. But then Clk is added to Pk,d in Step 2b, and is not added to Pk. By the induction assumption, since node A does not appear in Gk−1, there is no clause in Pk−1 with A in the head. Thus, as Clk is added to Pk,d, there is no clause in Pk with A in the head. This completes the analysis of Case 1.

Case 2 Subroutine Remove is invoked.
For any atom A, there are two subcases.

Case 2.1 Remove(A) is invoked.
There are three places where Remove(A) can be invoked. If Remove(A) is called from Step 3b of Incr, then in Step 3a a clause with A in the head is added to Pk. If Remove(A) is called recursively in Step 2b of Remove(B) for some B, then B →Cl A is the only arc pointing to A with label Cl, for some clause Cl with A in the head. Then in Step 2b of Remove(B), Cl is moved from Pk−1,d to Pk. Finally, if Remove(A) is called from Step 4 of Algorithm Incr, A is in a self-sustaining cycle. Step 2 of Remove recursively causes all nodes in the self-sustaining cycle to be removed. Thus, at least one clause with A in the head is moved from Pk−1,d to Pk.

Case 2.2 Remove(A) is not invoked.
The analysis for this case is very similar to the one for Case 1. This completes the proof of this lemma. □

We need one more lemma before we can prove Theorem 1.
This lemma requires the following concept.

3.3.2 Rank: Bridge Between Incr and SizeOpt

Definition (Rank). Let A be a node in a DC-Graph G. The rank of A in G, denoted by rank(A), is defined recursively as follows:
1. If there is an arc from the root to A, then rank(A) = 0;
2. Let B1,1, ..., B1,u1, ..., Bm,1, ..., Bm,um be all the nodes that have arcs pointing to A, such that: a) {Cl1, ..., Clm} are all the labels of these arcs, and b) for all 1 ≤ j ≤ m, Bj,1, ..., Bj,uj are all the nodes that have arcs pointing to A with label Clj. Then rank(A) = 1 + max_{1≤j≤m}(min_{1≤i≤uj} rank(Bj,i)). □

Example 8 Consider the DC-Graph G1 introduced in Figure 3.4. The nodes with rank = 0 are C, E and F. Now consider rank(A). There are the arcs from C and B pointing to A, both with label Cl1. Thus, rank(A) = 1 + min{rank(C), rank(B)}. Since rank(C) = 0, it is obvious that rank(A) = 1 + rank(C) = 1. Now consider rank(B) and all the arcs pointing to B. This time there are two different labels: Cl2 and Cl3. For Cl2, there are the arcs from A and E to B. Based on an analysis similar to the one for rank(A), the minimum corresponding to Cl2 is rank(E) = 0. For Cl3, there are the arcs from E and F to B. Thus, the minimum based on Cl3 is min{rank(E), rank(F)} = 0. Hence, rank(B) = 1 + max{0, 0} = 1, where the two zeros correspond to Cl2 and Cl3 respectively. Similarly, it is not difficult to verify that rank(D) = 1 + rank(A) = 2. Now compare the ranks with the sets S0, S1 and S2 discussed in Example 2. The interesting thing here is that for all atoms A, rank(A) = k if and only if A ∈ Sk. This property will be proved formally in the lemma below.

Notice that if a DC-Graph contains a self-sustaining cycle, rank assignments to atoms in the cycle are not well-defined. For example, consider the self-sustaining cycle between A and B in the second DC-Graph in Figure 3.6. Then rank(B) depends on rank(A), which in turn depends on rank(B). Thus, both ranks are not well-defined because of the cyclic dependency.
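When no self-sustaining cycle is present, the ranks can also be computed level by level rather than by the recursive definition. The sketch below does so; the arc representation is our own illustrative choice.

```python
# A sketch of rank computation, level by level. Arcs are a dict mapping
# (src, dst) pairs to sets of clause labels; root_atoms are the atoms
# with an arc from the root. Assumes no self-sustaining cycle, so every
# node eventually receives a rank. Representation is illustrative.

def compute_ranks(arcs, root_atoms):
    groups = {}                 # target -> {label -> set of predecessors}
    for (src, dst), labels in arcs.items():
        for lb in labels:
            groups.setdefault(dst, {}).setdefault(lb, set()).add(src)
    ranks = {a: 0 for a in root_atoms}
    k = 0
    while True:
        frontier = []
        for atom, by_label in groups.items():
            if atom in ranks:
                continue
            mins = []
            for preds in by_label.values():
                ranked = [ranks[p] for p in preds if p in ranks]
                if not ranked:          # this label group not yet ranked
                    break
                mins.append(min(ranked))
            else:
                if max(mins) == k:      # rank(atom) = 1 + max of minima
                    frontier.append(atom)
        if not frontier:
            return ranks
        for atom in frontier:
            ranks[atom] = k + 1
        k += 1
```

On G1 this assigns rank 0 to C, E and F, rank 1 to A and B, and rank 2 to D, matching Example 8.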
Fortunately, since Step 4 of Algorithm Incr removes all self-sustaining cycles, the DC-Graphs produced by Incr do not contain any self-sustaining cycle. Then, by the definition of self-sustaining cycle, for a non-self-sustaining cycle A1 →Cl1 A2 →Cl2 ... →Cln−1 An →Cln A1, there must exist an atom Ai such that there exists an arc B →Cli−1 Ai for some atom B not in {A1, ..., An}. Thus, in determining rank(Ai), for Clause Cli−1, min{rank(B), rank(Ai−1)} is always well-defined (cf. the previous example). Thus, there is no cyclic dependency in rank assignments. □

Lemma 4 Let incr((Q, Qd, G), Cl) = (Qout, Qdout, Gout). Then for all nodes A in Gout, rank(A) = n if and only if A ∈ Sn, where the sets S0, ..., Sn, ... are the ones produced by applying Algorithm SizeOpt directly on Qout ∪ Qdout.

Proof Prove by induction on n. When n = 0, rank(A) = 0 if and only if there is an arc from the root to A. This arc is created in Step 1 of Algorithm Incr. If this arc is not removed in Step 2a, it must be the case that A does not appear in the head of any clause in Qout ∪ Qdout. Then, when applying Algorithm SizeOpt directly on Qout ∪ Qdout, it is necessary that A ∈ S0. Assume that the lemma is true for n ≤ k − 1. We prove the if and only-if parts separately.

Case 1 rank(A) = k.
By the definition of rank, rank(A) = 1 + max_j(min_i rank(Bj,i)). That is, among the clauses Cl1, ..., Clm that are the labels of all the arcs pointing to A, there exists one clause Clj, where 1 ≤ j ≤ m, such that rank(A) = k = 1 + min_i rank(Bj,i). More specifically, Clj must be of the form ...A... ← ... ∧ Bj,i ∧ .... Among these uj atoms, let i be the one such that rank(Bj,i) = min_{1≤l≤uj} rank(Bj,l). In other words, rank(Bj,i) = k − 1. By the induction assumption, Bj,i ∈ Sk−1. Thus, in Step 3 of Algorithm SizeOpt, Clj is removed, and A is added to the set R. By applying a similar argument, it is obvious that all the clauses Cl1, ..., Clm must be removed at some iteration of Algorithm SizeOpt.
More specifically, since Clj corresponds to the maximum "minimum-rank", Clj must be the last clause deleted with A appearing in the head. Thus, there must not exist any retained clause with head A. Hence, in Step 5 of Algorithm SizeOpt, A is kept in the set Sk.

Case 2 A ∈ Sk.
As shown in Algorithm SizeOpt, there must exist a clause Clj of the form ...A... ← ... ∧ Bj,i ∧ ... such that this is (one of) the last clauses with A in the head to be deleted, and Bj,i is in Sk−1. By the induction assumption, rank(Bj,i) = k − 1. Now, among all Bj,1, ..., Bj,uj that appear in the body of Clj and that appear as nodes in the DC-Graph, suppose there exists Bj,l such that rank(Bj,l) < k − 1. By the induction assumption, Bj,l ∈ Sw for some w < k − 1. In that case, by Step 3 of Algorithm SizeOpt, the clause Clj must have been deleted earlier, and should not exist for deletion in the current iteration. This is a contradiction. Thus, it is necessary that rank(Bj,i) = min_{1≤l≤uj} rank(Bj,l). By applying a similar argument, for every clause Clw among Cl1, ..., Clm with A in the head, there exists an index v such that rank(Bw,v) = min_{1≤l≤uw} rank(Bw,l). But since Clj is the last clause to be deleted, it is necessary that rank(Bj,i) is the maximum of these minima. Hence, it is necessary that rank(A) = 1 + rank(Bj,i) = k. □

3.3.3 Proof of Incrementality

Now we are in a position to present the theorem that proves the incremental property of Algorithm Incr.

Theorem 1 Let P be a program consisting of clauses Cl1, ..., Cln. Let sizeopt(P) = (Q, Qd), and incr(...incr((∅, ∅, ∅), Cl1), ..., Cln) = (Pn, Pn,d, Gn). Then Q = Pn and Qd = Pn,d.

Proof Given Lemma 2, it suffices to prove Qd = Pn,d. Let Cl ≡ ...A... ← B1 ∧ ... ∧ Bm be a clause in Qd.

Case 1 There is no clause in Q with A in the head.
Then all clauses with A in the head are in Qd, and for some k, A ∈ Sk. By Lemma 4, this is true if and only if rank(A) = k.
By Lemma 3, this is possible if and only if all the clauses with A in the head have been deleted, i.e., are in Pn,d.

Case 2 There exists some clause in Q with A in the head.
Cl is in Qd if and only if there exists Bj, where 1 ≤ j ≤ m, such that Bj ∈ Sk for some k. By Lemma 4, this is true if and only if rank(Bj) = k. There are now two subcases, depending on whether node Bj appears in the DC-Graph when Cl is inserted by Algorithm Incr.

Case 2.1 Node Bj already created.
Then by Step 2b of Algorithm Incr, Cl is added to the set of deleted clauses.

Case 2.2 Otherwise.
Suppose Cl does not represent the first time Bj appears. Let Cl' be the clause in which Bj first appears. Since there does not exist a node Bj in the DC-Graph, Bj must be in the head of Cl', as ensured by Step 1 of Algorithm Incr. Furthermore, because of Step 2, and because there does not exist a node Bj in the graph, Cl' must have been added to the set of retained clauses in Step 3. But notice that in Algorithm Incr and Subroutine Remove, once a clause is put into the set of retained clauses, it is never removed. In other words, Cl' must be in Pn. However, by Lemma 3, Bj then cannot be a node in the graph Gn, and rank(Bj) cannot be equal to k. This is a contradiction. Hence, it is necessary that Cl represents the first time Bj appears. Thus, in Step 1 of Algorithm Incr, a node for Bj is created, and the situation is exactly the same as in Case 2.1.

Combining Cases 2.1 and 2.2, it is necessary that Cl was once added to the set of deleted clauses. Now, since Bj is a node in the DC-Graph, Step 3 of Subroutine Remove will never remove Cl from the set of deleted clauses. Hence, it is necessary that Cl is in Pn,d. This completes the proof of the theorem. □

Corollary 1 Given clauses Cl1, ..., Cln, Algorithm Incr produces the same end result regardless of the order in which Cl1, ..., Cln are inserted. □

In this chapter, we have presented the incremental algorithm Incr and formally proved its incrementality.
In Incr, we have developed the DC-Graph to organize the deleted clauses, and the concept of self-sustaining cycle to achieve incrementality. In order to prove that Incr is indeed incremental, we have introduced the concept of rank, which connects Incr and SizeOpt. In the next chapter, we will show how Algorithm Incr can be optimized.

Chapter 4

Optimizations of the Incremental Algorithm and Partial Instantiation

In the last chapter, we presented Algorithm Incr and showed that it achieves the kind of incrementality shown in Figure 3.2. In this chapter, we will develop several ways to optimize this algorithm. First we will study how self-sustaining cycles can be retained in the DC-Graph. Then we will consider different orders for inserting new clauses. Finally we will explore the optimization of partial instantiation by cutting down redundant nodes in partial instantiation trees.

4.1 Algorithm IncrOpt

A complexity analysis of Algorithm Incr reveals that Step 4 plays a considerable role in determining the efficiency of Incr. It involves finding each and every self-sustaining cycle that may exist in the DC-Graph. As shown in Example 6, this is the crucial step that leads to the incremental property of Algorithm Incr. However, the following lemma shows that from the point of view of computing minimal models, self-sustaining cycles need not be detected, and can be kept in the graph.

Lemma 5 Let Q be a set of retained clauses and Qd a set of deleted clauses maintained in the DC-Graph G. Let A1 →Cl1 A2 →Cl2 ... →Cln−1 An →Cln A1 be a self-sustaining cycle in G. M is a minimal model of Q ∪ {Cl1, ..., Cln} if and only if M is a minimal model of Q.

Proof As introduced in Section 3.1, for all 1 ≤ i ≤ n, Cli is a clause with Ai+1 in the head and Ai in the body (where An+1 denotes A1). Since A1, ..., An are nodes in the DC-Graph G, none of A1, ..., An appears in the head of any clause in Q. Thus, given any minimal model M of Q, none of A1, ..., An is contained in M.
Then it is easy to see that M is a model of Cl1, ..., Cln. Hence M is a minimal model of Q ∪ {Cl1, ..., Cln} if and only if M is a minimal model of Q. □

From the above lemma, we can see that taking Step 4 out of Algorithm Incr does not affect the minimal models. This motivates the following algorithm.

Algorithm IncrOpt Exactly the same as Algorithm Incr, but without Step 4 of Incr. □

One may ask why we included Step 4, i.e., removing all the self-sustaining cycles from the DC-Graph, in Algorithm Incr in the first place. Recall that the concept of self-sustaining cycle was brought in to achieve incrementality; Algorithm Incr cannot be formally proved to be incremental without Step 4. However, as shown in Lemma 5, this step is not necessary for computing minimal models. Hereafter, we use the notation incropt((Q, Qd, G), Cl) = (Qout, Qdout, Gout) for Algorithm IncrOpt in exactly the same way as we use incr((Q, Qd, G), Cl) = (Qout, Qdout, Gout) for Incr. The corollary below follows directly from Lemma 1, Theorem 1 and Lemma 5.

Corollary 2 Let P be a program consisting of clauses Cl1, ..., Cln, and let incropt(...incropt((∅, ∅, ∅), Cl1), ..., Cln) = (Pn, Pn,d, Gn). M is a minimal model of P if and only if M is a minimal model of Pn. □

As far as supporting minimal model computations is concerned, Algorithm IncrOpt is preferable to Algorithm Incr for the following reasons:

• First, as discussed above, IncrOpt does not check for self-sustaining cycles. While cycle detection takes time linear in the number of edges in the graph, checking all cycles to see whether they are self-sustaining takes considerably more time. Thus, by not checking self-sustaining cycles, IncrOpt is more efficient than Incr.

• Second, it is easy to see that if incropt((Q, Qd, G), Cl) = (Qopt, -, -) and incr((Q, Qd, G), Cl) = (Qout, -, -), then necessarily Qopt ⊆ Qout. More precisely, IncrOpt keeps all the clauses in self-sustaining cycles deleted.
Thus, the size of the program Qopt may be much smaller than that of Qout. The implication is that finding the minimal models based on Qopt may take considerably less time than finding the minimal models based on Qout.

• The third reason why Algorithm IncrOpt is preferred applies only to programs P that are definite (i.e. have no disjunctive heads). The following lemma shows that for definite programs, Algorithm IncrOpt directly finds their least models.

Lemma 6 Let P be a definite program consisting of clauses Cl1, ..., Cln, and let incropt(...incropt((∅, ∅, ∅), Cl1), ..., Cln) = (Pn, Pn,d, Gn). The least model of P is the set {A | A is the head of a clause in Pn}.

Proof Prove by induction on n. When n = 1, if Cl1 is of the form A ←, Step 3 of IncrOpt adds Cl1 to P1. Then it is obvious that the least model of Cl1 is the set {A}. On the other hand, if Cl1 is of the form A ← B1 ∧ ... ∧ Bm, Step 2 of IncrOpt adds Cl1 to P1,d, and P1 is empty. Then it is easy to see that the least model of Cl1 is the empty set. Now assume that the lemma is true for n = k − 1. There are two cases.

Case 1 Clk is added to Pk,d.
This must occur in Step 2 of IncrOpt, and Clk is of the form A ← B1 ∧ ... ∧ Bm such that there exists a Bj, for 1 ≤ j ≤ m, that appears as a node in the DC-Graph Gk.
But since Cikis added to Pk,d, it is necessary that Pk =Case 2 Cik is added to Pk.Let Cik be of the form A .‘ B1 A ... A Bm. There are again two subcases depending onwhether subroutine Remove is invoked. First consider the subcase when Remove is notcalled. Then Pk = Pk1 U Clj, and thus { B B is the head of a clause in Pk} is equal to{A} U {B B is the head of a clause in Pk_1}. Moreover, Cik is added to Pk in Step 3 ofIncrOpt. This is possible only if all B,’s do not occur as nodes in Gk_l. Then accordingto Lemma 3, all By’s occur as heads of clauses in Pk1. By the induction assumption, allBa’s are in the least model of Cl1,..., Clk_1. Thus, A is in the least model of Cl1,..., Clk.Now consider the subcase when Subroutine Remove is called. A clause Cl may beadded to Pk is Step 2b or 3 of Remove. If Cl is added in Step 2b, Cl is of the formB * A A B1 A ... A Bm where A occurs as the head of a clause in Pk, and thus is in theleast model based on the analysis for the first subcase. Moreover, due to the conditionof Step 2b, B1,..., Bm must all be in the least model as well. Thus, B has to be in theleast model. Alternatively, if Cl is added in Step 3 or Remove, this is possible only if allatoms in the body of Cl are not in the DC-Graph, and are in the least model. Hence,the head of Cl must also be in the least model. DChapter 4. Optimizations of the Incremental Algorithm and Partial Instantiation 41The above lemma shows that when applying Algorithm IncrOpt to a definite program,once IncrOpt completes its execution, no further processing is needed to compute the leastmodel. This is not the case for Algorithm Incr and Algorithm SizeOpt, as shown in thefollowing example.Example 9 Consider the definite program(4.1)B — A, (4.2)(4.3)D—C. (4.4)All 4 clauses remain if either Algorithm Incr or Algorithm SizeOpt is applied. 
The application of a least model solver is then needed to compute the least model {C, D}. But if Algorithm IncrOpt is used instead, only clauses 3 and 4 remain, whose heads directly give the least model. □

One may wonder whether the above lemma can be generalized to disjunctive programs in the following sense. If P is a disjunctive program consisting of clauses Cl1, ..., Cln, and incropt(... incropt((∅, ∅, ∅), Cl1), ..., Cln) = (Pn, Pn,d, Gn), then is it true that every atom A that appears in the head of a clause in Pn occurs in some minimal model of P? The answer is no. Consider the following program P:

A ∨ B ←,    (4.5)
A ←,    (4.6)
C ← B.    (4.7)

Applying IncrOpt does not cause any change; all three clauses are retained. Thus the set of atoms appearing in the heads is {A, B, C}. However, B and C are not contained in the (unique) minimal model of P.

4.2 Heuristics: Ordering Clauses to Be Inserted

4.2.1 Complexity Analysis

According to Corollary 1 and Lemma 5, when using Algorithm IncrOpt, different orders of inserting the same collection of clauses do not affect the final DC-Graph, or the final sets of retained and deleted clauses. However, different orders may require different execution times, depending largely on how many times Subroutine Remove is invoked. If Remove is not called at all when inserting a clause A1 ∨ ... ∨ Am ← B1 ∧ ... ∧ Bl, the complexity of Algorithm IncrOpt is O(ml). Otherwise, suppose N is the number of clauses in the current program and a is the number of nodes (atoms) in the current graph; then the worst-case complexity of recursively calling Remove is O(alN), and that of IncrOpt is O(ml + alN). It is then tempting to conclude that the complexity of IncrOpt for inserting n clauses is O(n(ml + alN)).
However, this is incorrect because during the process of inserting the n clauses, Remove(A) can occur at most once for each atom A: once A is removed from the graph, it can never come back. Thus, for inserting n clauses, the complexity of IncrOpt should be O(nml + al(N + n)).

On the other hand, if Algorithm SizeOpt is used directly, then there are (N + n) clauses. The worst-case complexity of Algorithm SizeOpt for (N + n) clauses is O(ml(N + n)²). Thus, comparing the complexity figures of Algorithm SizeOpt and IncrOpt does not provide any clear conclusion, as the comparison depends on the magnitude of a, the number of atoms in a DC-Graph, relative to the magnitudes of N, n, l and m. In Chapter 5, we will present experimental results evaluating the effectiveness of Algorithm IncrOpt.

The above complexity analysis is rather coarse-grained, since a and N vary at each iteration. Still, it reveals that given n clauses to be inserted, the most efficient order is the one that minimizes the number of times Subroutine Remove needs to be called. In the following, we discuss three possible ways to insert n clauses.

4.2.2 Clauses Arbitrarily Ordered

The most obvious way is to use IncrOpt to insert the clauses in an arbitrary order (e.g. textual order). For lack of a better name, we will refer to this strategy as IncrOptArb.

Algorithm IncrOptArb Let Cl1, Cl2, ..., Cln be the clauses to be inserted. Call IncrOpt n times to insert Cl1, Cl2, ..., Cln based on their input order. □

4.2.3 Clauses Fully Ordered

At the other extreme from arbitrary ordering, another way to insert n clauses is to actively try to minimize the number of times Subroutine Remove will be called.

Subroutine Remove is called when an atom A, previously in the DC-Graph, turns out to be in the head of some newly inserted clause Cl and thus should not stay in the DC-Graph any more.
If clause Cl is inserted first, A will not be in the DC-Graph in the first place, so no removal will be needed. This leads us to the idea of handling atoms/clauses that cannot be in the DC-Graph, or are unlikely to be in it, ahead of other atoms/clauses.

Obviously, all the facts of a program must be among the retained clauses and will not appear in the DC-Graph. So the facts should be inserted first. Consequently, a clause is more likely to be retained if it has facts in its body, so such clauses should have a higher priority than ordinary clauses. The following algorithm uses this heuristic order in an attempt to reduce the number of times Subroutine Remove will be called.

Algorithm IncrOptOrder Let Cl1, ..., Cln be the clauses to be inserted.

1. Initialize R to all the facts among Cl1, ..., Cln, and S to ∅.
2. For each clause Cl ∈ R,
   (a) Call Algorithm IncrOpt with Cl.
   (b) If Cl is not added to the DC-Graph, then for each atom A in the head of Cl, add to S all the clauses not considered so far with A in the body.
3. If S is not empty, set R to S and S to ∅. Go to Step 2.
4. Apply IncrOpt to each of the clauses not considered so far, in an arbitrary order.

Example 10 Suppose the six clauses of P and P′ in Examples 2 and 3 are to be inserted. Clause 6 is the first one considered. Since IncrOpt does not add Clause 6 to the DC-Graph, Clauses 1 and 7 are added to the set S and inserted in the next iteration of IncrOptOrder. While Clause 1 is added to the DC-Graph, Clause 7 is not, which causes Clauses 2 and 3 to be considered in the third iteration. This time both clauses are added to the DC-Graph. Then Step 4 of IncrOptOrder applies IncrOpt to Clause 4, the only clause remaining. □

Notice that if Clause 6 is inserted after Clause 1, then node C, created during the insertion of Clause 1, will need to be removed. IncrOptOrder relies on inserting facts first, and on Step 2b, to prevent all of this from happening.
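The R/S worklist of Algorithm IncrOptOrder can be sketched at the propositional level as follows. This is an illustrative sketch, not the thesis implementation: the single-head clause record and the MAXC limit are our assumptions, and dependents are promoted unconditionally, whereas Step 2b promotes them only when the inserted clause is not added to the DC-Graph.

```c
#include <assert.h>
#include <string.h>

#define MAXC 16   /* illustrative limit on the number of clauses */

/* Toy clause record: a fact has an empty body; 'considered' marks
   clauses already handed to IncrOpt. */
typedef struct {
    int head;          /* single head atom, as an atom-table index */
    int body[4];
    int body_len;
    int considered;
} TClause;

/* Produce an insertion order in the spirit of IncrOptOrder: start from
   the facts (round R), promote not-yet-considered clauses whose bodies
   mention an inserted head into the next round S, and append whatever
   is left in textual order (Step 4). Returns the number of clauses. */
int order_clauses(TClause *cs, int n, int *out) {
    int k = 0, r[MAXC], rn = 0, s[MAXC], sn;
    for (int i = 0; i < n; i++)
        if (cs[i].body_len == 0) r[rn++] = i;      /* Step 1: R = facts */
    while (rn > 0) {
        sn = 0;
        for (int x = 0; x < rn; x++) {             /* Step 2 */
            int c = r[x];
            if (cs[c].considered) continue;
            cs[c].considered = 1;
            out[k++] = c;                          /* insert via IncrOpt */
            for (int i = 0; i < n; i++)            /* Step 2b: promote */
                for (int j = 0; j < cs[i].body_len; j++)
                    if (!cs[i].considered && cs[i].body[j] == cs[c].head)
                        s[sn++] = i;
        }
        memcpy(r, s, sizeof(int) * (size_t)sn);    /* Step 3: R = S */
        rn = sn;
    }
    for (int i = 0; i < n; i++)                    /* Step 4: the rest */
        if (!cs[i].considered) out[k++] = i;
    return k;
}
```

For the toy clauses a ←, b ← a, c ← b and d ← e, the sketch inserts the fact first, then the two clauses reachable from it, and the unreachable clause last.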
4.2.4 Only Facts Ordered

One possible weakness of Algorithm IncrOptOrder is that there may be too much overhead involved in implementing Step 2. The following algorithm represents a compromise. It inserts the facts among the n clauses first, but leaves the remaining clauses to be inserted in an arbitrary order.

Algorithm IncrOptFact Let Cl1, ..., Cln be the clauses to be inserted. Apply Algorithm IncrOpt first to all the facts among the clauses. Then apply Algorithm IncrOpt to the remaining clauses in an arbitrary order. □

Since facts are the basic sources that cause removals in a DC-Graph, IncrOptFact might considerably improve performance by considering facts first. Furthermore, IncrOptFact does not have much overhead, because it is very easy to distinguish facts from ordinary clauses.

In Chapter 5, we will present experimental results evaluating the effectiveness of these three algorithms.

4.3 Rules for Cutting Redundant Nodes

Thus far we have studied optimization methods for Algorithm Incr. But recall that our final goal is to expand partial instantiation trees efficiently for computing minimal and least models. In this section we study how to optimize partial instantiation trees.

As described in Chapter 3.1.1, partial instantiation can be viewed as expanding and processing nodes of partial instantiation trees. For each conflict-set unifier θ of a node in a partial instantiation tree, there is a child node processing P ∪ Pθ. If a newly generated node N is identical to one of its ancestors, node N is redundant and can be cut off. Obviously, if there is a way to predict redundant nodes, we can save the computations on these nodes by not expanding them, thus improving the efficiency of partial instantiation.
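A minimal sketch of the ancestor test that makes a node redundant: here a node is summarized by a canonical string signature of its program and its T and F sets, and the Node record and signature encoding are our assumptions, not the data structures of the thesis implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node record: a canonical encoding of (P, T, F) plus a
   parent pointer up the expansion path. */
typedef struct Node {
    const char *signature;   /* canonical encoding of (P, T, F) */
    struct Node *parent;
} Node;

/* A node is redundant if it is identical to one of its ancestors:
   walk the parent chain and compare signatures. */
int is_redundant(const Node *n) {
    for (const Node *a = n->parent; a != NULL; a = a->parent)
        if (strcmp(a->signature, n->signature) == 0)
            return 1;
    return 0;
}
```

Because the test only walks one root-to-node path, its cost is bounded by the depth of the tree rather than its size.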
By carefully examining the unifiers along an expansion path in a partial instantiation tree, we developed the following lemma, which is very useful for avoiding redundant node expansion. The lemma gives three sufficient conditions which are easy to implement. Without loss of generality, it assumes that the substitutions in conflict-set unifiers are represented in solved form [34]. That is, for a set of (substitution) equations, each equation is of the form Xj = tj, and no variable appearing in the left-hand side of an equation appears in the right-hand side of any equation. For the following lemma, we use the notation L(θ) and R(θ) to denote the sets of all variables appearing in the left-hand sides and right-hand sides of θ respectively. We also write P →θ P1 to denote the fact that the node for program P is the parent of the node for P1, with θ the conflict-set unifier, i.e. P1 = P ∪ Pθ.

Lemma 7

1. Given P →θ P1 and P1 →θ P2, it is necessary that P2 = P1.

2. Given P →θ1 P1 →θ2 P2 and P →θ2 P3 →θ1 P4, then P4 = P2 if L(θ1) ∩ L(θ2) = ∅, L(θ1) ∩ R(θ2) = ∅, and R(θ1) ∩ L(θ2) = ∅.

3. Given P →θ1 P1 →θ2 P2 →θ1 P3, then P3 = P2 if L(θ1) ∩ R(θ2) = ∅.

Proof

1. By definition, P2 = P1 ∪ P1θ. Substituting P1 = P ∪ Pθ into it, we get P2 = P ∪ Pθ ∪ Pθθ. Since θ is in solved form, θθ = θ. Thus we have P2 = P ∪ Pθ, which is equal to P1.

2. By definition, we have P2 = P1 ∪ P1θ2. Substituting P1 = P ∪ Pθ1 into it, we get P2 = P ∪ Pθ1 ∪ Pθ2 ∪ Pθ1θ2. On the other hand, by definition, we have P4 = P3 ∪ P3θ1. Substituting P3 = P ∪ Pθ2 into it, we get P4 = P ∪ Pθ2 ∪ Pθ1 ∪ Pθ2θ1. It is easy to verify that θ1θ2 = θ2θ1 when L(θ1) ∩ L(θ2) = ∅, L(θ1) ∩ R(θ2) = ∅, and R(θ1) ∩ L(θ2) = ∅. Therefore we have P4 = P2.

3. By definition, P3 = P2 ∪ P2θ1. Substituting P2 = P1 ∪ P1θ2, we get P3 = P1 ∪ P1θ2 ∪ P1θ1 ∪ P1θ2θ1. Since θ1 is in solved form, P1θ1 = Pθ1 ∪ Pθ1θ1 = Pθ1, which is contained in P1; and since L(θ1) ∩ R(θ2) = ∅, P1θ2θ1 = P1θ2. So P3 = P1 ∪ P1θ2. Then substituting P1 = P ∪ Pθ1, we have P3 = P ∪ Pθ1 ∪ Pθ2 ∪ Pθ1θ2. On the other hand, by definition, P2 = P1 ∪ P1θ2 = (P ∪ Pθ1) ∪ (P ∪ Pθ1)θ2 = P ∪ Pθ1 ∪ Pθ2 ∪ Pθ1θ2 = P3.
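The disjointness conditions of part 2 are cheap to check if each solved-form substitution carries its L(θ) and R(θ) variable sets. A sketch follows; the Subst record and its field names are illustrative, not the thesis data structures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical representation of a solved-form substitution:
   lhs[] lists the variables on the left-hand sides of its equations,
   rhs[] lists every variable occurring on the right-hand sides. */
typedef struct {
    const char **lhs; int nlhs;
    const char **rhs; int nrhs;
} Subst;

static int disjoint(const char **a, int na, const char **b, int nb) {
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (strcmp(a[i], b[j]) == 0) return 0;
    return 1;
}

/* Conditions of Lemma 7, part 2: theta1 and theta2 commute when
   L(t1) ∩ L(t2) = L(t1) ∩ R(t2) = R(t1) ∩ L(t2) = ∅. */
int commute(const Subst *t1, const Subst *t2) {
    return disjoint(t1->lhs, t1->nlhs, t2->lhs, t2->nlhs)
        && disjoint(t1->lhs, t1->nlhs, t2->rhs, t2->nrhs)
        && disjoint(t1->rhs, t1->nrhs, t2->lhs, t2->nlhs);
}
```

When `commute` holds for the unifiers of two sibling branches, only one of the two interleavings needs to be expanded.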
□

Applying the lemma above to tree expansion, some branches can be cut off when θ1, θ2 satisfy the corresponding conditions, as illustrated in Figure 4.1.

Figure 4.1: Cutting Redundant Branches (three diagrams, one for each case of Lemma 7)

As an example, consider again the program P discussed in Chapter 3.1.1. As shown in Figure 3.1, P1, which is defined by P1 = P ∪ Pθ1, has two child nodes, node 3 and node 4, corresponding to the conflict-set unifiers θ1 and θ2. Then according to the first part of Lemma 7, there is no need to expand the node P3 = P1 ∪ P1θ1, because P3 is identical to P1. Look at node 7: since θ7 = θ1, according to the last part of Lemma 7, node 7 does not need to be expanded either. Applying Lemma 7 to the other nodes, we get the reduced partial instantiation tree shown in Figure 4.2. As can be seen, 6 out of 11 nodes are cut off.

Figure 4.2: Reduced Partial Instantiation Tree. (The root holds P0 = {p(X1, Y1), q(X1, Y1)} with T0 = {q(a, Y2), q(X2, b)} and F0 = {p(X1, Y1), q(X1, Y1)}; the unifier θ1 = {X1 = a, Y1 = Y2} leads to node 1 with P1 = P0 ∪ {p(a, Y2)}, and θ2 leads to node 2 with P2 = P0 ∪ {p(X2, b)}; of their children, only node 4 with P4 = P1 ∪ {p(X2, b)} and node 5 with P5 = P2 ∪ {p(a, Y2)} are expanded, and the remaining six nodes are cut off.)

In this chapter, we have optimized Algorithm Incr by deleting clauses in self-sustaining cycles and by ordering the clauses to be inserted. We have also optimized partial instantiation by avoiding redundant nodes in partial instantiation trees. In the next chapter, we will present experimental results evaluating the effectiveness of these optimizations.

Chapter 5

Experimental Results

In the previous chapter, we explored how Algorithm Incr and partial instantiation can be optimized.
To evaluate the effectiveness of these optimization methods, we implemented the corresponding algorithms and then conducted several series of experiments, including a comparison of the three orders for newly inserted clauses, a comparison between Algorithm IncrOptFact and SizeOpt on both definite and disjunctive programs, and, last but not least, a comparison of partial instantiation trees with and without cutting redundant nodes. The experimental results presented in this chapter show that the optimization methods can bring about significant improvements in performance.

5.1 Experiment Overview of the Incremental Algorithms

For our experimentation, we implemented Algorithm IncrOpt (and thus trivially IncrOptArb), IncrOptOrder and IncrOptFact. We also implemented two versions of Algorithm SizeOpt. One is a straightforward encoding of the algorithm presented in Chapter 3.1.2. The other tries to minimize searching by extensive indexing. Unfortunately, in all the experiments we have carried out so far, the version with extensive indexing requires so much overhead to set up and maintain the indices that the straightforward version takes much less time. Thus, for all the experimental results reported later for Algorithm SizeOpt, the straightforward version was used.

Recall that in our incremental algorithm, a DC-Graph is used to organize the deleted clauses. Each arc in the graph represents a deleted clause. However, not every deleted clause has a corresponding arc in the graph. Given a deleted clause Cl ≡ A1 ∨ ... ∨ Am ← B1 ∧ ... ∧ Bn, if none of A1, ..., Am appears in the graph, then this clause does not appear as the label of an arc. In our implementation of the incremental algorithms, we set up a virtual node so that there is an arc from the appropriate node of any atom that appears both in the heads of some clauses in Q and in the heads of some clauses in Qd. In this way, each deleted clause has a corresponding arc in the DC-Graph.
This simplifies the construction and maintenance of the DC-Graph and makes the implementation more efficient, because with the use of virtual nodes, Step 3 of Subroutine Remove can be skipped. Finally, to further speed up the maintenance of the DC-Graph, a counter is kept for each clause recording the number of times the clause appears as an arc in the graph. If this counter decreases to zero, the clause is removed from Qd and put back into Q.

In the remainder of this chapter, we report experimental results evaluating the effectiveness of our algorithms. All run-times are in milliseconds, and were obtained by running the experiments in a SPARC-LX Unix time-sharing environment.

5.2 IncrOptFact vs IncrOptOrder vs IncrOptArb

In this series of experiments, we compared the effectiveness of the heuristics described in Chapter 4.2. The following results are very representative of all the experiments we conducted. In this particular experiment, the input program P consists of 20 disjunctive clauses, all of the form

A1 ∨ ... ∨ Am ← B1 ∧ ... ∧ Bn.

Here m is less than 5 and n less than 10, which means at most 5 atoms appear in the head of each clause and at most 10 in the body. An atom is represented by an integer ranging from 1 to 15. The numbers of head and body atoms in each clause, as well as the atoms themselves, are randomly generated. The times below are the times taken by each algorithm to process P.

             IncrOptFact   IncrOptArb   IncrOptOrder
  time (ms)      3.5           3.6          150.6

Recall that IncrOptOrder tries to minimize the number of times Subroutine Remove needs to be called by first inserting the facts and then partially ordering the insertion of the remaining clauses. As clearly shown above, the strategy backfires, as it requires too much overhead. Inserting a set of clauses in arbitrary order, as shown in the second column of the table, performs surprisingly well.
However, IncrOptFact is considered the best, not so much because it outperforms IncrOptArb by a wide margin, but rather because it is very simple to implement and almost always performs better than IncrOptArb. In the remainder of this chapter, we will only report the results of IncrOptFact.

5.3 Same Number of Disjunctive Clauses: IncrOptFact vs SizeOpt

In this series of experiments, we compared the effectiveness of our incremental algorithm IncrOptFact with the original algorithm SizeOpt on different sets of clauses. Each clause set has 20 disjunctive clauses generated in the same way as in Section 5.2. In particular, clause set 1 is the same set used in Section 5.2. For both Algorithm IncrOptFact and SizeOpt, we report:

1) T1: the time taken to process the 20 clauses
2) N: the number of clauses deleted
3) T2: the time taken to compute the minimal models
4) T3: the total time, i.e., the sum of T1 and T2

           Clause Set 1           Clause Set 2           Clause Set 3
        IncrOptFact  SizeOpt   IncrOptFact  SizeOpt   IncrOptFact  SizeOpt
  T1       3.54        0.33       3.65        0.43       3.72        0.35
  N        19          0          13          0          16          0
  T2       49.17       83.61      78.95       119.52     60.88       104.32
  T3       52.71       83.94      82.60       119.95     64.60       104.67

For just the time taken to process the 20 clauses, our incremental algorithm IncrOptFact takes more time than SizeOpt, primarily for maintaining DC-Graphs. But as shown in the table, the extra time is worth spending, because IncrOptFact manages to delete more clauses than SizeOpt. This is due to the fact that, as described in Chapter 4.1, IncrOptFact deletes all the clauses in self-sustaining cycles. SizeOpt deletes no clauses here because for disjunctive programs, each clause has multiple heads, making it hard to find an atom that does not appear in any of these heads. Consequently, the times taken by the two algorithms to find the (same collection of) minimal models differ by a wide margin. This demonstrates the importance of deleting more rules, whose impact is multiplied in model computations.
In the end, the total time taken by IncrOptFact is only about 60%-70% of the time taken by SizeOpt.

Due to the lack of proper real-world deductive databases, the clause sets used in these series of experiments are randomly generated. The effect of Algorithm IncrOptFact on real-world deductive databases remains to be examined.

5.4 Same Number of Definite Clauses: IncrOptFact vs. SizeOpt

Based on the results of the previous set of experiments for disjunctive clauses, we can surely predict that for definite clauses, IncrOptFact again outperforms SizeOpt. Moreover, Lemma 6 gives a stronger reason to believe that IncrOptFact will perform even better. The lemma shows that for definite clauses, our incremental algorithms can obtain the least model simply by collecting the heads of all the clauses not deleted. Indeed, our belief is confirmed by this series of experiments, in which each test program contains 100 randomly generated definite clauses. The following table reports the run-times for a typical program.

The processing time taken by IncrOptFact is longer than that taken by SizeOpt. But again IncrOptFact deletes many more clauses, and requires a minimal amount of time to obtain the least model. In contrast, SizeOpt is much less effective in deleting clauses, and requires the invocation of the least model solver, whose run-time dominates the entire process.

                                         IncrOptFact   SizeOpt
  processing time for 100 clauses (ms)       9.22        0.73
  rules deleted                              89          17
  time to find least model (ms)              5.76        580.06
  total time taken (ms)                      14.98       580.79

5.5 Partial Instantiation Trees: IncrOptFact vs. SizeOpt

Thus far, we have only compared IncrOptFact with SizeOpt in situations where both algorithms are required to process the same number of clauses. But recall that our incremental algorithms are designed for a slightly different purpose: to expand partial instantiation trees efficiently.
As described in Chapter 3.1.1, if the program P in a node N gives rise to conflict-set unifiers θ1, ..., θm, then N has m child nodes, each corresponding to P ∪ Pθi. Thus, as shown in Figure 3.2, the acid test of the effectiveness of our incremental algorithms is the time they take to process the clauses in Pθi. Given the results of the previous series of experiments, we expect IncrOptFact to outperform SizeOpt even more in the expansion of partial instantiation trees. As an example, we fully expand the instantiation tree of the program discussed in Chapter 3.1.2 using both algorithms. Note that this example first appeared in other papers ([14], [15]), so it is rather representative.

By applying the heuristics for avoiding redundant node expansion discussed in Chapter 4.3, our algorithm only needs to process 5 nodes, as shown in Figure 4.2, compared with the 11 that would be needed otherwise. This demonstrates the usefulness of the heuristics. The following table compares IncrOptFact with SizeOpt for the expansion of the 5 nodes only. In other words, the total run-time taken by SizeOpt to expand all 11 nodes would be even higher than the times recorded below. Each entry in the table gives two run-times: i) the time taken to process the clauses in Pθi, and ii) the time taken to find the least model.

                                         IncrOptFact     SizeOpt
  Node 1 (ms)                            0.67/5.47       0.33/45.88
  Node 2 (ms)                            0.02/5.57       0.34/45.86
  Node 3 (ms)                            0.02/5.57       0.34/53.95
  Node 4 (ms)                            0.02/5.49       0.34/49.49
  Node 5 (ms)                            0.02/5.57       0.34/52.88
  total (processing/model solving)       0.75/27.67      1.69/247.76
  total (processing + model solving)     28.42           249.45

As expected, the processing time of IncrOptFact for the first node is relatively long (0.67 ms), whereas the processing times for subsequent nodes are much shorter (0.02 ms). This reflects the benefit of being incremental.
In the end, the total processing time of IncrOptFact is 0.75 ms, less than 50% of that of SizeOpt. Furthermore, as shown in the previous experiments, IncrOptFact requires much less time for finding least models. The total time taken to expand the 5 nodes using IncrOptFact is merely 10% of the time taken using SizeOpt in this example.

5.6 Partial Instantiation Trees: With and Without Applying Lemma 7

As shown before, Lemma 7 provides three sufficient conditions for predicting redundant nodes. This saves us the processing time of expanding these redundant nodes when generating partial instantiation trees. However, applying Lemma 7 needs extra processing time to check, at each node, whether any of the three conditions is satisfied. Taking both the gains and the overhead into consideration, what will the overall performance be? To answer this question, we again use the example discussed in Chapter 3.1.1.

As we have already seen, by applying Lemma 7 to this example, the number of nodes in the partial instantiation tree can be reduced from 11 to 5. The following table shows the total processing time IncrOptFact takes to expand the 5 nodes using Lemma 7 versus the time it takes to expand the 11 nodes without using Lemma 7. Unlike the experiments above, the total processing time here includes the I/O time, i.e., reading in the logic program and printing out the facts deduced.

              with optimization   without optimization
  time (ms)        108.53               130.67

As can be seen, even with the overhead included, about 17% of the total processing time can be saved by using Lemma 7 to cut down redundant nodes.

In this chapter, we have seen that among the three orders for inserting new clauses, Algorithm IncrOptFact, i.e., processing the facts first, gives the best performance. When compared with the original algorithm SizeOpt, IncrOptFact works better in all the experiments we have conducted. Cutting the redundant nodes in partial instantiation trees can also bring about significant improvements.
In the next chapter, we will report how we implemented the whole framework of partial instantiation.

Chapter 6

Implementation Details of Partial Instantiation

We have implemented the framework of the partial instantiation system and integrated it with the optimization methods developed in this thesis. The implementation is written in the C language, running under the UNIX environment, and has roughly 3,000 lines of code. In this chapter, we discuss the major modules and data structures of our implementation.

As introduced in Chapter 1, there is a unique least model for a definite database, and one or more minimal models for a disjunctive database. Due to the lack of a proper minimal model solver at this stage, our implementation only handles definite databases. But it is easy to extend the current framework to handle disjunctive databases when a minimal model solver becomes available: all we need to do is replace the least model solver with the minimal model solver. The interface remains the same.

6.1 General Picture

The system takes a logic program P as input and returns the partial instantiation tree of P along with all the derived facts. A depth-first strategy is chosen for expanding tree nodes, for time and space considerations.

First let us look at what happens at each node. Suppose a node N is expanded through unifier θ, and the logic program associated with its parent is P.

1. Get the new program PN = P ∪ Pθ.
2. Apply the IncrOpt algorithm to the new clauses in PN to get the reduced program.
3. Apply the minimal model solver to PN to get the T and F sets.
4. From T and F, get the conflict unifier set.
5. Cut unifiers by Lemma 7, described in Chapter 4, to avoid redundant nodes.
6. Check whether the current node is a leaf node. If yes, stop. Otherwise, for every θ in the conflict unifier set, create a corresponding child node.

This expansion continues recursively until a node is a leaf node.
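The per-node steps above fit a compact recursive skeleton. In this sketch the whole of steps 1-5 is abstracted into a callback that returns the number of surviving conflict-set unifiers, with zero marking a leaf; the interface is illustrative, not the actual module boundaries of the implementation.

```c
#include <assert.h>

/* 'process' stands for steps 1-5 at one node: build P ∪ Pθ, apply
   IncrOpt, run the model solver, derive and cut the conflict-set
   unifiers. It returns how many child nodes remain to be created. */
typedef int (*ProcessFn)(int depth);

/* Depth-first expansion (step 6): recurse into each surviving child.
   Returns the number of nodes visited. */
int expand(int depth, ProcessFn process) {
    int children = process(depth);
    int visited = 1;                    /* count this node */
    for (int i = 0; i < children; i++)
        visited += expand(depth + 1, process);
    return visited;
}

/* Toy driver: two children at the root, one grandchild per child. */
int toy_process(int depth) {
    if (depth == 0) return 2;
    if (depth == 1) return 1;
    return 0;
}
```

With the toy driver, the root, its two children and their two grandchildren give five visited nodes in depth-first order.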
A node is a leaf node when either of the following conditions is satisfied:

• there are no unifiers at this node;
• this node is a copy of one of its ancestors, i.e., the programs, T sets and F sets of the two nodes are identical.

Considering the major steps involved at each node, we can see that Step 2, i.e., applying the IncrOpt algorithm, has been studied thoroughly in the previous chapters. The implementation of Steps 1, 3, 4 and 5 will be discussed step by step later in this chapter. Before that, we first discuss the supporting data structures.

6.2 Data Structures

In the following, we discuss the major data structures used in the system and the considerations behind them.

6.2.1 Term and Term Table

The term is the most basic element in a deductive database. The definition of a term is recursive: a term can be a constant, a variable, or of the form f(t1, t2, ..., tn), where f is a function name of arity n and each ti is a term. The following are the fields of the structure that represents terms.

• type: an integer indicating the type of the term. 0: CONSTANT; 1: VARIABLE; 2: FUNCTION.
• arity: an integer.
• name: a string. We adopt the Prolog convention of denoting variables by strings of characters starting with an upper-case letter, and constant names by strings of characters starting with a lower-case letter or by integers.
• subTerms: pointer to the first parameter of a function term. This field is valid only if the term's type is FUNCTION and its arity is not 0.
• nextTerm: pointer to the next term. It is used to link the remaining parameters of a function term; it is meaningful only for the arguments of a FUNCTION term of arity at least 2.
• unifiedTerm: an integer which is the index of the term derived by applying a unifier to this term. This field is used for replacement.

As an example, Figure 6.1 shows how the term f(a, g(X, Y)) can be represented using the structure above. All the terms are organized in a global table, the term table.
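In C, the term structure described above might look as follows; the enum values and the f(a, g(X, Y)) example mirror Figure 6.1, while the exact field types are our guesses.

```c
#include <assert.h>

enum { CONSTANT = 0, VARIABLE = 1, FUNCTION = 2 };

/* The term structure; field names follow the text. */
typedef struct Term {
    int type;               /* CONSTANT, VARIABLE or FUNCTION */
    int arity;
    const char *name;       /* "X" variable, "a"/"f" constant/function */
    struct Term *subTerms;  /* first argument, if type == FUNCTION */
    struct Term *nextTerm;  /* next argument in the parent's list */
    int unifiedTerm;        /* term-table index after unification */
} Term;

/* f(a, g(X, Y)) built with the linked-argument representation:
   f -> subTerms -> a -> nextTerm -> g -> subTerms -> X -> nextTerm -> Y */
Term Y = { VARIABLE, 0, "Y", 0, 0,  -1 };
Term X = { VARIABLE, 0, "X", 0, &Y, -1 };
Term g = { FUNCTION, 2, "g", &X, 0, -1 };
Term a = { CONSTANT, 0, "a", 0, &g, -1 };
Term f = { FUNCTION, 2, "f", &a, 0, -1 };
```

Arguments are linked through nextTerm rather than stored in an array, so a function term of any arity needs only the one subTerms pointer.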
Actually, each term is identified by its index number in the term table. At the root node, the term table contains all the terms of the original program. When a child node is created, new terms might be generated via unification.

Figure 6.1: The Structure of Term f(a, g(X, Y)). (The FUNCTION term f points through subTerms to its first argument a, whose nextTerm field points to the FUNCTION term g; g in turn points to the variables X and Y.)

The new terms are added to the end of the term table. After the child node is done, these newly generated terms can be thrown away, because they are not useful for other nodes.

As we can see, all insertion and deletion operations on the term table happen at the end of the table. Therefore we use stacks to represent it. A stack is an array of pointers, with a stack pointer pointing to the last currently active element. Any element behind this pointer is a NULL pointer, and any element before this pointer points to the corresponding structure (clause, atom or term). We chose this data structure for the following reasons:

Memory considerations: Compared to an ordinary array, i.e., one in which each element is an actual structure, this data structure uses less memory. It allocates memory for an element only when the element becomes active; otherwise, an element takes only four bytes, for a NULL pointer. When an active element becomes inactive, the memory allocated to it can be released.

Efficiency considerations: Compared to a linked list, this structure is more efficient as well as easily implemented. An element can be accessed directly by its index into the array.
And when an element is inserted or deleted, all that needs to be done is to increase or decrease the stack pointer.

6.2.2 Atom and Atom Table

An atom is of the form p(t1, t2, ..., tn), where p is a predicate name of arity n and each ti is a term. The following are the fields of the atom structure.

• name: a string of characters.
• arity: an integer.
• terms[MAXATOMARITY]: an array of integers. The ith (i < arity) integer is the index of the ith term in the term table. In this way we only need to store one copy of any term.
• unifiedAtom: an integer.

All the atoms are organized in a global atom table, which is very similar to the term table. For the same considerations as the term table, the atom table is actually a stack. Each atom is identified by its index in the atom table.

6.2.3 Clause and Clause Table

A clause is a statement of the form

A1 ∨ A2 ∨ ... ∨ Am ← B1 ∧ B2 ∧ ... ∧ Bn.

where the Ai's and Bi's are atoms. The Ai's are called the head of the clause, and the Bi's are called the body.

Each atom in a clause is actually an integer corresponding to the atom's index in the atom table. Consider that when an old program P is expanded to a new program P ∪ Pθ, we need to compare each clause in Pθ with all the existing clauses in P to find the new clauses. To make this comparison faster, the atom indices in a clause head and in a clause body are kept in increasing order.

• head_number: an integer.
• heads[MAXHEAD]: indices into the atom table.
• body_number: an integer.
• bodies[MAXBODY]: indices into the atom table.

Again, all the clauses are organized in a global clause table. Similar to the term table and the atom table, the clause table is a stack.
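A sketch of the clause record and of the positional comparison enabled by keeping the atom indices sorted; the MAX limits and the function name are illustrative.

```c
#include <assert.h>

#define MAXHEAD 5
#define MAXBODY 10   /* illustrative limits */

/* Clause record; atoms are stored as indices into the global atom table,
   kept in increasing order so two clauses can be compared positionally. */
typedef struct {
    int head_number;
    int heads[MAXHEAD];
    int body_number;
    int bodies[MAXBODY];
} Clause;

/* Because both index arrays are sorted, equality is a single pass:
   the nth atom of one clause is compared only with the nth of the other. */
int same_clause(const Clause *c1, const Clause *c2) {
    if (c1->head_number != c2->head_number ||
        c1->body_number != c2->body_number)
        return 0;
    for (int i = 0; i < c1->head_number; i++)
        if (c1->heads[i] != c2->heads[i]) return 0;
    for (int i = 0; i < c1->body_number; i++)
        if (c1->bodies[i] != c2->bodies[i]) return 0;
    return 1;
}
```

Without the sorted order, each atom of one clause would have to be searched for among all the atoms of the other, turning the linear pass into a quadratic one.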
Each clause is identified by its index in the clause table.

6.2.4 Tree

A tree node contains the following fields:

• parent: pointer to its parent node.
• children: array of pointers to its children.
• childNo: index into the children array of its parent.
• uSet: pointer to the set of unifiers of this node.
• truthSet: pointer to the T set.
• falseSet: pointer to the F set.
• clauseNum: index of the last active clause in the clause table.
• atomNum: index of the last active atom in the atom table.
• termNum: index of the last active term in the term table.
• rules: the rule part of the DC-Graph in the incremental program.
• atoms: the atom part of the DC-Graph in the incremental program.

Looking into a tree node, we find that much of the information in it, such as the T set, the F set and the DC-Graph, consists of intermediate results produced while processing the node. Once the node is fully expanded, these pieces of information are not useful any more. In view of this, we define most of the fields of a node as pointers, so that the memory allocated to them can be released once the node is done. This saves a lot of memory.

So far, we have introduced the global data structures of our implementation. In the rest of this chapter, we discuss the major steps described in Section 6.1 on a step-by-step basis.

6.3 Generating New Clauses

One important step in expanding a node is: given a program P and a unifier θ, how do we get the new program P ∪ Pθ? A transformation of this problem is: given P and θ, how do we get the new clauses generated by applying θ to P? Once we have the new clauses, we can get the new program by simply adding them to the end of the original program P. The following three steps are needed to obtain the new clauses generated by Pθ.

• For every term t in the term table, get its unified term tθ. Three situations are possible: tθ is t itself, or it is some other existing term in the term table, or it is a totally new term.
In the third case, the new term is added to the end of the term table. Recall that for any term t, there is a field called "unifiedTerm". This field is used to store the index of the term tθ.

• For every atom a in the atom table, get its unified atom aθ. aθ is obtained simply by replacing a's terms with the corresponding unified terms. As with terms, aθ could be a itself, some other existing atom in the atom table, or a totally new atom. The "unifiedAtom" field in the atom structure stores the index of the atom aθ. In the third case, the new atom is added to the end of the table.

• For every clause Cl in P, get its unified clause Clθ. Clθ is obtained simply by rewriting each atom in Cl with its unified atom. Clθ is compared with every clause in the clause table. If it is different from all of them, it is a new clause and is added to the end of the clause table. Otherwise it is ignored.

It can be seen that a lot of comparisons are required here. In order to check whether a term/atom/clause is new, we need to compare it with each and every term/atom/clause in the term/atom/clause table. As discussed before, these tables throw away all the obsolete elements when a node is done. In this way we minimize the size of the tables, making the comparisons more efficient. Furthermore, we keep the atoms in a clause in increasing order. When we compare two clauses Cl1 and Cl2, we only need to compare the nth (1 ≤ n ≤ N, where N is the total number of atoms in Cl1) atom of Cl1 with the nth atom of Cl2, saving the trouble of comparing the nth atom of Cl1 with each and every atom of Cl2.

6.4 Least/Minimal Model Solver

The least/minimal model solver takes a definite/disjunctive program P as input and returns the T set and F set of P. As discussed before, by applying the incremental algorithm IncrOpt, the least model of a definite program can be acquired easily: an atom is in the T set if it is the head of some retained clause; otherwise it is in the F set. So the least model solver module is rather simple to implement. All it does is collect the heads of the retained clauses and put these atoms in the T set. The rest of the atoms are put in the F set.

Due to the lack of a proper minimal model solver at this stage, our implementation cannot handle disjunctive programs thus far. However, once a minimal model solver becomes available, our system can very easily be extended to handle the disjunctive case, because we took disjunctive databases into consideration when we designed the data structures and implemented the other parts of the system. For example, the structure of a clause includes an integer field that indicates the number of atoms in the head of the clause. In the definite case this number remains 1, meaning that every clause has exactly one head atom; for a disjunctive clause, the number may be greater than or equal to 1. There is no need to modify the clause structure itself. So the extension from the definite case to the disjunctive case can be achieved simply by replacing the least model solver with any minimal model solver. All the other parts remain unchanged.

6.5 Unification

Definition [34] (Unifier). Two atoms (or expressions) A and B are unifiable if there is a substitution θ such that Aθ = Bθ. The substitution θ is called a unifier for A and B. It is called a most general unifier if for each unifier η for A and B there exists a substitution γ such that η = θγ. □

More directly, the unification problem can be represented as the solution of a system of equations. That is, for a set of equations

t′j = t″j,  j = 1, ..., k,
a unifier θ is any solution that makes all the pairs of terms t′j, t″j identical simultaneously.

6.5.1 Algorithm UNIFY: an Efficient Unification Algorithm

In partial instantiation, computing the unifiers for the T set and F set is the key step at each tree node, and thus the performance of the unification algorithm crucially affects the overall efficiency. Many studies in the literature have sought efficient unification algorithms [35] [36] [37] [34]. We found that the algorithm developed by Alberto Martelli and Ugo Montanari in [34] was shown to perform better in all cases than other well-known algorithms. This is the algorithm we chose and implemented; in the following we call it Algorithm UNIFY.

To understand the algorithm, we should first understand the following concepts.

• Multiequation. A multiequation can be seen as a way of grouping many equations together. To represent multiequations we use the notation S = M, where the left-hand side S is a nonempty set of variables and the right-hand side M is a multiset of nonvariable terms.

• Common part. The common part of a multiset of terms M is a term which, intuitively, is obtained by superimposing all the terms of M and taking the part that is common to all of them, starting from the root.

• Frontier. The frontier of a multiset of terms is a set of multiequations, where every multiequation is associated with a leaf of the common part and consists of all the subterms (one for each term of M) corresponding to that leaf.

• Compactification. All multiequations belonging to the same equivalence class should be transformed into a single multiequation by taking the union of their left- and right-hand sides. This merging process is called compactification.
For instance, given the multiset of terms

f(x1, g(a, f(x5, b))),
f(h(c), g(x2, f(b, x5))),
f(h(x4), g(x6, x3)),

the common part of these three terms is

f(x1, g(x2, x3)).

The frontier is

{x1} = (h(c), h(x4)),
{x2, x6} = (a),
{x3} = (f(x5, b), f(b, x5)).

Algorithm UNIFY [34]
input: U — the multiequations that need to be unified.
output: T — the most general unifier of U; initialized as empty.

1. Select a multiequation S = M of U such that the variables in S do not occur elsewhere in U. If no such multiequation exists, STOP with failure.

2. If M is empty, then transfer this multiequation from U to the end of T. Else:
   i. compute the common part C and the frontier F of M. If M has no common part, STOP with failure.
   ii. transform U using multiequation reduction on the selected multiequation, followed by compactification.
   iii. transfer the multiequation S = (C) from U to the end of T.

3. If U is empty, stop with success; else go back to step 1. □

6.5.2 Implementation of Algorithm UNIFY

In our implementation, we represent a unifier as a set of special multiequations. For each multiequation, the left-hand side is a set of variables and the right-hand side is a term. We keep the variables on the left-hand side of any multiequation in alphabetical order for efficiency reasons (to be discussed in the next section).

Given a multiset of terms M, the computation of its common part C(M) and frontier F(M) is recursive.

1. If all the terms in M are the same constant c, then C(M) = c and F(M) = ∅.

2. Else, if there exists t ∈ M such that t is a variable, then
   (i) C(M) = t;
   (ii) the left-hand side of F(M) is the set of all variables in M, and the right-hand side of F(M) is the multiset of all terms in M which are not variables.

3. Else, if all the terms in M have the same function symbol f and the same arity n, then
   (i) let Mi = {t | t is the ith argument of some term T ∈ M}, for i = 1, ..., n;
   (ii) C(M) = f(C(M1), ..., C(Mn));
   (iii) F(M) = ∪i F(Mi).

4. Else, failure.

Compactification involves a lot of term comparisons.
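Because the variables on the left-hand side of every multiequation are kept sorted, both the shared-variable test that triggers compactification and the union of two left-hand sides take a single linear pass. The following C sketch assumes variables are interned as integer indices whose numeric order matches the alphabetical order used above; all names here are illustrative rather than taken from the thesis's code:

```c
#include <assert.h>

/* Return 1 if the sorted arrays a[0..na) and b[0..nb) share an element,
   i.e. the two multiequations must be compacted into one. */
int share_variable(const int *a, int na, const int *b, int nb) {
    int i = 0, j = 0;
    while (i < na && j < nb) {
        if (a[i] == b[j]) return 1;
        if (a[i] < b[j]) i++; else j++;
    }
    return 0;
}

/* Merge two sorted variable sets into out (union, duplicates removed);
   returns the size of the union. out must hold at least na + nb entries. */
int merge_variables(const int *a, int na, const int *b, int nb, int *out) {
    int i = 0, j = 0, n = 0;
    while (i < na || j < nb) {
        if (j == nb || (i < na && a[i] < b[j]))
            out[n++] = a[i++];          /* only in a */
        else if (i == na || b[j] < a[i])
            out[n++] = b[j++];          /* only in b */
        else {
            out[n++] = a[i++]; j++;     /* shared variable: emit once */
        }
    }
    return n;
}
```

Had the variable sets been left unsorted, each of these operations would need a quadratic all-pairs comparison instead of one linear scan.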
Term comparison is again a recursive process. To compact two multiequations S1 = M1 and S2 = M2, we first check whether there exists a variable v that appears in both S1 and S2. If so, the merged multiequation is S = M, where S = S1 ∪ S2 and M = M1 ∪ M2. In order to make these comparisons faster, we order the variables in S1 and S2 alphabetically before we start compactification.

6.6 Cutting Unifiers

Recall that in Section 4.3, Figure 4.1 shows three situations in which a unifier leads to redundant nodes. Such unifiers should be removed from the unifier set of a node in order to avoid redundant node expansion. This step is rather straightforward, involving a lot of comparisons among unifiers. As discussed before, in every multiequation of a unifier, the variables on the left-hand side are kept in alphabetical order, which makes the comparison of unifiers easier and more efficient.

In this chapter, we discussed the implementation of the framework of partial instantiation. We first introduced the global data structures and the considerations behind them. Then we discussed, on a step-by-step basis, the major steps required to expand a tree node. In the next chapter, we present the conclusions of the thesis, including a thesis summary and future work.

Chapter 7

Conclusions

7.1 Thesis Summary

Methods to compute minimal models of disjunctive programs are becoming increasingly important as an intermediate step in query evaluation. Partial instantiation is a newly developed method to compute minimal models on an "instantiation-by-need" basis. The objective of this thesis is to study how to optimize the expansion of partial instantiation trees for computing minimal and least models.

Towards this goal, we have developed Algorithm Incr to reduce the size of deductive databases.
Incr aims to be incremental in the sense that when new clauses are added to the database, only those new clauses need to be handled to obtain the new set of reduced clauses. A DC-Graph that organizes the deleted clauses is used to achieve this incrementality. Incr is formally proved to be incremental.

We have also optimized Incr into IncrOpt, which deletes clauses in self-sustaining cycles. Compared to Incr, IncrOpt not only deletes more clauses, but also solves the program directly when applied to definite programs. For further optimization, we examined different orders for the clauses to be inserted, which leads to several algorithms. Experimental results indicate that IncrOptFact, which handles the facts ahead of ordinary clauses, gives the best performance. Most importantly, when compared with the original algorithm SizeOpt, IncrOptFact can give very significant improvements in run-time efficiency.

Besides the development of Incr, we have also optimized the expansion of partial instantiation trees. We have designed three rules that cut down on redundant node expansions. Experimental results show that this optimization further improves the efficiency of partial instantiation.

Finally, we have implemented the whole framework of partial instantiation, integrated with the optimization methods developed in this thesis. The implementation takes a deductive database as input and returns its complete partial instantiation tree.

7.2 Future Work

Techniques to further optimize the run-time performance of partial instantiation remain to be explored. The following are some open problems in this direction.

7.2.1 Cutting Redundant Nodes

In this thesis, we have developed three rules to cut down on redundant nodes. These rules cut down only some of the redundant nodes, not all of them. In extreme cases, redundant nodes could make the expansion process very inefficient.
Thus a deeper study of redundant nodes (what causes them and how to prevent them from being generated) would greatly improve the efficiency of partial instantiation.

7.2.2 Order for Node Expansion

A partial instantiation tree can be generated in various orders, which may perform differently. Two general strategies for expanding a tree are breadth-first and depth-first. Which method is preferable for generating partial instantiation trees? Furthermore, at a node with more than one child, the order in which the children are expanded may also affect performance. Which child should be expanded first? Does a best order exist? If so, how can it be found? These are interesting questions yet to be answered.

7.2.3 Tree Maintenance

Every deductive database is associated with a partial instantiation tree, and generating these trees is usually expensive. This leads us to the idea of dynamically maintaining a tree as the original database undergoes changes. In other words, instead of generating a new tree each time the database is updated, we want to modify the original tree to adapt it to the updated database. This problem can be split into four smaller problems: a) adding a new fact; b) adding a new clause; c) removing an existing fact; and d) removing an existing clause. More research can be done to find out what kinds of modifications should be made to the original tree when one or more of the above situations occur.

7.2.4 Partial Tree

In situations where it is undesirable or too costly to generate an entire partial instantiation tree, we need to generate part of the tree selectively. There are different criteria for the selection: for example, generating as many nodes as possible given a fixed amount of time and memory, or generating as many facts as possible given a fixed number of nodes. Which criterion is more reasonable? How can a given criterion be satisfied?
These questions are open for further research.

Bibliography

[1] Jack Minker. Perspectives in deductive databases. Journal of Logic Programming, 5:33–60, 1988.
[2] J. W. Lloyd. Foundations of Logic Programming. Springer, New York, second edition, 1987.
[3] S. Ceri, G. Gottlob, and L. Tanca. Logic Programming and Databases. Springer-Verlag, New York, 1990.
[4] K. Clark. Negation as failure. Logic and Data Bases, pages 293–322, 1978.
[5] F. Bancilhon and R. Ramakrishnan. An amateur's introduction to recursive query processing strategies. Proc. ACM-SIGMOD, pages 16–52, 1986.
[6] J. Minker. On indefinite databases and the closed world assumption. Lecture Notes in Computer Science 138, pages 292–308, 1982.
[7] M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. Proc. 5th International Conference and Symposium on Logic Programming, pages 1070–1080, 1988.
[8] A. Van Gelder, K. Ross, and J. Schlipf. Unfounded sets and well-founded semantics for general logic programs. Proc. 7th Symposium on Principles of Database Systems, pages 221–230, 1988.
[9] C. Bell, A. Nerode, R. Ng, and V.S. Subrahmanian. Implementing deductive databases by linear programming. Proc. ACM-PODS, pages 283–291, 1992.
[10] C. Bell, A. Nerode, R. Ng, and V.S. Subrahmanian. Mixed integer programming methods for computing nonmonotonic deductive databases. To appear in: Journal of the ACM, 1994.
[11] A. Nerode, R. Ng, and V.S. Subrahmanian. Computing circumscription by linear programming. To appear in: Information and Computation, 1994.
[12] V. Chandru and J. Hooker. Extended Horn sets in propositional logic. Journal of the ACM, 38(1):205–221, 1991.
[13] R.E. Jeroslow. Computation-oriented reduction of predicate to propositional logic. Decision Support Systems, 4:183–187, 1988.
[14] V. Kagan, A. Nerode, and V.S. Subrahmanian. Computing definite logic programs by partial instantiation. To appear in: Annals of Pure and Applied Logic, 1994.
[15] V. Kagan, A. Nerode, and V.S. Subrahmanian. Computing minimal models by partial instantiation. Draft manuscript, 1994.
[16] S. Shapiro and D. McKay. Inference with recursive rules. Proc. 1st Annual National Conference on Artificial Intelligence, August 1980.
[17] S. Shapiro, J. Martins, and D. McKay. Bi-directional inference. Proc. 4th Annual National Conference of the Cognitive Science Society, 1982.
[18] D. McKay and S. Shapiro. Using active connection graphs for reasoning with recursive rules. Proc. 7th International Joint Conference on Artificial Intelligence, 1981.
[19] C. Chang. On the evaluation of queries containing derived relations in relational databases. Advances in Data Base Theory, 1:235–260, 1981.
[20] G. Marque-Pucheu, J. Martin-Gallausiaux, and G. Jomier. Interfacing Prolog and relational database management systems. New Applications of Databases, 1984.
[21] F. Bancilhon. Naive evaluation of recursively defined relations. On Knowledge Base Management Systems - Integrating Database and AI Systems, 1985.
[22] R. Bayer. Query evaluation and recursion in deductive database systems. Unpublished manuscript, 1985.
[23] F. Bancilhon, D. Maier, Y. Sagiv, and J. Ullman. Magic sets and other strange ways to implement logic programs. Proc. ACM-PODS, pages 1–15, 1986.
[24] F. Bancilhon and R. Ramakrishnan. Magic sets: Algorithms and examples. Unpublished manuscript, March 1986.
[25] D. Sacca and C. Zaniolo. On the implementation of a simple class of logic queries for databases. Proc. 5th ACM SIGMOD-SIGACT Symposium on Principles of Database Systems, 1986.
[26] J. Blakeley, N. Coburn, and P. Larson. Updating derived relations: Detecting irrelevant and autonomously computable updates. ACM TODS, 14(3):369–400, 1989.
[27] J. Blakeley, P. Larson, and F. Tompa. Efficiently updating materialized views. Proc. ACM-SIGMOD, pages 61–71, 1986.
[28] S. Ceri and J. Widom. Deriving production rules for incremental view maintenance. Proc. VLDB, pages 577–589, 1991.
[29] G. Dong and R. Topor. Incremental evaluation of datalog queries. Proc. ICDT, 1992.
[30] A. Gupta, I. Mumick, and V.S. Subrahmanian. Maintaining views incrementally. Proc. ACM-SIGMOD, pages 157–166, 1993.
[31] J. Harrison and S. Dietrich. Maintenance of materialized views in a deductive database: An update propagation approach. Workshop of JICSLP, 1992.
[32] O. Shmueli and A. Itai. Maintenance of views. Sigmod Record, 14(2):240–255, 1984.
[33] O. Wolfson, H. Dewan, S. Stolfo, and Y. Yemini. Incremental evaluation of rules and its relationship to parallelism. Proc. ACM-SIGMOD, pages 78–87, 1991.
[34] A. Martelli and U. Montanari. An efficient unification algorithm. ACM Transactions on Programming Languages and Systems, 4(2):258–282, April 1982.
[35] L.D. Baxter. A practically linear unification algorithm. Res. Rep. CS-76-13.
[36] R.S. Boyer and J.S. Moore. The sharing of structure in theorem-proving programs. Machine Intelligence, 7:101–116, 1972.
[37] A. Martelli and U. Montanari. Unification in linear time and space: A structured presentation. Internal Rep. B76-16, 1976.