Dynamic Bayesian Networks by Michael C. Horsch A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in The Faculty of Graduate Studies Department of Computer Science We accept this thesis as conforming to the required standard University of Br i t i sh Columbia October 1990 © M i c h a e l C . Horsch, 1990 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. \ Department The University of British Columbia Vancouver, Canada Date 12. QCTn&FfiL i^9n DE-6 (2/88) Abstract Given the complexity of the domains for which we would like to use computers as reason-ing engines, an automated reasoning process w i l l often be required to perform under some state of uncertainty. Probabi l i ty provides a normative theory wi th which uncertainty can be modelled. Wi thout assumptions of independence from the domain, naive computations of probabili ty are intractible. If probability theory is to be used effectively in A I applica-tions, the independence assumptions from the domain should be represented explicitly, and used to greatest possible advantage. One such representation is a class of mathematical structures called Bayesian networks. This thesis presents a framework for dynamically constructing and evaluating Bayesian networks. In particular, this thesis investigates the issue of representing probabilistic knowledge which has been abstracted from particular individuals to which this knowl-edge may apply, resulting in a simple representation language. This language makes the independence assumptions for a domain explicit . A simple procedure is provided for bui lding networks from knowledge expressed in this language. The mapping between the knowledge base and network created is precisely defined, so that the network always represents a consistent probabili ty distr ibution. Final ly , this thesis investigates the issue of modifying the network after some evaluation has taken place, and several techniques for correcting the state of the resulting model are derived. n Contents Abstract i Contents iii List of Tables vii List of Figures viii Acknowledgements x 1 Introduction 1 1.1 Bayesian Networks 2 1.2 A dynamic approach to using Bayesian networks 5 1.2.1 Motivation 5 1.2.2 The approach 7 1.2.3 The realization of this approach 7 1.3 Related work 9 1.3.1 Inference using Bayesian networks 9 1.3.2 Other work on dynamic construction of Bayesian networks 10 1.4 The main contributions of this thesis 12 iii 1.5 An outline of this thesis 13 2 Dynamic Bayesian networks 14 2.1 The background knowledge base 14 2.1.1 Schemata 15 2.1.2 Ambiguities in schemata . 18 2.2 Combination nodes 22 2.2.1 Existential Combination 22 2.2.2 Universal Combination 24 2.2.3 Summary 26 2.3 Creating Bayesian networks 26 3 Using Dynamic Bayesian Networks 28 3.1 Influence: An interpreter for dynamic Bayesian networks 29 3.1.1 Declarations 29 3.1.2 Schemata 30 3.1.3 Combination Nodes 32 3.1.4 Creating networks 32 3.1.5 Queries 33 3.2 Diagnosis of multiple faults for digital circuits 33 3.3 Interpreting sketch maps 36 3.3.1 Sketch maps 38 3.3.2 A probabilistic knowledge base 38 3.4 The Burglary of the House of Holmes 41 3.5 Building knowledge bases for Dynamic Bayesian networks 47 3.6 Conclusions 50 iv 4 Implementing dynamic Bayesian networks 51 4.1 The general problem 51 4.2 A n Axiomat iza t ion for Probabil ist ic Reasoning in Prolog 54 4.2.1 The basic computational engine 54 4.2.2 A d d i n g the dynamic combination 55 4.2.3 Discussion 55 4.3 Pearl 's Distr ibuted Propagation and Belief Updat ing 56 4.3.1 Belief updating in singly connected networks 56 4.3.2 A d d i n g dynamic constructs 57 4.3.3 Propagation in general Bayesian networks 64 4.3.4 Discussion 65 4.4 Lauri tzen and Spiegelhalter's computation of belief 65 4.4.1 Belief Propagation on F u l l M o r a l Graphs 66 4.4.2 A d d i n g dynamic constructs 66 4.5 Probabil is t ic inference using Influence diagrams 66 4.5.1 A d d i n g dynamic constructs 67 4.6 Conclusions 69 5 Conclusions 70 5.1 W h a t dynamic Bayesian networks can't do 70 5.2 Other future possibilities 71 5.3 Some final remarks 71 A Influence code for the examples 76 A . l Diagnosis of multiple faults for digi ta l circuits 76 A . 2 Interpreting sketch maps 77 v A . 3 The Burglary of the House of Holmes 79 B Source code to Influence 82 B . l probabili ty.pl 82 B.2 expand.pl 87 B.3 network.pl 91 B.4 priors.pl 92 B.5 combine.pl 94 B.6 combined.pl 95 B.7 consistent.pl 98 B.8 done.pl 99 B.9 infl.pl 99 v i List of Tables 1.1 The contingency tables for the network in Figure 1.1 vii List of Figures 1.1 A small Bayesian network 4 1.2 A division of probabilistic knowledge 8 2.1 The Bayesian network created in Example 2.1-1 18 2.2 The network created by instantiating the parameterized unique schemata from Section 2.1.2 19 2.3 The network created by instantiating the parameterized r ight-mult iple schema from Section 2.1.2 20 2.4 The network created by instantiating the parameterized left-multiple schema from Section 2.1.2 21 2.5 The Bayesian network created for Example 2.2-1 25 3.1 A simple circuit 35 3.2 The Bayesian network constructed by our simple knowledge base for the circuit in Figure 3.1 37 3.3 A simple sketch map 38 3.4 The Bayesian network constructed by our simple knowledge base for the sketch map in Figure 3.3 42 3.5 The network created for Holmes' decision 46 v i i i 4.1 A simple example of a dynamic network, (a) showing the original network, and (b) showing the network after another individual of type t is observed. See Section 4.1 52 4.2 A d d i n g a parent An+x to node C 59 4.3 A d d i n g a child Bn+i to node A 62 4.4 The problem of adding a node after arc reversals have been performed. . . 68 ix Acknowledgements I am very grateful to the many people who helped and supported me during the process of researching and wri t ing this thesis. Dav id Poole, my supervisor and friend, for patience, encouragement, financial support, and lots of good ideas. Dav id Lowe, for patience and enthusiasm during the final stages of revising. M y parents, Hartmut and Heidi , and my siblings, Monika , Steven, and Andrea. For love and support bridging the Euclidean distance which separates us. Andrew Csinger and Manny Noik. For being invaluable colleagues, every-day heroes, occasional maniacs, and friends in the truest sense. The gang: N o r m Goldstein, Hilde Larsen, Steve and Sarah Mason, Sue Rathie, David Sidilkover. For the.right things at the right times, more often than I should have hoped. The heretics: Brad Cotteral l , K e n Gehrs, Maeghan Kenney, Michelle McMaste r , Jon Mikkelsen, Ray Schultz, Joeane Zadrah. For imagination and community, for helping to keep my life interesting and distinct from my occupation. x Chapter 1 Introduction Given the complexity of the domains for which we would like to use computers as reasoning engines, and the complexity of our world in general, an automated reasoning process will often be required to perform under some state of uncertainty. For example, in applications which perform some kind of diagnosis, a single fault or abnormal behaviour may be the direct result of many factors, and determining which factor is responsible usually involves choosing a best hypothesis from amongst many alternatives. Probabilistic analysis is one means by which the uncertainty of many interesting domains can be modelled. Probability provides a normative theory with which uncertainty can be modelled. Without assumptions of independence from the domain, naive computations are quite intractable. For example, suppose we have five random variables, {A, B, C, D,E} 1 and we wanted to calculate p(A). Without assuming any independencies, we would have to perform the following summation: p(A) = J2 p(AABACADAE) B,C,D,E which sums p(A ABACADAE) over every possible combination of the values taken by the random variables {B, C, D, E}, possibly an exponential number of probabilities. By exploiting independence assumptions for particular domains, we can drastically improve the complexity of such computations. *I denote random variables as upper case math italics. If X can take on a set of multiple values, I will write them in lower case, as in {xi,... ,xn}. If Y is propositional, I will write +y for true and ->y for false. More syntactic conventions will be clarified in later sections. 1 1. Introduction 2 If probability theory is to be used effectively in A l applications, the independence as-sumptions from the domain should be represented explicitly, and used to greatest possible advantage. One such representation is a class of mathematical structures called Bayesian networks. This thesis presents a framework for dynamically constructing and evaluating Bayesian networks. The remainder of this chapter will outline the basic theory of Bayesian networks, motivate the approach taken in this thesis, and outline the main issues discussed in future chapters. 1.1 B a y e s i a n N e t w o r k s Bayesian networks are known by many names. They are also called belief networks and causal probabilistic networks. I will refer to them as Bayesian networks, emphasizing the fact that they are based on the assumptions of Bayesian probability. A related network formalism, which subsumes Bayesian networks, is known by the name influence diagram. This section briefly introduces Bayesian networks; for a more thorough treatment, see Pearl [Pearl, 1988]. For a good introduction to Bayesian probability, see [Lindley, 1965]. A Bayesian network is a directed acyclic graph2 (DAG) which represents in graphical form the independence assumptions of the probability distribution for a set of random variables. Nodes in the graph represent random variables, and the arcs connecting the nodes represent the notion of direct dependence: if a random variable A, represented in the graph by a, is known to be directly dependent on the random variable B, represented in the graph by node b, an arc is drawn from b to a. The direction of the arc is often associated with the notion of causality, directed from cause to effect. The set of variables deemed to have direct causal or influential effect on a random variable X are called the parents, or conditioning variables, of X; these are denoted in this thesis by parents(X). The strength of the direct dependency relation is quantified by a contingency table, which are a complete specification of the conditional probabilities associated with a random variable and its parents. Denoted p(X\parents(X)), this table specifies the effect of each combination of the possible outcomes of parents (X) on X. 2 A directed acyclic graph is a graph with directed arcs such that no path following the arcs can lead from a node to itself. For introductory presentation of graph theory, see [Aho et al., 1974]. 1. Introduction 3 The conditional independence assumptions for the domain are represented explicitly in Bayesian networks, and using these assumptions, probability calculations can be performed more efficiently than the naive computation, requiring much fewer prior probabilities. A Bayesian network, as Pearl [Pearl, 1988] argues, is a natural, intuitive framework for writing down knowledge about a domain taking into consideration the ideas of likelihood, relevance, dependency, and causality. These ideas help to provide guidelines for creating networks for particular domains, as I will outline in Chapter 3. A node in the network represents a random variable, which is used to represent possible outcomes, events, or states in the domain. For instance, we could represent the possibility of an earthquake happening by using a random variable, E, which takes on two values: {+earthquake, -iearthquake}, or perhaps for simplicity, {true,false}. The following example is borrowed from Pearl [Pearl, 1988]: Example 1.1—1: Mr Holmes receives a telephone call from his neighbour, Dr Watson, who states that he hears the sound of a burglar alarm from the direction of Mr Holmes' house. While preparing to rush home , Mr Homes recalls that Dr Watson is a tasteless practical joker, and decides first to call another neighbour, Mrs Gib-bons, who,despite occasional drinking problems, is far more reliable. Mr Holmes remembers reading in the instruction manual of his alarm system that the de-vice is sensitive to earthquakes, and can be accidentally triggered by one. He realizes that if an earthquake had occurred, it surely would be on the news. Figure 1.1 shows a a Bayesian network which explicitly represents the independence as-sumptions for our example. The network corresponds to the following equality: p(Alarm A Burglary A Earthquake A Gibbons A Newsreport A Watson) = p(Alarm\Burglary A Earthquake)p(Gibbons\Alarm)p(Watson\Alarm) p(Newsreport\Earthquake)p(Burglary)p(Earthquake) This factorization is possible because of the conditional independence assumptions for this example. Note that this equation makes no assumption about the values taken by the random variables; it is true for any of the possible combinations. Table 1.1 shows the contingency tables required for the network in Figure 1.1. A Bayesian network with at most one path between any two nodes (ignoring the directions of the arcs) is said to be singly connected, and possesses conditional independence proper-ties which are exploitable computationally. Multiply-connected Bayesian networks, that J. Introduction 4 Figure 1.1: A small Bayesian network. 1. Introduction 5 p(Alarm\Burglary A Earthquake) +burglary -iburglary +earthquake 0.95 0.15 -^earthquake 0.95 0.01 p(Newsreport Earthquake) -^-earthquake 0.95 -^earthquake 0.001 p(Watson\Alarm) +alarm 0.80 -lalarm 0.45 p(Gibbons\Alarm) +alarm 0.60 -+alarm 0.25 p(Burglary) 0.10 p(Earthquake) 0.08 Table 1.1: The contingency tables for the network in Figure 1.1. is, Bayesian networks with multiple paths between at least one pair of nodes, are more general, and have fewer exploitable independencies. This topic will be discussed in more detail in Chapter 4. The graphical representation is used in various ways to analyze a problem probabilistically, as Section 1.3 outlines. It is important to emphasize that the arcs in the graph are used as guidelines for computation, and without the conditional independence assumptions which are represented by the arcs in Bayesian networks, probabilistic analysis would be wildly impractical. 1.2 A d y n a m i c a p p r o a c h t o u s i n g B a y e s i a n n e t w o r k s 1.2.1 Motivation Bayesian networks used in typical applications represent static probability models; that is, while the networks are designed to accept conditioning based on observations, they typically do not change their structure. Consider the example in Figure 1.1. This network can only be used as a model of the particular situation for which it was created, and cannot be used, say, to tell us how likely it is that a burglary occurred given that Holmes' third neighbour Bob (who is not represented at all in the network) calls up with a report that 1. Introduction 6 he hears an alarm. In order to incorporate Bob in the model, we have to add a node representing Bob's report and an arc between the node representing the alarm and our new node. Adding a single node and arc in this situation is conceptually a simple matter, but it required knowledge: we, the users of the system, needed to know that the independence assumption for the random variable representing Bob's report is reasonable, and that no other events of interest might cause Bob to report an alarm. As well, we needed to know how Bob's report depends on the event that Holmes' alarm sounded. That is, we needed to know the contingency table p(Bob\Alarm). This example demonstrates the kind of modification to the network structure that this thesis intends to address. However, the example is much too simple, for two reasons: First, we need no special training to gain expertise in this toy domain. Second, the example required very little expertise at building Bayesian networks, certainly no more than the intuition a novice might use, to add the new information. For more complicated domains, a user may have trouble adding nodes and arcs to a Bayesian network because she is not an expert in the domain. However, it may be the case that we are using a Bayesian network application precisely because we are not experts, for consultation purposes perhaps. If, in fact, we know which new aspects of our domain ought to be incorporated in our model, but are not able to change the network suitably, then the application is useless for our situation. There are at least three ways to address the static model problem. One way is to ask the domain expert for a revision to the network when the need arises. The second is for the domain expert to detail the domain so exactly that the need to add arcs never arises. A third way is to automate the process of building Bayesian networks so that the expert's knowledge can be accumulated in a knowledge base before hand; this generalized knowledge could then be retrieved and applied whenever a user determines the need. This thesis examines the third approach—automating the process of building Bayesian networks from a background knowledge base of generalized probabilistic information. This approach adds flexibility to Bayesian networks applications, and tends to keep the networks required reasonably small. 1. Introduction 7 1.2.2 The approach Domain knowledge can be divided into: individuals: entities in the domain of interest possible properties: the properties an individual may have and the relationship between these properties observed state: the properties an individual does have In traditional approaches, a knowledge engineer, perhaps in consultation with a domain expert, is responsible for knowing the individuals to be included in the domain, and the properties these individuals may have. In other words, the knowledge engineer must build the network. The user is responsible for supplying information as to the actual state of the situation (i.e. the properties some of the the individuals do have). This division of knowledge is demonstrated figuratively in Figure 1.2a. In contrast, the dynamic approach taken in this thesis divides the knowledge differently. The knowledge engineer provides a background knowledge base relating properties with-out specifying the particular individuals. The user supplies two kinds of information: the individuals known to be in the domain, which the dynamic system will use to build an appropriate network automatically, and the observations pertaining to the state of the model. This is figured in Figure 1.2b. In this way, the same background knowledge can be used to model different situations in which only the set of individuals under consider-ation changes. In the traditional approach, such changes would have to be made by the knowledge engineer.3 The dynamic approach provides a flexibility previously unrealized in Bayesian network applications. 1.2.3 The realization of this approach To automate the process of building Bayesian networks, we need two steps. First, we need to be able to represent domain knowledge which is abstracted away from particular 3The knowledge engineer-user distinction made here is deliberately simplified. In many cases, the user of dynamic system like the one presented in this system may also be the knowledge engineer. No principle prohibits the user from modifying the knowledge base, but in doing so, the user switches roles. 1. Introduction 8 (a) Traditional Dependencies j • \ between [ Individuals Observations Variables j J Knowledge Engineer User (b) Dynamic Approach Dependencies between Variables Individuals Observations Knowledge Engineer User Figure 1.2: A division of probabilistic knowledge. 1. Introduction 9 individuals. Second, we need to be able to bui ld a network from this knowledge based on the information we can gain from the user. The first step corresponds roughly to knowledge of possible properties (as in Section 1.2.2), which we elicit from the domain expert. These properties are modelled wi th random vari-ables, and to abstract away from particular individuals, parameterized random variables (PRVs) are used. The knowledge engineer supplies the relationships between P R V s in the form of a background knowledge base. The second step corresponds roughly to knowing the individuals that are to be considered in the model, and which the user can supply at the time of consultation. The system combines these two kinds of knowledge automatically, to create a Bayesian network for the situation the user has specified. The user can then consult the system, supplying the information about the observed state of the domain, and querying the network as necessary to obtain the inferred probabilistic analysis. It may arise that, as the user is working wi th the system, some individuals become known after a network has already been constructed, and after observations have been submitted to the network. The user is able to supply this information to the system, and the system is able to use this information and modify the network structure appropriately. 1.3 R e l a t e d w o r k In this section, I w i l l provide a brief introduction to some of the related work done on Bayesian network technology. In particular, I w i l l introduce various methods used to evaluate Bayesian networks, and review similar notions of dynamic Bayesian networks presented by Goldman and Charniak [1990], and Breese [1990]. 1.3.1 Inference using Bayesian networks There are several methods by which Bayesian networks are used to calculate probabili-ties. A more comprehensive discussion of much of the following w i l l be presented in later chapters. Pearl [Pearl, 1988] treats singly connected Bayesian networks as networks of communicating processes. The processes use arcs to propagate causal and diagnostic support values, which originate from the activation of evidence variables. 1. Introduction 10 Lauri tzen and Spiegelhalter [Lauritzen and Spiegelhalter, 1988] perform evidence absorp-t ion and propagation by transforming the Bayesian network into a triangulated undirected graph structure. Posterior probabilities are computed using specialized computations based on this graph. Shachter [Shachter, 1988] uses the arcs to perform arc reversals and node reduction on the network. Evidence absorption and propagation can also be performed using arc reversals and node reductions as well [Shachter, 1989]. 4 Poole and Neufeld [Poole and Neufeld, 1989] provide an axiomatization of probabili ty the-ory in Prolog which uses the arcs of the network to calculate probabilities using "reasoning by cases" wi th the conditioning variables. The prototype implementation of my approach is based (but not dependent) on this work. 1.3.2 Other work on dynamic construction of Bayesian net-works During the wri t ing of this thesis, several authors have presented similar work on building Bayesian networks from schematic knowledge collected in a knowledge base. Goldman and Charniak [Goldman and Charniak, 1990], and Breese [Breese, 1990] have developed, independently of the work presented in this thesis, similar theories of dynamic Bayesian networks. Since their work is very similar to this approach (and each other's), I w i l l treat their work more deeply in following sections. Laskey [Laskey, 1990] presents a framework for using dynamically created Bayesian net-works in a reasoning system based on a hierarchical approach. Knowledge in the form of probabilistic schemata are used to form an argument, which is transformed into a Bayesian network. Conclusions or conflicting evidence which defeat the argument trigger knowl-edge in the schemata, forming an augmented argument. This process proceeds unti l no conflicting arguments are found. A quite different approach to creating Bayesian networks involves the statistical analysis of case data. Pearl [Pearl, 1988] provides a good introduction to the process, which identifies dependencies and independencies. This area of research is current, but essentially unrelated to the methods presented in this thesis. 4Shachter's influence diagrams contain decision and test nodes, as well as other devices not being considered in this thesis. This thesis only looks at probabilistic nodes. 1. Introduction 11 Goldman and Charniak Goldman and Charniak [Goldman and Charniak, 1990] describe a knowledge-based ap-proach to bui lding networks dynamically, as part of a story understanding application. A forward-chaining inference engine takes information in the form of propositions and builds a network according to rules in the knowledge base. These rules provide ways to combine the influence of different causes on common effects by providing specialized func-tions for the domain and by taking advantage of the occasions in which causes interact stereotypically. When the bui lding process is complete, an evaluation of the network takes place, and the network is simplified by accepting as true those statements which are highly probable, and rejecting those propositions which are considered too improbable. Simplifying the network in this way may lead to additional modification of the network. Breese The work done by Breese [Breese, 1990] on building Bayesian networks bears many similar-ities to Goldman and Charniak's work. Breese's approach includes the use of such decision analytic tools as value nodes and decision nodes, which are part of the influence diagram model. Knowledge is stored in a hybrid knowledge base of Horn clause rules, facts, and probabilistic and informational dependencies. The rules and facts are used to determine some of the probabilistic events which should be included in the model. The probabilistic dependencies are used to structure the network once it is determined which nodes ought to be included. Furthermore, these dependencies may identify other events which might play a part in the model, but are not definitely determined by the Horn clause rules. The bui lding process is started by the query entered by the user of the system, who also includes given information and evidence for the problem. The first step of the process is to determine the events which directly influence the query. Backward chaining through the probabilistic dependencies and the Horn clause rules identifies many nodes and arcs to be included in the model. Once this process stops, a forward chaining process identifies those events which are influenced by the nodes in the partial network from the previous step. 1. Introduction 12 Discussion There are many similarities in the approaches of Goldman and Charniak, and Breese. Both systems create Bayesian networks as a secondary reasoning tool after a primary inference engine determines which nodes to consider, and which dependencies to include in the network. In this respect, knowledge about the domain is found in two separate knowledge bases: the rules used to create the Bayesian networks, and the network itself. In contrast, the approach taken in this thesis uses only one knowledge base, and provides only a simple process by which Bayesian networks can be created using this knowledge. This approach is much simpler, and precisely defines the mapping between the knowledge base and the networks which can be created. Both approaches by Goldman and Charniak, and Breese could be used to implement the ideas presented in this thesis: the rule bases in both systems could be used to implement dynamic Bayesian networks as this thesis presents them. However, this thesis presents only a language for representing general probabilistic knowledge, and a simple process for creating Bayesian networks using this language; no work has been done herein on allowing another program to use this language to create networks as a sub-task. 1.4 T h e m a i n c o n t r i b u t i o n s o f t h i s t h e s i s This thesis investigates the issue of representing probabilistic knowledge which has been abstracted from the knowledge of particular individuals, resulting in a simple representa-tion language. As well, a simple procedure for building networks from knowledge expressed in this lan-guage is provided. The mapping between the knowledge base and network created is precisely defined, so that the network always maintains a consistent probability distribu-tion. Finally, this thesis investigates the issue of modifying the network after some evaluation has taken place, and several techniques for correcting the state of the resulting model are derived. 1. Introduction 13 1.5 An outline of this thesis In Chapter 2, dynamic Bayesian networks are presented formally. The major c la im in this thesis is that Bayesian networks can be dynamic, flexible, and simple to use. I demonstrate the abil i ty of my implementation in Chapter 3, thereby giving evidence for my cla im. I apply my approach in such diverse areas as simple decision problems, diagnosis of multiple faults, and interpretation of sketch maps. In Chapter 4 I briefly discuss the details of adapting the various evaluation algorithms to make use of dynamic Bayesian networks. Final ly , in Chapter 5 I conclude that the approach to Bayesian probabili ty demonstrated here is useful in domains for which a priori knowledge is of a general nature, and for which specific details are most conveniently provided during the reasoning or decision-making process. Chapter 2 Dynamic Bayesian networks In order to make Bayesian networks versatile and reasonably manageable in terms of size, I have argued for an approach which creates networks by instantiating generalized proba-bilistic knowledge for the individuals known to be in the model. Thus, when information about new individuals is discovered, the system itself should be able to modify the net-work, by combining the new information about individuals with the knowledge found in a background knowledge base created by a domain expert. This chapter first presents a simple and declarative language for expressing probabilistic knowledge, which I demonstrate is too unconstrained for the purpose of providing a precise framework for creating and modifying Bayesian networks dynamically. This language is restricted to remove the possibility of ambiguous knowledge, and then augmented with special purpose constructs which are based on Canonical Models of Multicausal Interac-tions, as described in Pearl [Pearl, 1988]. The resulting language is shown to be at least as expressive as Bayesian networks, and flexible enough to provide a tool for modelling many kinds of domains. I conclude this chapter by outlining a simple procedure to create Bayesian networks from a knowledge base of statements in this language. 2.1 T h e b a c k g r o u n d k n o w l e d g e b a s e Dynamic Bayesian networks are created from a parameterized background knowledge base of schemata by combining it with individuals who are known to be part of the model. This 14 2. Dynamic Bayesian networks 15 section presents the syntax of the parameterized knowledge base, and describes the process of instantiating schemata for individuals. 2 . 1 . 1 Schemata It is necessary to represent parameterized probabilistic knowledge unambiguously, so that the intended meaning of this knowledge is apparent from the syntax. In particular, the statements in our language must clearly show how the parameters are to be instantiated, and how the resulting instantiation is used in the Bayesian network built from this knowl-edge base. A n atom is any alphanumeric string beginning wi th a letter. A parameter is represented by a capitalized atom. A n individual is represented by a lower case atom. A random variable is an atom followed by a possible empty list of individuals; for convenience, an empty list is usually dropped from the random variable notation. Random variables take values from a finite set of discrete values called the range, and in many cases we use binary-valued random variables, which take {true.false} as values. A parameterized random variable ( P R V ) abstracts the notion of a random variable away from any particular individual ; it is represented as a lower case atom with a list of parame-ters, each of which stands in place of an individual . Instantiating a parameterized random variable replaces a l l the parameters wi th individuals, creating a random variable. Associated wi th parameters and individuals is a corresponding type, which constrains the instantiation of a parameter to only individuals of the associated type. A schema is the fundamental statement in this language. The syntax for a schema is as follows: a i , . . . , a n — • b : = [ u i , . . . , Ufc] where b, a,- are parameterized random variables. The left hand side of the arc (—•) is a conjunction of P R V s (called the parents), which directly influence the single P R V (called the child) on the right hand side. The list [vi,..., vk] is a short-hand notation which lists the probabilities for the contingency table corresponding to the schema. Recal l that a contingency table must have a probability for each combination of values for the P R V s . The short-hand list orders these probabilities by varying each parent over its set of values wi th the left-most parent varying most quickly. This list contains 2 n probabilities when the a ; are binary-valued, and even more when any have more than two possible values. 2. Dynamic Bayesian networks 16 In a collection (or knowledge base) of schemata, a P R V can occur any number of times as a parent for some other P R V , but may occur as the child in only one schema. This restriction is intended to simplify the task of wri t ing schemata: keeping the list of parent P R V s in one schemata localizes relevant knowledge to one statement, and a knowledge base is easier to maintain. A schema is instantiated when al l P R V s in the schema have been instantiated. In the scope of a single schema, parameters shared across P R V s are instantiated wi th the same individual . The instantiated schema is used to bui ld new arcs in a Bayesian network by creating a node for each random variable, and directing an arc from each parent variable to the child variable. For example, the schema: alarm — • reports_alarm(X: person) := [0.9,0.01] might occur in a knowledge base for the alarm example, of Chapter 1. It specifies a single parent P R V , alarm, which has no parameters, and a child variable, reports_alarm(X:person), which has a single parameter, X. Only individuals of the type person can be used to instantiate this schema. In this example (as in most of the simple examples in this chapter, unless otherwise stated), the P R V s are assumed to take values from {true, false}. This example should be interpreted as representing the idea that an alarm might be re-ported by a person, and explici t ly assumes that only the presence or absence of an alarm sound has direct affect on the report the person might make. The two numbers in square brackets give the contingency table for this schema, stating consisely the following two probabilities: p(reports_alarm(X)| alarm) = 0.9 p(reports_alarm(X)| -lalarm) = 0.01 These numbers quantify the effect of the parent variables on the chi ld variable. Note that the probabilities p(-ireports_alarm(X)| alarm) = 0.1 p(->reports_alarm(X)| -lalarm) = 0.99 2. Dynamic Bayesian networks 17 can be inferred by the Negation axiom of probability theory. Discussion When instantiated, each schema of the form • a i , . . . , a n — • b corresponds to a conditional probabili ty of the form p(b\a\,... ,an) Creating a Bayesian network from these schemata corresponds to bui lding an expression for the joint probability distr ibution from these conditional probabilities. Thus, the intended use of these schemata is always well defined. Example 2.1-1: Consider the following schema: alarm —> reports_alarm(X: person) := [0.9,0.01] When instantiated wi th the set person = {John, mary}, the Bayesian network in Figure 2.1 is created, which represents the following expression for the joint probabil i ty distribution: p(Alarm A Reportsjxlarm(john) A ReportsMlarm(mary)) = p(Alarm) p(Reportsjalarm(john)\Alarm) p(Reportsjalarm(mary)\Alarm) This simple declarative language is at least as expressive as Bayesian networks. This can be shown by taking a Bayesian network and wri t ing it in terms of unparameter-ized schemata. The direct and unambiguous correspondence between unparameterized schemata and Bayesian networks is obvious. Add ing parameters to the language allows us to express knowledge which can be used several times by instantiating them differently. However, the correspondence between the knowledge base and the Bayesian networks which can be constructed from it is no longer guaranteed to be unambiguous. These ambiguities arise as a direct result of attempting to parameterize probabilistic knowledge, as we shall see in the next section, and are easily restricted. 2. Dynamic Bayesian networks 18 Figure 2.1: The Bayesian network created in Example 2.1-1. 2.1.2 Ambiguities in schemata In the previous section, it was mentioned that allowing parameterized knowledge in our language could lead to ambiguous knowledge bases. In this section the possible ambiguities are il lustrated and a simple constraint is imposed on the language to remove possible ambiguities. It is useful to examine three extremes which can be attained when wri t ing schemata in parameterized form, in order to bring to light the issues of parameterizing conditional probabili t ies. 1 These extremes can be labelled as: Unique schemata: every instantiation creates a set of random variables which no other instantiation of the same schema shares Right—multiple schemata: different instantiations may create child variables which have identical parent variables Left-multiple schemata: different instantiations may create different parents for a sin-gle child variable We w i l l look at each one of these in detail in the following sections. It is worth mentioning that in general, schemata may exhibit the characteristics of several of these extremes. As well, identifying these extremes will be helpful in the discussion of Chapter 4. 2. Dynamic Bayesian networks 19 Figure 2.2: The network created by instantiating the parameterized unique schemata from Section 2.1.2. Unique schemata These are parameterized schemata in which every parameter in the parent variables (on the left side) also occurs in the child variable. For example: a(X,Y),b(X,Y) — c(X,Y) This is called unique because every instantiation of X, Y creates a unique dependency between parents and chi ld . If, for example, we instantiate X wi th {x i , X2} and Y wi th {yi}, we get two distinct sets of arcs in our network, as in the network of Figure 2.2. There are no ambiguities which result from unique schemata. Right—multiple schemata These are schemata in which parameters occurring in the child variable do not appear in the parent variables, and al l other parameter in the schemata occur on both sides. For example: a,b — c(X,Y) Each instantiation of X and Y results in a new child variable which may share parents with other instantiations of the schema. If we instantiate X wi th {x^ X2} and Y wi th {y i} , we 2. Dynamic Bayesian networks 20 Figure 2.3: The network created by instantiating the parameterized right-multiple schema from Section 2.1.2. obtain the dependencies shown in the network of Figure 2.3. No ambiguities arise as a result of right-multiple schemata. Left-multiple schemata These are schemata in which some parameters occurring in the parent variables do not occur in the child variable. For example: a(X) —-> c Each instantiation adds a parent for the child variable. For example, if X is instantiated with {xi, X2, . . . , x n}, we get the dependencies shown in the network of Figure 2.4. This kind of schema is ambiguous as there is no way to know, when the schema is written down, how many instantiations of the parent variables could be needed. However, since we must provide a contingency table which can be used in numeric calculations, we must supply a contingency table for every possible number of instantiations. This is clearly 2. Dynamic Bayesian networks 21 Figure 2.4: The network created by instantiating the parameterized left-multiple schema from Section 2.1.2. impossible. Even putt ing a bound on the number of possible instantiations is impractical because of the fact that the size of each contingency table is exponential in the number of parent variables. 2 Summary This simple language of conditional dependencies is at least as expressive as Bayesian net-works. Parameterizing these schemata adds flexibility to the language, but also introduces possibilities for ambiguous knowledge. The ambiguities arising from left-multiple schemata (or hybrid schemata which have their characteristic) is an issue of pragmatics, and one possible solution is to disallow left-multiple schemata from our language. This restriction should be taken as much the same kind of restriction prohibit ing left-recursive programs in Prolog, for example. The resulting restricted language now lacks the abil i ty to combine an arbitrary number of causes (or influences) into a single common effect. The next section presents two con-structs which help to redress this deficiency without reintroducing ambiguities or requiring unreasonable numbers of contingency tables. 2In fact, if we were solely concerned with qualitative statements of dependency, this ambiguity would not be a problem. 2. Dynamic Bayesian networks 22 2.2 C o m b i n a t i o n n o d e s The main problem which arises from the ambiguity of left-multiple schemata is that the number of instantiations of the parent variables required for a particular consultation is unknown at the time the knowledge base is being created. If we assume that these instances combine their effects in the domain in a simple and regular manner, providing the required contingency table is only a matter of enumerating the instances. In the following sections, two constructs are presented which assume a simple and regular combination of the effects of an indeterminate number of parent variables on a single child variable. Briefly, these are: Existential Combination: i f one of the parent variables takes true, the child variable takes the value true. Universal Combination: i f a l l of the parent variables take the value true, the child takes the value true. These two are currently used in the language, but similar structures, exploit ing some other regularities, might be added. For example, a combination mechanism could be defined corresponding to the notion that exactly one parent variable (out of many) is true. The two discussed in this section are based on the noisy-or and noisy-and gates as described in Pearl [Pearl, 1988]. 2.2.1 Existential Combination This combination is intended to be used when it is known that any one of a number of conditions is l ikely to affect another variable. The syntax of this structure is as follows: 3 X 6 type-a(X) —• b where a(X), b are binary P R V s , and X is a parameter of type type. The variable b depends on the variable 3X € type • a(X), which is a binary P R V dependent ( implici t ly) on every instantiation of a(X). 2. Dynamic Bayesian networks 23 The contingency table p(b\3X G type • a(X)) for this schema is given in square brackets, and is provided by the knowledge engineer in the knowledge base. The contingency table for p(3X G type • a(X)\a(X)) is automatically generated by the system and takes the form of an O r table based on each instantiation of a ( X ) . Example 2.2-1: People who smell smoke may set off a fire alarm, and the alarm actually making a noise depends on at least one person sounding the alarm. Anyone who hears an alarm is likely to leave the bui lding in which the alarm is found. Fire is a likely cause for someone smelling smoke. fire — • smel ls -smoke(X) := [0.7,0.2] smel ls -smoke(X) — • sets_off_alarm(X) := [0.95,0.01] 3 Y G person • sets_off_alarm(Y) — • alarm-sounds := [0.98,0.1] alarm-sounds — • leaves_building(Z) := [0.7,0.4] Suppose we are given that John and mary are the only known members of the set person. Th i s information combined wi th the above schemata creates the network shown in Figure 2.5. The combination node combines the effects of a l l known members of the set. In the preceding example, it is not unreasonable to expect that some unknown person has set off the alarm, and this possibility is granted in the contingency table for 3Y G person • sets -of f-alarm(Y): p(alarm_sounds| 3 Y G person • sets_off_alarm(Y)) = 0.98 p(alarm_sounds| ->3Y G person • sets_of f .a larm(Y)) = 0.1 The first probabili ty indicates that wi th high probability, i f a person sets off the alarm, the fire alarm w i l l make a noise. The second number indicates that wi th low probability, 2. Dynamic Bayesian networks 24 the alarm w i l l sound when no known member of the set person has set off the alarm. This includes the possibili ty of a malfunction of the alarm, the possibility that some person unknown to the system has set off the alarm, as well as other unknown causes. There is no facility in our language for making hypotheses for these unknown causes. 2.2.2 Universal Combination This combination is intended to be used when it is known that every one of a number of conditions must hold to affect another variable. The syntax of this structure is as follows: VXetype-a(X) —-> b := [«I,M2] where a(X), b are binary P R V s , and X is a parameter of type type. The variable b depends on the variable VX G type • a(X), which is a binary P R V dependent ( implici t ly) on every instantiation of a(X). The contingency table p(b\VX £ type • a(X)) for this schema is given in square brackets, and is provided by the knowledge engineer in the knowledge base. The contingency table for p(VX € type • a(X)\a(X)) is automatically generated by the system and takes the form of an And table based on each instantiation of a(X). Example 2 .2-2: A board meeting for a large corporation requires the presence of a l l board mem-bers, and the meeting may result in some actions, say, buying out a smaller company. A board member may attend the meeting, depending on her relia-bi l i ty and state of health. VX € board-members • present(X) meeting healthy(X),reliable(X) — • meeting := [0.8,0.1] — • buy-out := [0.7,0.3] — • present(X) := [0.9,0.4,0.6,0.1] 2. Dynamic Bayesian networks 25 Figure 2.5: The Bayesian network created for Example 2.2-1 2. Dynamic Bayesian networks 26 We note that at the time the schemata are written, it is not important to know how many board members there may be. However, we must know exactly who is a member on the board before the network can be created. Th i s informa-t ion is supplied at run-t ime, and the construction process can construct the appropriate network. Evidence concerning the reliabili ty and health of any board member can then be submitted to the network, providing appropriate conditioning for queries. 2.2.3 Summary The combination nodes presented in this section are used to allow a number of conditioning P R V s to have direct influence on a single child P R V . The number of these conditioning variables is unknown when the knowledge base is created, and the user of the system can indicate as many instantiations as required in the model. The combination nodes effectively collect each relevant instantiation, and combines them using an or-rule for existential nodes, and an and-rule for universal nodes. These combination nodes have been implemented in the language, and another possibility would be to add a similar mechanism which performs a combination that exclusively selects a single individual as the causal agent. Others are possible in principle, but any similar mechanism should maintain the properties of being a simple and regular combination. 2.3 C r e a t i n g B a y e s i a n n e t w o r k s Creating a Bayesian network from a knowledge base is a simple process. The in i t ia l state is a Bayesian network wi th no nodes and no arcs. The system takes every known individual (which was supplied by the user) and instantiates every schema which has a parameter of the corresponding type. W i t h i n a schema, sev-eral parameterized random variables may have shared parameters (i.e., parameters wi th the same name), and when the schema is instantiated, every occurance of the common parameter is replaced by a common individual . The instantiated schema relates random variables, and is interpreted by the system, adding nodes and arcs to the current Bayesian network. The system creates a node representing each random variable in the instantiated schema, and checks whether an identical node already exists in the network. If not, a node is entered into the network; i f a node already 2. Dynamic Bayesian networks 27 exists in the network, it is not added again. An arc is directed from each node representing a parent random variable to the child variable. In this way, a complex structure can be created from a simple set of schemata. This procedure for creating Bayesian networks from parameterized knowledge could be used a number of ways. First, a user might be responsible for submitting the individuals of the model to the system, and then use the Bayesian network created by the system interactively. Alternatively, a higher level program might use the network creation/evaluation system as a sub-system, and submit individuals and make queries in order to perform some action or decision. The Influence implementation, described briefly in Chapter 3 and more extensively in [Horsch, 1990], can be used as a (somewhat simple) interface between the user and the system. It can also be used as a module for another program to manipulate. Chapter 3 Using Dynamic Bayesian Networks In previous chapters, I presented the dynamic Bayesian network as a useful tool for reason-ing under uncertainty. Chapter 2 presented some simple examples showing the results of bui lding networks dynamically. Chapter 4 w i l l demonstrate that the dynamic constructs are independent of the method by which the posterior distributions are calculated. In this chapter, I demonstrate wi th some examples the usefulness of dynamic Bayesian networks. Since the examples are directly executable in the prototype implementation, the syntax for the In f l uence interpreter, presented in [Horsch, 1990], w i l l be used. I w i l l briefly introduce the syntax in the next section. The rest of this chapter is organized in the following way. The first example takes the domain of diagnosing faults in simple electronic circuits. The second example takes the domain of interpreting simple sketch maps, which is the domain of the Mapsee project at U B C [Mulder et al, 1978], and has been treated logically by Reiter and Mackworth [Reiter and Mackworth , 1989], and wi th default logic by Poole [Poole, 1989]. The thi rd example is borrowed from Breese [Breese, 1990]. A summary section concludes this chapter wi th some observations about how knowledge bases for dynamic Bayesian networks can be created. 28 3. Using Dynamic Bayesian Networks 29 3.1 : A n i n t e r p r e t e r f o r d y n a m i c B a y e s i a n n e t w o r k s The Influence interpreter is an implementation of the techniques of this thesis, written in Prolog. Writing a knowledge base using this implementation has two parts: declaring the PRVs, and writing the schemata. The knowledge base is used within the implementation to create the networks and evaluate queries. 3.1.1 Declarations Parameterized random variables (PRVs) are represented by Prolog terms, and must be declared as being binary, i.e., taking a value from {true, false}, or multi-valued, i.e., taking a value from a given list of mutually exclusive and exhaustive values. The syntax is: where the prvi are PRVs, and arity is the number of parameters the node has. For multi-valued PRVs, the list of values must be separated by commas. Several variables, separated by commas, can be declared at once using either declaration command. In the case of multi-valued nodes, the intention is to declare several variables all taking values from the same set. The / arity can be omitted in place of the direct specification of the parameters, or, if the node has no parameters, then an arity of zero is assumed. Within a PRV, parameters are represented by Prolog variables, i.e., atoms which begin with a capital.1 A special declaration is required for PRVs whose dependence can be written functionally in terms of the values taken by its parents values. For example, we might want to use a PRV for the subtraction operation, such that: xThis is the first and last time that parameters will be referred to as any kind of variable. To reinforce the correct understanding, parameters for parameterized random variables in a knowledge base of schemata are implemented here as Prolog variables. binary prv\ I arity. multi prv2 I arity 9 [value-lisf] . 3. Using Dynamic Bayesian Networks 30 The following declaration is used: f u n c t i o n prv. A n y function for a P R V must be a function of the values taken by its parent's P R V s , as opposed to being a function of the individuals which may later instantiate the schema. The function itself is writ ten wi th the schema in which i t appears as the chi ld . Wr i t ing schemata is the topic of the next section. It is important to emphasize that the parameterized random variables which appear in these declarations are neither random variables nor nodes in the network. A P R V must be instantiated to become a random variable, as described in Chapter 2, and the network creating procedure w i l l create a node in the Bayesian network corresponding to the in-stantiated P R V . However, the phrase the node representing the instantiated parameterized random variable {prv} quickly becomes tiresome, and often w i l l be simply referred to as node. It should be clear by context whether the strict meaning of node is intended. 3.1.2 Schemata Once the P R V s for a knowledge base are declared, the schemata may be writ ten. The syntax is simply: p-prv\, ... , p-prvn => c-prv • = Lpi,-..,Pfc] • which is interpreted as indicating that instantiations of this schema w i l l be used to create a Bayesian network such that arcs w i l l be directed from the nodes representing the parent P R V s to the node representing the child P R V . A restriction on schemata is that every parameter which appears in the list of parent P R V s must also appear in the child P R V ; in the terminology of Chapter 2, schemata are not allowed to be left-multiple. The list [ p\,..., pk ] is a short-hand notation which lists the probabilities for the contingency table corresponding to the schema. Recal l that a contingency table must have a probability for each combination of values for the P R V s . The short-hand list orders these probabilities by varying each parent over its set of values (in the order of the value list given in the m u l t i declaration) wi th the left-most parent varying most quickly. If the child P R V is declared as b i n a r y , only the case i n which the child P R V takes the value t r u e is specified. Otherwise, the parent P R V s are varied for each value of the child P R V . 3. Using Dynamic Bayesian Networks 31 For P R V s declared as f u n c t i o n , the syntax is: p-prvi, . . . . p-prvn => c-prv := {y(c-prv=c-val, [p-prvi=pi-val, p-prvn=pn-val\) - i f test then vtrue e l s e V f a i s e }. where test is a Prolog condition, vtrue and Vfaise are numbers in [0,1]. The E x a m p l e 3.1—1: Suppose we wanted to model a decision for someone to attend a baseball game. Assume that a baseball fan w i l l almost certainly have lots of fun at the game if it doesn't rain, but may find something fun to do elsewhere. The more rain at the game, the less likely it w i l l be that the fan has a lot of fun. b i n a r y a t t e n d s ( X ) . m u l t i r a i n s @ [ n o n e , d r i z z l e , p o u r s ] . m u l t i has - fun(X) @ [ l o t s , l i t t l e ] . a t t e n d s ( X : f a n ) , r a i n s => h a s - f u n ( X : f a n ) := [0.95, 0.5, 0.65, 0.5, 0.2, 0.5, 0.05, 0.5, 0.35, 0.5, 0.8, 0.5] . The ordering of the contingency table (the list [0.95, . . . , 0.5]) is as follows: p(has — fun(X) = lots\attends(X) A rains = none) = 0.95 p(has — fun(X) = lots\->attends(X) A rains = none) = 0.5 p(has — fun(X) = lots\attends(X) A rains = drizzle) = 0.65 p(has — fun(X) = lots\->attends(X) A rains = drizzle) = 0.5 p(has — fun(X) = lots\attends(X) A rains = pours) = 0.2 p(has — fun(X) = lots\-iattends(X) A rains = pours) = 0.5 p(has — fun(X) — little\attends(X) A rains — none) = 0.05 pihas — fun(X) = little\-iattends(X) A rains = none) = 0.5 p(has — fun(X) = little\attends(X) A rains = drizzle) — 0.35 p(has — fun(X) = little\-yattends(X) A rains = drizzle) = 0.5 p(has — fun(X) = little\attends(X) A rains = pours) = 0.8 3. Using Dynamic Bayesian Networks 32 pihas — fun(X) = little\->attends(X) A rains = pours) = 0.5 As an example of using function declarations, consider the following example, which models the probability of taking the sum of two six-sided dice: multi dl,d2 Q [1,2,3,4,5,6]. multi sum® [2,3,4,5,6,7,8,9,10,11,12]. d l , d2 => sum := { p(sum=Z, [dl=X, d2=Y]) = i f (Z is X + Y) then 1.0 else 0.0}. 3.1.3 Combination Nodes Existential and universal combination PRVs must be declared using the reserved PRV names exists and f o r a l l : binary exists (param, type.prv) . binary f o r a l l (param, type.prv) . where param is a parameter which occurs in the PRV prv and is of type type. Declared combination PRVs can be used in schemata as follows: exists (param, type, p-prv) => c-prv := \.vi,v2] . Note that a combination PRV must be the only parent of a child PRV, and is necessarily a binary PRV. 3.1.4 Creating networks Given a set of schemata, the interpreter needs to be given individuals to create a network. An individual can be specified by the following: 3. Using Dynamic Bayesian Networks 33 observe type(indiv, type). where indiv is the name of an individual , and type is the individual 's type. 3 .1 .5 Queries Querying the Bayesian network consists of asking the interpreter to compute the probability of a conjunction of random variables given a conjunction of conditioning knowledge. The syntax is: p( [query-conjunction] , [conditioning-knowledge] ) . The interpreter w i l l return the determined value, repeating the query information. This part of the interpreter uses the Bayesian network evaluation techniques of Poole and Neufeld [Poole and Neufeld, 1989] modified for dynamic Bayesian networks as outlined in Chapter 4. 3.2 D i a g n o s i s o f m u l t i p l e f a u l t s f o r d i g i t a l c i r c u i t s In this section, I present a simple Influence knowledge base which can be used to cre-ate a probabili ty model for any digi tal circuit containing and-gates, or-gates, and x o r -gates. Th i s is a simplification of a common diagnostic domain [Reiter, 1987, de Kleer and Wi l l i ams , 1987, Pearl , 1988], but the purpose of this example is not to demonstrate a new diagnostic technique, but to show how Bayesian networks can be built dynamically based on a carefully designed knowledge base of schemata. Assume that digi ta l gates exhibit random but non-intermittent behaviour when they are broken, and that they have a prior probabili ty of being broken. A gate that is working correctly has an output that is dependent directly on the inputs to the gate. A s well, the behaviour of a gate is the only criterion for diagnosis. A gate is assumed to have two inputs, and to distinguish them, we w i l l label them wi th input port numbers, as in input(and\_gate(gl57) ,1). First we define the variables used in our domain, indicating the parameter types used: 2 2 Because this example is useful pedagogically, it is broken into several pieces, with discussion intermin-gled with code. The complete source is found in Appendix A. 3. Using Dynamic Bayesian Networks 34 binary ok(Gate:gate). binary output(Gate:gate). binary input(Gate:gate, Port:port). binary exists(conn(Gl, G2, Port), connections, output(Gl)). The node ok(Gate) has the value true iff the gate is functional. The node output (Gate) is true iff the output of the gate is on, and similarly input (Gate, Port) is true iff the input to the indicated port of a gate is on. The existential node exists(conn(Gl, G2, Port), c o n n e c t i o n s , output (GI)) tells us there is a connection between the output of a gate and an input port to another gate. We use this parameter conn(Gl, G2, Port) to allow us to submit observations of the network structure to the system, allowing it to create the corresponding network. The behaviour of the gates is modelled wi th the following schemata: / * and—gates * / ok(and_gate(G):gates), input(and_gate(G):gates,1), input(and_gate(G):gates,2) => output(and_gate(G):gates) := [1,0,0,0,0.5,0.5,0.5,0.5]. / * or—gates * / ok(or_gate(G):gates), input(or_gate(G):gates,1), input(or_gate(G):gates,2) => output(or.gate(G):gates) := [1,1,1,0,0.5,0.5,0.5,0.5]. / * xor—gates * / ok(xor_gate(G):gates), input(xor_gate(G):gates,1), input(xor_gate(G):gates,2) => output(xor_gate(G):gates) := [0,1,1,0,0.5,0.5,0.5,0.5]. which states the assumptions about gates. The first four entries in the contingency table for each gate is s imply the truth table for the boolean expression associated wi th each type of gate. The last four entries, a l l 0.5, merely makes explicit the assumption that i f the gate 3. Using Dynamic Bayesian Networks 35 7 Figure 3.1: A simple circuit. is not working correctly due to a malfunction, the output of the gate w i l l be on or off wi th equal probability. The topological connections between the gates is modelled using an existential combination: This schema has several interesting features. Firs t , the use of the connection triple ensures that the output of the gate G I is associated wi th the input of the gate G 2 according to the observations to be supplied by the user. Furthermore, every input to a gate w i l l be associated wi th unique value—there w i l l be no more than one value coming into the input, and only a very t r iv ia l combination (of one value) w i l l be performed. This technique is discussed further in Section 3.5 Note that by setting the contingency table differently, we can use the same schema to model the circuit under the assumption that the connections are not always faultless. For instance, using the following table [0 .85 ,0] says that the connection has a finite probabil i ty of being broken. Figure 3.1 shows a very simple circuit. Gate g l is an and-gate, g2 is an or-gate, and g3 exists(conn(Gl ,G2 ,Port) , connections, output(Gl)) => input(G2:gates,Port:ports). := [ 1 , 0 ] . 3. Using Dynamic Bayesian Networks 36 is an xor-gate. The knowledge base w i l l be able to model this circuit i f we supply the following information about types: observe type(and_gate(gl).gates). observe type(or_gate(g2).gates). observe type(xor_gate(g3).gates). observe type(conn(and_gate(gl),xor_gate(g3) , 1), connections), observe type(conn(or_gate(g2),xor_gate(g3),2), connections). This information creates the Bayesian network shown in Figure 3.2, which we can use for a variety of tasks. We can model the correct behaviour of the circuit by conditioning our queries on the assumptions that every known gate is working correctly. We can also condition a query on the known inputs and outputs to determine the probabili ty of a particular configuration of gates working incorrectly. 3.3 I n t e r p r e t i n g s k e t c h m a p s The domain of interpreting sketch maps provides a good test-bed for knowledge represen-tation techniques and automated reasoning tools. The Mapsee project at the University of Br i t i sh Co lumbia has used a variety of hierarchical knowledge structures and constraint satisfaction techniques to interpret hand drawn sketch maps (for a good overview, see [Mul-der et al, 1978]). To formalize the image interpretation problem, Reiter and Mackworth [Reiter and Mackworth , 1989] posed the problem wi th in a logical representation, showing that there is a notion of correct interpretations which a l l applications must provide (but which is much harder to prove for less rigorously well-defined programs). Th i s problem is of interest as an example of the application of Dynamic Bayesian networks because of the following two questions: 1. Given several possible interpretations for objects in a sketch map, how can we choose a preferred one? 2. How can we provide a Bayesian network which can model any possible sketch map we may want to interpret? The answer to the first question is: Use a Bayesian network. The answer to the second is: Use a dynamic Bayesian network. Figure 3.2: The Bayesian network constructed by our simple knowledge base for the circuit in Figure 3.1. 3. Using Dynamic Bayesian Networks 38 c2 r l Figure 3.3: A simple sketch map. 3.3.1 Sketch maps A sketch map is made up of chains 3 and regions, wi th the intention that, interpreted correctly, chains correspond to roads, rivers or shorelines, and regions correspond to either land or water. Figure 3.3 shows an example of a simple sketch map. Th i s example is ambiguous, as chain c l could be interpreted as either a road which meets a traffic loop (chain c2), or a river which flows into a body of water, region r2 (chain c2 being the shoreline), or a road which ends at a body of water. 3.3.2 A probabilistic knowledge base The following schemata represents enough knowledge to provide answers to queries about sketch maps, like "What is the probabili ty that chain c2 is a road, given that it is a chain which meets another chain?" The contingency tables use probabilities of zero for impossible configurations, such as two rivers crossing, etc. For knowledge like "Given two chains which meet in a tee, the probabili ty that they are both rivers is p" the number p used is only a rough estimation, which is sufficient to demonstrate the dynamic qualities of schemata, but may not reflect the reality of occurrences of rivers or roads joining. 3 A chain is a connected sequence of points. 3. Using Dynamic Bayesian Networks 39 '/,'/, scene objects can be linear or area objects multi isa-linear(X) <3 [road, river, shore] . multi isa-area(X) @ [land, water]. '/,*/, Image objects: binary tee(X.Y). '/,'/, chain X meets chain Y in a tee-shape binary chi(X,Y). VI* chain X and chain Y make a chi—crossing binary bounds(X.Y). '/,'/, chain X bounds area Y binary interior(X,Y). '/„'/» Area Y is in the interior of chain X binary exterior(X,Y). '/,'/, Area Y is exterior of chain X '/,% Scene descriptions: binary joins(X.Y). '/.'/.linear scene objects join in a tee shape binary crosses(X.Y). •/,'/, linear scene objects cross binary inside(X.Y). '/.'/, area object X is inside linear object X binary outside(X,Y) . '/.'/. area object X is outside linear object X °/.'/o Trick implementation of equality predicate! '1,7, Two distinct random variables, which have their VI, arguments as values: multi vall(X) @ [X, X]. multi val2(Y) « [Y, Y]. VL equality is then define i f the values are the same! function equal/2. vall(X), val2(Y) => equal(X.Y) {p(equal(X,Y),[vail(X)=X,val2(Y)=Y]) = i f (X=Y) then 1.0 else 0.0}. °/,4/. Two linear scene objects can join to form a tee isa-linear(X:chain), isa-linear(Y:chain), joins(X:chain,Y:chain), equal(X:chain,Y:chain) => tee(X:chain,Y:chain) : - C . . ] . 4 4There are 36 numbers in this list, and in some of the contingency tables for the remaining schemata there are even more than this, but none of them is crucial to the explanation of the dynamic nature of 3. Using Dynamic Bayesian Networks 40 '/,'/, two linear objects can cross to form a chi isa-linear(X:chain), isa-linear(Y:chain), crosses(X:chain,Y:chain), equal(X:chain,Y:chain) => chi(X:chain,Y:chain) := [...]. */*/» linear objects can form closed loops isa-linear(X:chain), loop(X:chain) => closed(X:chain) := [.. .3. 4/,0/ linear scene object are found beside area objects isa-area(X:region), isa-linear(Y:chain), beside(X:region,Y:chain) => bounds(X:region,Y:chain). := [...]. %% on the linear object which forms the boundary between two '/,'/„ area objects, we must constrain the objects involved °/o% e.g. A road is not a boundary between two bodies of water isa-area(X:region), isa-linear(Y:chain), isa-area(Z:region), inside(X:region,Y:chain), outside(Z:region,Y:chain) => boundary-constraint(X:region,Y:chain,Z:region). := [...]. There are several interesting features of this knowledge base. First of a l l , the use of multiple valued P R V s constrains the objects in the domain, e.g., a linear scene object can only be one of {river, road, shore}, and multi-valued P R V s express this succinctly. Observe also the use of type information; since only individuals of the correct type can be used to instantiate schemata, dividing the domain into two P R V s i s a - l i n e a r ( X ) and isa-area (X) assures that chains and regions are disjoint sets of objects. Final ly , the use of equals(X,Y) is required to ensure that when the schemata are instantiated we only consider interpretations in which, for example, linear scene objects do not intersect themselves. this example. For a complete listing, see Appendix A. 3 . ; Using Dynamic Bayesian Networks 41 The user specifies the following individuals, and their types: type(cl,chain). type(c2,chain). type(rl,region). type(r2,region). which when combined wi th the knowledge base, creates the Bayesian network in Figure 3.4. The user is free to query the state of the network, conditioning on c2 being closed, and r l being on the outside of c2, etc. Note that this network could be used the opposite direction as well: conditioning on the scene objects allows the user to "predict" the image. 3.4 T h e B u r g l a r y o f t h e H o u s e o f H o l m e s The following story is used by Breese [Breese, 1990] to demonstrate his network construc-tion application, and is derived from an example used by Pearl [Pearl, 1988]: M r . Holmes receives a telephone cal l from his neighbour, Dr . Watson, who states that he hears the sound of a burglar alarm from the direction of Holmes' home. A s he is preparing to rush home, M r . Holmes reconsiders his hasty decision. He recalls that today is A p r i l 1st, and in light of the A p r i l Fool 's prank he perpetrated on his neighbour last year, he reconsiders the nature of the phone cal l . He also recalls that the last time the alarm sounded, it had been triggered by an earthquake. If an earthquake has occurred, it w i l l surely be reported on the radio, so he turns on a radio. He also realizes that in the event of a burglary, the l ikelihood of recovery of stolen goods is much higher if the crime is reported immediately. It is therefore important, i f he in fact a burglary d id occur, to get home as soon as possible. O n the other hand, i f he rushes home he w i l l miss an important sales meeting wi th clients from B i g Corp. which could result in a major commission. Should M r . Holmes rush home? The knowledge needed to solve this problem can be expressed in the language of In f luence as follows. 3. Using Dynamic Bayesian Networks 42 Figure 3.4: The Bayesian network constructed by our simple knowledge base for the sketch map in Figure 3.3. 3. Using Dynamic Bayesian Networks 43 binary sale-obtained(X) . '/,'/. has a sale been obtained by X? binary burglary(X). 'IX was there a burglary in X's house? binary go-home(X). •/,'/. does (should) X go home IMMEDIATELY? binary meeting(X.Y). 'IX, was there a meeting between X and Y? binary earthquake. '/,'/, was there an earthquake? binary radio. 'IX, did the radio report an earthquake ? binary client(X). '/,'/, X has a client? function value(X). 'IX what is the value of X's decision? function losses(X). '/,'/, how much did X lose? function income(X). 'IX how much does X stand to gain? 'IX, how much is a sale worth? multi sale-value(X) 9 [0,150,300]. 'IX* how much of X's stolen goods were recovered? multi goods-recovered(X) 9 [0,500,5000]. 'IX how much were X's stolen goods worth? multi stolen-goods-value(X) 0 [0,500,5000]. 'IX, when did X report thre burglary? multi burglary-report(X) 0 [immediate, late, none]. */,'/, compute the difference between X's income and losses '/,'/, ca l l i t value for X losses(X:alarm-owner), income(X:alarm-owner) * => value(X:alarm-owner) := { . . . } . « 'IX, X may recover some of his stolen goods depending on 'IX, when X reported the burglary 5 The function here, as well as the contingency table for the schemata in this example are not essential for the understanding of the dynamic nature of the schemata. The actual numbers are found in the complete listing in Appendix A. 3. Using Dynamic Bayesian Networks 44 burglary(X:alarm-owner), burglary-report(X:alarm-owner) => goods-recovered(X:alarm-owner) := [...]. Vh If X recovers stolen goods, then there is not loss goods-recovered(X:alarm-owner), stolen-goods-value(X:alarm-owner) => losses(X:alarm-owner) := {...}. */,*/, the income for Y depends on getting the sale from a client client(Y:alarm-owner), sale-obtained(Y:alarm-owner), sale-value(Y:alarm-owner) => income(Y:alarm-owner). := {•••}• Vh - A meeting may take place between X and Y Vh i f X doesn't go home go-home(X:alarm-owner) => meeting(X:alarm-owner,Y) := [ . . J . '/.'/. X wil l report a burglary i f one occurred, and X has gone home Vh to verify i t go-home (X: alarm-owner), burglary (X: alarm-owner) => burglary-report(X:alarm-owner) := [ . . . ] • Vh i f there was a burglary, then goods of some value have been Vh stolen burglary(Y:alarm-owner) => stolen-goods-value(Y:alarm-owner) := [...]. */,'/, X wil l meet with Y to talk sales. A sale may occur meeting(X:alarm-owner,Y:corporation) => sale-obtained(X:alarm-owner,Y:corporation) := [ . . . ] • Vh an alarm is affected by earthquakes and burglaries Vh - the alarm wil l almost definitely sound i f both an earthquake Vh and a burglary occur, and almost definitely wil l not sound i f Vh neither occur 3. Using Dynamic Bayesian Networks 45 burglary(Y:alarm-owner), earthquake => alarm(Y:alarm-owner) := C . . ] . 7,7, earthquakes tend to be reported on the radio. . . earthquake => radio := [...]• 7,7, Phone calls from neighbours about alarms. 7,7, - neighbours usually only cal l when an alarm sounds 7,7. - non-neighbours don't cal l at a l l ! neighbour(X:person,Y:alarm-owner), alarm(Y:alarm-owner) => phone-call(X:person,Y:alarm-owner) := [...]• This example demonstrates the power of our simple network creating process. The user specifies the appropriate individuals: observe type(holmes.alarm-owner). observe type(watson,person). observe type(big-corp, corporation). and the network in Figure 3.5 is created. In [Breese, 1990], Breese makes use of influence diagram technology, such as decision nodes, which are useful in this problem. These types of nodes are not implemented in Influence. However, by conditioning on whether or not Holmes should go home, the user can determine the value of each decision by querying value (holmes). The value computed by this technique is not the same as the "value of a decision node" in Breese's program; Breese uses value nodes from influence diagram technology (as in [Shachter, 1986]) which computes an expected value based on the product of the possible values by their respective probability. In Influence there is no equivalent functionality built-in.6 6 However, this computation can be simulated by querying each value of the node to obtain the posterior probability and performing the same sum-of-products by hand. 3. Using Dynamic Bayesian Networks 47 3.5 B u i l d i n g k n o w l e d g e b a s e s f o r D y n a m i c B a y e s i a n n e t w o r k s In this chapter several example knowledge bases have been presented for different do-mains. A s it may not be obvious how these knowledge bases were constructed, this section w i l l present some ideas which have been useful in creating the knowledge bases for these domains. Some common pitfalls are also identified, and alternatives are suggested. Separating properties from individuals This is a fundamental technique. It should be noted that many different concepts may classify as individuals, depending on the domain, and on the intent of the application. For example, a person in a story may be an individual , and the triple (Gatel, Gate2, InputPort) as used in Section 3.2 to represent connections in electronic circuits is also an individual . It seems useful to consider the kinds of information that a user might be able to observe. Direction In many cases, schemata are naturally written in a causal direction. For example, the behaviour of a logic gate is quite naturally described causally, since its output depends on the inputs. The schemata for a given domain need not be in the causal direction, but should be writ ten wi th a consistent direction. This is a general observation which seems to aid in keeping knowledge bases from creating directed loops in the Bayesian networks created from it . It may arise that some causal relationships are not easy to quantify, and care should be taken to avoid wri t ing "loops" in the knowledge base. Conditional probabilities and semantics W h i l e it is true that a conditional probability distribution may take any values which sum to one and remain consistent mathematically, it is important to remember that a contingency table has an intended meaning. When creating the knowledge base, one should be certain that every entry in the contingency table for a schema makes sense. For example, the following schema might look appropriate, according to the direction principle 3. Using Dynamic Bayesian Networks 48 above bird,emu — • flies because both birdness and emuness have some influence on a bird's abil i ty to fly. However, this schema requires an entry in the contingency table for the expression p(flies\-ibird, emu), which, i f we know anything about emus and birds is inconsistent semantically. Assigning this probabili ty is context dependent; that is, i f we have in addition to the schema about flying, a schema such as emu — • bird stating that a l l emus are birds, the actual value of p(flies\~>bird, emu) is irrelevant. Some axiomatizations may specify a particular value for a probabili ty wi th inconsistent condi-t ioning (for example, Alel iunas [Aleliunas, 1988] assigns a value of unity). Nevertheless, even i f the inconsistency may be easy to disregard in small knowledge bases, in larger ones, keeping track of a l l the irrelevant inconsistencies seems to be more work than redesign-ing the problematic schemata for consistency. In our bird example, we might rewrite the schemata as in the following: emu —> bird emu — • abJIying bird, ab-flying — • flies where ab_flying is a random variable representing the possibility that something is abnormal wi th respect to flying, i.e., it doesn't fly. Knowing that emus are certainly birds, and that emus tend to be exceptions to the generality that "birds fly" provides the required numbers for the contingency tables. Keeping the contingency tables semantically consistent and informative w i l l make modifi-cations and extensions of the knowledge base less troublesome. The use of combination nodes Combinat ion nodes, such as the existential and universal combinations discussed in Sec-t ion 2.2 are used to model multiple influences assuming a simple and regular combination 3. Using Dynamic Bayesian Networks 49 of effects. They can be used in a straightforward manner, as demonstrated in Example 3.5-1, where the combination was performed over a simple set of individuals. In Section 3.2, a more complex k ind of individual was used to create a separate structure for each of these individuals. In many domains, causal or influential knowledge is best modelled by considering every cause for a particular effect (e.g., every disease which causes the same symptom) inde-pendently. Further, each cause is assumed to have an enabling/disabling condition. This kind of assumption is modelled in Pearl and others [Pearl, 1988] by a structure called a noisy-Or gate. It has the advantage of requiring a contingency table whose size is on the order of the number of causes, and each entry in the table considers only one cause at a time. The existential combination introduced in this thesis can be used to construct noisy-or gates. Suppose we wanted to model the causes of sneezing wi th a no i sy -Or gate, combining nodes like flu(X) and hay-fever(X), because it is known that either disease is l ikely to cause sneezing and that these diseases are independent. This is stated in the following: , flu(X) — > causes(flu(X), sneezing) hay — fever(Y) — • causes(hay — fever(Y), sneezing) 3Z € causes • causes(Z, sneezing) — > sneezing The user would then have to tell the system which causes to consider in the model, and the system would create a structure which combines the causes. This behaves exactly as a noisy-or gate, since list ing the causes first is equivalent to enabling the cause, and any cause not explici t ly mentioned by the user is not considered by the combination at a l l , as if the cause were disabled. Avoiding too many instantiations It is possible to write a knowledge base of schemata without regard to the process which creates the network. However, it is often helpful to keep the process in mind, to avoid wri t ing schemata which w i l l be instantiated too often. The most obvious way to keep the instantiations l imited is to make use of the type constraint mechanism. For example, when wri t ing the schemata for interpreting sketch maps in Section 3.3, we could have used only one type of scene object for both linear and area objects. If the following schema had been used 3. Using Dynamic Bayesian Networks 50 isa(X), isa(Y), joins(X.Y), equal(X.Y) => tee(X.Y) (where isa(X) could take any value from [road, river, shore, water, land] ) instead of isa-linear(X), isa-linear(Y), joins(X.Y), equal(X.Y) => tee(X,Y) as actually used in Section 3.3, the network creation process would have instantiated schemata for a l l scene objects. No useful information would have been obtained from instantiating them wi th area scene objects. 3.6 C o n c l u s i o n s The example knowledge bases discussed in this chapter demonstrate some of the abilities of dynamic Bayesian networks as presented in this thesis. It is clear that while most schemata are straight forward, some schemata make use of clever techniques which can only be learned by example, and by understanding the network bui lding process. Th i s seems to conflict wi th the simple declarative representation which seems to characterize the language, presented in Chapter 2. The process of bui lding networks is very simple; schemata are instantiated by every in-dividual of the appropriate type, and the instantitations are added to the network. A s we can see from the sketch map example, sometimes there are many instantiations which do not lead to useful information, and could be left out of the network completely, i f the bui lding process were more intelligent. In work done independently of this thesis, Goldman and Charniak [Goldman and Char-niak, 1990] and Breese [Breese, 1990] have presented their work on constructing Bayesian networks from knowledge bases. Bo th approaches address the representation issues dis-cussed in this thesis, but emphasis is placed on more sophisticated procedures for building networks. Goldman and Charniak use a domain specific rule base to construct networks for the domain, and Breese has domain knowledge in the form of Horn clauses which are used to determine which nodes to include in the network. Chapter 4 Implementing dynamic Bayesian networks This chapter deals wi th the issue of implementing the dynamic features described in Chap-ter 2. I present a brief description of the algorithms of Poole and Neufeld [1989],Pearl [1988, 1986], Shachter [1986, 1988, 1989], and Lauri tzen and Spiegelhalter [1988], consid-ering how each might be adapted for dynamic Bayesian networks. 4.1 The general problem Ideally, the dynamic constructs would be implemented so that when new individuals are added to the model, resulting in arcs and nodes being added to the network, the joint probabil i ty distr ibution w i l l be correct and obtainable without recomputation of the pre-vious evidence. That is, we want to give some preliminary observations to our system to get some results, possibly observe new individuals, and get the updated results without having to recompute the effects of the in i t ia l observations. Take the simple example of the following schemata: a —• b(X) 3Xet-b(X) — • c This example contains both a right-mult iple schema and a combination schema, and F ig -51 4. Implementing dynamic Bayesian networks 52 Figure 4.1: A simple example of a dynamic network, (a) showing the original network, and (b) showing the network after another individual of type t is observed. See Section 4.1. 4. Implementing dynamic Bayesian networks i 53 ure 4.1a shows the result of instantiating the schemata for the individuals x\ and x2, which are of type t. This network represents the independence assumptions which gives the following equality: p(a, 6(ari), b(x2), 3X G t • b(X), c) = P(c\3X e t • b(x)) p(3X G * • 6(A-)|6(ari)6(ar2)) p(Kxi)\a)p(b(x2)\a) p(a) Suppose a new individual x3 is observed dynamically. The probability model changes as a result, and ideally, the algorithm used in the inference process can absorb the change in the model without too much recomputation. The network is shown in Figure 4.1b, and the joint distribution for our example becomes: p(a, bfa), b(x2), b(x3), 3X G t • b(X), c) = p(c\3X G t • b(X)) p(3X G t • b(X)\b(x1)b(x2)b(x3)) p(Kxi)\a)p(Kx2)\a)p(b(x3)\a) p(a) This equation demonstrates the change to the model. One new factor has been added, and the existential combination has one more conditioning variable, b(x3). It is important to note that the factor p(b(x3)\a) uses the same contingency table as p(b(x3)\a), so no new information is required. Since the existential combination uses a contingency table which takes the or of all the conditions, all the system needs to know is the number of conditions. To be able to use dynamic schemata efficiently, adding new arcs to the network should allow the algorithm which processes evidence and computes posterior distributions to modify the state of the network in a straightforward manner without recomputing the effects of evidence already submitted. This is not always easy to do, mainly because most of the algorithms perform some direct manipulation on the structure of the network, either to create a computationally tractible representation or to perform the calculations themselves. I will show that the addition of dynamic constructs to the work of Poole and Neufeld [Poole and Neufeld, 1989] is quite straight-forward, and that under certain conditions, Pearl's algorithm can be adapted as well. The algorithm due to Lauritzen and Spiegelhalter is not 4. Implementing dynamic Bayesian networks 54 difficult to adapt. Final ly , I show that Shachter's network evaluation algorithm performs too many structural changes on the the network to allow a feasible implementation of dynamic schemata. 4 . 2 A n A x i o m a t i z a t i o n f o r P r o b a b i l i s t i c R e a s o n i n g i n P r o l o g The influence diagram interpreter described by Poole and Neufeld [Poole and Neufeld, 1989] provides a sound axiomatization for probabilistic reasoning. The computation is goal directed, and only the computations necessary to compute solutions to queries are performed. The In f l uence implementation of dynamic Bayesian networks is based on this interpreter. 4.2.1 The basic computational engine A Bayesian network is specified by a database of direct dependency relations and contin-gency tables associated wi th each variable. The basic computation is goal directed: the desired conjunction of variables which make up a query is supplied to the system together wi th the given evidence. Only those com-putations which are necessary to solve the query are performed, and no computation is performed outside the context of a query. Furthermore, each query is computed indepen-dently from any other query. The axiomatization of probabili ty theory is straight forward, and axioms such as the Nega-tion Rule, the Law of Mul t ip l ica t ion of Probabilit ies, and Bayes' Rule are are used to simplify and solve complex queries recursively. In particular, Bayes' Rule is used to refor-mulate diagnostic queries (i.e. queries which ask about causes given effects) into a causal form (i.e. queries which ask about effects given causes). The direct dependencies of variables, made explicit in the network structure, are compiled into Prolog rules which calculate posterior distributions by "reasoning by cases" wi th respect to a l l of the variable's direct parents. The independence assumption for the parents of a variable provides some simplification of the general reasoning by cases formula, and weights the cases according to the contingency table provided for the variable. A query can be seen as a proof of a symbolic probability expression, grounded in terms of 4. Implementing dynamic Bayesian networks 55 prior probabilities and given knowledge. This expression can be evaluated numerically to provide a quantitative result. 4.2.2 Adding the dynamic combination Adding the dynamic constructs presented in Chapter 2 requires the addition of several axioms, recognizing the special syntax, and translating a query involving a combination node into a series of simpler queries. For existential combination nodes, the results of the simpler queries are combined using the following lemma: Lemma 1 If U = {u±,..., uk} is a set of random variables such that and Uj are con-ditionally independent given v for all i ^ j, then k p ( « i V «a V . . . V uk\v) = 1-11(1- p(Ui\v)) i=i k p(ui A u 2 A ... A uk\v) = "[Jp(ui\v) i= l The first step in the combination procedure is to gather the individuals of the correct type. Each individual instantiates the parameterized random variable given in the combi-nation node, and each instantiation is assumed to be independent, given the state of the combination. For existential combination, the collection of instantiations are combined using Lemma 1 which combines the effects of all the instantiations using the first result. The universal combination is similar, using the second equality. The ability to observe individuals dynamically is provided by the independence of succes-sive queries. A probability model is based on the information provided by the query only, and therefore a new query, with new evidence in the form of individuals, builds a new probability model implicitly as the proof is generated. 4.2.3 Discussion The simplicity of this implementation makes it very attractive, both formally and pedagog-ically. The use of a logical specification helped to keep the developing theory of dynamic combinations correct as well as simple. 4. Implementing dynamic Bayesian networks 56 One advantage to this approach is that computation is goal driven, and only the calcula-tions necessary to compute the query are performed. The primary disadvantage is that, for consultations wi th multiple queries, many identical computations may be performed more than once. A s a general belief computation engine, the current implementation is l imited, as only very small networks can be queried in reasonable time. Various improvements to the compiler and probabil i ty axioms are possible, resulting in a loss of clarity and obvious correctness. Some of these issues are explored in [Horsch, 1990] 4 .3 Pearl's Distributed Propagation and Belief Up-dating The work on belief propagation done by Pearl [Pearl, 1988] provides a probabilistic com-putational paradigm which is local in nature, and each local computation requires l i t t le or no supervision from a controlling agent. Pearl has proposed a highly parallel "fusion and propagation" algorithm for computing posterior marginal probabilities using singly connected Bayesian networks. I w i l l describe the algorithm abstractly at first, saving the mathematical details for later. 4.3.1 Belief updating in singly connected networks In Pearl 's approach, a Bayesian network describes the channels of communication between related variables. For Bayesian networks which are singly-connected, there are simple rules which govern the way the network is affected by the submission of evidence. These rules deal wi th : • How a variable updates its own marginal distribution based on messages received from its direct parents or children. • The content of messages sent by a variable to its direct parents and children. For singly-connected graphs, the rules for updating a node's marginal distribution and communicating changes to its parents and children guarantee that the network w i l l con-verge to a stable equil ibrium and the marginal distributions w i l l be correct. 4. Implementing dynamic Bayesian networks 57 Evidence is submitted to a network by asserting a value for a variable (or a collection of values for different variables). This affects the marginal probabili ty of that particular variable, and this variable sends evidential support values to its parent variables, and causal support values to its children. Each parent or child variable then updates its own posterior distr ibution based on the information gained from its parents and children, and sends out messages to its parents and children. These messages take into account that some of the parents or children have already seen the evidence, thus ensuring that no support is received more than once. When al l variables in the network have updated their distributions in this manner, the propagation ceases. One of the features of belief propagation is the natural way it can be expressed as a distributed process: the message passing and belief updating are local computations and can be performed in a highly parallel computation. 4.3.2 Adding dynamic constructs In this section I demonstrate how the dynamic combination nodes can be added to Pearl's singly connected message passing algorithm. In general, a combination node in a singly connected network could create a second path between two nodes. Therefore, to be com-pletely general, Pearl 's algorithm must be adapted to handle general graphs. I assume for s implici ty that no loops are created when an individual is observed dynamically. This assumption is valid i f any of the methods of Section 4.3.3 is used to evaluate the network in the general case of multiply-connected networks. There are three cases: r ight-mult iple schemata, combination nodes and unique schemata. Right -mul t ip le schemata involves the addition of a new child to a set of nodes. Combi-nation nodes involve the addition of a new parent to a single node. Unique schemata are merely attached to either another unique schema, or at the ta i l of a r ight-mult iple schema or at the head of a combination node. The presentation follows Pearl [1988, page 183ff, Equations 4.47 to 4.53], considering the general case of adding new parents or children to the network. The special syntax for combination nodes and r ight-mult iple schemata is important for determining where the new sections of network are to be added, but unimportant wi th respect to the propagation of support once added. 4. Implementing dynamic Bayesian networks 58 The dynamic addition of parents Suppose we have a node C in the network wi th parents parents(C) = {Ai,... ,An}, and children children(C) = {Bi,... ,Bm} as in Figure 4.2 (the dashed arc from An+\ to C indicates the arc I w i l l be adding). Intuitively, adding a parent to a node can be seen as revising an assumption about the domain. In principle, the new parent can be treated as i f it had always been connected, but having had no effect on the child variable. A d d i n g the new node by a real process only changes the effect the assumed node has on its child. In the case of an existential combination, we can treat the in i t ia l state of the parent as having a prior probabili ty of zero. Add ing the node can be seen as changing the message from zero to the value specified in the knowledge base for this node. Th i s assumption effectively states that "a l l unknown individuals do not have the property we are looking for." For universal combination nodes, the in i t ia l assumed message sent by the parent is unity, that is, we assume that the new node complies wi th the property being combined. Add ing the new parent in this case changes this value from unity to the prior specified in the knowledge base. Before the addition of An+i the internal state of C can be inferred from C ' s neighbours by the following: BEL(c) = p(c|e) (4.1) = P(^x\x)p(x\e^) (4.2) = aA(c)7r(c) (4.3) where e is the evidence submitted so far, e^ are those items of the evidence connected to C through its children, e^ are those items of evidence connected to C through its parents, and a; is a normalizing constant such that £ c aA(c)7r(c) = 1. Now A(c) = p(e-x\x) (4.4) = nw<o (4-5) i TT(C) = p(x\e+) (4.6) = P(c |a i , - . . ,an) I I 7 r c(a,-) (4.7) ai,...,a„ i The AB((C) are messages received by C from its children, and 7Tc(a;) are messages received by C from its parents. 4. Implementing dynamic Bayesian networks 59 Figure 4.2: Adding a parent An+i to node C. 4. Implementing dynamic Bayesian networks 60 The probabilities p(c\ai,an) are those supplied as contingency tables. In the case of combination nodes, these are functional: for existential combinations, we use a func-t ion which computes p(c\ai,..., an) = V"=ia,-, and for universal combination we use p ( c | a i , . . . , a B ) = A?=i a*-A d d i n g a parent An+i to parents(C) requires an update for the state of C. Assume that no new evidence is submitted at the time of adding the new parent. Let BEL'(c) be the new state. Therefore: BEL'(c) = p(c\e) (4.8) = a\'(cW(c) (4.9) A'(c) = UXB.(C) (4.10) = A(c) (4.11) *'(c) = Yl P(c\ai,...,an+1)Y[TTc(ai) (4.12) ai,...,a n+i Note that A(c), which is the support gained from C ' s children doesn't change. The new 7r'(c) reflects the addition of the new parent. The contingency table used before cannot in general be used in this new calculation; a new table must be supplied. However, in the case of combination nodes, these probabilities are functional and can be writ ten to accept an arbitrary number of dependent variables. Having updated its own internal state, C must send appropriate messages to a l l of its children and every parent except the new one. A s well , the new parent must be made aware of the evidence previously seen by the network. The content of the message sent by C to its parents due to the addition of A n + i can be writ ten as: n+l Ac(ai) = P^2\(c) ^2 p(c\au...,an+i)Y[7rc(ak)i^n + l (4.13) where (3 is any convenient constant. Note that each parent of C receives information from each of its mates, and about every child of C from A(c). The message C sends to An+\ should in one message provide information about a l l the evidence seen by C so far. The message sent can be written: Ac(o„+i) = (5y£j\(c) K c l a i > - • • > « « + ! ) I l M a i ) (4-1 4) 4. Implementing dynamic Bayesian networks 61 The message sent to each child Bi in children(c) is: *Bi(c) = (*l[\Bk(c) P (c |« i , . . . , an+l (4.15) k^j ai,...,a„+i BEL'(c) (4.16) Final ly , the added parent An+i can update its own state based on the message received from C and send messages to its children and parents in the manner originally outlined by Pearl . The dynamic addition of children Suppose we have node A in a network wi th parents parents(A) = {D±,..., D/} and children children(A) = {Bi,... ,Bn}, as in Figure 4.3 (again, the dashed arc from A to Bn+i indicates the one that w i l l be added). Intuitively, the addition of the new child is similar to the way parents were added in combination nodes. It can be assumed that the child has always been connected, but having had no in i t ia l effect on its parent. The assumed in i t ia l state gives no preference to the values of the child, that is, the assumed prior is 1/v, where v is the number of possible outcomes of the new child. Add ing the child has the effect of changing the assumed irrelevance to the value specified by the knowledge base. Before the addition of the new child Bn+i, the internal state is given by: BEL(a) = p(a|e) = a\(a)TT(a) (4.17) (4.18) where e is the evidence submitted so far, and a is a normalizing constant. Now (4.19) i ^(a) = Y, p(a\du...,di)Y[irA(di) (4.20) A d d i n g a child Dn+i to children(A) requires an update for the state of A. Assume that no new evidence is submitted at the time of adding the new parent. Let BEL'(a) be the 4. Implementing dynamic Bayesian networks 62 Figure 4.3: Adding a child B n + i to node A. 4. Implementing dynamic Bayesian networks 63 new state. Therefore: BEL'(a) p{a\e) a\'(ay(a) nw«) (4.21) (4.22) (4.23) A'(a) i X(a)XBn+1(a) •••)rf/)II7rc(ai) (4.24) (4.25) di,...,di 7T(o) (4.26) Note that 7r(a), which is the support gained from A ' s parents doesn't change. The new A'(a) reflects the addition of the new child. Having updated its own internal state, A must send appropriate messages to a l l of its children except the new one, and to every parent. A s well, the new child must be made aware of the evidence previously seen by the network. The content of the message sent by A to its parents due to the addition of Bn+i can be writ ten as: where (3 is any convenient constant. Note that each parent of A receives information from each of its mates from TrA(dk) and about every child of A from A'(a) and The message sent to each child Bi, i ^ n + 1 in children(A) is: xA(di) = £ p(c\d1,...,dl)H*A{dk) (4.27) TrBi(a) = aY[XBk{a) £ P(a\du ..., a{) fj 7 r A « ) (4.28) BEL'(a) XBi(a) (4.29) The message A sends to Bn+i can be written: n 7T£„ + 1 = a]]XBk(a)Y2di,--.,dip(a\di,...,ai)'[l'irA(di) (4.30) (4.31) (4.32) 4. Implementing dynamic Bayesian networks 64 which is the previous state of A. Final ly , the added child Dn+\ can update its own state based on the message received from A and send messages to its children and parents in the manner originally outlined by Pearl . Adding unique schemata dynamically A d d i n g unique schemata requires no special handling. This is because they can be attached only to nodes which were added dynamically as part of a r ight-mult iple or combination structure. Their internal state is inferred from their parents and children, which are also new to the network. The standard computation applies to these nodes. When the message due to the change in network structure reaches the nodes of the unique schemata, the updating and subsequent message passing occurs in the manner described by Pearl . 4.3.3 Propagation in general Bayesian networks For multiply-connected networks, Pearl's message passing scheme does not apply. The problem is that networks which have multiple causal chains between variables lose the conditional independence assumption which permitted the local propagation algorithm to distribute messages along each arc. Furthermore, then network can get caught in an infinite stream of messages. Solutions to this problem include: • Keeping a list of variables in the history of the propagation. Th i s requires message lengths exponential in the number of variables in the network. • Clustering sibling nodes along the multiple paths, creating a single chain. This requires possibly exponential time to identify loops and form the clusters. • Disconnecting the loop at the head (i.e. at the variable which begins the multiple chains). The variable at the head is instantiated to each value it can attain, and the effects of this instantiation are averaged. This requires computations exponential in the number of values the head of a loop can attain. • Stochastic simulation, in which the variables take random values according to the probabil i ty distr ibution derived from the current state of its direct parents. The 4. Implementing dynamic Bayesian networks 65 posterior distr ibution is then taken as the ratio of assigned values to the number of t r ia l runs made. The drawback to this approach is that convergence to the correct distr ibution is guaranteed under certain conditions, but the number of trials necessary for a particular accuracy is undetermined. 4.3.4 Discussion I have shown how Pearl 's fusion and propagation algorithm can be modified to handle dynamic constructs in a way which correctly reflects the change in the probabili ty model. The adaptation is at a cost of locality: the purely local nature must be enhanced by a controlling program which adds nodes and arcs to the net and starts the appropriate message propagation. Final ly , I have only treated the special case where the dynamic constructs do not create loops in previously singly-connected networks. A dynamic scheme to handle this general case, using the ideas presented in Section 4.3.3, would be another interesting project. 4.4 L a u r i t z e n a n d Sp iege lha l te r ' s c o m p u t a t i o n o f be -l i e f Lauri tzen and Spiegelhalter [Lauritzen and Spiegelhalter, 1988] address the issue of per-forming local computations of beliefs using Bayesian networks as expert system inference engines. In particular, they are concerned wi th computing beliefs on large, sparse networks which are not necessarily singly connected. Briefly, their approach first modifies the structure of the Bayesian network representing the domain knowledge, creating a full moral graph. Cliques are identified, and the belief computation uses them to propagate support values throughout the network. 4. Implementing dynamic Bayesian networks 66 4.4.1 Belief Propagation on Full Moral Graphs Their algori thm can be described as consisting of three steps. 1 First , the Bayesian network is triangulated (and referred to as a full moral graph). Second, cliques in the triangularized network are ordered using a maximal cardinality search (i.e. starting from an arbitrary node, of a l l neighbour nodes yet to be labelled, the node having the most already labelled neighbours is labelled first), and each clique is treated as a compound variable in a tree structure connecting these cliques. Final ly , the ordering of the maximal cardinality search is used to direct the arcs in the tree, and the method of propagation in singly-connected networks is used to update probabilities. 4.4.2 Adding dynamic constructs Briefly, Lauri tzen and Spiegelhalter's algorithm can be adapted to include the dynamic structures in the following manner. The new section of the network is added (dropping the direction of the arcs) to the extant triangularized structure. The network is retriangu-larized, and following the above procedure, cliques are identified and a tree of compound nodes is created. A t this point the propagation procedes on the new structure in the familiar manner. The complication of adding the new nodes to the triangularized structure can be ap-proached by using the observation from Section 4.3.2, namely treating the arcs as i f they had always been in the network, but having had no effect on the rest of the network. This observation can be used to change the marginal distribution of each clique which has a new node in it or is a neighbour to a such a clique. 4 .5 P r o b a b i l i s t i c i n f e r e n c e u s i n g I n f l u e n c e d i a g r a m s In Operations Research, influence diagrams have been used as a common analysis tool for decision analysis under uncertainty. The approach of Shachter is to provide a normative axiomatic framework for this domain, which has often used ad hoc methods [Shachter, 1986, Shachter, 1988, Shachter, 1989]. A n influence diagram is a network similar to a Bayesian network, relating random vari-1This observation is due to Pearl, from his commentary on [Lauritzen and Spiegelhalter, 1988], and also noted by Shachter in [Shachter, 1989]. 4. Implementing dynamic Bayesian networks 67 ables and identifying probabili ty distributions graphically. In addition to random variables (called chance nodes in Operations Research), influence diagrams also have nodes to rep-resent expected outcomes and decisions. Influence diagrams without these special nodes are Bayesian networks, and in this overview, these w i l l only be considered in passing. For purposes of decision analysis, a l l probabilistic nodes are removed from the network, transferring their effect on decisions to a single table which ranks the decision options based on expected cost. For less specific applications, Shacter has shown [Shachter, 1989] that his algorithm can be used for evidence absorption and propagation by removing evidence nodes from the network. Network evaluation is performed using two techniques: arc reversal and node removal. A barren node, i.e. a node wi th no children, can be removed from a network without affecting the underlying distribution. A n arc can be reversed, that is, the arc from a node A to a node B can be tranformed into an arc from B to A, i f there is no other directed path from A to B. Th i s transformation reverses the conditional probabilities as well , and both A and B are "adopted" by each other's direct parents. When only probabilistic nodes are used in the diagram, the procedure of arc reversal corresponds to Bayes' theorem. A r c reversals are used to change nodes having children into barren nodes so they can be removed from the network. Network evaluation can be seen as the sequence of node removals and arc reversals neces-sary to remove al l nodes from the network. Queries to the network can be performed by the sequence of arc reversals necessary to remove the nodes which condition the query. 4.5.1 Adding dynamic constructs Given the very dynamic network evaluation already inherent in Shachter's algorithm, it is not especially helpful to add the k ind of dynamic structures I have presented in Chapter 2. A s an example of the difficulties involved, consider the network in Figure 4.4a. Follow-ing Shachter's algorithm, evaluating the network wi th a series of node removals and arc reversals results in the network of Figure 4.4b. If a node B3 is added to the network in this state, perhaps because a user has just discovered a new individual in the model, the system must perform the evaluation again from the original network. In general, an influence diagram could be built by instantiating a collection of schemata, and Breese demonstrates this by taking a similar approach to building influence diagrams [Breese, 1990]. Shachter's evaluation could be adapted to use combination nodes (i.e., to 4. Implementing dynamic Bayesian networks 68 Figure 4.4: The problem of adding a node after arc reversals have been performed. 4. Implementing dynamic Bayesian networks 69 be used as a noisy-Or gate as in Section 3 . 5 , for example), but if individuals are observed after a network has already been created, and after some evaluation has been performed, a new network must be created, and all of the previous arc reversals and node removals must be repeated for the new network. 4.6 Conclusions In this chapter I have presented several Bayesian network evaluation techniques, and dis-cussed how the dynamic techniques of Chapter 2 might be implemented. Some of these algorithms lend themselves well to dynamic schemata, and they benefit from the observation that the parts of the network which are added dynamically can be treated as if they had always been present in the network; the addition merely has the effect of changing how these nodes affect the rest of the network. There are other network evaluation algorithms in use or being developed which have not been discussed in this chapter. These may also be adaptable to dynamic constructs and may benefit from this same observation. The question of whether an evaluation algorithm can be modified to use dynamic schemata says more about the intended application than about the algorithm itself. For expert system use, a Bayesian network is a batch process, and the computation is designed to be autonomous and inflexible. These constraints are quite exploitable for implementing dynamic constructs for these algorithms. The fact that Shachter's algorithm is typically used in a very interactive consultation with a user, who is quite likely an expert in the domain she is modelling, means that the dynamic constructs are essentially unnecessary. The user herself can add to the network interactively should the necessity arise. Chapter 5 Conclusions 5.1 W h a t d y n a m i c B a y e s i a n n e t w o r k s c a n ' t d o In this section, we consider the l imitations and disadvantages of dynamic Bayesian networks as presented in this thesis. Some of these issues could be addressed as part of future research springing from this thesis. The most obvious disadvantage to the process of instantiating parameterized schemata to create Bayesian networks is the problem of finding individuals: the user of the system must be fully aware of the individual types, and the criteria for deciding types. Furthermore, the correctness of the model constructed by the process is dependent on the competence of the user who supplies the individuals. In a similar vein, the combination mechanisms presented in Chapter 2 take only known individuals into account when performing the combination. A facility for hypothesizing some unknown individual , perhaps based on the condition that a l l causes due to known individuals have been dismissed, seems to be a valuable addition. Another related issue is the type hierarchy itself. A s implemented, the network creating process requires that the user identify the individual 's type, and i f the individual belongs to more than one type (perhaps because private detectives are people too), the user must identify a l l the types to which an individual belongs. O f immediate benefit to the ideas pre-sented in this thesis would be a study on hierarchical types, wi th an eye towards efficiency considerations and representational adequacy. The fact that the knowledge base of schemata impl ic i t ly represents possibly infinitely many, 70 5. Conclusions 71 possibly infinitely large directed networks, the possibility that some of those networks are not acyclic, i.e., not Bayesian networks, is cause for concern. A d d i n g a loop check to the network creation process implementation is a simple matter, but it only identifies directed loops after the schemata are instantiated. Some ut i l i ty or theorem which assures that a knowledge base w i l l not result in cyclic networks would be a desirable result. Currently, the possibility of loops is treated in the same way left-recursive logic programs are treated by the logic programming community. 5.2 O t h e r f u t u r e p o s s i b i l i t i e s One of the fundamental ideas of my approach was to keep the network construction process simple, so that the issue of representation could be more fully addressed. One obvious extension of this thesis would be to design a more sophisticated construction process which would maintain the consistency of the probability distribution and the precision of the mapping between the knowledge base and the networks which can be created from it. The existential and universal combination mechanisms presented in Chapter 2 are only two of many possible ways to combine information. It seems useful to consider how to implement a mechanism in which some fraction of a set of parent random variables must be true in order for an effect to be triggered. 5.3 S o m e f i n a l r e m a r k s This thesis has presented a dynamic approach to the use of Bayesian networks. The approach was motivated by the need to model domains in which it would be difficult to anticipate some of the details. In particular, for domains which can be modelled by general knowledge of properties and their interactions, this general knowledge can be used in specific instances to create a probabilistic model. Changing the instances changes the model. A simple language of parameterized probabilistic knowledge, augmented wi th structures which dynamical ly combine the effects of several causes, was described. A procedure to create networks by instantiating statements in this language wi th particular individuals which have been submitted to the system, was outlined as well . A n implementation of this language was described briefly, and several examples were pre-5. Conclusions 72 sented using this implementation. The language seems to be versatile, but requires some careful programming for effective use. Several algorithms were discussed, due to Pearl [1988], Poole and Neufeld [1989], Lauri tzen and Spiegelhalter [1988], and Shachter [1986], currently used to evaluate Bayesian networks, and some modifications were suggested which would implement dynamic Bayesian networks in these systems. This thesis focussed on providing a precise framework for representing parameterized prob-abilistic knowledge, and a firm basis for building networks from this knowledge. Because the network construction process presented here was kept quite simple, the issue of rep-resentation could be more fully addressed. The result is a language for representing pa-rameterized probabilistic dependencies which precisely defines the mapping between the knowledge and the Bayesian networks created from it , as well as ensuring a consistent probabili ty distr ibution which is based on al l available knowledge. Bibliography [Aho et al, 1974] A . V . Aho , J . E . Hopcroft, and J . D . U l l m a n . The Design and Analysis of v Computer Algorithms. Addison-Wesley, Reading, Mass., 1974. [Aleliunas, 1988] R . Alel iunas. A new normative theory of probabilistic logic. In Proc. Seventh Conf. of the CSCSI, pages 67-74, 1988. [Andreassen et al, 1987] S Andreassen, M . Woldbye, B . Falck, and S .K . Andersen. Munin-—a causal probabilistic network for interpretation of electromyographic findings. In Procedings of the Tenth International Joint Conference on Artificial Intelligence, pages 366-372, 1987. [Breese, 1990] Jack Breese. Construction of belief and decision networks. Rockwell Inter-nation Science Center, forthcoming, 1990. [Cheeseman, 1988] P. Cheeseman. In defense of probability. Computational Intelligence, 4(l) :58-66, 1988. [Cooper, 1990] G . F . Cooper. The computational complexity of probabilistc inference using bayesian belief networks (research note). Artificial Intelligence, 42(2-3):393-405, 1990. [D'Ambrosio, 1989] Bruce D 'Ambros io . Incremental goal-directed probabilistic inference. Dept. of Computer Science, Oregon State University (draft), 1989. [de Kleer and Wi l l i ams , 1987] J . de Kleer and B . C . Wi l l i ams . Diagnosing multiple faults. Artificial Intelligence, 32(1):97-130, 1987. [Gardenfors, 1988] Peter Gardenfors. Knowledge in Flux. M I T Press, Cambridge, Mass., 1988. 73 [Geiger, 1990] Dan Geiger. Graphoids: A qualitative framework for probabiolistic in-ference. Technical Report R-142, Cognitive Systems Laboratory, Dept. of Computer Science, U C L A , 1990. [Goldman and Charniak, 1990] Robert P. Goldman and Eugene Charniak. Dynamic con-struction of belief networks. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 90-97, 1990. [Horsch and Poole, 1990] Michael C . Horsch and David Poole. A dynamic approach to probabilistic inference using bayesian networks. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 155-161, 1990. [Horsch, 1990] Michael C . Horsch. Influence: A n interpreter for dynamic bayesian net-works. Dept. of Computer Science, University of Br i t i sh Columbia . In preparation., 1990. [Jensen et al, 1988] F . V . Jensen, K . G . Olesen, and S .K. Andersen. A n algebra of bayesian belief universes for knowledge based systems. Technical Report R-88-25, Institute of Electronic Systems, Aa lborg University, 1988. [Jensen et al, 1990] F . V . Jensen, S.L Lauri tzen, and K . G Olesen. Bayesian updating in recursive graphical models by local computations. To appear in : Networks, 1990. [Laskey, 1990] K a t h r y n Blackmond Laskey. A probabilistic reasoning environment. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 415-422, 1990. [Lauritzen and Spiegelhalter, 1988] S. L . Lauri tzen and D . J . Spiegelhalter. Loca l compu-tation wi th probabilities on graphical structures and their application to expert systems. J. R. Statist Soc B, 50(2):157-224, 1988. [Lindley, 1965] D . V . Lindley. Introduction to Probability & Statistics from a Bayesian viewpoint. Part 1. Probability. Cambridge University Press, 1965. [Mulder et al, 1978] J . A . Mulder , A . K . Mackworth, and W . S. Havens. Knowledge structuring and constraint satisfaction: The mapsee approach. Technical Report 87-21, Dept. of Computer Science, University of Br i t i sh Columbia , Vancouver., 1978. [Neufeld and Poole, 1987] E . Neufeld and D . Poole. Towards solving the multiple extension problem: Combining defaults wi th probability. In Procedings of the Third Workshop on Uncertainty in Artificial Intelligence, pages 305-312, Seattle, 1987. 74 [Pearl, 1986] Judea Pearl . Fusion, propagation and structuring in belief networks. Artificial Intelligence, 29(3):241-288, 1986. [Pearl, 1988] Judea Pearl . Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Reasoning. Morgan Kauffman Publishers, Los Al tos , 1988. [Poole and Neufeld, 1989] David Poole and Er ic Neufeld. Sound probabilistic inference in prolog: A n executible specification of bayesian networks. 1989. [Poole and Provan, 1990] David Poole and Gregory Provan. Wha t is an opt imal diagnosis. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 46-53, 1990. [Poole, 1989] Dav id Poole. A methodology for using a default and abductive reasoning system. Technical Report 89-20, Dept. of Computer Science, University of Br i t i sh Columbia , Vancouver, 1989. [Reiter and Mackworth, 1989] R . Reiter and A . K . Mackworth. A logical framework for depiction and image interpretation. Artificial Intelligence, 41(2):125-155, 1989. [Reiter, 1987] R . Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32(l) :57-95, 1987. [Schubert, 1988] J . K . Schubert. Cheeseman: A travesty of truth. Computational Intelli-gence, 4(1):118-121, 1988. [Shachter, 1986] Ross D . Shachter. Evaluating influence diagrams. Opns Rsch, 34(6):871-882, 1986. [Shachter, 1988] Ross D . Shachter. Probabil ist ic inference and influence diagrams. Opns Rsch, 36(4):589-604, 1988. [Shachter, 1989] Ross D . Shachter. Evidence absorption and propagation through evi-dence reversals. In Proceedings of the 9th Annual Workshop on Uncertainty in Artificial Intelligence, pages 303-308, 1989. 75 A p p e n d i x A Influence c o d e f o r t h e e x a m p l e s A . l D i a g n o s i s o f m u l t i p l e f a u l t s f o r d i g i t a l c i r c u i t s The following is a complete listing of the knowledge base for modelling circuits. See Section 3.2. b i n a r y o k ( G a t e : g a t e ) . b i n a r y o u t p u t ( G a t e : g a t e ) . b i n a r y i n p u t ( G a t e : g a t e , P o r t : p o r t ) . b i n a r y e x i s t s ( ( G I , G2, P o r t ) , c o n n e c t i o n s , o u t p u t ( G I ) ) . /* a n d — g a t e s */ o k ( a n d _ g a t e ( G ) : g a t e s ) , i n p u t ( a n d _ g a t e ( G ) : g a t e s , 1 ) , i n p u t ( a n d - g a t e ( G ) : g a t e s , 2 ) => o u t p u t ( a n d . g a t e ( G ) : g a t e s ) := [1,0,0,0,0.5,0.5,0.5,0.5]. /* o r — g a t e s */ o k ( o r _ g a t e ( G ) : g a t e s ) , i n p u t ( o r _ g a t e ( G ) : g a t e s , 1 ) , i n p u t ( o r _ g a t e ( G ) : g a t e s , 2 ) => o u t p u t ( o r _ g a t e ( G ) : g a t e s ) := [1,1,1,0,0.5,0.5,0.5,0.5]. /* x o r — g a t e s */ o k ( x o r _ g a t e ( G ) : g a t e s ) , i n p u t ( x o r _ g a t e ( G ) : g a t e s , 1 ) , i n p u t ( x o r _ g a t e ( G ) : g a t e s , 2 ) => o u t p u t ( x o r _ g a t e ( G ) : g a t e s ) := [0,1,1,0,0.5,0.5,0.5,0.5]. 76 A. Influence code for the examples 77 exists((Gl,G2,Port), connections, output(Gl)) => input(G2:gates,Port:ports). := [1,0]. observe type(and-gate(gl).gates). observe type(or.gate(g2).gates). observe type(xor_gate(g3).gates). observe type((and-gate(gl),xor-gate(g3),1) connections), observe type((or_gate(g2),xor_gate(g3),2) connections). A . 2 I n t e r p r e t i n g s k e t c h m a p s The following Influence code creates dynamic Bayesian networks for the domain discussed in Section 3.3 scene objects can be linear or area objects multi isa-linear(X) Q [road, r iver , shore], multi isa-area(X) ® [land, water]. 7.7. Image objects: binary tee(X.Y). '/,'/. chain X meets chain Y in a tee-shape binary chi(X.Y). 7.7. chain X and chain Y make a chi—crossing binary bounds(X,Y). '/,'/, chain X bounds area Y binary interior(X,Y) . '/,'/, Area Y is in the interior of chain X binary exterior(X.Y) . '/.•/, Area Y is exterior of chain X 7.7. Scene descriptions: binary joins(X,Y). '/,'/,linear scene objects join in a tee shape binary crosses(X,Y) . '/,'/, linear scene objects cross binary inside(X.Y). '/,'/, area object X is inside linear object X binary outside(X.Y) . 7.7. area object X is outside linear object X 'I,'!, Trick implementation of equality predicate! '/,'!, Two distinct random variables, which have their Vl, arguments as values: multi vall(X) Q [X, X]. multi val2(Y) 8 [Y, Y]. 7.7. equality is then define i f the values are the same! function equal/2. A. Influence code for the examples 78 v a l l ( X ) , val2(Y) => equal(X,Y) {p(equal(X,Y),[vall(X)=X,val2(Y)=Y]) = i f (X=Y) then 1.0 e lse 0.0}. '/,'/, Two l i n e a r scene objects can j o i n to form a tee i s a - l i n e a r ( X : c h a i n ) , i s a - l i n e a r ( Y : c h a i n ) , j o i n s ( X : c h a i n , Y : c h a i n ) , e q u a l ( X : c h a i n , Y : c h a i n ) => t e e ( X : c h a i n , Y : c h a i n ) := [1, '/.'/. a road can meet i t s e l f i n a t e e , but nothing else 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, •/.'/, equal o b j e c t s . . . 1,1,0, '/,'/, a road or shore can j o i n a r o a d , a shore cannot 1,1,0, 7,7, a road or a r i v e r can j o i n a r i v e r , a shore cannot 1,1,0, 7.7. a road or a r i v e r can j o i n a shore, a shore cannot 0,0,0,0,0,0,0,0,0]. 7.7. j o i n i s f a l s e 7.7. two l i n e a r objects can cross to form a c h i i s a - l i n e a r ( X : c h a i n ) , i s a - l i n e a r ( Y : c h a i n ) , c r o s s e s ( X : c h a i n , Y : c h a i n ) , e q u a l ( X : c h a i n , Y : c h a i n ) => c h i ( X : c h a i n , Y : c h a i n ) 7.7. no scene objects cross themselves := [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 7.7. (equal i s true) 1,1,0, 7.7. roads and r i v e r s can cross a r o a d , not shores 1,0,0, 7.7. roads can cross a r i v e r , not r i v e r s or shores 0,0,0, 7.7. shores cannot cross anything 0,0,0,0,0,0,0,0,0]. 7.7. cross i s f a l s e , so c h i i s not p o s s i b l e 7.7. l i n e a r objects can form closed loops i s a - l i n e a r ( X : c h a i n ) , loop(X:chain) => closed(X:chain) := [1,0,1,0,0,0]. 7.7. only roads and shores form loops 7.7. l i n e a r scene object are found beside area objects i s a - a r e a ( X : r e g i o n ) , i s a - l i n e a r ( Y : c h a i n ) , b e s i d e ( X : r e g i o n , Y : c h a i n ) => b o u n d s ( X : r e g i o n , Y : c h a i n ) . := [1,0, 7.7. only l a n d can be beside roads 1.0, 7.7. only land can be beside r i v e r s 1.1, 7.7. land and water can be beside shores 0,0,0,0,0,0]. 7.7. beside i s f a l s e 7.7. on the l i n e a r object which forms the boundary between two 7,7. area objects , we must c o n s t r a i n the objects involved 7.7. e . g . A road i s not a boundary between two bodies of water i s a - a r e a ( X : r e g i o n ) , i s a - l i n e a r ( Y : c h a i n ) , i s a - a r e a ( Z : r e g i o n ) , i n s i d e ( X : r e g i o n , Y : c h a i n ) , o u t s i d e ( Z : r e g i o n , Y : c h a i n ) => b o u n d a r y - c o n s t r a i n t ( X : r e g i o n , Y : c h a i n , Z : r e g i o n ) . := [1,0, 7.7. only land can be i n s i d e a road boundary 0,0, 7.7. r i v e r boundaries are not allowed A. Influence code for the examples 79 0,1, '/,'/, shores have water inside, land outside 0,0,0,0, 7.7. when water is outside, roads, rivers can't bound 1,0, '/,'/, water outside, shore boundary then only land inside 0,0,0,0,0,0,0,0,0,0,0,0, 7.7. inside and outside are false 0,0,0,0,0,0,0,0,0,0,0,0]. A . 3 T h e B u r g l a r y o f t h e H o u s e o f H o l m e s The following Influence code models the problem of Section 3.4. binary sale-obtained(X). '/,'/, has a sale been obtained by X? binary burglary(X). '/,'/, was there a burglary in X's house? binary go-home(X). '/.'/. does (should) X go home IMMEDIATELY? binary meeting(X.Y). '/,*/, was there a meeting between X and Y? binary earthquake. '/.'/, was there an earthquake? binary radio. 7.7. did the radio report an earthquake ? binary client (X). '/,'/, X has a client? function value(X). '/,'/, what is the value of X's decision? function losses (X). '/,'/, how much did X lose? function income(X). '/,'/, how much does X stand to gain? 7,7, how much is a sale worth? multi sale-value(X) 9 [0,150,300]. 7.7. how much of X's stolen goods were recovered? multi goods-recovered(X) ® [0,500,5000]. 7.7. how much were X's stolen goods worth? multi stolen-goods-value(X) (3 [0,500,5000]. 7.7. when did X report thre burglary? multi burglary-report(X) (3 [immediate, late, none]. 7.7. compute the difference between X's income and losses 7.7. ca l l i t value for X losses(X:alarm-owner), income(X:alarm-owner) => value(X:alarm-owner) := {p(value(X)=V,[losses(X)=V1,income(X)=V2]) i f (V is V2 - VI) then 1.0 else 0.0}. 7.7. X may recover some of his stolen goods depending on 7.7. when X reported the burglary A. Influence code for the examples 80 burglary(X:alarm-owner), burglary-report(X:alarm-owner) => goods-recovered(X:alarm-owner) := [0.01, 1, 0.5, 1 , 0 . 1 , 1 ] . 7.7. If X recovers stolen goods, then there is not loss goods-recovered(X:alarm-owner), stolen-goods-value(X:alarm-owner) => losses(X:alarm-owner) := {p(losses(X)=V,[goods-recovered(X)=V1,stolen-goods-value(X)=V2]) i f (V is V2 - VI) then 1.0 else 0.0}. 7.7. the income for Y depends on getting the sale from a client client(Y:alarm-owner), sale-obtained(Y:alarm-owner), sale-value(Y:alarm-owner) => income(Y:alarm-owner). := {p(income(Y)=V,[client(Y), sale-obtained(Y), sale-value(Y)=V1]) i f (V is VI) then 1.0 else 0.0}. '/,'/, - A meeting may take place between X and Y '/,'/, i f X doesn't go home go-home(X:alarm-owner) => meeting(X:alarm-owner,Y:corporation) := [0, 0.75]. 7.7. X may get a client i f X has a meeting with a corporation exists(X,corporation,meeting(Y:alarm-owner,X)) => client(Y:alarm-owner) := [0.75,0.0]. 7.7. X wi l l report a burglary i f one occurred, and X has gone home */,'/, to verify i t go-home (X: alarm-owner), burglary(X: alarm-owner) => burglary-report(X:alarm-owner) := [1.0, 7,7, X went home immediately and reported a burglary 0.0, 7.7. so the report isn't l a t e . . . 0.0, 7.7. . . . or not reported at a l l 0.0, 7.7. didn't go home, so report can't be immediate 1.0, 7.7. assume X reports a burglary eventually... 0.0, 7.7. (X reported the burglary LATE, not NONE) 7.7. no burglary, no report! 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]. 7.7. i f there was a burglary, then goods of some value have been 7.7. stolen burglary(Y:alarm-owner) => stolen-goods-value(Y:alarm-owner) := [0.05, 7.7. burglary, but s-g-v=0 0.55, 7.7. burglary, but s-g-v=500 0.4, 7.7. burglary, but s-g-v=5000 1.0, 7.7. no burglary, and s-g-v=0 0.0, 7.7. no burglary, and s-g-v=500 0.0] .7.7. no burglary, and s-g-v=5000 A. Influence code for the examples 81 '/,'/, X w i l l meet w i t h Y t o t a l k s a l e s . A s a l e may o c c u r m e e t i n g ( X : a l a r m - o w n e r , Y : c o r p o r a t i o n ) => s a l e - o b t a i n e d ( X : a l a r m - o w n e r , Y : c o r p o r a t i o n ) := [0.5, 0.1]. 7.7. an a l a r m i s a f f e c t e d by e a r t h q u a k e s and b u r g l a r i e s 7,7. - t h e a l a r m w i l l a lmost d e f i n i t e l y sound i f b o t h an ea r t h q u a k e 7,7. and a b u r g l a r y o c c u r , and almost d e f i n i t e l y w i l l n o t sound i f 7.7. n e i t h e r o c c u r b u r g l a r y ( Y : a l a r m - o w n e r ) , earthquake => alarm(Y:alarm-owner) := [0.99, 0.1, 0.8, 0.001]. 7.7. e a r t h q u a k e s t e n d t o be r e p o r t e d on t h e r a d i o . . . e a r t h q u a k e => r a d i o := [0.98, 0.0]. 7.7. Phone c a l l s from n e i g h b o u r s about a l a r m s . 7.7. - n e i g h b o u r s u s u a l l y o n l y c a l l when an a l a r m sounds 7.7. - non - n e i g h b o u r s don't c a l l a t a l l ! ne i g h b o u r ( X : p e r s o n , Y : a l a r m - o w n e r ) , alarm(Y:alarm-owner) => p h o n e - c a l l ( X : p e r s o n , Y : a l a r m - o w n e r ) := [0.6, 0.0, 0.2, 0.0]. Appendix B Source code to Influence B . l p r o b a b i l i t y . p l /* * p r o b a b i l i t y . p l * The s e t o f p r o b a b i l i t y axioms from which we can compute * many p r o b a b i l i t i e s w i t h o u t much e f f o r t . The r e a l * work i s i n t h e l a s t axiom, w h i c h i s Bayes' theorem, * and t h e c a l l t o dyn_p, where we r e a s o n by c a s e s . * * (The M u l t i p l i c a t i o n R u l e f o r p r o b a b i l i t i e s used here * i s r e a l l y s t u p i d . Improvements can be made here!) */ '/.'/, The f i r s t two r u l e s a l l o w t h e u s e r t o query c o n v e n i e n t l y . '/,'/, And d i s p l a y s t h e answer n i c e l y , t o b o o t . . . p(A) :- p ( A , [ ] ) . 7,7, F i r s t , check t o see i f t h e c o n d i t i o n i n g knowledge i s c o n s i s t e n t . . . p(A,B) :-g i v e n _ i n c o n s i s t e n t ( B ) , w r i t e _ p ( A , B , ' u n d e f i n e d ' ) , n l . p(A,B) :- p ( A , B , P ) , w r i t e _ p ( A , B , P ) , n l . 82 B. Source code to Influence 83 7.7. p (+A, +Given, - P r o b a b i l i t y ) '/,'/, - c a l c u l a t e the p r o b a b i l i t y of l o g i c a l expression +A condit ioned VI, by +Given, to be returned i n - P r o b a b i l i t y 7.7. - i f +A i s a l i s t , then t h i s l i s t i s a conjunct of p r o p o s i t i o n s VI, Check to see i f i t i s a p r i o r that we know i n the database. •/„•/. p ( X , A , P ) 7.7. : - p r ( X , A , P , _ ) . 7.7. I f X i s a predefined node type, then we do i t here. p ( X , A , P ) : - b u i l t _ i n _ p ( X , A , P ) . 7.7. Otherwise, we have to do some c a l c u l a t i o n s . . .through axiom_p p ( X , A , P ) : - axiom_p(X,A,P). 7.7. I f X i s a q u a n t i f i c a t i o n n o d e . . . p ( X , A , P ) : - quant_p(X,A,P). 7.7. I f none of the axioms above can c a l c u l a t e a value, then 7.7. we must r e s o r t to reasoning by cases, which i s performed 7.7. by the r u l e s created during c o m p i l a t i o n , under the 7.7. head of 'dyn_p' . p ( X , A , P ) : - dyn_p(X,A,P). quant_p(exists(Parameter,Type,Variable),Cond,P) : - get_indivs_of_type(Type,Cond,Indivs), i n s t a n t i a t e ( V a r i a b l e , Parameter, I n d i v s , Instances) , or_p(Instances,Cond,P). quant_p(foral l(Parameter.Type.Variable) ,Cond,P) : - get_indivs_of_type(Type,Cond,Indivs), i n s t a n t i a t e ( V a r i a b l e , Parameter, I n d i v s , Instances) , and_p(Instances,Cond,P). 7.7. get_indivs_of_type(Type,Cond,Indivs) g e t _ i n d i v s _ o f . t y p e ( _ , [ ] , [ ] ) . get_indivs_of_type(Type,[type(Indiv,Type)|Conds],[Indiv|Indivs] ) • - i • * get_indivs_of_type(Type,Conds,Indivs). get_indivs_of_type(Type,[_|Conds],Indivs) : - get_indivs_of_type(Type,Conds,Indivs). B. Source code to Influence 84 instantiate (_, _ , [], []). instantiate(Variable, Parameter, Indivs, Instances) : - bagof(Variable,member(Parameter,Indivs).Instances). built_in_p(X,L,P) : - and_node(X), i parents(X,PL), and_p(PL,L,P). built_in_p(X,L,P) : - or_node(X), ! » parents(X,PL), or_p(PL,L,P). / * theorem : i f a node X is a model of 'and', and i t has parents U=-Cul,.. ,un} then p(X|C) = p(ul unlC). Proof: t r i v i a l . :-) */ and_p(L,C,P) : - p(L,C,P) . / * theorem: i f a node X is a model of 'or' and i t has parents U={ul,..,un} then p(X|C) = p(ul ; . . . ;un |C) = p(ul|C) + p(u2;. . . ;un|"ul C)*p(~ul|C) = p(ul|C) + p(u2; . . . ;un | 'u l C)*( l - p(ul|C)) which is recursive. Proof: t r i v i a l . :-) */ or_p([],_,0). or_p([A],C,P) : - p(A,C,P). or_p([A|L],C,P) B. Source code to Influence 85 :- p(A,C,Pl), or_p(L,[-A|C],P2), P = PI + P2*(l-Pl). 7.7. axiom_p(+A,+Given,-Prob) 7,7. - some axioms for calculating probabilities. 7.7. - variables the same as in p(A,Given,Prob) 7.7. Multiplication rule for probabilities 7,7. _ base case: one conjunct should be computed simply. axiom_p([A],B,P) :- !, p(A,B,P). 7.7. Multiplication rule for probabilities 7.7. - recursive step: find the probability of the f i r s t 7,7. conjunct, and i f this is non-zero, find the probability 7.7. of the remainder of the conjuncts conditioned on the f i r s t 7.7. by multiplication. 7,7. - this could be a lot smarter, I think. 7.7. (by making use of conditional independence, for example) 7,7. (or by using the fact that some variables may have no 7.7. parents) axiom_p([B|A],C,P) :- !, p(B,C,PEval), eval(PEval,P2), (P2 =:= 0, P = 0; p(A,[B|C],Pl), P = Pl*PEval). 7.7. Negation rule 7.7. - pCAlC) = 1 - p(A|C) axiom_p(~A,B,P) :- !, p(A,B,Pl), P = 1-P1. 7.7. Definiteness rule #1 7.7. - p(A|A) = 1 axiom_p(A,B,p(A,B,l)) :-member(A,B),!. 7,7, Definiteness rule #2 7.7. - p(A|-A) =0 B. Source code to Influence 86 axiom_p(A,B,p(A,B,0)) :-member(~A,B),! . 7.7. D e f i n i t e n e s s r u l e #2 f o r m u l t i v a l u e d v a r i a b l e s axiom_p(A=X,B,p(A,B,0)) :-member(A=Y,B),X \== Y,!. '/,'/, I n c o m p l e t e n e s s r u l e f o r m u l t i v a l u e d v a r i a b l e s '/,'/, - i f X i s n o t i n t h e l i s t o f known v a l u e s f o r '/,'/, A, t h e n sum a l l t h e p r o b a b i l i t i e s f o r a l l t h e v a l u e s 7.7. w h i c h A can t a k e , and s u b t r a c t t h i s from 1. 7.7. - assumes sum_p(A,L,B,P) <= 1 7.7. - o n l y u s e f u l i f we a l l o w i n c o m p l e t e s p e c i f i c a t i o n o f 7.7. v a l u e l i s t s . axiom_p(A=X,B,Pl) :-m u l t i v a l u e d _ n o d e ( A , L ) , \+ member(X,L), sum_p(A,L,B,P), P I = 1 -P. axiom_p(A=X,B,Pl) :-m u l t i v a l u e d _ n o d e ( A , _ ) , remove("A=Y,B,BB), p(~A=Y,BB,P2) ) p(A=X,BB,P3), P i = P3 / P2. 7.7. Bayes' R u l e 7.7. - i f t h e r e i s e v i d e n c e i n t h e g i v e n l i s t w h i c h i n f l u e n c e s 7.7. t h e p r o p o s i t i o n H d i r e c t l y , t h e n do bayes' r u l e . . . 7.7. - Why? Because we have knowledge about p(E|H) which 7,7. i s f o u n d i n t h e p r i o r s . axiom_p(H,L,P) :-remove(E,L,K), i n f l u e n c e s ( H , E ) , i • » p ( E , [ H | K ] , P E v a l ) , e v a l ( P E v a l , P l ) , ( P I =:= 0, P = 0 ;p(H,K,P2), p ( E , K , P 3 ) , B. Source code to Influence 87 (P3 = 0, ! , format('Error: Division by zero: ~w."n',[p(E,K,P3)] ) , f a i l ;P = PEval * P2 / P3)). 7.7. Auxiliary predicate sum_p used in Indefiniteness rule for 7.7. multivalued variables. 7.*7.sum_p(_, [] ,_,0). sum_p(A,[V],B,P) : -p(A=V,B,P). sum_p(A,[VlValues],B,P) : -sum_p(A,Values,B,PI), p(A=V,B,P2), P = PI + P2. B . 2 e x p a n d . p l /* * expand — expand for multi-valued and binary variables. * no independence or any other clever tr ick . * expands propositional variables in the base manner * expands multivalued variables as i f X=xl were a proposition * / y////////////////////:/:///;///////////////;///.v.y;/x/////////.y//x///.7. '/.'/. expand( +A, +T, +L, +CV, -Body, -P) 7, 7//.7.7;/.r/.7.7.7.7//////////.7.7.7.7.7.7////////.7X/J.7.7////.7.7.7.7.7.7.7. 7.7. expansion of binary proposition A on parents in T VI, Variables: 7.7. + A is the variable around which we expand 7.7. +T is a truth table entry for A; i n i t i a l l y empty l i s t ! 7.7. + L is a l i s t of immediate parents of A, from which T is 7,7. constructed recursively 7.7. +CV is a l i s t of conditioning variables, not necessarily VI, parents of A 7.7. -Body is the body of the rule which computes VI, subcalculations for p(A|CV) 7,7. _ P is the algebraic expression which ' i s ' must use 7,7. to compute p(A|CV) from the subcalcs done in -Body 7.7. B. Source code to Influence 88 '/.'/. p(A|T CV) = p(A|Pred T CV)p(Pred|T CV) + p(A|"Pred T CV)p("Pred|T CV) 7.7. Method 1: (base case) 7.7. - the parents of A which were in L have been transferred to '/,•/, the truth table l i s t T, leaving L = []. '/,'/, - in this case, we need the prior for this configuration T 7,7, - ask Prolog for the prior, which in turn may ask the user, 7,7. i f there is no such prior in the database. 7.7. - returns the prior, as well as the Body 'true', meaning 7.7. that there is no associated calculation necessary to 7,7. compute the prior (it's a given quantity) expand(A,T,[],_,pr(A,T,P,_),P) . 7.7.expand(A,T,[] ,_,true,pr(A,T,P)) :-7.7.pr(A,T,P,_). 7.7. Method 2: (multiple valued variables) 7.7. - i f the f i r s t variable in the truth table is a multivalued variable, 7.7. handle i t separately. expand(A,T,[ZlPred],CV,Body ,P) :-multivalued_node(Z,Values), i expand.mult i(Z,Value s,A,T,Pred,CV,Body,P). 7.7. Method 3: 7.7. - to expand a binary variable, construct the truth table 7.7. for i t by recursively calling expand, once with the 7.7. f i r s t remaining parent Z, and once with Z negated. 7,7. - return as the Body the sub-calculation of P(Z|CV), and 7.7. the reasoning-by-cases formula which uses this value expand(A,T,[ZlPred],CV,Body ,(E1*PZ+E2*(1-PZ)) ) :-expand(A,[Z|T],Pred,[Z|CV],Cl,El), expand(A,["Z|T],Pred,["Z|CV],C2,E2), simp_body((Cl,C2,p(Z,CV,PZ)).Body) . 7.7. expand_multi(+Multi, +Values, +A, +T, +L, +CV, -Body, -P) :-7.7. similar to expand but for multivalued variables 7.7. +Multi is the multiply valued variable in question 7.7. +Values is the l i s t of values for Multi 7.7. +A is the variable we are expanding around (as in expand) 7.7. +T, +L, +CV, -Body, -P are a l l exactly as in expand 7,7. Basic description: for each value in +Values we find 7,7. the expansion of Multi=value; kinda breadth-wise 7.7. (we have to use each value here, whereas in expand B. Source code to Influence 89 7.7. i t s u f f i c e d t o use p(Z|CV) and c a l c u l a t e p("Z|CV) 7.7. as (1 - p ( Z | C V ) ) . When Z i s m u l t i v a l u e d , t h i s doesn't 7.7. work. 7,7. Method 1: (base case) 7.7. - i f we have o n l y one v a l u e l e f t i n t h e l i s t o f v a l u e s , t h e n 7,7. c a l l expand t o c o n t i n u e t h e e x p a n s i o n t o f u l l d e p t h 7.7. - r e t u r n as Body t h e s u b c a l c u l a t i o n c a l l t o p ( M u l t i = V a l u e |CV), 7.7. and as t h e n u m e r i c a l e x p r e s s i o n t h e p r o d u c t o f t h i s and 7,7. t h e e x p r e s s i o n r e t u r n e d by expand. e x p a n d . m u l t i ( M u l t i , [ V a l u e ] , A, T, P r e d , CV, Body, (PM * E)) :-e x p a n d ( A , [ M u l t i = V a l u e I T ] . P r e d , [ M u l t i = V a l u e | C V ] , B , E ) , s i m p _ b o d y ( ( p ( M u l t i = V a l u e , C V , P M ) , B ) . B o d y ) . 7.7. Method 2: 7.7. - r e c u r s i v e l l y expand A u s i n g each v a l u i e o f M u l t i i n t h e V a l u e s 7.7. l i s t 7.7. - t h e body r e t u r n e d i s t h e c o n j u n c t i o n o f t h e s u b c a l c s 7.7. and t h e e x p r e s s i o n i s a s i m p l e a d d i t i o n . e x p a n d _ m u l t i ( M u l t i , [Value I V a l u e s ] , A, T, P r e d , CV, Body, ( E l + E2)) :-expand.mult i ( M u l t i , V a l u e s , A , T , P r e d , C V , B l , E l ) , e x p a n d . m u l t i ( M u l t i , [ V a l u e ] , A , T , P r e d , C V , B 2 , E 2 ) , s i m p _ b o d y ( ( B l , B 2 ) , Body). 7.7. R u l i f y ( + H , + B , - R u l e ) 7.7. - c r e a t e s a Ru l e t o c a l c u l a t e p(H| A n y t h i n g ) by 7.7. u s i n g r e a s o n i n g by c a s e s . 7.7. +H i s t h e v a r i a b l e w h i ch i s t o be r u l i f i e d 7.7. + B i s t h e l i s t o f immediate p a r e n t s o f H 7.7. -Rule i s a p r o l o g r u l e w h i c h s h o u l d be a s s e r t e d . 7.7. Method 1: m u l t i p l e v a r i a b l e s . 7.7. - c r e a t e s a r u l e w h i ch h a n d l e s o n l y one v a l u e 7.7. o f t h e v a r i a b l e . 7.7. - u s i n g b a c k t r a c k i n g , a l l t h e r u l e s w i l l be g e n e r a t e d 7.7. (so be su r e t o ' f a i l ' a f t e r a c a l l t o r u l i f y t o get 7.7. a l l t h e a p p r o p r i a t e r u l e s ) . 7.7. I s s u e : we t r e a t each 'H=value' t h e same as a b i n a r y v a r i a b l e r u l i f y ( H , B , ( d y n _ p ( H = V , C , P ) :- Body)) :-m u l t i v a l u e d _ n o d e ( H , . V a l u e s ) , ! , 7,*7. r e m o v e ( V , V a l u e s , _ ) , expand(H=V,[],B,C,R,E), simp_body((R, P = E ) , Body). B. Source code to Influence 90 '/,'/, Method 2: Binary values '/,'/, - creates a rule for binary variables '/,'/, - doesn't succeed on backtracking •/.*y.rulify(H,B,((dyn_p(H,C,Combine) : - Body))) : -y.*'/.binary_node(H) , '/.•'/.bagof (type(Param.Type) .remove(type(Param,Type) ,B,_) .TLis t ) , •/.••/.remove.KTList.B.NewB), '/.*y.expand(H, [] .NewB.C.R.E) , •/,*y.simp_body((R, P = E) , Prebody), '/.•"/.Body = (bagof (combine_p(Prebody,P,M) .mapmember(TList,C,M) .Combine) , y,*y,combine_p(H,C,Combine)), */.*•/.!. rulify(H,B,((dyn_p(H,C,P) : - Body))) : -binary_node(H), expand(H,[],B,C,R,E), simp_body((R, P = E) , Body). •/.•/, simp_body(+01d, -New) '/,'/, - does some simplemided simplification '/,'/, on the rules generated by expand, or ru l i fy . •/.'/, - easy simp_body((true,P),PS) : -i simp_body(P,PS). simp_body((P,true),PS) : -i simp_body(P,PS). simp_body((A;B),(AS;BS)) : -• » simp_body(A,AS), simp.body(B,BS). simp_body((A,P),(AS,PS)) : -i simp_body(A,AS), simp_body(P,PS). simp_body(P,P). B. Source code to Influence 91 B . 3 n e t w o r k . p l •/.•/, add_to_net(Parents,Child) add_to_net(Parents.Child) :- listify(Parents,L), add_parents(Child,L), add_children(L,Child). '/.•/.(member(exists(X,T,V) ,L) '/,'/-> add_parents(exists(X,T,V), [V]) %%; true), 7.7.(member(forall(X,T,V) ,L) 7.7.-> add.parents(forall(X,T,V), [V]) 7.7.; true) . add_parents(A,P) :- (retract(st_node(A,S,Pl,C)) -> append(P,PI,Pall), assert(st_node(A,S,Pall,C)) ; assert(st_node(A,[bin],P,[]))), assert_check(chgd(A)). add_children([],_). add_children([P|Pl].Child) :- add.child(P,Child), add_children(Pl.Child),!. add_child(A,C) :- (retract(st_node(A,S,P,Cl)) -> assert(st_node(A,S,P,[C|C1])) ; assert(st_node(A,[bin],[],[C]))). 7.7.assert_check(chgd(A)). remove_from_net(Parent.Child) :- remove_child(Parent.Child), remove_parent(Parent.Child). remove_child(P,C) :- children(P,L), remove(C,L,NewL) -> retract(st.node(P,S,GP, L)), assert(st.node(P,S,GP.NewL)) ; formatC'Can't delete "w from child l i s t of "w!"n", [C.P]). remove_parent(P,C) :- parents(C,L), B. Source code to Influence 92 remove(P,L,NewL) -> r e t r a c t ( s t _ n o d e ( C , S , L , G C ) ) , a s s e r t ( s t _ n o d e ( C , S , N e w L , G C ) ) , a s s e r t _ c h e c k ( c h g d ( C ) ) ; f o r m a t ( " C a n ' t d e l e t e "w from p a r e n t l i s t o f ~w!~n", [ P , C ] ) . c h i l d r e n ( A , L ) :- s t _ n o d e ( A , _ , _ , L ) . p a r e n t s ( A , L ) :- s t _ n o d e ( A , _ , L , _ ) . node(A) :- s t _ n o d e ( A , _ , _ , _ ) . B .4 priors.pl a l l _ p r i o r s ( _ , [ ] , L ) :- a s s e r t a _ l i s t ( L ) . a l l _ p r i o r s ( V a r , [ n e w _ p r i o r ( A , B , P ) | L ] , L 2 ) w r i t e _ p ( A , B , P ) , n l , w r i t e ( ' C h a n g e t h i s v a l u e ? ' ) , r e a d ( A n s ) , ( p r i o r ( A , B , P ) . r e t r a c t ( p r i o r ( A , B , P ) ) ; n e w _ p r i o r ( A , B , P ) , r e t r a c t ( n e w _ p r i o r ( A , B , P ) ) ) , (Ans == y, a s s e r t _ c h e c k ( c h g d ( V a r ) ) , a s k _ p r i o r ( A , B , P l ) , a l l _ p r i o r s ( V a r , L , [ n e w _ p r i o r ( A , B , P l ) | L 2 ] ) ; a l l . p r i o r s ( V a r , L , [ n e w . p r i o r ( A , B , P ) | L 2 ] ) ) . a s k _ p r i o r ( P r o p , L , P ) :- n e w _ p r ( P r o p , L , P , L 2 ) , ! , r e t r a c t ( n e w _ p r i o r ( P r o p , L 2 , P ) ) . a s k _ p r i o r ( P r o p , L , P ) :- \+ \+ w r i t e _ p r o b a b i l i t y ( P r o p , L ) , r e a d ( P ) . g e t _ p r i o r s ( P r o p , _ , _ ) B. Source code to Influence 9 3 :- h i d d e n i . n o d e ( P r o p ) , ; g e t _ p r i o r s ( P r o p , [ ] , L ) :- m u l t i v a l u e d _ n o d e ( P r o p , V a l u e s ) , i e a c h ( ( r e m o v e ( V , V a l u e s , _ ) , a s k _ p r i o r ( P r o p = V , L , P ) , */,'/, u n n u m b e r v a r s ( p r i o r ( P r o p = V , L , P ) ,NP), a s s e r t a ( p r i o r ( P r o p = V , L , P ) ) ) ) . g e t _ p r i o r s ( P r o p , [ ] , L ) :-a s k _ p r i o r ( P r o p , L , P ) , 7.7. unnumbervars ( p r i o r ( P r o p , L,P) ,NP), a s s e r t a ( p r i o r ( P r o p , L , P ) ) . g e t _ p r i o r s ( P r o p , [ A I L i s t ] , L i s t 2 ) :- m u l t i v a l u e d _ n o d e ( A , V a l u e s ) , * 9 each. ( ( r e m o v e ( V , V a l u e s , . ) , g e t _ p r i o r s ( P r o p , L i s t , [ A = V | L i s t 2 ] ))). 7 . * 7.get_priors(Prop, [A | L i s t ] ,List2) 7.*7.:- b l o c k s (A,B2,Prop), 7.*7.remove(B2.List .NewList) , 7.*7.get_priors ( P r o p , L i s t , [A|List2]), 7.*7.get_priors ( P r o p , N e w L i s t , ["A | List2] ) . 7 . * 7.get_priors(Prop, [ t y p e ( _ , _ ) | L i s t ] ,List2) 7.*7.:~ ! , g e t . p r i o r s ( P r o p , L i s t ,List2). g e t _ p r i o r s ( P r o p , [ A l L i s t ] , L i s t 2 ) :- g e t _ p r i o r s ( P r o p , L i s t , [ A | L i s t 2 ] ) , g e t _ p r i o r s ( P r o p , L i s t , [ ~ A | L i s t 2 ] ) . c h a n g e _ p r i o r s ( A ) :- m u l t i v a l u e d _ n o d e ( A , _ ) , ( s e t o f ( n e w _ p r i o r ( A = V , B , P ) , p r i o r ( A = V , B , P ) , B a g ) ; s e t o f ( n e w _ p r i o r ( A = V , B , P l ) , n e w _ p r i o r ( A = V , B , P l ) , B a g ) ) , a l l _ p r i o r s ( A , B a g , [ ] ) . c h a n g e _ p r i o r s ( A ) :- ( s e t o f ( n e w _ p r i o r ( A , B , P ) , p r i o r ( A , B , P ) , B a g ) ; s e t o f ( n e w _ p r i o r ( A , B , P l ) , n e w _ p r i o r ( A , B , P l ) , B a g ) ) , B. Source code to I n f l u e n c e 94 a l l _ p r i o r s ( A , B a g , [ ] ) . a s s i g n _ p r i o r s ( A = V , L , P ) :- m u l t i v a l u e d _ n o d e ( A , V a l u e s ) , member(V,Value s ) , ( p r i o r ( A = V , _ , _ ) , f o r m a t ( ' P r i o r s e x i s t f o r ~w. Use "change ~w".~n',[A,A]) ; a s s e r t ( n e w _ p r i o r ( A = V , L , P ) ) ) , j . a s s i g n _ p r i o r s ( A , L , P ) :- b i n a r y _ n o d e ( A ) , ( p r i o r ( A , _ , _ ) . f o r m a t ( ' P r i o r s e x i s t f o r "w. Use "change ~w"."n',[A,A]) ; a s s e r t ( n e w _ p r i o r ( A , L , P ) ) ) , ; l i s t _ p r i o r s :-p r i o r ( A , L , P ) , w r i t e _ p ( A , L , P ) , f a i l . l i s t _ p r i o r s :-n e w _ p r i o r ( A , L , P ) , w r i t e _ p ( A , L , P ) , f a i l . l i s t _ p r i o r s . p r ( A ) L , p r ( A ) L , P ) , L ) :- prior(A ,L2,P), match(L,L2),!. new.pr(A,L,P,L2) :- new_prior(A ,L2,P), match(L,L2),!. B .5 combine.pl p(A,C,P) :- ground(A,C,GroundedA), c o m b i n a t i o n _ m o d e l ( A , M o d e l ) , combine_p(Model,GroundedA,C,P). B. Source code to Influence 95 ground ( [ ] , [ ] ) . g r o u n d ( [ A | L ] , L 1 ) :- ground(A,NewA), ground(L.NewL), append(NewA,NewL,Ll). gr o u n d ( A , [ A ] ) :- atom(A). g r o u n d ( A , [ A ] ) :- f u n c t o r ( A , N a m e , A r g s ) , a l l _ g r o u n d e d ( A , A r g s ) . ground(A,GA) :- f u n c t o r ( A , N a m e , A r g s ) , f i r s t _ n o t _ g r o u n d e d ( A , A r g s , N ) , c o l l e c t _ i n d i v i d u a l s ( A , N , N e w _ A ) , ground(New_A,GA). c o l l e c t _ i n d i v i d u a l s ( P r o p , A r g , L i s t ) a l l _ g r o u n d e d ( A , N ) :- f i r s t _ n o t _ g r o u n d e d ( A , N , 0 ) . f i r s t _ n o t . g r o u n d e d ( A , 0 , 0 ) . f i r s t _ n o t _ g r o u n d e d ( A , N , N ) :- a r g ( N , A , X ) , v a r ( X ) . f i r s t _ n o t _ g r o u n d e d ( A , N , M ) :- a r g ( N , A , X ) , \+ v a r ( X ) , NI i s N - l , f i r s t _ n o t _ g r o u n d e d ( A , N l , M ) . B .6 c o m b i n e d . p l '/,'/, From p r e p r o c e s s . p l d i r _ i n f l ( A , B ) B. Source code to Influence 96 :- c o n s i s t e n t ( A , B ) , l i s t ( A , P ) , c o l l e c t _ f r e e ( P , N e w P , P a r a m s ) , combine(NewP,B,Params). combine(P,B,[]) :- ! , a d d _ t o _ n e t ( P , B ) . combine(P,B,Params) :- f u n c t o r ( B , N a m e , A r g s ) , gensym(Name.NewName), l e n g t h ( P a r a m s , L ) , NewArgs i s Args + L, functor(NewB,NewName,NewArgs), declare_combined(Name/Args), declare_bin(NewName/NewArgs), declare_hidden(NewName/NewArgs), unify_args(B,Args,NewB,NewArgs,Params), add_to_net(P,NewB), add_to_net([NewB|Params],B). u n i f y _ a r g s ( _ , 0 , _ , _ , [ ] ) . unify_args(Old, 0,New,NewArgs,[type(X,_)I O t h e r A r g s ] ) :- arg(NewArgs,New,X), N e x t A r g i s NewArgs - 1, u n i f y _ a r g s ( O l d , 0 , N e w , N e x t A r g , O t h e r A r g s ) . u n i f y _ a r g s ( O l d , A r g s , N e w , N e w A r g s , O t h e r A r g s ) :- a r g ( A r g s , O l d , X ) , arg(Args,New,X), N e x t A r g i s Args - 1, u n i f y _ a r g s ( O l d , N e x t A r g , N e w , N e w A r g s , O t h e r A r g s ) . c o l l e c t . f r e e ( [ ] , [ ] , [ ] ) : - ! . c o l l e c t _ f r e e ( P , N e w , [ t y p e ( X . T y p e ) | P a r a m s ] ) :- remove(type(X,Type),P,NewP), i • > c o l l e c t _ f r e e ( N e w P , N e w , P a r a m s ) . c o l l e c t _ f r e e ( P , P , [ ] ) . '/•'/• ******************* '/,'/, From p r o b a b i l i t y . p l •/,*/, ******************* B. Source code to Inf luence 97 '/,'/, I f t h e node i s p a r a m e t e r i z e d and has ' f r e e ' p a r a m e t e r s , t h e n '/,'/, we have t o d e a l w i h t i t s p e c i a l l y . . . p(A,B,C) :- combined_p(A,B,C). combined_p(X,L,P) :- combined_node(X), i • > p a r e n t s ( X , [ H i d d e n I T y p e s ] ) , c o l l e c t _ t y p e s ( H i d d e n , T y p e s , L , C o m b i n e ) , or_p(Combine,L,P). c o l l e c t _ a l l _ t y p e s ( L , [ ] , _ , L ) . c o l l e c t _ a l l _ t y p e s ( [ ] , _ , _ , [ ] ) . c o l l e c t _ a l l _ t y p e s ( [ H i d d e n I O t h e r s ] . T y p e s , G i v e n , O u t ) :- c o l l e c t _ t y p e s ( H i d d e n , T y p e s , G i v e n , O u t l ) , c o l l e c t _ a l l _ t y p e s ( O t h e r s . T y p e s . G i v e n , O u t 2 ) , append(Outl ,0ut2,0ut) . c o l l e c t _ t y p e s ( H i d d e n , _ , _ , [ H i d d e n ] ) :- i n s t a n t i a t e d ( H i d d e n ) . c o l l e c t _ t y p e s ( H i d d e n , T y p e s . G i v e n , O u t ) :- p a r e n t s ( H i d d e n , P ) , mapmember(P,Given,Outl), r e m o v e ( O u t l . G i v e n , N G i v e n ) , c o l l e c t _ t y p e s ( H i d d e n , T y p e s , N G i v e n , O u t ) . c o l l e c t _ t y p e s ( H i d d e n , [ t y p e ( Z . T y p e ) i T y p e s ] . G i v e n , O u t ) :- b a g o f i ( t y p e ( Z , T y p e ) , H i d ) , ( m e m b e r ( t y p e ( Z , T y p e ) . G i v e n ) , Hidden = Hid),Comb), e x t r a c t _ h i d ( C o m b , E x t r a c t ) , c o l l e c t _ a l l _ t y p e s ( E x t r a c t . T y p e s . G i v e n , O u t ) . i n s t a n t i a t e d ( F ) :- f u n c t o r ( F , _ , A r i t y ) , i n s t a n t i a t e d ( F , A r i t y ) . i n s t a n t i a t e d ( _ , 0 ) . i n s t a n t i a t e d ( F , A r i t y ) :- a r g ( A r i t y . F . X ) . \+ v a r ( X ) , AA i s A r i t y - 1, i n s t a n t i a t e d ( F , A A ) . B. Source code to Inf luence 98 B . 7 consistent.pl TI. c o n s i s t e n t ( + A , + B ) '/,'/, - c o m p l a i n i f t h e u s e r t r i e s any k i n d o f s t u p i d i t y '/.'/. - +A, wh i c h may be a l i s t , a re i n t e n d e d t o be d i r e c t p a r e n t s o f +B '/,'/, - we don't want d i r e c t e d c y c l e s i n our network! c o n s i s t e n t ( ( A , L ) , B ) c o n s i s t e n t ( A , B ) , c o n s i s t e n t ( L , B ) . c o n s i s t e n t ( A , B ) :- \+ \+ ( n u m b e r v a r s ( ( A , B ) , 0 , _ ) , A = B ) , !, f o r m a t ( ' E r r o r : "w cannot i n f l u e n c e i t s e l f . ~ n ' , [ A ] ) , f a i l . c o n s i s t e n t ( A , B ) :- p a r e n t s ( B , L ) , member(A,L), i • > f o r m a t ( ' E r r o r : A l r e a d y know ~w => ~w.~n',[A,B]), f a i l . c o n s i s t e n t ( A , B ) :- c h i l d r e n ( B , L ) , member(A,L), i • i f o r m a t ( ' E r r o r : A l r e a d y know ~w => ~w.~n',[B,A]), f a i l . c o n s i s t e n t ( A , B ) :- i n f K B . A ) , ! » f o r m a t ( ' E r r o r : ~w i n f l u e n c e s "w a l r e a d y . ~ n ' , [ B , A ] ) , f a i l . c o n s i s t e n t ( _ , _ ) . g i v e n _ i n c o n s i s t e n t ( [ _ ] ) :- f a i l . g i v e n _ i n c o n s i s t e n t ( [ " A | L ] ) :-member(A,L). g i v e n _ i n c o n s i s t e n t ( [ A | L ] ) :-member("A,L). g i v e n _ i n c o n s i s t e n t ( [ _ | L ] ) :-g i v e n _ i n c o n s i s t e n t ( L ) . B. Source code to Influence 99 B . 8 d o n e . p l p r o c e s s _ a n d _ c o m p i l e :- e a c h ( ( c h g d ( A ) , ( m u l t i v a l u e d _ n o d e ( A , _ ) -> r e t r a c t ( p r i o r ( A = _ , _ , _ ) ) , each( r e t r a c t ( s t _ i n f l ( A , B ) ) ) , e a c h ( ( s t _ n o d e ( A , _ , _ , _ ) , s t _ n o d e ( B , _ , _ , _ ) , i n f l ( A , B ) , a s s e r t ( s t . i n f l ( A , B ) ) ) ) , e a c h ( ( c h g d ( A ) , p a r e n t s ( A , L ) , g e t _ p r i o r s ( A , L , [ ] ) ) ) , •/. • / , b u i l d _ p r i o r _ s t r ( A , P S ) , '/.'/.assert(PS) ) ) , e a c h ( ( c h g d ( A ) , '/,*/. ( b i n a r y _ n o d e ( A ) ; m u l t i v a l u e d _ n o d e ( A , _ ) ) , p a r e n t s ( A , L ) , r e v e r s e d . , R L ) , r u l i f y ( A , R L , R ) , a s s e r t ( R ) ) ) , each( r e t r a c t ( c h g d ( _ ) ) ) , each( r e t r a c t ( n e w _ p r i o r ( _ , _ , _ ) ) ) . B . 9 i n f l . p l •/.•/. d i r _ i n f l ( A , ( B , C ) ) ll :- !, ll d i r _ i n f l ( A , B ) , •/.*/. d i r _ i n f l ( A , C ) . r e t r a c t ( ( d y n _ p ( A = _ , _ , _) r e t r a c t ( p r i o r ( A , _ , _ ) ) , r e t r a c t ( ( d y n _ p ( A , _ , _ ) :- ))) ) ) , _)) B. Source code to Influence 100 '/.*/. d i r _ i n f l ( A , B ) '/,'/, :- c o n s i s t e n t (A, B ) , •/,•/, ! , a d d _ t o _ n e t ( A , B ) . d i r _ i n f l ( A , B ) :- c o n s i s t e n t ( A , B ) , ! , a d d _ t o _ n e t ( A , B ) . d e l e t e _ i n f l u e n c e s ( [ ] , _ ) : - ! . d e l e t e _ i n f l u e n c e s ( _ , [ ] ) : - ! . d e l e t e _ i n f l u e n c e s ( [ A IL] ,B) :- d e l e t e _ i n f l ( A , B ) , d e l e t e _ i n f l u e n c e s ( L . B ) . d e l e t e _ i n f l u e n c e s ( A , [ B I L ] ) :- d e l e t e _ i n f l ( A , B ) , d e l e t e _ i n f l u e n c e s ( A . L ) . d e l e t e . i n f 1 ( A , B ) :- n o d e ( A ) , n o d e ( B ) , remove_from_net(A,B), a s s e r t _ c h e c k ( c h g d ( B ) ) . i n f l ( A , " B ) :-i n f l ( A . B ) . i n f l ( A = _ , B ) :-i n f l ( A . B ) . i n f l ( A , B = _ ) :-i n f l ( A , B ) . i n f l ( A . B ) :-c h i l d r e n ( A , L ) , member(B,L). i n f l ( A , B ) :-c h i l d r e n ( A , L ) , member(C,L), i n f l ( C , B ) . B. Source code to Inf luenc i n f l u e n c e s ( A , ~ B ) :-i n f l u e n c e s ( A . B ) . i n f l u e n c e s ( A = _ , B ) :-i n f l u e n c e s ( A . B ) . i n f l u e n c e s ( A , B = _ ) :-i n f l u e n c e s ( A , B ) . '///.influences (A ,B) :-y.'/.childrenCA.L), '/,7,member(B,L). '///.influences(A,B) :- » '///.children(A,L), '///.member(C,L), '///.influences(C,B). i n f l u e n c e s ( A . B ) s t _ i n f l ( A , B ) .
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Dynamic Bayesian networks
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Dynamic Bayesian networks Horsch, Michael C. 1990
pdf
Page Metadata
Item Metadata
Title | Dynamic Bayesian networks |
Creator |
Horsch, Michael C. |
Publisher | University of British Columbia |
Date Issued | 1990 |
Description | Given the complexity of the domains for which we would like to use computers as reasoning engines, an automated reasoning process will often be required to perform under some state of uncertainty. Probability provides a normative theory with which uncertainty can be modelled. Without assumptions of independence from the domain, naive computations of probability are intractible. If probability theory is to be used effectively in AI applications, the independence assumptions from the domain should be represented explicitly, and used to greatest possible advantage. One such representation is a class of mathematical structures called Bayesian networks. This thesis presents a framework for dynamically constructing and evaluating Bayesian networks. In particular, this thesis investigates the issue of representing probabilistic knowledge which has been abstracted from particular individuals to which this knowledge may apply, resulting in a simple representation language. This language makes the independence assumptions for a domain explicit. A simple procedure is provided for building networks from knowledge expressed in this language. The mapping between the knowledge base and network created is precisely defined, so that the network always represents a consistent probability distribution. Finally, this thesis investigates the issue of modifying the network after some evaluation has taken place, and several techniques for correcting the state of the resulting model are derived. |
Subject |
Probabilities Artificial intelligence Reasoning |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-10-04 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051146 |
URI | http://hdl.handle.net/2429/28909 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1990_A6_7 H67.pdf [ 5.09MB ]
- Metadata
- JSON: 831-1.0051146.json
- JSON-LD: 831-1.0051146-ld.json
- RDF/XML (Pretty): 831-1.0051146-rdf.xml
- RDF/JSON: 831-1.0051146-rdf.json
- Turtle: 831-1.0051146-turtle.txt
- N-Triples: 831-1.0051146-rdf-ntriples.txt
- Original Record: 831-1.0051146-source.json
- Full Text
- 831-1.0051146-fulltext.txt
- Citation
- 831-1.0051146.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051146/manifest