Dynamic Bayesian Networks by Michael C. Horsch A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in The Faculty of Graduate Studies Department of Computer Science W e accept this thesis as conforming to the required standard University of B r i t i s h C o l u m b i a October 1990 © M i c h a e l C . Horsch, 1990 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. \ Department The University of British Columbia Vancouver, Canada Date 12. QCTn&FfiL i^9n DE-6 (2/88) Abstract G i v e n the complexity of the domains for which we would like to use computers as reasoning engines, an automated reasoning process w i l l often be required to perform under some state of uncertainty. P r o b a b i l i t y provides a normative theory w i t h which uncertainty can be modelled. W i t h o u t assumptions of independence from the d o m a i n , naive computations of probability are intractible. If probability theory is to be used effectively i n A I applications, the independence assumptions from the d o m a i n should be represented explicitly, and used to greatest possible advantage. One such representation is a class of mathematical structures called Bayesian networks. T h i s thesis presents a framework for dynamically constructing and evaluating Bayesian networks. In particular, this thesis investigates the issue of representing probabilistic knowledge which has been abstracted from particular individuals to which this knowledge may apply, resulting i n a simple representation language. T h i s language makes the independence assumptions for a d o m a i n explicit. A simple procedure is provided for b u i l d i n g networks from knowledge expressed i n this language. T h e m a p p i n g between the knowledge base and network created is precisely defined, so that the network always represents a consistent probability d i s t r i b u t i o n . Finally, this thesis investigates the issue of modifying the network after some evaluation has taken place, and several techniques for correcting the state of the resulting model are derived. n Contents Abstract i Contents iii List of Tables vii List of Figures viii Acknowledgements x 1 Introduction 1 1.1 Bayesian Networks 2 1.2 A dynamic approach to using Bayesian networks 5 1.2.1 Motivation 5 1.2.2 The approach 7 1.2.3 The realization of this approach 7 1.3 1.4 Related work 9 1.3.1 Inference using Bayesian networks 9 1.3.2 Other work on dynamic construction of Bayesian networks The main contributions of this thesis iii 10 12 1.5 An outline of this thesis 13 2 Dynamic Bayesian networks 2.1 2.2 2.3 14 The background knowledge base 14 2.1.1 Schemata 15 2.1.2 Ambiguities in schemata . 18 Combination nodes 22 2.2.1 Existential Combination 22 2.2.2 Universal Combination 24 2.2.3 Summary 26 Creating Bayesian networks 26 3 Using Dynamic Bayesian Networks 28 3.1 Influence: An interpreter for dynamic Bayesian networks 29 3.1.1 Declarations 29 3.1.2 Schemata 30 3.1.3 Combination Nodes 32 3.1.4 Creating networks 32 3.1.5 Queries 33 3.2 Diagnosis of multiple faults for digital circuits 33 3.3 Interpreting sketch maps 36 3.3.1 Sketch maps 38 3.3.2 A probabilistic knowledge base 38 3.4 The Burglary of the House of Holmes 41 3.5 Building knowledge bases for Dynamic Bayesian networks 47 3.6 Conclusions 50 iv 4 Implementing dynamic Bayesian networks 51 4.1 T h e general problem 51 4.2 A n A x i o m a t i z a t i o n for Probabilistic Reasoning i n P r o l o g 54 4.2.1 T h e basic computational engine 54 4.2.2 A d d i n g the dynamic combination 55 4.2.3 Discussion 55 4.3 4.4 4.5 4.6 Pearl's D i s t r i b u t e d Propagation and Belief U p d a t i n g 56 4.3.1 Belief updating i n singly connected networks 56 4.3.2 A d d i n g dynamic constructs 57 4.3.3 Propagation i n general Bayesian networks 64 4.3.4 Discussion 65 L a u r i t z e n and Spiegelhalter's computation of belief 65 4.4.1 Belief Propagation on F u l l M o r a l Graphs 66 4.4.2 A d d i n g dynamic constructs 66 P r o b a b i l i s t i c inference using Influence diagrams 66 4.5.1 67 A d d i n g dynamic constructs Conclusions 69 5 Conclusions 70 5.1 W h a t dynamic Bayesian networks can't do 70 5.2 Other future possibilities 71 5.3 Some final remarks 71 A Influence code for the examples 76 A.l Diagnosis of multiple faults for d i g i t a l circuits 76 A.2 Interpreting sketch maps 77 v B A . 3 T h e B u r g l a r y of the House of Holmes 79 Source code to Influence 82 B. l probability.pl 82 B.2 expand.pl 87 B.3 network.pl 91 B.4 priors.pl 92 B.5 combine.pl 94 B.6 combined.pl 95 B.7 consistent.pl 98 B.8 done.pl 99 B.9 infl.pl 99 vi List of Tables 1.1 The contingency tables for the network in Figure 1.1 vii List of Figures 1.1 A s m a l l Bayesian network 4 1.2 A division of probabilistic knowledge 8 2.1 T h e Bayesian network created i n E x a m p l e 2.1-1 18 2.2 T h e network created by instantiating the parameterized unique schemata from Section 2.1.2 19 2.3 T h e network created by instantiating the parameterized r i g h t - m u l t i p l e schema from Section 2.1.2 20 2.4 T h e network created by instantiating the parameterized left-multiple schema from Section 2.1.2 21 2.5 T h e Bayesian network created for E x a m p l e 2.2-1 25 3.1 A simple circuit 35 3.2 T h e Bayesian network constructed by our simple knowledge base for the circuit i n Figure 3.1 37 3.3 A simple sketch map 38 3.4 T h e Bayesian network constructed by our simple knowledge base for the sketch map i n Figure 3.3 42 T h e network created for Holmes' decision 46 3.5 viii 4.1 A simple example of a dynamic network, (a) showing the original network, and (b) showing the network after another i n d i v i d u a l of type t is observed. See Section 4.1 52 4.2 A d d i n g a parent A x 59 4.3 A d d i n g a child B i 4.4 T h e problem of adding a node after arc reversals have been performed. n+ n+ to node C to node A 62 ix . . 68 Acknowledgements I a m very grateful to the many people who helped and supported me during the process of researching and w r i t i n g this thesis. D a v i d Poole, m y supervisor and friend, for patience, encouragement, and lots of good ideas. financial support, D a v i d Lowe, for patience and enthusiasm during the final stages of revising. M y parents, H a r t m u t and H e i d i , and m y siblings, M o n i k a , Steven, and A n d r e a . For love and support bridging the Euclidean distance which separates us. A n d r e w Csinger and M a n n y N o i k . For being invaluable colleagues, every-day heroes, occasional maniacs, and friends i n the truest sense. T h e gang: N o r m G o l d s t e i n , H i l d e Larsen, Steve and Sarah M a s o n , Sue R a t h i e , D a v i d Sidilkover. For the.right things at the right times, more often than I should have hoped. T h e heretics: B r a d C o t t e r a l l , K e n Gehrs, Maeghan Kenney, Michelle M c M a s t e r , Jon Mikkelsen, R a y Schultz, Joeane Zadrah. For imagination and community, for helping to keep m y life interesting and distinct from m y occupation. x Chapter 1 Introduction Given the complexity of the domains for which we would like to use computers as reasoning engines, and the complexity of our world in general, an automated reasoning process will often be required to perform under some state of uncertainty. For example, in applications which perform some kind of diagnosis, a single fault or abnormal behaviour may be the direct result of many factors, and determining which factor is responsible usually involves choosing a best hypothesis from amongst many alternatives. Probabilistic analysis is one means by which the uncertainty of many interesting domains can be modelled. Probability provides a normative theory with which uncertainty can be modelled. Without assumptions of independence from the domain, naive computations are quite intractable. For example, suppose we have five random variables, {A, B, C, D,E} and we wanted to calculate p(A). Without assuming any independencies, we would have to perform the following summation: 1 p(A) = J2 p(AABACADAE) B,C,D,E which sums p(A ABACADAE) over every possible combination of the values taken by the random variables {B, C, D, E}, possibly an exponential number of probabilities. By exploiting independence assumptions for particular domains, we can drastically improve the complexity of such computations. *I denote random variables as upper case math italics. If X can take on a set of multiple values, I will write them in lower case, as in {xi,... ,x }. If Y is propositional, I will write +y for true and ->y for false. More syntactic conventions will be clarified in later sections. n 1 Introduction 1. 2 If probability theory is to be used effectively in A l applications, the independence assumptions from the domain should be represented explicitly, and used to greatest possible advantage. One such representation is a class of mathematical structures called Bayesian networks. This thesis presents a framework for dynamically constructing and evaluating Bayesian networks. The remainder of this chapter will outline the basic theory of Bayesian networks, motivate the approach taken in this thesis, and outline the main issues discussed in future chapters. 1.1 Bayesian Networks Bayesian networks are known by many names. They are also called belief networks and causal probabilistic networks. I will refer to them as Bayesian networks, emphasizing the fact that they are based on the assumptions of Bayesian probability. A related network formalism, which subsumes Bayesian networks, is known by the name influence diagram. This section briefly introduces Bayesian networks; for a more thorough treatment, see Pearl [Pearl, 1988]. For a good introduction to Bayesian probability, see [Lindley, 1965]. A Bayesian network is a directed acyclic graph (DAG) which represents in graphical form the independence assumptions of the probability distribution for a set of random variables. Nodes in the graph represent random variables, and the arcs connecting the nodes represent the notion of direct dependence: if a random variable A, represented in the graph by a, is known to be directly dependent on the random variable B, represented in the graph by node b, an arc is drawn from b to a. The direction of the arc is often associated with the notion of causality, directed from cause to effect. 2 The set of variables deemed to have direct causal or influential effect on a random variable X are called the parents, or conditioning variables, of X; these are denoted in this thesis by parents(X). The strength of the direct dependency relation is quantified by a contingency table, which are a complete specification of the conditional probabilities associated with a random variable and its parents. Denoted p(X\parents(X)), this table specifies the effect of each combination of the possible outcomes of parents (X) on X. A directed acyclic graph is a graph with directed arcs such that no path following the arcs can lead from a node to itself. For introductory presentation of graph theory, see [Aho et al., 1974]. 2 1. Introduction 3 The conditional independence assumptions for the domain are represented explicitly in Bayesian networks, and using these assumptions, probability calculations can be performed more efficiently than the naive computation, requiring much fewer prior probabilities. A Bayesian network, as Pearl [Pearl, 1988] argues, is a natural, intuitive framework for writing down knowledge about a domain taking into consideration the ideas of likelihood, relevance, dependency, and causality. These ideas help to provide guidelines for creating networks for particular domains, as I will outline in Chapter 3. A node in the network represents a random variable, which is used to represent possible outcomes, events, or states in the domain. For instance, we could represent the possibility of an earthquake happening by using a random variable, E, which takes on two values: {+earthquake, -iearthquake}, or perhaps for simplicity, {true,false}. The following example is borrowed from Pearl [Pearl, 1988]: Example 1.1—1: Mr Holmes receives a telephone call from his neighbour, Dr Watson, who states that he hears the sound of a burglar alarm from the direction of Mr Holmes' house. While preparing to rush home , Mr Homes recalls that Dr Watson is a tasteless practical joker, and decides first to call another neighbour, Mrs Gibbons, who,despite occasional drinking problems, is far more reliable. Mr Holmes remembers reading in the instruction manual of his alarm system that the device is sensitive to earthquakes, and can be accidentally triggered by one. He realizes that if an earthquake had occurred, it surely would be on the news. Figure 1.1 shows a a Bayesian network which explicitly represents the independence assumptions for our example. The network corresponds to the following equality: p(Alarm A Burglary A Earthquake A Gibbons A Newsreport A Watson) = p(Alarm\Burglary A Earthquake)p(Gibbons\Alarm)p(Watson\Alarm) p(Newsreport\Earthquake)p(Burglary)p(Earthquake) This factorization is possible because of the conditional independence assumptions for this example. Note that this equation makes no assumption about the values taken by the random variables; it is true for any of the possible combinations. Table 1.1 shows the contingency tables required for the network in Figure 1.1. A Bayesian network with at most one path between any two nodes (ignoring the directions of the arcs) is said to be singly connected, and possesses conditional independence properties which are exploitable computationally. Multiply-connected Bayesian networks, that J. 4 Introduction Figure 1.1: A small Bayesian network. 5 1. Introduction p(Alarm\Burglary A Earthquake) +burglary +earthquake -^earthquake -iburglary 0.95 0.95 0.15 0.01 p(Burglary) Earthquake) 0.95 0.001 p(Gibbons\Alarm) p(Watson\Alarm) +alarm -lalarm p(Newsreport -^-earthquake -^earthquake 0.80 0.45 0.10 +alarm -+alarm p(Earthquake) 0.60 0.25 0.08 Table 1.1: The contingency tables for the network in Figure 1.1. is, Bayesian networks with multiple paths between at least one pair of nodes, are more general, and have fewer exploitable independencies. This topic will be discussed in more detail in Chapter 4. The graphical representation is used in various ways to analyze a problem probabilistically, as Section 1.3 outlines. It is important to emphasize that the arcs in the graph are used as guidelines for computation, and without the conditional independence assumptions which are represented by the arcs in Bayesian networks, probabilistic analysis would be wildly impractical. 1.2 1.2.1 A dynamic approach to using Bayesian networks Motivation Bayesian networks used in typical applications represent static probability models; that is, while the networks are designed to accept conditioning based on observations, they typically do not change their structure. Consider the example in Figure 1.1. This network can only be used as a model of the particular situation for which it was created, and cannot be used, say, to tell us how likely it is that a burglary occurred given that Holmes' third neighbour Bob (who is not represented at all in the network) calls up with a report that 1. Introduction 6 he hears an alarm. In order to incorporate Bob in the model, we have to add a node representing Bob's report and an arc between the node representing the alarm and our new node. Adding a single node and arc in this situation is conceptually a simple matter, but it required knowledge: we, the users of the system, needed to know that the independence assumption for the random variable representing Bob's report is reasonable, and that no other events of interest might cause Bob to report an alarm. As well, we needed to know how Bob's report depends on the event that Holmes' alarm sounded. That is, we needed to know the contingency table p(Bob\Alarm). This example demonstrates the kind of modification to the network structure that this thesis intends to address. However, the example is much too simple, for two reasons: First, we need no special training to gain expertise in this toy domain. Second, the example required very little expertise at building Bayesian networks, certainly no more than the intuition a novice might use, to add the new information. For more complicated domains, a user may have trouble adding nodes and arcs to a Bayesian network because she is not an expert in the domain. However, it may be the case that we are using a Bayesian network application precisely because we are not experts, for consultation purposes perhaps. If, in fact, we know which new aspects of our domain ought to be incorporated in our model, but are not able to change the network suitably, then the application is useless for our situation. There are at least three ways to address the static model problem. One way is to ask the domain expert for a revision to the network when the need arises. The second is for the domain expert to detail the domain so exactly that the need to add arcs never arises. A third way is to automate the process of building Bayesian networks so that the expert's knowledge can be accumulated in a knowledge base before hand; this generalized knowledge could then be retrieved and applied whenever a user determines the need. This thesis examines the third approach—automating the process of building Bayesian networks from a background knowledge base of generalized probabilistic information. This approach adds flexibility to Bayesian networks applications, and tends to keep the networks required reasonably small. 1. Introduction 1.2.2 7 The approach Domain knowledge can be divided into: individuals: entities in the domain of interest possible properties: the properties an individual may have and the relationship between these properties observed state: the properties an individual does have In traditional approaches, a knowledge engineer, perhaps in consultation with a domain expert, is responsible for knowing the individuals to be included in the domain, and the properties these individuals may have. In other words, the knowledge engineer must build the network. The user is responsible for supplying information as to the actual state of the situation (i.e. the properties some of the the individuals do have). This division of knowledge is demonstratedfigurativelyin Figure 1.2a. In contrast, the dynamic approach taken in this thesis divides the knowledge differently. The knowledge engineer provides a background knowledge base relating properties without specifying the particular individuals. The user supplies two kinds of information: the individuals known to be in the domain, which the dynamic system will use to build an appropriate network automatically, and the observations pertaining to the state of the model. This isfiguredin Figure 1.2b. In this way, the same background knowledge can be used to model different situations in which only the set of individuals under consideration changes. In the traditional approach, such changes would have to be made by the knowledge engineer. 3 The dynamic approach provides a flexibility previously unrealized in Bayesian network applications. 1.2.3 The realization of this approach To automate the process of building Bayesian networks, we need two steps. First, we need to be able to represent domain knowledge which is abstracted away from particular The knowledge engineer-user distinction made here is deliberately simplified. In many cases, the user of dynamic system like the one presented in this system may also be the knowledge engineer. No principle prohibits the user from modifying the knowledge base, but in doing so, the user switches roles. 3 1. 8 Introduction (a) Traditional •\ Dependencies j between [ Variables j Individuals Observations J Knowledge Engineer (b) User Dynamic Approach Dependencies between Individuals Observations Variables Knowledge User Engineer Figure 1.2: A division of probabilistic knowledge. 1. Introduction 9 individuals. Second, we need to be able to b u i l d a network from this knowledge based on the information we can gain from the user. T h e first step corresponds roughly to knowledge of possible properties (as i n Section 1.2.2), which we elicit from the d o m a i n expert. These properties are modelled w i t h random variables, and to abstract away from particular individuals, parameterized random variables ( P R V s ) are used. T h e knowledge engineer supplies the relationships between P R V s i n the form of a background knowledge base. T h e second step corresponds roughly to knowing the individuals that are to be considered in the model, and which the user can supply at the time of consultation. T h e system combines these two kinds of knowledge automatically, to create a Bayesian network for the situation the user has specified. T h e user can then consult the system, supplying the information about the observed state of the domain, and querying the network as necessary to obtain the inferred probabilistic analysis. It may arise that, as the user is working w i t h the system, some individuals become known after a network has already been constructed, and after observations have been submitted to the network. T h e user is able to supply this information to the system, and the system is able to use this information and modify the network structure appropriately. 1.3 Related work In this section, I w i l l provide a brief introduction to some of the related work done on Bayesian network technology. In particular, I w i l l introduce various methods used to evaluate Bayesian networks, and review similar notions of dynamic Bayesian networks presented by G o l d m a n and C h a r n i a k [1990], and Breese [1990]. 1.3.1 Inference using Bayesian networks There are several methods by which Bayesian networks are used to calculate probabilities. A more comprehensive discussion of much of the following w i l l be presented i n later chapters. Pearl [Pearl, 1988] treats singly connected Bayesian networks as networks of communicating processes. T h e processes use arcs to propagate causal and diagnostic support values, which originate from the activation of evidence variables. 1. Introduction 10 L a u r i t z e n and Spiegelhalter [Lauritzen and Spiegelhalter, 1988] perform evidence absorption and propagation by transforming the Bayesian network into a triangulated undirected graph structure. Posterior probabilities are computed using specialized computations based on this graph. Shachter [Shachter, 1988] uses the arcs to perform arc reversals and node reduction on the network. Evidence absorption and propagation can also be performed using arc reversals and node reductions as well [Shachter, 1989]. 4 Poole and Neufeld [Poole and Neufeld, 1989] provide an axiomatization of probability theory i n P r o l o g which uses the arcs of the network to calculate probabilities using "reasoning by cases" w i t h the conditioning variables. T h e prototype implementation of m y approach is based (but not dependent) on this work. 1.3.2 Other work on dynamic construction of Bayesian networks D u r i n g the w r i t i n g of this thesis, several authors have presented similar work on building Bayesian networks from schematic knowledge collected i n a knowledge base. G o l d m a n and C h a r n i a k [Goldman and C h a r n i a k , 1990], and Breese [Breese, 1990] have developed, independently of the work presented i n this thesis, similar theories of d y n a m i c Bayesian networks. Since their work is very similar to this approach (and each other's), I w i l l treat their work more deeply i n following sections. Laskey [Laskey, 1990] presents a framework for using d y n a m i c a l l y created Bayesian networks i n a reasoning system based on a hierarchical approach. Knowledge i n the form of probabilistic schemata are used to form an argument, which is transformed into a Bayesian network. Conclusions or conflicting evidence which defeat the argument trigger knowledge i n the schemata, forming an augmented argument. T h i s process proceeds until no conflicting arguments are found. A quite different approach to creating Bayesian networks involves the statistical analysis of case data. P e a r l [Pearl, 1988] provides a good introduction to the process, which identifies dependencies and independencies. T h i s area of research is current, but essentially unrelated to the methods presented i n this thesis. Shachter's influence diagrams contain decision and test nodes, as well as other devices not being considered in this thesis. This thesis only looks at probabilistic nodes. 4 1. Introduction 11 Goldman and Charniak G o l d m a n and C h a r n i a k [Goldman and Charniak, 1990] describe a knowledge-based approach to b u i l d i n g networks dynamically, as part of a story understanding application. A forward-chaining inference engine takes information i n the form of propositions and builds a network according to rules i n the knowledge base. These rules provide ways to combine the influence of different causes on common effects by p r o v i d i n g specialized functions for the d o m a i n and by taking advantage of the occasions i n which causes interact stereotypically. W h e n the b u i l d i n g process is complete, an evaluation of the network takes place, and the network is simplified by accepting as true those statements which are highly probable, and rejecting those propositions which are considered too improbable. Simplifying the network i n this way m a y lead to additional modification of the network. Breese T h e work done by Breese [Breese, 1990] on building Bayesian networks bears many similarities to G o l d m a n and C h a r n i a k ' s work. Breese's approach includes the use of such decision analytic tools as value nodes and decision nodes, which are part of the influence diagram model. Knowledge is stored i n a hybrid knowledge base of Horn clause rules, facts, and probabilistic and informational dependencies. T h e rules and facts are used to determine some of the probabilistic events which should be included i n the model. T h e probabilistic dependencies are used to structure the network once it is determined which nodes ought to be included. Furthermore, these dependencies may identify other events which might play a part i n the model, but are not definitely determined by the Horn clause rules. T h e b u i l d i n g process is started by the query entered by the user of the system, who also includes given information and evidence for the problem. T h e first step of the process is to determine the events which directly influence the query. Backward chaining through the probabilistic dependencies and the H o r n clause rules identifies many nodes and arcs to be included i n the model. Once this process stops, a forward chaining process identifies those events which are influenced by the nodes i n the partial network from the previous step. 1. Introduction 12 Discussion There are many similarities in the approaches of Goldman and Charniak, and Breese. Both systems create Bayesian networks as a secondary reasoning tool after a primary inference engine determines which nodes to consider, and which dependencies to include in the network. In this respect, knowledge about the domain is found in two separate knowledge bases: the rules used to create the Bayesian networks, and the network itself. In contrast, the approach taken in this thesis uses only one knowledge base, and provides only a simple process by which Bayesian networks can be created using this knowledge. This approach is much simpler, and precisely defines the mapping between the knowledge base and the networks which can be created. Both approaches by Goldman and Charniak, and Breese could be used to implement the ideas presented in this thesis: the rule bases in both systems could be used to implement dynamic Bayesian networks as this thesis presents them. However, this thesis presents only a language for representing general probabilistic knowledge, and a simple process for creating Bayesian networks using this language; no work has been done herein on allowing another program to use this language to create networks as a sub-task. 1.4 T h e m a i n contributions of this thesis This thesis investigates the issue of representing probabilistic knowledge which has been abstracted from the knowledge of particular individuals, resulting in a simple representation language. As well, a simple procedure for building networks from knowledge expressed in this language is provided. The mapping between the knowledge base and network created is precisely defined, so that the network always maintains a consistent probability distribution. Finally, this thesis investigates the issue of modifying the network after some evaluation has taken place, and several techniques for correcting the state of the resulting model are derived. 1. Introduction 1.5 13 A n outline of this thesis In Chapter 2, d y n a m i c Bayesian networks are presented formally. T h e major c l a i m i n this thesis is that Bayesian networks can be d y n a m i c , flexible, and simple to use. I demonstrate the ability of m y implementation i n Chapter 3, thereby giving evidence for my c l a i m . I apply m y approach i n such diverse areas as simple decision problems, diagnosis of multiple faults, and interpretation of sketch maps. In Chapter 4 I briefly discuss the details of adapting the various evaluation algorithms to make use of d y n a m i c Bayesian networks. F i n a l l y , i n Chapter 5 I conclude that the approach to Bayesian probability demonstrated here is useful i n domains for which a priori knowledge is of a general nature, and for which specific details are most conveniently provided during the reasoning or decision-making process. Chapter 2 Dynamic Bayesian networks In order to make Bayesian networks versatile and reasonably manageable in terms of size, I have argued for an approach which creates networks by instantiating generalized probabilistic knowledge for the individuals known to be in the model. Thus, when information about new individuals is discovered, the system itself should be able to modify the network, by combining the new information about individuals with the knowledge found in a background knowledge base created by a domain expert. This chapter first presents a simple and declarative language for expressing probabilistic knowledge, which I demonstrate is too unconstrained for the purpose of providing a precise framework for creating and modifying Bayesian networks dynamically. This language is restricted to remove the possibility of ambiguous knowledge, and then augmented with special purpose constructs which are based on Canonical Models of Multicausal Interactions, as described in Pearl [Pearl, 1988]. The resulting language is shown to be at least as expressive as Bayesian networks, and flexible enough to provide a tool for modelling many kinds of domains. I conclude this chapter by outlining a simple procedure to create Bayesian networks from a knowledge base of statements in this language. 2.1 T h e background knowledge base Dynamic Bayesian networks are created from a parameterized background knowledge base of schemata by combining it with individuals who are known to be part of the model. This 14 15 2. Dynamic Bayesian networks section presents the syntax of the parameterized knowledge base, and describes the process of instantiating schemata for individuals. 2.1.1 Schemata It is necessary to represent parameterized probabilistic knowledge unambiguously, so that the intended meaning of this knowledge is apparent from the syntax. In particular, the statements i n our language must clearly show how the parameters are to be instantiated, and how the resulting instantiation is used i n the Bayesian network built from this knowledge base. A n atom is any alphanumeric string beginning w i t h a letter. A parameter is represented by a capitalized atom. A n individual is represented by a lower case atom. A random variable is an atom followed by a possible empty list of individuals; for convenience, an empty list is usually dropped from the random variable notation. R a n d o m variables take values from a finite set of discrete values called the range, and i n many cases we use binary-valued random variables, which take {true.false} as values. A parameterized random variable ( P R V ) abstracts the notion of a random variable away from any particular i n d i v i d u a l ; it is represented as a lower case atom w i t h a list of parameters, each of which stands i n place of an i n d i v i d u a l . Instantiating a parameterized random variable replaces a l l the parameters w i t h individuals, creating a random variable. Associated w i t h parameters and individuals is a corresponding type, which constrains the instantiation of a parameter to only individuals of the associated type. A schema is the fundamental statement i n this language. T h e syntax for a schema is as follows: ai,...,a n —• := b [ u i , . . . , Ufc] where b, a,- are parameterized random variables. T h e left hand side of the arc (—•) is a conjunction of P R V s (called the parents), which directly influence the single P R V (called the child) on the right hand side. T h e list [vi,..., v ] is a short-hand notation which lists the probabilities for the contingency table corresponding to the schema. Recall that a contingency table must have a probability for each combination of values for the P R V s . T h e s h o r t - h a n d list orders these probabilities by varying each parent over its set of values w i t h the left-most parent varying most quickly. T h i s list contains 2 probabilities when the a ; are b i n a r y - v a l u e d , and even more when any have more than two possible values. k n 2. Dynamic Bayesian networks 16 In a collection (or knowledge base) of schemata, a P R V can occur any number of times as a parent for some other P R V , but may occur as the child i n only one schema. T h i s restriction is intended to simplify the task of w r i t i n g schemata: keeping the list of parent P R V s i n one schemata localizes relevant knowledge to one statement, and a knowledge base is easier to m a i n t a i n . A schema is instantiated when a l l P R V s i n the schema have been instantiated. In the scope of a single schema, parameters shared across P R V s are instantiated w i t h the same i n d i v i d u a l . T h e instantiated schema is used to b u i l d new arcs i n a Bayesian network by creating a node for each random variable, and directing an arc from each parent variable to the child variable. For example, the schema: alarm —• := reports_alarm(X: person) [0.9,0.01] might occur i n a knowledge base for the a l a r m example, of Chapter 1. It specifies a single parent P R V , alarm, which has no parameters, and a child variable, reports_alarm(X:person), which has a single parameter, X. O n l y individuals of the type person can be used to instantiate this schema. In this example (as i n most of the simple examples i n this chapter, unless otherwise stated), the P R V s are assumed to take values from {true, false}. T h i s example should be interpreted as representing the idea that an a l a r m might be reported by a person, and explicitly assumes that only the presence or absence of an a l a r m sound has direct affect on the report the person might make. T h e two numbers i n square brackets give the contingency table for this schema, stating consisely the following two probabilities: p(reports_alarm(X)| alarm) = 0.9 p(reports_alarm(X)| -lalarm) = 0.01 These numbers quantify the effect of the parent variables on the child variable. Note that the probabilities p(-ireports_alarm(X)| alarm) = 0.1 p(->reports_alarm(X)| -lalarm) = 0.99 2. Dynamic Bayesian networks 17 can be inferred by the Negation axiom of probability theory. Discussion W h e n instantiated, each schema of the form • ai,...,a n — • b corresponds to a conditional probability of the form p(b\a\,... ,a ) C r e a t i n g a Bayesian network from these schemata corresponds to building an expression for the joint probability distribution from these conditional probabilities. Thus, the intended use of these schemata is always well defined. n Example 2.1-1: Consider the following schema: alarm —> := reports_alarm(X: person) [0.9,0.01] W h e n instantiated w i t h the set person = {John, mary}, the Bayesian network in Figure 2.1 is created, which represents the following expression for the joint probability distribution: p(Alarm A Reportsjxlarm(john) = A ReportsMlarm(mary)) p(Alarm) p(Reportsjalarm(john)\Alarm) p(Reportsjalarm(mary)\Alarm) T h i s simple declarative language is at least as expressive as Bayesian networks. T h i s can be shown by taking a Bayesian network and writing it i n terms of unparameterized schemata. T h e direct and unambiguous correspondence between unparameterized schemata and Bayesian networks is obvious. A d d i n g parameters to the language allows us to express knowledge which can be used several times by instantiating them differently. However, the correspondence between the knowledge base and the Bayesian networks which can be constructed from it is no longer guaranteed to be unambiguous. These ambiguities arise as a direct result of attempting to parameterize probabilistic knowledge, as we shall see i n the next section, and are easily restricted. 2. Dynamic Bayesian networks 18 Figure 2.1: T h e Bayesian network created i n E x a m p l e 2.1-1. 2.1.2 Ambiguities in schemata In the previous section, it was mentioned that allowing parameterized knowledge i n our language could lead to ambiguous knowledge bases. In this section the possible ambiguities are illustrated and a simple constraint is imposed on the language to remove possible ambiguities. It is useful to examine three extremes which can be attained when w r i t i n g schemata i n parameterized form, i n order to bring to light the issues of parameterizing conditional probabilities. These extremes can be labelled as: 1 Unique schemata: every instantiation creates a set of random variables which no other instantiation of the same schema shares Right—multiple schemata: different instantiations may create child variables which have identical parent variables Left-multiple schemata: different instantiations may create different parents for a sin- gle child variable W e w i l l look at each one of these i n detail i n the following sections. It is worth mentioning that i n general, schemata may exhibit the characteristics of several of these extremes. As well, identifying these extremes will be helpful in the discussion of Chapter 4. 2. Dynamic Bayesian networks 19 Figure 2.2: T h e network created by instantiating the parameterized unique schemata from Section 2.1.2. Unique schemata These are parameterized schemata i n which every parameter i n the parent variables (on the left side) also occurs i n the child variable. For example: a(X,Y),b(X,Y) — c(X,Y) T h i s is called unique because every instantiation of X, Y creates a unique dependency between parents and c h i l d . If, for example, we instantiate X w i t h { x i , X2} and Y w i t h {yi}, we get two distinct sets of arcs i n our network, as i n the network of Figure 2.2. There are no ambiguities which result from unique schemata. Right—multiple schemata These are schemata i n which parameters occurring i n the child variable do not appear i n the parent variables, and a l l other parameter i n the schemata occur on b o t h sides. For example: a,b — c(X,Y) E a c h instantiation of X and Y results i n a new child variable which may share parents w i t h other instantiations of the schema. If we instantiate X w i t h { x ^ X2} and Y w i t h { y i } , we 2. Dynamic Bayesian networks 20 Figure 2.3: The network created by instantiating the parameterized right-multiple schema from Section 2.1.2. obtain the dependencies shown in the network of Figure 2.3. No ambiguities arise as a result of right-multiple schemata. Left-multiple schemata These are schemata in which some parameters occurring in the parent variables do not occur in the child variable. For example: a(X) —-> c Each instantiation adds a parent for the child variable. For example, if X is instantiated with {xi, X2, . . . , x }, we get the dependencies shown in the network of Figure 2.4. n This kind of schema is ambiguous as there is no way to know, when the schema is written down, how many instantiations of the parent variables could be needed. However, since we must provide a contingency table which can be used in numeric calculations, we must supply a contingency table for every possible number of instantiations. This is clearly 2. Dynamic Bayesian networks 21 Figure 2.4: T h e network created by instantiating the parameterized left-multiple schema from Section 2.1.2. impossible. Even p u t t i n g a bound on the number of possible instantiations is impractical because of the fact that the size of each contingency table is exponential i n the number of parent variables. 2 Summary T h i s simple language of conditional dependencies is at least as expressive as Bayesian networks. Parameterizing these schemata adds flexibility to the language, but also introduces possibilities for ambiguous knowledge. T h e ambiguities arising from left-multiple schemata (or h y b r i d schemata which have their characteristic) is an issue of pragmatics, and one possible solution is to disallow leftmultiple schemata from our language. T h i s restriction should be taken as much the same k i n d of restriction p r o h i b i t i n g left-recursive programs i n P r o l o g , for example. T h e resulting restricted language now lacks the ability to combine an arbitrary number of causes (or influences) into a single common effect. T h e next section presents two constructs which help to redress this deficiency without reintroducing ambiguities or requiring unreasonable numbers of contingency tables. In fact, if we were solely concerned with qualitative statements of dependency, this ambiguity would not be a problem. 2 2. Dynamic Bayesian networks 2.2 22 C o m b i n a t i o n nodes T h e m a i n problem which arises from the ambiguity of left-multiple schemata is that the number of instantiations of the parent variables required for a particular consultation is unknown at the time the knowledge base is being created. If we assume that these instances combine their effects i n the domain i n a simple and regular manner, providing the required contingency table is only a matter of enumerating the instances. In the following sections, two constructs are presented which assume a simple and regular combination of the effects of an indeterminate number of parent variables on a single child variable. Briefly, these are: Existential Combination: takes the value true. Universal Combination: the value true. i f one of the parent variables takes true, i f all of the parent variables take the value the child variable true, the child takes These two are currently used i n the language, but similar structures, exploiting some other regularities, might be added. For example, a combination mechanism could be defined corresponding to the notion that exactly one parent variable (out of many) is true. T h e two discussed i n this section are based on the noisy-or and noisy-and gates as described in P e a r l [Pearl, 1988]. 2.2.1 Existential Combination T h i s combination is intended to be used when it is known that any one of a number of conditions is likely to affect another variable. T h e syntax of this structure is as follows: 3 X 6 type-a(X) —• b where a(X), b are binary P R V s , and X is a parameter of type type. T h e variable b depends on the variable 3X € type • a(X), which is a binary P R V dependent (implicitly) on every instantiation of a(X). 2. Dynamic Bayesian 23 networks T h e contingency table p(b\3X G type • a(X)) for this schema is given i n square brackets, and is provided by the knowledge engineer i n the knowledge base. T h e contingency table for p(3X G type • a(X)\a(X)) is automatically generated by the system and takes the form of an O r table based on each instantiation of a ( X ) . Example 2.2-1: People who smell smoke may set off a fire alarm, and the a l a r m actually m a k i n g a noise depends on at least one person sounding the alarm. Anyone who hears an a l a r m is likely to leave the building i n which the alarm is found. Fire is a likely cause for someone smelling smoke. fire —• := smells-smoke(X) —• := 3 Y G person • sets_off_alarm(Y) —• := alarm-sounds —• := smells-smoke(X) [0.7,0.2] sets_off_alarm(X) [0.95,0.01] alarm-sounds [0.98,0.1] leaves_building(Z) [0.7,0.4] Suppose we are given that John and mary are the only known members of the set person. T h i s information combined w i t h the above schemata creates the network shown i n Figure 2.5. T h e combination node combines the effects of a l l known members of the set. In the preceding example, it is not unreasonable to expect that some unknown person has set off the alarm, and this possibility is granted i n the contingency table for 3Y G person • sets -of f-alarm(Y): p(alarm_sounds| 3 Y G person • sets_off_alarm(Y)) = 0.98 p(alarm_sounds| ->3Y G person • s e t s _ o f f . a l a r m ( Y ) ) = 0.1 T h e first probability indicates that w i t h high probability, i f a person sets off the alarm, the fire a l a r m w i l l make a noise. T h e second number indicates that w i t h low probability, 2. Dynamic Bayesian 24 networks the alarm w i l l sound when no known member of the set person has set off the alarm. T h i s includes the possibility of a malfunction of the alarm, the possibility that some person unknown to the system has set off the alarm, as well as other unknown causes. There is no facility i n our language for m a k i n g hypotheses for these unknown causes. 2.2.2 Universal Combination T h i s combination is intended to be used when it is known that every one of a number of conditions must hold to affect another variable. T h e syntax of this structure is as follows: VXetype-a(X) —-> b := [«I,M ] 2 where a(X), b are binary P R V s , and X is a parameter of type type. T h e variable b depends on the variable V X G type • a(X), which is a binary P R V dependent (implicitly) on every instantiation of a(X). T h e contingency table p(b\VX £ type • a(X)) for this schema is given i n square brackets, and is provided by the knowledge engineer i n the knowledge base. T h e contingency table for p(VX € type • a(X)\a(X)) is automatically generated by the system and takes the form of an And table based on each instantiation of a(X). Example 2.2-2: A board meeting for a large corporation requires the presence of a l l board members, and the meeting may result i n some actions, say, buying out a smaller company. A board member may attend the meeting, depending on her reliability and state of health. VX € board-members • present(X) meeting healthy(X),reliable(X) —• meeting := [0.8,0.1] —• buy-out := [0.7,0.3] —• present(X) := [0.9,0.4,0.6,0.1] 2. Dynamic Bayesian networks Figure 2.5: T h e Bayesian network created for E x a m p l e 2.2-1 25 2. Dynamic Bayesian networks 26 W e note that at the time the schemata are written, it is not important to know how many board members there may be. However, we must know exactly who is a member on the board before the network can be created. T h i s informat i o n is supplied at r u n - t i m e , and the construction process can construct the appropriate network. Evidence concerning the reliability and health of any board member can then be submitted to the network, providing appropriate conditioning for queries. 2.2.3 Summary T h e combination nodes presented i n this section are used to allow a number of conditioning P R V s to have direct influence on a single child P R V . T h e number of these conditioning variables is unknown when the knowledge base is created, and the user of the system can indicate as many instantiations as required i n the model. T h e combination nodes effectively collect each relevant instantiation, and combines them using an or-rule for existential nodes, and an and-rule for universal nodes. These combination nodes have been implemented i n the language, and another possibility would be to add a similar mechanism which performs a combination that exclusively selects a single i n d i v i d u a l as the causal agent. Others are possible i n principle, but any similar mechanism should maintain the properties of being a simple and regular combination. 2.3 Creating Bayesian networks Creating a Bayesian network from a knowledge base is a simple process. T h e i n i t i a l state is a Bayesian network w i t h no nodes and no arcs. T h e system takes every known i n d i v i d u a l (which was supplied by the user) and instantiates every schema which has a parameter of the corresponding type. W i t h i n a schema, several parameterized random variables may have shared parameters (i.e., parameters w i t h the same name), and when the schema is instantiated, every occurance of the common parameter is replaced by a common i n d i v i d u a l . T h e instantiated schema relates random variables, and is interpreted by the system, adding nodes and arcs to the current Bayesian network. T h e system creates a node representing each random variable i n the instantiated schema, and checks whether an identical node already exists i n the network. If not, a node is entered into the network; i f a node already 2. Dynamic Bayesian networks 27 exists in the network, it is not added again. An arc is directed from each node representing a parent random variable to the child variable. In this way, a complex structure can be created from a simple set of schemata. This procedure for creating Bayesian networks from parameterized knowledge could be used a number of ways. First, a user might be responsible for submitting the individuals of the model to the system, and then use the Bayesian network created by the system interactively. Alternatively, a higher level program might use the network creation/evaluation system as a sub-system, and submit individuals and make queries in order to perform some action or decision. The Influence implementation, described briefly in Chapter 3 and more extensively in [Horsch, 1990], can be used as a (somewhat simple) interface between the user and the system. It can also be used as a module for another program to manipulate. Chapter 3 Using Dynamic Bayesian Networks In previous chapters, I presented the dynamic Bayesian network as a useful t o o l for reasoning under uncertainty. Chapter 2 presented some simple examples showing the results of b u i l d i n g networks dynamically. Chapter 4 w i l l demonstrate that the d y n a m i c constructs are independent of the method by which the posterior distributions are calculated. In this chapter, I demonstrate w i t h some examples the usefulness of d y n a m i c Bayesian networks. Since the examples are directly executable i n the prototype implementation, the syntax for the I n f l u e n c e interpreter, presented i n [Horsch, 1990], w i l l be used. I w i l l briefly introduce the syntax i n the next section. The rest of this chapter is organized i n the following way. T h e first example takes the domain of diagnosing faults i n simple electronic circuits. T h e second example takes the d o m a i n of interpreting simple sketch maps, which is the domain of the Mapsee project at U B C [Mulder et al, 1978], and has been treated logically by Reiter and M a c k w o r t h [Reiter and M a c k w o r t h , 1989], and w i t h default logic by Poole [Poole, 1989]. T h e t h i r d example is borrowed from Breese [Breese, 1990]. A summary section concludes this chapter w i t h some observations about how knowledge bases for dynamic Bayesian networks can be created. 28 3. Using Dynamic Bayesian 29 Networks : A n interpreter for dynamic Bayesian 3.1 networks The Influence interpreter is an implementation of the techniques of this thesis, written in Prolog. Writing a knowledge base using this implementation has two parts: declaring the PRVs, and writing the schemata. The knowledge base is used within the implementation to create the networks and evaluate queries. 3.1.1 Declarations Parameterized random variables (PRVs) are represented by Prolog terms, and must be declared as being binary, i.e., taking a value from {true, false}, or multi-valued, i.e., taking a value from a given list of mutually exclusive and exhaustive values. The syntax is: binary prv\ I arity. multi prv2 I arity 9 [value-lisf] . where the prvi are PRVs, and arity is the number of parameters the node has. For multivalued PRVs, the list of values must be separated by commas. Several variables, separated by commas, can be declared at once using either declaration command. In the case of multi-valued nodes, the intention is to declare several variables all taking values from the same set. The / arity can be omitted in place of the direct specification of the parameters, or, if the node has no parameters, then an arity of zero is assumed. Within a PRV, parameters are represented by Prolog variables, i.e., atoms which begin with a capital. 1 A special declaration is required for PRVs whose dependence can be written functionally in terms of the values taken by its parents values. For example, we might want to use a PRV for the subtraction operation, such that: This is the first and last time that parameters will be referred to as any kind of variable. To reinforce the correct understanding, parameters for parameterized random variables in a knowledge base of schemata are implemented here as Prolog variables. x 3. Using Dynamic Bayesian Networks 30 The following declaration is used: function prv. A n y function for a P R V must be a function of the values taken by its parent's P R V s , as opposed to being a function of the individuals which may later instantiate the schema. The function itself is written w i t h the schema i n which it appears as the child. W r i t i n g schemata is the topic of the next section. It is important to emphasize that the parameterized random variables which appear i n these declarations are neither random variables nor nodes i n the network. A P R V must be instantiated to become a random variable, as described i n Chapter 2, and the network creating procedure w i l l create a node i n the Bayesian network corresponding to the i n stantiated P R V . However, the phrase the node representing the instantiated parameterized random variable {prv} quickly becomes tiresome, and often w i l l be simply referred to as node. It should be clear by context whether the strict meaning of node is intended. 3.1.2 Schemata Once the P R V s for a knowledge base are declared, the schemata may be written. T h e syntax is simply: p-prv\, ... , p-prv => c-prv n • = Lpi,-..,Pfc] • which is interpreted as indicating that instantiations of this schema w i l l be used to create a Bayesian network such that arcs w i l l be directed from the nodes representing the parent P R V s to the node representing the child P R V . A restriction on schemata is that every parameter which appears i n the list of parent P R V s must also appear i n the child P R V ; in the terminology of Chapter 2, schemata are not allowed to be left-multiple. T h e list [ p\,..., pk ] is a s h o r t - h a n d notation which lists the probabilities for the contingency table corresponding to the schema. Recall that a contingency table must have a probability for each combination of values for the P R V s . T h e short-hand list orders these probabilities by varying each parent over its set of values (in the order of the value list given i n the m u l t i declaration) w i t h the left-most parent varying most quickly. If the child P R V is declared as b i n a r y , only the case i n which the child P R V takes the value t r u e is specified. Otherwise, the parent P R V s are varied for each value of the child P R V . 31 3. Using Dynamic Bayesian Networks For P R V s declared as f u n c t i o n , the syntax is: p-prvi, . . . . p-prv => c-prv := {y(c-prv=c-val, [p-prvi=pi-val, - i f test t h e n v else Vf i n true where test is a P r o l o g condition, v true and Vf i a se a p-prv =p -val\) n s e n }. are numbers i n [0,1]. T h e E x a m p l e 3.1—1: Suppose we wanted to model a decision for someone to attend a baseball game. Assume that a baseball fan w i l l almost certainly have lots of fun at the game if i t doesn't rain, but m a y find something fun to do elsewhere. T h e more rain at the game, the less likely it w i l l be that the fan has a lot of fun. binary attends(X). multi rains @ [none,drizzle,pours]. m u l t i has-fun(X) @ [ l o t s , l i t t l e ] . attends(X:fan), r a i n s => h a s - f u n ( X : f a n ) := [0.95, 0.5, 0.65, 0.5, 0.2, 0.5, 0.05, 0.5, 0.35, 0.5, 0.8, 0.5] . T h e ordering of the contingency table (the list [0.95, . . . , 0.5]) is as follows: p(has — fun(X) = lots\attends(X) A rains = none) = p(has — fun(X) p(has — fun(X) p(has — fun(X) = lots\->attends(X) A rains = none) = 0.5 = lots\attends(X) A rains = drizzle) = lots\->attends(X) A rains = drizzle) p(has — fun(X) = lots\attends(X) A rains = pours) p(has — fun(X) = lots\-iattends(X) A rains = pours) p(has — fun(X) 0.95 = 0.65 = 0.5 = 0.2 = 0.5 — little\attends(X) A rains — none) = 0.05 pihas — fun(X) = little\-iattends(X) A rains = none) = 0.5 p(has — fun(X) p(has — fun(X) = little\attends(X) A rains = drizzle) = little\-yattends(X) A rains = drizzle) p(has — fun(X) = little\attends(X) A rains = pours) — 0.35 = 0.5 = 0.8 32 3. Using Dynamic Bayesian Networks pihas — fun(X) = little\->attends(X) A rains = pours) = 0.5 As an example of using function declarations, consider the following example, which models the probability of taking the sum of two six-sided dice: multi dl,d2 Q [1,2,3,4,5,6]. multi s u m ® [2,3,4,5,6,7,8,9,10,11,12]. d l , d2 => sum := { p(sum=Z, [dl=X, d2=Y]) = i f (Z i s X + Y) then 1.0 else 0.0}. 3.1.3 Combination Nodes Existential and universal combination PRVs must be declared using the reserved PRV names exists and f o r a l l : binary e x i s t s (param, type.prv) . binary f o r a l l (param, type.prv) . where param is a parameter which occurs in the PRV prv and is of type type. Declared combination PRVs can be used in schemata as follows: e x i s t s (param, type, p-prv) => c-prv := \.vi,v ] . 2 Note that a combination PRV must be the only parent of a child PRV, and is necessarily a binary PRV. 3.1.4 Creating networks Given a set of schemata, the interpreter needs to be given individuals to create a network. An individual can be specified by the following: 3. Using Dynamic Bayesian Networks 33 observe type(indiv, type). where indiv is the name of an i n d i v i d u a l , and type is the individual's type. 3.1.5 Queries Querying the Bayesian network consists of asking the interpreter to compute the probability of a conjunction of random variables given a conjunction of conditioning knowledge. T h e syntax is: p( [query-conjunction] , [conditioning-knowledge] ) . T h e interpreter w i l l return the determined value, repeating the query information. T h i s part of the interpreter uses the Bayesian network evaluation techniques of Poole and Neufeld [Poole a n d Neufeld, 1989] modified for dynamic Bayesian networks as outlined in Chapter 4. 3.2 D i a g n o s i s of m u l t i p l e faults for digital circuits In this section, I present a simple Influence knowledge base which c a n be used to create a probability model for any d i g i t a l circuit containing and-gates, or-gates, and x o r gates. T h i s is a simplification of a common diagnostic domain [Reiter, 1987, de Kleer and W i l l i a m s , 1987, Pearl, 1988], but the purpose of this example is not to demonstrate a new diagnostic technique, but to show how Bayesian networks can be built d y n a m i c a l l y based on a carefully designed knowledge base of schemata. Assume that d i g i t a l gates exhibit random but non-intermittent behaviour when they are broken, a n d that they have a prior probability of being broken. A gate that is working correctly has an output that is dependent directly on the inputs to the gate. A s well, the behaviour of a gate is the only criterion for diagnosis. A gate is assumed to have two inputs, a n d to distinguish them, we w i l l label them w i t h input port numbers, as i n input(and\_gate(gl57) ,1). First we define the variables used i n our domain, indicating the parameter types used: 2 Because this example is useful pedagogically, it is broken into several pieces, with discussion intermingled with code. The complete source is found in Appendix A. 2 3. Using Dynamic Bayesian Networks binary binary binary binary 34 ok(Gate:gate). output(Gate:gate). input(Gate:gate, Port:port). exists(conn(Gl, G2, Port), connections, output(Gl)). T h e node ok(Gate) has the value true iff the gate is functional. T h e node output (Gate) is true iff the output of the gate is on, and similarly input (Gate, Port) is true iff the input to the indicated port of a gate is on. T h e existential node exists(conn(Gl, G2, Port), c o n n e c t i o n s , output (GI)) tells us there is a connection between the output of a gate and an input port to another gate. W e use this parameter conn(Gl, G2, Port) to allow us to submit observations of the network structure to the system, allowing it to create the corresponding network. T h e behaviour of the gates is modelled w i t h the following schemata: / * and—gates * / ok(and_gate(G):gates), input(and_gate(G):gates,1), input(and_gate(G):gates,2) => output(and_gate(G):gates) := [1,0,0,0,0.5,0.5,0.5,0.5]. / * or—gates * / ok(or_gate(G):gates), input(or_gate(G):gates,1), input(or_gate(G):gates,2) => output(or.gate(G):gates) := [1,1,1,0,0.5,0.5,0.5,0.5]. / * xor—gates * / ok(xor_gate(G):gates), input(xor_gate(G):gates,1), input(xor_gate(G):gates,2) => output(xor_gate(G):gates) := [0,1,1,0,0.5,0.5,0.5,0.5]. which states the assumptions about gates. T h e first four entries i n the contingency table for each gate is simply the truth table for the boolean expression associated w i t h each type of gate. T h e last four entries, a l l 0.5, merely makes explicit the assumption that i f the gate 3. Using Dynamic Bayesian Networks 35 7 Figure 3.1: A simple circuit. is not working correctly due to a malfunction, the output of the gate w i l l be on or off w i t h equal probability. T h e topological connections between the gates is modelled using an existential combination: e x i s t s ( c o n n ( G l , G 2 , P o r t ) , connections, output(Gl)) => i n p u t ( G 2 : g a t e s , P o r t : p o r t s ) . := [ 1 , 0 ] . T h i s schema has several interesting features. F i r s t , the use of the connection triple ensures that the output of the gate G I is associated w i t h the input of the gate G 2 according to the observations to be supplied by the user. Furthermore, every input to a gate w i l l be associated w i t h unique value—there w i l l be no more than one value coming into the input, and only a very t r i v i a l combination (of one value) w i l l be performed. T h i s technique is discussed further i n Section 3.5 Note that by setting the contingency table differently, we can use the same schema to model the circuit under the assumption that the connections are not always faultless. For instance, using the following table [ 0 . 8 5 , 0 ] says that the connection has a finite probability of being broken. Figure 3.1 shows a very simple circuit. Gate g l is an and-gate, g2 is an or-gate, and g3 3. Using Dynamic Bayesian Networks 36 is an xor-gate. T h e knowledge base w i l l be able to model this circuit i f we supply the following information about types: observe observe observe observe observe type(and_gate(gl).gates). type(or_gate(g2).gates). type(xor_gate(g3).gates). type(conn(and_gate(gl),xor_gate(g3) , 1), connections), type(conn(or_gate(g2),xor_gate(g3),2), connections). T h i s information creates the Bayesian network shown i n Figure 3.2, which we can use for a variety of tasks. W e can model the correct behaviour of the circuit by conditioning our queries o n the assumptions that every known gate is working correctly. W e can also condition a query on the known inputs and outputs to determine the probability of a particular configuration of gates working incorrectly. 3.3 Interpreting sketch maps T h e d o m a i n of interpreting sketch maps provides a good test-bed for knowledge representation techniques a n d automated reasoning tools. T h e Mapsee project at the University of B r i t i s h C o l u m b i a has used a variety of hierarchical knowledge structures and constraint satisfaction techniques to interpret hand drawn sketch maps (for a good overview, see [Mulder et al, 1978]). T o formalize the image interpretation problem, Reiter a n d M a c k w o r t h [Reiter a n d M a c k w o r t h , 1989] posed the problem w i t h i n a logical representation, showing that there is a notion of correct interpretations which a l l applications must provide (but which is much harder to prove for less rigorously well-defined programs). T h i s problem is of interest as an example of the application of D y n a m i c Bayesian networks because of the following two questions: 1. G i v e n several possible interpretations for objects i n a sketch m a p , how can we choose a preferred one? 2. H o w c a n we provide a Bayesian network which can model any possible sketch m a p we may want to interpret? T h e answer to the first question is: Use a Bayesian network. T h e answer to the second is: Use a dynamic Bayesian network. Figure 3.2: T h e Bayesian network constructed by our simple knowledge base for the circuit in Figure 3.1. 3. Using Dynamic Bayesian Networks 38 c2 rl Figure 3.3: A simple sketch map. 3.3.1 Sketch maps A sketch m a p is made up of chains and regions, w i t h the intention that, interpreted correctly, chains correspond to roads, rivers or shorelines, and regions correspond to either land or water. Figure 3.3 shows an example of a simple sketch map. T h i s example is ambiguous, as chain c l could be interpreted as either a road which meets a traffic loop (chain c2), or a river which flows into a body of water, region r2 (chain c2 being the shoreline), or a road which ends at a body of water. 3 3.3.2 A probabilistic knowledge base T h e following schemata represents enough knowledge to provide answers to queries about sketch maps, like " W h a t is the probability that chain c2 is a road, given that it is a chain which meets another chain?" T h e contingency tables use probabilities of zero for impossible configurations, such as two rivers crossing, etc. For knowledge like "Given two chains which meet i n a tee, the probability that they are both rivers is p" the number p used is only a rough estimation, which is sufficient to demonstrate the dynamic qualities of schemata, but may not reflect the reality of occurrences of rivers or roads j o i n i n g . 3 A chain is a connected sequence of points. 3. Using Dynamic Bayesian Networks 39 '/,'/, scene objects can be linear or area objects multi isa-linear(X) <3 [road, river, shore] . multi isa-area(X) @ [land, water]. '/,*/, Image objects: binary tee(X.Y). '/,'/, chain X meets chain Y in a tee-shape binary chi(X,Y). VI* chain X and chain Y make a chi—crossing binary bounds(X.Y). '/,'/, chain X bounds area Y binary interior(X,Y). '/„'/» Area Y is in the interior of chain X binary exterior(X,Y). '/,'/, Area Y is exterior of chain X '/,% Scene descriptions: binary joins(X.Y). '/.'/.linear scene objects join in a tee shape binary crosses(X.Y). •/,'/, linear scene objects cross binary inside(X.Y). '/.'/, area object X is inside linear object X binary outside(X,Y) . '/.'/. area object X is outside linear object X °/.'/o Trick implementation of equality predicate! '1,7, Two distinct random variables, which have their VI, arguments as values: multi vall(X) @ [X, X]. multi val2(Y) « [Y, Y]. VL equality is then define i f the values are the same! function equal/2. vall(X), val2(Y) => equal(X.Y) {p(equal(X,Y),[vail(X)=X,val2(Y)=Y]) = i f (X=Y) then 1.0 else 0.0}. °/,/. Two linear scene objects can join to form a tee isa-linear(X:chain), isa-linear(Y:chain), joins(X:chain,Y:chain), equal(X:chain,Y:chain) => tee(X:chain,Y:chain) 4 :- C . . ] . 4 There are 36 numbers in this list, and in some of the contingency tables for the remaining schemata there are even more than this, but none of them is crucial to the explanation of the dynamic nature of 4 3. Using Dynamic Bayesian Networks 40 '/,'/, two linear objects can cross to form a chi isa-linear(X:chain), isa-linear(Y:chain), crosses(X:chain,Y:chain), equal(X:chain,Y:chain) => chi(X:chain,Y:chain) := [...]. */*/» linear objects can form closed loops isa-linear(X:chain), loop(X:chain) => closed(X:chain) := [ . . . 3 . /, / linear scene object are found beside area objects isa-area(X:region), isa-linear(Y:chain), beside(X:region,Y:chain) => bounds(X:region,Y:chain). := [...]. 4 0 %% on the linear object which forms the boundary between two '/,'/„ area objects, we must constrain the objects involved °/o% e.g. A road is not a boundary between two bodies of water isa-area(X:region), isa-linear(Y:chain), isa-area(Z:region), inside(X:region,Y:chain), outside(Z:region,Y:chain) => boundary-constraint(X:region,Y:chain,Z:region). := [...]. There are several interesting features of this knowledge base. First of a l l , the use of multiple valued P R V s constrains the objects i n the domain, e.g., a linear scene object can only be one of {river, road, shore}, and multi-valued P R V s express this succinctly. Observe also the use of type information; since only individuals of the correct type can be used to instantiate schemata, d i v i d i n g the domain into two P R V s i s a - l i n e a r ( X ) and i s a - a r e a ( X ) assures that chains and regions are disjoint sets of objects. Finally, the use of equals(X,Y) is required to ensure that when the schemata are instantiated we only consider interpretations i n which, for example, linear scene objects do not intersect themselves. this example. For a complete listing, see Appendix A. 3 . Using Dynamic Bayesian Networks ; 41 The user specifies the following individuals, and their types: type(cl,chain). type(c2,chain). type(rl,region). type(r2,region). which when combined w i t h the knowledge base, creates the Bayesian network i n Figure 3.4. The user is free to query the state of the network, conditioning on c2 being closed, and r l being on the outside of c2, etc. Note that this network could be used the opposite direction as well: conditioning on the scene objects allows the user to "predict" the image. 3.4 T h e Burglary of the House of Holmes The following story is used by Breese [Breese, 1990] to demonstrate his network construction application, and is derived from an example used by Pearl [Pearl, 1988]: M r . Holmes receives a telephone call from his neighbour, D r . W a t s o n , who states that he hears the sound of a burglar alarm from the direction of Holmes' home. A s he is preparing to rush home, M r . Holmes reconsiders his hasty decision. He recalls that today is A p r i l 1st, and i n light of the A p r i l Fool's prank he perpetrated on his neighbour last year, he reconsiders the nature of the phone call. He also recalls that the last time the alarm sounded, it had been triggered by an earthquake. If an earthquake has occurred, it w i l l surely be reported on the radio, so he turns on a radio. He also realizes that i n the event of a burglary, the likelihood of recovery of stolen goods is much higher if the crime is reported immediately. It is therefore important, i f he i n fact a burglary d i d occur, to get home as soon as possible. O n the other hand, i f he rushes home he w i l l miss an important sales meeting w i t h clients from B i g C o r p . which could result i n a major commission. Should M r . Holmes rush home? The knowledge needed to solve this problem can be expressed i n the language of I n f l u e n c e as follows. 3. Using Dynamic Bayesian Networks 42 Figure 3.4: T h e Bayesian network constructed by our simple knowledge base for the sketch m a p i n Figure 3.3. 3. Using Dynamic Bayesian Networks binary binary binary binary binary binary binary 43 sale-obtained(X) . '/,'/. has a sale been obtained by X? burglary(X). 'IX was there a burglary in X's house? go-home(X). •/,'/. does (should) X go home IMMEDIATELY? meeting(X.Y). 'IX, was there a meeting between X and Y? earthquake. '/,'/, was there an earthquake? radio. 'IX, did the radio report an earthquake ? client(X). '/,'/, X has a client? function value(X). 'IX what is the value of X's decision? function losses(X). '/,'/, how much did X lose? function income(X). 'IX how much does X stand to gain? 'IX, how much is a sale worth? multi sale-value(X) 9 [0,150,300]. 'IX* how much of X's stolen goods were recovered? multi goods-recovered(X) 9 [0,500,5000]. 'IX how much were X's stolen goods worth? multi stolen-goods-value(X) 0 [0,500,5000]. 'IX, when did X report thre burglary? multi burglary-report(X) 0 [immediate, late, none]. */,'/, compute the difference between X's income and losses '/,'/, c a l l i t value for X losses(X:alarm-owner), income(X:alarm-owner) * => value(X:alarm-owner) := { . . . } . « 'IX, X may recover some of his stolen goods depending on 'IX, when X reported the burglary The function here, as well as the contingency table for the schemata in this example are not essential for the understanding of the dynamic nature of the schemata. The actual numbers are found in the complete listing in Appendix A. 5 3. Using Dynamic Bayesian Networks burglary(X:alarm-owner), burglary-report(X:alarm-owner) => goods-recovered(X:alarm-owner) := [...]. Vh If X recovers stolen goods, then there is not loss goods-recovered(X:alarm-owner), stolen-goods-value(X:alarm-owner) => losses(X:alarm-owner) := {...}. */,*/, the income for Y depends on getting the sale from a client client(Y:alarm-owner), sale-obtained(Y:alarm-owner), sale-value(Y:alarm-owner) => income(Y:alarm-owner). := {•••}• Vh - A meeting may take place between X and Y Vh i f X doesn't go home go-home(X:alarm-owner) => meeting(X:alarm-owner,Y) := [ . . J . '/.'/. X w i l l report a burglary i f one occurred, and X has gone home Vh to verify i t go-home (X: alarm-owner), burglary (X: alarm-owner) => burglary-report(X:alarm-owner) := [...]• Vh i f there was a burglary, then goods of some value have been Vh stolen burglary(Y:alarm-owner) => stolen-goods-value(Y:alarm-owner) := [...]. */,'/, X w i l l meet with Y to talk sales. A sale may occur meeting(X:alarm-owner,Y:corporation) => sale-obtained(X:alarm-owner,Y:corporation) := [...]• Vh an alarm is affected by earthquakes and burglaries Vh - the alarm w i l l almost definitely sound i f both an earthquake Vh and a burglary occur, and almost definitely w i l l not sound i f Vh neither occur 44 3. Using Dynamic Bayesian Networks 45 burglary(Y:alarm-owner), earthquake => alarm(Y:alarm-owner) := C . . ] . 7,7, earthquakes tend to be reported on the r a d i o . . . earthquake => radio := [...]• 7,7, Phone calls from neighbours about alarms. 7,7, neighbours usually only c a l l when an alarm sounds 7,7. - non-neighbours don't c a l l at a l l ! neighbour(X:person,Y:alarm-owner), alarm(Y:alarm-owner) => phone-call(X:person,Y:alarm-owner) := [...]• - This example demonstrates the power of our simple network creating process. The user specifies the appropriate individuals: observe type(holmes.alarm-owner). observe type(watson,person). observe type(big-corp, corporation). and the network in Figure 3.5 is created. In [Breese, 1990], Breese makes use of influence diagram technology, such as decision nodes, which are useful in this problem. These types of nodes are not implemented in Influence. However, by conditioning on whether or not Holmes should go home, the user can determine the value of each decision by querying value (holmes). The value computed by this technique is not the same as the "value of a decision node" in Breese's program; Breese uses value nodes from influence diagram technology (as in [Shachter, 1986]) which computes an expected value based on the product of the possible values by their respective probability. In Influence there is no equivalent functionality built-in. 6 However, this computation can be simulated by querying each value of the node to obtain the posterior probability and performing the same sum-of-products by hand. 6 3. Using Dynamic Bayesian Networks 3.5 47 B u i l d i n g k n o w l e d g e bases for D y n a m i c B a y e s i a n networks In this chapter several example knowledge bases have been presented for different domains. A s it may not be obvious how these knowledge bases were constructed, this section w i l l present some ideas which have been useful i n creating the knowledge bases for these domains. Some common pitfalls are also identified, and alternatives are suggested. Separating properties from individuals T h i s is a fundamental technique. It should be noted that many different concepts may classify as individuals, depending on the domain, and on the intent of the application. For example, a person i n a story may be an i n d i v i d u a l , and the triple (Gatel, Gate2, InputPort) as used i n Section 3.2 to represent connections i n electronic circuits is also an i n d i v i d u a l . It seems useful to consider the kinds of information that a user might be able to observe. Direction In many cases, schemata are naturally written i n a causal direction. For example, the behaviour of a logic gate is quite naturally described causally, since its output depends on the inputs. T h e schemata for a given domain need not be i n the causal direction, but should be written w i t h a consistent direction. T h i s is a general observation which seems to aid i n keeping knowledge bases from creating directed loops i n the Bayesian networks created from it. It may arise that some causal relationships are not easy to quantify, and care should be taken to avoid w r i t i n g "loops" i n the knowledge base. Conditional probabilities and semantics W h i l e it is true that a conditional probability distribution may take any values which sum to one and remain consistent mathematically, it is important to remember that a contingency table has an intended meaning. W h e n creating the knowledge base, one should be certain that every entry i n the contingency table for a schema makes sense. For example, the following schema might look appropriate, according to the direction principle 3. Using Dynamic Bayesian Networks 48 above bird,emu —• flies because b o t h birdness and emuness have some influence on a bird's ability to fly. However, this schema requires an entry i n the contingency table for the expression p(flies\-ibird, emu), which, i f we know anything about emus and birds is inconsistent semantically. Assigning this probability is context dependent; that is, i f we have i n addition to the schema about flying, a schema such as emu —• bird stating that a l l emus are birds, the actual value of p(flies\~>bird, emu) is irrelevant. Some axiomatizations may specify a particular value for a probability w i t h inconsistent conditioning (for example, Aleliunas [Aleliunas, 1988] assigns a value of unity). Nevertheless, even i f the inconsistency may be easy to disregard i n s m a l l knowledge bases, i n larger ones, keeping track of a l l the irrelevant inconsistencies seems to be more work than redesigning the problematic schemata for consistency. In our b i r d example, we might rewrite the schemata as i n the following: emu —> bird emu —• abJIying bird, ab-flying —• flies where ab_flying is a random variable representing the possibility that something is abnormal w i t h respect to flying, i.e., it doesn't fly. K n o w i n g that emus are certainly birds, and that emus tend to be exceptions to the generality that "birds fly" provides the required numbers for the contingency tables. K e e p i n g the contingency tables semantically consistent and informative w i l l make modifications and extensions of the knowledge base less troublesome. The use of combination nodes C o m b i n a t i o n nodes, such as the existential and universal combinations discussed i n Section 2.2 are used to model multiple influences assuming a simple and regular combination 3. Using Dynamic Bayesian Networks 49 of effects. T h e y can be used i n a straightforward manner, as demonstrated i n E x a m p l e 3.51, where the combination was performed over a simple set of individuals. In Section 3.2, a more complex k i n d of i n d i v i d u a l was used to create a separate structure for each of these individuals. In many domains, causal or influential knowledge is best modelled by considering every cause for a particular effect (e.g., every disease which causes the same symptom) independently. Further, each cause is assumed to have an enabling/disabling condition. T h i s k i n d of assumption is modelled i n Pearl and others [Pearl, 1988] by a structure called a noisy-Or gate. It has the advantage of requiring a contingency table whose size is on the order of the number of causes, and each entry i n the table considers only one cause at a time. The existential combination introduced i n this thesis can be used to construct noisy-or gates. Suppose we wanted to model the causes of sneezing w i t h a n o i s y - O r gate, combining nodes like flu(X) and hay-fever(X), because it is known that either disease is likely to cause sneezing and that these diseases are independent. T h i s is stated i n the following: , flu(X) —> causes(flu(X), sneezing) hay — fever(Y) —• causes(hay — fever(Y), sneezing) 3Z € causes • causes(Z, sneezing) —> sneezing The user would then have to tell the system which causes to consider i n the model, and the system would create a structure which combines the causes. T h i s behaves exactly as a n o i s y - o r gate, since listing the causes first is equivalent to enabling the cause, and any cause not explicitly mentioned by the user is not considered by the combination at a l l , as if the cause were disabled. Avoiding too many instantiations It is possible to write a knowledge base of schemata without regard to the process which creates the network. However, it is often helpful to keep the process i n m i n d , to avoid w r i t i n g schemata which w i l l be instantiated too often. T h e most obvious way to keep the instantiations l i m i t e d is to make use of the type constraint mechanism. For example, when w r i t i n g the schemata for interpreting sketch maps i n Section 3.3, we could have used only one type of scene object for b o t h linear and area objects. If the following schema had been used 3. Using Dynamic Bayesian Networks 50 isa(X), isa(Y), joins(X.Y), equal(X.Y) => tee(X.Y) (where isa(X) could take any value from [road, river, shore, water, land] ) instead of isa-linear(X), isa-linear(Y), joins(X.Y), equal(X.Y) => tee(X,Y) as actually used i n Section 3.3, the network creation process would have instantiated schemata for a l l scene objects. N o useful information would have been obtained from instantiating them w i t h area scene objects. 3.6 Conclusions T h e example knowledge bases discussed i n this chapter demonstrate some of the abilities of d y n a m i c Bayesian networks as presented i n this thesis. It is clear that while most schemata are straight forward, some schemata make use of clever techniques which can only be learned by example, and by understanding the network b u i l d i n g process. T h i s seems to conflict w i t h the simple declarative representation which seems to characterize the language, presented i n Chapter 2. T h e process of b u i l d i n g networks is very simple; schemata are instantiated by every i n d i v i d u a l of the appropriate type, and the instantitations are added to the network. A s we c a n see from the sketch map example, sometimes there are many instantiations which do not lead to useful information, a n d could be left out of the network completely, i f the b u i l d i n g process were more intelligent. In work done independently of this thesis, G o l d m a n a n d C h a r n i a k [Goldman a n d Charniak, 1990] a n d Breese [Breese, 1990] have presented their work on constructing Bayesian networks from knowledge bases. B o t h approaches address the representation issues discussed i n this thesis, but emphasis is placed on more sophisticated procedures for building networks. G o l d m a n and C h a r n i a k use a d o m a i n specific rule base to construct networks for the d o m a i n , a n d Breese has d o m a i n knowledge i n the form of H o r n clauses which are used to determine which nodes to include i n the network. Chapter 4 Implementing dynamic Bayesian networks T h i s chapter deals w i t h the issue of implementing the dynamic features described i n C h a p ter 2. I present a brief description of the algorithms of Poole and Neufeld [1989],Pearl [1988, 1986], Shachter [1986, 1988, 1989], and Lauritzen and Spiegelhalter [1988], considering how each might be adapted for dynamic Bayesian networks. 4.1 The general problem Ideally, the dynamic constructs would be implemented so that when new individuals are added to the model, resulting i n arcs and nodes being added to the network, the joint probability distribution w i l l be correct and obtainable without recomputation of the previous evidence. T h a t is, we want to give some preliminary observations to our system to get some results, possibly observe new individuals, and get the updated results without having to recompute the effects of the initial observations. Take the simple example of the following schemata: a 3Xet-b(X) —• b(X) — • c T h i s example contains both a right-multiple schema and a combination schema, and F i g 51 4. Implementing dynamic Bayesian networks 52 Figure 4.1: A simple example of a dynamic network, (a) showing the original network, and (b) showing the network after another individual of type t is observed. See Section 4.1. i 4. Implementing dynamic Bayesian networks 53 ure 4.1a shows the result of instantiating the schemata for the individuals x\ and x , which are of type t. This network represents the independence assumptions which gives the following equality: 2 p(a, 6(ari), b(x ), 3X G t • b(X), c) 2 = e t • b(x)) (c\3X P p(3X G * • 6(A-)|6(ari)6(ar)) 2 p(K i)\ )p( ( 2)\a) x a b x p(a) Suppose a new individual x is observed dynamically. The probability model changes as a result, and ideally, the algorithm used in the inference process can absorb the change in the model without too much recomputation. The network is shown in Figure 4.1b, and the joint distribution for our example becomes: 3 p(a, bfa), b(x ), b(x ), 3X G t • b(X), c) 2 = 3 p(c\3X G t • b(X)) p(3X Gt • b(X)\b(x )b(x )b(x )) 1 2 3 p(K i)\ )p(K 2)\a)p(b(x )\a) x a x 3 p(a) This equation demonstrates the change to the model. One new factor has been added, and the existential combination has one more conditioning variable, b(x ). It is important to note that the factor p(b(x )\a) uses the same contingency table as p(b(x )\a), so no new information is required. Since the existential combination uses a contingency table which takes the or of all the conditions, all the system needs to know is the number of conditions. 3 3 3 To be able to use dynamic schemata efficiently, adding new arcs to the network should allow the algorithm which processes evidence and computes posterior distributions to modify the state of the network in a straightforward manner without recomputing the effects of evidence already submitted. This is not always easy to do, mainly because most of the algorithms perform some direct manipulation on the structure of the network, either to create a computationally tractible representation or to perform the calculations themselves. I will show that the addition of dynamic constructs to the work of Poole and Neufeld [Poole and Neufeld, 1989] is quite straight-forward, and that under certain conditions, Pearl's algorithm can be adapted as well. The algorithm due to Lauritzen and Spiegelhalter is not 4. Implementing dynamic Bayesian networks 54 difficult to adapt. F i n a l l y , I show that Shachter's network evaluation algorithm performs too many structural changes on the the network to allow a feasible implementation of dynamic schemata. 4.2 A n A x i o m a t i z a t i o n forProbabilistic in Prolog Reasoning T h e influence diagram interpreter described by Poole and Neufeld [Poole and Neufeld, 1989] provides a sound axiomatization for probabilistic reasoning. T h e computation is goal directed, and only the computations necessary to compute solutions to queries are performed. T h e I n f l u e n c e implementation of dynamic Bayesian networks is based on this interpreter. 4.2.1 The basic computational engine A Bayesian network is specified by a database of direct dependency relations and contingency tables associated w i t h each variable. T h e basic computation is goal directed: the desired conjunction of variables which make up a query is supplied to the system together w i t h the given evidence. O n l y those computations which are necessary to solve the query are performed, and no computation is performed outside the context of a query. Furthermore, each query is computed independently from any other query. T h e axiomatization of probability theory is straight forward, and axioms such as the Negation Rule, the L a w of M u l t i p l i c a t i o n of Probabilities, and Bayes' R u l e are are used to simplify and solve complex queries recursively. In particular, Bayes' R u l e is used to reformulate diagnostic queries (i.e. queries which ask about causes given effects) into a causal form (i.e. queries which ask about effects given causes). T h e direct dependencies of variables, made explicit i n the network structure, are compiled into P r o l o g rules which calculate posterior distributions by "reasoning by cases" w i t h respect to a l l of the variable's direct parents. T h e independence assumption for the parents of a variable provides some simplification of the general reasoning by cases formula, and weights the cases according to the contingency table provided for the variable. A query can be seen as a proof of a symbolic probability expression, grounded i n terms of 55 4. Implementing dynamic Bayesian networks prior probabilities and given knowledge. This expression can be evaluated numerically to provide a quantitative result. 4.2.2 Adding the dynamic combination Adding the dynamic constructs presented in Chapter 2 requires the addition of several axioms, recognizing the special syntax, and translating a query involving a combination node into a series of simpler queries. For existential combination nodes, the results of the simpler queries are combined using the following lemma: Lemma 1 If U = {u±,..., u } is a set of random variables such that ditionally independent given v for all i ^ j, then k and Uj are con- k p ( « i V «a V . . . V u \v) = 1-11(1i=i k p( \v)) Ui k p(ui A u A ... A u \v) = "[Jp(ui\v) 2 k i=l The first step in the combination procedure is to gather the individuals of the correct type. Each individual instantiates the parameterized random variable given in the combination node, and each instantiation is assumed to be independent, given the state of the combination. For existential combination, the collection of instantiations are combined using Lemma 1 which combines the effects of all the instantiations using the first result. The universal combination is similar, using the second equality. The ability to observe individuals dynamically is provided by the independence of successive queries. A probability model is based on the information provided by the query only, and therefore a new query, with new evidence in the form of individuals, builds a new probability model implicitly as the proof is generated. 4.2.3 Discussion The simplicity of this implementation makes it very attractive, both formally and pedagogically. The use of a logical specification helped to keep the developing theory of dynamic combinations correct as well as simple. 4. Implementing dynamic Bayesian networks 56 One advantage to this approach is that computation is goal driven, and only the calculations necessary to compute the query are performed. T h e p r i m a r y disadvantage is that, for consultations w i t h multiple queries, many identical computations may be performed more than once. A s a general belief computation engine, the current implementation is l i m i t e d , as only very s m a l l networks can be queried i n reasonable time. Various improvements to the compiler and probability axioms are possible, resulting i n a loss of clarity and obvious correctness. Some of these issues are explored i n [Horsch, 1990] 4.3 Pearl's Distributed Propagation and Belief Updating T h e work on belief propagation done by Pearl [Pearl, 1988] provides a probabilistic computational paradigm which is local i n nature, and each local computation requires little or no supervision from a controlling agent. Pearl has proposed a highly parallel "fusion and propagation" algorithm for computing posterior marginal probabilities using singly connected Bayesian networks. I w i l l describe the algorithm abstractly at first, saving the mathematical details for later. 4.31. Belief updating in singly connected networks In Pearl's approach, a Bayesian network describes the channels of communication between related variables. For Bayesian networks which are singly-connected, there are simple rules which govern the way the network is affected by the submission of evidence. These rules deal w i t h : • H o w a variable updates its own marginal distribution based on messages received from its direct parents or children. • T h e content of messages sent by a variable to its direct parents and children. For singly-connected graphs, the rules for updating a node's marginal distribution and communicating changes to its parents and children guarantee that the network w i l l converge to a stable equilibrium and the marginal distributions w i l l be correct. 4. Implementing dynamic Bayesian networks 57 Evidence is submitted to a network by asserting a value for a variable (or a collection of values for different variables). T h i s affects the marginal probability of that particular variable, and this variable sends evidential support values to its parent variables, and causal support values to its children. E a c h parent or child variable then updates its own posterior distribution based on the information gained from its parents and children, and sends out messages to its parents and children. These messages take into account that some of the parents or children have already seen the evidence, thus ensuring that no support is received more than once. W h e n a l l variables i n the network have updated their distributions i n this manner, the propagation ceases. One of the features of belief propagation is the natural way it can be expressed as a distributed process: the message passing and belief updating are local computations and can be performed i n a highly parallel computation. 4.3.2 Adding dynamic constructs In this section I demonstrate how the dynamic combination nodes can be added to Pearl's singly connected message passing algorithm. In general, a combination node i n a singly connected network could create a second path between two nodes. Therefore, to be completely general, Pearl's algorithm must be adapted to handle general graphs. I assume for simplicity that no loops are created when an i n d i v i d u a l is observed dynamically. T h i s assumption is valid i f any of the methods of Section 4.3.3 is used to evaluate the network i n the general case of multiply-connected networks. There are three cases: r i g h t - m u l t i p l e schemata, combination nodes and unique schemata. R i g h t - m u l t i p l e schemata involves the addition of a new child to a set of nodes. C o m b i nation nodes involve the addition of a new parent to a single node. Unique schemata are merely attached to either another unique schema, or at the t a i l of a r i g h t - m u l t i p l e schema or at the head of a combination node. T h e presentation follows Pearl [1988, page 183ff, Equations 4.47 to 4.53], considering the general case of adding new parents or children to the network. T h e special syntax for combination nodes and r i g h t - m u l t i p l e schemata is important for determining where the new sections of network are to be added, but unimportant w i t h respect to the propagation of support once added. 4. Implementing dynamic Bayesian networks 58 T h e d y n a m i c a d d i t i o n of parents Suppose we have a node C i n the network w i t h parents parents(C) = {Ai,... ,A }, and children children(C) = {Bi,... ,B } as i n Figure 4.2 (the dashed arc from A +\ t o C indicates the arc I w i l l be adding). n m n Intuitively, adding a parent to a node can be seen as revising an assumption about the d o m a i n . I n principle, the new parent can be treated as i f it h a d always been connected, but having had no effect on the child variable. A d d i n g the new node by a real process only changes the effect the assumed node has on its child. In the case of a n existential combination, we can treat the i n i t i a l state of the parent as having a prior probability of zero. A d d i n g the node can be seen as changing the message from zero to the value specified i n the knowledge base for this node. T h i s assumption effectively states that " a l l unknown individuals do not have the property we are looking for." For universal combination nodes, the i n i t i a l assumed message sent by the parent is unity, that is, we assume that the new node complies w i t h the property being combined. A d d i n g the new parent i n this case changes this value from unity to the prior specified i n the knowledge base. Before the a d d i t i o n of A +i the internal state of C can be inferred from C ' s neighbours by the following: n BEL(c) = p(c|e) (4.1) = P(^x\x)p(x\e^) (4.2) = (4.3) aA(c)7r(c) where e is the evidence submitted so far, e^ are those items of the evidence connected to C through its children, e^ are those items of evidence connected to C through its parents, and a; is a n o r m a l i z i n g constant such that £ aA(c)7r(c) = 1. N o w c A(c) = p(e- \x) (4.4) x = nw<o (4-5) i TT(C) = p(x\e+) (4.6) P(c|ai,-..,an)II c(a,-) = 7r ai,...,a„ (4.7) i The AB((C) are messages received by C from its children, a n d 7Tc(a;) are messages received by C from its parents. 4. Implementing dynamic Bayesian networks Figure 4.2: Adding a parent A 59 n+i to node C. 4. Implementing dynamic Bayesian networks The probabilities p(c\ai,a ) 60 are those supplied as contingency tables. n In the case of c o m b i n a t i o n nodes, these are functional: for existential combinations, we use a funct i o n w h i c h computes p(c\ai,..., a ) n p ( c | a i , . . . , a B ) = = V"=ia,-, and for universal combination we use A?=i a*- A d d i n g a parent A i to parents(C) requires an update for the state of C. A s s u m e that no new evidence is submitted at the time of adding the new parent. L e t BEL'(c) be the new state. Therefore: n+ BEL'(c) A'(c) = p(c\e) (4.8) = a\'(cW(c) (4.9) = U B.(C) (4.10) = A(c) (4.11) X *'( ) = P(c\ai,...,a )Y[TTc(ai) Yl c (4.12) n+1 ai,...,a +i n Note that A(c), which is the support gained from C ' s children doesn't change. T h e new 7r'(c) reflects the addition of the new parent. T h e contingency table used before cannot i n general be used i n this new calculation; a new table must be supplied. However, i n the case of c o m b i n a t i o n nodes, these probabilities are functional and can be w r i t t e n to accept an arbitrary number of dependent variables. H a v i n g updated its own internal state, C must send appropriate messages to a l l of its children and every parent except the new one. A s well, the new parent must be made aware of the evidence previously seen by the network. The content of the message sent by C to its parents due to the a d d i t i o n of A written as: n + i can be n+l Ac(ai) = P^2\(c) ^2 p(c\a ...,a i)Y[7r (a )i^n u c n+ k + l (4.13) where (3 is any convenient constant. Note that each parent of C receives information from each of its mates, and about every child of C from A(c). The message C sends to A +\ should i n one message provide information about a l l the evidence seen by C so far. T h e message sent can be written: n Ac(o„+i) = (5 £ \(c) y j K l i>-••>««+!)IlMai) c a (- ) 4 14 4. Implementing dynamic Bayesian networks The 61 message sent to each child Bi i n children(c) is: *Bi(c) (*l[\ (c) = P ( c | « i , . . . , n+l a Bk k^j (4.15) ai,...,a„+i BEL'(c) (4.16) Finally, the added parent A +i can update its own state based on the message received from C and send messages to its children and parents i n the manner originally outlined by Pearl. n The dynamic addition of children Suppose we have node A i n a network w i t h parents parents(A) = {D±,..., D / } and children children(A) = {Bi,... ,B }, as i n Figure 4.3 (again, the dashed arc from A to B i indicates the one that w i l l be added). n n+ Intuitively, the addition of the new child is similar to the way parents were added in combination nodes. It can be assumed that the child has always been connected, but having had no i n i t i a l effect on its parent. T h e assumed i n i t i a l state gives no preference to the values of the child, that is, the assumed prior is 1/v, where v is the number of possible outcomes of the new child. A d d i n g the child has the effect of changing the assumed irrelevance to the value specified by the knowledge base. Before the addition of the new child B i, n+ BEL(a) the internal state is given by: = p(a|e) (4.17) = a\(a)TT(a) (4.18) where e is the evidence submitted so far, and a is a normalizing constant. N o w (4.19) i ^(a) = Y, p(a\d ...,di)Y[ir (di) u A (4.20) A d d i n g a child D i to children(A) requires an update for the state of A. Assume that no new evidence is submitted at the time of adding the new parent. Let BEL'(a) be the n+ 4. Implementing dynamic Bayesian networks Figure 4.3: Adding a child 62 B n + i to node A. 4. Implementing dynamic Bayesian 63 networks new state. Therefore: BEL'(a) p{a\e) (4.21) a\'( y(a) (4.22) a nw«) A'(a) (4.23) i X(a)X (a) (4.24) Bn+1 •••) /)II c(ai) rf di,...,di (4.25) 7r 7T(o) (4.26) Note that 7r(a), which is the support gained from A ' s parents doesn't change. T h e new A'(a) reflects the a d d i t i o n of the new child. H a v i n g updated its own internal state, A must send appropriate messages to a l l of its children except the new one, and to every parent. A s well, the new child must be made aware of the evidence previously seen by the network. T h e content of the message sent by A to its parents due to the a d d i t i o n of B i written as: n+ x (di) = A £ p(c\d ,...,d )H* {d ) 1 l A k can be (4.27) where (3 is any convenient constant. Note that each parent of A receives information from each of its mates from Tr (d ) and about every child of A from A'(a) and A k T h e message sent to each child Bi, i ^ n + 1 i n children(A) Tr (a) Bi = aY[X {a) Bk £ P(a\d u is: ..., a{) fj 7r A «) BEL'(a) (4.28) (4.29) X (a) Bi T h e message A sends to B i can be written: n+ n 7T£„ +1 = a]]X (a)Y2di,--.,dip(a\di,...,ai)'[l'ir (di) Bk A (4.30) (4.31) (4.32) 4. Implementing dynamic Bayesian networks 64 which is the previous state of A. F i n a l l y , the added child D \ can update its own state based on the message received from A and send messages to its children and parents i n the manner originally outlined by Pearl. n+ Adding unique schemata dynamically A d d i n g unique schemata requires no special handling. T h i s is because they can be attached only to nodes which were added d y n a m i c a l l y as part of a r i g h t - m u l t i p l e or combination structure. T h e i r internal state is inferred from their parents and children, which are also new to the network. T h e standard computation applies to these nodes. W h e n the message due to the change i n network structure reaches the nodes of the unique schemata, the u p d a t i n g and subsequent message passing occurs i n the manner described by P e a r l . 4.3.3 Propagation in general Bayesian networks For multiply-connected networks, Pearl's message passing scheme does not apply. T h e problem is that networks which have multiple causal chains between variables lose the conditional independence assumption which permitted the local propagation algorithm to distribute messages along each arc. Furthermore, then network can get caught i n an infinite stream of messages. Solutions to this problem include: • K e e p i n g a list of variables i n the history of the propagation. T h i s requires message lengths exponential i n the number of variables i n the network. • C l u s t e r i n g sibling nodes along the multiple paths, creating a single chain. requires possibly exponential time to identify loops and form the clusters. This • Disconnecting the loop at the head (i.e. at the variable which begins the multiple chains). T h e variable at the head is instantiated to each value it can attain, and the effects of this instantiation are averaged. T h i s requires computations exponential i n the number of values the head of a loop can attain. • Stochastic simulation, i n which the variables take random values according to the probability distribution derived from the current state of its direct parents. T h e 4. Implementing dynamic Bayesian networks 65 posterior distribution is then taken as the ratio of assigned values to the number of t r i a l runs made. T h e drawback to this approach is that convergence to the correct distribution is guaranteed under certain conditions, but the number of trials necessary for a particular accuracy is undetermined. 4.3.4 Discussion I have shown how Pearl's fusion and propagation algorithm can be modified to handle dynamic constructs i n a way which correctly reflects the change i n the probability model. The adaptation is at a cost of locality: the purely local nature must be enhanced by a controlling program which adds nodes and arcs to the net and starts the appropriate message propagation. F i n a l l y , I have only treated the special case where the dynamic constructs do not create loops i n previously singly-connected networks. A dynamic scheme to handle this general case, using the ideas presented i n Section 4.3.3, would be another interesting project. 4.4 L a u r i t z e n and Spiegelhalter's c o m p u t a t i o n of belief L a u r i t z e n and Spiegelhalter [Lauritzen and Spiegelhalter, 1988] address the issue of performing local computations of beliefs using Bayesian networks as expert system inference engines. In particular, they are concerned w i t h computing beliefs on large, sparse networks which are not necessarily singly connected. Briefly, their approach first modifies the structure of the Bayesian network representing the d o m a i n knowledge, creating a full moral graph. Cliques are identified, and the belief computation uses them to propagate support values throughout the network. 4. Implementing dynamic Bayesian networks 4.4.1 66 Belief Propagation on Full Moral Graphs T h e i r a l g o r i t h m can be described as consisting of three steps. First, the Bayesian network is triangulated (and referred to as a full moral graph). Second, cliques i n the triangularized network are ordered using a m a x i m a l cardinality search (i.e. starting from an arbitrary node, of a l l neighbour nodes yet to be labelled, the node having the most already labelled neighbours is labelled first), and each clique is treated as a compound variable i n a tree structure connecting these cliques. Finally, the ordering of the m a x i m a l cardinality search is used to direct the arcs i n the tree, and the method of propagation i n singly-connected networks is used to update probabilities. 1 4.4.2 Adding dynamic constructs Briefly, L a u r i t z e n and Spiegelhalter's algorithm can be adapted to include the dynamic structures i n the following manner. T h e new section of the network is added (dropping the direction of the arcs) to the extant triangularized structure. T h e network is retriangularized, and following the above procedure, cliques are identified and a tree of compound nodes is created. A t this point the propagation procedes on the new structure i n the familiar manner. T h e complication of adding the new nodes to the triangularized structure can be approached by using the observation from Section 4.3.2, namely treating the arcs as i f they had always been i n the network, but having had no effect on the rest of the network. T h i s observation can be used to change the marginal distribution of each clique which has a new node i n it or is a neighbour to a such a clique. 4.5 Probabilistic inference using Influence diagrams In Operations Research, influence diagrams have been used as a c o m m o n analysis tool for decision analysis under uncertainty. T h e approach of Shachter is to provide a normative axiomatic framework for this domain, which has often used ad hoc methods [Shachter, 1986, Shachter, 1988, Shachter, 1989]. A n influence diagram is a network similar to a Bayesian network, relating random variThis observation is due to Pearl, from his commentary on [Lauritzen and Spiegelhalter, 1988], and also noted by Shachter in [Shachter, 1989]. 1 4. Implementing dynamic Bayesian networks 67 ables and identifying probability distributions graphically. In addition to random variables (called chance nodes i n Operations Research), influence diagrams also have nodes to represent expected outcomes and decisions. Influence diagrams without these special nodes are Bayesian networks, and i n this overview, these w i l l only be considered i n passing. For purposes of decision analysis, a l l probabilistic nodes are removed from the network, transferring their effect on decisions to a single table which ranks the decision options based on expected cost. For less specific applications, Shacter has shown [Shachter, 1989] that his algorithm can be used for evidence absorption and propagation by removing evidence nodes from the network. Network evaluation is performed using two techniques: arc reversal and node removal. A barren node, i.e. a node w i t h no children, can be removed from a network without affecting the underlying distribution. A n arc can be reversed, that is, the arc from a node A to a node B can be tranformed into an arc from B to A, i f there is no other directed path from A to B. T h i s transformation reverses the conditional probabilities as well, and both A and B are "adopted" by each other's direct parents. W h e n only probabilistic nodes are used i n the diagram, the procedure of arc reversal corresponds to Bayes' theorem. A r c reversals are used to change nodes having children into barren nodes so they can be removed from the network. Network evaluation can be seen as the sequence of node removals and arc reversals necessary to remove a l l nodes from the network. Queries to the network can be performed by the sequence of arc reversals necessary to remove the nodes which condition the query. 4.5.1 Adding dynamic constructs G i v e n the very d y n a m i c network evaluation already inherent i n Shachter's algorithm, it is not especially helpful to add the k i n d of dynamic structures I have presented i n Chapter 2. A s an example of the difficulties involved, consider the network i n Figure 4.4a. Following Shachter's algorithm, evaluating the network w i t h a series of node removals and arc reversals results i n the network of Figure 4.4b. If a node B3 is added to the network i n this state, perhaps because a user has just discovered a new i n d i v i d u a l i n the model, the system must perform the evaluation again from the original network. In general, an influence diagram could be built by instantiating a collection of schemata, and Breese demonstrates this by taking a similar approach to building influence diagrams [Breese, 1990]. Shachter's evaluation could be adapted to use combination nodes (i.e., to 4. Implementing dynamic Bayesian networks Figure 4.4: The problem of adding a node after arc reversals have been performed. 68 4. Implementing dynamic Bayesian networks 69 be used as a noisy-Or gate as in Section 3 . 5 , for example), but if individuals are observed after a network has already been created, and after some evaluation has been performed, a new network must be created, and all of the previous arc reversals and node removals must be repeated for the new network. 4.6 Conclusions In this chapter I have presented several Bayesian network evaluation techniques, and discussed how the dynamic techniques of Chapter 2 might be implemented. Some of these algorithms lend themselves well to dynamic schemata, and they benefit from the observation that the parts of the network which are added dynamically can be treated as if they had always been present in the network; the addition merely has the effect of changing how these nodes affect the rest of the network. There are other network evaluation algorithms in use or being developed which have not been discussed in this chapter. These may also be adaptable to dynamic constructs and may benefit from this same observation. The question of whether an evaluation algorithm can be modified to use dynamic schemata says more about the intended application than about the algorithm itself. For expert system use, a Bayesian network is a batch process, and the computation is designed to be autonomous and inflexible. These constraints are quite exploitable for implementing dynamic constructs for these algorithms. The fact that Shachter's algorithm is typically used in a very interactive consultation with a user, who is quite likely an expert in the domain she is modelling, means that the dynamic constructs are essentially unnecessary. The user herself can add to the network interactively should the necessity arise. Chapter 5 Conclusions 5.1 W h a t dynamic Bayesian networks can't do In this section, we consider the limitations and disadvantages of dynamic Bayesian networks as presented i n this thesis. Some of these issues could be addressed as part of future research springing from this thesis. T h e most obvious disadvantage to the process of instantiating parameterized schemata to create Bayesian networks is the problem of finding individuals: the user of the system must be fully aware of the i n d i v i d u a l types, and the criteria for deciding types. Furthermore, the correctness of the model constructed by the process is dependent on the competence of the user who supplies the individuals. In a similar vein, the combination mechanisms presented i n Chapter 2 take only known individuals into account when performing the combination. A facility for hypothesizing some unknown i n d i v i d u a l , perhaps based on the condition that a l l causes due to known individuals have been dismissed, seems to be a valuable addition. A n o t h e r related issue is the type hierarchy itself. A s implemented, the network creating process requires that the user identify the individual's type, and i f the i n d i v i d u a l belongs to more than one type (perhaps because private detectives are people too), the user must identify a l l the types to which an i n d i v i d u a l belongs. O f immediate benefit to the ideas presented i n this thesis would be a study on hierarchical types, w i t h an eye towards efficiency considerations and representational adequacy. T h e fact that the knowledge base of schemata i m p l i c i t l y represents possibly infinitely many, 70 5. Conclusions 71 possibly infinitely large directed networks, the possibility that some of those networks are not acyclic, i.e., not Bayesian networks, is cause for concern. A d d i n g a loop check to the network creation process implementation is a simple matter, but it only identifies directed loops after the schemata are instantiated. Some u t i l i t y or theorem which assures that a knowledge base w i l l not result i n cyclic networks would be a desirable result. Currently, the possibility of loops is treated i n the same way left-recursive logic programs are treated by the logic programming community. 5.2 Other future possibilities One of the fundamental ideas of my approach was to keep the network construction process simple, so that the issue of representation could be more fully addressed. One obvious extension of this thesis would be to design a more sophisticated construction process which would m a i n t a i n the consistency of the probability distribution and the precision of the m a p p i n g between the knowledge base and the networks which can be created from it. T h e existential and universal combination mechanisms presented i n Chapter 2 are only two of many possible ways to combine information. It seems useful to consider how to implement a mechanism i n which some fraction of a set of parent random variables must be true i n order for an effect to be triggered. 5.3 Some final remarks T h i s thesis has presented a dynamic approach to the use of Bayesian networks. T h e approach was motivated by the need to model domains i n which it would be difficult to anticipate some of the details. In particular, for domains which can be modelled by general knowledge of properties and their interactions, this general knowledge can be used in specific instances to create a probabilistic model. C h a n g i n g the instances changes the model. A simple language of parameterized probabilistic knowledge, augmented w i t h structures which d y n a m i c a l l y combine the effects of several causes, was described. A procedure to create networks by instantiating statements i n this language w i t h particular individuals which have been submitted to the system, was outlined as well. A n implementation of this language was described briefly, and several examples were pre- 5. Conclusions 72 sented using this implementation. T h e language seems to be versatile, but requires some careful programming for effective use. Several algorithms were discussed, due to Pearl [1988], Poole and Neufeld [1989], Lauritzen and Spiegelhalter [1988], and Shachter [1986], currently used to evaluate Bayesian networks, and some modifications were suggested which would implement d y n a m i c Bayesian networks in these systems. T h i s thesis focussed on providing a precise framework for representing parameterized probabilistic knowledge, and a firm basis for building networks from this knowledge. Because the network construction process presented here was kept quite simple, the issue of representation could be more fully addressed. T h e result is a language for representing parameterized probabilistic dependencies which precisely defines the m a p p i n g between the knowledge and the Bayesian networks created from it, as well as ensuring a consistent probability d i s t r i b u t i o n which is based on a l l available knowledge. Bibliography [Aho et al, 1974] A . V . A h o , J . E . Hopcroft, and J . D . U l l m a n . The Design and Analysis of v Computer Algorithms. Addison-Wesley, Reading, Mass., 1974. [Aleliunas, 1988] R . Aleliunas. A new normative theory of probabilistic logic. In Proc. Seventh Conf. of the CSCSI, pages 67-74, 1988. [Andreassen et al, 1987] S Andreassen, M . Woldbye, B . Falck, and S . K . Andersen. Munin-—a causal probabilistic network for interpretation of electromyographic findings. In Procedings of the Tenth International Joint Conference on Artificial Intelligence pages 366-372, 1987. [Breese, 1990] Jack Breese. Construction of belief and decision networks. Rockwell Internation Science Center, forthcoming, 1990. [Cheeseman, 1988] P. Cheeseman. In defense of probability. Computational Intelligence, 4(l):58-66, 1988. [Cooper, 1990] G . F . Cooper. T h e computational complexity of probabilistc inference using bayesian belief networks (research note). Artificial Intelligence, 42(2-3):393-405, 1990. [D'Ambrosio, 1989] Bruce D ' A m b r o s i o . Incremental goal-directed probabilistic inference. Dept. of C o m p u t e r Science, Oregon State University (draft), 1989. [de Kleer and W i l l i a m s , 1987] J . de Kleer and B . C . W i l l i a m s . Diagnosing multiple faults. Artificial Intelligence, 32(1):97-130, 1987. [Gardenfors, 1988] Peter Gardenfors. Knowledge in Flux. M I T Press, Cambridge, Mass., 1988. 73 [Geiger, 1990] D a n Geiger. Graphoids: A qualitative framework for probabiolistic i n ference. Technical Report R-142, Cognitive Systems Laboratory, Dept. of Computer Science, U C L A , 1990. [Goldman and Charniak, 1990] Robert P. G o l d m a n and Eugene C h a r n i a k . D y n a m i c construction of belief networks. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 90-97, 1990. [Horsch and Poole, 1990] M i c h a e l C . Horsch and D a v i d Poole. A dynamic approach to probabilistic inference using bayesian networks. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 155-161, 1990. [Horsch, 1990] M i c h a e l C . Horsch. Influence: A n interpreter for dynamic bayesian networks. Dept. of Computer Science, University of B r i t i s h C o l u m b i a . In preparation., 1990. [Jensen et al, 1988] F . V . Jensen, K . G . Olesen, and S . K . Andersen. A n algebra of bayesian belief universes for knowledge based systems. Technical Report R-88-25, Institute of Electronic Systems, A a l b o r g University, 1988. [Jensen et al, 1990] F . V . Jensen, S.L Lauritzen, and K . G Olesen. Bayesian updating i n recursive graphical models by local computations. T o appear i n : Networks, 1990. [Laskey, 1990] K a t h r y n B l a c k m o n d Laskey. A probabilistic reasoning environment. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 4 1 5 422, 1990. [Lauritzen and Spiegelhalter, 1988] S. L . Lauritzen and D . J . Spiegelhalter. L o c a l computation w i t h probabilities on graphical structures and their application to expert systems. J. R. Statist Soc B, 50(2):157-224, 1988. [Lindley, 1965] D . V . Lindley. Introduction to Probability & Statistics from a Bayesian viewpoint. Part 1. Probability. Cambridge University Press, 1965. [Mulder et al, 1978] J . A . M u l d e r , A . K . M a c k w o r t h , and W . S. Havens. Knowledge structuring and constraint satisfaction: T h e mapsee approach. Technical Report 87-21, Dept. of Computer Science, University of B r i t i s h C o l u m b i a , Vancouver., 1978. [Neufeld and Poole, 1987] E . Neufeld and D . Poole. Towards solving the multiple extension problem: C o m b i n i n g defaults w i t h probability. In Procedings of the Third Workshop on Uncertainty in Artificial Intelligence, pages 305-312, Seattle, 1987. 74 [Pearl, 1986] Judea Pearl. Fusion, propagation and structuring i n belief networks. Artificial Intelligence, 29(3):241-288, 1986. [Pearl, 1988] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Reasoning. M o r g a n Kauffman Publishers, L o s A l t o s , 1988. [Poole and Neufeld, 1989] D a v i d Poole and E r i c Neufeld. Sound probabilistic inference i n prolog: A n executible specification of bayesian networks. 1989. [Poole and Provan, 1990] D a v i d Poole and Gregory Provan. W h a t is an o p t i m a l diagnosis. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, pages 46-53, 1990. [Poole, 1989] D a v i d Poole. A methodology for using a default and abductive reasoning system. Technical Report 89-20, Dept. of Computer Science, University of B r i t i s h C o l u m b i a , Vancouver, 1989. [Reiter and M a c k w o r t h , 1989] R . Reiter and A . K . M a c k w o r t h . A logical framework for depiction and image interpretation. Artificial Intelligence, 41(2):125-155, 1989. [Reiter, 1987] R . Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 3 2 ( l ) : 5 7 - 9 5 , 1987. [Schubert, 1988] J . K . Schubert. Cheeseman: A travesty of truth. Computational Intelligence, 4(1):118-121, 1988. [Shachter, 1986] Ross D . Shachter. E v a l u a t i n g influence diagrams. Opns Rsch, 34(6):871882, 1986. [Shachter, 1988] Ross D . Shachter. P r o b a b i l i s t i c inference and influence diagrams. Opns Rsch, 36(4):589-604, 1988. [Shachter, 1989] Ross D . Shachter. Evidence absorption and propagation through evidence reversals. In Proceedings of the 9th Annual Workshop on Uncertainty in Artificial Intelligence, pages 303-308, 1989. 75 A p p e n d i x A Influence c o d e f o r t h e A.l examples Diagnosis of multiple faults for digital circuits The following is a complete listing of the knowledge base for modelling circuits. Section 3.2. binary ok(Gate:gate). binary output(Gate:gate). binary input(Gate:gate, Port:port). b i n a r y e x i s t s ( ( G I , G2, P o r t ) , c o n n e c t i o n s , o u t p u t ( G I ) ) . /* a n d — g a t e s */ ok(and_gate(G):gates), input(and_gate(G):gates,1), i n p u t ( a n d - g a t e ( G ) : g a t e s , 2 ) => o u t p u t ( a n d . g a t e ( G ) : g a t e s ) := [ 1 , 0 , 0 , 0 , 0 . 5 , 0 . 5 , 0 . 5 , 0 . 5 ] . /* o r — g a t e s */ ok(or_gate(G):gates), input(or_gate(G):gates,1), i n p u t ( o r _ g a t e ( G ) : g a t e s , 2 ) => o u t p u t ( o r _ g a t e ( G ) : g a t e s ) := [ 1 , 1 , 1 , 0 , 0 . 5 , 0 . 5 , 0 . 5 , 0 . 5 ] . /* x o r — g a t e s */ ok(xor_gate(G):gates), input(xor_gate(G):gates,1), i n p u t ( x o r _ g a t e ( G ) : g a t e s , 2 ) => o u t p u t ( x o r _ g a t e ( G ) : g a t e s ) := [ 0 , 1 , 1 , 0 , 0 . 5 , 0 . 5 , 0 . 5 , 0 . 5 ] . 76 See A. Influence code for the examples 77 exists((Gl,G2,Port), connections, output(Gl)) => input(G2:gates,Port:ports). := [1,0]. observe observe observe observe observe A.2 type(and-gate(gl).gates). type(or.gate(g2).gates). type(xor_gate(g3).gates). type((and-gate(gl),xor-gate(g3),1) connections), type((or_gate(g2),xor_gate(g3),2) connections). Interpreting sketch maps The following Influence code creates dynamic Bayesian networks for the d o m a i n discussed i n Section 3.3 scene objects can be linear or area objects multi isa-linear(X) Q [road, r i v e r , shore], multi isa-area(X) ® [land, water]. 7.7. Image objects: binary tee(X.Y). '/,'/. chain X meets chain Y in a tee-shape binary chi(X.Y). 7.7. chain X and chain Y make a chi—crossing binary bounds(X,Y). '/,'/, chain X bounds area Y binary interior(X,Y) . '/,'/, Area Y is in the interior of chain X binary exterior(X.Y) . '/.•/, Area Y i s exterior of chain X 7.7. Scene descriptions: binary joins(X,Y). '/,'/,linear scene objects j o i n i n a tee shape binary crosses(X,Y) . '/,'/, linear scene objects cross binary inside(X.Y). '/,'/, area object X i s inside linear object X binary outside(X.Y) . 7.7. area object X is outside linear object X 'I,'!, Trick implementation of equality predicate! '/,'!, Two d i s t i n c t random variables, which have their Vl, arguments as values: multi vall(X) Q [X, X]. multi val2(Y) 8 [Y, Y ] . 7.7. equality is then define i f the values are the same! function equal/2. A. Influence code for the examples vall(X), val2(Y) 78 => equal(X,Y) {p(equal(X,Y),[vall(X)=X,val2(Y)=Y]) = i f (X=Y) then 1.0 e l s e 0.0}. '/,'/, Two l i n e a r scene o b j e c t s can j o i n t o form a t e e isa-linear(X:chain), isa-linear(Y:chain), joins(X:chain,Y:chain), equal(X:chain,Y:chain) => t e e ( X : c h a i n , Y : c h a i n ) := [1, '/.'/. a road can meet i t s e l f i n a t e e , but n o t h i n g e l s e 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, •/.'/, equal o b j e c t s . . . 1,1,0, '/,'/, a road or shore can j o i n a r o a d , a shore cannot 1,1,0, 7,7, a road o r a r i v e r can j o i n a r i v e r , a shore cannot 1,1,0, 7.7. a road o r a r i v e r can j o i n a s h o r e , a shore cannot 0,0,0,0,0,0,0,0,0]. 7.7. j o i n i s f a l s e 7.7. two l i n e a r o b j e c t s can c r o s s t o form a c h i isa-linear(X:chain), isa-linear(Y:chain), crosses(X:chain,Y:chain), equal(X:chain,Y:chain) => c h i ( X : c h a i n , Y : c h a i n ) 7.7. no scene o b j e c t s c r o s s themselves := [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 7.7. (equal i s t r u e ) 1,1,0, 7.7. roads and r i v e r s can c r o s s a r o a d , not shores 1,0,0, 7.7. roads can c r o s s a r i v e r , not r i v e r s o r shores 0,0,0, 7.7. shores cannot c r o s s anything 0,0,0,0,0,0,0,0,0]. 7.7. c r o s s i s f a l s e , so c h i i s not p o s s i b l e 7.7. l i n e a r o b j e c t s can form c l o s e d loops i s a - l i n e a r ( X : c h a i n ) , l o o p ( X : c h a i n ) => c l o s e d ( X : c h a i n ) := [1,0,1,0,0,0]. 7.7. only roads and shores form loops 7.7. l i n e a r scene object are found beside isa-area(X:region), isa-linear(Y:chain), => b o u n d s ( X : r e g i o n , Y : c h a i n ) . area o b j e c t s beside(X:region,Y:chain) := [1,0, 7.7. o n l y l a n d can be beside 1.0, 7.7. o n l y l a n d can be beside roads rivers 1.1, 7.7. l a n d and water can be beside 0,0,0,0,0,0]. 7.7. beside i s f a l s e shores 7.7. on t h e l i n e a r object which forms the boundary between two 7,7. a r e a o b j e c t s , we must c o n s t r a i n the o b j e c t s i n v o l v e d 7.7. e . g . A r o a d i s not a boundary between two bodies of water isa-area(X:region), isa-linear(Y:chain), isa-area(Z:region), inside(X:region,Y:chain), outside(Z:region,Y:chain) => b o u n d a r y - c o n s t r a i n t ( X : r e g i o n , Y : c h a i n , Z : r e g i o n ) . := [1,0, 7.7. only l a n d can be i n s i d e a r o a d boundary 0,0, 7.7. r i v e r boundaries are not allowed A. Influence code for the examples 0,1, '/,'/, shores have water inside, land outside 0,0,0,0, 7.7. when water i s outside, roads, rivers can't bound 1,0, '/,'/, water outside, shore boundary then only land inside 0,0,0,0,0,0,0,0,0,0,0,0, 7.7. inside and outside are false 0,0,0,0,0,0,0,0,0,0,0,0]. A.3 T h e Burglary of the House of Holmes The following Influence code models the problem of Section 3.4. binary binary binary binary binary binary binary sale-obtained(X). '/,'/, has a sale been obtained by X? burglary(X). '/,'/, was there a burglary i n X's house? go-home(X). '/.'/. does (should) X go home IMMEDIATELY? meeting(X.Y). '/,*/, was there a meeting between X and Y? earthquake. '/.'/, was there an earthquake? radio. 7.7. did the radio report an earthquake ? client (X). '/,'/, X has a client? function value(X). '/,'/, what i s the value of X's decision? function losses (X). '/,'/, how much did X lose? function income(X). '/,'/, how much does X stand to gain? 7,7, how much i s a sale worth? multi sale-value(X) 9 [0,150,300]. 7.7. how much of X's stolen goods were recovered? multi goods-recovered(X) ® [0,500,5000]. 7.7. how much were X's stolen goods worth? multi stolen-goods-value(X) (3 [0,500,5000]. 7.7. when did X report thre burglary? multi burglary-report(X) (3 [immediate, l a t e , none]. 7.7. compute the difference between X's income and losses 7.7. c a l l i t value for X losses(X:alarm-owner), income(X:alarm-owner) => value(X:alarm-owner) := {p(value(X)=V,[losses(X)=V1,income(X)=V2]) i f (V i s V2 - VI) then 1.0 else 0.0}. 7.7. X may recover some of his stolen goods depending on 7.7. when X reported the burglary 79 A. Influence code for the examples burglary(X:alarm-owner), burglary-report(X:alarm-owner) => goods-recovered(X:alarm-owner) := [0.01, 1, 0.5, 1 , 0 . 1 , 1 ] . 7.7. If X recovers stolen goods, then there is not loss goods-recovered(X:alarm-owner), stolen-goods-value(X:alarm-owner) => losses(X:alarm-owner) := {p(losses(X)=V,[goods-recovered(X)=V1,stolen-goods-value(X)=V2]) i f (V is V2 - VI) then 1.0 else 0.0}. 7.7. the income for Y depends on getting the sale from a client client(Y:alarm-owner), sale-obtained(Y:alarm-owner), sale-value(Y:alarm-owner) => income(Y:alarm-owner). := {p(income(Y)=V,[client(Y), sale-obtained(Y), sale-value(Y)=V1]) i f (V is VI) then 1.0 else 0.0}. '/,'/, - A meeting may take place between X and Y '/,'/, i f X doesn't go home go-home(X:alarm-owner) => meeting(X:alarm-owner,Y:corporation) := [0, 0.75]. 7.7. X may get a client i f X has a meeting with a corporation exists(X,corporation,meeting(Y:alarm-owner,X)) => client(Y:alarm-owner) := [0.75,0.0]. 7.7. X w i l l report a burglary i f one occurred, and X has gone home */,'/, to verify i t go-home (X: alarm-owner), burglary(X: alarm-owner) => burglary-report(X:alarm-owner) := [1.0, 7,7, X went home immediately and reported a burglary 0.0, 7.7. so the report i s n ' t l a t e . . . 0.0, 7.7. . . . or not reported at a l l 0.0, 7.7. didn't go home, so report can't be immediate 1.0, 7.7. assume X reports a burglary eventually... 0.0, 7.7. (X reported the burglary LATE, not NONE) 7.7. no burglary, no report! 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]. 7.7. i f there was a burglary, then goods of some value have been 7.7. stolen burglary(Y:alarm-owner) => stolen-goods-value(Y:alarm-owner) := [0.05, 7.7. burglary, but s-g-v=0 0.55, 7.7. burglary, but s-g-v=500 0.4, 7.7. burglary, but s-g-v=5000 1.0, 7.7. no burglary, and s-g-v=0 0.0, 7.7. no burglary, and s-g-v=500 0.0] .7.7. no burglary, and s-g-v=5000 80 A. Influence code for the examples '/,'/, X w i l l meet w i t h Y t o t a l k s a l e s . meeting(X:alarm-owner,Y:corporation) 81 A s a l e may occur => s a l e - o b t a i n e d ( X : a l a r m - o w n e r , Y : c o r p o r a t i o n ) := [ 0 . 5 , 0 . 1 ] . 7.7. 7,7. 7,7. 7.7. an a l a r m i s a f f e c t e d by e a r t h q u a k e s and b u r g l a r i e s - t h e a l a r m w i l l almost d e f i n i t e l y sound i f b o t h an earthquake and a b u r g l a r y o c c u r , and almost d e f i n i t e l y w i l l n o t sound i f neither occur b u r g l a r y ( Y : a l a r m - o w n e r ) , e a r t h q u a k e => alarm(Y:alarm-owner) := [ 0 . 9 9 , 0.1, 0.8, 0 . 0 0 1 ] . 7.7. e a r t h q u a k e s t e n d t o be r e p o r t e d o n t h e r a d i o . . . e a r t h q u a k e => r a d i o := [ 0 . 9 8 , 0 . 0 ] . 7.7. 7.7. 7.7. ne Phone c a l l s f r o m n e i g h b o u r s a b o u t a l a r m s . - n e i g h b o u r s u s u a l l y o n l y c a l l when a n a l a r m s o u n d s - non-neighbours don't c a l l a t a l l ! ighbour(X:person,Y:alarm-owner), a l a r m ( Y : a l a r m - o w n e r ) => p h o n e - c a l l ( X : p e r s o n , Y : a l a r m - o w n e r ) := [ 0 . 6 , 0.0, 0.2, 0 . 0 ] . Appendix B Source code to Influence B.l probability.pl /* * * * * * * probability.pl The s e t o f p r o b a b i l i t y a x i o m s f r o m w h i c h we c a n compute many p r o b a b i l i t i e s w i t h o u t much e f f o r t . The r e a l work i s i n t h e l a s t axiom, w h i c h i s B a y e s ' theorem, a n d t h e c a l l t o d y n _ p , where we r e a s o n b y c a s e s . * (The M u l t i p l i c a t i o n Rule f o r p r o b a b i l i t i e s used here * i s r e a l l y s t u p i d . I m p r o v e m e n t s c a n be made h e r e ! ) */ '/.'/, The f i r s t t w o r u l e s a l l o w t h e u s e r t o q u e r y '/,'/, A n d d i s p l a y s t h e a n s w e r n i c e l y , t o b o o t . . . conveniently. p(A) :- p(A,[]). 7,7, F i r s t , c h e c k t o s e e i f t h e c o n d i t i o n i n g p ( A , B ) :- knowledge given_inconsistent(B), write_p(A,B,'undefined'),nl. p(A,B) :- p ( A , B , P ) , write_p(A,B,P),nl. 82 i s consistent... B. Source code to Influence 83 7.7. p (+A, +Given, - P r o b a b i l i t y ) '/,'/, - c a l c u l a t e the p r o b a b i l i t y of l o g i c a l e x p r e s s i o n +A c o n d i t i o n e d VI, by +Given, t o be r e t u r n e d i n - P r o b a b i l i t y 7.7. - i f +A i s a l i s t , then t h i s l i s t i s a conjunct of p r o p o s i t i o n s VI, Check t o see i f i t •/„•/. p ( X , A , P ) 7.7. : - p r ( X , A , P , _ ) . i s a p r i o r t h a t we know i n the 7.7. I f X i s a p r e d e f i n e d node t y p e , then we do i t p(X,A,P) :- built_in_p(X,A,P). database. here. 7.7. Otherwise, we have t o do some c a l c u l a t i o n s . . .through axiom_p p(X,A,P) : - axiom_p(X,A,P). 7.7. I f X i s a q u a n t i f i c a t i o n p(X,A,P) :- quant_p(X,A,P). node... 7.7. I f none of the axioms above can c a l c u l a t e a v a l u e , then 7.7. we must r e s o r t t o r e a s o n i n g by c a s e s , which i s performed 7.7. by the r u l e s c r e a t e d d u r i n g c o m p i l a t i o n , under the 7.7. head of 'dyn_p' . p(X,A,P) : - dyn_p(X,A,P). quant_p(exists(Parameter,Type,Variable),Cond,P) :get_indivs_of_type(Type,Cond,Indivs), i n s t a n t i a t e ( V a r i a b l e , Parameter, I n d i v s , I n s t a n c e s ) , or_p(Instances,Cond,P). quant_p(forall(Parameter.Type.Variable),Cond,P) :- get_indivs_of_type(Type,Cond,Indivs), i n s t a n t i a t e ( V a r i a b l e , Parameter, I n d i v s , and_p(Instances,Cond,P). Instances), 7.7. get_indivs_of_type(Type,Cond,Indivs) get_indivs_of.type(_,[],[]). get_indivs_of_type(Type,[type(Indiv,Type)|Conds],[Indiv|Indivs] •- i • * get_indivs_of_type(Type,Conds,Indivs). get_indivs_of_type(Type,[_|Conds],Indivs) :get_indivs_of_type(Type,Conds,Indivs). ) B. Source code to Influence instantiate (_, _ , [], []). instantiate(Variable, Parameter, Indivs, Instances) :bagof(Variable,member(Parameter,Indivs).Instances). built_in_p(X,L,P) : - and_node(X), i parents(X,PL), and_p(PL,L,P). built_in_p(X,L,P) : - or_node(X), ! » parents(X,PL), or_p(PL,L,P). / * theorem : i f a node X i s a model of 'and', and i t has parents U=-Cul,.. ,un} then p(X|C) = p(ul unlC). Proof: t r i v i a l . :-) */ and_p(L,C,P) :- p(L,C,P). / * theorem: i f a node X is a model of ' o r ' and i t has parents U={ul,..,un} then p(X|C) = p ( u l ; . . . ; u n | C ) = p(ul|C) + p ( u 2 ; . . . ; u n | " u l C)*p(~ul|C) = p(ul|C) + p ( u 2 ; . . . ; u n | ' u l C ) * ( l - p(ul|C)) which i s recursive. Proof: t r i v i a l . */ or_p([],_,0). or_p([A],C,P) : - p(A,C,P). or_p([A|L],C,P) :-) 84 B. Source code to I n f l u e n c e :- p(A,C,Pl), or_p(L,[-A|C],P2), P = PI + P2*(l-Pl). 7.7. axiom_p(+A,+Given,-Prob) 7,7. some axioms for calculating p r o b a b i l i t i e s . 7.7. variables the same as i n p(A,Given,Prob) - - 7.7. Multiplication rule for probabilities 7,7. base case: one conjunct should be computed simply. _ axiom_p([A],B,P) :- !, p(A,B,P). 7.7. Multiplication rule for probabilities 7.7. recursive step: f i n d the probability of the f i r s t 7,7. conjunct, and i f this i s non-zero, f i n d the probability 7.7. of the remainder of the conjuncts conditioned on the f i r s t 7.7. by multiplication. 7,7. - t h i s could be a l o t smarter, I think. 7.7. (by making use of conditional independence, for example) 7,7. (or by using the fact that some variables may have no 7.7. parents) - axiom_p([B|A],C,P) :- !, p(B,C,PEval), eval(PEval,P2), (P2 =:= 0, P = 0; p(A,[B|C],Pl), P = Pl*PEval). 7.7. Negation rule 7.7. - pCAlC) = 1 - p(A|C) axiom_p(~A,B,P) :- !, p(A,B,Pl), P = 1-P1. 7.7. Definiteness rule #1 7.7. - p(A|A) = 1 axiom_p(A,B,p(A,B,l)) :member(A,B),!. 7,7, Definiteness rule #2 7.7. - p(A|-A) = 0 85 B. Source code to Influence axiom_p(A,B,p(A,B,0)) member(~A,B),! . 86 :- 7.7. D e f i n i t e n e s s r u l e #2 f o r m u l t i v a l u e d axiom_p(A=X,B,p(A,B,0)) member(A=Y,B),X \== variables :Y,!. '/,'/, I n c o m p l e t e n e s s r u l e f o r m u l t i v a l u e d v a r i a b l e s '/,'/, - i f X i s n o t i n t h e l i s t o f known v a l u e s f o r '/,'/, A, t h e n sum a l l t h e p r o b a b i l i t i e s f o r a l l t h e v a l u e s 7.7. w h i c h A c a n t a k e , a n d s u b t r a c t t h i s f r o m 1. 7.7. - a s s u m e s sum_p(A,L,B,P) <= 1 7.7. o n l y u s e f u l i f we a l l o w i n c o m p l e t e s p e c i f i c a t i o n o f 7.7. value l i s t s . - axiom_p(A=X,B,Pl) :multivalued_node(A,L), \+ m e m b e r ( X , L ) , sum_p(A,L,B,P), P I = 1 -P. axiom_p(A=X,B,Pl) :multivalued_node(A,_), remove("A=Y,B,BB), p(~A=Y,BB,P2) p(A=X,BB,P3), P i = P3 / P 2 . ) 7.7. B a y e s ' R u l e 7.7. i f there i s evidence i n the given l i s t which i n f l u e n c e s 7.7. t h e p r o p o s i t i o n H d i r e c t l y , t h e n do b a y e s ' r u l e . . . 7.7. - Why? B e c a u s e we h a v e k n o w l e d g e a b o u t p ( E | H ) w h i c h 7,7. i s found i n the p r i o r s . - a x i o m _ p ( H , L , P ) :remove(E,L,K), influences(H,E), i • » p(E,[H|K],PEval), eval(PEval,Pl), ( P I =:= 0, P = 0 ;p(H,K,P2), p(E,K,P3), B. Source code to Influence (P3 = 0, ! , format('Error: Division by zero: ~w."n',[p(E,K,P3)] ) , f a i l ;P = PEval * P2 / P3)). 7.7. Auxiliary predicate sum_p used in Indefiniteness rule for 7.7. multivalued variables. 7.*7.sum_p(_, [] ,_,0). sum_p(A,[V],B,P) p(A=V,B,P). :- sum_p(A,[VlValues],B,P) : sum_p(A,Values,B,PI), p(A=V,B,P2), P = PI + P2. B.2 expand.pl /* * expand * * * */ — expand for multi-valued and binary variables. no independence or any other clever t r i c k . expands propositional variables in the base manner expands multivalued variables as i f X=xl were a proposition y////////////////////:/:///;///////////////;///.v.y;/x/////////.y// 7./x .// '/.'/. expand( +A, +T, +L, +CV, -Body, -P) 7, 7//.7.7;/.r/.7.7.7.7//////////.7.7.7.7.7.7////////.7X/J.7.7////.7.7.7.7.7.7.7. 7.7. expansion of binary proposition A on parents in T VI, Variables: 7.7. A i s the variable around which we expand 7.7. +T i s a truth table entry for A; i n i t i a l l y empty l i s t ! 7.7. L i s a l i s t of immediate parents of A, from which T i s 7,7. constructed recursively 7.7. +CV i s a l i s t of conditioning variables, not necessarily VI, parents of A 7.7. -Body i s the body of the rule which computes VI, subcalculations for p(A|CV) 7,7. P i s the algebraic expression which ' i s ' must use 7,7. to compute p(A|CV) from the subcalcs done in -Body + + _ 7.7. 87 B. Source code to Influence '/.'/. p(A|T CV) = p(A|Pred T CV)p(Pred|T CV) + p(A|"Pred T CV)p("Pred|T CV) 7.7. Method 1: (base case) 7.7. - the parents of A which were i n L have been transferred to '/,•/, the truth table l i s t T, leaving L = [ ] . '/,'/, - i n t h i s case, we need the prior for this configuration T 7,7, - ask Prolog f o r the prior, which i n turn may ask the user, 7,7. i f there i s no such prior i n the database. 7.7. returns the prior, as well as the Body 'true', meaning 7.7. that there i s no associated calculation necessary to 7,7. compute the prior ( i t ' s a given quantity) - expand(A,T,[],_,pr(A,T,P,_),P) . 7.7.expand(A,T,[] ,_,true,pr(A,T,P)) :7.7.pr(A,T,P,_). 7.7. Method 2: (multiple valued variables) 7.7. - i f the f i r s t variable i n the truth table i s a multivalued variable, 7.7. handle i t separately. expand(A,T,[ZlPred],CV,Body ,P) :multivalued_node(Z,Values), i expand.mult i(Z,Value s,A,T,Pred,CV,Body,P). 7.7. Method 3: 7.7. - to expand a binary variable, construct the truth table 7.7. f o r i t by recursively c a l l i n g expand, once with the 7.7. f i r s t remaining parent Z, and once with Z negated. 7,7. - return as the Body the sub-calculation of P(Z|CV), and 7.7. the reasoning-by-cases formula which uses t h i s value expand(A,T,[ZlPred],CV,Body ,(E1*PZ+E2*(1-PZ)) ) :expand(A,[Z|T],Pred,[Z|CV],Cl,El), expand(A,["Z|T],Pred,["Z|CV],C2,E2), simp_body((Cl,C2,p(Z,CV,PZ)).Body) . 7.7. expand_multi(+Multi, +Values, +A, +T, +L, +CV, -Body, -P) :7.7. similar to expand but f o r multivalued variables 7.7. +Multi i s the multiply valued variable i n question 7.7. +Values i s the l i s t of values for Multi 7.7. A i s the variable we are expanding around (as i n expand) 7.7. +T, +L, +CV, -Body, -P are a l l exactly as i n expand 7,7. Basic description: f o r each value i n +Values we f i n d 7,7. the expansion of Multi=value; kinda breadth-wise 7.7. (we have to use each value here, whereas i n expand + 88 B. Source code to Influence 7.7. 7.7. 7.7. 89 i t s u f f i c e d t o u s e p(Z|CV) and c a l c u l a t e p("Z|CV) a s (1 - p ( Z | C V ) ) . When Z i s m u l t i v a l u e d , t h i s d o e s n ' t work. 7,7. M e t h o d 1: (base case) 7.7. i f we h a v e o n l y one v a l u e l e f t i n t h e l i s t o f v a l u e s , t h e n 7,7. c a l l expand t o c o n t i n u e the expansion t o f u l l depth 7.7. - r e t u r n a s Body t h e s u b c a l c u l a t i o n c a l l t o p ( M u l t i = V a l u e |CV), 7.7. and as the n u m e r i c a l e x p r e s s i o n the product o f t h i s and 7,7. t h e e x p r e s s i o n r e t u r n e d by expand. - e x p a n d . m u l t i ( M u l t i , [ V a l u e ] , A, T, P r e d , CV, B o d y , (PM * E ) ) :expand(A,[Multi=ValueIT].Pred,[Multi=Value|CV],B,E), simp_body((p(Multi=Value,CV,PM),B).Body). 7.7. M e t h o d 2: 7.7. - r e c u r s i v e l l y e x p a n d A u s i n g e a c h v a l u i e o f M u l t i i n t h e 7.7. list 7.7. - t h e b o d y r e t u r n e d i s t h e c o n j u n c t i o n o f t h e s u b c a l c s 7.7. and the e x p r e s s i o n i s a simple a d d i t i o n . Values e x p a n d _ m u l t i ( M u l t i , [ V a l u e I V a l u e s ] , A, T, P r e d , CV, B o d y , ( E l + E2)) expand.mult i ( M u l t i , V a l u e s,A,T,Pred,CV,Bl,El), expand.multi(Multi,[Value],A,T,Pred,CV,B2,E2), simp_body((Bl,B2), Body). 7.7. 7.7. 7.7. 7.7. 7.7. 7.7. - Rulify(+H,+B,-Rule) c r e a t e s a Rule t o c a l c u l a t e p(H| Anything) by u s i n g r e a s o n i n g by cases. +H i s t h e v a r i a b l e w h i c h i s t o b e r u l i f i e d + B i s the l i s t o f immediate parents of H s h o u l d be a s s e r t e d . -Rule i s a p r o l o g r u l e which 7.7. M e t h o d 1: m u l t i p l e v a r i a b l e s . 7.7. - c r e a t e s a r u l e w h i c h h a n d l e s o n l y one v a l u e 7.7. o f the v a r i a b l e . 7.7. - u s i n g b a c k t r a c k i n g , a l l t h e r u l e s w i l l be g e n e r a t e d 7.7. ( s o be s u r e t o ' f a i l ' a f t e r a c a l l t o r u l i f y t o g e t 7.7. a l lthe appropriate r u l e s ) . 7.7. I s s u e : we t r e a t e a c h ' H = v a l u e ' t h e same a s a b i n a r y v a r i a b l e r u l i f y ( H , B , ( d y n _ p ( H = V , C , P ) :- B o d y ) ) :multivalued_node(H,.Values),!, 7,*7. remove(V,Values,_), expand(H=V,[],B,C,R,E), simp_body((R, P = E ) , Body). :- B. Source code to Influence 90 '/,'/, Method 2: Binary values '/,'/, - creates a rule for binary variables '/,'/, - doesn't succeed on backtracking •/.*y.rulify(H,B,((dyn_p(H,C,Combine) : - Body))) : y.*'/.binary_node(H) , '/.•'/.bagof (type(Param.Type) .remove(type(Param,Type) ,B,_) . T L i s t ) , •/.••/.remove.KTList.B.NewB), '/.*y.expand(H, [] .NewB.C.R.E) , •/,*y.simp_body((R, P = E ) , Prebody), '/.•"/.Body = (bagof (combine_p(Prebody,P,M) .mapmember(TList,C,M) .Combine) , y,*y,combine_p(H,C,Combine)), */.*•/.!. rulify(H,B,((dyn_p(H,C,P) : - Body))) binary_node(H), expand(H,[],B,C,R,E), simp_body((R, P = E ) , Body). :- •/.•/, simp_body(+01d, -New) '/,'/, - does some simplemided simplification '/,'/, on the rules generated by expand, or r u l i f y . •/.'/, - easy simp_body((true,P),PS) :- i simp_body(P,PS). simp_body((P,true),PS) :- i simp_body(P,PS). simp_body((A;B),(AS;BS)) • » simp_body(A,AS), simp.body(B,BS). :- simp_body((A,P),(AS,PS)) :- i simp_body(A,AS), simp_body(P,PS). simp_body(P,P). B. Source B.3 code to Influence network.pl •/.•/, add_to_net(Parents,Child) add_to_net(Parents.Child) :- l i s t i f y ( P a r e n t s , L ) , add_parents(Child,L), add_children(L,Child). '/.•/.(member(exists(X,T,V) ,L) '/,'/-> add_parents(exists(X,T,V), [V]) %%; true), 7.7.(member(forall(X,T,V) ,L) 7.7.-> add.parents(forall(X,T,V), [V]) 7.7.; true) . add_parents(A,P) :- (retract(st_node(A,S,Pl,C)) -> append(P,PI,Pall), assert(st_node(A,S,Pall,C)) ; assert(st_node(A,[bin],P,[]))), assert_check(chgd(A)). add_children([],_). add_children([P|Pl].Child) :- add.child(P,Child), add_children(Pl.Child),!. add_child(A,C) :- (retract(st_node(A,S,P,Cl)) -> assert(st_node(A,S,P,[C|C1])) ; assert(st_node(A,[bin],[],[C]))). 7.7.assert_check(chgd(A)). remove_from_net(Parent.Child) :- remove_child(Parent.Child), remove_parent(Parent.Child). remove_child(P,C) :- children(P,L), remove(C,L,NewL) -> retract(st.node(P,S,GP, L)), assert(st.node(P,S,GP.NewL)) ; formatC'Can't delete "w from child l i s t of "w!"n", [C.P]). remove_parent(P,C) :- parents(C,L), 91 B. Source code to Influence 92 remove(P,L,NewL) -> retract(st_node(C,S,L,GC)), assert(st_node(C,S,NewL,GC)), assert_check(chgd(C)) ; format("Can't delete "w f r o m p a r e n t l i s t children(A,L) :- st_node(A,_,_,L). parents(A,L) :- st_node(A,_,L,_). node(A) :- st_node(A,_,_,_). B.4 priors.pl all_priors(_,[],L) :- asserta_list(L). all_priors(Var,[new_prior(A,B,P)|L],L2) write_p(A,B,P), n l , write('Change t h i s value? ' ) , read(Ans), (prior(A,B,P).retract(prior(A,B,P)) ;new_prior(A,B,P),retract(new_prior(A,B,P))), (Ans == y , assert_check(chgd(Var)), ask_prior(A,B,Pl), all_priors(Var,L,[new_prior(A,B,Pl)|L2]) ;all.priors(Var,L,[new.prior(A,B,P)|L2] ask_prior(Prop,L,P) :- n e w _ p r ( P r o p , L , P , L 2 ) , ! , retract(new_prior(Prop,L2,P)). ask_prior(Prop,L,P) :- \+ \+ w r i t e _ p r o b a b i l i t y ( P r o p , L ) , read(P). get_priors(Prop,_,_) )). o f ~w!~n", [P,C]). B. Source code to Influence :- h i d d e n i . n o d e ( P r o p ) , ; get_priors(Prop,[],L) :- m u l t i v a l u e d _ n o d e ( P r o p , V a l u e s ) , i each(( remove(V,Values,_), ask_prior(Prop=V,L,P), */,'/, u n n u m b e r v a r s ( p r i o r ( P r o p = V , L , P ) , N P ) , asserta(prior(Prop=V,L,P)))). g e t _ p r i o r s ( P r o p , [ ] , L ) :ask_prior(Prop,L,P), 7.7. u n n u m b e r v a r s ( p r i o r ( P r o p , L,P) , N P ) , asserta(prior(Prop,L,P)). get_priors(Prop,[A I List],List2) :- m u l t i v a l u e d _ n o d e ( A , V a l u e s ) , * 9 each. ( ( r e m o v e ( V , V a l u e s , . ) , get_priors(Prop,List,[A=V|List2] ))). 7 . * 7 . g e t _ p r i o r s ( P r o p , [A | L i s t ] ,List2) 7.*7.:- b l o c k s ( A , B 2 , P r o p ) , 7.*7.remove(B2.List . N e w L i s t ) , 7.*7.get_priors ( P r o p , L i s t , [ A | L i s t 2 ] ) , 7.*7.get_priors ( P r o p , N e w L i s t , ["A | List2] ) . 7 . * 7 . g e t _ p r i o r s ( P r o p , [ t y p e ( _ , _ ) | L i s t ] ,List2) 7.*7.:~ ! , g e t . p r i o r s ( P r o p , L i s t ,List2). get_priors(Prop,[AlList],List2) :- g e t _ p r i o r s ( P r o p , L i s t , [ A | L i s t 2 ] ) , get_priors(Prop,List,[~A|List2]). change_priors(A) :- m u l t i v a l u e d _ n o d e ( A , _ ) , (setof(new_prior(A=V,B,P),prior(A=V,B,P),Bag) ;setof(new_prior(A=V,B,Pl),new_prior(A=V,B,Pl),Bag)), all_priors(A,Bag,[]). change_priors(A) :- ( s e t o f ( n e w _ p r i o r ( A , B , P ) , p r i o r ( A , B , P ) , B a g ) ;setof(new_prior(A,B,Pl),new_prior(A,B,Pl),Bag)), 93 B. 94 Source code to I n f l u e n c e all_priors(A,Bag,[]). assign_priors(A=V,L,P) :- m u l t i v a l u e d _ n o d e ( A , V a l u e s ) , member(V,Value s ) , ( p r i o r ( A = V , _ , _ ) , f o r m a t ( ' P r i o r s e x i s t f o r ~w. ;assert(new_prior(A=V,L,P))), j . assign_priors(A,L,P) :- b i n a r y _ n o d e ( A ) , ( p r i o r ( A , _ , _ ) . f o r m a t ( ' P r i o r s e x i s t f o r "w. ;assert(new_prior(A,L,P))), ; l i s t _ p r i o r s :prior(A,L,P), write_p(A,L,P), fail. l i s t _ p r i o r s :new_prior(A,L,P), write_p(A,L,P), fail. list_priors. pr(A L,pr(A L,P),L) ) ) :- prior(A,L2,P), match(L,L2),!. new.pr(A,L,P,L2) :- new_prior(A,L2,P), match(L,L2),!. B.5 combine.pl p(A,C,P) :- g r o u n d ( A , C , G r o u n d e d A ) , combination_model(A,Model), combine_p(Model,GroundedA,C,P). Use Use "change "change ~w".~n',[A,A]) ~w"."n',[A,A]) B. Source code to Influence ground ( [ ] , [ ] ) . ground([A|L],L1) :- g r o u n d ( A , N e w A ) , ground(L.NewL), append(NewA,NewL,Ll). ground(A,[A]) :- atom(A). ground(A,[A]) :- f u n c t o r ( A , N a m e , A r g s ) , all_grounded(A,Args). ground(A,GA) :- f u n c t o r ( A , N a m e , A r g s ) , first_not_grounded(A,Args,N), collect_individuals(A,N,New_A), ground(New_A,GA). collect_individuals(Prop,Arg,List) all_grounded(A,N) :- f i r s t _ n o t _ g r o u n d e d ( A , N , 0 ) . first_not.grounded(A,0,0). f irst_not_grounded(A,N,N) :- a r g ( N , A , X ) , var(X). f irst_not_grounded(A,N,M) :- a r g ( N , A , X ) , \+ v a r ( X ) , NI i s N - l , first_not_grounded(A,Nl,M). B.6 '/,'/, From combined.pl preprocess.pl dir_infl(A,B) 95 B. Source code to Influence :- c o n s i s t e n t ( A , B ) , list(A,P), collect_free(P,NewP,Params), combine(NewP,B,Params). combine(P,B,[]) :- ! , a d d _ t o _ n e t ( P , B ) . combine(P,B,Params) :- f u n c t o r ( B , N a m e , A r g s ) , gensym(Name.NewName), length(Params,L), NewArgs i s A r g s + L , functor(NewB,NewName,NewArgs), declare_combined(Name/Args), declare_bin(NewName/NewArgs), declare_hidden(NewName/NewArgs), unify_args(B,Args,NewB,NewArgs,Params), add_to_net(P,NewB), add_to_net([NewB|Params],B). unify_args(_,0,_,_,[]). unify_args(Old,0,New,NewArgs,[type(X,_)I OtherArgs]) :- a r g ( N e w A r g s , N e w , X ) , N e x t A r g i s NewArgs - 1, unify_args(Old,0,New,NextArg,OtherArgs). unify_args(Old,Args,New,NewArgs,OtherArgs) :- a r g ( A r g s , O l d , X ) , arg(Args,New,X), N e x t A r g i s A r g s - 1, unify_args(Old,NextArg,New,NewArgs,OtherArgs). c o l l e c t . f ree ( [ ] , [ ] , [ ] ) :- ! . collect_free(P,New,[type(X.Type)|Params]) :remove(type(X,Type),P,NewP), i • > collect_free(NewP,New,Params). collect_free(P,P,[]). '/•'/• ******************* '/,'/, From p r o b a b i l i t y . p l •/,*/, ******************* 96 B. 97 Source code to I n f l u e n c e '/,'/, I f t h e n o d e i s p a r a m e t e r i z e d a n d h a s ' f r e e ' p a r a m e t e r s , '/,'/, we h a v e t o d e a l w i h t i t s p e c i a l l y . . . p(A,B,C) :- c o m b i n e d _ p ( A , B , C ) . combined_p(X,L,P) :- c o m b i n e d _ n o d e ( X ) , i • > parents(X,[Hidden I Types]), collect_types(Hidden,Types,L,Combine), or_p(Combine,L,P). collect_all_types(L,[],_,L). collect_all_types([],_,_,[]). collect_all_types([HiddenI Others].Types,Given,Out) :- c o l l e c t _ t y p e s ( H i d d e n , T y p e s , G i v e n , O u t l ) , collect_all_types(Others.Types.Given,Out2), append(Outl,0ut2,0ut). collect_types(Hidden,_,_,[Hidden]) :- i n s t a n t i a t e d ( H i d d e n ) . collect_types(Hidden,Types.Given,Out) :- p a r e n t s ( H i d d e n , P ) , mapmember(P,Given,Outl), remove(Outl.Given,NGiven), collect_types(Hidden,Types,NGiven,Out). collect_types(Hidden,[type(Z.Type)iTypes].Given,Out) :bagofi(type(Z,Type),Hid),(member(type(Z,Type).Given), Hidden = Hid),Comb), extract_hid(Comb,Extract), collect_all_types(Extract.Types.Given,Out). instantiated(F) :- f u n c t o r ( F , _ , A r i t y ) , instantiated(F,Arity). instantiated(_,0). instantiated(F,Arity) :- a r g ( A r i t y . F . X ) . \+ v a r ( X ) , AA i s A r i t y - 1, instantiated(F,AA). then Source code to Influence B. B.7 98 consistent.pl TI. c o n s i s t e n t ( + A , + B ) '/,'/, - c o m p l a i n i f t h e u s e r t r i e s a n y k i n d o f s t u p i d i t y '/.'/. +A, w h i c h may be a l i s t , a r e i n t e n d e d t o be d i r e c t p a r e n t s o f +B '/,'/, - we d o n ' t want d i r e c t e d c y c l e s i n o u r n e t w o r k ! - consistent((A,L),B) consistent(A,B), consistent(L,B). consistent(A,B) :- \+ \+ ( n u m b e r v a r s ( ( A , B ) , 0 , _ ) , A = B ) , !, f o r m a t ( ' E r r o r : "w c a n n o t i n f l u e n c e i t s e l f . ~ n ' , [ A ] ) , consistent(A,B) :- p a r e n t s ( B , L ) , member(A,L), i • > format('Error: fail. A l r e a d y know ~w => ~ w . ~ n ' , [ A , B ] ) , consistent(A,B) :- c h i l d r e n ( B , L ) , member(A,L), i • i format('Error:Already fail. know ~w => ~ w . ~ n ' , [ B , A ] ) , consistent(A,B) :- infKB.A), ! » format('Error: fail. ~w i n f l u e n c e s consistent(_,_). g i v e n _ i n c o n s i s t e n t ( [ _ ] ) :- f a i l . g i v e n _ i n c o n s i s t e n t ( [ " A | L ] ) :member(A,L). g i v e n _ i n c o n s i s t e n t ( [ A | L ] ) :member("A,L). g i v e n _ i n c o n s i s t e n t ( [ _ | L ] ) :given_inconsistent(L). "w already.~n',[B,A]), fail. B. Source code to Influence B.8 99 done.pl process_and_compile :- e a c h ( ( c h g d ( A ) , (multivalued_node(A,_) -> r e t r a c t ( p r i o r ( A = _ , _ , _ ) ) , r e t r a c t ( ( d y n _ p ( A = _ , _ , _) retract(prior(A,_,_)), r e t r a c t ( ( d y n _ p ( A , _ , _ ) :each( retract(st_infl(A,B))), each(( st_node(A,_,_,_), st_node(B,_,_,_), infl(A,B), _)) ))) ) ) , assert(st.infl(A,B))) ) , B.9 each(( chgd(A), parents(A,L), get_priors(A,L,[] ) ) ) , •/.•/,build_prior_str(A,PS), '/.'/.assert(PS) ) ) , each(( chgd(A), '/,*/. ( b i n a r y _ n o d e ( A ) ; m u l t i v a l u e d _ n o d e ( A , _ ) ) , parents(A,L), reversed.,RL), rulify(A,RL,R), assert(R) ) ) , each( retract(chgd(_))), each( retract(new_prior(_,_,_))). infl.pl •/.•/. d i r _ i n f l ( A , ( B , C ) ) ll :- !, ll d i r _ i n f l ( A , B ) , •/.*/. d i r _ i n f l ( A , C ) . B. Source code to Influence '/.*/. d i r _ i n f l ( A , B ) '/,'/, :- c o n s i s t e n t (A, B ) , •/,•/, ! , a d d _ t o _ n e t ( A , B ) . dir_infl(A,B) :- c o n s i s t e n t ( A , B ) , !,add_to_net(A,B). delete_influences([],_) :- ! . delete_influences(_,[]) :- ! . d e l e t e _ i n f l u e n c e s ( [ A IL] ,B) :- d e l e t e _ i n f l ( A , B ) , delete_influences(L.B). delete_influences(A,[BIL]) :- d e l e t e _ i n f l ( A , B ) , delete_influences(A.L). delete.inf1(A,B) :- n o d e ( A ) , node(B), remove_from_net(A,B), assert_check(chgd(B)). i n f l ( A , " B ) :infl(A.B). i n f l ( A = _ , B ) :infl(A.B). infl(A,B=_) :- infl(A,B). i n f l ( A . B ) :children(A,L), member(B,L). i n f l ( A , B ) :children(A,L), member(C,L), infl(C,B). 100 B. Source code to Inf luenc i n f l u e n c e s ( A , ~ B ) :influences(A.B). i n f l u e n c e s ( A = _ , B ) :influences(A.B). i n f l u e n c e s ( A , B = _ ) :influences(A,B). '///.influences (A ,B) :y.'/.childrenCA.L), '/,7,member(B,L). '///.influences(A,B) :'///.children(A,L), '///.member(C,L), '///.influences(C,B). influences(A.B) st_infl(A,B). »
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Dynamic Bayesian networks
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Dynamic Bayesian networks Horsch, Michael C. 1990
pdf
Page Metadata
Item Metadata
Title | Dynamic Bayesian networks |
Creator |
Horsch, Michael C. |
Publisher | University of British Columbia |
Date Issued | 1990 |
Description | Given the complexity of the domains for which we would like to use computers as reasoning engines, an automated reasoning process will often be required to perform under some state of uncertainty. Probability provides a normative theory with which uncertainty can be modelled. Without assumptions of independence from the domain, naive computations of probability are intractible. If probability theory is to be used effectively in AI applications, the independence assumptions from the domain should be represented explicitly, and used to greatest possible advantage. One such representation is a class of mathematical structures called Bayesian networks. This thesis presents a framework for dynamically constructing and evaluating Bayesian networks. In particular, this thesis investigates the issue of representing probabilistic knowledge which has been abstracted from particular individuals to which this knowledge may apply, resulting in a simple representation language. This language makes the independence assumptions for a domain explicit. A simple procedure is provided for building networks from knowledge expressed in this language. The mapping between the knowledge base and network created is precisely defined, so that the network always represents a consistent probability distribution. Finally, this thesis investigates the issue of modifying the network after some evaluation has taken place, and several techniques for correcting the state of the resulting model are derived. |
Subject |
Probabilities Artificial intelligence Reasoning |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-10-04 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051146 |
URI | http://hdl.handle.net/2429/28909 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-UBC_1990_A6_7 H67.pdf [ 5.09MB ]
- Metadata
- JSON: 831-1.0051146.json
- JSON-LD: 831-1.0051146-ld.json
- RDF/XML (Pretty): 831-1.0051146-rdf.xml
- RDF/JSON: 831-1.0051146-rdf.json
- Turtle: 831-1.0051146-turtle.txt
- N-Triples: 831-1.0051146-rdf-ntriples.txt
- Original Record: 831-1.0051146-source.json
- Full Text
- 831-1.0051146-fulltext.txt
- Citation
- 831-1.0051146.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051146/manifest