Shielding Against Conditioning Side Effects in Graphical Models by Mark Anthony Crowley B.A., York University, 1999 A THESIS SUBMITTED IN PARTIAL F U L F I L M E N T OF T H E REQUIREMENTS F O R T H E D E G R E E OF Master of Science in The Faculty of Graduate Studies (Computer Science) The University of British Columbia October 2005 © Mark Anthony Crowley 2005 ii Abstract When modelling uncertain beliefs with graphical models we are often presented with "natural" distributions that are hard to specify. An example is a distribution of which instructor is teaching a course when we know that someone must teach it. Such distributions over a set of nodes can be easily described if we condition on a child of these nodes as part of the specification. This conditioning is not an observation of a variable in the real world but by fixing the value of the node, existing inference algorithms perform the calculations needed to achieve the desired distribution automatically. Unfortunately, although it achieves this goal it has side effects that we claim are undesirable. These side effects create dependencies between other variables in the model. This can lead to different beliefs throughout the model, including the constrained variables, than would otherwise be expected if the constraint is meant to be local in its effect. We describe the use of conditioning for these types of distributions and illuminate the problem of side effects, which have received little attention in the literature. We then present a method that still allows specification of these distributions easily using conditioning but counterbalancing side effects by adding other nodes to the network. iii Table of Contents Abstract Table of Contents List of Tables List of Figures Acknowledgements 11 iii v • • vi viii 1 Introduction 1 2 Background 2.1 Variables, Nodes and Sets 2.2 Bayesian Networks 2.2.1 Inference in Bayesian Networks 2.2.2 Variable Elimination 2.2.3 Junction Tree Algorithm 2.3 Related Work 4 4 4 5 6 8 8 3 Using Conditioning to Induce Distributions 3.1 Method 1 : Directed Clique 3.2 Method 2 : Chain of Sufficient Statistics 3.3 Method 3 : Conditioning 10 11 13 14 4 Side Effects of Conditioning 4.1 Modelling Ancestors with a Directed Clique 4.2 Modelling Ancestors with Conditioning 4.2.1 Descendants 19 20 21 27 5 Shielding Against Side Effects . . . 5.1 Conditioned, Effect and Shield Sets 5.1.1 Affected Network 5.2 Defining Shielding 5.2.1 Anti-nodes 5.2.2 Anti-factors 28 28 29 30 30 30 ; Table of Contents iv 6 Cloned Anti-networks 6.1 Definition 6.2 Junction Tree Complexity 6.3 A General Solution 6.3.1 Base Example 6.4 General Formulation 39 39 40 42 42 45 7 Finding a Solution 7.1 Constrained Optimization Background 7.1.1 The K K T Conditions 7.1.2 Sequential Quadratic Programming 7.2 Reformulation of Problem 7.3 Example Networks 49 49 49 50 50 51 8 Conclusion 8.1 Open Problems Bibliography ' 62 62 64 V List of Tables 3.1 5.1 Marginal probabilities for teaching variables prior and posterior to the application of the constraint 12 CPDs can lead to a factor containing zeros 34 vi List of Figures 2.1 2.2 2.3 Structures that allow effects to pass through . . Structures that do not allow effects to pass through A n example Bayesian network with node B conditioned to true. . 5 6 7 A Bayesian network with a directed clique to model example 3. . Bayes net for example 3 modelling the joint distribution of nodes T , 1 and T . 3.3 Conditional probability tables for directed clique 3.4 Beliefs after inference for directed clique shown in Netica. Probabilities expressed as percentages 3.5 A Bayesian network using a chain of sufficient statistics to model example 3 3.6 Conditional probability tables for chain method 3.7 Beliefs after inference for chain shown in Netica 3.8 A Bayesian network using a conditioned node C to model example 3 3.9 Conditional probability tables for Bayes net with conditioning. . 3.10 Beliefs after inference using a conditioned node to model example 3 11 3.1 3.2 A 4.1 4.2 h c 4.7 Bayesian network modelling example 1 with a directed clique. . . Conditional probability tables using the clique method to model example 4 Resulting beliefs after inference is performed with a clique. . . . Model using a conditioned node C for example 1 C P D s for conditioned model are simply the given prior distributions and constraint Resulting beliefs after inference is performed. Side effects of conditioning lead to a different distribution Cindy's disinterest in A l has different effects 5.1 5.2 5.3 Shield and effect sets of nodes are: Sc = {J, K), E c = {L, M) . A first try at shielding with anti-nodes A factor produced for a c-node has a corresponding anti-factor 4.3 4.4 4.5 4.6 on its s-nodes 5.4 C P D for the antifactor C for example 1 '. 12 13 13 14 15 15 16 17 17 20 22 23 23 24 25 26 29 31 32 35 List of Figures 5.5 5.6 Posterior distributions after inference with conditioning and a shielding anti-factor node C for example 1 36 Antifactors can cause significant increase in the size of cliques in the junction tree if SQ is large 38 6.1 6.2 Effect sets in a Bayesian network with an anti-network Adding an anti-network and its effects on triangulation of the network 6.3 A more complex network showing how connectivity between enodes and s-nodes can lead to clique almost twice as large as those present without the anti-network 6.4 Bayesian network with cloned anti-network 6.5 A general Bayesian network structure with anti-network 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 vii Computed CPDs for anti-network found using nonlinear constrained optimization Computed beliefs after inference for a shielded network using a cloned anti-network for example 4 Distribution after conditioning TB = true using an anti-factor. . Distribution after conditioning TB = true using an anti-network. Distribution after observing Ic = false using an anti-factor. . . . Distribution after observing Ic = false using an anti-network. . Observing D = true reduces our belief that Alice is interested in A l , this influences the distribution of the Tx nodes but Bob and Cindy's interests remain shielded. E also maitains its original distribution. The antinetwork does not need to be recomputed to add D Observing E = true reduces our belief in the desire to teach for Bob and Cindy, this influences the distribution of the Tx nodes but Alice's interest is unaffected. D also maintains its original distribution. The antinetwork does not need to be recomputed to add E Bayesian network containing two c-nodes, each shielded seperately. Notice that the beliefs for Tx, Wx and Ix nodes are the same as before 40 41 43 44 46 52 54 55 56 57 58 59 60 61 viii Acknowledgements Gratitude is something I'm full of right now, many people have contributed to this thesis though they may not all know it. To my parents, you have made me who I am. My love of knowledge, desire to do things the right way or not at all and dependancy on chocolate are all because of you and I couldn't have a better, more loving pair of parents. You always support me, even if my choices don't always seem to make too much sense at first. Thank you. To my wife Lily, you know I wouldn't have been able to do this without you, and believe me I know it too. You motivate me and you calm just me just when I need it, I know I'm the luckiest man in the world to have you. To the many people around LCI that have helped me with my ideas over the past year especially Kevin Murphy, Kevin Leytoft-Brown, Mike Klaas, Michael Chiang, Rita Sharma and Jacek Kisynski. You may not realize how much your comments helped, but I often think to questions and observations each of you has said, and they've all been helpful. To Brent Boerlage, your question got this all started so I defmetly wouldn't be here without that, and your help writing the UAI paper was invaluable in the development of the ideas behind this paper. To my supervisor David Poole, your insightful ideas and piercing questions always stir new courses of inquiry and exciting solutions. I have been honoured to be able to work you and look forward to continuing to work on all those great ideas locked up in your mind. And to the complexity and beauty that fills the universe, that is the source of all interesting ideas, including life. I know in some way I have been helped along so far on this swerving road of life by your truth, so, thank you too. To Lily, For Lily. 1 Chapter 1 Introduction The essence of knowledge is, having it, to apply it; not having it, to confess your ignorance. ~ Confucius In directed graphical models conditioning of nodes to particular values usually corresponds to an observation of the value of some variable in the real world. There is another use of conditioning, however, that is often applied but seldom discussed in the literature. T h a t is, to specify joint distributions or constraints across the parents of the conditioned node. This method is a straightforward way to specify constraints as many computations are handled automatically by the existing inference algorithms for graphical models. This thesis discusses the properties of conditioning including the possibly undesired creation of dependencies in the network not implied by the original constraints called conditioning side effects. We present a method to counteract side effects efficiently while maintaining the effectiveness and expressive simplicity of conditioning. The nature of this topic is intimately tied to the language and rules of graphical models and Bayesian probability. As such, it is difficult to delve deeply into a definition until the appropriate background material has been covered, this will be done in chapter 2. For now, we begin with a simple, intuitive example to provide an idea of the topic and some general motivation for why it is important. Example 1 Consider a model of three university instructors, Alice, Bob and Cindy, and an Artificial Intelligence' (Al) course. The model has the following components: 1. An instructor's interest in the topic of A I: The binary variable Ix represents whether or not instrxictor X is interested in the topic of Al. 2. Whether an instructor wants to teach the Al course: This is modelled by the binary variable W~x for instructor X. The belief that X wants to teach the course is dependant upon their interest in the topic, Ix3. A constraint on teaching : At least one instructor must teach the course. 4- Whether an instructor will teach the course: This is modelled by the binary variable T y for instructor X. Our belief that X will teach the course is dependent upon whether the instructor actually wants to teach it, Wx, and whether someone else is already teaching the course. Chapter 1. Introduction 2 5. Research productivity of instructors : This is modelled by the binary variable Rx for instructor X. If Rx is true it means that instructor X will complete their research goals for this semester. Our belief in the value of , Rx is dependent upon whether or not they end up teaching the AI course, Tx, which will reduce the time they can spend on research. The frame of reference for this work is the knowledge engineering task of the person who must model a distribution such as the one above. The distributions we are in interested have the following two distinct components across some subset of the variables: undirected component : This is a constraint on the values of the set of variables, or any type of undirected, joint distribution over them. In example 1 this type of distribution is specified in point 3 as a simple "or" of the states of the variables TA,TB and Tc- This is a simultaneous constraint. It is fundamentally not a directed relationship, the teaching status of each instructor affects all of the others. directed component : Each constrained variable also has a directed component which specifies its distribution when the constraint is satisfied by some other variable. The directed component of the model of these variables is expressed in the example by points 4. The goal is to combine these two components to produce a coherent model. We show in chapter 3 that there are ways to combine the two by forcing a directionality onto the undirected component. We argue that conditioning is a much more natural method that is very likely to be used by knowledge engineers. This method takes advantage of existing properties of graphical models to easily specify these distributions. It involves creating an artificial variable,fixedto one value to induce the constraint onto the variables. We show how this technique can be used in directed graphical models, such as Bayesian networks. We also highlight a problem with this technique which is not discussed in the literature. If the constrained variables are conditionally dependent on other variables in the model then conditioning creates new dependencies between all of the influencing variables. These dependencies are not necessarily implied by the desired distribution and are called side effects of conditioning. In example 1 the use of conditioning to combine points 3 and 4 creates dependencies amongst the Ix and Wx variables. This could allow us to reduce our belief that Bob is interested in Al if we find out that Cindy is interested in Al. But the distribution for example 1 does not support this inference. Cindy and Bob's interests are unconditionally independent from each other. Only the teaching variables are constrained. .It is perfectly possible that they are all interested in Al or that they all find it quite dull. The new dependencies tell us however that conditioning induces a belief that their is a kind of total level of interest in Al. If some instructors are less interested then it must be that others are more interested, since one of them must teach the course. This is not supported by the desired distribution for example 1. The probabilities for teaching are already conditioned on the fact that the constraint is satisfied by someone else. Chapter 1. Introduction 3 The contribution of this thesis is to illuminate this subtle problem and provide a solution for it. We show that a conditioned node used in this way is fundamentally different from a normal node with evidence entered that corresponds to the observation of some variable. It turns out that in the types of distributions we've described it is often the case that the constraint is meant to remain local in its effect. In chapter 3 we will define the method of conditioning used to induce these constraints or undirected distributions. The notion of side effects of conditioning is explored fully in chapter 4. In chapter 5 we define how side effects can be counteracted through a process called shielding. A technique for implementing this shielding without unnecessary cost to the complexity of inference is presented in the form of cloned anti-networks in chapter 6. This is followed by a chapter containing implementation details and results on several example networks. 4 Chapter 2 Background We now provide a brief overview of notation used in this work and probabilistic graphical models. If you are already familiar with graphical models you may want to only skim this chapter. For others it hopefully provides a good introduction to the area. 2.1 Variables, Nodes and Sets We begin with the building blocks of all graphical models, random variables. These are represented as circular nodes in a graph. Each variable, X, in the graph has a domain of possible values it can take on, dom(X) Random variables, or nodes in a network, are presented in capitalized italics whereas sets of nodes are in boldface, as in e C . Specific values of nodes are shown in lowercase italics, such as c; to mean C = Ci. When the domain of a variable is Boolean we simply use the lowercase name for true and false as in c, and Inotci to mean C = true and C = false respectively. For a set of nodes X = {Xi,..., X } we define the domain of the set to be n dom{X) = Xt x X x • • • x X 2 (2.1) n For a set of nodes in a graphical model this means that each x e dom(X) is a unique assignment of values to all the nodes in X. 2.2 Bayesian Networks A Bayesian Network (Bayes net, belief network) [16] is a directed, acyclic graph with a node for each random variable. The network defines a probability distribution over the set of random variables, X = {A'i,..., X }. The set of parents of a node Xi, denoted pa(Xi), are all the variables on which X; is conditionally dependent. This is indicated in the graph by directed arcs going from each element of pa(Xi) into Xi, seefigure2.3. We refer to the ancestors of a node, anc(X.j) as the transitive closure of pa{X.j), thus any node reachable from X.going upwards only in the graph. The descendants of X; are defined similarly, desc(Xi) is all the nodes reached by following links downwards in the graph. Each node Xi has an associated conditional probability distribution (CPD) over X{ and its parents defining p(Xi\pa(Xi)). The structure of the graph encodes a statement of independence that a node is independent of all its nondescendants given the values of its parents. This CPD can be represented by a n t Chapter 2. Background 5 Figure 2.1: Structures that allow effects to pass through table which has a number of entries that is exponential in the number of parents of the node. If a node has no parents then a prior distribution is specified as p(Xi) and we define that p(Xi\pa(Xi)) = p(X;). The joint probability distribution of the entire network is then given by the chain rule P ( X ) = J J piXifaiXi)). (2.2) X;SX 2.2.1 Inference in Bayesian Networks The task of inference is to compute the marginal probability of a node, p(Xi\e), given the distribution modelled by the network and any evidence, e, which may have been observed. When the state of a variable, Xi, is observed this is entered as evidence in the Bayes net by fixing the value of X to the one observed and shading the node in the graph. The updated marginal probability of a node is then given in general by t p(Xi)= x-x, p(X ,...,X ). 1 n (2.3) How the effect of evidence propagates throughout the Bayes net is very important for the discussions that follow. It can be described graphically with a simple set of rules called the Bayes Ball algorithm [19]. We can think of the probabilistic influence as propagating outwards in all directions along the edges connected to each node. Figures 2.1 and 2.2 show the three basic structures from which all Bayes nets can be built and the effect of evidence as it propagates through the network. Case 2.1(c), called a v-structure, is particularly important for the topic of this work. It shows that the parents of conditioned node become interdependent. This feature is what allows us to use conditioning to induce joint distributions. In conjunction with cases 2.1(a), 2.1(b) and 2.2(c) the computed beliefs of nodes throughout the network can be affected. If two nodes are blocked from affecting each other according to these rules then the nodes are said to be d-separated [16]. There are many algorithms used to carry out inference in Bayes nets, for good overviews see [13] [4] [22]. We describe two of them here that are pertinent Chapter 2. Background 6 (a) (b) (c) Figure 2.2: Structures that do not allow effects to pass through. to our discussion. 2.2.2 Variable Elimination Variable elimination [23] [5] is a method of performing inference which is used in this work. The method essentially comes down to recognizing that the posterior distribution of a query node, G, can be computed by using the joint probability of the entire network and summing out all nodes other than the query node. Given a query node G, a set of observations entered as evidence into nodes e and rn remaining nodes Y = X — {G} — e we have: p(G|e) = = p(G,e) p(e) p(G,e) E P(G,e) (2.4) G where (2.5) p(G,e) = P(G,e,Y ...,Y ) lt m (2.6) Yi,...,Y„, In a given Bayes net, then, inference can proceed by summing out, or eliminating, all unobserved, non-query variables. The algorithm by Zhang & Poole [23] uses factors to store the intermediate sums in this process. The order of elimination that is chosen can have a significant impact on the complexity of the algorithm though it does not affect the correctness of the final answer. Example 2 In the network shown in figure 2.3, the elimination order used is A E F C D. The computation of the marginal probability of query node G given evidence b is shown in (2.7). The factors produced at each step are subscripted with the nodes that have been eliminated to produce them and have the remaining free variables in brackets. So for example, once the node A is eliminated it produces the factor //t(C). The factors are represented as tables just as the CPDs for nodes but factors have no conditional meaning. Factors can be multiplied together with CPDs Chapter 2. 7 Background Figure 2.3: A n example Bayesian network with node B conditioned to true. or other factors to produce new factors in the same way that C P D tables are multiplied, element by element given the elimination ordering. The algorithm proceeds accoring to the following equation: (G,b)= Y,- P p{A,b,C,D,E,F,G) A , C , D , E , F = J2 P(G\D)p(D\b)p(F\C,D)p(E\C)p(C\A,b)p(A)p(b) A , C , D , E , F = Yp(G\D) (D\b) YYPW ' X>(£|C) b)p(A)p(b) = Yp(G\D)p{D\b)Y Y,P( \ > D)2~2p(E\C)f (C) 0 £>(C|A, P D C F E D C F E F . ' A C A = y>(G|£)£Y>(F|C,D)A (C) £ D C (2.7) F = £p(G|D)£W(C,D) D = C Y,P( \ )f*CEF{D) G D D = /ACDEF(G) We appeal to the intuition of this algorithm throughout our work and to the concept of factors. For efficient implementations we are interested in the following algorithm. Chapter 2. Background 2.2.3 Junction Tree Algorithm The junction tree algorithm [11] [9] performs essentially the same calculations as the variable elimination approach but avoids duplicated computations over multiple queries through the use of a tree structure for caching. The tree is constructed based on the structure of the Bayes net by a method called triangulation which has two steps. Thefirststep involves creating undirected arcs between all nodes with common children in the BN. This is known as moralization since all the parents in the network are being forced to "marry". In the second step, additional arcs are added to the graph until there are no remaining cycles with more than three nodes without arcs crossing between nodes in the cycle. The resulting graph, with the directionality of any arcs removed, is called fully triangulated or chordal. This graph is then used to construct a tree by finding all of the maximal cliques in the graph and creating a node in the tree labelled with the nodes in that clique. A clique is a set of nodes such that each node in the set is connected to every other node in the set. A clique is maximal if it is not a subset of any other clique in the graph. The clique-nodes in the tree are furthermore connected in such a way so as to satisfy the junction tree property. This is simply that if any two clique-nodes in the tree contain some variable X, then X must also be present in all clique-nodes on the connecting path between them. The algorithm passes messages between clique-nodes that represent the marginal distributions of the variables those cliques have in common. This is continued until all neighbouring cliques agree on their marginal distributions, generally in two passes through the tree. At this point all of the marginal distributions of the nodes in the network are cached in the clique-nodes of the junction tree, ready to be queried efficiently. This algorithm is widely used in many implementations of Bayesian networks because of its speed and efficiency in many practical cases. The complexity of the algorithm, however, is still exponential in the size of the largest clique in the network. Thus, minimizing the size of the largest clique is very important, an issue we return to for our solution in section 5.2.2. 2.3 Related Work The work presented here clearly builds on the prodigious literature of graphical models as a whole and deals with a fundamental feature of them, conditioning on evidence. Very little in that literature, however, refers to the two fundamental concepts of this work, using conditioning to specify a distribution and side effects that arise from such conditioning. There is also a strong relationship between the expression of joint distributions using conditioning and undirected distributions such as those in Markov Random Fields [16]. Work has been done by Buntine on chain graphs [3] attempting to merge directed and undirected models. The use of conditioning has similar possible applications for using both modelling paradigms. This work fo- Chapter 2. Background 9 cusses on a fully directed representation whereas Buntine's always grounds back to undirected models. Our work also relies fully on existing inference algorithms for directed graphical models rather than devising a new one. Yedidia et al. [22] present the M R F - B N relationship using factor graphs which has similarities to the affect of conditioned nodes in a network. They do not discuss, however, the use of conditioning as a modelling tool as we do. The literature on constraint satisfaction, see [12], is also somewhat related as c-nodes can be used as constraints. Their application goes beyond constraints however to the creation of any undirected joint distributions. 10 Chapter 3 Using Conditioning to Induce Distributions We begin with a simplified version of example 1 without the Ix or Wx variables in order to focus on the use of conditioning without the concern of side effects. We describe three different ways to model the types of distributions discussed in the introduction using Bayesian networks. In these models we have a prior, conditional distribution for some nodes that applies when a constraint is satisfied or not present. We must distinguish these priors, which are provided to the modeler, from the distributions actually entered into the Bayesian network in order to achieve the correct, constrained distribution. We use Pr(X) to denote the prior distribution of node X and p(X) to refer to the CPD used within the Bayes net to model the combined distribution for X. Example 3 Consider a model of three university instructors, Alice, Bob and Cindy, and an Artificial Intelligence (Al) course. The model has the following components: 1. A constraint on teaching : At least one instructor must teach the course. 2. Whether an instructor will teach the course: This is modelled by the binary variable Tx for instructor X. Our belief that X will also be teaching the course once the constraint has been satisfied by another instructor is specified as a prior distribution, Pr(Tx), the same for each instructor: Pr^tx) • .9 Pr(t ) .1 x 3. Research productivity of instructors : This is modelled by the binary variable Rx for instructor X. If Rx is true it means that instructor X will complete their research goals for this semester. Our belief in the value of Rx is dependent upon whether or not they end up teaching the Al course, Tx, which will reduce the time they can spend on research. This belief is represented by the distribution Pr(R \Tx), and is the same for each instructor: x T true false x Pr(r \T ) .5 .8 x x Prhr \T ) .5 .2x x Chapter 3. Using Conditioning to Induce Distributions 11 Figure 3.1: A Bayesian network with a directed clique to model example 3. The constraint in this example has the effect of making the Tx nodes completely interdependent. We would expect that since at least one of the Tx variables is true that all other things being equal they should each have at least a j chance of teaching the course plus there additional likelihood of being the second or third instructor on the course. If we observe that, for example, Alice is not teaching the course, then the responsibility for satisfying the constraint is spread between B o b and Cindy giving them each a probability of at least | . Our beliefs in the research productivity of each instructor is influenced by these changes as well. So our model specifies that Alice not teaching the course reduces our belief that Bob or Cindy will be productive since it is more likely they will be busy teaching. 3.1 Method 1 : Directed Clique Since the nodes TA,TB,TC are interdependent we can think of modelling them with a fully connected clique of nodes, allowing each node to have information about all the other constrained nodes. Directed models are acyclic so we need to force a direction onto the constraint by specifying an arbitrary ordering and computing the appropriate C P D for each constrained node based on the states of all "previous" nodes. For the order (TA,TB,TC), the final node, Tc, has all the other constrained nodes as its parents, as shown in figure 3.1. To compute the C P D s for this network it helps to consider that we are constraining the joint distribution of the three nodes so we can view the model as the Bayesian network shown in figure 3.2 with one node for the marginal probability of the teaching variables. The marginal of the prior distributions are shown in table 3.1(a) Now we need to apply the constraint to this <TA,TB,TC > node. The "or" constraint i n example 3 makes the case where all three variables are false in table 3.1(a) infeasible, so we replace it with probability zero. We renormalize this to yield p{T ,T ,T ) shown in table 3.1(b). We can now compute the C P D s to for the directed clique structure in the A B c Chapter 3. Using Conditioning to Induce Distributions 12 Figure 3.2: Bayes net for example 3 modelling the joint distribution of nodes T , T and T . A B c T t t f f t t f f Pr(T ,T ,T ) .1 x .1 x .1 = .001 .1 x .1 x .9 = .009 .1 x .9 x .1 = .009 .1 x .9 x .9 = .081 .9 x .1 x .1 = .009 .9 x .1 x .9 = .081 .9 x .9 x .1 = .081 .9 x .9 x .9 = .729 A B t t t t f f f f t f t f t f t f B T t t t t f f f f c T t t f f t t f f A (a) C o m p u t e d m a r g i n a l of t h e p r i o r d i s t r i b u t i o n s , PT(T ,TQ,TQ)T h i s is before t h e c o n s t r a i n t is a p p l i e d . T h e case ( / , / , / ) is inconsistent w i t h t h e "or" c o n s t r a i n t . p{T ,T ,Tc) .0037 .0332 .0332 .2989 .0332 .2989 .2989 0.0 A B t f t f t f • t f B (b) P o s t e r i o r d i s t r i b u t i o n p ( T , TB, TQ) of t h e t e a c h i n g nodes after t h e c o n s t r a i n t is applied, A A Table 3.1: Marginal probabilities for teaching variables prior and posterior to the application of the constraint Bayes net shown in figure 3.1 by marginalizing out the appropriate variables as follows and then normalizing: p(T ) A B TD,T p(T \T ) B (3.1a) P(TA,T ,T ) OC C C CC Y,P(T ,T ,T ) A A B C (3.1b) To p(Tc\T ,T )<xp(T ,T ,T ) A B A B c (3.1c) Remember that this needs to be done just to specify a distribution consistent with example 3, we are actually performing inference on portions of the network during specification. The complete conditional probability tables are shown in figure 3.3. After performing inference the computed beliefs are as shown in figure 3.4 as displayed in the Bayesian network software package Netica [7]. A l l Chapter 3. Using Conditioning to Induce Distributions T true true false false A A c B T true false 0.1 .526 RA T RB T Rc .5 .8 true false .5 .8 true false .5 .8 .369 T true false T T true false true false T A B B c 13 0.1 0.1 0.1 1.0 Figure 3.3: Conditional probability tables for directed clique. TA tl RA • true 36.9 false 63.1 true false 1 i 36.9 63.1 .. RB true 68.9 false 31.1 1 true 68.9 false 31.1 ^ T c TB true 36.9 false 63.1 R c true 68.9 false 31.1 _ Figure 3.4: Beliefs after inference for directed clique shown in Netica. Probabilities expressed as percentages. probabilities are expressed as percentages in Netica. 3.2 Method 2 : Chain of Sufficient Statistics Notice in the clique method that we create ever larger tables as we move along the clique ordering. The final node will always have all other nodes in the clique as its parents. The second method is very similar to a clique but reduces the number of arcs needed by taking advantage of the fact that we do not need all the information, only the sufficient statistics of previous nodes. In the case of example 3 this can be expressed by binary nodes, S i and 52- They need only indicate whether or not any previous Tx nodessatisfy the constraint as shown in figure 3.5. Chapter 3. Using Conditioning to Induce Distributions 14 Figure 3.5: A Bayesian network using a chain of sufficient statistics to model example 3. The Tx nodes are computed as follows: P{TA,T ,T ) p(T )x A B C (3.2a) T ,T D C p ( r | 5 i ) K 2ZP( UTB,T ) S B C (3.2b) Tc p(Tc\S )<xp(S ,T ) 2 2 (3.2c) c The sufficient statistics represent the distributions of all previous nodes in each computation. The node 52 summarizes the T and TB nodes so that Tc only needs one input. In example 3 the resulting complexity is the same as using the clique method. However, with a larger network having more T nodes this structure only increases linearly in table size, whereas the clique method increases exponentially. Consider a network with six Tx nodes. The clique method would leave the final node in the clique with five parents and 2° entries. In the chain representation, however, each successive Si node would suffice to summarize the only pertinent information, which is if any of the previous Tx nodes is true. Each Tx node would only have one parent and 2 entries in its table. The computed tables using the chain method for example 3 are shown in figure 3.6. After performing inference the posterior beliefs are as shown in figure 3.7. Note that the posterior beliefs are identical to those using the clique method in 3.4. A 2 3.3 Method 3 : Conditioning Each of the previous methods correctly result in the distribution required by example 3 but specifying them requires carrying out inference on a portion of the network using the marginal distribution of all constrained nodes. The resulting Chapter 3. Using Conditioning to Induce Distributions 15 Figure 3.6: Conditional probability tables for chain method. true false RA 68.9 31.1 _ 3 ii true false RB 68.9 31.1 Rc true false 68.9 31.1 SSlil- Figure 3.7: Beliefs after inference for chain shown in Netica. Chapter 3. Using Conditioning to Induce Distributions 16 graph forces a direction on to an undirected constraint by allowing the influence to flow from each Tx node towards Tc so that they can become interdependent. Conditioning provides a much simpler way to specify this distribution. A node, C, is added to the network to represent the constraint and is conditioned to be true, see figure 3.8. This entered evidence does not correspond, however, to an observation in the real world. The new node, known as a conditioned node or c-node, is true by definition and is part of the original specification of the joint distribution of the network. Figure 3.9 shows the CPDs for this network. These are the prior distributions given in example 3. So, p(Ts) in the network is simply the P T ( T B ) distribution given to us. There is no need for calculations to come up with correct tables to enter into the Bayesian network since the normal process of inference will perform the calculations of (3.1) and (3.2) automatically. This induces the required distribution on the Tx nodes as shown in figure 3.10 which behaves identically to the other two methods for example 3. Conditioning is compelling for several reasons : simultaneous constraint : The entire constraint is captured in one table. No artificial ordering of the variables needs to be created by the modeler. Note also that for a constraint such as "or", this conditioned node could be modelled by a more compact function instead of a table [1]. separation of priors and constraints : The two components of the distribution, directed and' undirected, can be specified separately. These correspond to the prior distributions tables and the constraints in example 3. The inference mechanism deals with combining them automatically. If the modeler is given a distribution divided in this way then conditioning is completely straightforward to use. natural modelling : Conditioning is a method that comes naturally out of the use of directed graphical models. Modelers know that entering evidence Chapter 3. Using Conditioning to Induce Distributions 17 Figure 3.10: Beliefs after inference using a conditioned node to model example 3. Chapter 3. Using Conditioning to Induce Distributions 18 into a network has an effect on the parents of the conditioned node, they see it whenever an observation is made. It is natural to try to use this to more easily specify distributions that do not translate well into a directed language. No computations are needed by the modeler to specify the distribution. 19 Chapter 4 Side Effects of Conditioning Now we return the ancestor variables modelling interest in A l and desire to teach from example 1 to demonstrate the problem of side effects of conditioning. When the constrained nodes have ancestors, the use of conditioning to specify distributions creates side effects in the network. We demonstrate this using example 4. Example 4 Consider a model of three university instructors, Alice, Bob and Cindy, and an Artificial Intelligence (Al) course. The model has the following components: 1. An instructor's interest in the topic of Al: The binary variable Ix represents whether or not instructor X is interested in the topic, of AL Our belief that X is interested in teaching the course is specified as a prior distribution, Pr(Ix) given by: X Alice Bob Cindy Pr{i ) .3 .7 .6 x PrHx) .7 .3 •4 2. Whether an instructor wants to teach the Al course: This is modelled by the binary variable Wx for instructor X. The belief that X wants to teach the course is dependant upon their interest in the topic, Ix, represented by Pr(Wx\Ix), and is the same for all three instructors: Ix true false Pr(w \Ix) .75 .1 x Pr(^wx\Ix) .25. .9 3. A constraint on teaching : At least one instructor must teach the course. 4- Whether an instructor will teach the course: This is modelled by the binary variable Tx for instructor X. Our belief that X will also be teaching the course once the constraint has been satisfied is dependant upon their desire to teach the course and is represented by, Pr(Tx\Wx): W true false x Pr(t \W ) .1 .01 x x Pr(^tx\W ) .9 .99 x Chapter ' 4. Side Effects of Conditioning 20 Figure 4.1: Bayesian network modelling example 1 with a directed clique. 5. Research productivity of instructors : This is modelled by the binary variable Rx for instructor X. If Rx is true it means that instructor X will complete their research goals for this semester. Our belief in the value of Rx is dependent upon whether or not they end up teaching the Al course, Tx, which will reduce the time they can spend on research. This belief is represented by the distribution Pr(Rx\Tx), and is the same for each instructor: Tx true false 4.1 Pr(rx\T ) .5 .8 x Pr{^r \T ) .5 .2 x x Modelling Ancestors with a Directed Clique With the ancestors of the constrained Tx nodes added back in we can again model the distribution using a directed clique of nodes, shown in figure 4.1. Notice that each Wx node has a dependency connecting to its Tx node as before but there are three additional influences indicated as well. These influences, shown with bold arrows, express the fact that the distribution of each Wx node is pertinent to our beliefs about each Tx node. To see this, consider p{Wc), our belief in Cindy's desire to teach the A l course. If we have found by talking to Cindy that she does not want to teach Chapter 4. Side Effects of Conditioning 21 A l , this should reduce our belief that Cindy will teach the course, since teaching a course is influenced by the instructor's interest in the topic. This allows us to then infer that it is more likely that either Alice or Bob will teach the course since someone must teach it. However, if we think back to the Bayes ball rules described in section 2.2.1 we see that the nodes T —Tc — WC form a v-structure, with no observation at the bottom node, Tc- This means that effects from Wc will not propagate to TB- The same is true of the effect of WB on T . But if we are certain that Bob wants to teach the course then it is less likely that Alice or Cindy will end up teaching it. The bold arcs added to the network in figure 4.1 compensate for this problem by allowing all the ancestor nodes to influence all the constrained nodes. The CPDs for the Tx nodes now need to be computed for the- network in figure 4.1 to take the influence of these ancestors into account as we did in (3.1). Since the values of the Wx nodes now inform every combination of Tx values we determine a marginal distribution of the priors across all Wx and Tx nodes, PT(TA,TB,TC,WA,WB,WC)In this distribution we then enforce the constraint, assigning a probability of zero to each case where all Tx nodes are false. This gives us the posterior after the constraint P(TA,TB,TC,WA,WB,WC)With this we can compute the conditional probability distributions for each Tx node as follows: B A P(T \WA,W ) A B OC (4.1a) P(TA,TB,TC,W ,WB,WC) A T ,T ,W B OC P(T \T ,WB,WC) B A C C Y (4.1b) P( A,TB,T ,W ,WB,WC) T C A T ,W C P(TC\TA,T ,W ) B C OC A Y P(TA,T ,T ,WA,WB,W ) D C C (4.1c) w ,w A n (4.1d) After normalizing we arrive at the distributions shown in figure 4.2 with the resulting beliefs in each node after inference shown in figure 4.3. The chain method has a very similar specification and results in the same distribution, it is not shown here. 4.2 Modelling Ancestors with Conditioning Figure 4 . 4 shows the model for example 4 using a conditioned node, C, that is a child of all nodes that need to be constrained. Again, specification of the CPDs for this network, shown in figure 4 . 5 , is much more straightforward than the clique method. Each node for the given variables is simply assigned a CPD equal to the prior distributions provided in example 4 . The CPD for C is one that matches the constraint given. The posterior beliefs after performing inference on all nodes are shown in figure 4 . 6 . As discussed in the Introduction conditioning in this network produces side effects which distort the resulting distribution. We can see the difference IA W A IA IA IA W IB RB Ic Rc true false .75 .1 true false .75 .1 true false .75 .1 W B true true true true true false true false false true false true false false false false A W c true false true false true false true false T A 0.369 0.505 0.505 0.848 0.0505 0.0848 0.0848 0.337 T A true true true true false false false false W Wc true true false false true true false false true false true false true false true false B T B 0.1 0.1 0.01 0.01 0.526 0.9174 0.0917 0.503 T A true true true true false false false false T W Tc true true false false true true false false true false true false true false true false 0.1 0.01 0.1 0.01 0.1 0.01 1.0 1.0 B c T RA T RB T Rc true false .5 .8 true false .5 .8 true false .5 .8 A B C Figure 4.2: Conditional probability tables using the clique method to model example 4. Chapter 4. Side Effects of Conditioning 23 Chapter 4. Side Effects of Conditioning 24 Figure 4.5: CPDs for conditioned model are simply the given prior distributions and constraint. Chapter 4. Side Effects of Conditioning 25 by comparing to the distribution found by the clique method, figure 4.3, where all the tables are computed without the use of evidence. If we think of the rules of 2.1 once again it is clear why this occurs. The evidence in C allows all nodes in the network to influence each other, so they all become dependent to some degree. The problem is that we are using a mechanism, conditioning, provided by the infrastructure of Bayesian networks that is assumed to have a certain meaning, that is, observation of a variable in the model. But C is not a variable in our model. It is an expression of the undirected component of the distribution of the Tx nodes. It is straightforward to use a node in this way in a Bayes net as it makes specification much easier and yields the desired distribution across the Tx nodes. The constraint holds as expected and it may not be immediately clear that the posteriors computed by the model with conditioning are incorrect if they are not compared to other models. Due to this subtlety it is important to create an antidote for this problem, in more complex models it will be even harder to detect. We want to stop the propagation of influence up from a conditioned node after it has affected its parents. The process of doing this will be referred to as shielding the Ix nodes from the side effects of the conditioning of C. This is the subject of the next chapter. Chapter 4. ' Side Effects of Conditioning Figure 4.7: Cindy's disinterest in A l has different effects. 26 Chapter 4. Side Effects of Conditioning 4.2.1 27 Descendants A note on the descendants of the constrained nodes, the Rx nodes in example 4. We do not need to shield the effects of conditioning from these nodes. This is because they are defined as being influenced by our beliefs in Tx- These in turn are influenced by beliefs in Wx and Ix • So, a constraint on who can teach the course will influence our belief in the research productivity of the instructors. Determining the interest in A l of an instructor will also influence our belief in their research productivity. However, the computation of that belief yields different actual probabilities than desired. This is due to the propagation of belief amongst the Ix and Wx nodes which only occurs because of conditioning side effects. 28 Chapter 5 Shielding Against Side Effects To remove the problem of side effects and maintain the useful properties of conditioning we want to devise a way to shield part of the network by counteracting the influence of the conditioning node. In this chapter we define some notation to aid this discussion and present a definition for what successful shielding will look like. 5.1 Conditioned, Effect and Shield Sets In order to make it easier to refer to different parts of a network we define three sets of nodes. Each set contains one of three node types; conditioned, affected or shield nodes. Note that these are not partitions of the network. These three sets indicate what part a node plays in creating the distribution across the network, a simple network is shown in figure 5.1. c-node A node that is created in order to induce an undirected distribution across its parents through conditioning. A c-node has the following properties: • it has no children. • it is always binary, since one value is always observed a larger domain would never be used. • it has evidence of true entered into it by definition. This is what we mean by saying it is conditioned to one of its values. • The effect of a c-node is intended to be local to its parents and by extension to all of its descendants, no other nodes should be influenced by-the conditioning. It represents the undirected or constrained distribution across its parent nodes. A given Bayesian network can contain multiple c-nodes. e-node A c-node produces a direct effect on this node. This is simply the set of parents of the c-node , pa(d) for a c-node Ci- We define the effect set for each c-node as all the nodes affected by it, that is, ~Ec — {E .,..., E \} where Q has m parents. t c c Chapter 5. Shielding Against Side Effects 29 s-node A shield node is the first line of defence against the propagation of side effects. We define a set Sc for each c-node, C as all nodes in the set of "grandparents" of C{ that are not also an affected node of C{, that is, ; S ' = pa(E ) Cl Ci -E . C i We exclude Ec, since it is possible that linking between e-nodes could cause some e-nodes to be grandparents of C;. . If there are k of these nodes then the members of the set are Sd = {S , • • •, S }. To be shielded from the side effects of C means that after inference the following holds for each s-node : Ci c p(S \C = true) = p(S ) 3 c c That is, in the absence of any other evidence, S should behave as if the node C were not conditioned. c In a network containing c-nodes, the specification of the network is not complete until all the c-nodes are conditioned to true and some method has been introduced to shield the nodes in Sc that should not be affected by this conditioning. 5.1.1 Affected Network We will also refer to the portion of the network that the c-node affects as the affected network. This is all of the nodes that are influenced by the conditioning Chapter 5. Shielding Against Side Effects 30 of C and the node C itself. It follows from our previous definitions that this set contains all the e-nodes and their descendants, which are not shielded. EcUdesc(Ec) 5.2 (5.1) Defining Shielding Our goal now is to implement some unobstructive method of shielding. By unobstructive we mean that it is a solution that requires no modification of standard Bayes net inference algorithms. Ideally, a modelling tool allowing specification of distributions using c-nodes would be able to implement some method to shield conditioning side effects without the modeler needing to know how it was implemented in their network. The solution we describe can be entered into any existing Bayesian network modelling package to produce the desired distribution. 5.2.1 Anti-nodes We begin with the idea that what conditioning has wrought, perhaps it can undo as well. Figure 5.2 shows a simple network with two anti-nodes, A\ and Ai, one for each s-node to be shielded. We will use this "hat" notation for all nodes added to counter side effects. The CPD of each anti-node is computed so that it shields its parent from the side effects of C, that is, the following holds: p(J\c,ai,a ) = p(J) p(K\c,a ,a )=p{K) ' 2 1 2 (5.2a) (5.2b) Although this can maintain the individual distributions p(J) and p(K) in this network they are unable to deal with more complex distributions. Without conditioning we have that p(J\K)=p(J) but with conditioning and the anti-nodes p(J\K,c,ai,a ) 2 ^ p(J\c,'ai, a ). 2 Anti-nodes, then, are not connected enough to compensate for these effects. What we need is an expression that encompasses the total influence of the nodes in the affected network. 5.2.2 Anti-factors Side effects are an influence on the joint distribution of the s-nodes. So, to counter this influence we create another conditioned node, C with parents Sc = {J, K}, see figure 5.3, and determine if there is a CPD that will exactly counter C's effects. Consider how inference will proceed using variable elimination: Chapter 5. Shielding Against Side Effects (b) Network with antinodes added. Figure 5.2: A first try at shielding with anti-nodes. 31 Chapter 5. Shielding Against Side Effects 32 Figure 5.3: A factor produced for a c-node has a corresponding anti-factor on its s-nodes. 33 Chapter 5. Shielding Against Side Effects p(J,K,-c,c)= P(J\A,B)p(A)p(B)p(K\B,D)p(D) A , B , D , L , M x p(c\J, K)p{L\J)p{M\K)p{c\L, M) = J2 p(J|A,B)p(A)p(B)p(K|B,D)p(D) (5.3) A , B , D xp(c\J,K)f (J,K) LM If the affected network is eliminatedfirst,the factor JL,M(J, K) is produced leaving us with the unknown distribution of the C node and the remainder of the network shown in bold. This remaining portion is simply p(J, K). We assign the CPD for C then as p(c\J,K)= 1 1 Zf , (J,K) 1 L M 1 Zf , (J,K) L M where Z = Y1J,K SL,M{J,K) is a normalization constant that ensures these values are in the range [0,1]. The node C, called an anti-factor, when conditioned to true, will cancel out with the factor / Z , , M ( ^ , K)- During inference this will yield: p{J,K,c,c)=p{J,K) General Anti-factors In general, we create an anti-factor node, C, with parents Sc such that p(c|S ) = c 1 1 Z/E (SC) C pH|s ) =i-ic 1 ^/E (SC) C where Z — Y1E /EC(SC)- With C conditioned to true the following is the definition of shielding: C p(S ,c,c) =p(S ) c c Instructor Example Figure 5.4 shows the CPD computed for the anti-factor in our instructor example. The values come, from the antifactor of the affected network given by !T ,T ,TC{IA,IB,IC) A D = \ Z T jz—z—z-^rJT ,T ,TC(1A,IB,1C) A B (5.4) Chapter 5. Shielding Against Side Effects J t f p(l = t J) K t f .59 0 p(m = t\K) 0 .33 (b) . (a) L t f t f M t t f f p(c = 34 J t f t f t\L,M) 1 1 1 0 (c) K t t f f .59 0 .7194 .33 (d) Table 5.1: CPDs can lead to a factor containing zeros. We can see the resulting posterior distribution in figure 5.5. Notice that the this matches exactly with the distribution shown using the clique method in figure 4.3. Inverting Zero There appears to be a problem with the computation of an anti-factor, we cannot invert a factor containing a zero. For example, consider the network from figure 5.3 using the CPDs from table 5.1.a-c. Using these CPDs to compute the factor resulting from eliminating L and M we arrive at the result shown in 5.1(d). The computation for the case resulting in zero is shown below: h, H,k) M = Y,P(LH)p(M\k) (c\L,M) P L , M = p{lhj)p(m\k)p(c\l,m) + p(^l\^j)p(m\k)p(c\^l,m) + P{lhj)phrn\k)p(c\l,^m) + p(^l\^j)p(-^m\k)p(c\^l, ->m) =0x0x1+1x0x1+0x1x1+1x1x0 = 0 Theorem 1 For a factor / E C to contain zeroes it is sufficient that for each e £ dom(Ec) c • some node a; £ ec has p(x\Sc) = 0 • or p(c|ec) = 0 Furthermore, it is necessary that for. some ec it is the case that p(c\ec) = 0. Chapter 5. Shielding Against Side Effects 37 Proof: We can see from the form of the computation for the factor that the first two cases are sufficient to make the final answer zero. Yl /E (SC)= C U P(X\S ) P(c\e ) c .(5.5) C e ecfcwi(Ec) x£e c c As long as one of the probabilities is always zero then each term in the sum will be zero. To see that it is also necessary for the c-node to be zero at some point, consider if it were not so. Assume p(c|e ) never equals zero for any instantiation of e , yet some entry in the factor is still zero. Since p(x\Sc) is a probability, there are no negative values and the only way for the sum to equal zero is for each and every term to be zero. For E c = {E ,..., Sg}, all binary nodes, there are 2 instances of e . If each node has probability zero for one of its values it must have probability one for the other value. There will still be one instance of e where every p(x\Sc) term has a probability of one. For this instance the only way to obtain a zero would be to have p(c|e ) = 0. Which breaks our assumption. Therefore there cannot be a zero in the factor without p(c|e ) = 0 at some point. • c c c n c c c c The importance of this situation should not be overstated. The meaning of a factor containing a zero is that for some instance of the values of the parents, the s-nodes in our case, the combined probability of the affected network for that instance is zero. The affected network, the combined distribution of the enodes and their ancestors including the conditioned node, is basically saying this assignment of values to the s-nodes is impossible. By trying to shield the side effects in this case we are trying to say that the impossible is possible. Since the only way we have of canceling out a probabilistic influence is through scaling, we simply cannot do it in this situation, nor should we try. The affected network has essentially imposed a hard constraint directly onto the values of Sc and the multiplication by zero has destroyed information that cannot be recovered. Complexity Simple and effective as anti-factors are, they do have a drawback. We are creating one new node to represent a factor of many nodes. Thus it is possible that the node C will have a much higher inwards arity than any other node in the network, seefigure5.6. If we think of inference using the junction tree algorithm what happens is that now there is new clique in the network containing the antifactor node and all of the s-nodes. There was not necessarily a clique this large previously. Since inference is exponential in the size of the largest clique in the junction tree this method could easily lead to large increases in computation time arbitrarily influenced by the size of ScIn the next chapter we propose another solution, cloned anti-networks, that will get around this difficulty while retaining the shielding properties of the anti-factor. Chapter 5. Shielding Against Side Effects (a) A Bayes net, Af, with anti-factor and a large S c set. 38 (b) Original junction tree for Af without node C (c) Junction tree for Af Figure 5.6: Antifactors can cause significant increase in the size of cliques in the junction tree if Sc is large. 39 Chapter 6 Cloned Anti-networks To reduce the size of the clique created by the anti-factor we create instead a conditional structure in the network that has the same effect as a single antifactor node. The most straightforward way to do this is to duplicate the existing conditional structure of the affected network and somehow compute the CPDs for these nodes so that the factor produced equals the anti-factor. This way, the shielding will still take place without creating a new clique arbitrarily larger than any existing clique. 6.1 Definition Given a Bayesian network with c-node C and sets E c and S c defined in the usual way, we define the cloned anti-network for C as the set of nodes Ec = {E \dom(E ) i c c = do (E ),pa(E' )=pa(E'c)} , m c c plus the node C satisfying dom(C) = dom(C) and pa(C) = pa(C), see figure 6.1. The nodes in the anti-network are referred to as the clones of their counterparts in the original network. The requirement for a solution is that the CPDs of the nodes in Ec and C are assigned such that they satisfy the following: / ( c)= K£oM^c)),.-,p(iSWfio)M^ E s E c «o) 1 " £ £ c £ « ^(Sc) JE »P(£ M4))C • • • ,p(SS|pa(£g))p(c|% . . . . -Eg) (6.1) C Ensuring this is not as straightforward as in the anti-factor case unfortunately. We need to specify the conditional probabilities of an entire subnetwork so that when the affected nodes are eliminated they yield a particular, known factor. In a sense we are performing variable elimination in reverse. We already know the factor but need to set the CPDs in a such a way that we arrive at it. Chapter 6. Cloned Anti-networks 40 Figure 6.1: Effect sets in a Bayesian network with an anti-network. 6.2 Junction Tree Complexity Consider the network with maximum independence among the s-nodes, figure 6.2(a) and how it will be triangulated. After moralization, see figure 6.2(b), there is a cycle of length 6. Moralization will always link the nodes in E c to each other and the nodes in E c to each other in this way. E-nodes also always share a parent with their copy in the anti-network. Therefore, these kinds of cycles will always arise unless no e-nodes have any parents, in which case shielding is unnecessary. In order to fully triangulate the graph we can cut the route short for each e-node by linking it to its clones in the anti-network, as in figure 6.2(c). A more complex example is shown in figure 6.3. The links between an enode and its clone will ensure, as in this example, that occurrences of an e-node in the original junction tree will now also contain the anti-network clone of that node. In the worst case this will create a clique that is twice the size of cliques in the tree without an anti-network. Consider however that if an anti-factor is used instead then an even larger clique would be created in this example. An anti-factor as a child of all s-nodes would create a clique of size seven. In some networks there could be very low connectivity between the snodes before shielding. After shielding with an antifactor, all s-nodes would be linked and form a clique together with the anti-node. Linking all of the s- Chapter 6. Cloned Anti-networks (c) J o i n i n g e-nodes 41 t o t h e i r a n t i - n e t w o r k c o u n t e r p a r t s removes the cycles. Figure 6.2: Adding an anti-network and its effects on triangulation of the network. Chapter 6. Cloned Anti-networks 42 nodes together is the simplest way to be sure all effects can be compensated for. However, using anti-networks, .cliques will only be created where needed due to the dependencies amongst the s-nodes. Therefore, depending on the network structure, anti-networks could provide significant complexity advantages over the use of anti-factors. 6.3 A General Solution In this section we present how to populate the CPDs of the nodes in the antinetwork in order to enable shielding as we have described. We begin with a simple example network showing what is known about the nature and existence of a solution. We will then derive a general form applicable to any network and show the same results. Implementation details of the solution, using constrained, nonlinear, optimization and resulting networks are discussed in chapter 7. 6.3.1 Base E x a m p l e Consider the example network shown infigure6.4, all nodes are binary. We use the concept of a cloned anti-network to cancel out the side effects of C in the portion of the network including and above Sc = {J,K}, the shield set. The parameters of the anti-network need to be set so that the factor produced after elimination of its nodes during inference conforms to: I (6.2) Where Z = }L,M{J, K) > the normalization constant used to keep the anti-factor entries in the range [0,1]. To deal with the parameters more compactly we will adopt the following notation. The free parameters of the system are p(C = true\L, M), p(L = true\J) and p(M = true\K). These will be represented by the variables 7;,,,, 4>j and tpk respectively. The subscripts of these variables indicate the values of their parent nodes with 1 indicating true and 2 indicating false with the parents ordered from left to right in the graph. The set of all these variables is X = {711,712,721,722, </>ij <j>2, i>2}- The values of the factor for the anti-network, given by equation (6.2), which are constants, will be represented in the same manner by 7Tjfc. Since all the variables in X represent probabilities, dom(x) = [0,1] for all x € X. Similarly, dom(irjk) = [0,1] for all j, k due to Z in equation (6.2). Thus the table for C is s K L M t t f f t f t f (c\L,M) P 7n 712 721 722 Chapter 6. Cloned Anti-networks 43 Figure 6.3: A more complex network showing how connectivity between e-nodes and s-nodes can lead to clique almost twice as large as those present without the anti-network. Chapter 6. Cloned Anti-networks 44 Figure 6.4: Bayesian network with cloned anti-network. Now our formulation is: f { Llk J, K) = \ M h [ J ; R ^ — • 7T v ) =E M ^ M J P ^ M M ^ ' i,M 4, 7 (6.3) i> Which expands out in the following way, we express the formula as a set of functions, pjjt(X), that are equal to zero. • TTjA: = lll<Pj1pk + 721 (1 - <t>j)ll>k + 712<M1 - V»fc) + 722(1 - 0 = -yn<Pjtpk + 7 2 i ( l - (t>j)i>k + 7 i 2 ^ ( l 5jfc( ) = X 7U<i>j1pk + y2li>k - 7 2 1 ^ ' ^ : 722^+722^^-^^ , + 7120.J ~ <j>j)(l - 1p ) k i/>fc) + 7 2 2 ( 1 - l^jlpk + 722 ~ <j>j)(\ - i/>) - i\. (6.4) fc 7220j~ (6.5) There are four multivariate, polynomial functions over eight variables and we need to find X such that <?jfc(X) = 0 for all j and k. jk Chapter 6. Cloned Anti-networks 45 Lower and Upper Bounds Consider the situation where all of the 7 variables equal zero. This is equivalent to a constraint that none of the parents of the c-node are true. Our conditioning to true in this case would be a zero probability event. This is only used for the purposes of having a lower bound for the <?jfc(X) function. In equation 6.4 we see that every term but the constant contains a 7 variable and so the result will be all zeros leaving ^-^(X) = —flj.fc- The other variables, <j>j d "4>k, have no affect in this case, so we can set them all to zero as well. Since we are only interested in the point where Pjfc(X) = 0 there is no risk of attempting to condition on a zero probability outcome. The only situation when <?jfc(X) = 0 will be if iTjk = 0 as well. This is a degenerate case of the zero factor discussed in theorem 1. Now consider the case where all variables are set to one. In equation 6.4 we see that all but two of the terms contain a (1 - 4>) or (1 — which are both zero under this assignment. We are left with (^(X) = 1 — iXjk- Since njk £ [0,1] this shows that a n 9 k(0) 3 < 0 < g (l). jk We also know that polynomials are always continuous over the real numbers and so by the Intermediate Value Theorem each function is guaranteed to have a solution Xjk € X such that 911(2:11) = 0 <7i2(a;i2)=0 921(2:21) = 0 1722(2:22) = 0 Conjecture In order for the solution to fully satisfy the requirements it must also be the case that i n = x\i = x \ = x . That is, all four functions must cross zero at the same point. Unfortunately, we have not yet been able to prove that this is always the case. However, based on our experimental results we believe that this is so, and leave it as a conjecture. We have found solutions for some networks and we see no obvious reason why some network structures would lend themselves to solution by this method more than others. 2 6.4 22 General Formulation Everything that was said for this specific example is also true in general. Figure 6.5 portrays the general case. The following formula represents the joint probability of the shield nodes and their ancestors, A = anc(Srj) Chapter 6. Cloned Anti-networks 46 Figure 6.5: A general Bayesian network structure with anti-network. p(AuS ) = £ c p(A)p(S |A)p(c|Ec)p(E |Sc)...p(E |S )x c c c c E .,.E ,E ...Eg. c c c p(c|E )p(E |Sc)...p(E |Sc) c c c = p(A)p(S |A) Y J P(e|E )p(E |Sc)...p(E |S )x. c c E ...E C E c c P(c|Ec)p(E |Sc)...p(E |S ) c E ...E C c C c c £ = p(A)p(S |A)/ (S )/ (Sc) c fic c Ec We thus require that f ^ { S o ) = where hf^c) z = E^(s ) c Sc When this is true the factor / E ( S C ) computed from the anti-network will fully cancel out the corresponding portion of the original network, thus eliminating the side effects caused by the conditioning of C. As in the basic example we adopt a variable notation to represent the free parameters and characterize the nonlinear equations. This will help us understand the space of the problem and the feasibility of finding solutions. We will represent particular instances of assignments of values to the affected clone nodes and shield nodes by: c e e dom(Ec) s £ dom(Sc) Chapter 6. Cloned Anti-networks 47 The variables 7 represent an instance of p(c = t|e) under the current assignment of values to e-nodes given by e. Similarly, 4>l,s represents p(e \pa (E )) for the j e-node where the relation pa (E ) indicates the value of the parents of node E within the current assignment s. We will also represent the constant values of the factor for the anti-network under the current assignment of s-nodes with 7r = ^ ^ ^ . The set of all these variables is e 3 s c th s c c 1 s X = {7e,< ) s Veerfom(Ec),sedom(Sc),je [1, |E |] C ]T T: = S p(c\e)p(e \pa (E ))...p( \pa (E [)) 1 n s c e s c e€*"ioTfi(Ec) 0= P(c\e)l[p(ei\pa (E ))-7: s e€dom(E ) .(X)= Y, fl c (6.6) s ~ ] c l 7.II«.) eerfom(Ec) 3 = ( e i = t (l-<.) ) ( e i = f ) -T. (6-7) 1 Since the sum ranges over all values of the e-nodes, not just true ones, we need the indicator power (e.g. (e = t)), for the 4> parameters in order to choose the appropriate form, negated or not, based on the value of e . If e = t then the exponent is 1. If e = f then it equals 0. We now have |Sc| multivariate, polynomial functions given by g which we will call the shield functions. The goal is to find a solution X such that </(X) = 0 for all s e dom(Sc). 3 J J 3 s s Size of Nonlinear System Each equation contains the same 7 variables, one in each term except for the final constant, 7r , and there are |dom(Ec)| terms. Each equation then contains a certain number of 4> variables. It will be the same number in each equation but different depending on the values of the parents given the current assignment s. In total, each e-node contributes \dom(pa(E ))\ variables to the entire system of equations, one for each unique assignment of its parents nodes. Thus the total number of variables is s c |Ecl n„ = \dom(E )\ + Y \ (P (E ))\ •j=\ dom c a c (6.8) The number of equations is simply the number of assignments of s, which is \dom(Sc)\- We cannot say anything more specific about the relationship between variables and equations. Depending on the domain sizes of nodes in E c and Sc and their connectivity there could be less, more or an equal number of variables as there are equations. Chapter 6. Cloned Anti-networks 48 Lower Bound Consider the point where all 7 and all <j> variables are set to zero. The the sum in equation (6.7) will clearly always result in zero thus leaving us with g (0) = s -TT (6.9) s Since ir e [0,1] for all s £ dom(Sc) this shows that s 9s(0) < 0 for all s 6 dom(S ) (6.10) c Upper Bound Consider now the point when all 7 and (j) variables are set to one. In equation (6.7) we see that every term produced by the sum contains a ^ j ora ( l - ^ j ) term. The latter terms will all result in zero with this assignment. This leaves only one term produced by the instance e when all e-nodes are true. Since all the 7 variables are also set to one, this term results in 1, giving us s <7(1) = 1 - 7 T S s (6.11) S And since 7r € [0,1] s 9s(l) > 0 (6.12) We know that polynomials are always continuous over the real numbers and so by the Intermediate Value Theorem each function is guaranteed to have a solution X' such that <?(X') = 0 for all s € ScS Conjecture Unfortunately, we cannot yet prove the final step that there is some assignment of variables X* such that g {X*) = 0 for all s e S , that is, that all of the X ' are in fact, the same assignment. We leave it as a conjecture that this is so, to be proven in future work. Note however, that if it is the case that certain classes of networks or distributions do not have solutions, anti-factors are still apply generally. Anti-networks would then still be a useful method of improving the complexity of a shielded network when the given model allows. s c 49 Chapter 7 Finding a Solution The form that we have now arrived at is one with many nonlinear functions across many unknowns. We have explored several analytical possibilities for finding a solution based on the problem domain but so far have found none that apply to any sufficiently large set of networks. A search approach has yielded more useful results. We have opted to use optimization techniques to find a solution since they are well understood with easy to access algorithms for anyone wanting to implement shielding. They also allow the easy specification of constraints which will be useful for our problem. We first provide a brief overview of the theory of constrained optimization and how it is used in the MATLAB optimization toolbox software package, which is the implementation that we use. 7.1 Constrained Optimization Background The topic of constrained optimization is a wide-ranging and complex one. Here we give only the briefest overview to indicate what is necessary for our problem. If you wish to find out more a very clear description of these techniques and many others is provided in [15]. A constrained optimization problem, also called a nonlinear programming problem, is of the following form min G(x) xeSR- v such that f c.;(x) =0 < , , \cj(x) > 0 are satisfied. / ^ (7.1) Where c is a vector of constraint functions to be satisfied and i e [l,m],j e [m + 1, n] where n is the number of constraints. 7.1.1 The K K T C o n d i t i o n s Many algorithms for solving nonlinear programming problems centre around the Karush-Kuhn- Tucker conditions [10]. They state that given linear independence among the active constraint gradients at a local solution x*, the following Chapter 7. Finding a Solution 50 conditions, in addition to 7.1, are necessary for optimality of x*: 71 VG(x*)- yjA;!V (x') = 0, (7.2) Cfc fc=i A£c (x*) = 0 fc for all j £ [1, ra], (7.3) for all k £ [I, n]. (7.4) When a solution is found, the KKT conditions tell us that the gradients of the objective function and constraints will cancel each other out given some set of lagrange multipliers given by A. Note that this only holds for active constraints, that is, when Cfc(x*) = 0 for all k £ [l,n]. Thus any inequality constraints where Cj(x) > 0 will be left out by having \j = 0. 7.1.2 Sequential Quadratic Programming This is a popular method used to solve the constrained optimization problem, and it is the method used by MATLAB through its fmincon function. The method works by solving a quadratic programming subproblem multiple times. At each step it determines the search direction as a solution to a quadratic subproblem of (7.2) with linearized constraints. The goal is to compute a new estimate of the lagrange multipliers and a search direction to improve the estimate of x* until some merit function is satisfied. These new estimates are then fed back into the next iteration of the algorithm. These methods have guaranteed convergence rates that are better than linear and often perform very well in practice. For a comprehensive and very clear explanation of these method see [15]. For other,references on these techniques see the work done by Han [6] and Powell [21], A literature review is provided in [2]. 7.2 Reformulation of Problem Using the shielding functions, g (X) from equation 6.7 we reformulate our problem to be solved as a constrained nonlinear system. Starting from a point where <7s(X) > 0, we want to minimize an objective function, G(X), that is a linear combination of the g functions subject to the constraint that each of the components of G remains non-negative at all times. Being a linear combination will ensure that if <?(X) = 0 then G(X) = 0. Non-negativity allows us to avoid incorrectly identifying as an answer the situation when G(X) = 0 but g (X) / 0 due to some of the functions being positive and some negative in a such a way that they exactly balance out. We define the objective function as s s S s (7.5) s£doitiS) Chapter 7. Finding a Solution 51 We have natural constraints in that the shielding functions that are the components of G(X) are always zero. However, we do not use c 'to specify this strong equality constraint, since this would require specifying a starting point that already satisfies the constraints. If we have such a starting point then this would be an acceptable final answer for our problem. So in our formulation of (7.1), we have rn = 0 and n = |dom(Sc)|- We define an ordering over the values of the s-nodes s e dom(Sc) indexed by j 6 [l,n]so that g-,(X) is the shielding function corresponding to the j value of s. We specify a set of constraints that require the shielding functions to always be non-negative : t th c,(X) = - (X) 9j (7.6) The functions must be negated in order to reverse the inequality in (7.1). The starting point here need only satisfy <?(X) > 0. Additionally, though not a constraint on the function, we must restrict the domain of the variables to the range [0,1] since each variable must represent a probability. We cannot simply drop this restriction and normalize the variables X in order to retrieve probabilities since the shielding functions contain products and division will not distribute over them. S 7.3 Example Networks Our solutions were computed using the MATLAB Optimization Toolbox implementation [8] of the SQP algorithm through its fmincon function. This required writing MATLAB function forms of the functions G(X) and Cj(X) for all j € [1, n]. We also made extensive use of Kevin Murphy's BNT Toolbox [14] for MATLAB which allows representation and computation of Bayesian networks within that environment. Instructor Example First we return to the instructors in example 4. After converting the network to the general formulation and expressing it as a nonlinear constrained optimization problem, we find the values for X, shown infigure7.1. Figure 7.2 shows the computed beliefs after inference, which conforms to our desired distribution. Note that these are the same distributions determined by both the anti-factor method in 5.5 and the clique method in 4.3. Comparing just to the anti-factor approach we also see infigures7.3-7.6 that further observations propagate and obey the original constraint in the same way. Localized Effect An important feature of networks containing conditioning and shielding is that the counterbalancing of the two is fully localized. This means that adding, additional ancestors or descendants outside the sets Sc and E c can be done arbitrarily without the needs for recomputing the anti-network. Some examples of this are shown infigures7.7 and 7.8. Figure 7.1: Computed CPDs for anti-network found using nonlinear constrained optimization. Chapter 7. Finding a Solution 53 Multiple c-nodes across network Figure 7.9 shows a Bayesian network with two c-nodes, C\ and C , and augmented with two cloned anti-networks. Each set of cloned nodes compensates for a different c-node. The CPDs for L, M, C were solved separately from those for TA,TB,TC,C\. They can in fact be solved as separate Bayesian networks and connected together afterwards, as long as no e-nodes are shared in common as parents of multiple c-nodes. 2 2 Figure 7.2: Computed beliefs after inference for a shielded network using a cloned anti-network for example 4. Or Figure 7.5: Distribution after observing Ic = false using an anti-factor. Figure 7.7: Observing D = true reduces our belief that Alice is interested in Al; this influences the distribution of the T nodes but Bob and Cindy's interests remain shielded. E also maitains its original distribution. The antinetwork does not need to be recomputed to add D. x CO Figure 7.8: Observing E = true reduces our belief in the desire to teach for Bob and Cindy, this influences the distribution of the Tx nodes but Alice's interest is unaffected. D also maintains its original distribution. The antinetwork does not need to be recomputed to add E. o Figure 7.9: Bayesian network containing two c-nodes, each shielded seperately. Notice that the beliefs for Tx, Wx and Ix nodes are the same as before. 62 Chapter 8 Conclusion The motivation of this work has been the specification of distributions in directed graphical models that are broken into two distinct components known to the modeler. For some subset of variables in the model there are known directed, or conditional, distributions, and an undirected, or constrained, distribution. The modeler presented with representing such a distribution in a Bayesian networkfindsthat significant calculations mustfirstbe made and the resulting network is often an unintuitive or forced representation of the problem. The use of an added node conditioned to true to represent the undirected component of the distribution is much more straightforward for those familiar with the behaviour of evidence in Bayesian networks. It also requires no calculations on the part of the modeler to specify the distribution. Unfortunately, this technique does not always lead to the desired distribution, causing dependencies among ancestor nodes not implied by the original problem. We have introduced this problem of side effects of conditioning and presented a general method for counteracting them with the use of computed anti-factors. When the network structure leads this method to large complexity increases we have shown that a conditional structure can be created, a copy of the original affected network, called a cloned anti-network. The parameters of the distribution for this anti-network need to be computed. We used nonlinear constrained optimization for this with very good results. The resulting networks provide the modeler with the best of both worlds, a straightforward specification matching the knowledge they have available and a lack of side effects caused by the use of conditioning to obtain the distribution. 8.1 Open Problems The methods introduced here could be used as part of a Bayesian network modelling system to provide the modeler with a post-specification processor to create the shielded anti-network. This would have the advantage of allowing them to see both distributions to determine if shielding is necessary for their situation as the distinction is sometimes subtle. Some open problems that involve a more general bound on the size of the cliques that are added to the junction tree when a model is augmented with an anti-network. The promise of the anti-network technique is partially in the simplicity and symmetry of its structure. It remains to be proven fully however that a solution to the anti-network parameters always exists, in practice we have Chapter 8. Conclusion 63 always found results. Showing this, existence or showing some set of networks where a solution must exist may help further in refining algorithms for searching the parameter space. For distributions where the constraint component is a function which can be represented compactly, such as "or" it is unknown if any simplifications can be made to the resulting distribution of nodes in the anti-network. There are many connections between the use of conditioning to induce undirected distributions and undirected graphical models such as Markov Random Fields. Conditioning can be used to simulate undirected distributions within directed models. The idea of side effects has not been studied in undirected models as of yet but the results of such study would no doubt be enlightening for an understanding of both frameworks and the connection between the them. The shielded distributions created in Bayesian networks using anti-networks and conditioned nodes, for example, are difficult to model in MRFs. There are also interesting questions relating to conditioning in other directed modelling languages such as Influence Diagrams such as the ramifications of side effects for utility maximization and the meaning of a conditioned node constraining decisions. 64 Bibliography [1] Valeria Bertacco and Maurizio Damiani. The disjunctive decomposition of logic functions. In ICCAD '97: Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design, pages 78-82, Washington, DC, USA, 1997. IEEE Computer Society. [2] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, 4:1-51, 1996. [3] Wray L. Buntine. Chain graphs for learning. In Uncertainty in Artificial Intelligence,, pages 46-54, 1995. [4] Adnan Darwiche. A differential approach to inference in bayesian networks. Journal of the ACM, 50(3):280-305, 2003. [5] Rina Dechter. Bucket elimination : A unifying framework for probabilistic inference. In E. Horvitz and F. Jensen, editors, Proceeding of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI-96), pages 211- 219, 1996. [6] S. P. Han. Superlinearly convergent variable metric algorithms for general nonlinear programming problems. Mathematical Programming, 11:263-282, 1976. [7] Norsys Software Inc. Netica : Bayesian network modelling tool. http://www.norsys.com/netica.html, October 2005. [8] The MathWorks Inc. Matlab optimization toolbox, October, 2005. http: //www.mathworks. com / access/helpdesk/help/toolbox/optim/. [9] Finn V. Jensen. Junction trees and decomposable hypergraphs. Technical report, Judex Datasystemer, Aalborg, Denmark., 1988. [10] H. W. Kuhn and A. W. Tucker. Nonlinear programming. In j . Neyman, editor, Proceeding ofteh Second Berkeley Symposium on Mathematical Statis- tics and Probability, pages 481-492. University of California Press, 1951. [11] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. In Readings in uncertain reasoning, pages 415-448. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1990. Bibliography 65 [12] Alan Mackworth. Encyclopdeia of Artificial Intelligence, chapter Constraint Satisfaction, pages 285-293. John Wiley and Sons, 1992. [13] K. Murphy. Inference and learning in hybrid bayesian networks. Technical Report Technical Report 990, University of California, Berkeley, 1998. [14] Kevin Murphy. Bayes net toolbox for matlab. http://www.cs.ubc.ca/ murphyk/Software/BNT/bnt.html, October, 2005. [15] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag New York, Inc., 1999. [16] Judea Pearl. Probabilistic Reasoning in Intelligent Systems:Networks of Plausible Inference. Morgan Kaufmann, 1988. [17] Judea Pearl and Tom S. Verma. A theory of inferred causation. In James F. Allen, Richard Fikes, and Erik Sandewall, editors, KR'91: Principles of Knowledge Representation and Reasoning, pages 441-452, San Mateo, California, 1991. Morgan Kaufmann. [18] David Poole, Alan Mackworth, and Randy Goebel. Computational Intelligence : A Logical Approach. Oxford University Press, 1998. [19] Shachter Ross. Bayes-ball: The rational pasttime (for determining irrelevance and requisite information in belief networks and influence diagrams). In Proceedings of the 14-th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 480-487, San Francisco, CA, 1998. Morgan Kaufmann Publishers. [20] Stuart Russell and Peter Norvig. Artificial Intelligence : A Modern Approach. Prentice Hall, 1995. [21] G. A. Watson, editor. A fast algorithm for nonlinearly constrained opti- mization calculations. Springer Verlag, Berlin, 1977. [22] Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Understanding belief propagation and its generalizations. In Exploring artificial intelligence in the new millennium, pages 239-269. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003. [23] Nevin Zhang and David Poole. Exploiting causal independence in bayesian network inference. Journal of Artificial Intelligence Research, 5:301-328, 1996.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Shielding against conditioning side effects in graphical...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Shielding against conditioning side effects in graphical models Crowley, Mark Anthony 2005
pdf
Page Metadata
Item Metadata
Title | Shielding against conditioning side effects in graphical models |
Creator |
Crowley, Mark Anthony |
Date Issued | 2005 |
Description | When modelling uncertain beliefs with graphical models we are often presented with "natural" distributions that are hard to specify. An example is a distribution of which instructor is teaching a course when we know that someone must teach it. Such distributions over a set of nodes can be easily described if we condition on a child of these nodes as part of the specification. This conditioning is not an observation of a variable in the real world but by fixing the value of the node, existing inference algorithms perform the calculations needed to achieve the desired distribution automatically. Unfortunately, although it achieves this goal it has side effects that we claim are undesirable. These side effects create dependencies between other variables in the model. This can lead to different beliefs throughout the model, including the constrained variables, than would otherwise be expected if the constraint is meant to be local in its effect. We describe the use of conditioning for these types of distributions and illuminate the problem of side effects, which have received little attention in the literature. We then present a method that still allows specification of these distributions easily using conditioning but counterbalancing side effects by adding other nodes to the network. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2009-12-11 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0103860 |
URI | http://hdl.handle.net/2429/16514 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2005-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_2005-0421.pdf [ 2.55MB ]
- Metadata
- JSON: 831-1.0103860.json
- JSON-LD: 831-1.0103860-ld.json
- RDF/XML (Pretty): 831-1.0103860-rdf.xml
- RDF/JSON: 831-1.0103860-rdf.json
- Turtle: 831-1.0103860-turtle.txt
- N-Triples: 831-1.0103860-rdf-ntriples.txt
- Original Record: 831-1.0103860-source.json
- Full Text
- 831-1.0103860-fulltext.txt
- Citation
- 831-1.0103860.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0103860/manifest