A COMPUTATIONAL THEORY OF DECISION NETWORKSByNevin Lianwen ZhangB. Sc. (Mathematics) China University of Elect. Sci. & Tech.M. Sc., Ph. D. (Mathematics) Beijing Normal UniversityA THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE STUDIESCOMPUTER SCIENCEWe accept this thesis as conformingto the required standardTHE UNIVERSITY OF BRITISH COLUMBIADecember 1993© Nevin Lianwen Zhang, 1994In presenting this thesis in partial fulfillment of the requirements for an advanced degreeat the University of British Columbia, I agree that the Library shall make it freelyavailable for reference and study. I further agree that permission for extensive copyingof this thesis for scholarly purposes may be granted by the head of my department or byhis or her representatives. It is understood that copying or publication of this thesis forfinancial gain shall note allowed without my written permission.Computer ScienceThe University of British Columbia2075 Wesbrook PlaceVancouver, CanadaV6T 1Z1Date:.O? (4AbstractThis thesis is about how to represent and solve decision problems in Bayesian decision theory (e.g. Fishburn 1988). A general representation named decision networks is proposedbased on influence diagrams (Howard and Matheson 1984). This new representationincorporates the idea, from Markov decision processes (e.g. Puterman 1990, Denardo1982), that a decision may be conditionally independent of certain pieces of availableinformation. It also allows multiple cooperative agents and facilitates the exploitationof separability in the utility function. Decision networks inherit the advantages of bothinfluence diagrams and Markov decision processes.Both influence diagrams and finite stage Markov decision processes are stepwisesolvable, in the sense that they can be evaluated by considering one decision at a time.However, the evaluation of a decision network requires, in general, simultaneous consideration of all the decisions. The theme of this thesis is to seek the weakest graph-theoreticcondition under which decision networks are guaranteed to be stepwise-solvable, and toseek the best algorithms for evaluating stepwise-solvable decision networks.A concept of decomposability is introduced for decision networks and it is shown thatwhen a decision network is decomposable, a divide and conquer strategy can be utilizedto aid its evaluation. In particular, when a decision network is stepwise-decomposable itcan be evaluated not only by considering one decision at a time, but also by consideringone portion of the network at a time. It is also shown that stepwise-decomposability isthe weakest graphical condition that guarantees stepwise-solvability.Given a decision network, there are often nodes and arcs that can harmlessly removed.An algorithm is presented that is able to find and prune all graphically identifiable11removable nodes and arcs.Finally, the relationship between stepwise-decomposable decision networks (SD DN ‘s)and Markov decision process is investigated, which results in a two-stage approach forevaluating SDDN’s. This approach enables one to make use of the asymmetric natureof decision problems, facilitates parallel computation, and gives rises to an incrementalway of computing the value of perfect information.111Table of ContentsAbstract iiTable of Contents ivList of Figures ixList of Symbols xiiAcknowledgment xiv1 Introduction 11.1 Synopsis 21.2 Bayesian decision theory 61.3 Decision analysis 71.3.1 Decision trees 81.3.2 Influence diagrams 111.3.3 Representing independencies for random nodes 121.4 Constraints on influence diagrams 131.5 Lifting constraints: Reasons pertaining to decision analysis 151.5.1 Lifting the no-forgetting constraint 151.5.2 Lifting the single value node constraint 181.5.3 Lifting the regularity constraint 191.6 Lifting constraints: Reasons pertaining to MDP’s 211.6.1 Finite stage MDP’s 221.6.2 Representing finite stage MDP’s 231.7 Computational merits 251.8 Why not lifted earlier 25iv1.9 Subclasses of decision networks 261.10 Who would be interested and why 282 Decision networks: the concept 302.1 Decision networks intuitively 302.1.1 A note 332.2 Bayesian networks 342.3 Decision networks 362.3.1 A general setup of Bayesian decision theory 362.3.2 Multiple-decision problems 372.3.3 Technical preparations 382.3.4 Deriving the concept of decision networks 392.3.5 An example 412.4 Fundamental constraints 423 Decision networks: formal definitions 453.1 Formal definition of Bayesian networks 453.2 Variables irrelevant to a query . . . 483.3 Formal definitions of decision networks 503.4 A naive algorithm 533.5 Stepwise-solvability 573.6 Semi-decision networks 594 Divide and conquer in decision networks 624.1 Separation and independence 634.2 Decomposability of decision networks 654.2.1 Properties of upstream and downstream components 66v4.3 Divide and conquer. 695 Stepwise-decomposable decision networks 715.1 Definition 725.1.1 Another way of recursion 745.2 Stepwise-decomposability and stepwise-solvability 755.3 Testing stepwise-decomposability 765.4 Recursive tail cutting 785.4.1 Simple semi-decision networks 795.4.2 Proofs 805.5 Evaluating simple semi-decision networks 825.6 The procedure EVALUATE 856 Non-Smooth SDDN’s 876.1 Smoothing non-smooth SDDN’s 876.1.1 Equivalence between decision networks 886.1.2 Arc reversal 886.1.3 Disturbance nodes, disturbance arcs, and disturbance recipients 906.1.4 Tail-smoothing skeletons 926.1.5 Tail smoothing decision networks 946.1.6 Smoothing non-smooth SDDN’s 956.1.7 Proofs 976.2 Tail and body 996.2.1 Tail and body at the level of skeleton 1006.2.2 Tail of decision networks 1016.2.3 Body of decision networks 1036.3 The procedure EVALUATE1 105vi6.4 Correctness of EVALUATE1.6.5 Comparison to other approaches6.5.1 Desirable properties of evaluation algorithms6.5.2 Other approaches7 Removable arcs and independence for decision nodes7.1 Removable arcs and conditional independencies for decision nodes7.2 Lonely arcs7.3 Pruning lonely arcs and stepwise-solvability7.4 Potential lonely arcs and barren nodes7.5 An algorithm8 Stepwise-solvability and stepwise-decomposability8.1 Normal decision network skeletons8.2 Short-cutting8.3 Root random node removal8.4 Arc reversal8.5 Induction on the number of random nodes8.6 Induction on the number of decision nodes8.6.1 An extension and a corollary8.7 Lonely arcs and removable arcs8.8 Stepwise-solvability and stepwise-decomposability[271281301351381421451471481509 SDDN’s and Markov decision processes9.1 Sections in smooth regular SDDN’s9.1.1 An abstract view of a regular smooth SDDN9.1.2 Conditional probabilities in a section107109109112118119121123124125157158160161vii9.1.3 The local value function of a section. 1619.1.4 Comparison with decision windows9.2 Condensing smooth regular SDDN’s9.2.1 Parallel computations9.3 Equivalence between SDDN’s and their condensations9.4 Condensing SDDN’s in general9.4.1 Sections9.4.2 Condensation9.4.3 Irregular SDDN’s9.5 Asymmetries and unreachable states9.5.1 Asymmetries in decision problems9.5.2 Eliminating unreachable states in condensations9.6 A two-stage algorithm for evaluating SDDN’s9.6.1 Comparison with other approaches10 Conclusion10.1 Summary10.2 Future work10.2.1 Application to planning and control under uncertainty10.2.2 Implementation and experimentation10.2.3 Extension to the continuous case10.2.4 Multiple agentsBibliography 186• . . 162163165166• . . 168168170172172172173175176179• . . . 179• . . . 181181• . . . 183• . . 184184viiiList of Figures1.1 Dependencies among the chapters 51.2 A decision tree for the oil wildcatter problem 101.3 An influence diagram for the oil wildcatter problem 111.4 An influence diagram for the extended oil wildcatter problem 131.5 A decision network for the extended oil wildcatter problem, with independencies for the decision node oil—sale—policy explicitly represented. . 171.6 A decision network for the extended oil wildcatter problem with multiplevalue nodes. The total utility is the sum of all the four value nodes. . 181.7 The decision network obtained from the one in Figure 1.6 by deleting someremovable arcs. This network is no longer no-forgetting 201.8 A decision network for the further extended oil wildcatter problem. It isnot regular, “forgetting” and has more than one value node 211.9 A three period finite stage MDP 241.10 Subclasses of decision networks 272.11 A decision network skeleton for the extended oil wildcatter problem. . . . 312.12 Two Bayesian networks for the joint probability P(alarm, fire, tampering, smoke, leaving).2.13 Two decision networks for the rain-and-umbrella problem 423.14 Bayesian network and irrelevant variables 463.15 A decision network whose evaluation may require simultaneous consideration of all the decision nodes 55ix4.16 The relationships among the sets Y, Y1, Y11, X1, X1, and lrd. The threesets Yi, Y11 and lrd constitute a partition of Y — the set of all the nodes;while X1, X11 and lrd constitute a partion of CUD — the set of all therandom and decision nodes. When the network is smooth at d, there areno arcs going from Xii to 7Td 634.17 Downstream and upstream components: The downstream component is asemi-decision network, where the prior probabilities for oil-produced andoil-market are missing. In the upstream component, u is the downstream-value node, whose value function is the optimal conditional expected valueof the downstream component 675.18 Step by step decomposition of the decision network in Figure 2.11 736.19 The concept of arc reversal: At the beginning the parent set of c1 is BUB1and the parent set of c2 is BUB2U{ci}. After reversing the arc1—*c2, theparent set of c1 becomes BUB1U{c}and the parent set of c2 becomesBUB2U1.There are no graphical changes otherwise 886.20 A non-smooth decision network skeleton 916.21 The application of TAIL-SMOOTHING-K to the decision network skeletonin Figure 6.20 with the input candidate node being d2: (a) after reversingc6—*c4, (b) after reversingc6—*c5 936.22 The effects of applying SMOOTHING to the SDDN in Figure 1.7: (a)after the arc from seismic—structure to test—result is reversed, (b)the final SDDN, which is smooth 966.23 Tail and body for the non-smooth decision network skeleton in Figure 6.20[FINAL CHECK]: (a) shows its body w.r.t d2 and (b) shows its tail w.r.t d2.100x7.24 Removable arcs and removable nodes 1218.25 An abnormal decision network skeleton (1), and an normal equivalentskeleton (2) 1288.26 Short-cutting. The random node c in (1) is short-cut, resulting in thedecision network skeleton in (2) 1309.27 A regular SDDN and its sections: t stands for test, d stands for drill,and s stands for oil—sale—police 1599.28 An abstract view of a smooth regular SDDN. Smoothness is indicated bythe fact that all the arrows from the are pointing downstream. . . . 1609.29 The skeleton of the condensation of the SDDN in Figure 9.27 (a) 1649.30 Sections in a non-smooth regular SDDN 16910.31 Mobile target localization 182xiList of symbolsc, 3, ‘y: value of a variable or of a set of variables (nodes)B, B1, B2: sets of nodes (variables)C: the set of random nodes in a decision networkd, d: a decision node (variable)a policy for a decision network, i.e a vector of decision functions/.: policy space, the set of all policiesj, L: decision function space of d, d6j, S: a decision function for a decision node d, d60: optimal policy of a decision network6., 6: optimal decision function for a decision node d, dD: the set of decision nodes in a decision networke(d, lrd): the evaluation functional of a simple semi-decision networkE[Jf]: the optimal expected value of a decision network AlE[VS]: the optimal conditional expected value of a decision network Al given SE5 [Al]: the expected value of a decision network Al under policy 6Es[JVS]: the conditional expected value under policy 6a decision network skeleton.1C(d, K), IC1: body of a decision network skeleton.IC11(d, IC), ICE- : tail of a decision network skeleton.: utility or value functionAl: a Bayesian network or a decision network.Al1(d,Al), Al1: body of a decision network.xiiJ/11(d,J/), .iVi : tail of a decision network.).C: condensation of a decision network.J’f(d, d+i): section of a decision network.jV: the Bayesian network induced from a decision network .iV by a policy 5.: the frame of a variable or the Cartesian product of the frames of variablesa set of probabilitiesF(cr): the conditional probability of random variable c given its parents 7rF5, Fs(X): the joint probability induced by a policy SF0: the multiplication of all the conditional probabilities in a simple semi-decision networkir: the set of parents of node x in a networkS: a set of variables, usually related to separatorV: the set of value nodes in a decision networkX: the set of random and decision nodes in a decision networkX1: the set of random and decision nodes in upstream setX11: the set of random and decision nodes in downstream setY: the set of all the nodes in a decision networkY1,Y1(d,A1), Yi(d, sAC): upstream setYii, Yii(d,.iV), Y11(d, sAC): downstream set- xliiAcknowledgementFirst, I would like to thank my thesis supervisor David Poole for his guidance, encouragement, support, and friendship. David, it has been great fun being your student.I am grateful to Runping Qi for stimulating discussions, his friendship, and his poem.Members of the UBC reasoning group have been always the first to hear about and tocriticize my ideas. Besides David and Runping, other members of the group include BrentBoerlage, Craig Boutillier, Mike Horsch, Keiji Kanazawa, Ying Zhang. Folks, thank youall for bearing with me and for all your valuable comments and suggestions.Thanks also go to Bruce D’Ambrosio, David Kirkpatrick, David Lowe, Alan Mack-worth, Raymond Ng, Nicholos Pippenger, Martin Puterman, and Jim Zidek for servingon my supervisory committee and/or my examination committee.I thank the department staffs: Deborah, Everlyn, Gale, Grace, Jean, Joyce, Monica,and Sunnie for their help and friendliness.I thank my friends, office mates, lab-mates for sharing wonderful times, and for essential support during some not-so-wonderful times.Finally, I would like to thank all my teachers from the past: Xinfu Zhang, DuangcaiWang, Quanglin Hou, Peizhuang Wang, Sijian Yan, Glenn Shafer, Prakash Shenoy, andDavid Poole, to name a few. Together, they have made an academice career possible fora shabby boy from a remote and poor Sichuan village.xivChapter 1IntroductionThis thesis is about how to represent and solve decision problems in Bayesian decision theory (e.g. Fishburn 1988). A general representation named decision networks isproposed based on influence diagrams (Howard and Matheson 1984). This new representation incorporates the idea, from Markov decision processes (e.g. Denardo 1982), that adecision may be conditionally independent of certain pieces of available information. Italso allows multiple cooperative agents and facilitates the exploitation of separability inthe utility function.Influence diagrams are stepwise-solvable, that is they can be evaluated by consideringone decision at a time (Shachter 1986). However, the evaluation of a decision network requires, in general, simultaneous consideration of all the decisions. The theme of this thesisis to seek the weakest condition under which decision networks are stepwise-solvable, andto search for the best algorithms for evaluating stepwise-solvable decision networks.This introductory chapter provides a synopsis of our theory, and describes how andwhy it differs from its two mother theories: the theory of influence diagrams and thetheory of Markov decision processes.The synopsis in Section 1.1 below describes salient features of the following chapters.Section 1.2 reviews Bayesian decision theory, and Section 1.3 reviews two methodologiesfor decision analysis, namely decision trees and influence diagrams.An influence diagram is a representation of a single agent’s view of the world asrelevant to a decision problem; it spells out information availability for each decision.1Chapter 1. Introduction 2Several constraints follow from its semantics (Section 1.4). A decision network, on theother hand, is a representation of a group of cooperative agents’ view of the world; itindicates both information availability and dependency for each decision node. Someconstraints of influence diagrams do not apply to decision networks.Syntactically decision networks are arrived at by lifting some of those constraints(Section 1.4). Reasons for lifting the constraints originate from the demand of a moregeneral representation framework in decision analysis (Section 1.5), and from the effort toprovide Markov decision processes with a graph-theoretic language (Sections 1.6). Moreimportantly, the lifting of constraints also allows us to apply more techniques in solvinga problem and hence leads to better and more efficient algorithms (Section 1.7).Section 1.9 echoes the synopsis by providing a description of all the subclasses ofdecision networks we will encounter later. Finally, Section 1.10 suggests who might beinterested in this thesis, and why.Li SynopsisOur goal is to enable computers to help decision makers solve complex decision problems.The first step in achieving this goal is to design a language or some kind of representation framework so that decision makers and computers can communicate about decisionproblems. Two frameworks exist previously, namely influence diagrams (Howard andMatheson 1984, Shachter 1986) and Markov decision processes (MDP) (see, for example,Denardo 1982).MDP’s are a model of sequential decision making for the sake of controlling dynamic systems; it is special-purpose. Influence diagrams are a general framework fordecision analysis; however they are always required to satisfy the so-called no-forgettingconstraint, which requires a decision node and its parents be parents to all subsequentChapter 1. Introduction 3decision nodes.There are at least three reasons that a decision maker would be interested in liftingthe no-forgetting constraint (details coming up in later in this chapter):1. The decision maker may be able to qualitatively determine that a decision does notdepend on certain pieces of available information. As a matter of fact, one resultof MDP theory is that given the current state of the dynamic system, the optimalcurrent decision is independent of previous states and decisions, even though theyare known. Such conditional independence for decisions can-not be representedin influence diagrams. The no-forgetting constraint is supposed to capture therationale that information available earlier should also be available later. In themean time, unfortunately, it also excludes the possibility of earlier informationbeing irrelevant later.2. There may be several cooperative decision makers, each responsible for a subsetof decisions. When communication is not feasible or is too expensive, informationavailable earlier to one decision maker may not be available later to a differentdecision maker. Furthermore, there may not be a predetermined ordering amongthe decisions. This defeats not only the no-forgetting constraint, but also anotherconstraint— the so-called regularity constraint, which requires a total orderingamong the decisions.3. It has been noticed that given an influence diagram, a decision node may turn outto be independent of some of its parents. In such a case, the arcs from those parentsto the decision node can be harmlessly removed. It is a good idea to remove sucharcs at a preprocessing stage, since it yields a simpler diagram. However, removingarcs from an influence diagram leads to the violation of the no-forgetting constraint.Chapter 1. Introduction 4In this thesis, we lift the no-forgetting constraint, together with two other constraintspreviously imposed on influence diagrams, and make due semantic modifications to arriveat decision networks (DN), a framework for representing decision problems that is moregeneral then both influence diagrams and finite stage MDP’s.In both influence diagrams and finite stage MDP’s, a decision problem can be solvedby considering one decision at a time, while solving a decision problem in the frameworkof general DN’s requires simultaneous consideration of all the decision nodes, even whenthe structure of the problem is simple (Chapter 3). One of the themes of this thesis is toinvestigate when a decision network can be solved by considering one decision node at atime. We give a graph-theoretic criterion called stepwise-decomposability (Chapter 5);and we prove that this criterion is the weakest graph-theoretic criterion that guaranteesstepwise-solvability (Chapter 8).Another theme of this thesis is to develop algorithms for evaluating stepwise-decomposabledecision networks (SDDN’s). As a first step, we find a way to prune all the removablearcs that are graph-theoretically identifiable (Chapters 7, 8). It is shown that pruningremovable arcs from SDDN’s does not destroy the stepwise-decomposability (Section 7.3).A divide-and-conquer procedure named EVALUATE1 for evaluating SDDN’s is developed (Chapters 4, 5, and 6). This procedure has several other advantages in additionto embracing the divide and conquer strategy. It clearly identifies all the Bayesian network (BN) inferences involved in evaluating a SDDN. Consequently, it can be readilyimplemented on top of any BN inference systems. The procedure does not require arcreversals and induces little numerical divisions. Finally, the procedure explicitly exploitsindependencies allowed by multiple value nodes (Section 6.5).A two stage procedure named EVALUATE2 is also developed on the basis of EVALUATE1 (Chapter 9) as a result of our investigation on the relationship between MDP’sand SDDN’s. EVALUATE2 inherits all the advantages of EVALUATE1. Furthermore,Chapter 1. Introduction 5backgrounda conceptsChapters 1, 2formaldefinitionsCha er3evaluation:divide and conquerChapters 4, 5, 6preprocessing: goodness evaluation:renovahie arcs theorems two—stage approachChapter 7 Chapter 8 Chapter 9Figure 1.1: Dependencies among the chapters.it enables one to exploit the asymmetric nature of decision problems1 and opens up thepossibility of parallel processing (Section 9.2.1). It also leads to an incremental way ofcomputing the value of information (Zhang et al 1993h).There have been a number of previous algorithms for evaluating influence diagrams.Since influence diagrams are special SDDN’s, there can also be evaluated by the algorithms developed in this thesis for evaluating SDDN’s. Our algorithms are shown to beadvantageous over the previous algorithms in a number of aspects (Sections 6.5, 9.6.1).The dependency relationships among the nine chapters of this thesis are shown inFigure 1.1.The remainder of this chapter relates the background of this thesis, and gives a moredetailed account to the points put forward by the story. Let us begin with Bayesiandecision theory.1This was first pointed out by Qi (1993).Chapter 1. introduction 61.2 Bayesian decision theoryWe make numerous decisions every day. Many of our decisions are made in the presenceof uncertainty. A simple example is to decide whether or not to bring the umbrella inlight of the weather forecast. If the weather man were an oracle such that his predictionis always correct, then the decision would be easy. Bring the umbrella if the weatherman predicts rain and leave the umbrella home if he predicts no rain. However real lifeforecasts are not as predictive as we wish. Instead of saying that it will rain, the weatherman says, for instance, that there is a sixty percent chance of precipitation.We would be happy if we bring the umbrella and it rains, or if we leave the umbrellaat home and it does not rain. But we would regret carrying the umbrella around if itdoes not rain, and we would regret even more not having the umbrella with us whenit rains. We have all made this decision many times in our lives, and did not find ithard because we thought this particular decision is not significant. However, there aredecisions, such as buying a house or making a major investment in the stock market,that are of significance to us. In such cases, we want to make rational decisions.Understanding how to make rational decisions is also important for building intelligentsystems.Bayesian decision theory provides a framework for rational decision making in theface of uncertainty. One setup for Bayesian theory consists of a set S of possible states ofthe world, a set 0 of possible observations, and a set Qd of decision alternatives. Thereis a conditional probability distribution F(ols) describing how likely it is to observe owhen the world is in state s, and there is a prior probability distribution F(s) describinghow likely the world is to be in state s. There is also a utility function 14d, s), whichrepresents the reward to the decision maker if he chooses the decision alternative d Cand the world is in state & C S. The problem is to decide on a policy , i.e a mappingChapter 1. Introduction 7from 0 to , which dictates the action to take for each observation.In our example, the possible states of worlds are rain and no-rain. The observations are all the possible forecasts, that is the set {“there is an x percent chanceof precipitation”I x E {0,. .. , 100}}. There are two possible decision alternatives:take-umbrella or not-take-umbrella. The conditional probability of the forecast that“there is an x percent chance of precipitation” given rain and the prior probability of rain are to be assessed from our experience. Our utilities could be as shown inthe following table:rain no-raintake—umbrella 0 -10not-take-umbrella -100 0The problem is to decide whether or not to bring the umbrella in light of the weatherforecast.The expected utility E6 induced by the policy 6: 0—+ 1d is defined by= P(s)F(os)i(6(o),s). (1.1)sES,oEOThe principle of maximizing the expected utility (von Neumann and Morgenstein 1944,Savage 1954) states that a rational decision maker choses the policy 6° that satisfiesE30 = max5E3, (1.2)where the maximization is over all possible policies. The quantity max5E is called theoptimal expected value of the decision problem.1.3 Decision analysisIn the setup of Bayesian decision theory given in the previous section, there is only onedecision to make. Applications usually involve more than one decision (e.g. HosseiniChapter 1. Introduction 81968). This thesis is about how to apply Bayesian decision theory to problems thatinvolve multiple decisions and multiple variables.There exist two methodologies that deal with multiple decisions, namely decisionanalysis (e.g. Smith 1988) and Markov decision processes (e.g. Denardo 1982). Betweenthem, decision analysis is more general-purpose. It emphasizes exploring the structuresof complex problems. In a sense, it has a representation advantage. On the otherhand, finite stage Markov decision processes deal with decisions for controlling a dynamicsystem (e.g. Bertsekas 1976). This class of multiple-decision problems have relativelysimple structures. Finite stage Markov decision processes emphasize problem solving byusing the technique of dynamic programming. In a sense, they have a computationaladvantage.One goal of this thesis is to combine the representational advantage of decision analysisand the computational advantage of finite stage Markov decision processes.This section gives a brief account of decision analysis. A latter section will touch onfinite stage Markov decision processes.1.3.1 Decision treesWithin decision analysis, there are two frameworks for representing the structures of decision problems, namely decision trees (North 1968, Raiffa 1968) and influence diagrams(Howard and Matheson 1984). Decision trees represent the structure of a decision problem all at one level, while influence diagrams distinguish three levels of specification fora decision problem.Consider the following oil wildcatter problem taken from (Raiffa 1968). The oil wildcatter must decide either to drill or not to drill. He is uncertain whether the hole is dry,wet or soaking. The prior probabilities (obtained from experts) are as follows.Chapter 1. Introduction 9dry wet soaking.500 .300 .200His utilities are given in the following table.dry wet soakingdrill -$70,000 $50,000 $200,000not—drill 0 0 0At a cost of $10,000, our wildcatter could conduct a test on the seismic structure, whichwill disclose whether the terrain below has no structure, closed structure, or open structure. The conditional probabilities of the test result given the states of the hole are givenin the following table.dry wet soakingno structure .600 .300 .100open structure .300 .400 .400closed structure .100 .300 .500The problem is whether or not the wildcatter should conduct the test? And whether ornot he should drill?The decision tree for this problem is shown in Figure 1.2, where rectangles stand fordecision variables and ellipses stand for random variables. The values of the variables andthe corresponding probabilities appear on the edges. The tree is to be read as follows.If our wildcatter decides not to test, he must make the drill decision based on noinformation. If he decides not to drill, that is the end of the story. He does not makenor lose any money. If he decides to drill, there is a 50 percent chance that the hole isdry, in which case he loses $70,000; there is a 30 percent chance that the hole is wet, inChapter 1. Introduction 10Figure 1.2: A decision tree for the oil wildcatter problem.which case he makes $50,000; and there is a 20 percent chance that the hole is soaking,in which case he makes $200,000.If he decides to test, there is a 41 percent chance that there turns out to be no seismicstructure. The probability .41 is calculated by using Bayes’ rule from the prior andconditional probabilities given. If he still decides to drill, there is a 73 percent chancethat the hole is dry, in which case he loses $80,000, for now the test has cost him $10,000already. Again the probability .73 is calculated by using Bayes’ rule from the prior andconditional probabilities given. There will be a 22 percent chance that the hole is wet,in which case he makes $40,000; and there will be only a 5 percent chance that the holeis soaking, in which case he makes $190,000. And so on and so forth.An optimal policy and the optimal expected value of a decision tree can be found bythe so-called folding backing strategy (Raiffa 1968, Smith 1987).yes—$80$40—$80$40$190—$80$40$190no$0Chapter 1. Introduction 111.3.2 Influence diagramsDecision trees came into being during the 1930’s and 1940’s (Shafer 1990). They were themajor framework for representing the structure of a decision problem until late seventiesand early eighties, when researchers began to notice the shortcomings of decision trees.For one thing, decision trees are usually very complicated. According to Smith (1988),the first thing to do in decision analysis is to find a large piece of paper. A more importantdrawback of decision trees include that they are unable to represent independencies.Influence diagrams were introduced by Howard and Matheson (1984) (see also Milleret al 1976) to overcome the shortcomings of decision trees. They specify a decision problem in three levels: relation, function, and number. The level of relation indicates thatone variable depends in a general way on others; for example test-result probabilistically depends on test and seismic-structure; and utility deterministically dependson test, drill and oil-underground. At the level of number, we specify numericalprobabilities for each conditional and unconditional event; and the numerical value of avariable given the values of the variables it deterministically depends upon. The levelof function describes the form of dependencies, which is useful in arriving at the level ofnumber. Two examples: profit equals revenue minus cost; if a man is in his thirties, thenthe probability distribution of his income is a normal distribution with mean $45,000 andstandard deviation 1000.Figure 1.3: An influence diagram for the oil wildcatter problem.Chapter 1. Introduction 12Figure 1.3 shows the level of relation of the influence diagram for our oil wildcatterproblem. The diagram clearly shows that the test decision is to be made based on noinformation, and the drill decision is to be made based on the decision to test andthe test-result. The random variable test—result directly depends on the decision totest and the seismic-structure, and it is independent of oil-underground given testand seismic-structure. The random variable seismic-structure directly depends onoil-underground. Finally, the utility deterministically depends on test, drill, andoil-underground.At the level of number, we need to specify the prior probability of oil-underground,the conditional probability of seismic-structure given oil-underground, and the conditional probability of test—result given test and seismic-structure. We need alsoto specify the value of Utility for each array of values of test, drill and oil-underground.In Howard and Matheson (1984), an influence diagram is transformed into a decisiontree in order to be evaluated to find an optimal policy and the optimal expected value.Shachter (1986) shows that influence diagrams can be directly evaluated.Before moving on, let us note that variables will be also called nodes when they areviewed as members of an influence diagram. With that in mind, we can now say thatinfluence diagrams consists of three types of nodes: decision nodes, random node and asingle value node , where the value node represent utilities.1.3.3 Representing independencies for random nodesA quick comparison of the influence diagram in Figure 1.3 with the decision tree in Figure 1.2 should convince the reader that influence diagrams are intuitive, as well as morecompact. They make numerical assessments easier (Howard and Matheson 1984). Furthermore, they serve better than decision trees to address the issue of value of information(Matheson 1990).Chapter 1. Introduction 13The most important advantage of influence diagrams over decision trees, however, liestheir ability to represent independencies for random nodes at the level of relation.This point could be illnstrated by using the oil wildcatter problem. For later convenience, consider extending the oil wildcatter problem by considering one more decision —the decision of determining a oil-sale-policy based on oil quality and market-information.The influence diagram for this extended oil wildcatter problem is shown in Figure 1.4. Byusing the so-called d-separation criterion (Pearl 1988), one can read from the network thatmarket - information is marginally independent of test, test-result, seismic-structure,oil-underground, drill, and oil-produced. Also, as mentioned in section 1.3.2,test-result is independent of oil-underground given test and seismic-structure.Those marginal and conditional independencies can not be represented in decision trees.1.4 Constraints on influence diagramsThere are five constraints that one can impose on influence diagrams: namely the acyclicity constraint, the regularity constraint, the no-forgetting constraint, the single valuenode constraint, and the no-children-to-value-node constraint. Before this thesis, onlyinfluence diagrams that satisfy all those constraints have been studied 2 In this sense,2With the exception of Tatman and Shacter (1990), who deal with one super value node. A supervalue node may consist of many value nodes. See sections 1.5.2 and 1.6.2 for details.Figure 1.4: An influence diagram for the extended oil wildcatter problem.Chapter 1. Introduction 14we say that the five constraints have always been imposed on influence diagrams. Fromnow on, we always mean an influence diagram that satisfies all those five constraints bythe term “influence diagram”.The acyclicity constraint requires that an influence diagram does not contain anydirected cycles. The regularity constraint requires that there exists a directed path thatcontains all the decision nodes. The no-forgetting constraint requires that each decisionnode and its parents be parents to all subsequent decision nodes. The single value nodeconstraint requires that there be only one value node, and the no-children-to-value-nodeconstraint requires that the value node have no children.The regularity constraint is due to the fact that an influence diagram is a representation of a single agent’s view of the world as relevant to a decision problem. Theno-forgetting constraint is due to the fact that in an influence diagram, arcs into decisionnodes are interpreted as indications solely of information availability. The constraintfollows if the agent does not forget information (Howard and Matheson 1984).This thesis is about decision networks, a representation framework for multi-decisionproblems that is more general than influence diagrams. Syntactically, decision networksare arrived at by lifting the regularity, no-forgetting, and single value node constraintsfrom influence diagrams. Semantically, a decision network is a representation of theview of the world of a group of cooperative agents with a common utility; and in decisionnetworks, arcs into a decision node indicate both information availability and dependency.The idea of a representation framework for decision problems free of the regularityand no-forgetting constraints is not new. Howard and Matheson (1984) have suggestedthe possibility of such a framework. The next three sections conduct a close examinationon the reasons for lifting the regularity, no-forgetting, and single value node constraintsfrom influence diagrams. The reasons arise from decision analysis, from Markov decisionprocesses.Chapter 1. Introduction 151.5 Lifting constraints: Reasons pertaining to decision analysis1.5.1 Lifting the no-forgetting constraintAs mentioned in the synopsis, there are three major reasons for lifting the no-forgettingconstraints. The first reason is explained in detail in this subsection. The second andthird reasons will be addressed in the next two subsections.Semantics for arcs into decision nodes and independencies for decision nodesThe no-forgetting constraint originates from the interpretation of arcs into decision nodesas indications of only information availability (Howard and Matheson 1984). More specifically, there is an arc from a random node r to a decision node d if and only if the valueof r is observed at the time the decision d is to be made. The no-forgetting constraintis to capture the rationale that people do not destroy information on purpose; thus information available earlier should also be available later (Howard and Matheson 1984,Shachter 1986).The primary reason for lifting the no-forgetting constraint is that it does not allow therepresentation of conditional independencies for decision uddes. However, there do existcases where the decision maker, from her/his knowledge about the decision problem,is able to tell that a certain decision does not depend on certain pieces of availableinformation. In our extended oil wildcatter problem, for instance, it is reasonable toassume that the decision oil—sale—policy is independent of test, test—result, anddrill given oil-produced.Sometimes independence assumptions for decision nodes are made for the sake ofcomputational efficiency or even feasibility. In the domain of medical diagnosis andtreatment, for instance, one usually needs to consider a number, say ten, of time points.To compute the diagnosis and treatment for the last time slice, one needs to consider allChapter 1. Introduction 16the previous nine time points. In the acute abdomen pain example studied by Provanand Clarke (1993), there are, for each decision node, 6 parent nodes that lie in the sametime slice as the decision node. This means that the decision node at the last time slicehas a total of 69 parents. In the simplest case of all variables being binary, we need tocompute a decision table of 269 entries; an impossible task. The same difficulty existsfor planning under uncertainty (Dean and Wellman 1992). One way to overcome thisdifficulty is to approximate the decision problem by assuming that the decision in a timeslice depends only on the previous, say one time slice, and is conditionally independentof all earlier time points. In this case, the decision table sizes are limited to 2’ = 8192;still large but manageable.Independence for decision nodes cannot be represented in influence diagrams. Goingback to our extended oil wildcatter problem, even though we have made the assumption that oil-sale-policy is independent of test, test-result, and drill givenoil—produced. But in Figure 1.4 there are still arcs from test, test—result, anddrill to oil—sale—policy.Following Smith (1988), this thesis reinterprets arcs into decision nodes as indicationof both information availability and (potential) dependency. This new interpretationenables us to explicitly represent conditional independencies for decision nodes. To bemore specific, the judgement that d is conditionally independent of r can be representedby simply not drawing an arc from r to d, even when the value of a random node r isobserved at the time the decision d is to be made.In our example, if we explicitly represent the assumption that oil—sale—policy isindependent of test, test-result, and drill given oil-produced, then the decisionnetwork for the extend oil wildcatter problem becomes the one shown in Figure 1.5. Wenotice that there are no arcs from test, test—result, and drill to oil-sale—policy;the network is simpler than the one in Figure 1.4.Chapter 1. Introduction 17Figure 1.5: A decision network for the extended oil wildcatter problem, with independencies for the decision node oil—sale—policy explicitly represented.Note that a user may be wrong in assuming that a decision is independent of a certainpiece of information. To prevent such a case from happening, one can run the algorithmin Chapter 7 to graph-theoretically verify the user’s independence judgements. If thealgorithm is not able to verify, the user should be informed, and the user should abandonthe independence assumption by adding an arc.Another advantage of the new interpretation of arcs into decision nodes is that itprovides uniform semantics to both arcs into decision nodes and arcs into random nodes;namely they both indicate dependence. This was first mentioned by Smith (1988).It is evident that the no-forgetting constraint is not compatible with the new interpretation of arcs into decision nodes. It needs to be lifted.Limited memoryAnother reason for lifting the no-forgetting constraint is that the agent, say a robot,that executes decisions (actions) may have limited memory. There may be cases wherethe agent has only a few bits of memory. Even in the case when the agent has a fairamount of memory, it can not remember things forever. Because if so, the memorywill run out sooner or later. Even if the agent has unlimited memory, remembering toomuch information would lead to inefficiency. We human being seem to remember onlyChapter 1. Introduction 18Figure 1.6: A decision network for the extended oil wildcatter problem with multiplevalue nodes. The total utility is the sum of all the four value nodes.important things.1.5.2 Lifting the single value node constraintAs pointed out by Tatman and Shachter (1990) and by Shenoy (1992), the total utilityof a decision problem can sometimes be decomposed into several components. In ourextended oil wildcatter problem, for instance, utility can decomposed into the sumof four components, namely test—cost, drill—cost, sale—cost, and oil—sales. Insuch a case, we assign one value node for each component of the total utility, with theunderstanding that the total utility is the sum of all the value nodes. Figure 1.6 showsthe resulting decision network after splitting the value node utility in Figure 1.4.A major advantage of multiple value nodes over a single value node is that multiplevalue nodes may reveal independencies for decision nodes that are otherwise hidden. Asthe reader will see later in the thesis, there is a way for one to graph-theoretically tellthat in Figure 1.6 oil—sale—policy is independent of test, test-result, and drillgiven oil-produced. The same can not be done for the network in Figure 1.4.In the last subsection, we said that from her/his knowledge about the extended oilwildcatter problem, the decision maker may be able to say that oil-sale—policy isindependent of test, test-result, and drill given oil-produced. Here we see thatChapter 1. Introduction 19when multiple value nodes are introduced, those independencies can actually be readfrom the network itself, even if the decision maker fails to explicitly recognize them.Independence for decision nodes and removable arcsThe next two paragraphs briefly revisit the third reason for lifting the no-forgettingconstraint as listed in the synopsis. In Section 7.1, we shall formally define the conceptof a decision node being independent of a certain parent and prove that when it is thecase, the arc from that parent to the decision node is removable, in the sense that itsremoval does not affect the optimal expected value of the decision problem. It is agood idea to remove such arcs at a preprocessing stage, since it yields simpler diagrams.However, removing arcs from an influence diagram leads to the violation of the no-forgetting constraint.Consider the no-forgetting decision network in Figure 1.6. Since from the networkitself it can be determined that oil—sale—policy is independent of test, test—result,and drill given oil—produced, the arcs from test, test—result, and drill to oil—sale-policyare removable. Removing those arcs results in the network in Figure 1.7, which is nolonger no-forgetting. This shows that in order to prune removable arcs from influencediagrams, we need to consider decision networks that do not satisfy the no-forgettingconstraint.1.5.3 Lifting the regularity constraintThe regularity constraint requires that there be a total ordering among the decisionnodes. It is also called the single decision maker condition (Howard and Matheson 1984).When there are more than one decision maker who cooperate to achieve a common goal,the regularity constraint is no longer appropriate.Chapter 1. Introduction20Figure 1.7: The decision network obtained from the one in Figure 1.6 by deleting someremovable arcs. This network is no longer no-forgetting.Consider further extending our oil wildcatter problem so that that there is not onlyoil but also natural gas. In this case, a gas-sale-policy also needs to be set. Supposethe company headquarter makes the test and drilldecisions, while the oil departmentsets the oil—sale—policy and the gas department sets the gas-sale—policy. Then itis inappropriate to impose an order between oil-sale-policy and gas-sale-policy,since there is no reason why the gas department (or the oil department) should reach itsdecision earlier than the other department. A decision network for the further extendedoil wildcatter problem is shown in Figure 1.8. We notice that there is no ordering betweenoil-sale-policy and gas-sale-policy.Even in the case of one decision maker, the regularity constraint may be over-restrictive. From her/his knowledge and experience, the decision maker may be ableto conclude that the ordering between two decision nodes is irrelevant; one has the sameoptimal expected value either way. In our further extended oil wildcatter problem, itmay be reasonable to assume that it makes no difference whether gas-sale-policy oroil—sale—policy is set first.Even when the ordering between two decision matters, the decision maker may notknow the ordering beforehand. Suppose our oil wildcatter determine, on the first day aevery month, the gas-sale-policy and oil-sale-policy for the coming month, basedChapter 1. Introduction 21Figure 1.8: A decision network for the further extended oil wildcatter problem. It is notregular, “forgetting” and has more than one value node.on the policies for the last month and market information. In this case, we are uncertainas to which one of those two decisions should be made first.Let us now briefly revisit the second reason for lifting the no-forgetting constraintas listed in the synopsis. Together with the regularity constraint, the no-forgetting constraint says that information available when making an earlier decision should also beavailable when making a later decision. In the further extended oil wildcatter problem,we do not know before hand whether oil-sale-policy comes first or gas-sale-policycomes first. In such a case, the no-forgetting constraint can not be enforced. This is whywe said in the synopsis that the existence of unordered decisions not only defeats theregularity constraint, but also the no-forgetting constraint.1.6 Lifting constraints: Reasons pertaining to MDP’sLike decision analysis, finite stage Markov decision processes (MDP) are also a model forapplying Bayesian decision theory to solve multiple-decision problems. Recent researchhas shown application promise for a combination of MDP’s and influence diagrams in theChapter 1. Introduction 22form of temporal influence diagrams in planning under uncertainty (Dean and Weilman1991) and in diagnosis and treatment/repair (Provan and Clarke 1993). One goal of thisthesis is to provide a common framework for both of finite stage MDP’s and influencediagrams. Doing so necessitates the lifting of the no-forgetting and the single value nodeconstraint.1.6.1 Finite stage MDP’sThis subsection briefly reviews finite stage MDP’s; and the next subsection will explainwhy it is necessary to lift the two constraints.Finite stage MDP’s are a model for sequential decision making (Puterman 1990,Denardo 1982, Bertsekas 1976). The model has to do with controlling a dynamic systemover a finite number of time periods. There is a finite set T of time points. At time t,if the decision maker observes the system in state st E S, s/he must choose an action,d, from a set of allowable actions at time t, . This choice may also depend all theprevious states of the system. There are two consequences of choosing the action d whenthe system is in state s; the decision maker receives an immediate reward vt(st, d) andthe probability distribution P(s+i 1St, d) for the state of the system at the next stageis determined. The collection {P(st+ilsj,dj)},vt(st,dt)) is called a finite stageMarkov decision process (Puterman 1990). The problem is how to make the choice dat each time point t so as to maximize the decision maker’s total expected reward. Thefunction which makes this choice is called decision rule and a sequence of decision rulesis called a policy.A classic example of finite stage MDP is the problem of inventory control. Considera ski retailer (Denardo 1982). From September to February, he makes an order fromthe wholesaler at the first day of the month. The amount of the order depends on his31n general, Qd can vary according to st. Here we assume it does not.Chapter 1. Introduction 23current stock. His stock at the beginning of next month depends probabilistically onhis current stock and how large the order is. This conditional (transition) probabilitycan be estimated since the number of customers who arrive at a service facility during aperiod has, typically, a Poisson distribution. The profits our retailer makes during eachmonth is computed from the number of pairs of skis sold and the difference between thewholesale and retail prices.The standard way to find optimal decisions in a finite stage Markov decision processis by means of dynamic programming. In this approach, one begins with the last periodand works backward to the first period. An optimal policy for the last period is foundby maximizing the reward for that period. Then the whole last period is replaced by onevalue node, which is counted as reward in the next last period. This results in a finitestage MDP with one less period. One keeps repeating the procedure on the new process,till all the periods have been accounted for. This is very similar to the folding-backstrategy for evaluating decision trees.For the above model, one can show that an optimal decision rule depends only on thestate st of the dynamic system at time t and is independent of the previous states anddecisions.1.6.2 Representing finite stage MDP’sThis thesis achieves a common framework for decision analysis and finite stage MDP’sby representing the MDP’s as decision networks.Since we have reinterpreted arcs into decision nodes as indications of both informationavailability and potential dependency, finite stage MDP’s can be naturally representedas decision networks. Figure 1.9 (1) depicts a three stage MDP in the graph-theoreticallanguage of decision networks. We notice that there are no arcs from and d1 to d2even though s and d1 will be observed at the time the decision d2 is to be made. TheChapter 1. Introduction 24Figure 1.9: A three period finite stage MDP.reason is that the optimal decision rule for 4 is independent of .sl and d1 givenHowever if we insist, as in influence diagrams, on interpreting arcs into decision nodesas indications of only information availability, then it is cumbersome to represent finitestage MDP’s. Figure 1.9 (2) depicts the influence diagram that represents the threestage MDP (Tatman and Shachter 1990). One can see that there is a number of extrano-forgetting arcs, namely arcs from and d1 to 4 and 4, and from 2 and 4 to 4.The presence of those arcs not only complicates the network, but also fails to reflectone important conclusion of MDP, namely that the current decision is independent ofprevious states and decisions given the current state.Tatman and Shachter’s algorithm is able to detect that 4 does not depend on sand d1, and that 4 does not depend on i, d1, 2, and 4. So, the extra no-forgettingarcs makes no difference to the decision problem after all. They were introduced onlybecause there was no concept of a decision network that does not satisfy the no-forgettingconstraint.In a finite stage MDP, there is a reward in each period. This can be naturally(1)Chapter 1. Introduction 25represented by assigning one value node for each period, as shown in Figure 1.9 (1).Note that 83 separates the last period from all the previous periods. If we insist, as ininfluence diagrams, on the single value node constraint, then we need to connect v1, v2,and v3 into a “super node” (Tatman and Shachter 1990), as shown in Figure 1.9 (2). Onenotices that no longer 33 separates the last period from all the previous periods. This isanother reason for lifting the single value node constraint.1.7 Computational meritsThe lifting of the no-forgetting, regularity, and single value node constraints allows usto discover stepwise-decomposable decision networks (SDDN). SDDN’s are more generalthan both influence diagrams and finite stage MDP’s. Moreover when evaluating SDDN’swe can prune removable arcs, while the same cannot be done when evaluating influencediagrams since pruning arcs leads to the violation of the no-forgetting constraint. To putit more abstractly, SDDN’s relax constraints imposed by influence diagrams and thusallow us to apply more techniques in solving a problem, and hence to solve the problemmore efficiently. See Sections 6.5 and 9.6.1.1.8 Why not lifted earlierHoward and Matheson (1984) have hinted that in the case of multiple decision makers, the regularity and no-forgetting constraints may be violated. Smith (1987) has alsomentioned that it is possible that a decision maker may choose or be compelled to “forget”. Yet, no one before has studied decision networks that are not regular and/or are“forgetting”. Why?Chapter 1. Introduction 26Howard and Matheson (1984) deal only with regular and no-forgetting decision networks (influence diagrams), because for evaluation, decision networks are first transformed into decision trees, and the transformation is possible only for regular no-forgettingdecision networks. Even though new algorithms for evaluating influence diagrams havebeen developed after Howard and Matheson (1984) (see, for example, Shachter 1986), thecorrectness of all those algorithms relies on the regularity and no-forgetting constraints.This is probably why those constraints have always been imposed on influence diagrams.In this thesis, we shall show that one can evaluate a decision network, even if it is notregular and no-forgetting. This opens up the possibility of working with general decisionnetworks.1.9 Subclasses of decision networksThe lifting of the no-forgetting, the regularity, and the single value node constraintsfrom influence diagrams leaves us only with the acyclicity and no-children-to-value-nodeconstraints. In Chapter 2, we shall argue that those two constraints are fundamental andcan not be lifted.The acyclicity and no-children-to-value-node constraints define the concept of decisionnetwork. This section previews subclasses of decision networks we will encounter in thisthesis.Influence diagrams and finite stage MDP’s are two existing subclasses of decision networks, which have been studied for many years. It is known that both of those subclassesof decision networks are stepwise-solvable, i.e they can be evaluated by considering onedecision node at a time.The most important subclass of decision networks introduced in this thesis is stepwisedecomposable decision networks (SDDN). They include both influence diagrams andChapter 1. Introduction 27Figure 1.10: Subclasses of decision networks.finite stage MDP’s as special cases. See Figure 1.10. SDDN’s are also stepwise-solvable.As a matter of fact, regular SDDN’s4 are the subclass of decision networks that canbe evaluated by conventional dynamic programming (Denardo 1982, Chapter 9), andSDDN’s in general constitute the subclass of decision networks that can be evaluated bynon-serial dynamic programming (Bertelè and Brioshi 1972, Chapter 9).The decision networks that are not stepwise-decomposable can be of various degreesof decomposability. To evaluate them, one needs to simultaneously consider two ormore decision nodes. The number of decisions one need to consider simultaneously isdetermined by the degree by which the network is decomposable. The divide and conquerstrategy spelled out in Chapter 4 can be utilized to explore the decomposability of a givendecision network.Smooth decision networks are introduced for technical convenience. They are conceptually simple and thus easy to manage. They are used extensively in this thesis tointroduce new concepts and to prove theorems. Non-smooth decision networks can be4To be more precise, the term decision network should be replace by the term decision networkskeleton in this section.abnegj. noitialdecisiondecisionnetwork. networksChapter 1. Introduction 28be transformed into equivalent smooth decision networks when necessary.Finally normal decision networks are introduced so that the equivalence betweenstepwise-decomposability and stepwise-solvability can be established. We conjecture thatabnormal decision networks can be transformed into equivalent normal decision networks.1.10 Who would be interested and whyGenerally speaking, if you anticipate a solution to your problem by Bayesian decisiontheory, you should find this thesis interesting. Because it provides, in a sense, the mostgeneral framework decision networks— for applying Bayesian decision theory. Problems representable as MDP’s can be solved in (stepwise-decomposable) decision networksin the same way as before. Problems representable in influence diagrams can be solvedin (stepwise-decomposable) decision networks at least as efficiently as, and usually moreefficiently, than in influence diagrams. The reason for this efficiency improvement is thatworking with SDDN’s relaxes the constraints imposed by influence diagrams, and allowsone to apply more operations, such as pruning removable arcs, than previously allowed.If you are a decision analyst, you might appreciate the ability of decision networksto represent independencies for decision nodes, to accommodate multiple cooperativedecision makers, and to handle multiple value nodes. You might find it a relief that youdo not have to completely order the decision nodes beforehand. Furthermore, you mightappreciate the efficiency and other advantages of our algorithms.If your problem falls into the category of MDP’s, you might find the concept ofdecision networks helpful in assessing the transition probabilities and rewards. In theski retailer problem (Section 1.6), many factors may affect the transition probabilitiesand rewards, for example deterioration of stock, delivery lag, payment upon delivery bythe retailer and by customers, refusal to enter backlog by customers (Denardo 1982).Chapter 1. Introduction 29Within MDP, one needs to figure out the dynamic programming functional equationfor each combination of the factors, which may be complicated. In decision networks,consideration of one more factor simply corresponds to the addition of one more node.This allows one to consider more factors than before. The representation advantage ofdecision networks may benefit control theory in general.AT researchers who are concerned with planning, and diagnosis and treatment/repairshould also find this thesis interesting.Planning is a process of constructing courses of action in response to some objective.Since the planner might not have complete knowledge about the environment and aboutthe effects of actions, planning are usually performed under uncertainty. Being a theory for rational choice of actions under uncertainty, Bayesian decision theory naturallycomes into play. Preliminary research (Dean and Wellman 1992) has indicated that successful application of Bayesian decision theory in planning under uncertainty calls for aframework that combines characteristics of influence diagrams and and those of MDP’s.Research on diagnosis and treatment (Provan and Clarke 1993) has pointed to the samedirection. The concept of decision network introduced in this thesis may prove to be agood combination of influence diagrams and MDP’s. Also, the ability of decision networks to represent conditional independencies for decision nodes may be computationallyessential for those areas.Chapter 2Decision networks: the conceptThis chapter introduces the concept of decision networks and addresses some of thefoundational issues. Formal definitions will be provided in Chapter 3.The concept of decision networks is intuitively illustrated through an example in section 2.1. Section 2.2 exposes the way by which other authors develop the concept ofBayesian networks from joint probabilities by means of the chain rule of probabilities,and by using the concept of conditional independencies. Section 2.3 derives the conceptof decision network, through the concept of Bayesian networks, from the Bayesian decision theory setup by considering multiple decision problems. Section 2.4 discusses thefundamental constraints that decision networks need to satisfy and argues that decisionnetworks are the most general representation framework for solving multiple-decisionproblems in Bayesian decision theory.2.1 Decision networks intuitivelyIn this section, we illustrate the concept of decision networks through an example.Decision networks can be understood at two levels: relation and number. At thelevel of relation, decision networks are directed graphs consisting of three types of nodes:decision nodes, random nodes and value nodes; and they are used to graphically representthe structures of decision problems. This directed graph is called a decision networkskeleton. Consider the further extended oil wildcatter problem:30Chapter 2. Decision networks: the concept 31Figure 2.11: A decision network skeleton for the extended oil wildcatter problem.An oil wildcatter is deciding whether or not to drill in a new area. To aidhis decision, he can order a seismic structure test. His decision about drillwill depend on the test results if a test is ordered. If the oil wildcatter doesdecide to drill, crude oil and natural gas will be produced. Then, the oilwildcatter will decide his gas sale policy and oil sale policy on the basis of thequality and quantity of crude oil and natural gas produced, and on the basisof market information.The structures of this decision problem can be represented by the decision network skeleton shown in Figure 2.111, where decision nodes are drawn as rectangles, random nodesas ovals, and value nodes as diamonds.Briefly, here are the semantics of a decision network. Arcs into random nodes indicateprobabilistic dependencies. A random node depends on all its parents, and is independent1The figure is the same as Figure 1.8. The duplication is to save the reader from flipping back andforth.Chapter 2. Decision networks: the concept 32of all its non-descendants given the values of its parents. In the extended oil wildcatterproblem, test-result, for instance, probabilistically depends on seismic-structureand the decision to test, but is independent of gas-underground and oil-undergroundgiven seismic-structure and test.Arcs into decision nodes indicate both information availabilities and functional dependencies. In our example, the arc from oil-produced to oil-sale-policy means thatthe oil wildcatter will have learned the quantity and quality of crude oil-produced whenhe decides his oil-sale-policy, and he thinks that the quantity and quality of oil-produced should affect his oil-sale-policy. There is no arc from oil-underground tooil-sale-policy because information about oil-underground is not directly available.There is no arc from test-result to oil-sale-policy, because the oil wildcatter figuresthat the information about the test-result should not affect his oil-sale-policy sincethat he will already have learned the quality and quantity of crude oil-produced at thetime the policy is to be made.Arcs into value nodes indicate functional dependencies. A value node is characterizedby a function of its parents; the function take real number values, which represent thedecision maker’s utilities. In the extended oil wildcatter problem, oil-sales is a functionof oil-produced, oil-market and oil-sale-policy. It depends on no other nodes. Foreach possible values of oil-produced, of oil-market, and of oil-sale-policy, the valueof this function stands for the corresponding expected oil-sales. The total utility is thesum of all the value nodes; namely the sum of test-cost, drill-cost, oil-sale andgas-sale.At the level of number, a decision network specifies a frame, i.e a set of possiblevalues, for each variable. For example, the frame of drill my be {YES, NO}, and theframe of oil—sales may be the set of real numbers.There is also a conditional probability for each random node given its parents and priorChapter 2. Decision networks: the concept 33probability of each random node that does not have any parents. In our example, we needto specify, for instance, P(oil-uriderground), P(oil-produced I oil-underground), andetc.Further, we need to specify a utility function for each value node. In our example,the utility function for oil—sales is a real function of oil—produced, oil—market, andoil—sale—policy.In summary, a decision network consist of (1) a skeleton which is an directed graphwith three type of nodes, (2) a frame for each node, (3) a conditional probability for eachrandom node, and (4) a utility or value function for each value node.In a decision network, the decision about a decision node is made knowing the valuesof the parents of the node. Optimal decisions are decisions that maximize the expectedtotal utility. The goals of decision analysis are to find the optimal decisions and todetermine the optimal expected total utility.2.1.1 A noteNote that the term “decision network” has been previously used in Hastings and Mello(1977). The meaning of the term in this thesis is different. In this thesis, the nodes ina decision network are variables, while nodes in a Hastings and Mello decision networkare states, or values of variables. In a sense, one can say that we are working at a higherlevel of abstraction than Hastings and Mello. The relationship between our decisionnetworks and Hastings and Mello’s decision networks is the same as the relationshipbetween influence diagrams and decision trees.As observed by Smith et al (1993), influence diagrams gain much of their advantagesover decision trees from the fact that they graphically capture conditional independenciesat the level of relation (among variables). The same can be said for our decision networksand Hastings and Mello’s decision networks. As the reader will see, the efficiency of ourChapter 2. Decision networks: the concept 34algorithms heavily depends on the fact that nodes in our decision networks are variables,instead of values of variables.2.2 Bayesian networksOne way to understand decision networks is to think of them as developed from thestandard Bayesian decision theory setup. We shall explain this in the next section.As a preparation, this section develops the concept of Bayesian networks from jointprobabilities by means of the chain rule of probabilities and the concept of conditionalindependency (Howard and Matheson 1984, Pearl 1988).Let X be a set of random variables. Let P(X) be the joint probability of the variablesin X. It is usually difficult, if possible at all, to assess the joint probability directly. Oneway to assess the joint probability indirectly is first to choose an ordering over the variableset X, say x1, x2, ..., x, then to expand the joint probability by the chain rule as follows:x2,. . . , x,) = P(xi)P(x2l i) . . . P(xxi,. . . , xfl..1). (2.3)We shall refer to the ordering as an expansion ordering . Because of equation (2.3),to assess the joint probability P(X), it suffices to assess F(x:Ixi,. . . , x..i) for eachi{1,...,n}.Often a decision maker is able to determine a proper subset 7rr of {x1,.. . , x_} thatare “directly related” to x such that other variables in {xi, . . . ,x_1} are only “indirectlyrelated” to x via Translating into the language of the probability theory, this meansthat x is independent of other variables in {x1,...,x} given ir. Formally that isP(xx1,. . .,x_1)= P(xIir). (2.4)This equation further reduces the assessment task.Chapter 2. Decision networks: the concept 35tariperin fire(1)Figure 2.12: Two Bayesian networks for the joint probabilityP(alarm, fire, tampering, smoke, leaving).Given an expansion ordering x1,.. . , x and the irs, we construct a directed graphby the following rule:For any x and x, draw an arc from x to x2 if and only if xEir.The acyclic directed graph such constructed, together with the conditional probabilitiesP(xIir), is called a Bayesian network for the joint probability P(X).As an example, consider the following decision scenario which is borrowed from (Pooleand Neufeld 1991). The scenario involves five variables: alarm, fire, tampering, smoke,and leaving, denoting respectively the following propositions: the alarm is on, thereis a fire, somebody is tampering; there is smoke and people are leaving. An expansionordering for the joint probability P(alarm, fire, tampering, smoke, leaving) could be(fire, tampering, alarm, leaving, smoke). Suppose it is reasonable to set7rtajnperjng =0, = {fire,tampering}, irleaving = {alarin}, and 7rsmoke = {fire}. Then we getthe Bayesian network shown in Figure 2.12 (1). Another expansion ordering could be(leaving, alarm, smoke, fire, tampering). Suppose it is reasonable to set ira1m{ leaving}, lrsmoke = {alarm}, ir ire = {alariu, smoke}, and irtampering = {f ire, alarin}.Then we get the Bayesian network shown in Figure 2.12 (2). This network has more arcsthan the one in (1).(2)Chapter 2. Decision networks: the concept 36How should one choose an expansion ordering? The answer provided by Howardand Matheson (1984) is that the ordering should be chosen such that the decisionmaker would feel natural and comfortable in assessing the 7r1’s and the P(x[7r)’s.For example, it probably is easier to assess P(alarmfire,tampering) than to assessP(tauiperingfire, alarm). Smith (1989) says that one should choose the ordering tominimize the number of arcs in the resulting directed graph. In our example, the network in Figure 2.12 (1) is preferred to the network in (2). Pearl (1988, pp. 50-51) claimsthat when there are cause-effect relationships among the variables, the structure of aBayesian network can be directly determined from the cause-effect relationships. Forexample, tampering and fire cause alarm, fire causes smoke, alarm causes leaving.2.3 Decision networksIn this section, decision networks are developed as a way to represent of the knowledge(beliefs) and utilities that are needed in order to solve multiple-decision problems inBayesian decision theory. Let us begin with a standard setup of Bayesian decision theory.2.3.1 A general setup of Bayesian decision theoryHere is a setup of Bayesian decision theory (Gärdenfors et al 1988b, Fishburn 1988) thatis more general than the one given in Section 1.2:1. There is a set X of (random and decision) variables, which are relevant to a decisionproblem;2. there is a set L of policies and for each possible policy 6e, there is a correspondingprobability Ps(X);Chapter 2. Decision networks: the concept 373. and there is a utility function (X), which specifies the decision maker’s preferencesabout the possible states of affairs.The problem is to decide on a policy 6° that maximizes the expected utility, that isPso(X)1u(X = maxô{Pô(X)1i(X)}, (2.5)x xwhere Zx means summation over all the possible values of X.The setup given in Section 1.2 can be fitted into the setup given here by lettingX = {o,s,d}, and for each policy 6: O—*Zd settingF(s)P(ols) if d = 6(o)P5(o,s,d) = (2.6)otherwiseIn equation (2.5), summation is used instead of integration because we deal only withdiscrete variables in this thesis. However, most of our results can be easily extended tothe case of continuous variables.2.3.2 Multiple-decision problemsIn applications, the decision maker usually needs to set the values for a number of variables d1,..., dk. Let OBS(d) denote the set of all the variables whose values will beobserved by the decision maker at the time of decision d is to be made.Sometimes, as in MDP’s, the decision maker is able to qualitatively tell that someof those observed variables are irrelevant to d. On other occasions, the decision makermay be forced, for instance by computational complexity, to approximate the world byby making such irrelevance assumptions. Let ir be a subset of OBS(d), such that thevariables in OBS(d) — ir. are, according to the decision maker, irrelevant to the decisiond given r.Chapter 2. Decision networks: the concept 38Before one can solve a problem, one needs first to clearly state the problem. Theconcept of multiple-decision problems is introduced as a way to pose a decision problem.A multiple-decision problem is a set V = {< > 1 i k}, where the d1’s aredecision variables and for each i, is the set of variables depending upon whose valuesthe decision maker is to choose a value for d1.The further extended oil wildcatter problem (Figure 2.11) is a multiple-decision problem. The decision maker needs to decide on a value for each of the following variables: test, drill, gas-sale-policy, and oil-sale-policy. The r0’s are as follows:est = , riii = {test-result}, 7tassa1epo1icy = {gas-produced, gas—market},and 7ti1_saie_po1icy = { oil—produced, oil—market }.Given a multiple-decision problem V, define a partial ordering among its variables asfollows: for any two variables x and y, we say that x precedes y if xE7r, or if there isanother variable z such that xEir and z precedes y.The fundamental constraint that a multi-decision problem must obey is the so-calledacyclicity constraint, which require that there do not exist two variables x and y such thatboth x precedes y and y precedes x. The reason for this constraint is that the precedencerelationship defined above implies time precedence. More explicitly, if x precedes y, thenthe value of x is observed or determined before the value of y.2.3.3 Technical preparationsGiven a multiple decision problem V, let X be the set of all the variables in V and othervariables that are relevant to the problem. For the further extended oil wildcatter problem, X also contains oil-underground, gas-underground, and seismic-structure inaddition to the variables appeared in the problem statement, namely test, drill, gassale-policy, oil-sale-policy, test-result, gas-produced, gas-market, oil-produced,and oil-market.Chapter 2. Decision networks: the concept 39For any variable xEX, let ! be the frame of x, i.e. the set of all possible values ofx. For any subset B X, let fZB = lJaEB Zr.To determine a value for d based on the values of the variables in is to choose afunction 6: : —* c. Such a function is called a decision function (table) for d:. LetL, denote the set of all the decision functions for d. The policy space is the Cartesianproduct L = fl An element of L is called a policy.2.3.4 Deriving the concept of decision networksOne needs to have the necessary knowledge to solve a problem. This subsection developsdecision networks as a framework for specifying the knowledge (beliefs) and utilities thatare required in order to solve a multiple-decision problem.If the decision maker wants to solve a multiple-decision problem V in the setup givenin Subsection 2.3.1, then s/he needs, according to the second item of the setup, to comeup with a probability Ps(X) for each policy 6. When obtained, P5 would contain moreinformation than is conveyed by V and 6. The portion of information conveyed by P5that is not conveyed by V and 6 should originate from the decision maker’s knowledgeand beliefs about the uncertainties involved in the decision situation. Equipped withBayesian networks, we are able to explicitly spell out this portion of information, asdemonstrated in the following.Assume P5(X) were somehow obtained. An expansion ordering for P5(X) conformsto V if for each clj, variables in precede d in the ordering. One can easily verify thatsuch an ordering is possible since V must be acyclic.Given an expansion ordering p: x1, ..., x that conforms to V, we could, as in theprevious section, expand P5(X), determine the 7rr’s, and construct a Bayesian network.Denoted by .iV5, this Bayesian network would contain the following information:Chapter 2. Decision networks: the concept 401. For each decision node d, the conditional probabilityP6(dI?rd) and the fact that(Facti:) d is independent of all the variables that come before d in p given thevariables in 7rd; and2. for each random node c, the conditional probability P5(cr) and the fact that(Fact2:) c is independent of all the variables that come before c in p given thevariables in 7r.Since 7rd and have the same semantics, we have 7rz,=ir.. Hence Facti would havecome from the problem statement V; and the conditional probability Fs(dI7rd1)wouldhave come from the policy 6.One the other hand, Fact2 and the conditional probability Fs(cIir) do not follow fromeither V or 6, and hence must have come from the decision maker. They represent thedecision maker’s knowledge and beliefs about the uncertainties involved in the decisionsituation and need to be elicited before the decision problem V can be solved in Bayesiandecision theory.We now turn to utility. According to item 3 in the setup of Subsection 2.3.1, thedecision maker needs also to express his preferences about the possible state of affairs bya utility function 1z(X). 1z(X) can sometimes be decomposed into the sum of a numberof components, each of which depends only on a number of variables. Suppose t(X)decomposes into m components pi(Zi) + ... + gUm(Zm), where Z: is the set of variableof which ,u depends upon. Introduce a value variable v for each ,ij, and attach v tothe Bayesian network .iV5 by drawing arcs from each of the variables of Z1 to v. In thefollowing, we shall write Z as and t(Z) asTo summarize the discussions above and in Subsection 2.3.2, the decision maker needsto do the following in order to solve a multiple-decision problem in Bayesian decisiontheory:Chapter 2. Decision networks: the concept 411. specify the decision variables whose values are to be to determined, and the randomvariables and value variables that are related to those decision variables;2. for each decision variable d, specify the set of observed variables whose valuesare relevant to cli,3. determine an ordering p among all the variables such that p conforms to the problemstatement {< d, ir > }. Let p[< xl denote the set of nodes that come before xin the ordering p.4. for each random variable c, specify a subset 7r of p[< c] such that c is P(clp[< c]) =P(cr), and specify the conditional probability P(cIr);5. for each value variable v, specify the subset i, of variables in p[< v] that v dependsupon and specify the utility function ,u (ir).We call the collection of all the information specified in items 1, 2, 4, and 5 a decisionnetwork . Thus, a decision network represents the decision maker’s knowledge (beliefs)and preferences (utilities) that are needed in order to solve a multiple-decision problem inBayesian decision theory. The ordering p is not included as part of the decision networkbecause it can be arbitrary as long as it conforms to {< d, ir > Ii}Smith (1989) presents a way of developing the concept of influence diagrams (decisionnetworks) in terms of the so-called third part semantics. In this section, the concept ofdecision networks has been developed directly from a standard setup of Bayesian decisiontheory without using the third part semantics.2.3.5 An exampleAs an example, consider a decision scenario where a decision maker needs to decidewhether to bring-umbrella in light of weather forecast. An additional variable, rain,Chapter 2. Decision networks: the concept 42urbj(1) (2)Figure 2.13: Two decision networks for the rain-and-umbrella problem.which takes the value “yes” if it does turn out to rain and “no” otherwise, is believedto be relevant to the decision and hence is included in our analysis. For each decisionfunction 6 : fforecast “ bring—uinbre11a, the decision maker needs to come up with ajoint probabilityP3(rain, forecast, bring — umbrella). The expansion ordering rain,forecast, bring-umbrella conforms to the decision problem. If the decision maker’sutility — satisfaction — is a function of rain and bring—umbrella, then the decisionnetwork is as shown in Figure 2.13 (1). To complete the specification of this network, oneneeds to assess the prior probability of rain and the conditional probability for forecastgiven rain. One also needs to assess the utility function.The expansion ordering forecast, rain, bring-umbrella also conforms to the decision problem. It gives rise to the decision network shown on Figure 2.13 (2). The readerwill see later that one can go between those two networks by reversing the arc betweenforecast and rain using Bayes’ theorem (see Howard and Matheson 1984 and Section5.5).2.4 Fundamental constraintsIn the introduction we have seen that among the five constraints that define influence diagrams, the regularity, the no-forgetting, and the single value node constraints should belifted. This short section considers the remaining two constraints, namely the acyclicityChapter 2. Decision networks: the concept 43and the no-children-to-value-node constraints.In the derivation of decision networks in the previous section, we first had a Bayesiannetwork iV3 consisting of decision and random nodes. Then each value node v wasattached to .iV by drawing arc from those nodes in .iV that v depends upon. Thus, thevalue nodes do not have children. In other words, decision networks always satisfy theno-children-to-value-node constraint.There is one issue that needs to be addressed. In the last section, we have assumedthe set of value nodes does not intersect with the set of random and decision nodes. Thismay not be the case sometimes; there may be nodes that are value nodes and decisionor random nodes at the same time. For example, the amount of money x one spendsthe next month is a value variable. In the meantime, x is also a decision variable, andit affects how much one will be willing to spend the month after. In such a case, wewill have two copies of x: one copy xd functions as a decision node, while the other xfunctions as a value node. Since Xj is a decision node, one can set its value at his willand this value affects later decisions. On the other hand, the value node x dependson xd and it does not affect any other nodes. By appropriately introducing copies ofvariables, we can always ensure that the set of value nodes does not intersect with theset of random and decision nodes.We now turn to consider the acyclicity constraint. Decision networks must always beacyclic because multiple-decision problems are acyclic (subsection 2.3.2) and Bayesiannetworks acyclic. In the derivation of the last section, we began with a joint probabilityP3(X) which one must have in order to solve the multiple-decision problem V in Bayesiandecision theory. Because V is acyclic, we were able to have an expansion ordering p forP5(X) that conforms to V. The ordering p led to a Bayesian network )s. For any arcx—*y in the Bayesian network, x comes earlier than y in the ordering p. Thereforemust be acyclic. A decision network was obtained from by adding value nodes. SinceChapter 2. Decision networks: the concept 44the value nodes do not have any children, the decision network must also be acyclic.The acyclicity and the no-children-to-value-node constraints are the only two constraints we impose on decision networks. We have just argued that those two constraintsare fundamental and are indispensable to decision networks. In this sense, we say thatdecision networks are the most general representation framework for solving multipledecision problems in Bayesian decision theory.Chapter 3Decision networks: formal definitionsThe previous chapter has introduced the concept of decision networks. This chaptergives the exact definitions. We first formally define Bayesian networks (Section 3.1) andgive two properties of Bayesian networks that will be useful later in a number of places(Section 3.2). Then we present the formal definitions of decision networks and of theirevaluation (Section 3.3). A naive algorithm for evaluating decision networks is providedin Section 3.4. This algorithm is very inefficient because it simultaneously considers allthe decision nodes. A decision network is stepwise-solvable if it can be evaluated byconsidering one decision node at a time (Section 3.5). Obviously, stepwise-solvability isa desirable computational property. In the next three chapters, we shall discuss whena decision network is stepwise-solvable and how to evaluate a stepwise-solvable decisionnetwork. For that purpose, we need the auxiliary concept of semi-decision networks(Section 3.6).Starting from this chapter, we shall introduce various mathematical symbols. To helpthe reader to keep track of them, we have listed all the major symbols at the beginningof the thesis.3.1 Formal definition of Bayesian networksBefore getting started, let us note that in this thesis, standard graph theory terms suchas acyclic directed graphs, parents (direct predecessors), children (direct successors),predecessors, descendants (successors), leaves (nodes with no children), and roots (nodes45Chapter 3. Decision networks: formal definitions 46Figure 3.14: Bayesian network and irrelevant variables.with no parents) will be used without giving the definitions. The reader is directed toLauritzen et al (1990) for exact definitions. We shall use w to denote the set of parentsof a node x in a directed graph.A Bayesian network1 (Pearl 1988) .4’ is a triplet Al = (X, A,?), where1. X is a set of random nodes (variables); each xEX has a frame Q — the set ofpossible values of x;2. A is a set of arcs over X such that (X, A) is an acyclic directed graph; and3. 2 is a set {F(xIir)Ix C X}2 of conditional probabilities of the variables given theirrespective parents3.Figure 3.14 show a simple Bayesian network neti with seven variables a, b, c, d, e,f, and g. The network contains the following prior and conditional probabilities: F(a),F(f Ia), F(bla), F(clb), F(dIb), F(elc,d), and F(gif,e).networks are also known as belief networks, Bayesian belief networks, and probabilisticinfluence diagrams.2A conditional probability F(xIir) is a mapping P(xI7r) {}u7 —. [0, 1] such thatwEfl P(x=wir=j3) = 1 for each value fi of ir2,.3When x is a root, 7r is empty. When it is the case, F(x17r3,) stands for thç prior probability of x.neti net2 net)Chapter 3. Decision networks: formal definitions 47The prior joint probabilityP1r(X)4 of a Bayesian network .V = (X, A, 2) is definedbyPAr(X)= H F(xir). (3.7)rEXIn words, 1(X) is the pointwise multiplication of all the conditional probabilities in iV.For any subset B X, the marginal probability P(B) is defined byP(B) = P(X), (3.8)X-Bwhere Zx —B means summation over all. the possible values of the variables in the setX-B.A note about notation usage: In equation (3.7) the range of the multiplication is specified by the sign “e”; xEX means x ranges over X. In equation (3.8) there is no “e” sign.As a convention, we use X—B P(X) as an abbreviation of BEB P(wB,ix_B),where wx_B stands for a general member of cZX_B, WB stands for a general memberof fZB, and (WB, wX-B) is thus a general member of x• We shall always follow thisconvention about notation usage throughout this thesis.For any two subsets B1,B2 Y of variables, the conditional probability P(B1IB2)isa function that satisfiesP(B1=/3,B2=3)=P(B2/)P(Bi=/3iIB/ V/31EB,V/32EB. (3.9)For technical convenience, we also introduce the auxiliary concept of semi-Bayesiannetworks. A semi-Bayesian network is a Bayesian network except that the prior probabilities of some of the root nodes are missing. More precisely, a semi-Bayesian network is a quadruplet Al = (X, A, P IS), where (X, A) is a acyclic directed graph, P ={F(xI7r)IxEX—S} is set of conditional probabilities, and S is the set of root nodes whoseprior probabilities are missing.4A function from f2x to [0, 1].Chapter 3. Decision networks: formal definitions 48It follows from the definition that Bayesian networks are semi-Bayesian networks.As in Bayesian networks, we can define Pg(X) as follows,P(X)= fJ P(xr). (3.10)Unlike in Bayesian networks, here P(X) usually is not a probability; it may not sumto one. Thus, it is called the prior joint potential instead of the prior joint probability.Marginal and conditional potentials can be defined from the joint potential in the sameway as marginal probabilities are defined from joint probabilities.Note that since there are no arcs from X—S to S, the prior joint potential P#(X) isnothing but the conditional probability of the variables in X—S given variables in S. Forexample, net3 in Figure 3.14 is a not Bayesian network if we have only P(clb), P(dlb), andP(ec, d) but not P(b). In this case, net3 is a semi-Bayesian network. The multiplicationof all the conditional probabilities yields the conditional probability P(c, d, eb).3.2 Variables irrelevant to a queryGiven a (semi-)Bayesian network ./V, one can pose a query ?P(BiIB2). It is often possible to graphically identify certain variables being irrelevant to the query ?PAr(Bl B2).This issue is addressed in Geiger et al (1990), Lauritzen et al (1990), and Baker and Boult(1990). The materials in the reminder of this section are extracted from those papers.To remove a node x from a semi-Bayesian network .A[ = (X, A, PS) is to: (1) removex from X, (2) remove from A all the arcs that contain x, (3) remove from P all the itemsthat involve x, and (4) those nodes that were not roots and become roots because of theremoval are added to S.We notice that removing a node from a Bayesian network may create root nodeswhich do not have prior probabilities. This is why we need the concept of semi-Bayesiannetwork.Chapter 3. Decision networks: formal definitions 49A leaf node is barren w.r.t a query ?PK(BlB2),if it is not in B1UB2. In neti (Figure3.14), g is barren w.r.t ?Pneji(elb). The following proposition says that if a leaf nodeis barren w.r.t a query, then it is irrelevant to the query, and hence can be harmlesslyremoved.Lemma 3.1 Suppose Al is a semi-Bayesian network, and x is a leaf node. Let Al’ bethe semi-Bayesian network obtained from Al by removing x. If x is barren w.r.t to thequery ?P.,(BiB2), thenPAr(B1IB2)= P’(BiIB2). (3.11)Consider computing Fneti(elb). The node g is barren w.r.t the query and henceirrelevant. According to Lemma 3.1, g can be harmlessly removed. This creates a newbarren node f. After the removal of g and f, neti becomes net2. Thus the query?Fneti(eb) is reduced to the query ?Pt2(eIb = b0).Let An(B1UB2)be the ancestral set ofB1UB2,i.e the set of nodes in B1UB2 and theancestors of those nodes. By repeatedly applying Lemma 3.1, we getProposition 3.1 All the nodes outside An(B1UB2)are irrelevant to the query ?P(B1IB2).Let G = (X, A) be a directed graph. An arc from x to y is written as an ordered pair(x, y). The moral graph m(G) of G is an undirected graph m(G) = (X, E) whose edgeset E is given byB = {{x,y}(x,y) or (y,x) E A, or z such that (x,z) and (y,z) E A}.In words, {x, y} is an edge in the moral graph if either there is an arc between the twovertices or they share a common child. The term moral graph was chosen because twonodes with a common child are “married” into an edge (Lauritzen and Spiegehalter 1988).Chapter 3. Decision networks: formal definitions 50In an undirected graph, two nodes x and y are separated by a set of nodes S if everypath connecting them contains at least one node in S. In a directed graph C, x and yare rn-separated by S if they are separated by S in the moral graph m(G)5. Note thatany node set separates itself from any other set.Proposition 3.2 Suppose .iV is a semi-Bayesian network. Let IV’ be the serni-Bayesiannetwork obtained from J’f by removing all the nodes that are not in B2 and are rn-separatedfrom B1 by B2. ThenPAr(BlB2)= Pp(BiB2). (3.12)In our example, since a is rn-separated from e by b in net2, the query can be furtherreduced to Fne,a(eIb). Note that a is not rn-separated from e by bin net 1.It can be proved (Lauritzen et al 1990 and Geiger et al 1990) that all the nodesirrelevant to a query ?Pj(Bi 1B2) can be recognized and removed by applying Proposition3.1 and Proposition 3.2.3.3 Formal definitions of decision networksA decision network skeleton is an acyclic directed graph K = (Y A), which consists ofthree types of nodes: random nodes, decision nodes, and values nodes; and where thevalue nodes have no children.A decision network skeleton describes a decision problem at the level of relation. Itcontains the set of parents lrd for each decision node d, the set of parents 7r for eachrandom nodes, and the set of parents ir for each value nodes. See Subsection 2.3.4.A decision network iV is a quadruplet .V=(Y A, 2, F) where5To relate rn-separation to d-separation (Pearl 1988), Lauritzen et al (1990) have shown that Sd-separates B1 and B2 if and only if S rn-separates B1 and B2 in the ancestral set An(B1USU2.Chapter 3. Decision networks: formal definitions 511. (Y, A) is a decision network skeleton. Let us use C to denote the set of randomnodes, D to denote the set of decision nodes, and V to denote the set of valuenodes.2. Each yeY has a frame ,— the set of possible values of y.3. P is a set {P(cr)IcEC} of conditional probabilities of the random nodes giventheir respective parents.4. F is a set {, : fZ—R1IveV} of value (utility) functions for the value nodes,where R1 stands for the real line.A decision network is obtained from a decision network skeleton by providing numerical information, i.e by specifying a frame for each variable, providing a conditionalprobability of each random node, and a value function for each value node. We say that(Y, A, P, F) is a decision network over the skeleton (Y A) , and that (}‘ A) is the skeletonof the decision network (Y A, 7’, F).A decision function (table) for a decision node d is a mapping S —fThe decision function space LS., for d1 is the set of all the decision functions for d1. LetD = {d1,. . . , dk} be the set of all the decision nodes. The Cartesian product L fl /is called the policy space for .iV, and a member S=(S, . . . , dk) E L is called a policyNote that while a decision function S is a function, a policy S is a vector of decisionfunctions.The relationship between a decision node d and its parents 1rd ‘as indicated by adecision function S : —f ftj is equivalent to the relationship as represented by theconditional probability P (d 1rd) given by(i if &(i3)=cPS:(di=aIrdg=/3) = (3.13)I. 0 otherwise,Chapter 3. Decision networks: formal definitions 52for all cEfd, and/9Ed.Since 6=(6,.. . , Sk), we sometimes writeP5(dj7rd) for F6(dj-d). Because of equation(3.13), we will abuse the symbol 6 by letting it also denote the set {P6(dIird)Idj€D} ofconditional probabilities of the decision nodes.In a decision network .‘V=(Y, A, P, :F), let X=CUD. Let Ax be the set of all thearcs of A that lie completely in X. Then the triplet (X, A, PUS) is a Bayesian network,where S denotes a set of conditional probabilities of the decision nodes. We shall refer tothis Bayesian network the Bayesian network induced from ./V by the policy 6, and writeit as jV. The prior joint probability P5(X) of .iV5 is given byF5(X)= JJ F(xIr) [J Ps(xIir). (3.14)rEC EDWe shall refer to P3(X) as the joint probability over X induced by SBecause the value nodes do not have children, for any value node v, 7r contains novalue nodes. Hence The expectation Es[v] of the value function (ir) of v underP3 is given byEs[v] =x= (3.15)The expected value E3[] of .JV under the policy S is defined byEs[V] = Es[vJ (3.16)VEV= P5(X)E/V(7rV). (3.17)X vEVLet us point it out again that and Zx mean summation over all the possible valuesof and X respectively, while means summation over the set V. See Section 3.1for a note about notation usage.Chapter 3. Decision networks: formal definitions 53The optimal expected value E[J’/] of .,V is defined byE[.A/]= maxoEE6[.A/]. (3.18)The optimal value of a decision network that does not have any value nodes is zero. Anoptimal policy 5°=(Sç,.. .,6) is one that satisfiesE50[V] = E[A/]. (3.19)We call an optimal decision function (table) of d. For a decision network that doesnot have any value nodes, all policies are optimal.In this thesis, we shall only consider variables with finite frames. Hence there are onlyfinitely many possible policies. Consequently, there always exists at least one optimalpolicy.To evaluate a decision network is to1. find an optimal policy, and2. find the optimal expected value.3.4 A naive algorithmA straightforward approach to the evaluation of decision networks is to simply followthe definitions of optimal policy and of optimal expected value, and exhaustively searchthrough the policy space /1 This idea is made explicit by the following algorithm.Procedure NAIVE-EVALUATE:• Input: .iV a decision network.• Output: An optimal policy and the optimal expected value of A/.Chapter 3. Decision networks: formal definitions 54Let z be the policy space of Al.1. Pick one policy from L and denoted it by 6°. Set /. =2. Compute E50[A’].3. While L 0, do• Pick one policy from z and denoted if by 6. Set =• Compute E6 [V].• If Es[.iV] > E30[.Vj, set 80 = 6.end-while4. Output 60 and Eso[JV1.Though simple, this naive algorithm is very inefficient. The main reason is that itsimultaneously considers all the decision nodes. This results in an exhaustive searchthrough the policy space , which can be computationally prohibitive. Suppose thereare k decision nodes, each has 1 parents, and suppose all the variables are binary. Thenfor each decision node d, the cardinality of fZd is 2’; hence there are 2(i) possible decisionfunctions for d. Consequently there are (2(21))1v policies in Zi. The procedure NAIVE-EVALUATE computes the expected value of Jl for each of the (2(2))/ policies!There are decision networks whose evaluation necessitates simultaneous consideration of all the decision nodes. As an example, consider the decision network (skeleton)in Figure 3.15. Enemy movements may be observed by both agenti and agent2. Anagent decides whether or not to report enemy movements according to the instructionsestablished beforehand by the intelligence office. If an agent reports, there is a chancethat s/he may be exposed.Chapter 3. Decision networks: formal definitions 55Figure 3.15: A decision network whose evaluation may require simultaneous considerationof all the decision nodes.For the sake of illustration, assume all the variables either take the value YES orNC, except for the variable value, which takes real numbers. To complete the specification, we need to give the following (conditional) probabilities: P(enemy-movement),P(observed-by-agentl lenemy-movement), P(observed-by-agent2 I enemy-movement),P(agentl-exposedlagent 1-reports), and P(agent2-exposedlagent2-reports). Weneed also to give value function Lv&iue(enemym0vement, agent 1-exposed, agent2-exposed).There are four possible instructions (decision functions) for agentl:1. If observed-by-agent 1 =YES, then agent 1 -report s=YES;If observed-by-agentl=NO, then agentl-reports=YES.2. If observed-by-agent 1=YES, then agent 1-reports=YES;If observed-by-agent i=NO, then agent 1-reports=NO.3. If observed-by-agent 1=YES, then agent 1-report s=NO;If observed-by-agentl=NO, then agentl-reports=YES.Chapter 3. Decision networks: formal definitions 564. If observed-by-agent i=YES, then agent 1-report s=NO;If observed-by-agent i=NO, then agent 1-report s=NO.Similarly, there are four possible instructions for agent2. The policies (instructionsfor both agents) for the problem are given as follows:1. If observed-by--agentl=YES, then agentl-reports=YES;If observed-by-agentl=NO, then agentl-reports=YES.andIf obs erved-by-agent2=YES, then agent 2-report s=YES;If observed-by-agent2=NO, then agent2-reports=YES.2. If observed-by-agent i=YES, then agent 1-report s=YES;If observed-by-agentl=NO, then agentl-reports=NO.andIf observed-by-agent2YES, then agent2-reports=YES;If observed-by-agent2=NO, then agent2-reports=YES.3. and so onOne can easily see that the policy space consist of (2(2’))2 = 4 * 4 = 16 possible policies.In assessing his utilities, the intelligence office needs to keep both agents in mind. Forexample, it may be the case that information about an particular enemy movement isimportant enought to risk one agent but not both. The instructions for such a situationshould allow one and only one agent to report. The instructions can require, for instance,agenti to report when the information is deemed important enough to risk one agent,and require agent2 to report only when the information is deemed important enough torisk both agents. Such instructions can be arrived at only by considering the two agentsChapter 3. Decision networks: formal definitions 57simultaneously. In Chapter 8, we shall formally prove that with appropriate probabilitiesand value functions, optimal policies for the decision network in Figure 3.15 can be foundonly by considering the two decisions at the same time.On the other hand, however, there are decision networks which allow more efficientalgorithms than NAIVE-EVALUATE. The best case is when a decision network canbe evaluated by considering one decision node at a time. This leads to the concept ofstepwise-solvability.3.5 Stepwise-solvabilityLet .iV be a decision network. Let d1, d2,..., dii, be all the decision nodes in .A[, and let8=(6, 62,.. .,6) be a policy of .A1, where 63 is a decision function of d3. The expectedvalue Es[iVl = E(sl,52,...,sk)[.A/] is a function of 6, 82 ..., and 6k.For any iE{1, 2,..., k}, if we fix the value of 8 for all j{1, 2,... , k} such that ji,then E(5,,...,s_l,5,s+l,...,sk)[JV] becomes a function of 6. Rank all the possible values of Sj,i.e all the possible decision functions of d, according to the value E(51If the decision function (for d) that is ranked the highest remains the same regardless ofthe values of the Si’s (ji), then we say that d is a stepwise-solvability candidate nodeor simply an SS candidate node of .iV.A deterministic node is a random node whose conditional probability takes the valueeither 0 or 1. To replace a decision node d3 by a deterministic node characterized by afunction 63 : — ! (Shachter 1988) is to replace d3 by a deterministic node with thesame frame, and to set P(d317r) to be the conditional probability that represents 8 inthe sense of equation (3.13).If d1 is an SS candidate node, then an optimal decision function 8 can be foundas follows. For all jE{1, 2,. .. , k} such that ji, replace the decision node d, by aChapter 3. Decision networks: formal definitions 58deterministic random node characterized by an arbitrary decision function 6j, resultingin a decision network with only one decision node d. Then find a policy S,° of d1 thatsatisfies[iV] = maxs1eE(61,6i+i 6k) [Al]. (3.20)Proposition 3.3 If d is an SS candidate node of a decision network Al, then an decisionfunction 6’ that satisfies equation (3.20) is an optimal decision function of d.Proof: Let (6,.. .,6, 6, 6,.. . , 6) be an optimal policy of Al. Since d is an SScandidate node and &? satisfies (3.20), we have5)[Al].Therefore, (6,...,6, 6,?, ,...,6) must also be an optimal policy of Al. Consequently, 6 must be an optimal decision function of d. The proposition is proved.A decision network is stepwise-solvable if it contains no decision nodes, or if1. there exists an SS candidate node d,, such that2. if d is replaced by a deterministic node characterized by an optimal decision function of d, the resulting decision network (with one less decision node) is stepwisesolvable.A decision network skeleton is stepwise-solvable if all the decision networks over theskeleton are stepwise-solvable.If a decision network Al is stepwise-solvable, then it can be evaluated as follows. Findan SS candidate node d and find an optimal decision function 6’ of d in the way asspecified in equation (3.20). Replace d1 by a deterministic node characterized by 6,,resulting in another stepwise-solvable decision network Al’ with one less decision node.Chapter 3. Decision networks: formal definitions 59Then recursively apply the procedure to .Af’, an so on so forth. We see here that A/ isevaluated by considering one decision node at a time.Suppose ./V is a stepwise-solvable decision network with k decision nodes. Supposeeach decision node has 1 parents, and suppose all the variables are binary. Then toevaluate V, we need to compute the expected values of jV for (2(2)) * k policies, insteadof(221))k policies as in the case of NAIVE-EVALUATE.Note that the aforementioned evaluation method is not the best. The term caneasily be prohibitively large. We shall show that when a decision network is stepwisesolvable, it can be solved not only by considering one decision node at a time, by alsoby considering one, usually small, part of the network at a time. The complexity can bereduced to that of computing no more than 2k + m marginal probabilities of no morethan K + 1 variables or computing no more than (2k + m)2 numbers, where m standsfor the number of value nodes.So, stepwise-solvability is a very desirable property for a decision network to possess.In the next three chapters, we shall investigate when a decision network is stepwisesolvable and what is the best way to evaluate a stepwise-solvable decision network. Forthis purpose, we need the technical concept of semi-decision network.3.6 Semi-decision networksThe reader has seen that removing nodes from a Bayesian network may create root nodeswhich do not have prior probabilities. This is why we need the concept of semi-Bayesiannetwork. We shall also be discussing removing nodes from decision networks, whichnecessitates the concept of semi-decision networks.A semi-decision network is a decision network except that the prior probabilities ofsome of the root random nodes are missing. We use .V=(Y A, 2, FIS) to denotes aChapter 3. Decision networks: formal definitions 60semi-decision network, where S is the set of root random nodes whose prior probabilitiesare missing.As before, let X = C U D be the set of random and decision nodes. Similarly to thecase of decision networks, a policy 6 induces a semi-Bayesian network (X, A, PU6S),which will be referred to as the semi-Bayesian network induced from .A/ by the policy 6and which will be written as .iV5 Let P3(X) be the prior joint potential of .iV5For any value node vEV, irX. The expected value E3[V] of .,V under the jointpotential P5(X) is defined byE5[iV] = P5(X) t,(ir). (3.21)X vEVThe optimal expected value E[.A/] of / is defined byE[iV] = maxsEAES[JV].An optimal policy 6° is one that satisfiesUnlike in the case of decision networks, we also define the concept of conditionalexpected value for semi-decision networks. Th conditional expected value E5[VS] of iVgiven S is defined byE6[VS] = P5(X) (ir). (3.22)X-S vEVObviously Es[Ji9S} is a function of S.We chose the term “conditional expected value” because that the prior joint potentialFAr(X) is nothing but the conditional probability of the variables in X—S given S (seethe note at the end of Section 3.1)The optimal conditional expected value E[VIS] of .iV’ given S is defined byChapter 3. Decision networks: formal definitions 61E[J’f IS] = maxoEEs[JVIS].An optimal conditional policy 6° is one that satisfiesEso[AIS] = E[A’S],for all possible values of S.Proposition 3.4 A conditionally optimal policy of a semi-decision network, if exits, isalways an optimal policy.Proof: From equations (3.21) and (3.22), we have= Es[VIS].SLet 6° be a conditional optimal policy and let 6 be an arbitrary policy of .iV. Then= E5[S] Es[S] = E5[Aq.S SThereforeE50[V] maxsE6[V].In words, 6° is an optimal policy of jV. CFor a semi-decision network, there always exists at least one optimal policy. But theremay not necessarily exist any conditional optimal policies.Given a semi-decision network .iV, if every optimal policy of .iV is also a conditionallyoptimal policy, then we say that .A1 is uniform.We shall investigate when a semi-decision network is uniform later.Chapter 4Divide and conquer in decision networksThis chapter and the next two chapters constitute the heart of this thesis; they introduceand study one subclass of decision networks, namely stepwise-decomposable decisionnetworks (SDDN’s). SDDN’s are important because they are stepwise-solvable, and aswe shall show in Chapter 8 stepwise-decomposability is the weakest graph-theoreticalcriterion that guarantees stepwise-solvability.This chapter investigates when and how a decision network can be decomposed intotwo subnetworks such that the optimal expected value and an optimal policy of thedecision network can be computed by evaluating the two subnetworks. The next twochapters are concerned with how to evaluate decision networks that can be decomposedinto ii— the number of decision nodes — subnetworks such that the optimal expectedvalues and optimal policies of the decision networks can be computed by evaluating then subnetworks.The organization of this chapter is as follows. Section 4.1 discusses the relationshipbetween independence in a decision network and separation in the underlying decisionnetwork skeleton. Section 4.2 defines a concept of decomposability for decision networks,and Section 4.3 shows that this concept of decomposability implies a divide and conquerevaluation strategy.Since manipulation of decision networks gives rise to semi-decision networks and deci• sion networks are special semi-decision networks, exposition in this chapter will be carriedout in terms of semi-decision networks.62Chapter 4. Divide and conquer in decision networks 63Figure 4.16: The relationships among the sets Y, Y1, Y11, X1, X11, and id. The threesets Y1, Y1 and 7Td constitute a partition of Y — the set of all the nodes; while X1, X11and lrd constitute a partion of CUD — the set of all the random and decision nodes.When the network is smooth at d, there are no arcs going from X11 to ire.4.1 Separation and independenceThe main goal of this chapter is to prove Theorem 4.1, one of the most important theorems in this thesis. In preparation, this section exposes the relationship between graph-theoretic separation and probabilistic independence in the context of decision networks.Suppose AC = (Y A) is decision network skeleton and d is a decision node in AC. LetY1(d, K;), or simply Y1 be the set of all the nodes that are rn-separated from d by 7rd, withthe nodes in 7rd itself excluded. Let Y11(d, K;), or simply Y11 be the set of all the nodesthat are not rn-separated from d by ?rd. We observe that Yi, lrd, and Y11 forms a partitionof Y. Let X1(d,K;) =Y1(d,K;)fl(CUD) andX11(d,K;) =11(d,AC)fl(CUD).The relationships among the sets are illustrated in Figure 4.16. In the following, weshall refer to Y1 as the upstream set of ire, Y11 as the downstream set of irj. We shall alsorefer to X1 as the set of random and decision nodes in the upstream of ire, and X11 asthe set of random and decision nodes in the downstream ofConsider a semi-decision network .iV=(Y A, 2, .9S). Let S be a policy of .jV. Aspointed out by Lauritzen et al (1990), m-separation in the skeleton (Y A) implies conditional independence for P5(X) — the joint potential over X induced by 6. Since 7rdChapter 4. Divide and conquer in decision networks 64rn-separates Xj and we have that Fo(Xn (Xi, irt) = Ps(XiIjlrd). ThereforeF5(X1,7rd, X11) = F5(X1,Kd)Fs(XIIIXI, Ird)= F3(X1,lrd)F6(Xii(lrd). (4.23)The rest of this section seeks an explicit representation ofF5(Xj, lrd) and Fs(XIIjlrd)in terms of conditional probabilities.A decision network is smooth at the decision node d, if there are no arcs going fromthe downstream set Y11 of irj to nodes in irj. In other words, arcs between irj and Y1only go from 7rd to Y11.As an example, consider the decision network skeleton for the further extended oilwildcatter problem (Figure 2.11). The downstream set of lroilsale..policy consists ofoil—sale—policy and oil—sales. There are no arcs from oil—sales to nodes inKoilsalepolicy. So, the skeleton is smooth at oil—sale—policy. One the other hand,the downstream set of 7r11 contains all the nodes except test-cost and the nodesin ir&j11. In particular, the downstream set contains the node seismic-structure.Because of the arc from seismic—structure to test—result (e Rcjrjll), the decisionnetwork skeleton is not smooth at drill.Let d1, ..., d be all the decision nodes in XIUKd and ..., 4 be all the decisionnodes in X11. Note that d C {d1, ..., dk}. For a policy S = (6k,... ,Sk), let S =(6k,. . . , Sj) and= (Si+i,. . . ,Suppose the semi-decision network .1%! is smooth at d. It follows from Proposition 3.1thatFs(XI,wd)= fi F(xI)flFo1(dI(Kd). (4.24)xECfl(XzUlrd) i=1• And it follows from Proposition 3.2 thatkPs(XII(2rd)= II F(xIK) [J F51(dIIlrd)). (4.25)xECflX11 i=j+1Chapter 4. Divide and conquer in decision networks 65Those two equations give us the following lemma.Lemma 4.1 IfV is smooth at d, then P8(XI,lrd) only depends on 5, and Ps(XII1rd)only depends on 5ii. From now on, we shall writeP5(X1,lrd) asF31(X1,Ird), and Fs(XIIIlrd)as P311(XII7rd).4.2 Decomposability of decision networksAlso in preparation for Theorem 4.1, this section introduces a concept of decomposabilityfor decision networks and shows how to divide a decomposable decision network into twosubnetworks.A decision network skeleton AC=(Y A) is decomposable at a decision node d if thenumber of decision node in Yij(d, K) is less than the number of decision nodes in K. Asemi-decision network is decomposable at a decision node d if the underlying skeleton is.When a decision network skeleton K is decomposable and smooth at d, we definethe downstream component of JC w.r.t d, denoted byK11(d,K) or simply K, to be thedecision network skeleton obtained by restricting K to 7rdUYII and then removing thosearcs that connect two nodes in 7rd.Also, we define the upstream component of AC w.r.t d, denoted by 1C(d, AC) or simplyK1, to be the decision network skeleton obtained by restricting AC to YJUlrd and thenadding a node u and drawing an arc from each node in ird to u. The node u is to be usedto store the value of the downstream component, and is thus called the downstream-valuenode.Figure 4.17 shows the downstream and upstream components, w.r.t oil-sale-policy,of the decision network skeleton in Figure 2.11.Note that while Yi and YH are sets of nodes, AC1 and ACH are decision network skeletons. AC1 and AC11 contain the nodes in ird, while Y1 and Y11 do not.Chapter 4. Divide and conquer in decision networks 66Let iV be a decision network over K, and suppose .iV (or PC) is decomposable andsmooth at d. The downstream component of .iV w.r.t d, denoted by .A/11(d, .iV) or simplyby Ni1, is a semi-decision network over PC11. The value functions for all the value nodes ofJV remain the same at in .A[. The conditional probabilities of the random nodes of N11that lie outside ir also remain the same as in N. The nodes in 7rj, random or decision,are viewed in J’/ii as random nodes whose prior probabilities are missing.The upstream component of .,V’ w.r.t d, denoted by N1(d,A1) or simply by N1, is asemi-decision network over PCi. The conditional probabilities of all the random nodesremain the same as in .iV. The values functions of the value nodes other than u alsoremain the same as in N. The value function [L(7rd) of the downstream-value node u isthe optimal conditional expected value E[NIIrd] of the downstream component N11.Since the decision node d is not in the upstream component N1, the number of decisionnodes in N1 is less than the number of decision nodes in N. Since the decision nodesin rd, if any, are treated as random nodes in N11 and since the .iV decomposes at d, thenumber of decision nodes in NH is also less than the number of decision nodes in N.Furthermore the number of decision nodes in N1 plus the number of decision nodes inNj-i equals the number of decision nodes in .,V.We shall later define the concepts of downstream and upstream components for thecase when PC and N are not smooth at d.4.2.1 Properties of upstream and downstream componentsThis subsection gives several properties of upstream and downstream components ofdecomposable decision networks. Those properties will be useful in the proof of Theorem4.1.Given a policy S = . , Sk), the downstream component N11 is a semi-Bayesiannetwork. Let P.kr11,6(wd,XI1) denote the prior joint potential of this semi-BayesianChapter 4. Divide and conquer in decision networks 67Figure 4.17: Downstream and upstream components: The downstream component is asemi-decision network, where the prior probabilities for oil-produced and oil-marketare missing. In the upstream component, u is the downstream-value node, whose valuefunction is the optimal conditional expected value of the downstream component.network. From the definition of the downstream component, one can see that if .iV issmooth at d, thenkPj.r11,o(1rd,XII)= fJ P(xr) fJ Ps(d1rdj),xECflX11which is the same as the right hand side of equation (4.25). Therefore we haveLemma 4.2 Suppose a semi-decision network ./V is decomposable and smooth at decisionnode d. Let P311(XJzIlrd) be as in Lemma 4.1. ThenFAr11,s(rd,Xii) =F511(XIIIlrd). (4.26)Similarly, given a policy Si = (Si,. .. , So), the upstream .iV1 is a Bayesian network. LetP.&r1,51(Xi, S) denote the joint probability of this Bayesian network. From the definitionupstream componentDownst ream componentof the upstream component, one can see that if .IV is smooth at d, thenChapter 4. Divide arid conquer in decision networks 68Fr1.51(Xj,S)= fl P(xI)HPö(djI),XECfl(XIUlrd)which is equal to the right hand side of equation (4.24). Therefore we haveLemma 4.3 Suppose a semi-decision network .jV is decomposable and smooth at decisionnode d. Let P51(XI,rd) be as in Lemma .1. ThenPg1,61(X1lrd) = P51(X1,lrd). (4.27)The following proposition is especially important to the proof of Theorem 4.1.Proposition 4.1 Suppose a semi-decision network .iV is decomposable and smooth atdecision node d. Let Ps11(XIIIrd) and P31(XI,lrd) be as in Lemma 4.1. Let V1 and V11be the set of value nodes in A11 an JV11 respectively. Then the conditional expected valueof A11 under policy Sii satisfiesE311[./V’III7rd] =P511(XIIJrd) L,(7r), (4.28)X11 uEV11and the expected value of Al’1 under policy 6 satisfiesE51[.1V] = P51(X1,1rd){ iV(rV)+E[JVIiI1rd]}. (4.29)Xj,lrd vEV1Proof: By definition, we haveE511[Al’IIllrd] = Pg1,1(ir,X11)X11Thus equation (4.28) follows from equation (4.26).Again by definition, we haveEs1[V’i] = Pg1,5(XI,lrd){ r) + E[.JV’IIIlrd]}.XI,lrd VEV1Thus equation (4.29) follows from equation (4.27).Chapter 4. Divide and conquer in decision networks 694.3 Divide and conquerThis section shows how decomposability of (semi-)decision networks leads to a divideand conquer evaluation strategy.Theorem 4.1 Suppose a (semi-)decision network .,V is decomposable and smooth at decision node d. Let Ar11 be the downstream component of Ar w.r.t d, and Ar1 be the upstreamcomponent. If Ar11 is uniform, then1. If 6 is an optimal policy for Ar11 and 6 is an optimal policy for Ar1, then 30 =def(Sq, I) is an optimal policy for Ar.2. The optimal expected value E[Ar1] of the body Ar1 is the same as the optimal expectedvalue E[Ar] of Ar.The theorem divides the task of evaluating a (semi-)decision network Ar into two sub-tasks: the task of evaluating the downstream component Ar11 and the task of evaluatingthe upstream component Ar1.Applying the theorem to the decision network in Figure 4.17, we get that optimaldecision functions for oil-sale-policy can be found in the downstream component,which is much smaller than the original network. The optimal decision functions for allthe other decision nodes can be found in the upstream component, which is also smallerthan the original network. Furthermore, we can repeatedly apply the theorem to theupstream component.The following mathematical proof may test the reader’s patience, but it is the key tounderstanding the correctness of our later algorithms.Proof: For any policy S of Ar, we haveE3[Ar] = P6(X1,rd, X11) pv(7r,) (By definition)XI,lrd,XII veVChapter 4. Divide and conquer in decision networks 70= F&(Xi, rd)Psij(XIII7rd) (ir,) (By equations 4.23 and Lemma 4.1)X1,lrd ,X11 vEV= P31(X1,lrd){ r) > P511(XJIIlrd) + P11(Xir) tu(irv)}XI,lrd X11 X11 uEV= >Z Ps1(Xz,wi){ + ES11[JVIJwd]} (By equation 4.28). (4.30)XI,lrd vEVSince 6 is an optimal policy for JV11 and Jii is uniform,E511[NIrd] Esy[JViiiri] = E[.jVIIIirdj. (4.31)Noticing that P31 (X1,7rd) is non-negative, we have= P3(Xi, d){ ii() +E311[Id]} (By equation 4.30)XI,lrd VEV1P6(X, 7rd){ 1,(1rv) +E31[./ViIIrd]} (By equation 4.31)XJ,lrd vEV1= P5(X1,lrd){ i(irv) + E[JV’IIIlrdj} (By equation 4.31)XI,lrd vEV1= E51[.A11] (By equation 4.29)E6 [Jf] (Optimality of 6)= Po(XI,wd){E ,Lv(lrv)+E[.A(uLwd]} (By equation 4.29)XI,Trd VEV1= P3(X, 7rd){ ii(r) +E51[.ViI7rd]} (Optimality of 6)XJ,7rd vEV1= E60[/]. (By equation 4.30)Therefore, 60 is indeed an optimal policy for Al. The first statement of the theorem isproved.The foregoing derivation has also shown thatE3[ E3[Al1] E50[.Letting 6 be 6°, we get=Therefore E[Al] = E[Al1]. The proof is completed. 1JChapter 5Stepwise-decomposable decision networksThis chapter introduces and studies the most important concept of this thesis, namelystepwise-decomposable decision networks (SDDN). Roughly speaking, a SDDN is a decision network that can be decomposed into n — the number of decision nodes— subnetworks such that each subnetwork contains only one decision node and that the originalnetwork can be evaluated through the evaluation of those subnetworks (Section 5.4). Afirst reason why SDDN’s are computationally desirable is that the subnetworks may besubstantially smaller than the original network.A second reason why SDDN’s are computationally desirable is that each of the sub-networks is a semi-decision network with only one decision node. Single-decision-nodesemi-decision networks can be evaluated by enumerating the values of the parents of thedecision node instead of enumerating all the possible policies or decision functions (seeSection 5.5). Suppose that the decision node has n parents and that all the variables arebinary. Then, the parents can assume 2’ possible values, while there are 2(2) decisionfunctions!The organization of this chapter is as follows. The definition of SDDN’s is given inSection 5.1, and Section 5.2 shows that smooth SDDN’s are stepwise-solvable. The issueof testing stepwise-decomposability is addressed in Section 5.3. In Section 5.4, we discusshow to evaluate a smooth SDDN by using the divide and conquer strategy outlined inthe previous chapter. An algorithm is presented in Section 5.6, which makes use of thesubroutine given in Section 5.5 for evaluating simple semi-decision networks.71Chapter 5. Stepwise-decomposable decision networks 72Non-smooth SDDN’s are treated in the next chapter.5.1 DefinitionThis section defines stepwise-decomposable decision networks.In a decision network skeleton PC, a decision node d is a stepwise-decomposabilitycandidate node or simply an SD candidate node if ir1 rn-separates d from all otherdecision nodes and their parents. A decision node is an SD candidate node in a decisionnetwork ,V if it is an SD candidate node in the skeleton of .,V.As an example, consider the decision network skeleton in Figure 2.11). Both oil-sale—policyand gas-sale-policy are SD candidate nodes, while drill and test are not. The decision nodes oil-sale-policy and gas-sale-policy are not rn-separated from drill(test) by idri11 (irtest).Lemma 5.1 Suppose d is am SD candidate in a decision network skeleton PC. Then thedownstream set Y11(d, PC) contains only one decision node, which is d itself. So, if PCcontains more than one decision node, then it is decomposable at d. 1When d is an SD candidate node in decision network V and .iV is smooth at d, theupstream component JV of iV w.r.t d is called the body of .iV w.r.t d, and the downstreamcomponent of Al w.r.t d is called the tail of Al w.r.t d. Moreover, the downstreamvalue node in 1V will be referred to as the tail-value node.A decision network skeleton PC is stepwise-decomposable if either it contains zero orone decision node, or1. There exists an SD candidate decision node d, and2. The body AC1 of AC w.r.t d is stepwise-decomposable.Chapter 5. Stepwise-decomposable decision networks 73(2)(3)Figure 5.18: Step by step decomposition of the decision network in Figure 2.11.A (semi-)decision network is stepwise-decomposable if its underlying skeleton is. Theterm “stepwise-decomposable decision network” will be abbreviated to SDDN.Suppose a decision network .iV is stepwise-decomposable. If .iV contains more thanone decision node, then it has at least one candidate node d. According to Lemma 5.1, .iVis decomposable at d and it decomposes into a body .A11 and a tail. ./V1 is again stepwisedecomposable. If .A11 contains more than one decision node, we can again decomposeinto a body and a tail, an so and so forth, till there is only one decision node left in thebody. In other words, we can decompose an SDDN into a series of subnetworks (tails) ina step-by-step fashion. This is why the term “stepwise-decomposable” was chosen.As an example, let .iV be the decision network in Figure 2.11. .A/ is stepwisedecomposable. The node oil-sale-policy is an SD candidate node in .JV, and ./V(1)(4)decomposes at oil-sale-policy into a body .,Vj and at tail, as shown in Figure 4.17.Chapter 5. Stepwise-decomposable decision networks 74The node gas-sale-policy is an SD candidate node in .iV1, and .iV1 decomposes atgas-sale-policy into a tail and a body. The body is shown in Figure 5.18 (1), and thetail Figure 5.18(2). In Figure 5.18 (1), drill is an SD candidate node, and the networkdecomposes at drill into a body as shown in Figure 5.18 (3) and a tail as shown inFigure 5.18 (4).The decision network skeleton in Figure 3.15 contains two decision nodes, but no SDcandidate nodes. So, it is not stepwise-decomposable.5.1.1 Another way of recursionIn the definition of decomposability, the number of decision nodes is recursively reducedby cutting tails that contain a single decision node. Another way to recursively reducethe number of decision nodes is to replace them one by one with deterministic randomnodes. Let us first prove a lemma.Lemma 5.2 Let d be an SD candidate node in a decision network skeleton AC. Let ACbe the body of 1C w.r.t d, and let AC’ be the decision network skeleton obtained from AC byreplacing d by a deterministic node. Then AC1 is stepwise-decomposable if and only if AC’is.Proof: We prove this lemma by induction on the number of decision nodes in AC. Whend is the only decision node in AC, both AC1 and AC’ contain zero decision nodes, and henceboth are stepwise-decomposable.Suppose the lemma is true for the case of k — 1 decision nodes. Consider the case ofk decision nodes. One can easily verify that a decision node is an SD candidate node inAC1 if and only if it is an SD candidate node in AC’.Suppose a decision node d’ ( d) is a SD candidate node in AC1 (hence in AC’). Thereare two cases depending on whether or not d is in the downstream set of lrd’ in AC. WhenChapter 5. Stepwise-decomposable decision networksd is in the downstream set of lrd’, the body of K1 w.r.t d’ is the same as that of K’; henceKi is stepwise-decomposable if and oniy if K;’ is. When d is not in the downstream setof 7rd’, let K; be the body of K1 w.r.t d’, and K;* be that of K;’. Then K;* is the body ofK w.r.t d. By the induction hypothesis, K;* is stepwise-decomposable if and only if K;is. Therefore, K’ is stepwise-decomposable if and only if K is. The lemma is proved. 1JThis lemma leads directly to the following proposition.Proposition 5.1 A decision network skeleton is stepwise-decomposable if and only ifeither it contains no decision nodes or1. There exists an SD candidate decision node d, and2. If d is replaced by a deterministic node, the resulting decision network skeleton (withone less decision node) is stepwise-decomposable.One can view Proposition 5.1 as an alternative definition of stepwise-decomposability.The original definition is based an recursive construct that will be used directly in thealgorithms, while the recursive construct of this alternative definition is the same asthat of the definition of stepwise-solvability, which makes it convenient to study therelationship between stepwise-decomposability and stepwise-solvability, as the reader willsee in the next section.5.2 Stepwise.-decomposability and stepwise-solvabilityA decision network is smooth if it is smooth at every decision node.Theorem 5.1 A smooth decision network is stepwise-solvable if it is stepwise-decomposable.Proof: Let .,V be a smooth decision network and d be a decision node. Because ofProposition 5.1, it suffices to show that if d is an SD candidate node, then it is also anSS candidate node.Chapter 5. Stepwise-decomposable decision networks 76Suppose d is an SD candidate node. Let X1 be the set of random and decision nodesin the upstream of 7rd; let Jf and Jv11 be respectively the body and tail of .A/ w.r.t d; letVi be the set of value nodes .iV1; let 6 be a policy of An; and and let be a policy ofJV11, i.e a decision function of d. By equation (4.30) we have that= F31(X, lrd){ iiv(rv) +E511[JV11)lrd]}.X1,lrd vEV1Fixing Si, we can rank all the possible policies Sji of.A/11 according to the valueE(51,11)[An].Since P61 (X1,7rd) is non-negative, this ranking does not depend on the value of Si. Therefore d is an SS candidate node. The theorem is proved.We shall show later that the theorem is true also for non-smooth decision networks,and that under “normal” conditions stepwise-solvability implies stepwise- decomposabilityas well (Chapter 8).The remainder of this chapter is devoted to the following two questions: How can onetest stepwise-decomposability? How can one evaluate a smooth SDDN?5.3 Testing stepwise-decomposabilityIn a decision network, we say that a decision node d precedes another decision node d’if there is a directed path from d to d’. Decision nodes that precede no other decisionnodes are called leaf decision nodes.We say d weakly precedes another d’ if d’ is in the downstream set of irs. Decisionnodes that weakly precede no other decision nodes are called weak leaf decision nodesThe following lemma follows from the definition of SD candidate node and the definitionof downstream sets.Lemma 5.3 In a decision network, if node is an SD candidate decision node, then it isa weak leaf decision node.Chapter 5. Stepwise-decornposable decision networks 77Proof: Straightforward.Lemma 5.4 Let d and d’ be two decision nodes in a decision network. If d precedes d’,then d weakly precedes d’.Proof: Since d precedes d’, there is a directed path from d to d’. No nodes in this pathcan be in lrd, because otherwise there would be a cycle in the network. Hence, d’ is notm-separated from d by ire. Consequently, d’ is in the downstream set of d. The lemma istherefore proved. DCombining the forgoing two lemmas, we getProposition 5.2 In a decision network, an SD candidate decision node must be a leafdecision node. In other words, if a node is not a leaf decision node, then it cannot be acandidate node.Proof: Suppose d is a decision node but not a leaf decision node. Then there existsanother decision node d’ such that d precedes d’. By Lemma 5.4, d weakly precedes d’,hence d is not a weak leaf decision node. By Lemma 5.3, d cannot be a candidate node.CThis proposition leads to the following algorithm for testing if a decision networkskeleton is stepwise-decomposable.Procedure TESTSTEPWISEDECOMPOSABILITY(,AC):• Input: K — a decision network skeleton.• Output: “YES” or “NO” depending on whether K is stepwise-decomposable.If there are no decision nodes in AC, return “YES”.ElseChapter 5. Stepwise-decornposable decision networks 781. Find a leaf decision node d of IC.2. Check if d is an SD candidate decision node.• If d is not, find another leaf decision node d’ and go to 2 with d’. Ifthere are no more leaf decision nodes, return “NO”.• If d is an SD candidate decision node, compute the body K of K;w.r.t d, and recursively call TEST-STEPWISE-DECOMPOSABILITY(K;1).What is the running time of TEST-STEPWISE-DECOMPOSABILITY? Let n bethe total number of nodes in K;, k be the number of decision nodes, a be the number ofarcs, e be the number of edges in the moral graph of K;. Finding a leaf decision nodetakes at most 0(a) time. Testing if a decision node is an SD candidate node and thecomputation of a body are of the same order complexity as testing the connectivity ofthe moral graph of K;, which is 0(n + e) by either breadth-first search or depth-firstsearch. In worst case, all the decision nodes are leaf nodes and there is only one SDcandidate node. If the only candidate node is always tested the last, then the complexityof TEST-STEPWISE-DECOMPOSABILITY is0(k2(n+e--a)) = 0(k2(n+e)). On theother hand, if every leaf decision node tested is an SD candidate node, the complexity is0(k(n + e)).5.4 Recursive tail cuttingIn Section 5.6, we shall give an algorithm for evaluating smooth SDDN’s. In preparation,this section shows that an optimal policy for a smooth SDDN can be computed byrecursively evaluating tail semi-decision networks, and the next section studies how toevaluate a tail semi-decision network.Theorem 5.2 Let V be a SDDN and d be an SD candidate node. Let .N11 be the tailof!’! w.r.t d, and JJj be the body. Let S be an optimal policy for .iV11, i.e an optimalChapter 5. Stepwise-decomposable decision networks 79decision function of d, and let and 6 be an optimal policy for A/i. If.A/ is smooth at d,then1. 8° def (6, 6) is an optimal policy for .A/.. The optimal expected value E[.iV1] of the body A/j is the same as the optimal expectedvalue E[Ar] of Al.The proof is postponed to the end of this section. The theorem implies the followingstrategy of evaluating a smooth SDDN Al: first compute and evaluate a tail JV1 of .iV,then compute and evaluate a tail of Al1, and so on so forth. We shall refer to this strategyas recursive tail cutting5.4.1 Simple semi-decision networksThis subsection introduces the concept of simple semi-decision networks. The concept isimportant to the proof of Theorem 5.2, as well as to our later algorithms.A semi-decision network Al = (Y A, 1’, .FIS) is simple if it contains only one decisionnode d and lrd=S.Proposition 5.3 Suppose Al is a smooth SDDN and d is an SD candidate. Then1. The tail A/si of Al w.r.t d is a simple semi-decision network, and2. The body Al1 of Al w.r.t d is again a smooth SDDN.Proof: The proof is straightforward. 1Suppose Al=(Y A, 2, XIS) is a simple semi-decision network. Let 6d be a decisionfunction of the only decision node d of Al.The conditional expected value Esd[AIS] depend on S and 8d. Since 6d is a functionfrom to !d, Esd[.IVIS=/3] may depends possible on Sd(/3) for all /3Es.Chapter 5. Stepwise-decomposable decision networks 80Lemma 5.5 For any value/3Efs, Esd[JVS=/3] depends only on the value Sd(/3) of thedecision function 6d at 3, in the sense that fixing Sd(/3)fixes Esd[.AIIS=/3], no matter whatthe &C8’) ‘s (/3’Efs, 3’3) are.The proof is postponed till the end of this section. The implication of this lemma isthat Esd[.IVIS=/9] is really a function of 3 and the value Sd(/9) of 6d at /3. To make thismore explicit, let6d(/3)=a, then Es5[.NIS/3] is a function of /3 and . To signify thisfact, we shall sometimes write Esd[JV1S=/31 asThe following proposition will also be proved at the end of this section.Proposition 5.4 Simple semi-decision networks are uniform.The corollary below follows from Theorem 5.2 and Lemma 5.5Corollary 5.1 Let .A[ = (YA,P,FIS) be a simple semi-decision network and let d bethe only decision node. Then an optimal decision function & for d is given by= arg maxaEOdEsd:sd(,a[JViS/3]. (5.32)5.4.2 ProofsProof of Theorem 5.2: Because of Theorem 4.1, it suffices to show that JV11 is uniform. By Proposition 5.3, the tail .iV11 is a simple semi-decision network. According toProposition 5.4, must be uniform. The theorem is therefore proved. U.Proof of Lemma 5.5: Since rd=S, we can write Pod(dIS) for PSd(dllrd). Let C and Vbe the set of random and value nodes respectively. The joint potential Psd(C, d) is givenby= Psd(dIS=/3) ]J P(crc).cECChapter 5. Stepwise-decomposable decision networks 81ThereforeEsd[JVIS=/3] = Pod(dIS=/3) fl P(cir) pv(rv). (5.33)Cu{d}—S cEC vEVThe only term that contains Sd is Psd(dIS=/3),which only depends on the value of 6d at3 according to equation (3.13). Thus, the lemma is proved. DProof of Proposition 5.4: Suppose N = (Y, A, P, FIS) is a simple semi-decisionnetwork. Let d be the only decision node. We need to show that for any decisionfunction S of d, ifE8dQ[A] = max5dEtdE5d[JV], (5.34)then for any value /3EflsEsdo[.JVIS=/9] = maxsdEsdEd[AuIS=/31. (5.35)For the sake of contradiction, assume there were a decision function S that satisfies(5.34) which did not satisfy (5.35). Then, there must exist another decision function 6and a value /3 E s such thatE5d0[J\/1S/3] <E5[J’.fS/3].Construct a new decision function 6 which is the same as S. at /3 and which is thesame as 6 at all other values 3’ of S. By Lemma 5.5, E32 [NIS=/3] = Es[.A/IS=/3j, andE52[JVIS/3’] = E[NIS=/3’] for any /3’EJs such that /3’/3. Hence, we haveE52[V] == E5[NIS=/3J + >> E5[NIS=/3] += E5[NIS=/3’]=Chapter 5. Stepwise-decomposable decision networks 82A contradiction. Therefore, it must be the case that (5.34) implies (5.35). The proposition is proved.5.5 Evaluating simple semi-decision networksThis section presents an algorithm for evaluating simple semi-decision networks.Let .,‘V = (Y,A,P,.FS) be a simple semi-decision network, and let d be the onlydecision node.Let .iV be the semi-Bayesian network obtained from .iV by removing all the valuenodes, and by deleting all the arcs into d and treating d as a root random node withoutprior probability. Let P0 be the joint potential of .M, i.e the product of all the conditionalprobabilities in Al.For any value node v of jV, all its parents—all the nodes in 1r—are in .M. Thus,we can compute the marginal potential Po(7r, S, d). We define the evaluation functionale(d,S)ofAlbye(d, S) = Po(ir.,, S, d)t(ir). (5.36)v€V rv—SU{d}Theorem 5.3 Let A/ = (Y A, 2, FIS) be a simple semi-decision network; let d be theonly decision node; and let e(d, S) be the evaluation functional of I’!. Then1. An optimal decision function 6 can be found by63) = arg maxEcDe(d=o., S=i3), V3 E 1s, (5.37). The conditional optimal expected value E[AIIS] is givenE[VS=i3] = maxQEc2de(d=cr, S=/3), V E fZs. (5.38)The proof is postponed to the end of this section.Chapter 5. Stepwise-decomposable decision networks 83The theorem suggests the following procedure for evaluating simple semi-decisionnetworks.Procedure S-EVALUATE(V):• Input: iV — A simple semi-decision network.• Output: An optimal decision function for the only decision node d, andthe optimal conditional expected value of .A/.1. Construct the semi-Bayesian network .M32. Compute the marginal potential Po(ir, S, d) for each the value nodes v.3. Obtain the evaluation functional e(d, S) by equation (5.36).4. Compute the optimal conditional expected value by equation (5.38) andan optimal policy by equation (5.37).Note 1). For those who are familiar with Bayesian network inferences, the clique treepropagation approach (Jensen et al 1990, Lauritzen and Spiegehalter 1988, and Shaferand Shenoy 1988) can be used to compute all marginal potentials Po(7r, 8, d)’s. All themarginal potentials can be computed by traversing the clique tree twice. When there isonly one value node, there is only one marginal potential to compute, which can be doneby traversing the clique tree only once.Note 2). Relating back to the point made in the introduction of the chapter about theevaluation of single-decision-node semi-decision networks, we see from Equation (5.37)that S-EVALUATE does indeed evaluate a simple semi-decision network by enumeratingthe values of the parents of the only decision node, rather than enumerating all thedecision functions, as the procedure NAIVE would do.Chapter 5. Stepwise-decomposable decision networks 84An interesting question is: Can we do the same for (semi-) decision networks withmore than one decision node? The answer is no, unless all the decision nodes have thesame parents. Essentially, what goes on in S-EVALUATE is that for any /3E!wd, weinstantiate the parents rd of the only decision node d to 3 and figure out the value ford that maximizes the expected value of the simple decision network. When there are atleast two decision nodes, say d1 and d2, with different parents, we can not do the samebecause instantiating the parents of d1, for instance, would mean that all the parents ofd1 are observed at the time the decision d2 is to be made. This may not be true at all.Proof of Theorem 5.3: Let 6d be a decision function of d. Let Psd(X) be the potentialover the set X of all the random and decision nodes of .iV under policy 6d• We havePsd(X) = Po(X)Psd(dIS). (5.39)Therefore, we haveEsd[Jf IS] = F6(X) i(ir) (By definition)X-S vEV= > Psd(dIS)Fo(X) > (By equation (5.39) )X-S vEV= EPsd(dIS) Po(X)it(ir)uEV d X—(Su.(d})= EP6d(dIS) Po(X)(r)vEV d ir X—(irUSU{d})= Psd(dIS) Po(ir, S, d)it(r) (Marginal potential)vEV d 7rvFrom equation (3.13), we see that given S, Psd(dIS) is the characteristic functionx{dld=od(s)}(d) of the set {dld = Sd(S)}. ThereforeEsd[.Af IS] = > Po(7r, S, Sd(S))U(7r). (5.40)uEVlrvChapter 5. Stepwise-decomposable decision networks 85Hence,= e(d=, S=). (5.41)The first item of the theorem follows from Corollary 5.1 and equation (5.41). Thesecond item follows from the first item. The theorem is proved. Li5.6 The procedure EVALUATEWe are now ready to present our algorithm for evaluating smooth SDDN’s. The correctness of the algorithm is guaranteed by Theorem 5.2 and Proposition 5.3.Procedure EVALUATE(JV):• Input: .iV — a smooth SDDN.• Output: An optimal policy for and the optimal expected value of .,V.If there are no decision nodes, Call N-EVALUATE(.A1) to compute the expected value, and stop.Else1. Find an SD candidate decision node d,2. Compute the tail JV11 of J’f w.r.t d,3. Call S-EVALUATE(.A111)to compute an optimal policy for and the optimal conditional expected value E[.A[IIIlrd] of V11,4. Compute the body J/i of iV w.r.t d (E[J’fIIrd] is used here), and5. Recursively call EVALUATE(iV1).In EVALUATE, the subroutine N-EVALUATE is a procedure for evaluating decisionnetworks that contain no decision nodes, which is given below.Chapter 5. Stepwise-decomposable decision networks 86Procedure N-EVALUATE(iV):• Input: N — a decision network with no decision nodes.• Output: the optimal expected value of .iV.If there are no value nodes in N, return 0.Else1. Let v1,...,V be all the value nodes. Compute P(7r1) for all the vi’s.2. Returni=1 1rvNote that in EVALUATE, there is the subtask of finding an SD candidate node, whichis also in the TEST-STEPWISE-DECOMPOSABILITY. In implementation, one shouldavoid doing this twice.Also note that in N-EVALUATE one can, as in S-EVALUATE, compute the marginalprobabilities P(K) by using the clique tree approach. All those marginal probabilitiescan be computed by traversing the clique tree only twice. When there is only one valuenode, one pass through the clique tree is enough.Finally, no complexity analysis of EVALUATE is carried out here because a moregeneral version of EVALUATE will be given in the next chapter.Chapter 6Non-Smooth SDDN’sThe previous chapter has discussed how to evaluate a smooth SDDN. This chapter dealswith non-smooth SDDN’s. We extend the concepts of tail and body to the non-smoothcase in such a way that, as in the smooth case, optimal policies of the tail and optimalpolicies of the body together form optimal policies of the original network (Section 6.2),and thereby obtain a procedure called EVALUATE1 that is very similar to EVALUATE(Section 6.3). The correctness of EVALUATE1 is proved in Section 6.4. Both the presentation and the proof of EVALUATE1 rely upon the preparatory Section 6.1, whichdiscusses how to transform a non-smooth SDDN into an equivalent smooth SDDN by aseries of arc reversals.Several algorithms have been previously developed for evaluating influence diagrams.Being special SDDN’s, influence diagrams can also be evaluated by EVALUATE1. Section 6.5 compares EVALUATE1 with the previous algorithms for evaluating influencediagrams.6.1 Smoothing non-smooth SDDN’sAn algorithm for evaluating non-smooth SDDN’s will be given in Section 6.3. In preparation, this section shows how to transform a non-smooth SDDN into an equivalent smoothSDDN by a series of arc reversals.87Chapter 6. Non-Smooth SDDN’s 88Figure 6.19: The concept of arc reversal: At the beginning the parent set of c1 is BUB1and the parent set of c2 is BUB2U{ci}. After reversing the arc1—*c2, the parent set ofc1 becomes BUB1U2{c}and the parent set of c2 becomes BUBU. There are nographical changes otherwise.6.1.1 Equivalence between decision networksTwo decision networks are equivalent if1. They have the same decision nodes, the same policy space, and2. For each policy, they have the same expected value.Lemma 6.1 If two decision networks are equivalent, then they have the same optimalpolicies and the same optimal expected value. 1Note that a decision node can have the same decision function space in two differentdecision networks even when its parents vary from one network to the other. For example,consider the case where a decision node d has only one parent x in one decision network,while two parents Yl and Y2 in the other. If the frame of x is the same as the Cartesianproduct of the frames of Yi and of Y2, then d has the same decision function space inthe two networks. This is why, in the foregoing definition of equivalence between twodecision networks, we do not require that a decision node have the same parents in bothnetworks. This note will be useful in Chapter 9.(1) Before arc reversal (2) After arc reversal6.1.2 Arc reversalChapter 6. Non-Smooth SDDN’s 89Arc reversal is an operation that transforms one decision network into another different but equivalent decision network (Howard and Matheson 1984, Shachter 1986). Weintroduce arc reversals at two levels: first the level of skeleton, and then the level ofnumber.Let c1 and c2 be two random nodes in a decision network skeleton K. Suppose thereis an arc from c1 to c2. When there is no other directed path from c1 to c2, we say thatthe arcc1—*c2 is reversibleLet B=ir1fli2,Biir1—2,andB2=7r—f1U{ci}). In a decision network skeleton, to reverse a reversible arc c1—+c2 is to delete that arc, draw an arc from c2 to c1,an arc from each node in B2 to c1, and an arc from each node in B1 to c2. Figure 6.19illustrates this concept.Let JC’ be the decision network skeleton resulting from reversingc1—*c2 in K. Let irdenote the set of parents of a node x in K’. Thenir1=BUBiUB2U{c}andir=BUBUB.Let .iV be a decision network and K be the underlying skeleton. To reverse a reversiblearc c1—*c2 in ./V is to reverse that arc in the underlying skeleton K, and to set theconditional probabilities P(ciIir1)and P(c2ir)to be as follows:P(c2Iir)= P(ciIB,Bi,B2,) =def P(ci,c2B,B B) (6.42)P(ciIir1)= P(c2tB,Bi,B) def P(ci,cIB,Bi B (6.43)where P(ci,c2IB, B1,)=P(ciI’r(cIir,and F(ciIir1)and P(c2Iir)are in turn theconditional probabilities of c1 and c2 in .IV respectively. In (6.43), P(c2IB,B1,B2) maybe zero. When it is the case, P(ciIB,Bi,B)is defined to be constant 1.Note that arc reversals at the level of skeleton do not involve numerical computations,while arc reversals in decision networks do. The following lemma reveals some propertiesof arc reversals in decision networks, which will be useful later.Chapter 6. Non-Smooth SDDN’s 90Lemma 6.2 Suppose an arc1—*c2 in a decision network .IV is reversible. Let .iV’ be thedecision network resulting from reversing c1—*c2 in Jf. Let ir denote the set of parentsof a node x in ./V”, and let P’(cIr) denote the conditional probability of a random nodec in Jf’. Then1. For any node x that is not a random node, =2. For any random node c other than c1 and c2,= 7r and P’(cir) P(cIir);3. AndP’(cl1r1)F’(c2I7r= F(ciI7ci)P(c2h7lc).Proof: The lemma follows directly from the definition of arc reversal. CProposition 6.1 Let .iV be a decision network. Let Al’ be the decision network obtainedfrom .Al by reversing a reversible arc. Then Al’ and Ar are equivalent.Proof: According to Lemma 6.2 (1), Ar and Al’ have the same decision nodes, and thateach decision node has the same parents. So, Ar and Ar’ have the same policy space. ByLemma 6.2 (2) and (3), we have that E5[Al] = E3[Ar’] for any policy S. The propositionis thus proved. C6.1.3 Disturbance nodes, disturbance arcs, and disturbance recipientsConsider a decision network skeleton K. Suppose d is a decision node in 1C. If K is notsmooth at d, then there are arcs from the downstream set Y11(d, K) to nodes in Adisturbance node of d is a node in Y11 from which there is a directed path to at least onenode in 7rd. The arcs on such a path are called disturbance arcs of d, because they goChapter 6. Non-Smooth SDDN’s 91Figure 6.20: A non-smooth decision network skeleton.against the “stream”. The nodes in ird that are pointed to by disturbance arcs are calleddisturbance recipients of d.As an example, let K be the decision network skeleton in Figure 6.20. The downstreamset Y11(d2,JC) consists of d2, c6 and v2. The node c6 is a disturbance node of d2, the arcsc6—*c4 andc6—+c5 are disturbance arcs of d2, and c4 and c5 are disturbance recipients ofd2.Lemma 6.3 Let K be a decision network skeleton and d be an SD candidate decisionnode. Let X11 be the set of random and decision nodes in the downstream set Y11(d, IC).1. For any cEX11, 7r ç XIIU7rd.2. For anyc2EX11 and any ClElrd, if ftC contains the arc 2—.c1, then c2 and c1 areboth random nodes. So are the ancestors of c2 in X11.Proof: The first part follows immediately from the definition of downstream set.We now prove the second part. First of all, c1 cannot be a value node since valuenodes have no children and ci has the child d. Also since ir does not separate d from c2and c2 is a parent of c1, c1 can not be decision node either, for this would contradict thefact that d is an SD candidate node of ftC. Therefore ci must be a random node.Following a similar line of reasoning, one can show that c2 and its ancestors in X11are all random nodes.Chapter 6. Non-Smooth SDDN’s 92Corollary 6.1 Suppose d is an SD candidate decision node of a decision network skeleton. Then all the disturbance nodes and disturbance recipients of d are random nodes.6.1.4 Tail-smoothing skeletonsLet d be an SD candidate node in a decision network skeleton K. Suppose AC is notsmooth at d. This subsection presents a procedure for smoothing AC at d.A leaf disturbance node of d is a disturbance node of d such that none of its childrenare disturbance nodes of d.Let c be a leaf disturbance node of d. Let c—*c1, c—.c2,...,c—+c be all the disturbance arcs emitting from c. An disturbance arc c—*c is the most senior if there is noother disturbance arc c—c3 such that c3 is an ancestor of cj.Since AC is acyclic, if there are disturbance arcs emitting from c, then one of them mustbe the most senior. Since c is a leaf disturbance node of d, the most senior disturbancearc c—*cj is reversible.Procedure TAIL-SMOOTHING-K(AC, d)• Input: AC — an SDDN skeleton,d— an SD candidate of AC.• Output: An SDDN skeleton that is smooth at d.Whilel there are disturbance nodes of d, find a leaf disturbance node c, breakties arbitrarily.while2 there are disturbance arcs of d emitting from c, pick andreverse a most senior one, break ties arbitrarily. end-while2Chapter 6. Non-Smooth SDDN’s 93Figure 6.21: The application of TAIL-SMOOTHING-K to the decision network skeletonin Figure 6.20 with the input candidate node being d2: (a) alter reversing c6—c4, (b)after reversingc6—+c5.end-whilel.As an example, let IC be the decision network skeleton in Figure 6.20. Figure 6.21shows the working of TAIL-SMOOTHING-K(AC, d2). The node c6 is a leaf disturbancenode of d2. There are two disturbance arcs emitting from c6: c6—*c4 andc6—*c5, amongwhichc6—*c4 is the most senior. So, the arcc6—+c4 is first reversed, resulting in the decision network skeleton in Figure 6.21 (a). The arc c6—*c5 is reversed thereafter, resultingin the decision network skeleton in Figure 6.21 (b), which is smooth at d2.Proposition 6.2 The procedure TAIL-SMOOTHING-K terminates and is correct.A proof can be found at the end of this section.Let IC’ be the output decision network skeleton of TAIL-SMOOTHING-K(IC, d). Forany disturbance recipient r of d (in IC), the set of parents of r in IC’ is different fromthe set of parents 7rr of r in IC. In our example, ir — {d1,c4}, while 7r5 = {c4,c6}. Thefollowing lemma gives us some idea about what nodes consists of. The lemma is usefulin presenting EVALUATE1.Lemma 6.4 Let and ir,. be as in the previous paragraph and let d be the set of parentsof d in IC. Then rrflrdc1rc7rd, and each xE7r—7r is not a descendent of r in IC.(a) (b)Chapter 6. Non-Smooth SDDN’s 94A proof can be found at the end of this section.6.1.5 Tail smoothing decision networksThe arc reversals in TAIL-SMOOTHING-K are at the level of skeleton. There are nonumerical computations whatsoever. The following algorithm for smooth a decision network at a decision node is the same as TAIL-SMOOTHING-K, except now numericalcomputations are involved.Procedure TAIL-SMOOTHING(A’, d)• Input: J’/ an SDDN,d — an SD candidate of .A/’.• Output: An equivalent SDDN that is smooth at ci.Whilel there are disturbance nodes of d, find a leaf disturbance node c, breakties arbitrarily.while2 there are disturbance arcs of d emitting from c, pick andreverse a most senior one, break ties arbitrarily. end-while2end-whilel.As an example, let .iV be a decision network over the skeleton in Figure 6.20. Considerthe working of TAIL-SMOOTHING(.Af, c12). As in the case of TAIL-SMOOTHING-K,the arc c6—*c4 is first reversed, resulting in a decision network with underlying skeleton• as in Figure 6.21 (a). The conditional probabilities of c4 and c6 in the resulting networkare as follows:P(c4Idi) = P(c4Idi,c6)P(c, (6.44)Chapter 6. Non-Smooth SDDN’s 95F(c4Idi,c6)P(c)P(c6di,4)=. (6.45)P(c4di,c6)P(cThen the arc 6—*c5 is reversed, resulting in a decision network with underlying skeletonas in Figure 6.21 (b). The conditional probability of c5 in the resulting network is asfollows:P(c5di,4)-P(c54,c6)F(Idi,c= (6.46)Note that no complexity analysis of TAIL-SMOOTHING is carried out because itwill be used only in proofs, never in evaluation algorithm.6.1.6 Smoothing non-smooth SDDN’sThis subsection is for the benefit of Chapter 9; it gives an algorithm that smooths non-smooth SDDN’s.Procedure SMOOTHING(H)• Input: H — an SDDN.• Output: A smooth SDDN that is equivalent to H.If H contains no decision node, return J.f.Else1. Find an SD candidate decision node d of H.2. Call TAIL-SMOOTHING(H, d). Let Let H’ denote the resulting decision network.3. In H’, treat d as a random node’. (Thus Al’ contains one less decisionnodes than H.) Recursively call SMOOTHING(H’).Chapter 6. Non-Smooth SDDN’s 96Figure 6.22: The effects of applying SMOOTHING to the SDDN in Figure 1.7: (a) afterthe arc from seismic-structure to test-result is reversed, (b) the final SDDN, whichis smooth.As an example, consider the SDDN in Figure 1.7. The network is smooth at oil-sale-policy,so SMOOTHING does nothing in the first recursion. In the second recursion, oil-sale-policyis treated as a random node, rendering drill an SD candidate node. The SDDN is notsmooth at drill. So TAIL-SMOOTHING will enter its while loops. There is oniyone leaf disturbance node of drill, namely seismic-structure. Thus the arc fromseismic—structure to test-result is reversed, introducing an arc from oil-undergroundto test-result and an arc from test to seismic-structure. See Figure 6.22 (a). Now,oil-underground becomes a leaf disturbance node of drill. The arc from oil-undergroundto test-result is reversed, introducing an arc from test to oil-underground. The finalSDDN is shown in Figure 6.22 (b), which is smooth.Note that no complexity analysis of SMOOTHING is carried out because it will beused only in proofs, never in evaluation algorithm.‘The decision node d is treated as a random node only within the scope of SMOOTHING. It is treatedagain as a decision after the termination of SMOOTHING.(a)(b)Chapter 6. Non-Smooth SDDN’s 97Theorem 6.1 The procedure SMOOTH terminates and is correct.A proof will be provided in the next subsection.6.1.7 ProofsProof of Proposition 6.2: To prove that the procedure TAIL-SMOOTHING-K terminates, we need to show that the procedure does not get trapped in the while-loops. Theprocedure will eventually exit the inner while-loop, because the number of disturbancearcs emitting from the lea! disturbance node c is reduced by one during each executionof the ioop.The procedure will also exit the outer while-loop since reversing all the disturbancearcs emitting from a leaf disturbance node c does not produce any new disturbance nodes,and c is no longer a disturbance node thereafter. Therefore the number of disturbancenodes is reduced by one during each execution of the outer while-loop. Since there areonly a finite number of disturbance nodes, the procedure will eventually leave the outerwhile-loop.TAIL-SMOOTHING-K changes neither the downstream set nor the upstream set ofRd. So, the resulting decision network is also stepwise-decomposable.Since the procedure exits the outer while-loop only when there are no more disturbance nodes of d, the resulting network produced by the procedure is smooth at d. Theproposition is proved, CProof of Lemma 6.4: Let rr(t) be the set of parents of r at time step t during the execution of TAIL-SMOOTHING(K, d). We show by induction on t that (1) lrrfllrdclrr(t),(2) irr(t)flYi(1.C, d) 0, and (3) each XE(irr(t)fl7rd)—lrr is not a descendant of r in K.At the beginning, irr(O) = lrr. So (1) and (3) are trivially true. (2) is true because atChapter 6. Non-Smooth SDDN’s 98least one node in 7rr is in Y11(IC, d), since r is a disturbance recipient of d. Hence none ofthe nodes in ir,. can be in the upstream set Y1.Suppose (1-3) are true at time step t. Consider time step t+1. Suppose at this timestep, the disturbance arc reversed is c—*r’. If r’r, then irr(t + 1) = 7Tr(t), hence (1-3)are true. When r’ = r, let x be a node in irr(t + 1)—Tr(t). Then x must be parent of Cthat is not a parent of r. Since the arc c—ir is reversible, x cannot be a descendant of r.So x does not lead to the violation of (3). Since c is in the downstream set Y11 of lrd, xcan only be either in rd or in Y11. In both case, x does not lead to the violation of anyof (2). By the definition of arc reversal, irr(t) — rr(t + 1) = {c}. Again because c is inY11, (1) remains true. In other words, (1-3) are true for the case of t + 1. Consequently,(1-3) are true for all t’s.At the end of the execution of TAIL-SMOOTHING(K, d), tr(t) = ir. Since K’ issmooth at d, none of the nodes of ir are in the downstream set Y11, hence 7rflrd = ir.Consequently, it follows from (1-3) that Krfl7rdC7rc1rd, and each xEr— lrr is ancestorofrinK. 1We prove the correctness of SMOOTHING by induction on the number of decisionnodes. When there are no decision nodes, SMOOTHING is trivially correct. SupposeSMOOTHING is correct in the case of k—i decision nodes. Now consider the case of kdecision nodes.Let d be an SD candidate node of .iV. Let .iV’ be the output network of TAILSMOOTHING(.iV, d). According to Proposition 6.2, d remains an SD candidate node in.iV’ and V’ is smooth at d and equivalent to .iV.Treating d as a random node in .N’, we let d’ be an SD candidate node of Letbe the output network of SMOOTHING(.iV’). Then P* is equivalent to .A/’ with dregarded as a random node. Consequently, iV is also equivalent to iV’ when d is treatedChapter 6. Non-Smooth SDDN’s 99as a decision node.By the induction hypothesis, .iV is a smooth and stepwise-decomposable when d isregarded as a random node. By proposition 5.1, what remains to be proved is that whentreated as a decision node, d is an SD candidate node of and .,V is smooth at d.Let Ar” be the output network of TAIL-SMOOTHING(iV’, d’). Since .iV’ is smoothat d, the tail of .iV’ w.r.t d is not touched in the execution of TAIL-SMOOTHING(V’,d’). Thus, d is an SD candidate node in 1V” and .iV” is smooth at d.Suppose d” becomes an SD candidate node of .iV” if both d and d’ are treated as random nodes. Let .iV” be the output network of TAIL-SMOOTHING(V”, d”). Repeatingthe argument in previous paragraph, we can show that d is an SD candidate node inand .iV” is smooth at ci. Continuing the argument, we can eventually show that d is anSD candidate node in .iV and .IV* is smooth at d. The theorem is proved. Li6.2 Tail and bodyThe procedure TAIL-SMOOTHING suggests the following approach for evaluating a non-smooth SDDN .IV: Find an SD candidate node d, use TAIL-SMOOTHING to smoothAl’ at d, decompose .iV at d into a tail and a body, find an optimal decision function ford in the tail, and repeat the process for the body. An disadvantage of this approachis that TAIL-SMOOTHING demands a series of arc reversals, which may be inefficient(Shenoy 1992, Ndilikilikesha 1991). The motivation behind EVALUATE1 is to avoid arcreversals. This section paves the way to EVALUATE1.Let .iV be a decision network and d an SD candidate node in dV. In Sections 5.1and 4.1, we have defined the concepts of tail (or downstream component) and body (orupstream component) for the case when Al’ is smooth at d. In this section, we extendthe concepts of tail and body to the case when Al’ is not smooth at d.Chapter 6. Non-Smooth SDDN’s 100Figure 6.23: Tail and body for the non-smooth decision network skeleton in Figure 6.20[FINAL CHECK]: (a) shows its body w.r.t d2 and (b) shows its tail w.r.t d2.6.2.1 Tail and body at the level of skeletonWhen IC is smooth at d, there are no disturbance recipients of d. When IC is notsmooth at d, some of the nodes in ‘ird are disturbance recipients. Disturbance recipientsrequire special attention when extending the definition of tail and body to the non-smoothcase.Suppose d is an SD candidate node of IC. The tail of IC w.r.t d, denoted by IC11(d, IC)or simply by IC11, is the decision network skeleton obtained from IC by restricting IC ontoYIIU’rd and removing all those arcs among nodes in ‘1d that do not point at disturbancerecipients of d.As an example, let IC be the decision network skeleton in Figure 6.20. Figure 6.23(b) shows the tail of IC w.r.t d2. The restriction of IC onto Y11(d2,IC)Urd2 contains thefollowing three arcs among nodes in lrd2: d1—*c3,d1—c4, and c4—*c5. The arc d1—*c3,which is removed because c3 is not a disturbance recipient of d2. On the other hand, thearcs d1—+c4 and d4—*c5 are retained because both c4 and c5 are disturbance recipients ofIn the definition of tail, why do we need to handle disturbance recipients of a decisionnode d in a different manner from other parents of d? Consider, for instance, the disturbance recipient c4 of d2 in Figure 6.20. The conditional probability P(c4Idi, c6) of c4(a) (b)Chapter 6. Non-Smooth SDDN’s 101involves the node c6. Since c6 is in the downstream set Y11(d, K;), P(c4Idi,c6) is placed inthe tail (Subsection 6.2.2). Consequently, the arcd1—*c4 has to be retained. On the otherhand, c3 is not a disturbance recipient of d2. Its conditional probability P(c3di) doesnot involve nodes in the downstream set Y11, is hence placed in the body (see Subsection6.2.2). So, we delete the arc d1—*c3 from the tail.To extend the concept of body to the non-smooth case, let K;’ be the output decisionnetwork skeleton of TAIL-SMOOTHING-K(K;, d). Since K;’ is smooth at d, its body Kw.r.t d is defined (Sections 4.1 and 5.1). We define the body of K; w.r.t d to simply bethe body K; of K;’ w.r.t d, and we denote it by K;1(d, K;) or simply by K;j.As an example, let K; be the decision network skeleton in Figure 6.20. Figure 6.21(b) shows the output decision network skeleton of TAIL-SMOOTHING-K(K;,d2), fromwhich we obtain the body K;1 of K; w.r.t d2. K;1 is as shown in Figure 6.23 (a).The reader is encouraged to verify that the general definitions of tail and body (at thelevel of skeleton) given in this subsection are consistent with the corresponding definitionsfor the smooth case given Sections 4.1 and 5.1. In doing so, s/he needs to keep in mindthat in the smooth case there are no disturbance recipients.6.2.2 Tail of decision networksHaving defined tail at the level of skeleton, we can now define tail for decision networksby providing the necessary numerical information. Suppose d is an SD candidate nodein a decision network Al. Let K; be the underlying skeleton. The tail of N w.r.t d,denoted by N11(d, .iV) or simply by is a semi-decision network over K;11(d, K;). Thevalue functions of all the value nodes in Al11 remain the same as in N. The conditionalprobabilities of random nodes outside lrd also remain the same as in A/. Since d is anSD candidate node, Corollary 6.1 assures us that the disturbance recipients of d are allrandom nodes. The conditional probabilities of the disturbance recipients of d againChapter 6. Non-Smooth SDDN’s 102remain the same as in .N. The nodes in S =def{xElrd x is not disturbance recipient ofd} are all viewed as root random nodes without prior probabilities.As an example, let .iV be a decision network over the skeleton shown in Figure 6.20.Then the tail .iVjj(d2,.A/) is a semi-decision network over the skeleton shown in Figure 6.23(b). .iV contains conditional probabilities P(c4Idi,c6) F(c5c4,c6),and F(c6) of randomnodes c4, c5, and c6, which are respectively the same as the conditional probabilities ofC4, c5, and c6 in H. H11 also contains a value function 1(d2,c6) of v2, which is thesame as the value function of v2 in ./V. The root random node c3 does not have priorprobability. The node d1 is treated as a root random node without prior probability.Let X11 be the set of random and decision nodes in the downstream set Y11(d, H). LetP0(X11,7rd) be the product of all the conditional probabilities in A/11. For any subset B ofXJIU?rd, F0(B) is obtained fromP0(X11,7rd) by summing out the variables in XIIU7rd—B.Define the evaluation functional e(d, lrd) of H11 as follows:e(d,d)= p d (6.47)o(7rd, ) VEVI%lrV—lrdU{d}where V11 stands for the set of value nodes in Al11.To continue our example,Y11(d2,Af) = {d2,c6v}. So X11 = {d2,c6}. Po(XII,lrd2)isgiven byPo(XII,lrd2)=Po(d2,c6di,c345)=P(c4Idi,c6)P(5c,c)F(.So, the evaluation functional e of A/H is given bye(d2,d1,c3,c4, C5) =P(c4ldi )F(c5Ic4,)P(c6 P(c4jdi, )F(c5Ic4)P(c62(d,c6).A note about consistency in the definition of evaluation functional. According to thenote at the end of Section 3.1, when A/ is smooth at d, P0(X11 lrd) is the conditionalChapter 6. Non-Smooth SDDN’s 103probability P(XII—{d} 7rd, d). Thus Fo(rd, d)= Ex11—{d} P(XII—{d}rd, d) = 1. Consequently, when .A1 is smooth at d, the definition of evaluation functional given here is thesame as the definition given in Section 5.5.Theorem 6.2 Suppose d is an SD candidate node in a decision network Jif. Let e(d, lrd)be the evaluation functional of the tail .A/i1(d,J ). The optimal decision functions 6 ofci can be found through= arg maxcyEc2de(d=a,lrd=/3),V/3 E lrd. (6.48)A proof will be provided in Section 6.4.6.2.3 Body of decision networksAs in the case of tail, the body of a decision network A/ w.r.t to an SD candidate noded is obtained from the body of its underlying skeleton w.r.t d by providing the necessarynumerical information. Let C be the skeleton underlying .JV. The body of iV w.r.t ci,denoted by .,‘V1(d, .iV) or simply by .A[1, is a semi-decision network over 1C(d, ,AC). Thevalue functions of all the value nodes other than u remain the same as in The valuefunction j of the tail-value node u is defined by= maxaec2de(d=c,1rd=/3),V/ (6.49)The conditional probabilities of random nodes that are not disturbance recipients ofd also remain the same as in .V.What remain to be provided are the conditional probabilities of the disturbance recipients of d. Let us first note that a disturbance recipient of d has different parents inthe body 1C from in the original skeleton K. For example, the parents of c5 in Figure6.20 are c4 and c6, while in Figure 6.23 (a) the parents of c5 are d1 and c4. For anyChapter 6. Non-Smooth SDDN’s 104disturbance recipient r of d, let be the set of the parents of r in 1C. In Figure 6.23(a), for instance, ir = {dj,c4} while = {c4,c6}.Let V’ be the output decision network of TAIL-SMOOTHING(.iV, d), and let r be adisturbance recipient of d. We want to define the conditional probability P(rI7r?!) of r inA1 to be the conditional probability of r in .Al’, but the sake of computational efficiencywe do not wish to explicitly compute .iV’. The following definition resolves our dilemma.The conditional probability P(rr) of r in J is defined byFo(ir’ r)P(rI7rf) def r (6.50)rOklr)where F0 is as in the previous subsection.We shall show in Section 6.4 that F(rI7r) as defined by equation (6.50) is indeedthe conditional probability of r in j%f’. Here is an example. Recall that c4 and c arethe only two disturbance recipients of d2 in Figure 6.20. Let us compute the conditionalprobabilities of c4 and c5 in Figure 6.23 (a).Fo(c4,di)F(c4lit-)= P(c4di) d= F(c4Idi) =F(c4Idi,c6)F( (6.51)—— Po(c5,d1,c4) — F(c4Idi,6)F(csIc46)F(cF(c5r)— F(c51 1,c4) — — . (6.52)Po(di,c4) F(cIdi,c6)P(ce)A comparison between equations (6.51) and (6.52) with equations (6.44) and (6.46)reveals that the conditional probabilities of c4 and c5 obtained through equation (6.50)are indeed the same as the conditional probabilities of c4 and c5 in .1V.According to Lemma 6.4, ir,f d for any disturbance recipient r of d. This observation about r,f leads to the following formula for computing P(rr):F(rIir) = d1rUJ{} Fo(lrd) (6.53)ira—irrU{r} Fo(d)Chapter 6. Non-Smooth SDDN’s 105In words, in order compute P(rr), we can compute Po(lrd) fromP0(X11,lrd) by summingout the nodes in XIIU7rd—7rd and obtain P(rIir) through equation (6.53).Theorem 6.3 Suppose d is an SD candidate node in a decision network Jf. Then theoptimal decision functions for decision nodes other than d are the same in .A/ as in thebody A1(d, .iV).A proof will be provided in Section 6.4.6.3 The procedure EVALUATE1Theorems 6.2 and 6.3 lead to the following procedure for evaluating SDDN’s, smooth ornon-smooth.Procedure EVALUATE1(V):• Input: ./V — an SDDN, smooth or non-smooth.• Output: An optimal policy and the optimal expected value of .iV.If there are no decision nodes, call N-EVALUATE(.Af) to compute the expected value, and stop.Else1. Find an SD candidate node d,2. Compute the tail JV1 of .iV w.r.t d. Let P0 denote the product of all theconditional probabilities in iVii.(a) Compute the marginal potentials Po(lrd) and Po(rd, d), and themarginal potential Po(rd, d, ir,) for each value node v in .Nrj.Chapter 6. Non-Smooth SDDN’s 106(b) Compute the evaluation functional e(d, rd) by1e(d,lrd)= d Po(rU,1rd,d),u(1rU). (6.54)OIrd, )where V11 is the set of value nodes in JV11.(c) Compute an optimal decision function 6 of d by63) = arg maxaE2de(d=, lrd=/3),VI3EfZird. (6.55)(d) Compute the body .N1 of .iV w.r.t d (equation (6.53) is used here).3. Recursively call EVALUATE1(.A11).What is the running time of EVALUATE1? Let n be the total number of nodes iniV, k be the number of decision nodes, a be the number of arcs, e be the number of edgesin the moral graph of According to the complexity analysis of TEST-STEPWISEDECOMPOSABILITY, the time EVALUATE1 spends on finding candidate nodes andcomputing tails and bodies is O(k2(n + e)).If we use the clique tree propagation approach to compute the marginal potentialsin step (a), we need only to traverse the clique tree twice. If there are 1 cliques and themaximum number of nodes in a clique is q, the runing time is O(lA), where A stands forthe maximum number of values a variable can assume. So, EVALUATE1 spends O(klA)time computing marginal potentials.The time for computing the evaluation functional and optimal decision functions fromthe evaluation functional is dominated by the time for computing marginal potentials,except for the numerical divisions. For each (candidate) node d, the factor Po(lrd, d) isdivided from an expression to arrive at the evaluation functional e(7rd, d). Numericaldivisions also happen once for each disturbance recipient in the computation of body2.2This can be avoided via a subtle technical trick.Chapter 6. Non-Smooth SDDN’s 1076.4 Correctness of EVALUATE1To prove the correctness of the procedure EVALUATE1, it suffices to show that Theorems6.2 and 6.3 are true.Proof of Theorem 6.2: Let A/’ be output network of TAIL-SMOOTHING(V, ci).Then d is also an SD candidate node of Al’ and iV’ is smooth at d.Let P be the product of the conditional probabilities in the tail .iV1(d, Al’). Accordingthe Theorem 5.3, optimal decision functions & for d can be found through= arg maxE1de(d=a,7rd=/3),VIE (6.56)where the evaluation functional e’(d, lrd) of JV is given bye(c,/3) = P(7r,7rd,d)1t(7r) Va e Qd,V/3 E 1Td, (6.57)UéVH lrV—lrdU{d}where V11 stands for the set of value node in Jv7.Let P0 be the product of all the conditional probabilities in the tail .iV11(d,JV). ByLemma 6.2, we conclude that arc reversals do not change joint probabilities. Hence theydo not change conditional probabilities either. Consequently, for each value node v inAl,1 (or in .A71) we havePo(rI1rd,d) = P(1rVI1rd,d).Since .A/’ is smooth at d, we haveP(7rIrd,d) =Therefore= Po(1rIrd,d). (6.58)Chapter 6. Non-Smooth SDDN’s 108Consequently,e’(d,lrd) = Fo(irVI7rd,d)iV(7rV)vEV lrt,—lrdU{d}=d EPo(7rV,1rd,d)(?rV)Po(lrd, ) v’11 ir= e(d, rd), (6.59)where e(d, lrd) is the evaluation functional of J’/ri. This proves Theorem 6.2.Proof of Theorem 6.3: Let .A/’ be the output network of TAIL-SMOOTHING(iV, d).Then d is also an SD candidate node of J’/’ and V’ is smooth at d. Let ir. be the setof parents of a node x in .A/’ and let F’(cIir) be the conditional probability of a randomnode c in Jv”.Let RC be the skeleton underlying ,‘V. Recall that in the definition of A/1, we executedTAIL-SMOOTHING-K(K, d). Suppose the ties were broken in the same way in both theexecution of TAIL-SMOOTHING-K(AC, d) and the execution TAIL-SMOOTHING(V,d). Then for any node x in JV, other than the tail-value node, ir = ir. In particular, forany disturbance recipient r of d in V, = ir.Because of Theorem 5.3, it suffices to show that the body .iV1(d, .iV) of .iV is the sameas the body .iV(d, A1’) of .Af’.First of all, because of equation (6.59) the value function of the tall-value node in ./fis the same as the value function of the tail-value node in JVE.What remains to be proved is that for any disturbance recipient d of d,P’(rIir) = F(rir). (6.60)Let R be the set of all the disturbance recipients of d in .A[. Let C11 be the setof random nodes in the downstream set Y11(d,iV). Consider the product of all theconditional probabilities of nodes in C11 U R. According to Lemma 6.2, this product isChapter 6. Non-Smooth SDDN’s 109not changed by the arc reversals in TAIL-SMOOTHING(d, .iV). ThusH P(cIir) = H P’(cI7rj.cEC11UR cEC11URSumming out all the nodes in C11 from both sides of the equation, we getPO(lrd)= II P’(cr).céRThus for any rER, we haveF’(rIir) = 1rd—({r}u1r) Po(rd) = P(rIr).Z7d—l. Po(lrd)Theorem 6.3 is therefore proved.6.5 Comparison to other approachesInfluence diagrams are special SDDN’s and hence can be evaluated by EVALUATE1.This section compares EVALUATE1 with previous approaches for evaluating influencediagrams. We identify a list of desirable properties and examine EVALUATE1 and eachof the previous approaches with regard to those properties.6.5.1 Desirable properties of evaluation algorithmsA list of desirable properties of algorithms for evaluating influence diagrams is given inthe first row of Table 6.1. Due explanations follow.Chapter 6. Non-Smooth SDDN’s 110Table 6.1 Comparisons among approachs for evaluating influence diagrams.facilitating divide separating multiple reversingarc and BN value arcsremoval conquer inference nodesEVALUATE1 yes yes yes yes noShachter 86 no no no no yesNdilikilikesha 92 no no no no noTatman and no no no yes yesShachter 90Shachter 88 no no yes no noShenoy 90 n/a no no no noShenoy 92 n/a no no yes noFacilitating the pruning of removable arcsIn a decision network, an arc into a decision node is removable if its removal does notaffect the optimal expected value of the network. In Chapter 7, we shall present analgorithm that prunes from an influence diagram all the removable arcs that can begraphically identified.There are a couple of advantages to pruning removable arcs: it results in a simplernetwork, and it reduces the sizes of the decision tables. Thus a desirable property for anevaluation algorithm to possess is to be able to facilitate the pruning of removable arcs.It will be shown in Chapter 7 that pruning graphically identifiable removable arcsfrom influence diagrams results in SDDN’s. Since EVALUATE1 is designed for evaluatingSDDN’s, it facilitates the pruning of removable arcs from influence diagrams.Chapter 6. Non-Smooth SDDN’s 111Divide and conquerIt is desirable to decompose, prior to evaluation, an influence diagram into (overlapping)portions such that each portion corresponds to a decision node and optimal decisionfunctions of a decision node can be computed in its corresponding portion. This is anapplication of the standard divide and conquer idea.Suppose the influence diagram to be evaluated has been put through a preprocessingstage such that removable arcs have been pruned. Let .iV be the resulting SDDN. Theprocedure EVALUATE1 evaluates .iV recursively. At the first step of the recursion,EVALUATE1 finds an SD candidate node d and cuts .iV into two portions: the tail jViiand the body JVj’. EVALUATE1 computes an optimal decision function of d in J’f, andthen repeats the process for the body .A11 In this sense, we say that EVALUATE1 worksin a divide and conquer fashion.Separating Bayesian network inferenceWhen evaluating an influence diagram, it is desirable to separate Bayesian network (BN)inference from other computations. There have been intensive research on BN inference,and systems have been built. If an influence diagram evaluation approach can clearlyseparate BN inference from other computations, then it can be implemented on top ofany system for BN inference. This is an application of the principle of separation ofconcern and the modularity principle.EVALUATE1 clearly separates BN inference from other computations; all the BNinference tasks — the tasks of computing the marginal potentials Po(?rd), PoQrd, d), andPo(lrd, d, 7r) — are collected in step (a). We have been suggesting to use the clique treepropagation method to compute the marginal potentials. However, other methods canbe used as well.Chapter 6. Non-Smooth SDDN’s 112Arc reversals and numerical divisionsNumerical divisions are slower than additions and multiplications; they should be avoidedwhen possible. Arc reversal implies numerical divisions; they should also be avoided ifpossible.The procedure EVALUATE1 does not require arc reversals. Furthermore, the onlytimes EVALUATE1 does numerical divisions are when computing the evaluating functional e(d, Trd) and the conditional probability P(rIir) of a disturbance recipient r ofsome decision node.Multiple value nodesWhen the decision maker’s utility function can be separated into several components(Tatman and Shachter 1990), it is important to take advantage of the separability byhaving multiple value nodes. This may imply substantial speed up of computation.EVALUATE1 is designed for dealing with multiple value nodes.6.5.2 Other approachesThis subsections examines the approaches by Shachter (1986), Ndilikilikesha (1992), Tat-man and Shachter (1990), Shachter (1988), Shenoy (1990), Shachter and Peot (1992), andShenoy (1992). The approaches by Howard and Matheson (1984), and Cooper (1989)will be discussed in Chapter 9.Things that can be said for allUntil now, influence diagrams have always been assumed to be no-forgetting; there havebeen no methods for dealing with influence diagrams that violate the no-forgetting constraint. Even though several authors (Shachter 1988, Tatman and Shachter 1990, andChapter 6. Non-Smooth SDDN’s 113Shenoy 1992) have noticed and to some extent made use of the fact that some decisionnodes may be independent of some of their parents, no one has proposed to prune removable arcs at the preprocessing stage. The reason is that pruning arcs from an influenceresults leads to the violation of the no-forgetting constraint.Shenoy (1990) and (1992) proposes a new representation for decision problems, namelyvaluation-based systems. In this representation, the issue of removable arcs does notoccur. We will come back to this point later.Probably because they do not prune removable arcs by preprocessing, none of theprevious approaches work in a divide and conquer fashion. The method by Shenoy(1990, 1992) does not work in a divide and conquer fashion either. The adoption of adivide and conquer strategy is the most important advantage of EVALUATE1 has overthe previous approaches.The rest of this subsection examines the previous approaches with regard to the threeremaining properties: separating BN inference, multiple value nodes, and arcs reversals.Shachter (1986), Ndilikilikesha (1992), and Tatman and Shachter (1990)Before Shachter (1986), influence diagrams are evaluated in two stages—first transformthem into decision trees, and then evaluate the decision trees (Howard and Matheson1984). Shachter (1986) shows that influence diagrams can be evaluated without thetransformation into decision trees, and presents an approach that evaluates an influencediagram by properly applying four operations: barren node removal, arc reversal, randomnode removal, and decision node removal.As shown in the third row of Table 6.1, the approach by Shachter (1986) does notseparate BN inference, does not deal with multiple value nodes, and requires arcs reversals.By generalizing influence diagram into potential influence diagrams, the approachChapter 6. Non-Smooth SDDN’s 114by Ndilikilikesha (1992) is able to evaluate an influence diagram by using only threeoperations: barren node removal, random node removal, and decision node removal. Theoperation of arc reversal is avoided. However, this approach still does not separate BNinference and does not deal with multiple value nodes (see the fourth row of Table 6.1).Tatman and Shachter (1990) generalizes influence diagrams in another direction forthe sake of dealing with multiple value nodes. The evaluation approach is very much likeShachter (1986), except that it has one more operation, namely the operation of mergingvalue nodes. This approach does not separate BN inference, and it requires arc reversals(see the fifth row of Table 6.1).Shachter (1988)Let d be an SD candidate node in an influence diagram Al, and let v be the only valuenode in A/. Shachter (1988) and (1990) has noticed that optimal decision functions8° :—f of d can be obtained through8(/3) = arg maxOEcdE[vI7rd = /3, d = a], (6.61)for each 3 E 1rd.Further in this direction, Shachter and Peot (1992) (first half) proposes a way to scalethe value function and change v into a observed random node, denoted by u (see alsoCooper 1989). Formula (6.61) is transformed into= arg maxaeczdP(7rd = /3 d = alu = 1). (6.62)Thus, this approach separates BN inference.Even though Shachter (1990) points out the possibility that the conditional expectation E[v7rd = /3, d = a] can be computed in one portion of the original network, thealgorithm proposed by this paper does not work in a divide and conquer fashion. AfterChapter 6. Non-Smooth SDDN’s 115the optimal decision function for d is computed, the decision node d is replace by a deterministic random node characterized by the optimal decision function. The resultinginfluence diagram contains one less decision nodes, but has the same number of nodes asthe original network.Finally, this approach deals with only one single value node. See the sixth row ofTable 6.1.Shenoy (1990), (1992), and Shachter and Peot (1992)Shenoy (1990), (1992) propose a new representation for Bayesian decision problems,namely valuation-based systems. While a decision network consists of an acyclic directedgraph, a set of conditional probabilities, and a set of value functions, a valuation-basedsystem consists of an (undirected) hypergraph graph with a precedence relation, a set ofpotentials, and a set of valuations. The no-forgetting constraint is enforced by requiringthe precedence relation to be transitive.- Influence diagrams can be readily represented as valuation-based systems.Shenoy (1990) develops an approach for evaluating a valuation-based system by modifying the clique tree propagation algorithm for BN inference. No arc reversals are requires in this approach. The approach was developed for the case of multiple value nodes.However, there is an error. The paper concludes that the combination operation is commutative, but it is not. Consequently, the approach works only for the case of one singlevalue node. Also, the approach does not separate BN inference (see the seventh row ofTable 6.1).Shachter and Peot (1992) (second half) present an algorithm for evaluating influencediagrams that is very similar to Shenoy (1990).Shenoy (1992) proposes a node removal approach for valuation-based systems. ThisChapter 6. Non-Smooth SDDN’s 116approach deals with multiple value nodes. It requires no arc reversals. However, numerical divisions become necessary when removing a random node that appears in at leastone valuation, but not in all valuations. Thus the approach requires more numericaldivisions than EVALUATE1, when there are at least two value nodes. When a randomnode to be removed appears in all the valuations, the valuations are combined into onesingle valuation. Thus, the approach makes less use of separability in the utility functionthan EVALUATE1.This approach does not separate BN inference (see the eighth row of Table 6.1).In a decision network, a decision is presumably to be made based on the values ofthe parents of the decision node. When a decision d is independent of a certain parent d,then the arc c—d is removable (Chapter 7). Thus arises the issue of removable arcs. Ina valuation-based system, on the other hand, the set of variables that a decision dependsupon is not explicitly specified. It is up to the evaluation algorithm to find it out. Thus,there is no issue of removable arcs here. This is why in Table 6.1 we state the issue ofremovable arcs does not apply to Shenoy (1990) and Shenoy (1992).An overhead of our approachOur approach has an overhead. Before doing any numerical computation, we need toidentify removable arcs, and figure out the tail and the body. On the other hand, mostprevious approaches go directly to numerical computations after little graphical preprocessing.According to the complexity analysis at the end of Section 6.3, the overhead takesO(k2(ri + e)) time. Our believe is that in many case, this overhead may help us cutdown the time O(l)) for numerical computations, which is usually of higher order thanO(k2(ri + e)).As a final note, let us point out the previous algorithms for evaluating influenceChapter 6. Non-Smooth SDDN’s 117diagrams can possibly be modified to evaluate SDDN’s.Chapter 7Removable arcs and independence for decision nodesGiven a decision network, there are often nodes and arcs that can be harmlessly removed,in the sense that their removal does not affect the optimal expected value of the network.It is a good idea to prune such nodes and arcs at the preprocessing stage because doingso simplifies the network. It is well known that barren (random and decision) nodes areremovable (Shachter 1986). This chapter addresses the issue of removable arcs in thesetting of SDDN’s.We begin by establishing the equivalence between removable arcs and independenciesfor decision nodes (Section 7.1), which is of fundamental importance to this chapter.Section 7.2 introduces lonely arcs a class of removable arcs that can be graphicallyidentified. In Section 7.3, we show that deleting lonely arcs from an SDDN does notdestroy its stepwise-decomposability. Section 7.4 introduces the concepts of potentiallonely arcs and of potential barren nodes to deal with the interaction between lonelyarcs and barren nodes. Finally, a pruning algorithm is given in Section 7.5. In thenext chapter, we shall show that this algorithm prunes all the removable arcs that aregraphically identifiable.Before starting the exposition, let us point out that the issue of removable arcs cannotbe addressed in influence diagrams, since deleting arcs from an influence diagram maylead to the violation of the no-forgetting constraint.118Chapter 7. Removable arcs and independence for decision nodes 1197.1 Removable arcs and conditional independencies for decision nodesIn a decision network, an arc into a decision node is removable if its removal does notaffect the optimal expected value of the network. In a decision network skeleton SAC, anarc into a decision node is removable if it is removable in every decision network over K.A decision table is a decision function represented in the form of a table. In a decisiontable of a decision node d, there is one dimension in correspondence to each parent ofd. One particular dimension b is irrelevant to d if fixing the values of all the otherdimensions, no matter which value b takes, the value for d remain the same.In a decision network, a decision node d is independent of one particular parent bgiven all the other parents of d if there exists an optimal decision table of d in which theb-dimension is irrelevant. When it is the case, we shall write Id(d, bI7r), where ir standsfor lrd—{b}. In a decision network skeleton K, a decision node d is independent of oneparticular parent b given all the other parents of d if it is so in every decision networkover PC.The following proposition reveals the relationship between removable arcs and conditional independencies for decision nodes, which is the corner stone of this chapter.Proposition 7.1 Let .A/ be a decision network, d be a decision node and b a parent ofd. Then the arc b—*d is removable if and only zf d is independent of b given all the otherparents of d.Proof: Let d1, d2, ..., dk be all the decision nodes in jV. Suppose d is d. We shall writefor 7rd—{b}.We first show that if b—*d is removable, then Id(d, bIir). Let Al’ be the decisionnetwork obtained from Al by removing the arc b—i.d. Since b—d is removable, E[Al] =Chapter 7. Removable arcs and independence for decision nodes 120Let 6’=(6,. . . , 6) be a policy for ./f’. Let 6 be the decision function of d, in 6’. Then6 is a mapping 6 : f—2d• Construct a policy 6 for .V from 6’ by extending 6 to bethe mapping Sj !Zrd—1d, such that6(7rd) =Then we haveEs[V] = Esi[V’J.Now letting 6’ be an optimal policy of .,V’, we getE5[ = E5[iV’] = E[iV’] = E.Therefore 6 is an optimal policy for .iV. Noticing that 6 is independent of b, we getId(d, bI7r.).We now show that if Id(dI, b[?r.), then b—*d is removable. Since Id(d, bI7r), thereexists an optimal policy 6 for V such that the decision function 6j of d is independent ofb. Construct a policy 6’ for .Ai’ as follows: let the decision functions of all decision nodesother than d be the same as in 3; and let the decision function 6’ : !.—*fd of d besuch that6(ir) =This definition is valid because S(7rd) is independent of b. It follows from the definitionof 6’ that E5 [V’] = E5 [.A/j. Consequently,E[V’] E5i[V’] = = E[V].On the other hand, we have shown in the first part of the proof that for any policy 6’for ./V’, there is a policy S for .jV such that E6 [iV] = E6[iV’]. Hence E [.iV’} E [N].Therefore E[JV’j = E[.A/]. Consequently, the arc b—*d is removable. CChapter 7. Removable arcs and independence for decision nodes 121Figure 7.24: Removable arcs and removable nodes.7.2 Lonely arcsThis section introduces lonely arcs— a class of removable arcs that can be graphicallyidentified.Suppose IC is a decision network skeleton. Let d be a decision node in IC, and let bbe a parent of d. The arc b—d is said to be accompanied if there exist at least one edgein the moral graph m(IC) of IC that connects & and some nodes in the downstream setY11(d, ,AC). When it is the case, we say that such edges accompany the arc b—+d. Thearc b—*d is lonely if it is not accompanied. In a decision network .,V, an arc b—.d into adecision d is lonely if it is lonely in the underlying skeleton.For example, in the decision network skeleton shown in Figure 7.24 (1), the downstream set Yji(d3,IC) is the singleton {v2}. Since the arc2—v is in IC, there is the edge(c2,v2) in m(IC), which accompanies the arc2—d3. However, the arc 3—d is lonely.The following two lemmas exhibit some basic properties of lonely arcs.Lemma 7.1 Suppose IC is a decision network skeleton, and d is an SD candidate nodein IC. An arc b—+d is accompanied if and only if b is(1) (2)(3) (4)Chapter 7. Removable arcs and independence for decision nodes 122• a parent to a random node in the downstream setY11(d,,kC), or• a parent to a value node in the downstream setY11(d,K), or• a disturbance recipient in d, OT• a parent to a disturbance recipient in 7rd.Lemma 7.2 Suppose K is a decision network skeleton. Let d and d’ be two differentdecision nodes in JC. Suppose d’ is an SD candidate node. Then an arc b—d is a lonelyarc in K if and only if it is a lonely arc in the bodyK1(d’,!C). ETheorem 7.1 Suppose K is an SDDN skeleton. If an arc b—d into a decision node dis lonely, then d is independent of b. Consequently, the arc b—pd is removable.Proof: Let .IV be a decision network over K. We need to show that d is independent ofb in .Ai.By Lemma 7.2, we can assume, without losing generality, that d is an SD candidatenode. Let JV11 be the tail of .A/ w.r.t d. Let P0 denote the joint potential over therandom and decision nodes in .JV11, i.e the product of the conditional probabilities of allthe random nodes in the downstream set Y11(d, .V) and of the disturbance recipients inSince the arc b—+d is lonely, by Lemma 7.1 b can be neither a disturbance recipient,nor a parent to a disturbance recipient in 1rj, nor a parent to random node in Y11(d, N).Thus, P0 is independent of b.Again because b—.d is lonely, by Lemma 7.1 b cannot be a parent to any value nodesin Y11(d, N). Hence, all the value functions in JV11 are independent of b.Putting those two points together, we get that the evaluation functional e(d, 7rd) (seeequation (6.47)) is independent of b. According to equation (6.55), the optimal decisionfunction of d is independent of b. The theorem is proved.Chapter 7. Removable arcs and independence for decision nodes 123In Figure 7.24 (1), the arc 3—d is lonely, hence removable. The removal ofc3—*dgives us the decision network skeleton in Figure 7.24 (2).The following corollary is obtained from the proof of Theorem 7.1.Corollary 7.1 Suppose b—+d is a lonely arc in an SDDN .,/. Let .iV’ be the decisionnetwork obtained from .N’ by removing the arc b—*d. Then, .N’ and A/ have the sameoptimal decision tables for all the decision nodes other than d. Furthermore, the optimaldecision tables for d in .,V’ can be obtained from the optimal decision tables for d in .iVby shrinking the irrelevant b-dimension. C7.3 Pruning lonely arcs and stepwise-solvabilityIn order to repeatedly apply Theorem 7.1, we need the following theorem.Theorem 7.2 The decision network skeleton resulted from pruning a lonely arc from anSDDN skeleton is again stepwise- decomposable.Proof: Let AC be an SDDN skeleton and b—>d be a lonely arc. Let AC’ be the resultingskeleton after removing b—÷d from AC. We prove that AC’ is stepwise-decomposable byinduction on the number k of decision nodes in AC. When k=J, AC’ also contains only onedecision node; hence is stepwise-decomposable.Suppose AC’ is stepwise-decomposable if k=m—1. Now consider the case of k=m. Letd’ be a candidate node of AC. There are two cases depending on whether d’=d. Let usfirst consider the case when d’d. According to Lemma 7.2, b—+d is also a lonely arcin the body AC1(d’, AC). Let AC’S be the resulting decision network skeleton after removingb--*d from AC1. By the induction hypothesis, AC is stepwise-decomposable. It is easy tosee that AC is the body of AC(d’, AC’). By Lemma 5.2, AC’ is also stepwise-decomposable.Chapter 7. Removable arcs and independence for decision nodes 124Now consider the case when d’td. Since there are no edges in m(K) that connect band nodes in the downstream set Y1(d,K), the set Y11(d,,AC’) is the same as Y11(d,K).So, d is also a candidate decision node of K’.The body K(d, K;’) is different from the body K;1(d, K:) only in that in K thereis no arc from b to the tail-value node u, while there is in K1r. Since K is stepwisedecomposable, so must be K’s,. By Lemma 5.2, K:’ is also stepwise-decomposable.7.4 Potential lonely arcs and barren nodesIn a decision network skeleton, a barren node is a random or decision node that doesnot have any children. In the following, we shall distinguish decision barren nodes andrandom barren nodes. The node c3 in Figure 7.24 (2) is a random barren node.Proposition 7.2 (Shachter 1986) A barren node may be simply removed from a decisionnetwork. If it is a decision node, then any decision function is optimal.Now we recursively define potential lonely arcs and potential barren nodes. A potentiallonely arc is a lonely arc or an arc that becomes lonely after the removal of some barrenand potential barren nodes, and the removal of some lonely and potential lonely arcs. Apotential barren node is a barren node or a node that becomes barren after the removalof some lonely and potential lonely arcs, and the removal of some barren and potentialbarren nodes.Going back to our example, after the removal ofc3—*d, the the node c3 becomesbarren, and hence can be removed. After the removal of c3,c1—*d2 becomes lonely. Afterthe removal ofc1—*d2 , c1—*d becomes lonely.Here is a corollary of Theorem 7.1.Chapter 7. Removable arcs and independence for decision nodes 125Corollary 7.2 Suppose IC is an SDDN skeleton. If an arc b—*d into a decision noded is a potential lonely arc, then d is independent of b. Consequently, the arc b—d isremovable.7.5 An algorithmThis section presents the algorithm PRUNE-REMOVABLES, which prunes all the potential lonely arcs and potential barren nodes in an SDDN skeleton.Procedure PRUNE-REMOVABLES(IC)• Input: IC an SDDN skeleton.• Outputs: An SDDN skeleton that does not contain potential arcs andpotential barren nodes.1. If there is no decision node in IC, output IC and stop.2. Else• Find and remove all the barren nodes;• Find a candidate node d of IC, and compute the downstream setY11(d,IC) of lrd.• Find and remove all the lonely arcs among the arcs from rj to d.Let IC’ be the resulting skeleton.• In IC’, treat d as a random node1 (thus IC’ contains one less decisionnode than IC) and recursively call PRUNE-REMOVABLES(IC’).As an example, consider the SDDN skeleton in Figure 7.24 (1). There are no barrennodes at the beginning; and d3 is the only candidate node. The downstream set IC11(d3,IC)1The node d is treated as a random node only within the scope of PRUNE-REMOVABLES.Chapter 7. Removable arcs and independence for decision nodes 126is the singleton {v2}. One can see that c3—*d is the only lonely arc. After the removalofc3—>d, the skeleton becomes as shown in Figure 7.24 (2), where C3 is a barren node.After the removal of c3, we get the skeleton in Figure 7.24 (3). Since d3 is now treatedas a random node, d2 becomes a candidate node. The arcc1—*d2 is lonely, and hence isremoved. Thereafter, d2 is also treated as a random node, rendering d1 a candidate node.The arc c1—*d is lonely and hence removed. The final skeleton is shown in Figure 7.24(4).Let K;’ be the output decision network skeleton of of PRUNE-REMOVABLES(K;).How is K;’ related to K; in terms of decision tables? Let V and .,V’ be decision networksover K; and K;’ respectively such that in both Al and Al’ each variable has the same frame,each random variable has the same conditional probability, and each value node has thesame value functions. By repeatedly applying Corollary 7.1, we can conclude that theoptimal decision tables for Al’ can be obtained from those for Al by deleting irrelevantdimensions.Finally, let us consider the complexity of PRUNE-REMOVABLES. Let n be the totalnumber of nodes in K;, k be the number of decision nodes, a be the number of arcs, ebe the number of edges in the moral graph of K;, and p be the maximum number ofparents of a decision node. Finding all the barren nodes takes 0(a) time. Accordingto the complexity analysis of TEST-STEPWISE-DECOMPOSABILITY, finding an SDcandidate node and computing its downstream set takes 0(k(n + e)) time. To find lonelyarcs, we need to check, for each node z in if x is connected to at least one node in thedownstream set Y11(d, K;), which can be done in O(pn) time. So, the total complexity ofPRUNE-REMOVABLE is 0(k(k(e + n) + pn)) = O(k2e+ k2n + kpn).Chapter 8Stepwise-solvability and stepwise-decomposabilityWe have shown in Section 5.2 that if a smooth decision network is stepwise-decomposable,then it stepwise-solvable. In this chapter, we go further to prove that if a decision network skeleton, smooth or non-smooth, is stepwise-decomposable, then if it is stepwisesolvable. More importantly, we show that under “normal” circumstances if a decision network skeleton is stepwise-solvable, then it is stepwise-decomposable (Section 8.8). Thus,stepwise-decomposability is the weakest graphical criterion that guarantees stepwisesolvability.According to Corollary 7.2, potential lonely arcs are removable. In this chapter, wealso show that potential lonely arcs are all the removable arcs that can be graphicallyidentified (Section 8.7).The proof technique is induction on the number of random nodes and on the numberof decision nodes. In order to do induction on the number of random nodes, we needthree operations on decision network skeletons, namely short-cutting (Section 8.2), rootrandom node removal (Section 8.3), and arc reversal (Section 8.4). Section 8.5 shows howthose three operations fit together. An induction apparatus on the number of decisionnode is given in Section 8.6. Let us begin with the concept of normal decision networkskeletons.127Chapter 8. Stepwise-solvability and stepwise-decomposability 128Figure 8.25: An abnormal decision network skeleton (1), and an normal equivalent skeleton (2).8.1 Normal decision network skeletonsA decision network skeleton AC is normal if for any decision node d, there is a directedpath from d to each value node in the downstream set Y11(d, AC) of A decision networkis normal if its underlying skeleton is.As an example, let AC be the decision network skeleton in Figure 8.25 (1). Y11(d,AC)contains all the nodes except c1. In particular, v2EY11. But there is no directed pathfrom d1 to v2. So AC is abnormal. On the other hand, the decision network skeleton inFigure 8.25 (2) is normal.What is the intuition behind this concept of normality? Consider a decision node dand a value node v in a decision network A/. Given any policy 6 of .jV, let P3 be the jointprobability 6 induces over all the random and decision nodes of .A/. The expected valueE3[v] of v is given byE3[v] =where 1u, stands for the value function of v. According Proposition 3.1, if there is nodirected path from d to v, then d is irrelevant to P(r) and hence to E5[v]. In otherwords, d can influence v only when there exists a directed path from d to v.(1) (2)Chapter 8. Stepwise-solvability and stepwise-decomposability 129Intuitively a normal decision network is one where each decision node d can influenceall the value nodes that are not rn-separated from ci by the parents of d. In other words,all those value nodes that d can not influence are rn-separated from d by 7rd.An abnormal decision network skeleton can be stepwise-solvable even when it is notstepwise-decomposable. For example, the decision network skeleton in Figure 8.25 (1)is not stepwise-decomposable. However it is stepwise-solvable. To see this, let iV be anarbitrary decision network over the skeleton. Construct a decision network .iV’ over theskeleton in Figure 8.25 (2) such that c has the same frame and conditional probabilityas c2. For any policy 6, let P5 be the joint probability 6 induces over all the random anddecision nodes in jV, and P be the joint probability 6 induces over all the random anddecision nodes in .A/’. By Proposition 3.1, we have that= Ps(c2,c3)= P(c2,c3)=Thus the expected value of v1 under 6 in A/ is the same as that in .AP. By the sameline of reasoning, we can show that the expected value of v2 under S in ./V is the same asthat in .iV’. Therefore .iV and A/’ are equivalent. Since .,V’ is stepwise-decomposable, itis stepwise-solvable. Therefore iV is also stepwise-solvable.The main goal of this chapter is to show that a normal decision skeleton with nobarren nodes and no lonely arcs is stepwise-solvable only if it is stepwise-decomposable.We also show that a normal SDDN skeleton with no barren nodes contains removablearcs only if it contains potential lonely arcs.The reader may ask: what about abnormal decision network skeletons? We conjecturethat abnormal decision network skeletons can always to transformed into “equivalent”normal skeletons. For example, the decision network skeleton in Figure 8.25 (1) can betransformed into the one in Figure 8.25 (2). However, we have not been able to preciselyformulate and prove the conjecture.Chapter 8. Stepwise-solvability and stepwise-decomposability 130/CA A(1)Figure 8.26: Short-cutting. The random node c in (1) is short-cut, resulting in thedecision network skeleton in (2).8.2 Short-cuttingThis sections introduces the operation of shorting cutting random nodes from a decisionnetwork skeleton. The properties of the operation with regard to induction are explored.Short-cutting is the first of the three operations that are needed to facilitate inductionon the number of random nodes.Before getting started, however, let us make a note about notation usage in thischapter. Applying the operation of short-cutting, or any other operation, on a decisionnetwork skeleton AC results in another decision network skeleton AC’. We shall let tr and7t to denote the set of parents of x in AC and in AC’ respectively. Let N and N’ be decisionnetworks over AC and AC’. We shall denote the conditional probability in N of a randomnode c by P(cr) and the value function in N of a value node v by ir1,). Similarly, weuse F’(cIir) and 4(ir) to denote the conditional probability of c and the value functionof v in N’ respectively.Let AC be a decision network skeleton. Let c be random node in AC such that c hasat least one parent. To short-cut c is to delete c from AC, and draw an arc from everyparent of c to each child of c. Figure 8.26 illustrates this concept. We see that after theshort-cuting, every child of c inherit all the parents of c.(2)Chapter 8. Stepwise-solvability and stepwise-decomposability 131The main task of this section is to prove Propositions 8.1 and 8.2, which are constructing blocks of our induction mechanism. We first present two lemmas.Lemma 8.1 Let IC be a decision network skeleton and c be a random node in IC thathas at least one parent. Let K;’ be the decision network skeleton obtained from K; byshort-cutting c. If c is not a barren node, then for any two nodes x and y in K;’,1. There is a directed path PATHI from x to y in K;’ if and only if there is a directedpath from x to y in K; that consists of the same nodes as PATHJ with the possibleaddition of the node c.2. There is a path PATH2 between x and y in the moral graph rn(AC’) if and only ifthere is a path between x and y in the moral graph m(K;) that consists of the samenodes as PATH2 with the possible addition of the node c.Proof: The lemma follows directly from the definition of short-cutting. EJLemma 8.2 Let K; be a decision network skeleton and c be a random node in K; thathas at least one parent. Let K;’ be the decision network skeleton obtained from K; byshort-cutting c. Let d be a decision node in K; (or equivalently in K;’).1. If c is in the upstream set Y1(d,K;), then ir = lrd, and Y11(d,K;’) =Y11(d,K;).2. If c is in the downstream setY11(d,K;), then rr = 7rd, andY11(d,K;’) = Yii(d,K;)—{c}.9. If cElrd, then = (7rd—{c})U1r. Furthermore if none of the parents of c areY11(d,IC) , thenY11(d,K;’) =Y11(d,K;).Proof: The lemma follows directly from Lemma 8.1 and the fact the the downstreamset Y11(d, K;) consists of all the nodes in K; that are not rn-separated from d by ire. UChapter 8. Stepwise-solvability and stepwise-decomposability 132Proposition 8.1 Let 1C be a decision network skeleton, and let c be a random nodewhich has at least one parent. Let K’ be the decision network skeleton obtained from /Cby short-cutting c.1. If IC does not contain any barren nodes, neither does IC’.2. If IC normal, so is IC’.3. If/C stepwise-decomposable, so is IC’.. Suppose JC is stepwise-decomposable and contains no barren nodes, and supposethat if cir for some decision node d, then none of the parents of c are in thedownstream set Y11(d,IC). Then when IC does not contain any lonely arcs, neitherdoes IC’.Proof: To show item 1, suppose a decision or random node x is not barren in IC. Thenit has at least one child y. If y = c, then the children of c in IC become the parents of x.Otherwise, y remains a child of x in IC’. In either case, x has at least one child; hence IC’contains no barren nodes.By Lemma 8.2, we have that for any decision node d,Y11(d,IC’) Y11(d,IC). (8.63)Together with Lemma 8.1, this proves item 2.To show item 3, we notice that because of equation (8.63), if a decision node d is anSD candidate decision node of IC, then it is also an SD candidate decision node of AC’.First consider the case when cEY11(d, IC). In this case the body K(d, IC’) of K’ w.r.td is the same as the body JC1(d, AC) of IC w.r.t d. If IC is stepwise-decomposable, then sois AC1, and hence IC. By Lemma 5.2, IC’ is also stepwise-decomposable.On the other hand, when cYji(d, IC), then ICr is the same as the resulting decisionnetwork skeleton after short-cutting c from ICE. If AC is stepwise-decomposable, so isChapter 8. Stepwise-solvability and stepwise-decornposability 133K;1. Consequently we can assume, as an induction hypothesis, that K; is stepwisedecomposable. By Lemma 5.2, K;’ is also stepwise-decomposable. Item 3 is thereforeproved.To show item 4, let d be an arbitrary decision node. We need to show that the arcsfrom nodes in to d are accompanied in K;’. There are three cases:Case 1). If cYi(d,AC), by Lemma 8.2 we have ir = lrd and Y11(d,K;’) = Y11(d,K;).Thus, the arcs from nodes in to d are accompanied in K;’ by the same edges as in K;.Case 2). If cEYri(d, K), by Lemma8.2 we have = ir and Y11(d, K;’) = Yii(d, K;)—{c}.Since the arcs from nodes in ir to d are accompanied in K;, and since K; contains no barrennodes, by Lemma 8.1 the arcs from ir to d remain accompanied in K;’.Case 3). If cElrd, 7r = (lTd — {c})Uir. The arcs from node in lrd—{c} to d areaccompanied in K; and remain accompanied in K’. We need only show that an arc froma node YE7Tc to d are not lonely in K’.Since K; contains no lonely arcs and none of the parents of c are in the downstream setY11(d,K;), either there is an arc c—*x to a node x in Y11(d,K;), or there exists a randomnode bElrd and a node xYjj(d,K;) such that the arcs c—.b and x—+b appear in K;.In the first case, the arc y—x appears in K;’. Hence the edge (y, x) appears in m(K;’)which accompanies the arc y—*d. In the second case, y—+b appears in K;’, and hencethe edge (y, x) appears in m(K;’), which accompanies the arc y—d. This proves that AC’contains no lonely arcs. The proof is complete. DProposition 8.2 Let AC be a decision network skeleton, let c be a random node whichhas at least one parent. Let K;’ be the decision network skeleton obtained from AC byshort-cutting c. Then for any decision network .iV’ over K;, there is a decision network.iV over K; that is equivalent to .iV’.Proof: Given .iV’, construct .t/ as follows. Let all the nodes in .,V, excluding c, have theChapter 8. Stepwise-solvability and stepwise-decomposability 134same frame as the corresponding nodes in .iV’. Let c be a compound variable consistingof a copy of each node in We set the conditional probability P(cIir) of c to beIi ifc=7rP(cI7r) = (8.64)( 0 otherwiseFor any child of y of c, ir, = (r—{c})U7r. Noticing c = !, we setF(yI7r) = P’(yir—{c},ir) = F’(yI7rj.The conditional probabilities of all other random nodes in .A/ are the same as in N’.For any value node v, there are two cases depending on whether cE1r. Whenwe have 7r = In this case, we set1(7rv) =On the other hand, when cEir, we can assume that 7rflr, = 0, i.e v has no parents in 1r.Because if v has parents in 1r, we can always set the value function of v to be independentof those nodes. Consequently we have ir,=(ir—{c})Uir. Noticing = we set= 1t,(7rv—{c},7rc) =To show that N and N’ are equivalent, we first notice that they do have the samepolicy space. Let 6 be a policy, and let P5 be the joint probability 6 induces over all therandom and decision nodes of N, and let P be the joint probability 6 induces over allthe random and decision nodes of N’.Let B be a set of random and decision nodes of N. It follows from the definition ofthe conditional probabilities of N thatI P(B) if cBP5(B) (8.65)( P(B—{c}, ire) if cB and Bflir = 0Chapter 8. Stepwise-solvability and stepwise-decomposability 135For any value node v such that cØ, we have=On the other hand, for any value node u such that cE?r we have=lrv—{c},lrc -=ThereforeE5[V] =That is J’/ and V’ are equivalent. D8.3 Root random node removalThis section investigates the operation of removing root random nodes from decision network skeletons. The properties of the operation with regard to induction are of particularinterest. Root random node removal is the second of the three operations that are neededto facilitate induction on the number of random nodes.Proposition 8.3 Let K be decision network skeleton, and let c be a root random node,i.e a random node without parents. Let K’ be the decision network skeleton obtained fromK by removing c and the arcs originating from c.1. JfK contains no barren nodes, neither does K’.2. JfK is normal, then so is K’.3. If K is stepwise-decomposable, then so is K’.Chapter 8. Stepwise-solvability and stepwise-decomposability 1364. Suppose AC is normal and stepwise-decomposable, and contains no barren nodes.Suppose that for any decision node d such that cEYij-(d, AC), none of the children ofc are in lrd. Then when AC does not contain any lonely arcs, neither does AC’.Proof: The first item is straightforward.To prove second item, we notice that for any decision node d,Y11(d,AC’) çY11(d,AC). (8.66)Together with the fact that c is a root random node, this equation entails item 2.Because of equation (8.66), an SD candidate decision node d of AC is also an SDcandidate decision node for AC’. Moreover, when c is not in the downstream set Y11(d, AC),then the body AC(d, AC’) of AC’ w.r.t d can be obtained from the body ACj(d, AC) of AC w.r.td by removing c. Hence item 3 can be proved in the same way as the third item ofProposition 8.1.To show item 4, let d be an arbitrary decision node. We need to show that the arcsfrom nodes in to d are accompanied in AC’. There are three cases:Case 1). If cEY1(d, AC), then ir = 7rd and Y1(d, AC’) = Y11(d, AC). In this case, the arcsfrom nodes in to d are accompanied in AC’ by the same edges as in AC.Case 2). If cY11(d, AC), then ir = rcj and Y11(d, AC’) = Y11(d, AC)—{c}. For any xEr,the arc x—.d is accompanied in AC by, say, the edge (jx, y) in rn(AC). There are two subcasesdepending on whether or not y=c.Case 2.1) y=c. Since c is a root, there must exist another node z such that the arcsx—z and c—*z appear in AC. Since none of the children of c are in )rd, zEYi(d, AC). Thereare three subsubcases depending on whether z is a random node, a decision node, or avalue node.Case 2.1.1). z is a random node. Since AC contain no barren nodes, there exists avalue node v such that there is directed path from z to v. Since AC is normal, there mustChapter 8. Stepwise-solvability and stepwise- decomposability 137be a directed path from d to v. Hence zEY11(d,K:’). Therefore in K:’ the arc x—*d isaccompanied by the edge (x, z) of m(K:’).Case 2.1.2). z is a a value node. Since K: is normal, there exists a directed path fromd to v. Hence, z must be in the downstream set Y11(d, K:’). Therefore in K:’ the arcis accompanied by the edge (x, z).Case 2.1.3). z is a decision node. In this case, there must be at least one value nodein the downstream set Yj-i(z, K:), because K: contains no barren nodes. Since K: is normal,there exists a direct path, say PATH, from d to v. Since K: is stepwise-decomposable,7Tz m-separated v from d. Therefore z must be in PATH. Consequently zEY11(d, K:’).Therefore in K:’ the arc x—*d is accompanied by the edge (x, z).Case 2.2) y5c. There are again three subsubcases depending on whether y is a randomnode, a decision node, or a value node. The proof for this subcase can be carried out inthe same way as in case 2.1), except with z replaced by y.Case 3). If cElrd, then ir = lrd—{c}. For any node xEir, the arc x—*d is accompaniedin K:. Hence either there is an arc connecting x and a node z in the downstream setY11(d, K:), or there exists another node YEd and a node zEYii(d, K:) such that the arcsx—*y and z—y appear in K:.In the first case, z cannot be c. Hence the arc that connects x and z in K: is also inK:’, hence x—*d is accompanied in K:’. In the second case, y cannot be c because c is aroot random node; and z cannot be c either because cElrd. Hence the arcs x—*y and z—yalso appear in K:’. Consequently, the arc x—*d are also accompanied in K:’. The proof iscomplete.Proposition 8.4 Let K: be decision network skeleton, and let c be a root random node.Let K:’ be the decision network skeleton obtained from K: by removing c and the arcsoriginating from c. For any decision network .A[’ over AC’, there is a decision network iVChapter 8. Stepwise-solvability and stepwise-decornposability 138over K: that is equivalent to iv”.Proof: Given N’, construct .,V as follows. Let all the nodes in N, excluding c, havethe same frames as in N’. Let c take only one value, say 1. For a random node r suchthat c7rr, set the conditional probability of r the same as in N’. If a random node r issuch that cenr, then yr,. = irU{c}. In this case, set its conditional probability P(rllrd)as follows:where P’(rIir) stands for the conditional probability of r in N’. This definition is validbecause c takes only one value.For any value node v such that set the value function of v to be the same as in.iV’. If a value node v is such that cE7r, then 7r = 7rU{c}. In this case, set1rv) =where ‘(v[w) stands for the value function of v in N’. This definition is valid becausec takes only one value.We now show that N is equivalent to .iV”. For any decision node d such that c7rd,then = r; d has the same decision function space in both .A( and Al’. If a decisionnode is such that cErd, then ir = lrd—{c}. Since c only take one value, d still has thesame decision function space in both N and .,V. So, N has the same policy space as N’.It is evident that given a policy 6, E5[N] = E5[N’]. Therefore, N and N’ areequivalent. The proposition is proved. C8.4 Arc reversalThis section revisits the operation of reversing arcs in decision network skeleton, with aneye on its induction properties. Arc reversal is the third of the three operations that areChapter 8. Stepwise-solvability and stepwise-decomposability 139needed to facilitate induction on the number of random nodes.Proposition 8.5 Suppose K is a decision network skeleton. Let b and c be two randomnodes such that the arc c—*b appears in K and is reversible. Let K’ be the decision networkskeleton obtained from K; by reversing the arc c—*b.1. Suppose c has at least two children. Then if K does not contains any barren nodes,neither does K’.2. Suppose c is a root. Then if K is normal, so is K’.3. Suppose c is a root. Then if K is stepwise-decomposable, then so is K’.4. Suppose c is a root. Then if K does not contains any lonely arcs, neither does K’.Proof: Item 1 is straightforward.When c is a root, the moral graph m(K’) of K’ is the same as the moral graph m(K)of K. Hence, items 3 and 4 follow.To show item 2, let d be an arbitrary decision node. It follows from m(K’) = m(K)that Y11(d, K’) = Y11(d, K). Let v be a value node in the downstream set Y11(d, K’) =Y11(d, K). Since K is normal, there must be a directed path, say PATH1, in K from d tov. Since c is a root, the arc c—*b cannot be in PATH1. Thus PATH1 is also a path in K’.Therefore K’ is also normal. The proposition is proved. CProposition 8.6 Let K be a decision network skeleton, let b and c be two random nodessuch that the arc c—*b appears in K and is reversible. Let K’ be the decision networkskeleton obtained from K by reversing the arc c—*b. If c is a root, then= lrb—{c} and ir = {b}Uir.Furthermore for any decision network Jsf’ over K’ such thatChapter 8. Stepwise-solvability and stepwise-decomposability 1401. c is a compound variable consisting a copy of each node in ir,2. the conditional probability P’(cIir) is given by1 1 ifc=ir’F’(cjt’r) = C (8.67)0 otherwise3. and if a value node is a descendent of c, then it is also a descendent of b,there exists an decision network %/ over K that is equivalent to Al’.Proof: Given jV’, construct Al’ as follows. Let all the variables have the same frames asin H’. Noticing that c is the compound variable consisting a copy of each node in ir, weset• the conditional probability F(bllrb) = F(blc, irk) of b to be1 F’(bir) if c=vr’P(blc, ir) = C (8.68)0 otherwise• and the prior probability of c to be the the uniform distributions, i.e F(c) =where id stands for the number of possible values of c.The conditional probabilities of all other random nodes are set to be the same as in Al’.For any value node v, ir=ir. If v is not a descendent of c, we set=and if v is a descendent of c, we set= IcI,%(ir).Chapter 8. Stepwise-solvability and stepwise-decomposability 141To show that .iV and .iV are equivalent, we first notice that for any decision node d,= 7r; hence K: and K’ have the same policy space. Let S be a policy, and let P be thejoint probability 6’ induces over all the random and decision nodes of V, and let P bethe joint probability S induces over all the random and decision nodes of iV’. It sufficesto show that for any value node v,= (8.69)If v is not a descendent of c, then it cannot be a descendent of b. By Proposition3.1, both c and b are irrelevant to P5(7r), as well as to P4(7rd). Hence P3(lrd) =Consequently equation 8.69 is true.Now if v is a descendent of c, then it is also a descendent of b. Consider P5(b, cI7r)and F(b, cIir). Noticing that ir={b}Uir, we haveP3(b,cIir) = P(c)P(bIc,r)= J” jF’(bIir) if c=(b,7r)( 0 otherwiseandP(b, cIir) = P’(bir)P’(cIb, ir)= J P’(bIir) if c= (b,7r)0 otherwiseHence we getP5(b, cIir) = 1—.P(b, cr).Since v is a descendent of both b and c, we haveF5(ir) =Chapter 8. Stepwise-solvability and stepwise-decomposability 142Therefore= v)Ic(irv) = F(ir)14(ir). (8.70)The proof is completed. 08.5 Induction on the number of random nodesis a decision nodeThis section shows how the three operations discussed in the last three sections fittogether to form an induction strategy on the number of random nodes. This inductionstrategy allows us to, in a certain sense, get rid of all the random nodes in the downstreamset Y11(d, K) for any d, as shown by the following proposition.Proposition 8.7 Let K be a normal SDDN skeleton without barren nodes and withoutlonely arcs. Let dr be a decision node in IC. Then there exists another decision networkskeleton AC’ such that• COND1: IC’ is normal and stepwise-decomposable, and contains no barren nodesand no lonely arcs;• COND2(AC): The upstream component Ki(dr,AC’) of PC’ w.r.t dr is the same asKi(dr, AC);• COND3: In AC’, there are no random nodes in the downstream set Yii(dr,IC’) ofr; and• COND4(AC): For any decision network J’/’ over AC’, there exits a decision networkover AC that is equivalent to JV’.Chapter 8. Stepwise-solvability and stepwise-decomposability 143Proof: We prove this lemma by induction on the number k of random nodes in thedownstream set Yii(dr, K) of lrdr. When k 0, the lemma is trivially true. Suppose thelemma is true for the case of k = rn—i. Now consider the case of k = rn.Let d be a decision node in Yii(dr, K) such that there are random nodes in Y11(d, K),and that either there are no decision nodes in Y11(d, AC) or for any decision node d’EY11(d, K)there are no random nodes in Y11(d’,AC).Since IC contains no barren nodes, there can only be three cases:1. There exists a random node c in the Y11(d, K) that has at least one parent; or else2. There exists a root random node c in Y11(d, K) whose children are either valuenodes or decision nodes in Yii(d,K); or else3. Every a random node in Y11(d, AC) is a root and has at least one child in 7rd.Case 1): In this case, we short-cut c from K, resulting in K. According to Propositions 8.1, AC is also a normal SDDN without barren nodes and lonely arcs.Since there are only rn—i random node in Y11(d, K*), there exits, by the inductionhypothesis, a decision network skeleton AC’ that satisfies COND1, COND2(K*), COND3,and COND4(AC*).It is easy to see that AC* satisfies COND2(AC). By transitivity, AC’ also satisfiesCOND2(AC).To see that AC’ satisfies COND4(AC), let .Af’ be a decision network over AC’. SinceAC’ satisfies COND4(AC*), there exist an decision network .iV over AC* that is equivalentto Al’. Because of Proposition 8.2, there must be a decision network .Al over IC thatis equivalent to .iV. By transitivity, Al is also equivalent iV’. So, K’ also satisfiesCOND4(AC). Therefore, the lemma is correct in this case.Chapter 8. Stepwise-solvability and stepwise-decomposability 144Case 2): In this case, we simply remove c from K, resulting in K. One can showthat the lemma is also true in this case by using Propositions 8.3 and 8.4 and by followingthe same line of reasoning as in Case 1.Case 3): In this case, letc be a random node in the downstream set Y11(d,K).Then c is a root and has at least one child in Let bErd be a child of c such that(COND5:) there is no other b’E7rd that is a child of c and a parent of b. Since IC isstepwise-decomposable, b has to be a random node. Because of COND5, the arc c—pb isreversible. Reverse the arc c—*b in IC, resulting in IC. By Propositions 8.5, K is normaland stepwise-decomposable, and it contains no barren nodes and no lonely arcs.There are also m random nodes in Y11(d, IC*). However in Y11(d, IC*) there is a random node, namely c, that has at least one parent b. According to Case 1), there mustbe a decision network skeleton IC’ that satisfies COND1, COND2(IC*), COND3, andCOND4(AC*).Since IC’ satisfies COND2(IC), so does IC’.To see that IC’ satisfies COND4(IC), let .iV’ be a decision network over IC’. Since IC’satisfies COND4(IC*), there exist an decision network .iV over AC* that is equivalent toAr’.From the proof of Proposition 8.2, we can have Ar such that1. c is a compound variable consisting of a copy of each node in ir, where ir is theset of parents of c in IC*. Since c is a root in IC, r’ = {b}U(7rb—{c}) = {b}U4.2. The conditional probability F*(cr) p*(cb, ‘rg) is given by1 1 ifc=*=(b,4)p*(cIb ir) = c (8.71)1 0 otherwiseChapter 8. Stepwise-solvability and stepwise-decomposability 145Moreover since céYji(d, ) and is normal, if a value node is a descendent of c, then itmust be a descendent of d, and hence b. So Proposition 8.6 applies and gives us that thereis a decision network .iV over PC that is equivalent to .iV, and hence to .,V’. Therefore PC’also COND4(PC). Thus the lemma is true in this case also. The proof is complete. D8.6 Induction on the number of decision nodesThis section shows how to carry out induction on the number of decision nodes. First,let us define two properties of value functions that we will do induction with.Let .iV be a decision network whose random and decision variables (nodes) are allbinary. Let A be a subset of nodes of JV. For any value node v of jV, its value functioni(irv) is said to have property Q(A) if• itv(ir) is independent of nodes in irflA, and• = q (some real number) when all the nodes in ir—A take the same value,regardless what this value is. When the nodes in ir—A do not all take the samevalue, iiv(irv) is strictly smaller than q.The value function is said to have property Q1(A)• 1(7rv) is independent of nodes in irflA, and• 1(lrv) = q (some real number) when all the nodes in ire—A take the value 1.When there is at least one node in ir—A that does not take the value 1. i(ir) isstrictly smaller than q,.Proposition 8.8 Suppose JV is a normal SDDN with no barren nodes and no lonelyarcs. Suppose all the random and decision variables (nodes) of .,V are binary. Let d bean SD candidate decision node of Jeef, and let A be a set of nodes in .V. Suppose there areChapter 8. Stepwise-solvability and stepwise-decomposability 146no random nodes in the downstream set Y11(d,.V) of ir. Then if all the value functionsin V have property Q(A) (orQ1(A)), so do all the value functions of the bodyAt1(d,.Ai).Proof: Let u be the tail-value node in JV1. It suffices to show that ,u(7rd) has propertyQ(A) (or Q1(A)).First of all, since there are no barren nodes, there must be at least one value node inY11(d,.A/). Let v1,..., Urn be all the value nodes in Yji(d,V).Let V11 be the tail of .iV w.r.t d. Since there are no random nodes in Yjj(d,V),Y11(d,iV) consists of only value nodes. So we have/1U(Wd) = E[J’/IIj7rd] (8.72)= maxdEVfrV1). (8.73)Since all the are independent of nodes in Afl7rd, so must be iLu(lrd).Suppose all the value functions in .A/ have property Q(A). Since .iV is normal, d isa parent of every u. Thus when all the nodes in 7rd—A take the same value, say a, thevalue of [L(7rd) is q, which is achieved when d = a.Now consider the case when there are two nodes x and y in ir — A such that x takethe value 0 and y takes 1. Since .iV contains no lonely arcs, and there are no randomnodes in Y11(d, .iV), there must be at least one value node, say v, which is a child of x andanother value node, say v2 (may be the same as vi), which is a child of y. Because thevalue functions and have property Q(A), we have that if d = 0, (ir,,) < andif d = 1, r) < q. Therefore, t(rd) < q. This shows that p has propertyQ(A).To prove the proposition for Q1(A), suppose all the value functions in .A/ has propertyQ1(A). When all the nodes in lrd—A take the value 1, ILu(lrd) = q, which is achievedwhen d = 1.Chapter 8. Stepwise-solvability and stepwise-decomposability 147Now consider the case when there is one node xElrd, whose value is 0 instead of 1.There is a value node v inY11(d,A1) that is a child of x. Because has property Q1(A),< q,. Hence, Ilu(lrd) < E q. This shows that j has property Q1(A). Theproposition is proved. D8.6.1 An extension and a corollaryLet d be a decision node in an SDDN Jf. We can extend the concept of a downstreamcomponent from the case when .iV is smooth at d to the case when .iV is not smooth atd in the same way as we did for the concept of as tall in Section 6.2. Let .A/11(d,.A )stand for the downstream component of iV w.r.t d. As in Section 3.6, we can define theconditional expected value E[.N11(dr, iV) I ?tdrjProposition 8.9 Suppose .V is a normal SDDN with no barren nodes and no lonelyarcs. Suppose all the random and decision variables (nodes) of JV are binary. Let dr be adecision node .Af, and let A be a set of nodes in .A/. Suppose there are no random nodesin the downstream set Y11(d,J’.f). Then if all the value functions in V have propertyQ(A) (orQ1(A)), so does the conditional expected value E[AnJI(dr,JV)Ilrdr].Proof: This proposition can proved by repeatedly use Proposition 8.8.Combining Proposition 8.9 and Proposition 8.7, we get the following corollary.Corollary 8.1 Let K be a normal SDDN skeleton without barren nodes and lonely arcs.Let 4 be a decision node in IC. For any subset Acrd, there exists a decision networkK over IC such that E[KJI(dr,J/)I7rdr] has property Q(A) (or Q1(A)).Proof: Let K’ be an SDDN skeleton as in Proposition 8.7. There are no random nodesin Yii(dr, IC’). Construct a decision network K’ over K’ as follows. Let all the randomChapter 8. Stepwise-solvability and stepwise-decomposability 148and decision variables be binary. For any value node v, set1 1 if all the variable in 7r—A take the same valuet(7rv) = . (8.74)( 0 otherwiseThen all the value function in .iV’ have property Q(A). By Proposition 8.9, E[J/ii(dr,Jf’)Iir]also has property Q(A). According to Proposition 8.7, there is a decision network .iV overAC that is equivalent to .iV’. Since the upstream Component Ki(dr, AC) of AC is the same asKi(dr,AC’), lrdr = 7T. Thus we have E[.iVII(dT,J1)I7rd.j = E[.A1ii(drA1’)I7r1. ThereforeE[A1ii(dr , V) 7rj has property Q(A).The Q1(A) part can be proved in the same way.8.7 Lonely arcs and removable arcsWe are now ready to prove a theorem about the relationship between removable arcs andlonely arcs.Theorem 8.1 Let AC be a normal SDDN skeleton without barren nodes. If AC containsno lonely arcs, then it contains no removable arcs.Before proving this theorem, let us point out an important implication.Corollary 8.2 In a normal SDDN skeleton, an arc into a decision node is removable ifand only if it is a potential lonely arc.To put the corollary in another way, in a normal SDDN, potential lonely arcs areall the removable arcs that can be graphically identified without resorting to numericalcomputations.Chapter 8. Stepwise-solvability and stepwise-decomposability 149Proof of Theorem 8.1: Let dr be a decision node of AC. Let c be an arbitrary node inlrdr. We need to show that the arc C4dr is not removable.Because of Proposition 8.7, we can assume that there are no random nodes in thedownstream set Yii(dr,AC).Let .iV be an SDDN over AC. Assume all the random and decision nodes are binary.Let A = lrdr—{c}. For any value node v in iV, set p, to be1 1 when all the variables in 7r—A take the same value= (8.75)0 otherwiseThen the value functions have property Q(A).We find an SD candidate decision node d, computes its body .iV1(d,) w.r.t d. It iseasy to verify that .iV’1(d, .iV) is also stepwise-decomposable and normal, and it containsno barren nodes and no lonely arcs. According to the Proposition 8.8, all the valuefunctions .iV have property Q(A).We then find an SD candidate decision node of .A/1(d,.V), computes its body, and soon so forth. Eventually, we will obtain a normal SDDN, denoted by in which dr isan SD candidate, and which contains no barren nodes and no lonely arcs. Furthermore,all the the value functions in .iV have property Q(A).Since .iV contains no lonely arcs, and there are no random nodes in the downstreamset Yii(dr,.iV), there must be at least one value node vEY11(dr,J’/) that is a child of c. Inthe mean time, V,. is also normal, so this value node v is also a child of dr. All the valuefunctions of .iV. have the Q(A) property. Since A = lrdr—{c}, all the value functions inthe tail .A1ii(dr, .M.) depend only on dr or v or both. Therefore when c=0, the optimaldecision for dr is 0, and when c=1, the optimal decision for dr is 1. Thus dr depends onc and hence the arc c—*d,. is not removable. The theorem is proved. 0Chapter 8. Stepwise-solvability and stepwise-decomposability 1508.8 Stepwise-solvability and stepwise-decomposabilityThis section proves the following theorem about the relationship between stepwisedecomposability and stepwise-solvability.Theorem 8.2 Suppose K is a normal decision network skeleton with no barren nodesand no lonely arcs. Then K is stepwise-solvable if and only if it is stepuise-decomposable.A decision network skeleton in Figure 3.15 is not stepwise-decomposable, hence it isnot stepwise-solvable. Consequently, as we predicted in Section 3.4, with appropriateprobabilities and value functions, optimal policies can be found only by considering thetwo decisions simultaneously. -The remainder of this section is to prove Theorem 8.2. In a decision network, adecision root node is a root node that is also a decision node.Lemma 8.3 Suppose K is a normal decision network without barren nodes. Suppose dis a decision node in K. If there are decision root nodes in the downstream set Y11(d, K),then d cannot be an SS candidate node.Proof: Let K’ be the decision network skeleton obtained from K by replacing withdeterministic nodes those decision nodes that are different from d and have at least oneparent. It suffices to show that (Statementl:) d is not an SS candidate node in K’.Let be the set of parents of a node x in K’. We show Statement 1 by induction onthe number k of random nodes, including deterministic nodes, in Y11(d, K’). When k = 0,all the nodes in Y11(d, K’) are either decision root nodes or value nodes; and there existsat least one decision root node. By the definition of Y11(d, K’), there must be one decisionroot node d’eY1(d,K’) such that d’ has a value node child v. Since AC is normal, so isK’. Hence, there exits a directed path from d to v. Because all the nodes in Y11(d, K’)are either decision nodes or value nodes, d must be a parent of v.Chapter 8. Stepwise-solvability and stepwise-decomposability 151Construct a decision network Al’ over K:’ as follows. Let all the random and decisionvariables be binary; let the value functions of the value nodes other than v all be zero;and setIi if d=d’1u@v) = (8.76)1 0 otherwiseWe see that if d’ = 1, the optimal decision for d is d = 1; and if d’ = 0, the optimaldecision for d is d = 0. Therefore the optimal policy for d depends on the policy for d’’.Consequently d is not an SS candidate node. So, Statementi is true in the case k = 0.Assume Statementi is true for the case of k = rn—i. Consider the case of k = m.There are -three subcases.Subcase 1). There is a random node cEY11(d, K:’) that has at least one parent. In thiscase, we can short-cut c from K:’, resulting in K:*. According to Proposition 8.1, K:* isalso normal and contains no barren nodes. Since there are only rn—i random nodes inY11(d, K:*), by the induction hypothesis, d is not an SS candidate node in K:K. ThroughProposition 8.2, this implies that ci is not an SS candidate node in K:’.Subcase 2). There is random node cEYn(d, K:’) whose children all are value nodes.In this case, we can simply remove c from K’, resulting in K:*. Using Propositions 8.3and 8.4 and following the same line of reasoning as in Subcase 1), we can show thatStatementi is true in this subcase.Subcase 3). Every random node cEY11(d, K:’) is a root, and it has at least one childin Let bE7r a a child of c such that there is no other b’E7r that is a child of c and aparent of b. By the definition of K:’, b is a random (maybe deterministic) node. By thechoice of b, the arc c—pb is reversible. We reverse the arc c—b in K:’ to get K:”.1 For later convenience, let us remark that this conclusion follows for any value function p, of v thathas property Q({d, d’}).Chapter 8. Stepwise-solvability and stepwise-decomposability 152By Proposition 8.5, K” is also normal and contains no barren nodes. There are mrandom nodes in Y11(d, K:”), one of which, namely c, has parents. As in Subcase 1),we short-cut c from K:”, resulting K:”. By the induction hypothesis, there is a decisionnetwork .,V over K:* in which the optimal decision function of d depends on the decisionpolicy of some other decision node d’.By Proposition 8.2, there exists a decision network .jV” over K:” that is equivalent toJV*; and by the proof of Proposition 8.2, we conclude that .jV” can be such that1. c is a compound variable consisting of a copy of each node in where ir’ is theset of parents of c in K:”. Since c is a root in K’, ir’ = {b}U(r—{c}) = {b}U7r’.2. The conditional probability P”(cIr’) = F”(clb, 7r’) is given byIi ifc=ir”=(b,ir’)P”(cb, ir’) = C (8.77)( 0 otherwiseMoreover, since cY,j(d, K:’) and K:’ is normal, if a value node is a descendent of c, thenit must be a descendent of d, and hence b. So, Proposition 8.6 applies, and gives usthat there is a decision network .jV’ over K: that is equivalent to .A/”, and hence to .iV’.Therefore in .Af’, the optimal decision function of d depends on the decision function ofsome other decision node. Consequently, d is not an SS candidate node in K:’. The proofis complete.Lemma 8.4 Let K be a normal decision network skeleton with no barren nodes. Supposed is a decision node in K:, and suppose there are no decision root nodes in the downstreamset Y11(d,K:). If there exists at least one decision node, other than d, in Y11(d,K:), thend is not an SS candidate node.Chapter 8. Stepwise-solvability and stepwise-decomposability 153Proof: Let d’ d be a decision node in Y11(d, K;). Let IC’ be the decision networkskeleton obtained from AC be replacing all the decision nodes other than d and d’ bydeterministic nodes. It suffices to show that (Statementi:) d is not an SS candidate nodein AC’.We prove Statementl by induction on the number k of random nodes, includingdeterministic nodes, in Y11(d, AC’). When k = 0, Y11(d, IC) consists of d, d’, and valuenodes. By the definition of Y11(d, AC), there must exist a value node v that is a child ofd’. Since AC is normal, so is IC’. Hence, d there is a directed path from d to v. There aretwo cases: either d is a parent of v, or d is a parent of d’.It has been shown in the proof of Lemma 8.3 that when both d and d’ are parentsto v, d is not an SS candidate node. Now consider the case when d is a parent of d’.Construct a decision network H’ over AC’ as follows. Let all the random and decisionvariables be binary; let the value functions of all the value nodes other than v be zero;and let1 1 ifd’=l= (8.78)1 0 otherwiseNoticing we have that when the decision function 6’ of d’ is such that 6’(iri) = d,the optimal decision for d is d = 1; and when the decision function 6’ of d’ is such that= 1—d, the optimal decision for d is d = 0. Therefore, the optimal decisionfunction for d depends on the decision function of d’ 2• Thus, d is not an SS candidatenode, and Statementl is true for the case of k = 0.Assume Statementi is true for the case of k = m— 1. We can prove that Statementiis true for the case of k = m by following the same line of reasoning as in the proof ofLemma 8.3. There is only one issue that demands special attention. In Subcase 3), we2 For later convenience, let us remark that this conclusion follows for any value function p,, of v thathas property Q1({d’}).Chapter 8. Stepwise-solvability and stepwise-decomposability 154need to reverse the arc c—+b. This can only be done when c is a random node and nota deterministic node. Since there are no root decision nodes in Y11(d, K;), there are nodeterministic root nodes in Y11(d, K;’). Thus, c can not be a deterministic node. Thelemma is proved. ELemma 8.5 Suppose K; is a normal decision network skeleton with no barren nodes. Letd be a decision node in IC. Suppose there are no decision nodes in the downstream setYj(d,K;). If there is a decision node d’Elrd such that at least one of the parents of d’ arein Y11(d,K;), then d is not an SS candidate node.Proof: Let K;’ be the decision network skeleton obtained from K; be replacing all thedecision nodes other than d and d’ by deterministic nodes. It suffices to show that(Statementl:) d is not an SS candidate node in K;’.We prove Statementl by induction on the number k of random nodes, includingdeterministic nodes, in Y11(d, K;’) — When k=O, Y11(d, K;’) consists of the parents ofd’, and value nodes. By the definition of Y11(d, K;’), there must be at least one parent cof d’ that has a value node child v. Since K; is normal, so is K;’. Thus, d must also be aparent of u.Construct a decision network jV’ over K;’ as follows. Let all the random and decisionvariables be binary; let the value functions of the value nodes other than v all be zero;and setIi ifd=c= (8.79)( 0 otherwiseNoticing cEir, and d’Eir, we have that if the decision function 8’ for d’ is such that= c, then the optimal decision function 6° of d is such that 6°(ir) = d’; and if thedecision function 6’ for d’ is such that 8’(ir,) = 1—c, then the optimal decision functionChapter 8. Stepwise-solvability and stepwise-decomposability 1556° of d is such that 6°(ir) = 1—d’. That is, the optimal decision function of d dependson the decision function of d’ . Consequently d is not an SS candidate node in K’;Statementi is true for the case of k = 0.Assume Statementi is true in the case of k = rn—i. We can prove that Statementiis true in the case of k = m in the same way as in the proof of Lemma 8.3. The lemmais thus proved.Proposition 8.10 Suppose K: is a normal decision network skeleton with no barrennodes. Let d be a decision node in K:. If d is an 55 candidate node, then it is alsoan SD candidate node.Proof: Since d is an SS candidate node, by Lemmas 8.3 and 8.4, there cannot be decisionnodes in the downstream set Y11(d, K:). By Lemma 8.5, there cannot be decision nodeswhich have parents in Y11(d, K:). Therefore, 7Td m-separates d from all other decisionnodes and their parents; i.e d is an SD candidate node.Corollary 8.3 Let K: be a normal decision network skeleton with no barren nodes. Letd be a decision node in K:. Suppose the downstream component K:11(d, K:) is stepwisedecomposable and contains no lonely arcs. Then in the upstream component K:1(d, K:), adecision node is an SD candidate node if it is an SS candidate node.Proof: One can prove this corollary in the same way as we prove Proposition 8.10. Theonly issue that demands special attention is that in K:1, there is a downstream-value nodeu. We may not be able to arbitrarily set the value function of U; it has to be theoptimal conditional expected value /=E[.iVfI7rdj of a decision network JV over K:11. Aswe mentioned in Footnotes 1, 2, and 3, we need only to be able to set such that it hasFor later convenience, let us remark that this conclusion follows for any value function p of v thathas property Q({d’, c}).Chapter 8. Stepwise-solvability and stepwise-decomposability 156property Q({x, y}) for some x, y E 7rd, or has property Q1({x}) for some x E 7rd. SincePC11 is normal and stepwise-decomposable, and contains no barren nodes and lonely arcs,this is possible according to Corollary 8.1. DIn a decision network skeleton PC, a decision node d is a potential SD candidate nodeif either it is an SD candidate node, or there exists an SD candidate node d’( d) suchthat d is a potential SD candidate node in the body PC1(d’, PC). It is easy to see that adecision network skeleton is stepwise-decomposable if and only if every decision node isa potential SD candidate node.A potential SD candidate node d is the oldest, if there is no other potential SD candidate node that is an ancestor of d.Proof of Theorem 8.2: Let us first show stepwise-decomposability implies stepwisesolvability. Suppose PC is stepwise-decomposable. Let A/ be an arbitrary decision networkover PC. We need to show that .// is stepwise-solvable. Let iV’ be the output networkof SMOOTHING(V). Then, N’ is a smooth SDDN. According to Theorem 5.1, Af’ isstepwise-solvable. Since ,%f’ and N is equivalent, N is also stepwise-solvable.To prove that stepwise-solvability also implies stepwise-decomposability, it suffices toshow that if there exist decision nodes in PC that are not potential SD candidate nodes,then PC is not stepwise-solvable.For simplicity, let us assume that there is only one oldest potential candidate node d°.Then none of the decision nodes in the upstream component AC1(d°, PC) are SD candidatenodes. By Corollary 8.3, none decision nodes in PC1 can be SS candidate nodes. ThereforePC is not stepwise-solvable. The theorem is proved. CChapter 9SDDN’s and Markov decision processesIn the introduction, we have shown how finite stage Markov decision processes (MDP’s)’can be represented as SDDN’s. This chapter shows that an SDDN can be condensed intoan equivalent MDP.This practice is interesting for two reasons. Conceptually, it reveals the close relationships between SDDN’s and MDP’s: MDP’s are special SDDN’s and SDDN’s can becondensed into MDP’s.Computationally, the concept of condensation opens up the possibility of parallelcomputation in evaluating SDDN’s (Section 9.2); it enables us to exploit the asymmetricnature of decision problems (Section 9.5); and it also leads to an incremental approachfor computing the values of information (Zhang et al 1993b).The organization of this chapter is as follows. The concept of sections in smoothregular SDDN’s is introduced in Section 9.1. Section 9.2 gives the definition of condensation of smooth regular SDDN’s, and points out the possibility of parallel computation.Section 9.3 shows that a smooth regular SDDN is equivalent to its condensation. Non-smooth regular and irregular SDDN’s are treated in Section 9.4. Section 9.5 exploits theasymmetric nature of decision problems to minimize the number of states of the variables in condensations. On the basis of condensation, Section 9.6 proposes a two stageapproach for evaluating SDDN’s, which is compared to the approaches by Howard andMatheson (1984) and Cooper (1989) in Section 9.6.1.1n this chapter, when talking about Markov decision processes we always mean finite stage Markovdecision processes.157Chapter 9. SDDN’s and Markov decision processes 1589.1 Sections in smooth regular SDDN’sThe first step toward the concept of condensation is to introduce the concept of sectionsfor smooth regular SDDN’s. Let .A/ be a smooth regular SDDN. Let d1, d2,...,di beall the decision nodes. Since .V is regular, there is a total ordering among the decisionnodes. Let the total ordering be as indicated by the subscriptions. More explicitly, let usassume that d directly precedes d1+ in the sense that there is no other decision node d3such that d precedes d3 and d3 precedes d+1. In this case, we also say that d+1 directlysucceeds d.For any i€{1, 2,... , k — 1}, the section of.!V from 7rd to denoted by J/(d, d+1),is the subuetwork of .iV that consists of the following nodes:1. the nodes in 7rj U2. the nodes that are in both the downstream setY11(d,.A1) of 1rd and the upstreamsetY1(d+,.Ar) of lrd1+.The graphical connections among the nodes remain the same as in the V except thatall the arcs among the nodes in 7rd1U{ } are removed. The nodes outside rdU{d} areeither random nodes or value nodes; their conditional probabilities or value functionsremain the same as in jV. The nodes in rdU{d} are either decision nodes or randomnodes. There are no conditional probabilities associated with random nodes in lrd1U{d:}.The initial section iV(—, d1) consists of the nodes in rd1 and the nodes in the upstreamset Y1(d,.iV) of It consists of only random and value nodes, whose conditionalprobabilities or value functions remain the same as in ./V.Value nodes in the initial section do not affect the optimal policies, even though theydo contribute to the optimal expected value. From now on, we shall assume there areno value nodes in the initial section, with the understanding that they, if any, are takenChapter 9. SDDN’s and Markov decision processes__________________________________________________159Figure 9.27: A regular SDDN and its sections: t stands for test, d stands for drill,and s stands for oil—sale—police.care of by some preprocessing measure.The terminal section H(dk,—) consists of the nodes in the lrdk and the nodes in thedownstream set YII(dk, N) of The graphical connections, the conditional probabilities, and the value functions in the terminal section are specified in the same way as inthe case of ordinary sections.As an example, consider the decision network in Figure 9.27 (a), which is a reproduction of Figure 6.22 (b). The network is smooth, regular and stepwise-decomposable.(a)TestcestTestTest Resultmarketformation9’L(d, s)(b)Chapter 9. SDDN’s and Markov decision processes 160upstreaz of dowOstream ofFigure 9.28: An abstract view of a smooth regular SDDN. Smoothness is indicated bythe fact that all the arrows from the 4s are pointing downstream.Let us denote this SDDN by .,‘V. Let us also denote the variable test by t, drill by d,oil—sale—policy by s, drill—cost by dc, test—result by tr, oil—produced by op,and market-information by mi.There are four sections in Al: .A1(—, t), .A1(t, d), Al(d, s), and .iV(s,—). The initialsection .Al(—,t) contains no nodes. All the other sections are shown in Figure 9.27 (b).At this point, we wish to emphasize that in each section .iV(d1,d1), all the decisionnodes are in 1rdU{d}. Since ./V is smooth, in .A/(d, d+1) there are no arcs pointingtoward those decision nodes. Thus, they can be regarded random nodes with no priorprobabilities. Consequently, .iV(d, d+1) can be viewed as a semi-Bayesian network withvalue nodes.9.1.1 An abstract view of a regular smooth SDDNThe concept of sections provides us with a proper perspective for viewing smooth regularSDDN’s. A regular smooth SDDN iV can be thought of as consisting of a chain sections.Al(—,d1), %/(di,d2), . .., .Af(dk....l,dk), Jf(dk,—).Two neighboring sections A’(d1_,d) and .A1(d, d+1) share the nodes in which mseparate the other nodes in .A1(d...,d) from all the other nodes .A1(d, d+1). Figure 9.28shows this abstract view of a regular SDDN.Chapter 9. SDDN’s and Markov decision processes 1619.1.2 Conditional probabilities in a sectionIn each section .A/(d, d+1), one can compute PJ.r(d,d+1)(7rd 7rd, d) — the conditionalprobability of the 7T1 given and d in .iV(d,d1). In the initial section .A/(—, d1),one can compute P.v(_,d1)(lrd — the marginal probability ir1 in .A[(—, d1).Lemma 9.1 Let .iV be a smooth regular SDDN and d and d+1 be two consecutive decision nodes. For any policy 6 for .A1, let P6 denote the joint probability determined by 6over all the decision and random nodes. Then we havePs(7rd, ir1, d) = [irj, di), (9.80)andP5(lrd1) PK(_,d1)(lrd. (9.81)Proof: According to Proposition 3.1, all the nodes in the downstream set of ir1 areirrelevant to P3(1rd+lI1rd, di). Hence, they can be harmlessly pruned from .iV. According to Proposition 3.2, all the nodes in the upstream set of ir are also irrelevant toPs(7rd+1 ir1, di). Hence, they can also be harmlessly pruned from .iV. After pruning thenodes in the downstream set of‘)rdi+l and those in the upstream set of 7rd, what is left of.A[ is exactly .iV(d, d+1). The lemma is therefore proved. CIn words, this lemma says that the conditional of probability of 7td,÷1 given 7rd andd is independent of the policy 6 and can be computed locally in the section .iV(d, d+1).9.1.3 The local value function of a sectionWe now turn to value functions. For a value node vjj in .,V(d, d+1), one can computethe conditional probability F,v(d ,d1)(ir di).Chapter 9. SDDN’s and Markov decision processes 162Lemma 9.2 Let iV be a smooth regular SDDN and d and d+1 be two consecutive decision nodes. For any policy 6 for jV, let P5 denote the joint probability determined by 6over all the decision and random nodes. For any value node v13 in the section .iV(d,d1+),we haveP5(r1 ir, d) = di). (9.82)Proof: The same as the proof of Lemma 9.1.Define a function f : X —* R’ byf. d) = PAr(d,d+1)(7rvI I 7rd1,d1)f(ir,) (9.83)1r—fr u{d})where f, is the value function of in Jf.Let v1,..., Vjmj be all the value nodes in the section J/(d, d+1). The local valuefunction f : X —b R’ of the section .Af(d,d1+) is defined bym= (9.84)j=1Note that if there are no value node in the section, then f is the constant 0. Inparticular, the local value function of the initial section is zero since it is assumed tocontain no value nodes.9.1.4 Comparison with decision windowsThe concept of decision windows is introduced by Shachter (1990) as a way to understandinformation arcs—arcs into decision nodes2. The window W consists of those randomnodes that are observed for the first time between the decision maker’s choice for d1and her/his choice for d.2pcajl that in influence diagrams arcs into decision nodes are interpreted indication of informationavailability.Chapter 9. SDDN’s and Markov decision processes 163The major difference between sections and windows is that sections correspond tographical separation, while windows do not. A section .Af(d_1,d) can be compared toan interval on the real line: the “left end point” is lrd1_, and the “right end point” is d•Two neighboring sections are separated by their common “end point”; two sections thatare not neighbors are separated by the sections in between.On the other hand, a node in a window W1 can be connected to a node in any otherwindow Wj; and there is no concept of “end points”. Consequently, with windows we donot have the the two lemmas given in the previous two subsections.9.2 Condensing smooth regular SDDN’sIntuitively, condensing a smooth regular SDDN .iV means doing the following in eachsection .iV(d, d+1) of Al: (1) delete all the random nodes that are neither in the 7rd norin (2) combine all the value nodes into one single value node v, and (3) group thenodes in 7rd into one compound variable x. What results is a Markov decision process(MDP). Now the formal definition.The condensation of .iV, denoted by J/’, is an MDP given as follows:1. It consists of the following nodes:• Random nodes x (1 i k), where x is the compound variable consists ofall the nodes in 7rd when rdO. When 7rd=O, x is a variable that has onlyone possible value;• The same decision nodes d1 (1 i < k) as in ./V; and• Value nodes v (1 k).2. The graphical connections among the nodes are as follows:Chapter 9. SDDN’s and Markov decision processes 164Figure 9.29: The skeleton of the condensation of the SDDN in Figure 9.27 (a)• For any i E {1, 2,..., k}; there is an arc from X: to d, an arc from x to v,and an arc from d1 to v.• For any i E {1, 2,..., k — 1}, there is an arc from x to Xj and an arc fromd to xj1.3. The conditional probabilities and value functions are as follows:• The conditional probability PD(x+ilx, d) (i E {1,... , k— 1}) is defined to beF.,v(d+1,d)(7d+j di).• The prior probability Pc(xi) is defined to be F.,e.f(_,d1)(7rd.• The value function f for v1 (i E {l,.. . , k}) is defined to be the local valuefunction f.Since the condensation ,\fc is an MDP, we shall sometimes refer to Pc(x+i lxi, d) asthe transition probability from x, to x1.Figure 9.29 depicts the skeleton of the condensation of the SDDN in 9.27 (a). Sincetest has no parent, x1 is a degenerate variable with only one value. The variable x2stands for the compound variable consisting of test and test-result, and x3 standsfor the compound variable consisting of oil-produced and oil-market.The prior probability of x1 is trivially defined, the transition probability Pc(x2lxi, t)Chapter 9. SDDN’s and Markov decision processes 165is set to be PJf(t,d)(t, tnt), and the transition probability Pc(xax2,d) is set to bePv(d,s)(op, inilt, tr, d).The value function f for the value node z4 is a representation of test—cost, f is arepresentation of drill—cost, and f is a representation of the summation oil-producedand sale-cost.The SDDN in Figure 9.27 (a) is not in the form of an MDP. However, its condensation,as shown in Figure 9.29, is a Markov decision process. In particular, each random nodeseparates the network into two parts.The condensation of a smooth regular SDDN is usually not a homogeneous MDP(Denardo 1982). The frames of the xi’s are different from one another.9.2.1 Parallel computationsIn the process of condensation, the following numerical computations are carried out ineach section .Af(d, d+i):• The calculation of the conditional probabilities Fg(d1,d1)(7r di), and(for each value node and• The summations as indicated by equations (9.83, 9.84).Those conditional probabilities can be obtained from the marginals PK(d1,d1)(rd, di),1td, d:), and Pg(d1,d1)(1rV, 7td, d) (for each value node v); and themarginals can in turn be computed by using clique tree propagation, so that all themarginals can be computed by visiting each clique at most twice.An important observation is that the numerical computations in different sectionscan be carried out in parallel.Chapter 9. SDDN’s and Markov decision processes 1669.3 Equivalence between SDDN’s and their condensationsThis section shows the following theorem.Theorem 9.1 A smooth regular SDDN is equivalent to its condensation.Proof: Let Al be a smooth regular SDDN and JV be its condensation. Let 7rd denotethe set of parents of d in .‘V. Let denote the set of parents of d in .iV. We knowthat = {xj. Sincelrd. = (9.85)we will sometimes abuse symbols to use and x interchangeably.Because of equation (9.85), a policy for Al is also a policy for .iV, and vice versa. Inother words, .iV and J/ have the same policy space.So, what remains to be proved is that for any policy S=Es[Alc]. (9.86)Let P5 and P denote the joint probability induced by S over the set of random anddecision nodes of Al and J/c respectively. This following lemma will be proved shortly.Lemma 9.3 For any decision node d,= Pg(w.) and P5(1rd,d) =Suppose d1,..., dk are all the decision nodes in Al. For each i, let vii, ..., Vjm beall the decision nodes in the section Al(d,d1+). We haveEs[v] = (By definition)=Chapter 9. SDDN’s and Markov decision processes 167= dj,di)fvjj(vjj) (By Lemmas 9.3 and 9.1)1rd.,d= > P9(ir, d) > d)f(ir) (9.87)Consequently, we havek mj=i=1 j=1k mj=P(r,d) (By equation (9.87))i1 j1 Wd ,d 7rv..k m=d) Pv-(d,d1+l) 1rVf ir,i=1 lTd ,d j=1 1rvk= (By equations (9.83) and (9.84))=1 1rd,dk= P(x, d)f(x, d) (x and 7rd are interchangeable.)i=1= E5[.iV9. (By definition)The theorem is therefore proved. UProof of Lemma 9.3:We now prove this lemma by induction on i. By Lemma 9.1, this lemma is true forthe case when i=1. Suppose the lemma is true for the case of i. Consider the case ofi+1:== P(7rj, d)FJr(d,d+l)(7rd1÷ I1td, d) (Lemma 9.1)1rdU{d}_1rd+l=lrd.=Chapter 9. SDDN’s and Markov decision processes 168Moreover, since Ps(dI7rd) and P(dj1rd) are completely determined by 8, they areequal. Consequently, we haveP6(rd+l, d+i) = P5(rd+1)Ps(d+1 d+1)r)cf ncr3= F5 Jr5 jU1 d+iflcDI=The lemma is therefore proved.9.4 Condensing SDDN’s in generalThis section extends the concept of condensation to cover all SDDN’s. Let us firstconsider regular SDDN’s in general. Irregular SDDN will be dealt with in Subsection9.4.3.9.4.1 SectionsLet .Af be a regular SDDN, smooth or non-smooth. Let d1, d2,..., dk be all the decisionnodes in ./V. For any i e {1,2,. . . — 1}, the section of J’f from ir to 7rd+1 , denotedby .i’./(d,d1), is the subnetwork of Al that consists of the following nodes:1. the nodes in 7rd1Ud+,,2. the nodes that are in the downstream set Y11(d,N) of ir and in the upstream setY1(d+,Al) of 7rd+1, and3. the disturbance nodes of d+1, (which are in the downstream set Y11(d+,.V) ofThe graphical connections among the nodes in the section .A/(d, d÷1), remain the sameas in Al, except that all the arcs pointing to nodes in 1rdU{ } that are not disturbancerecipients of d are removed.Chapter 9. SDDN’s and Markov decision processes 169Figure 9.30: Sections in a non-smooth regular SDDN.The nodes outside rd1U{d} are either random nodes or value nodes. The valuefunctions of all the value nodes remain the same as in .,V.The conditional probabilities of the random nodes outside rd1U{d} remain the sameas in .,V. If a random node CEtdgU{dj} is a disturbance recipients of d, then its conditionalprobability is the same as in .iV. The decision nodes in 7rdU{d} and the random nodesin U { d } that are not disturbance recipient of d are viewed as root random nodeswithout prior probabilities. Compare this definition with the definition of tail for thenon-smooth case (Section 6.2).The initial section .V(—, d1) consists of the nodes in 7rd1, the nodes in the upstreamset of it-, and the disturbance nodes of d1 (which are in the downstream set of 7rd1).This section consists of only random and value nodes, whose conditional probabilities orvalue functions of those nodes remain the same as in jV.The terminal section J’f(dk,—) is defined in the same way as in the smooth case.9(d, s)Chapter 9. SDDN’s and Markov decision processes 170Figure 9.30 shows the sections of the non-smooth regular SDDN in Figure 1.7. Unlike the case of smooth regular SDDN’s (Figure 9.27), the sections .iV(t, d) and .IV(d, s)not only share the node in 1r1, but also two other nodes— oil-underground andseismic-structure.The reader is encouraged to verify that if a regular SDDN is smooth, then its sectionsas defined here are the same as those defined in Section 9.1. To do so, one only need tokeeps in mind that in a smooth SDDN, there are no disturbance recipients.9.4.2 CondensationAs in the smooth case, in each section ./V(lrd2,ir1) we can compute the conditionalprobability (7-d. ) I d:). We can also compute ) (v d) foreach value node vj in ./V(7rd, 7rd+1), and hence the local value function f.Thecondensation of .iV is defined from the PJrd.,lT.d.+l)(1rdj+l d1)’s and the f’s inthe same way as in Section 9.2.Theorem 9.2 A regular SDDN, smooth or non-smooth, is equivalent to its condensation.Proof: Let Al be a regular SDDN, let j’f” be the output network of SMOOTHING(.A/).Then JV* is smooth and equivalent to Al. According to Proposition 9.1, which comes upnext, the condensation of Al is the same as the condensation of .A/*. Thus is equivalentto the condensation Alc of Al or equivalently of Al*. Consequently, Al is equivalent toAfc.Proposition 9.1 Let Al be a non-smooth regular SDDIV and let Al* be the output network of SMOOTHING(Al). Let d1 and d1 be two consecutive decision nodes. ThenPK.(d1,d+)(7rd+ d) = P(d,d÷1)(1r + 114, di), (9.88)Chapter 9. SDDN’s and Markov decision processes 171and= PAr(—,d1)( rd. (9.89)Furthermore, for any value node v in .iV(d, d+1),Fp.r.(d,d11)(7r1 d1) = F,%r(d,d÷l )(Tvq d3). (9.90)Proof: Given any policy 6 of .iV, let P5 be the joint probability 6 induces over all therandom and decision nodes of .AI. Then 6 is also a policy of .,V. Let P be the jointprobability 6 induces over all the random and decision nodes of Since arc reversalsdo not change the joint probability, we haveP = P5. (9.91)In particular, we haveP4K(lrdl)=P(7d+,l7rd,d) =d)= di).Consequently, we havePj...r.(_,d1)(ird = P’(7rd1)= P5(rd1)= Pj.r(_,d1)(7rd,),PJ(d,d+1)(7rd÷ Rd, d) = P(7rd41 I7rd, d)= P5(7rd+1I1r,d)= d,d+1)(7d+lir, di),andd) d)==Chapter 9. SDDN’sand Markov decision processes172The proposition is therefore proved. C9.4.3 Irregular SDDN’sThe generalization of concept of condensation to irregular SDDN’sis straightforward.The only issue thatdemands special attention is that there maybe more than onedecision node that directly succeeds a givendecision node. Thus, onewill not be able tospeak of the section from d to d+i; one canonly speak of the section starting at d.The condensation of an irregular SDDN is nota (linear) MDP since the decision nodesare not totally ordered. It is an MDP overa rooted tree.9.5 Asymmetriesand unreachable statesThis section investigates how to make use ofthe asymmetric nature of decision problerrwzto cut down the number of states for nodes in condensations. The basic idea is due t—------Qi (1993).9.5.1 Asymmetries in decision problemsIn the oil wildcatter examples, if the test isnot performed, there isno test—resu -_______The meaningfulness of the variable test-result depends on the decision made ab. —_test. This phenomenon is called asymmetry.Asymmetries are readily represented in decisiontrees. In Figure 1.2, wesee that -------_decision to test leadsthe decision maker to the upper branch of thetree, wherewill observe the test-result upon which tomake the drill decision. One the z -hand, the decision notto test leads the decision maker to the lower branch of thewhere there is no test-result. S/he has to make the drill decisionwithoutanything about the seismic-structure. Thelower branch is muchsmaller tha._.Chapter 9. SDDN’s and Markov decision processes 173upper branch because of asymmetries.Influence diagrams (and decision networks) are appreciated for compactness, intuitiveness and the ability to reveal independencies. However they cannot represent asymmetries. Artificial states have to be introduced to render the decision problems “symmetric”. The oil wildcatter problems are made “symmetric” by introducing the artificialvalue no-result for the variable test—result.Artificial states may bring about unnecessary computations. In the oil wildcatterexample, both test and test-result are parents of drill. Thus, a decision for drillhas to be computed for all possible combinations of values of test and test-result, eventhough some of those combinations, for instance test=no and test-result=open-structure,are impossible. Such unnecessary computations lead some to doubt the efficiency of influence diagrams (Lawrence 1990, Shafer 1993).Fung and Shachter (1990), Covaliu and Oliver (1992), Smith et al (1993) , and Shenoy(1993) deal with the asymmetric nature of decision problems by generalize influence diagrams to explicitly represent asymmetries. This section shows that in SDDN’s, asymmetries can uncovered even when they are not explicitly represented. We still need tointroduce artificial states; but we are able to eliminate the unnecessary computationsbrought about by those artificial states.9.5.2 Eliminating unreachable states in condensationsIn the condensation shown in Figure 9.29, x2 is a compound variable consisting of testand test-result, which are respectively shorthanded as t and tr. The variable t has twopossible values — yes or no, while tr has four — no-structure (ns), open—structure(os), closed-structure (cs) and no-result (nr). Thus, X2 has eight possible values.Since x2 is a variable in a condensation, which is a MDP, we shall refer to its possiblevalues as states.Chapter 9. SDDN’s and Markov decision processes 174Due to the asymmetric nature of the oil wildcatter problem, four of the eight statesof x2, namely (t=no, tr=ns), (t=no, tr=os), (t=no, tr=cs), and (t=yes, tr=nr), areunreachable, in the sense that their probabilities are zero. By pruning those states ofwe avoid the unnecessary computation due to the introduction of the artificial stateno-result.We now define the concept of unreachable state more rigorously. Suppose .iV is aregular SDDN and .IVC is its condensation. Let d be a decision node and let x be itsunique parent in Jfc As before, we shall use x, 7Td, and ir. interchangeably.Any policy S of induces a joint probability P over all the random and decisionnodes of .jVc. A state 3 of x is unreachable under S if P(x=3) = 0. A state of x1 isunreachable if it is unreachable under all the policies ofOne interesting question is: How can one identify unreachable states?For any state ,6 of x1, F(xi=i3) is independent of S and equals PJf(_,d1)(7rl=/3). So,it is unreachable if and only if= 0.When i > 1, the state {x=/3} is unreachable if and only if the following is true: forany policy 6’ and for any oE!d1_ and any 7E!1_,either1. P(x...=7)= 0, or2. = 0.By Lemma 9.1, we have:F(x=/3Ix.1=7,d_i=a) = PK(d_1,d)(7rd,=I@Iird_, =,d_1=o).Therefore, we haveChapter 9. SDDN’s and Markov decision processes 175Theorem 9.3 Suppose V is a regular SDDN.1. A state {xi = /3} is unreachable if and only ifPAr(_,d1)(7rd/3= 0.2. When i > 1, the state {x1=3} is unreachable if and only if for any c e d_1 andany 7 E 11rd 1 either(a) {Kd_1=7} is unreachable, or(b) P(d_1,d)(d =/3Id1_=,d_1=) = 0.This theorem suggests an obvious top-down procedure of constructing the condensation of regular SDDN’s that eliminates unreachable states along the way. One beginsby computing PV(_,d1)(7r=/3for each state 3 of x1, and deletes those states with zeroprobability. One then computesP(d1,d2)(7rd=/3I1r ,di=c) for each state /3 of x2 giveneach reachable state of x1 and each value o of d1, and deletes all the states /3 whoseconditional probabilities are always zero. And so on and so forth.So far in this section, our exposition has been in terms only regular SDDN’s. However,the idea can be extended to irregular SDDN’s in a straightforward way.This approach constructs condensations that do not contain any unreachable states.However, it also excludes the possibility of the parallel computations we mentioned inSubsection 9.2.1.9.6 A two-stage algorithm for evaluating SDDN’sThe concept of condensation leads to the following two-stage approach for evaluatingregular SDDN’s.Procedure EVALUATE2(V)Chapter 9. SDDN’s and Markov decision processes 176• Input: .Af an SDDN.• Output: The optimal policy and the optimal expected value of .iV.1. Compute the condensation .jV of iV.2. Evaluate ).fc by the folding back strategy.The procedure EVALUATE2 can be viewed as a modification of EVALUATE1, whereall the Bayesian network inferences (BNI’s) are grouped into one stage, i.e the stage ofcondensation. EVALUATE2 is better than EVALUATE1 in terms of modularity. As amatter of fact, EVALUATE2 can be implemented in three modules: one for managingsections of SDDN’s, one for doing BNI’s, and one for evaluating MDP’s. MDP’s havebeen studied in Dynamic programming for a long time, BNI’s have also been underextensive investigation during the last decade. Good algorithms exist for both MDP’sand BNI’s. This leaves us with only the module for managing sections of SDDN’s.Besides modularity, a more important of advantage of EVALUATE2 over EVALU—ATE1 is, as pointed out in Section 9.5, that EVALUATE2 enables us to exploit theasymmetric nature of decision problems. Also EVALUATE2 facilitates parallel processing (Section 9.2.1). As we pointed out Section 9.5, one can only have one of those twoadvantages; they do not co-exist.Zhang et al (1993b) presents yet another advantage of EVALUATE2, namely EVALUATE2 also leads an incremental method of computing the value of perfect information.9.6.1 Comparison with other approachesTwo two-stage algorithms for evaluating influence diagrams have been proposed byHoward and Matheson (1984) and Cooper (1989).To evaluate an influence diagram, Howard and Matheson (1984) first transforms theChapter 9. SDDN’s and Markov decision processes 177diagram into a decision tree, and then evaluate the decision tree. Our approach transforman SDDN into an MDP, instead of a decision tree. In Howard and Matheson’s transformation, every random node in the influence diagram corresponds to one level in thedecision tree, while in our approach all the random nodes, except for those that are observed at some point, are summed out in the condensation process. Also, condensing anSDDN into an MDP does not lose independencies for decision nodes, while transformingan SDDN into a decision tree would.To understand the approach by Cooper (1989), consider an influence diagram .A1. Letv be the only one value node and let the decision nodes be d1, d2, ..., dk. Given a policy6, let P6 be the joint probability 6 induces over all the random and decision nodes. It isimplicit in Cooper (1989) that P5(7rd+1 d:) does not depend on 6; and hence can bewritten as P(7rd+1I7td, di).Recursively define a series of functions gj (1ik) bygk(rd) =jf maxdk > fV(1rV)P(7rV7rdk,dk), (9.92)lrdk —lrt,f is the value function for v; and for any i<kg() def maxd di). (9.93)lrd+lThe following proposition is given by Cooper (1989).Proposition 9.2 For any iE{1, 2,. . . , k}, the optimal policy 6° for d: is given by= argmax > g+1(7rd11)P(7r I7r, d:).1Td÷l 1rdThe optimal expected value E[JV] is given byE[JV1 g1(7rd).Chapter 9. SDDN’s and Markov decision processes 178Proof: Consider the condensation Jfc of .A[. The probability F(lrd1+ d2) is the sameas P’(7rd1+ d:). The function gi is the value function of v, and the value functionsof all the other value node in .A/c are zero. The proposition thus follows from the factthat Jv” is an MDP. QTo make a comparison, it has been pointed out in this thesis that the conditionalprobability P(7rd1 ir, d) can be computed locally in the section .iV(d, d÷1). Moreimportantly, we have generalized Proposition 9.2 from influence diagrams to SDDN’s(Theorems 9.1 and 9.2).Chapter 10Conclusion10.1 SummaryThis thesis has been about how to use Bayesian decision theory to solve decision problems that involve multiple decisions and multiple variables. Here is a summary of ourcontributions.First of all, the concept of a decision network has been developed from a general wayfor stating decision problems and a standard Bayesian decision theory setup. A decisionnetwork is a representation of a decision problem together with knowledge (belief) andutilities necessary for its solution. We have argued that decision networks are the mostgeneral representation framework for the purpose of solving what we call multiple-decisionproblems in Bayesian decision theory. In particular, decision networks are more generalthan influence diagrams.The evaluation of a decision network requires, in general, an exhaustive search throughthe whole policy space. A concept of decomposability has been introduced for decisionnetworks and it has been shown that this decomposability leads to a divide and conquerstrategy.From a computational point of view, it is most desirable if a decision network isstepwise-solvable, i.e if it can be evaluated by considering one decision node at a time.We have introduced stepwise-decomposable decision networks (SDDN’s) and have shownthey can be evaluated not only by considering one decision node at a time, but also by179Chapter 10. Conclusions 180considering only one portion, usually a small portion, of the network at time.We have also shown that stepwise-decomposability is the weakest graphical criterionthat guarantees stepwise-solvability.The problem of evaluating SDDN’s has been studied in detail. A number of importantconcepts, such as simple semi-decision networks, body, tail, and smoothness have beenidentified. A procedure named EVALUATE1 has been proposed. The advantages of thisprocedure include the adoption of the divide and conquer strategy, a clear separation ofBayesian network inferences, and minimal numerical divisions.We have introduced the concept of decision nodes being conditionally independent ofpart of available information, and have shown its equivalence to the concept of removablearcs. An algorithm has been designed that is able to find and prune all the removablearcs in an SDDN that can be identified graphically without resorting to numerical computations.Finally, we have investigated the relationship between SDDN’s and Markov decisionprocesses. Finite stage Markov decision processes are special SDDN’s, and SDDN’s canbe condensed into Markov decision processes. This relationship leads to a two-stageapproach for evaluating SDDN’s. Besides providing an even cleaner interface betweendecision network evaluation and Bayesian network inferences than EVALUATE1, thisapproach is able to make use of the asymmetric nature of decision problems, facilitatesparallel computation, and gives rise to an incremental way of computing the value ofperfect information.Chapter 10. Conclusions 18110.2 Future work10.2.1 Application to planning and control under uncertaintyHigh on our list of future research is to apply the theory of decision networks to the areaof planning and control under uncertainty, and thereby further develop the theory.The history of influence diagrams is short (Matheson and Howard 1984, Miller etal 1976); and using influence diagrams to represent dynamic systems for the purposeof planning and control is a development over the last three or four years (Dean andWeliman 1991). Many issues remains to be addressed.To get a feeling about some of the issues, let us consider the mobile target localizationproblem taken from Dean et al (1990) with some minor changes. As shown in Figure10.31, there is a mobile target, and there is a robot that tracks the target. The targetlocation as observed by the robot may be different from the actual target location. Ateach time point, the robot makes a decision as to what location to report based on theobserved target location. There is a payoff depending on how accurate the report is; theoverall payoff is the sum of the payoffs of all the time points. The robot also makes atracking decision according to its own location and the observed target location. Thelocation of the robot at the next time point depends on its location and the trackingdecision made at the current time point.In this example, the decisions at time point t2, for instance, depend on the observedtarget location at time point ti, because it helps the robot to better estimate the actualtarget location at time point t2. The decisions at time point t20 depend on the observedtarget locations at all the previous time points, as indicated by the dotted arcs. So,the decision nodes at time point t20 have 21 parents, and their decision table have n21entries, where n stands for the number of possible locations. A decision table of such asize can neither be computed nor be stored.Chapter 10. Conclusions 182Trkh,gdi.IonRblotoob.ervedtrIotIOntgtotio,pOrtt3 L20 t21Figure 10.31: Mobile target localization.As we mentioned in the introduction, a reasonable thing to do here is to assume decision at any time point depends on only, for instance, two previous time points, whilebeing independent of all the other time points. This kind of independence assumptionsfor decision node can be make in decision networks because they are able to representindependencies for decision nodes. One important issue in applying the theory of decision network to planning and control is how can one guarantee that those independenceassumptions yield decisions with acceptable bounds from the optimal?When evaluating decision alternatives at a certain time point, one looks into thefuture to see what states of affairs each of the alternative may result in and with whatcertainties. Another issue in applying the theory of decision network to planning andcontrol is how many time points should one looks into the future? How should onediscount the importance of future time points that are far from the present?The above two issues are about reducing the complexities of planning and controlproblems in the time dimension. Another dimension in which one can explore the opportunities for reducing complexities is the dimension of problem structure. The graphtheoretical language of decision networks allows us to capture, at the level of relation, theChapter 10. Conclusions 183dependence and independence relationships among nodes. A step further is to explorethe so-called inter-causal indepedencies (e.g. Heckerman 1993), which basically says thatseveral caused independently contribute to an effect. Inter-causal independence is at thenumerical level.Let k be the number of parents of a random node c. Suppose all variables are binary.Then, the conditional probability of c requires 2’’—l numbers to specify. Assessingthose numbers and using them in inferences is hard when k is large. However, if theconditional probability of c satisfies the so-called noisy OR-gate model— an inter-causalindependence model— then we need only k numbers to specify the conditional probability (Pearl 1988). There is a growing research interest in making use of inter-causalmodels to reduce the complexities of knowledge acquisition and inference in Bayesiannetworks and Influence diagrams (e.g. Heckerman 1993). Much work remain to be donein this direction.One specific idea we have is to treat the noisy OR-gate model in Dempster-Shafertheory. Dempster-Shafer theory is a theory about combining evidence from independencesources, and is thus closely related to the concept of inter-causal independence.10.2.2 Implementation and experimentationAnother thing to do in the future is implementation. The procedure EVALUATE1 canprobably be implemented on top of IDEAL, a software package for analysis of influencediagrams developed by Srinivas and Breese (1990). For the sake of TEST-STEPWISEDECOMPOSABILITY and PRUNE-REMOVABLES, it is a good idea to keep the skeleton of a decision network separate from its numerical components, namely its conditional probabilities and value functions. The implementation of EVALUATE2 may bemore involved if the functionalities of parallel processing, exploitation of asymmetriesand incremental computation of value of information are all to be materialized.Chapter 10. Conclusions 184In the thesis, we have argued the that EVALUATE1 and EVALUATE2 are advantageous over all the previous algorithms for evaluating influence diagrams. Experimentsneed to be performed to substantiate this claim.10.2.3 Extension to the continuous caseThis thesis has only dealt with discrete variables. It would be interesting to extend ourtheory so that it can also handle continuous variables. To achieve this extension, thefollowing three issues need to be addressed.The first issue is the existence of an optimal policy. This is not an issue in thediscrete case. Since there are only finitely many possible policies, one of them must beoptimal. In the continuous case, however, there may be infinitely many possible policies;the existence of an optimal policy is not obvious.The second issue is integration. In the discrete case, we sum out variables. In thecontinuous case, we need to integrate out variables. While summation is provided inmost programming languages, integration is not.The third issue is the operation of finding the maxima of a function. This can be doneby exhaustive enumeration in the discrete case. In the continuous case, more advancedtechniques need to be used.10.2.4 Multiple agentsWe have said that decision network is able to represent multiple cooperative agentswith a common goal. We have also given two examples in this regard. However, muchfoundational work remains to be done to ensure that the optimal decision policies asdefined in this thesis is indeed optimal in particular applications.One can also consider game theory situations where agents are adversaries of eachother. One may come out with some kind of game network based on game trees in a wayChapter 10. Conclusions 185similar to how decision networks grew out of decision trees.Bibliography[1] M. Baker and T. E. Boult (1990), Pruning Bayesian networks for efficient computation, in Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence,July, Cambridge, Mass., pp. 257 - 264.[2] K. Basye, T. Dean, and J. S. Vitter (1989), Coping with uncertainty in map learning,in Proceedings IJCAI 11, pp. 663-668.[3] D. E. Bell, H. Raiffa, and A. Tversky (1988), Decision Making: Descriptive, Normative, and prescriptive interactions, Cambriage University Press.[4] U. Bertelè and F. Brioschi (1972), Nonserial dynamic programming, Mathematics inScience and Engineering, Vol. 91, Academic Press.[5] D. P. Bertsekas (1976), Dynamic Programming and Stochastic Control, Mathematicsin Science and Engineering, Vol. 125, Academic Press.[6] J. Breese (1991). Construction of belief and decision networks, Technical Report,Rockwell International Science Center.-[7] K. Chang and R. Fung (1990). Refinement and Coarsening of Bayesian networks, inProceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, July,Cambridge, Mass., pp. 475-482.[8] J. R. Clarke and G. M. Provan (1992), A dynamic decision-theoretic approach tothe management of GVHD, in Proc. AAAI Spring Symposium on Al and Medicine.[9] G. F. Cooper (1989), A method for using belief networks as influence diagrams,in Proceedings of the fifth Conference on Uncertainty in Artificial Intelligence, pp.55-63.[10] G. F. Cooper (1990) The computational complexity of probabilistic inference usingBayesian belief networks, Artificial Intelligence, 42, pp. 393-405.[11] G. F. Cooper (1990), Bayesian belief-network inference using recursive decomposition, Report No. KSL 90-05, Knowledge Systems Laboratory, Medical ComputerScience, Stanford University.[12] Z. Covaliu and R. M. Oliver (1992), Formulation and solution of decision problemsusing decision diagrams, Working paper, University of California at Berkeley.186Bibliography 187[13] T. L. Dean and K. Kanazawa (1989), A model for reasoning about persistence andcausation, Computational Intelligence, 5(3), pp. 142-150.[14] T. L. Dean and M. P. Weliman (1991), Planning and Control, Morgan Kaufmann.[15] E. V. Denardo (1982), Dynamic Programming: Models and Applications PrenticeHall.[16] P. Dagum and R. M. Chavez (1993), Approximating probabilistic inference inBayesian belief networks, Pattern Analysis and Machinr Intelligence, VoL 15, 3,pp. 246-255.[17] J. W. Egar and M. A. Musen (1993), Graph-grammar assistance for Automated generation of influence diagrams, in Proceedings of the Ninth Conference on Uncertaintyin Artificial Intelligence, pp. 235-243.[18] K. J. Ezawa (1986), Efficient evaluation of influence diagrams, Ph.D. Thesis, Dept.of Engineering-Economics Systems, Stanford University.[19] K. J. Ezawa (1992) Technology planning for advanced telecommunication services:a computer-aided approach, Telematics and Informatics, 19, No. 2.[20] P. C. Fishburn (1988), Normative theories of decision making under risk and underuncertainty, in Bell et al 1988.[21] K. W. Fertig and J. Breese (1993), Probability intervals over influence diagrams,Pattern Analysis and Machinr Intelligence, Vol. 15, 3, pp. 280-287.[22] R. M. Fung and R. D. Shachter (1990), Contingent Influence Diagrams, AdvancedDecision Systems, 1500 Plymouth St., Mountain View, CA 94043, USA.[23] P. Gördenfors and N. Sahlin (1988a), Decision, Probability, and Utility: SelectedReadings, Cambridge University Press.[24] P. Gördenfors and N. Sahlin (1988b), Introduction: Bayesian decision theory — foundations and problems, in P. Gördenfors and N. Sahlin (1988)a.[25] D. Geiger, T. Verma, and J. Pearl (1990), d-separation: From theorems to algorithms, in Uncertainty in Artificial Intelligence 5, pp. 139-148.[26] R. P. Goldman and E. Cherniak (1993), A language for construction belief networks,Pattern Analysis and Machinr Intelligence, Vol. 15, 3, pp. 196-208.[27] M. Goldzsmith, P. Morris, and J. Pearl (1993), A maximum entropy approach tononmonotonic reasoning, Pattern Analysis and Machinr Intelligence, Vol. 15, 3, pp.220-232.Bibliography 188[28] I. Good (1961), A causal calculus (I). British Journal of Philosophy of Science, 11,pp. 305-318[29] N. A. J. Hastings and J. M. C. Meflo (1977), Decision networks, John Wiley & Sons.[30] D. Heckerman (1993), Canal indepedence for knowledge acquisition and inference,in Proceedings of the Nitth Conference on Uncertainty in Artificial Intelligence, pp.122-127.[31] D. Heckerman and E. Horvitz (1990), Problem formulation as the reduction of adecision model, in Proceedings of the Sixth Conference on Uncertainty in ArtificialIntelligence, July, Cambridge, Mass., pp. 82-89.[32] M. Henrion (1987), Some practical issues in constructing belief networks, in L. Kanal,T. Levitt, and J. Lemmer (eds.) Uncertainty in Artificial Intelligence, 3, pp. 161-174,North- Holland.[33] M. Horsch and D. Poole (1991), A dynamic approach to probabilistic inferenceusing Bayesian networks, in Proceedings of the Sixth Conference on Uncertainty inArtificial Intelligence, July, Cambridge, Mass., pp. 155-161.[34] J. Hosseini (1968), Decision analysis and its application in the choice between twowildcat oil ventures, Interfaces, 16, pp. 75-85.[35] R. A. Howard, and J. E. Matheson (1984), Influence Diagrams, in The principlesand Applications of Decision Analysis, Vol. II, R. A. Howard and J. E. Matheson(eds.). Strategic Decisions Group, Menlo Park, California, USA.[36] R. A. Howard (1990), From influence to relevance to knowledge, in [50].[37] F. V. Jensen, K. G. Olesen, and K. Anderson (1990), An algebra of Bayesian beliefuniverses for knowledge-based systems, Networks, 20, pp. 637 - 659.[38] Kanazawa (1991), A logic and time nets for probabilistic inference, In ProceedingsAAAI-91.[39] J. Kim and J. Pearl (1983), A computational model for causal and diagnostic reasoning in inference engines, in Proceedings of the Eigth International Joint Conferenceon Artificial Intelligence, Karlsruhe, Germany, pp. 190-193.[40] U. Kjrulff (1990), Triangulation of Graphs - Algorithms giving small total statespace, R 90-09, Institute for Electronic Systems, Department of Mathematics andComputer Science, Strandvejen, DK 9000 Aalborg, Denmark.Bibliography 189[41] P. Klein, A. Agrawal, A. Ravi, and S. Rao (1990), Approximation through multi-commodity flow, in Proceedings of 31st Symposium on Foundations of ComputerScience, pp. 726-737.[42] S. L. Lauritzen and D. J. Spiegehalter (1988), Local computations with probabilitieson graphical structures and their applications to expert systems, Journal of RoyalStatistical Society B, 50: 2, pp. 157 - 224.[43] S. L. Lauritzen, A. P. Dawid, B. N. Larsen, and H. G. Leimer (1990), IndependenceProperties of Directed Markov Fields, Networks, 20, pp. 491-506.[44] T. S. Levitt, T. 0. Binford, G. J. Ettinger, and P. Gelband (1988), Utility-basedcontrol for computer vision, Proceedings of the Fourth Conference on Uncertainty inArtificial Intelligence, pp. 245-256.[45] T. S. Levitt, J. M. Agosta, and T. 0. Binford (1989), Model-Based influence diagrams for machine vision, Proceedings of the Fifth Conference on Uncertainty inArtificial Intelligence, pp. 233-244.[46] A. C. Miller, M. W. Merkhofer, R. A. Howard, J. E. Matheson, and T. R. Rice(1976), Development of automated computer aids for decision analysis, TechnicalReport 3309, SRI International, Menlo Park, Calif.[47] J. E. Matheson (1990), Unsing influence diagrams to value information and control,in [50].[48] W. W. North (1968), A tutorial introduction to decision theory, reprinted in [75].[49] P. Ndilikilikesha (1991), Potential Influence Diagrams, Working Paper No. 235, Business School, University of Kansas.[50] R. M. Oliver and J. Q. Smith eds. (1990), Influence Diagrams, Belief Nets andDecision Analysis, John Wiley and Sons.[51] J. Pearl (1988), Probabilistic Reasoning in Intelligence Systems: Networks of Plausible Inference, \organ Kaufmann Publishers, Los Altos, CA.[52] L. D. Phillips (1990), Discussion of From Influence to Relevance to Knowledge byR. A. Howard, in [50].[53] D. Poole and E. Neufeld (1991), Sound probabilistic inference in Prolog: An executable specification of Bayesian networks, Department of Computer Science, University of British Columbia, Vancouver, B. C., V6T 1Z2, Canada.Bibliography 190[54] D. Poole (1992), Search for Computing posterior probabilities in Bayesian networks,Technical Report 92-24, Department of Computer Science, University of BritishColumbia, Vancouver, Canada.[55] 0. M. Provan (1991), Dynamic network updating techniques for diagnostic reasoning, in Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence.[56] G. M. Provan and D. Poole (1991), The utility of consistency-based diagnostic techniques, in The Proc. of the 2ns Conference on the principles of Knowledge representation.[57] G. M. Provan and J. R. Clarke (1993), Dynamic network construction and updatingtechniques for the diagnosis of acute abdominal pain, Pattern Analysis and MachineIntelligence 15, pp. 299-307.[58] M. L. Puterman (1990), Markov decision processes, in D. P. Heyman and M. J. Sobel(eds.), Handbooks in OR é4 MS., Vol. 2, pp. 331-434, Elsevier Science Publishers.[59] R. Qi and D. Poole (1992a), Two algorithms for decision tree search, PRICAI’92,Seoul, Korea, pp. 121-127.[60] R. Qi (1993), Decision graphs: algorithms and applications, Ph.D Thesis, Department of Computer Science, University of British Columbia, under preparation.[61] H. Raiffa, (1968), Decision Analysis, Addison-Wesley, Reading, Mass.[62] D. J. Rose (1970), Triangulated graphs and the elimination process, Journal ofMathematical Analysis and Applications, 32, pp 597-609.[63] L. J. Savage (1954), The foundations of statistics, Wiley, New York.[64] R. 0. Schlaifer (1967), Analysis of decisions under uncertainty (preliminary edition),McGraw-Hill.[65] R. Shachter (1986), Evaluating Influence Diagrams, Operations Research, 34, pp.871-882.[66] R. Shachter (1988), Probabilistic Inference and Influence Diagrams, Operations Research, 36, pp. 589-605.[67] R. Shachter (1990), An ordered examination of influence diagrams, Networks, Vol.20, 5, pp. 535-564.Bibliography 191[68] R. D Shachter, S. K. Andersen, and P. Szolovits (1992), The equivalence of exactmethods for probabilistic inference in belief networks, Department of Engineering-Economic Systems, Stanford University.[69] R. D. Shachter, B. D’Ambrosio, and B. A. Del Favero (1990), Symbolic ProbabilisticInference in Belief Networks, in AAAI-90, pp. 126-131.[70] Shachter and Peot (1992), Decision making using probabilistic inference methods, inProc. of 8th Conference on Uncertainty in Artificial Intelligence, July 17-19, Stanford University, pp. 276-283.[71] G. Shafer (1988), Savage revisited, in Bell et al 1988.[72] G. Shafer and P. Shenoy (1988), Local computation in hypertrees, Working PaperNo. 201, Business School, University of Kansas.[73] C. Shafer (1990), Decision making: introduction to Chapter 3 of [75].[74] G. Shafer (1993), Probabilistic Expert Systems, to be published by SIAM.[75] G. Shafer and J. Pearl (1990), Reading in uncertainty reasoning, Morgan KaufmannPublishers, San Mateo, California.[76] P. P. Shenoy, (1990), Valuation-Based Systems for Bayesian Decision Analysis,Working Paper No. 220, Business School, University of Kansas.[77] P. P. Shenoy, (1992), Valuation-Based Systems for Bayesian Decision Analysis, Operations research, 40, No. 3, pp. 463-484.[78] P. P. Shenoy, (1993), Valuation network representation and solution of asymmetricdecision problems, work paper No. 246, Business School, University of Kansas.[79] J. E. Smith, S. Holtzman, and J. E. Matheson (1993), Structuring conditional relationships in influence diagrams, Operations Research, 41, No. 2, pp. 280-297.[80] J. Q. Smith (1988), Decision Analysis: a Bayesian Approach, Chapman and Hall.[81] J. Q. Smith (1989), Influence Diagrams for Statistical Modeling, Ann. Stat., 17, pp.654-672.[82] T. P. Speed (1991), Complexity, calibration and causality in influence diagrams, in[50].[83] S. Srinivas and J. Breese (1990), IDEAL: A software package for analysis of influence diagrams, in Proceedings of the Sixth Conference on Uncertainty in ArtificialIntelligence, July, Cambridge, Mass.,pp. 212-219.Bibliography 192[84] J. A. Tatman and R. Shachter (1990), Dynamic programming and influence diagrams, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 20, pp. 265-279.[85] J. von Neumann and 0. Morgenstein (1944), Theory of games and economic behaviours, Princeton University Press. 2nd eed., 1947, 3rd ed., 1953.[86] M. Weilman (1990), Formulation of Tradeoffs in Planning under Uncertainty, Pit-man, London.[87] M. Weilman, J. Breese, and R. Goldman (1992), From knowledge bases to decisionmodels, Knowledge Engineering Review, 7(1).[88] L. Zhang (1993), Studies on hypergraphs (I): Hyperforests, Discrete Applied J’vlathematics, 42, pp. 95-112.[89] L. Zhang and D. Poole (1992) Sidestepping the triangulation problem in Bayesian netcomputations, in Proc. of 8th Conference on Uncertainty in Artificial Intelligence,July 17-19, Stanford University, pp. 360-367.[90] L. Zhang and D. Poole (1992), Stepwise-Decomposable Influence Diagrams, in TheProc. of the 3rd Conference on the Principles of Knowledge Representation, Cambridge, Mass. USA, October 26-29, 1992.[91] L. Zhang, Runping Qi and D. Poole (1993a), Minimizing Decision Tables in Influence Diagrams, in The Proc. of the Fourth International Workshop on ArtificialIntelligence and Statistics, Ft. Lauderdale, Florida, January 3-6, 1993.[92] L. Zhang, Runping Qi and D. Poole (1993b), Incremental computation of the valueof perfect information in stepwise-decomposable influence diagrams, in Proc. of 9thConference on Uncertainty in Artificial Intelligence, pp. 400-409.[93] L. Zhang, Runping Qi and D. Poole (1993c), A computational theory of decisionnetworks, accepted for publication on International Journal of Approximate Reasoning.Indexaccompanied arc, 121acyclicity constraint, 38arc reversal in decision network, 89arc reversal in skeleton, 89arc: accompanied, 121arc: lonely, 121arc: potential lonely, 124arc: removable, 110, 119arc: reversal in decision network, 89arc: reversal in skeleton, 89arc: reversible, 89asymmetry, 172barren node, 124Bayesian network, 35, 46Bayesian network: induced from a decision network by a policy, 52Bayesian network: semi-Bayesian network:induced from a decision networkby a policynetwork by a policy, 60Bayesian network: semi-Bayesian networks,47body of decision network, 103body of decision network skeleton, 101body: of decision network, 72compexity, 59complexity, 78, 106, 126condensation, 163, 170constraint: acyclicity constraint, 14constraint: no-children-to-value-node constraint, 14constraint: no-forgetting constraint , 14constraint: regularity constraint, 14constraint: single value node constraint,14decision function (table), 39, 51decision function (table): optimal, 53decision function space, 51decision network, 4, 14, 41, 50decision network skeleton, 50decision network skeleton: stepwise-solvable,58decision network: evaluation, 53decision network: over a skeleton, 51decision network: decomposable, 65decision network: equivalent, 88193Index 194decision network: normal, 128 equivalent decision networks, 88decision network: semi-decision network, evaluation functional, 82, 10259 expansion ordering of a joint probability,decision network: simple semi-decision net- 34work, 79 expansion ordering: conforms to a multiple-decision network: stepwise-decomposable, decision problem, 3972 expected utility: induced by a policy, 7decision network: stepwise-solvable, 58 expected utility: principle of maximizingdecision node: directly precedes another, the expected utility, 7158 expected value, 52decision node: directly succeeds another, expected value: conditional: in semi-decision158 network, 60decision node: independent of a parent, expected value: in semi-decision network,119 60decision node: replace, 57 expected value: optimal, 53decision root node, 150 expected value: optimal conditional: indisturbance arc, 90 semi-decision network, 60disturbance arc: most senior, 92 expected value: optimal: in semi-decisiondisturbance node, 90 network, 60disturbance recipient, 91frame, 39downstream component of decision network skeleton, 65 influence diagram, 2downstream component: of decision net- initial section, 158, 169work, 66 irrelevant parent of decision node, 119downstream set, 63.joint probability: by a policy, 52downstream-value node, 65Index 195joint probability: of a Bayesian network,47leaf disturbance node, 92local value function, 162lonely arc, 121rn-separation, 50Markov decision processe, 2moral graph, 49multi-decision problem: precedence in, 38multiple-decision problem, 38barren, 49, 124decision, 50decision nodes, 12deterministic, 57downstream-value, 65leaf decision node, 76leaf decision node: weak, 76one decision node precedes another,76node: one decision node weakly precedesanother, 76node: potential barren, 124node: random,50node: random node, 12node: remove from a semi-Bayesian network, 48node: stepwise-decomposability (SD) candidate node, 72node: stepwise-solvability (SS) candidatenode, 57node: tail-value, 72node: value, 50node: value node, 12normal decision network, 128optimal expected value, 7policy, 6, 39, 51policy space, 39, 51policy: optimal, 53policy: optimal conditional: in semi-decisionnetwork, 61policy: optimal: in semi-decision network,potential barren node, 124potential lonely arc, 124potential SD candidate node, 156potential: prior joint, 48probability: conditional probability, 47probability: marginal probability, 47property Q(A), 145node:node:node:node:node:node:node:node:60Index 196property Q1(A), 145 uniformity of semi-decision networks, 61••unreachable state, 174recursive tail cutting, 79upstream component: of decision network,removable arc, 11066removable removable, 119• upstream component: of decision networkreversible arc, 89skeleton, 65root decision node, 142upstream set, 63SDDN, 72section, 168 value (utility) function, 51section in a SDDN, 158separation, 50set: ancestral, 49shor-cutting random node, 130simple semi-decision network, 79smoothness, 64smoothness: of decision networks, 75stepwise-decomposability (SD) candidatenode, 72stepwise-decomposable decision network,72tail of decision network, 101tail of decision network skeleton, 100tail-value node, 72tail: of decision network, 72terminal section, 159, 169transition probability, 164
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- A computational theory of decision networks
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
A computational theory of decision networks Zhang, Nevin Lianwen 1994
pdf
Page Metadata
Item Metadata
Title | A computational theory of decision networks |
Creator |
Zhang, Nevin Lianwen |
Date Issued | 1994 |
Description | This thesis is about how to represent and solve decision problems in Bayesian decision the ory (e.g. Fishburn 1988). A general representation named decision networks is proposed based on influence diagrams (Howard and Matheson 1984). This new representation incorporates the idea, from Markov decision processes (e.g. Puterman 1990, Denardo 1982), that a decision may be conditionally independent of certain pieces of available information. It also allows multiple cooperative agents and facilitates the exploitation of separability in the utility function. Decision networks inherit the advantages of both influence diagrams and Markov decision processes. Both influence diagrams and finite stage Markov decision processes are stepwise solvable, in the sense that they can be evaluated by considering one decision at a time. However, the evaluation of a decision network requires, in general, simultaneous consider ation of all the decisions. The theme of this thesis is to seek the weakest graph-theoretic condition under which decision networks are guaranteed to be stepwise-solvable, and to seek the best algorithms for evaluating stepwise-solvable decision networks. A concept of decomposability is introduced for decision networks and it is shown that when a decision network is decomposable, a divide and conquer strategy can be utilized to aid its evaluation. In particular, when a decision network is stepwise-decomposable it can be evaluated not only by considering one decision at a time, but also by considering one portion of the network at a time. It is also shown that stepwise-decomposability is the weakest graphical condition that guarantees stepwise-solvability. Given a decision network, there are often nodes and arcs that can harmlessly removed. An algorithm is presented that is able to find and prune all graphically identifiable removable nodes and arcs. Finally, the relationship between stepwise-decomposable decision networks (SDDN‘s) and Markov decision process is investigated, which results in a two-stage approach for evaluating SDDN’s. This approach enables one to make use of the asymmetric nature of decision problems, facilitates parallel computation, and gives rises to an incremental way of computing the value of perfect information. |
Extent | 3458370 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-04-08 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051475 |
URI | http://hdl.handle.net/2429/6932 |
Degree |
Doctor of Philosophy - PhD |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 1994-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_1994-893761.pdf [ 3.3MB ]
- Metadata
- JSON: 831-1.0051475.json
- JSON-LD: 831-1.0051475-ld.json
- RDF/XML (Pretty): 831-1.0051475-rdf.xml
- RDF/JSON: 831-1.0051475-rdf.json
- Turtle: 831-1.0051475-turtle.txt
- N-Triples: 831-1.0051475-rdf-ntriples.txt
- Original Record: 831-1.0051475-source.json
- Full Text
- 831-1.0051475-fulltext.txt
- Citation
- 831-1.0051475.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051475/manifest