DECISION GRAPHS:Algorithms and Applications to Influence Diagram Evaluation and High-Level PathPlanning Under UncertaintyByRunping QiB.Sc. Inner Mongolia University, China, 1982M.Sc. Changsha Institute of Technology, China, 1986A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE STUDIES(DEPARTMENT OF COMPUTER SCIENCE)We accept this thesis as conformingTHE UNIVERSITY OF BRITISH COLUMBIAApril 1994to the tiuir d standard....©Runping Qi, 1994In presenting this thesis in partial fulfilment of the requirements for an advanceddegree at the University of British Columbia, I agree that the Library shall make itfreely available for reference and study. I further agree that permission for extensivecopying of this thesis for scholarly purposes may be granted by the head of mydepartment or by his or her representatives. It is understood that copying orpublication of this thesis for financial gain shall not be allowed without my writtenpermission.The University of British ColumbiaVancouver, CanadaDate(SignatureDepartment ofDE-6 (2188)AbstractDecision making under uncertainty has been an active research topic in decision theory, operations research and Artificial Intelligence. The main objective of this thesisis to develop a uniform approach to the computational issues of decision making under uncertainty. Towards this objective, decision graphs have been proposed as anintermediate representation for decision making problems, and a number of searchalgorithms have been developed for evaluating decision graphs. These algorithms arereadily applicable to decision problems given in the form of decision trees and in theform of finite stage Markov decision processes.In order to apply these algorithms to decision problems given in the form of influence diagrams, a stochastic dynamic programming formulation of influence diagramevaluation has been developed and a method to systematically transform a decisionmaking problem from an influence diagram representation to a decision graph representation is presented. Through this transformation, a decision making problemrepresented as an influence diagram can be solved by applying the decision graphsearch algorithms. One of the advantages of our method for influence diagram evaluation is its ability to exploit asymmetry in decision problems, which can result inexponential savings in computation.Some problems that can be viewed as decision problems under uncertainty, butare given neither in the form of Markov decision processes, nor in the form of influencediagrams, can also be transformed into decision graphs, though this transformation islikely to be problem—specific. One problem of this kind, namely high level navigationin uncertain environments, has been studied in detail. As a result of this case study,a decision theoretic formulation and a class of off—line path planning algorithms forthe problem have been developed.Since the problem of navigation in uncertain environments is of importance inits own right, an on—line path planning algorithm with polynomial time complexityfor the problem has also been developed. Initial experiments show that the on—linealgorithm can result in satisfactory navigation quality.11ContentsAbstract iiContents iiiList of Figures viiList of Tables ixAcknowledgement x1 Introduction 11.1 General Background and Thesis Objective 11.2 Our Methodology 21.3 Thesis Overview 31.4 Summary of Thesis Contributions 81.5 Thesis Organization 9I ALGORITHMS FOR DECISION GRAPH SEARCH 112 Decision Graphs and Finite—Stage Markov Decision Processes 122.1 Decision Graphs 122.2 Finite—Stage Markov Decision Processes 162.3 Representing Finite—Stage Markov Decision Processes by Decision Graphs 183 Decision Graph Search 203.1 Depth—First Heuristic—Search Algorithms 213.1.1 Algorithm DFS 213.1.2 The effect of heuristic functions 283.1.3 Tree ordering 283.1.4 Using inadmissible heuristic functions 30111II INFLUENCE DIAGRAM EVALUATION 644 Decision Analysis and Influence Diagrams4.1 Bayesian Decision Analysis4.2 Decision Trees4.3 Influence Diagrams4.4 Disadvantages of Influence Diagrams4.5 Previous Attempts to Overcome the Disadvantages4.6 Our Solution5 Formal Definition of Influence Diagrams5.1 Influence Diagrams5.2 Our Extension5.3 Influence Diagram Evaluation656567727779815.4 No—Forgetting and Stepwise Decomposable Influence Diagrams. 866 Review of Algorithms for Influence Diagram Evaluation6.1 Howard and Matheson’s Two—Phase Method6.2 Methods for Evaluating Influence Diagrams Directly6.2.1 Shachter’s algorithm6.2.2 Other developments6.2.3 Some common weaknesses of the previous algorithms7 A Search—Oriented Algorithm7.1 Preliminaries7.2 Influence Diagram Evaluation via Stochastic Dynamic Programming.7.3 Representing the Computational Structure by Decision Graphs . .7.4 Computing the Optimal Solution Graph7.4.1 Avoiding unnecessary computation3.1.5 An anytime version of DFS 323.1.6 Exploiting shared structures in decision graphs 333.2 Applying AO* to Decision Graph Search 383.3 Iterative Deepening Search 463.3.1 Depth—bounded iterative deepening 473.3.2 Cost—bounded iterative deepening 473.3.3 Generic iterative deepening 493.3.4 Co—routines 493.4 Summary 503.5 Proofs of Theorems 52838384859090969698100102102106115120121iv7.4.2 Using heuristic algorithms 1247.4.3 A comparison with Howard and Matheson’s method 1247.5 How Much Can Be Saved7 1257.5.1 An analysis of a class of problems 1257.5.2 A case analysis of the used car buyer problem 1318 Handling Influence Diagrams with Multiple Value Nodes 1348.1 Separable Value Functions 1358.2 Decision graphs for influence diagrams with multiple value nodes. 1368.3 Summary 143III NAVIGATION IN UNCERTAIN GRAPHS 1449 U—graph based navigation 1469.1 Motivation 1469.1.1 High—level navigation for autonomous agents 1469.1.2 Packet routing in computer networks 1509.2 U—graphs 1539.3 Representing Uncertain Environments in U—graphs 1599.4 U—graph Based Navigation 1639.4.1 Distance—graph based navigation vs. U—graph based navigation 1649.4.2 The optimality issue 1659.4.3 The possibility of information purchase 1669.5 Related Work 1699.5.1 Related work in path planning 1699.5.2 Canadian Traveler Problems 1709.5.3 Related work in AT and decision theory 17310 A Formalization of U—Graph Based Navigation 17410.1 Preliminaries 17610.2 Modeling Information Purchase 18410.3 The Expected Costs of U—graph Based Navigation Plans 18510.4 Other Variations 18710.4.1 Minimizing the competitive ratio 18710.4.2 Minimizing the expected competitive ratio 18810.4.3 Minimizing the worst case cost 18910.4.4 Reaching one of the goal vertices 18910.4.5 Reaching multiple goal vertices 189V11 Computational Issues of U—graph Based Navigation 19111.1 Planning Paradigms 19111.2 Computing Complete Plans 19311.2.1 Some experimental data 19311.3 On—Line Planning 20111.3.1 The optimality characterization of on—line planners . 20211.3.2 A graph transformation based algorithm 20211.3.3 Experimental results 20612 Conclusions 21012.1 Contribution Summary 21112.2 Future Research 212Bibliography 214A Functions for U—graph Generation 223viList of Figures2.1 A simple decision graph. 153.1 The pseudo code of DFS 253.2 An illustration of the search algorithm 273.3 An illustration of the effect of tree ordering 293.4 The pseudo code of A—DFS 343.5 A Prolog version of A—DFS 353.6 The pseudo code of DFS’ 373.7 An example for which AO* may not terminate 454.1 A decision tree for the used car buyer problem 684.2 A complete decision tree for the used car buyer problem 714.3 An influence diagram for the used car buyer problem 734.4 An influence diagram for a variation of the used car buyer problem 776.1 An illustration of the arc reversal operation: reversing arc a —* b . . 926.2 An influence diagram for the oil wildcatter’s problem 936.3 A decision tree network derived from the influence diagram for the oilwildcatter’s problem 936.4 A compact representation of the decision tree derived for the oil wildcatter problem 946.5 A partial decision tree after the expansion of the first two layers . . 956.6 A partial decision tree after the expansion of the first three layers . 956.7 An illustration of random node removal: x is removed by expectation 986.8 An illustration of decision node removal: dis removed by maximization 987.1 Two sectors of the influence diagram for the oil wildcatter problem 1047.2 A complete decision graph for the oil wildcatter problem 1197.3 A simplified decision graph 1237.4 The optimal solution graph for the oil wildcatter problem 1237.5 An influence diagram for a generalized buying problem 1297.6 An illustration of the asymmetry of a decision variable 130vii7.7 A decision graph (tree) generated for the used car buyer problem . . 1338.1 A new representation of the oil wildcatter problem by an influencediagram with multiple value nodes 1368.2 A decision graph for the influence diagram shown in Fig. 8.1 1398.3 A variation of the oil wildcatter problem 1408.4 A decision graph for the influence diagram in Fig. 8.3 1418.5 A decision graph, with zero probability arcs removed, for the influencediagram in Fig. 8.3 1429.1 The block diagram of a typical autonomous navigation system . . 1489.2 An example U—graph 1569.3 Modeling an uncertain environment by a U—graph 1609.4 A segment of road that may have traffic jams 1609.5 A simple map of the LCI lab area 1629.6 A U—graph representing the LCI lab area 1629.7 A map of an environment with choke regions 1639.8 A simple path planning case with uncertainty 16710.1 A simple U-graph 18310.2 The representing graph of a navigation task 18310.3 The representing graph of a navigation task with the information purchase option 18510.4 Two solution graphs of a navigation task . . .. 18611.1 A U—graph in Class 1 — a representation of city roads 19511.2 A U—graph in Class 2 — an abstraction of a parallel highway system 196viiiList of Tables4.1 The prior probability distribution of the car’s condition P{CC} . . . . 744.2 The probability distribution of the first test result P{R1T,CC} . .. 754.3 The probability distribution of the second test result P{R2T1,R1,T2, CC} 757.1 The function of the value node 1177.2 The conditional probability distribution of R 1177.3 The conditional probability distribution of CD . . . 1177.4 The prior probability distribution of 0 1187.5 The conditional probability distribution of S 1189.1 The weight distribution P 1569.2 The weight distribution F’ after observing s = 10 15711.1 The average speedup ratios of Algorithm DFS . . . . . . 20011.2 The average cost ratios of Algorithm DFS 20011.3 The average cost ratios of the on—line algorithms . . . .. 209ixAcknowledgementI thank Dr. David Poole, my thesis supervisor, for his generous friendship, continuous support and valuable advice in the past six years.I thank Dr. Jim Little, Dr. Alan Mackworth, Dr. Maurice Queyranne and Dr.Jack Snoeyink for serving on my supervisory committee. Their constructive cornments and valuable suggestions contribute a great deal to the quality of my thesis.I thank Dr. Lianwen Zhang for his friendship and fruitful academic collaboration.I thank Dr. Walfgang Bibel for his encouragement and help I received during myearly days at UBC.I thank Andrew Csinger for his careful reading of this thesis.I thank my friends and colleagues in the Computer Science Department who mademy stay at UBC more productive and pleasant.I gratefully acknowledge the financial assistance I received from UBC and NSERC.I thank my beloved wife, Ying Zhang, for her constant support and love throughall these years. I cannot imagine what a poor shape I would be in without her.I dedicate this thesis to my parents.xChapter 1Tnt ro ion1.1 General Background and Thesis ObjectiveDecision making under uncertainty is one of the primary topics of Bayesian decisiontheory [25, 83, 102]. In this theory, expected utility is used as a normalized unit tomeasure the degree of desirability of various possible outcomes that can result fromtaking action in some situation. The prescriptive promise of this theory is that therational action that should be taken in a situation is the one that maximizes expectedutility. Bayesian decision analysis is a methodology for applying decision theory topractical decision problems [79].Decision making under uncertainty is an active research topic in AT, decision theory and operations research. Many problems, such as diagnostic reasoning [72, 73],planning under uncertainty [29, 31, 39, 104], and path planning in uncertain environments [15, 59, 60, 77] can be abstracted as decision making under uncertainty.See [19] for a good introduction to this subject. Problems of decision making underuncertainty can be presented in various forms, such as decision trees [79], finite—stageMarkov decision processes [20, 33], or influence diagrams [35]. Although much research has been carried out to address the computational issues of decision problems1CHAPTER 1. INTRODUCTION 2in particular forms, most of the algorithms are applicable only to that form and donot lend themselves to decision problems in other forms. For example, much work hasbeen done in the operations research community (see [74] for a comprehensive survey)on the computation of Markov decision processes, but the algorithms developed forMarkov decision processes are not applicable to influence diagram evaluation. Similarly, the evaluation of influence diagrams has been an active research topic in the ATcommunity, but the algorithms developed for influence diagram evaluation, with a fewexceptions [101, 106], cannot directly be used for solving Markov decision processes.The major objective of this thesis is to study the computational issues of decisionmaking problems in a uniform way and to develop algorithms that are general enoughto apply to decision problems in various forms.1.2 Our MethodologyTo achieve our objective, we propose a simple, abstract intermediate representationfor decision making problems, and develop algorithms based on this representation.To make use of these algorithms for solving decision making problems in a particularform, we map the problems into the proposed representation.The representation we propose in this thesis is decision graphs. A decision graphis an AND/OR graph with an evaluation function. When a decision graph is used torepresent a decision problem, the solution graphs of the decision graph are interpretedas the policies of the decision problem, while the evaluation function of the decisiongraph acts as a quality measurement function for the policies. The policies of adecision problem can be measured either in terms of costs or in terms of rewards.Accordingly, the evaluation function of a decision graph can be defined either in theform of minimization—expectation or in the form of maximization—expectation.CHAPTER 1. INTRODUCTION 3If the evaluation function is defined in the form of minimization—expectation, asolution graph is optimal if it minimizes the evaluation function. Similarly, if theevaluation function is defined in the form of maximization—expectation, a solutiongraph is optimal if it maximizes the evaluation function. Given a decision problemrepresented by a decision graph, we need to compute an optimal solution graph ofthe decision graph, which corresponds to an optimal policy of the decision problem.1.3 Thesis OverviewIn this thesis, we present a number of search algorithms for computing optimal solution graphs of decision graphs. These algorithms include a depth—first heuristic—search algorithm, a best—first heuristic—search algorithm, an anytime algorithm andtwo iterative—deepening depth—first heuristic—search algorithms. Similar to Ballard’s*minimax search procedures [3], the depth—first heuristic—search algorithm is developed from the alpha—beta algorithm for minimax game tree search [38]. While Ballardemphasizes the generalization of the alpha—beta algorithm to handle chance nodesin minimax trees, in our development we elaborate the pruning techniques so thatdomain—dependent information can effectively be used to improve search efficiency.Furthermore, we derive an anytime algorithm by integrating the anytime concept [8]into the depth—first heuristic—search algorithm. The best—first heuristic—search algorithm is obtained by modifying the AO* algorithm [50, 62, 68] for AND/OR graphswith additive costs. The iterative—deepening algorithms result from combining theiterative—deepening techniques [40] with the depth—first search techniques.Developing algorithms for decision graph search is just the first step towards ourobjectives. To fully achieve our objectives, we need to show that decision problemsgiven in other forms can be transformed into a decision graph representation. WeCHAPTER 1. INTRODUCTION 4first note that this mapping is trivial for those problems in the form of decision trees,because decision graphs are a generalization of decision trees, allowing for structuresharing. We show that it is also quite straightforward to map a finite—stage Markovdecision process into a decision graph representation.For decision problems given in the form of influence diagrams, we develop a methodthat transforms such a problem into a decision graph representation. By combiningthis transformation method with the decision graph search algorithms, we obtain anew method for influence diagram evaluation. Our method is similar to Howard andMatheson’s method [35] in the sense that both methods transform an influence diagram into a decision tree (a decision graph in our case) for evaluation. However, ourmethod is more efficient, partly because the decision graph obtained by our methodis likely much smaller in size than the decision tree obtained by Howard and Matheson’s for the same influence diagram. In comparison with other algorithms reportedin the literature [11, 85, 88, 91, 106, 108, 109] for influence diagram evaluation, ourmethod has three unique merits. First, it exploits asymmetry of decision problems.This is significant because (a) most practical decision problems are asymmetric [70]and (b) as it will be shown in Section 7.5, exploiting asymmetry can lead to exponential savings in computation. Second, by using heuristic search techniques, ourmethod provides an explicit mechanism for making use of heuristic information thatmay be available in a domain—specific form. Finally, by using decision graphs as anintermediate representation, the value of perfect information [53] can be computedmore efficiently [110]. In addition, our method, like other recent algorithms [88, 108],also provides a clean interface between influence diagram evaluation and Bayesiannet evaluation [69], thus various well—established algorithms (e.g., [47]) for Bayesiannet evaluation can be used in influence diagram evaluation.CHAPTER 1. INTRODUCTION 5Some other problems that can be viewed as decision problems with uncertainty butare given neither in the form of Markov decision processes nor in the form of influencediagrams can also be transformed into decision graphs, though this transformationis likely to be problem—specific. We have studied one problem of this kind in detailfor two reasons. First, we want to illustrate how to transform a decision makingproblem of this kind into a decision graph representation to make use of the algorithmsdeveloped for decision graph search. Second and more importantly, the problem is ofimportance in its own right and deserves a thorough study.The problem we have studied is high—level navigation in uncertain environments,where an autonomous agent is asked to reach a particular place in an uncertain environment, and needs to determine a plan for achieving the task. This is a challengingproblem in the AT and robotic communities.The problem of high—level navigation in uncertain environments is an importantissue in the study of autonomous agents. In the navigation system of an autonomousagent, a central part is the path planning component, responsible for generating adescription of a route that can guide the agent to a desired destination. However,due to the complexity and the uncertain nature of the environment, it is unreasonable to expect the path planning component to have a priori knowledge of everyrelevant detail necessary to generate an executable plan for a given navigation task.Consequently, the path planning component has to appeal to perception componentsfor obtaining information dynamically, and must be able to dynamically elaborateplans based on the information from the perception components. In order to managethe complexity of the path planning task, various kinds of hierarchical structure arecommonly employed in most navigation systems [2, 13, 55, 58, 89]. In these systems,a distinction is made between a high—level (global) path planner and a low—levelCHAPTER 1. INTRODUCTION 6(local) path planner. In the literature, the problem of low—level path planning hasbeen intensively studied. However, considerably less attention has been paid to theproblem of high—level path planning. In most navigation systems (e.g., [2, 89]) forautonomous agents, the problem of high—level path planning is usually modeled as avariation of computing the shortest distance path from a distance graph that servesas a representation of the global map. However, a major drawback of this treatmentis that uncertainty is not considered in high—level path planning.In this thesis, we address the problem of high—level path planning and navigationin uncertain environments. We propose U—graphs, a natural generalization of distancegraphs, as a framework for representing the necessary information about uncertainenvironments for high—level navigation. In a U—graph, the weight of an edge canbe either a constant or a random variable. An edge with a random variable asits weight is called an uncertain edge. The distribution of the random variable ofan uncertain edge is called the weight distribution of the uncertain edge. Like anordinary edge, an uncertain edge represents a connection between two places denotedby the two incident vertices. The weight distribution of the uncertain edge capturesthe uncertainty on the traversability of the corresponding connection. The weightdistributions of uncertain edges can be dependent or mutually independent.We use the term U—graph based navigation to refer to the process of an agentnavigating in an uncertain environment represented by a U—graph. More specifically,an agent is given a U—graph and is asked to reach a goal vertex from a start vertexin the U—graph. We assume information on the actual weight of an uncertain edgecan be determined when one of the incident vertices of the edge is first visited, orcan be “purchased” at some price. Once revealed, the weight of an uncertain edge isassumed to remain the same.CHAPTER 1. INTRODUCTION 7We give a decision theoretic formalization of the problem of U—graph based navigation. Within this formalization, a U—graph based navigation task is represented asa decision graph. A solution graph of the decision graph for a problem correspondsto a complete navigation plan, covering all of the contingencies possibly encounteredduring the course of ilavigation. Thus the path planning task of computing a completeplan that satisfies some optimality criterion is reduced to the problem of searchingan optimal solution graph from a decision graph. Because the path planning taskis finished once a complete plan is computed, we refer to this paradigm as off—linenavigation.Since the problem is of importance in its own right, we also study another navigation paradigm, called on—line navigation. In the on—line paradigm, a path plannerdoes not compute a complete plan for a given navigation task. Instead, it acts asa consultant telling the agent what to do in any situation. The planner and theexecutive act as co—routines. The advantage of the on—line paradigm is that we candevelop polynomial time planning algorithms. We have developed an on—line planningalgorithm for U—graph based navigation. Although the algorithm cannot guaranteeoptimal navigation’, our experimental data show that it results in satisfactory navigation quality.Polychronopoulos [71] has recently studied a similar problem, called StochasticShortest Distance Problem with Recourse (SSDPR), which is essentially the same asthe U—graph based navigation problem. He has given a dynamic programming algorithm for the problem. Although a more thorough comparison between his algorithmand ours is yet to be carried out, we believe that our algorithm is more practical thanhis for the following two reasons. First, for a given navigation task, we exclude those‘Since the problem of optimal navigation is #P—hard [71], no polynomial time algorithm canguarantee optimal navigation, unless P is equal to #P.CHAPTER 1. INTRODUCTION 8“obviously non—optimal parts” (e.g., loops) from the decision graph representationat the stage of problem formulation, resulting in a smaller search space. Second, incomputing an optimal plan, our algorithm uses heuristic search techniques and domain dependent information to increase the computation efficiency. Roughly, Polychronopoulos’ algorithm amounts to evaluating the decision graphs in a brute—forceway. Polychronopoulos also provides a simple on—line algorithm. We have simulatedhis on—line algorithm and ours against a few hundred randomly generated U—graphs.The results show that the navigation quality of both on—line algorithms are good andours is closer to optimal than his.1.4 Summary of Thesis ContributionsThe contributions of this thesis are summarized as follows.• A number of algorithms for decision graph search.• A new method for influence diagram evaluation.• A decision theoretic formalization of the problem of U—graph based navigation,and a general approach to the computation of optimal plans for U—graph basednavigation problems.• A polynomial—time heuristic on—line algorithm for U—graph based navigation.In a broader context, our work can be considered as a contribution to the cross—fertilization between decision theory and AT techniques. On the one hand, we developa number of heuristic search algorithms for the problem of decision making underuncertainty. These algorithms can be viewed as an application of AT techniques todecision analysis. On the other hand, we develop a decision theoretic formalization forCHAPTER 1. INTRODUCTION 9a (path) planning problem. This can be viewed as an application of decision theoryto planning. In the literature, much work has been reported on the applications ofdecision theory to AT problems. A common characteristic of these applications is thata given problem is viewed as a decision making problem and one is interested in a“good” or optimal solution, instead of an arbitrary solution, to the problem. Theseapplications include general planning [9, 16, 29, 32, 46], diagnostic reasoning [72,73, 98, 99], reactive systems [24, 28] and domain specific problem solving such as themonkey and bananas problem [23], the blocks world [39], and navigation problems [15,59, 60]. Our work on U—graph based navigation belongs to the last group, applyingdecision theory to the domain of autonomous navigation.1.5 Thesis OrganizationThe rest of this thesis consists of three parts and is organized as follows. Chapters 2and 3 constitute the first part. In Chapter 2, the concept of decision graphs isintroduced and the correspondence between decision graphs and finite—stage Markovdecision processes is discussed. In Chapter 3, algorithms for decision graph searchare developed.Chapters 4 through 8 form the second part. In Chapter 4, we review some basic concepts about decision analysis, informally introduce influence diagrams, discussthe advantages and disadvantages of inflilence diagrams and outline our approach tohandling the disadvantages. In Chapter 5, we formally introduce influence diagramsand give a formal definition of the problem of influence diagram evaluation. In Chapter 6, we review some current algorithms for influence diagram evaluation and pointout their shortcomings. In Chapter 7, we present our method for influence diagramevaluation. We also show that, by exploiting asymmetry, the method leads to expoCHAPTER 1. INTRODUCTION 10nential savings in computation for typical decision problems represented by influencediagrams. In Chapter 8, we generalize the method to deal with influence diagramswith multiple value nodes.Chapters 9 through 11 constitute the third part of the thesis, dealing with variousissues of high—level navigation in uncertain environments. In Chapter 9, we firstpresent our motivation for our study on high—level navigation and then propose U—graphs as a framework for representing uncertain environments. Next, we definethe problem of U—graph based navigation. Finally, we review some related workin the area of navigation under uncertainty. In Chapter 10, we develop a decisiontheoretic formalization that transforms a U—graph based navigation problem into adecision graph. In Chapter 11, we discuss two navigation paradigms. We show thatthe algorithms developed in Chapter 3 for decision graph search can be used foroff—line navigation, and present an algorithm for on—line navigation. We give someexperimental data on the performance of the algorithms. Conclusions are given inChapter 12.Part IALGORITHMS FOR DECISIONGRAPH SEARCH11Chapter 2Decision Graphs and Finite—StageMarkov Decision ProcessesIn this chapter, we introduce the basics of decision graphs and finite—stage Markovdecision processes, and show that a finite—stage Markov decision process can be naturally represented as a decision graph.2.1 Decision GraphsFrom the structural point of view, a decision graph is an acyclic AND/OR graph[68] with an evaluation function. More precisely, a decision graph is a directed acyclicgraph whose nodes are classified into two types: choice nodes and chance nodes, whichare analogous respectively to the OR nodes and the AND nodes in an AND/OR graph.Each decision graph has exactly one root. For simplicity of exposition, we assumethat all children of a node in a decision graph are of the same type, and chance nodesand choice nodes are interleaved in a decision graph. A cost or a reward is associatedwith each arc emanating from a choice node. A probability is associated with eacharc emanating from a chance node, and the probabilities of all the arcs from a chancenode sum to unity. Leaf nodes are terminals, each with a cost or a reward.12CHAPTER 2. DECISION GRAPHS AND MARKOV DECISION PROCESSES 13Decision graphs are a generalization of decision trees [79, 78], allowing for structuresharing. A solution graph SG, with respect to a node n, of a decision graph DG isa graph with the following characteristics:1. n isin SG;2. if a non—terminal chance node of DG is in SG, then all of its children are inSG;3. if a non—terminal choice node of DC is in SG, then exactly one of its childrenisin SG.A solution graph with respect to the root of a decision graph is simply referred to asa solution graph of the decision graph.A decision graph can be interpreted as a representation of a process of sequentialdecision making. Nodes in a decision graph represent situations. The root noderepresents the initial situation. A choice node represents a situation where an agentcan select an action. The arcs emanating from a choice node can be viewed asthe actions that the agent can take in the situation. A chance node represents anuncertain situation, resulting from taking an action in the situation represented by achoice node. The children of a chance node represent the situations that are possiblyreached from the uncertain situation represented by the chance node. The probabilitythat a particular situation will be reached is given by the number associated with thearc to the child representing the situation.The value of a given decision graph can be defined either in terms of costs orin terms of rewards. If the value is defined in terms of costs, the decision objectiveis assumed to be minimizing the expected cost. If the value is defined in terms ofrewards, the decision objective is assumed to be maximizing the expected reward. InCHAPTER 2. DECISION GRAPHS AND MARKOV DECISION PROCESSES 14the rest of this chapter and in the next chapter, we define value of decision graphs interms of costs. However, due to the duality of costs and rewards, all the techniques andalgorithms that we develop can be adjusted in a straightforward way to be applicableto decision graphs with reward—oriented values.Let n’ be one of the children of node n. We use c(n, n’) to denote the cost of thearc from n to n’, if n is a choice node; we use p(n, n’) to denote the probabilityof the arc from n to n’, if n is a chance node. We use v(n) to denote the valueassociated with a terminal node n.Let DG be a decision graph. A min—ezp evaluation (in contrast to the minimaxevaluation of a minimax tree [102]) of DC is a real—valued function u defined asfollows:1. If n is a terminal: u(DG, n) = v(n).2. If n is a chance node with k children n1,...,nj, in DG:u(DG, n)=p(n, n) * u(DG, ni).3. If n is a choice node with k children n1,...,nj in DG:u(DG, n) = min1{c( ,n1) + u(DG, n)}.The concepts of solution graphs and min—exp evaluation are the natural extension ofthose defined for decision trees in [78]. The value given by u(DG, n) is called themin—exp value of the node n. It can be interpreted as the minimal expected cost thatan agent is going to pay if it starts a sequential decision process from the situationrepresented by node n. Note that the above definition is applicable to a solutiongraph as well since a solution graph is a special decision graph.CHAPTER 2. DECISION GRAPHS AND MARKOV DECISION PROCESSES 15For a decision graph DC with evaluation function u, a solution graph SC of DCis optimal with respect to the evaluation function if u(SG, n) = u(DG, n) for everynode n in 8G. A general computational problem related to decision graphs is, for agiven decision graph and an evaluation function defined on it, to compute an optimalsolution graph (with respect to the evaluation function) of the given decision graph.In the next chapter we will present several algorithms for this problem.Fig. 2.1 shows a simple decision graph. In the figure, boxes, circles, and dotted—line circles denote the choice nodes, chance nodes, and terminals respectively.Figure 2.1: A simple decision graphLemma 2.1 For a decision graph DG with a min—exp evaluation function u, wecan define another min—exp evaluation function u’ that is isomorphic to u in thefollowing sense: 1) an optimal solution graph of the decision graph with respect toone evaluation function is also optimal with respect to the other, and 2) there existsa constant c0 such that u’(DG, n) = c0 + u(DG, n) for every node n in the decisiongraph.:3:CHAPTER 2. DECISION GRAPHS AND MARKOV DECISION PROCESSES 16Proof The evaluation function u’ can be defined as follows:1. If n is a terminal: u’(DG, ri) = v(n) + Co.2. If n is a chance node with k children n1,...,n in DG:u’(DG, n) = p(n, n) * n’(DG, n)3. If n is a choice node with k children n1,...,nj, in DC:u’(DG, n) = min1{c( , n) + u’(DG, n)}.The fact that the two evaluation functions are isomorphic can be proved by a simpleinduction on the structure of the decision graph. DDue to this lemma, we can assume that the evaluation functions of decision graphsare all non—negative without loss of generality.2.2 Finite—Stage Markov Decision ProcessesInformally, a Markov decision process is an alternating sequence of states of, andactions on, an evolution system [20]. At each point in time, the state of the systemcan be observed, and an action can be taken based on the observed state. A finite—stage Markov decision process is a Markov decision process that must reach sometarget state within a finite number of steps. A policy is a prescription for takingaction at each point in time.Formally, a finite—stage Markov decisioll process is a tuple (S, K, w, v, q), withS = {St = 0, ..., T} and K = {K5s€S}, where St denotes the space of the statesobservable at stage t, and S = uL0 is the total space of states, S fl S = ifi j; K8 denotes the set of actions that may be taken in state s; w and v are twocost functions and q is a transition function. S contains only one state calledCHAPTER 2. DECISION GRAPHS AND MARKOV DECISION PROCESSES 17the initial state, and the states in ST are called the target states. In this thesis, weconsider only Markov decision processes with finite state sets and finite action sets.If the system is in state s E S for some t, 0 t T, we say the systemis at stage t. The laws of motion of the system are characterized by a transitionfunction q. Whenever the system is in state s E S and action a E K3 is taken, theprobability of the system being in state s’ at the next stage is given by q(s, s’, a). It isassumed that Zs’es+i q(s, s’, a) = 1 for any s E S and a E K3, and q(s, s’, a) 0if either a is not in K3 or s’ is not in A cost structure is defined on thedecision process by the function w. Whenever the system is in state s and actiona is taken, a cost w(s, a) is incurred. Whenever the system enters a target states E ST, the system stops with cost v(s).A deterministic policy for a Markov decision process can be regarded as a functionmapping from states to actions. We define the expected cost function X of a finite—stage Markov decision process with respect to a policy R as follows.X1R — f v(s) if s e ST—w(s, R(s)) +3’s1q(s, s’, R(s)) * X(R, s’) otherwise.The value of X(R, s) is called the expected cost of the Markov decision process withrespect to policy R in state s, and the value of X(R,.Sü) called the expected costof the Markov decision process with respect to policy R. A policy R* is optimal ifX(R*,so) <X(R,so) for any policy R.The finite—stage Markov decision processes that we just introduced are a specialkind of general Markov decision processes [20] and are equivalent to the finite—horizonMarkov decision processes in [74].The following lemma expresses the dual results of Theorem 4.2 in [74].CHAPTER 2. DECISION GRAPHS AND MARKOV DECISION PROCESSES 18Lemma 2.2 For each 5EST,X(R*,s)= u(s);for eachstate O<t<T,X(R*, s) = minaEK{w(s, a) + q(s, s’, a) * X(R*, s’)}s’ ES1andR*(s)= arg minaEKs{w(s, a) + q(s, s’, a) * X(R*, s’)}sES1where the operation arg mm selects and returns an element minimizing the formulato follow.These equations are referred to as Bellman equations in the literature [5].2.3 Representing Finite—Stage Markov DecisionProcesses by Decision GraphsKumar and Kanal have observed a general correspondence between sequential decision processes and AND/OR graphs [44]. In this section, we show how a finite—stage Markov decision process can be represented as a decision graph. Let M =(S,1C,w,u,q) be a Markov decision process and DG = (VA1 U A2) be a directedgraph defined as:V=SU{usas E S;a e K},A1 = {(s,usa)s S;a E K8}andA2 = {(vsa,s’)s S,s’ E St+i for some t with 0 t T— 1, and a € K8}CHAPTER 2. DECISION GRAPHS AND MARKOV DECISION PROCESSES 19In such a graph, a node s S is a choice node, representing an observable state inwhich an action can be selected and taken; a node Vsa is a chance node, representinga temporary state resulting from taking action a in state s. The next observablestate after the temporary state is determined by a probability distributioll q(s, s’, a).We can attach cost w(s, a) on the arc (5, Vsa) and probability q(s, s’, a) on the arc(Vsa, s’). In such a graph, node 5o is the root and the nodes s E ST are terminals.It can be verified that DG is acyclic and is indeed a decision graph.Let u be an evaluation function of the decision graph defined as:• u(DG,s) = v(s) if SE ST•• u(DG, v8)= Z’es+1P(Vsa, s’) * u(DG, s’) for each s E St and each a E K8,where p(v3,s’) is the probability q(s,s’,a) on the arc (vsa,s’).• u(DG,s) = minaEK8{c(s,vsa)+u(DG,vsa)} for each s E S—ST ,where c(s,vsa)is the cost w(s,a) on the arc (S,Vsa).Lemma 2.3 For every state s E 5,X(R*,s)= u(DG,s).Proof. Note that, from the construction of DC, we have: c(s, Vsa) = w(s, a) andP(t’sa, s’) = q(s, s’, a). From the definition of n, we have:I v(s) ifu(DG,s) =minaEKs{c(s,vsa) + s’ESt+a p(vsa,s’) * u(DG,s’) otherwise.By Lemma 2.2, we have X(R*, s) = u(DG, s) for every state s E S. 0Chapter 3Decision Graph SearchIn this chapter, we consider the problem of decision graph search. The problem isdefined as follows. For a given decision graph DG and an evaluation function, findan optimal solution graph SG (with respect to the evaluation function).From the definition of the evaluation function of decision graphs, two algorithmsare readily available. The first one is the folding--back—and—averaging--out methodthat is commonly used in decision analysis for evaluating decision trees [79]. Anotherone is a recursive search algorithm. The disadvantage of these algorithms is that theyneed to “visit” all of the nodes in a decision graph in order to compute an optimalsolution.In this chapter, we present several algorithms for decision graph search. Thesealgorithms need not visit every node of a decision graph in general, and in certainfavorable situations, need only to visit the nodes in an optimal solution graph of thedecision graph. These algorithms can be roughly classified into three categories. Thefirst category includes a depth—first heuristic—search algorithm and its anytime version. The depth—first heuristic—search algorithm uses a branch—and—bound pruningmechanism similar to the alpha—beta technique for minimax tree search [38]. Thesecond category includes a best—first heuristic—search algorithm, derived from AO*20CHAPTER 3. DECISION GRAPH SEARCH 21[50, 62, 68]. This algorithm can also be regarded as a specialization of the general branch—and—bound algorithms described in [44, 45]. The third category includesiterative—deepening depth—first heuristic—search algorithms.3.1 Depth—First Heuristic—Search AlgorithmsIn this section we discuss a depth—first heuristic—search algorithm and its variations.Like Ballard’s *minimax algorithm [3], our algorithm is derived from the alpha—betamethod [38]. The difference is that our algorithm makes effective use of domain—dependent knowledge to increase search efficiency.This algorithm was originally developed for searching decision trees with mill—exp evaluation functions [78]. Sillce any decision graph can be viewed as a compactrepresentation of a decision tree, these algorithms are applicable to decision graphsearch as well. In the rest of this section, we first present this decision tree searchalgorithm and its anytime version, and then discuss how to tailor the algorithms toexploit shared structures in decision graphs.3.1.1 Algorithm DFSIn this section, we present a depth—first heuristic—search algorithm DFS (Depth FirstSearch) for decision graph search. The algorithm uses an alpha—beta—like pruningmechanism.In order to develop the pruning mechanism, we contrast a decision tree and aminimax tree. A choice node in a decision tree can be regarded as a mm nodesince we want to minimize the min—exp value of it. Consequently, a chance node isanalogous to a max node. However, a decision tree is different from a minimax treein two major aspects. First, there is no cost or other information associated withCHAPTER 3. DECISION GRAPH SEARCH 22the arcs of a minimax tree, but in a decision tree the information of this kind playsan important role in computing both the min—exp values of nodes and an optimalsolution tree of the decision tree. Second and more importantly, the way to computethe minimax values in a minimax tree is different from the way to compute the mm—exp values in a decision tree. These two differences make the original alpha—betapruning rules not directly applicable to a decision tree.Nevertheless, we still can design a similar pruning mechanism for decision trees ifsome admissible heuristic functions are available. A heuristic function for a decisiontree is a function that estimates the min—exp values of the nodes of the decision tree.A heuristic function h for decision tree DG is admissible if, for every node n in DC,h(n) u(DG, n). For the sake of brevity, we use h*(n) to denote u(DG, n), themin—exp value of the node n in the decision graph DG, when no ambiguity arises.DFS works as follows. For each node n to be searched next with a given upperbound b, DFS tries to find out whether h*(n) is less than b. DFS returns the valueof h*(n) if h*(n) is less than b, otherwise returns a value no less than b. Using theterminology in [38], we call the upper bound the “3—value” of node n.Let h be an admissible heuristic function for DC, and b be the /3—value of noden. We have the following three cases.(1) b h(n). In this case, due to the admissibility of h, we have b < h*(n).Thus DFS need not search node n (and the subtree rooted at n).(2) b > h(n) and n is a choice node with children n1,..., k• In this case DFSsearches the subtrees rooted at n1,...,n. Let= b and r = min{r_i, c(n, n) + h*(nj)} for i = 1, ..., k.Here, c(n, n) is the cost of the arc from n to n and r_1 stands for the lowestCHAPTER 3. DECISION GRAPH SEARCH 23cost obtained when the subtrees rooted at n1, ..., n_1 have been searched’ and thesubtree rooted at n is to be searched next. We call r_, an intermediate back—valueof node n. When searching node n, DFS tries to find out whether h*(n) is lessthan r_, — c(n, ne). Thus, DFS recursively applies to node n with r_1— c(n, n) asthe /3—value. After all of the subtrees under node n have been searched, DFS returnsr which is equal to h*(n) if h*(n) < b, and is otherwise equal to b.(3) b > h(n) and n is a chance node. In this case, a series of approximations ofh*(rz,) can be obtained as the children of n are searched. Letkpartial = h*(n)*p(n,n)+ h(n)*p(n,n)j=1 j=i+1where p(n, n) is the probability of the arc from n to nj, for i = 1, ..., k. partialcan be considered as the approximation of h*(n) when the subtrees rooted at nodesn1, ..., n, have been searched. From the definition of partial, we have:kpartial0 = h(n) * p(n, ni),j=1partial, = partial_, + p(n, n) * (h*(n) — h(n))andkpartialk = h*(rij) *p(fl,nj) = h*(n).j=1Since h is admissible, partial_i partial, for any i, 1 < i < k. Thus, if for somei, 1 i < k, partial, b, then, h*(n) b. In this situation, DFS will stopsearching the rest of the children of node n. When searching node n, DFS tries tofind out whetherh*(n) <(b— partial_,)/p(n, n) + k(n).1Note that the min—exp values h*(n) probably need not be computed for all nCHAPTER 3. DECISION GRAPH SEARCH 24Thus, DFS recursively applies to node n with (b — partial_i)/p(n, n) + h(n) as the,8—value. If the subtree rooted at k is eventually searched and partialk < b, DFSreturns partialk as the value of h*(n).The pseudo code of DFS is shown in Fig. 3.1. In this algorithm, MAXNUM is alarge positive number, representing oc; cost(n, i) and prob(n, i) correspond toc(n, n) and p(n, n) respectively. Function h corresponds to an admissible heuristicfunction, and order-d and order-n correspond to two tree ordering functions thatorder the children of choice nodes and those of chance nodes respectively. These threefunctions are the abstraction of the domain—dependent knowledge that DFS uses.DFS consists of two mutually recursive functions: dnode(n, b) and nnode(n,b), for choice node search and chance node search respectively. In the algorithm,parameter b is the/3—value for node n; variable nb is the 3—va1ue for the child to besearched next. In dnode, variable result denotes the intermediate back—up valuesof node n (corresponding to ri). As the children of node n are beillg searched,variable result is updated, and the /3—value (nb) for the child to be searched next iscomputed. If the ,8—value for a child is no more than the value given by the heuristicfunction, the child need not be searched. In nnode, variable partial represents theseries of approximations of the min—exp value of node. As the childrell of node nare being searched, variable partial is updated and the 3--va1ue (nb) for the child tobe searched next is computed. It is important to note here that partial will neverdecrease as more childrell of a chance node are searched, due to the admissibility ofthe heuristic function. Therefore, as soon as partial catches up with b, it is surelyknown that the min—exp value of the chance node is equal to or greater than the/3—value. Thus no more children need to be searched.The description of DFS given in Fig. 3.1 does not construct the optimal solutionCHAPTER 3. DECISION GRAPH SEARCH 25dnode(n, b)if n is a terminal thenif v(n) > b then return MAXNUM;else return v(n);if h(n) > b then return MAXNUM;result = b;k=letfor# of children of node n;ni, n2, ..., nk = order-d(n);(i = 1 to k) donb = result - cost(n, i);if nb > h(ni) thenresult = mm -(result; cost(n, i) + nnode(ni,nb)};if result > b then return MAXNUM;else return result;n is a terminal thenif v(n) >= b then return MAXNUM;else return v(n);if h(n) >= b then return MAXNUM;k = # of children of node n;let nl, n2, ..., nj = order-n(n);partial = h(nl)* prob(n, 1) + ... + h(nk) * prob(n, k);i = 0;while (partial < b) and (i < k) doi = i + 1;nb = (b - partial)/prob(n, i) + h(ni);partial=partial+prob(n, i)*(dnode(ni, nb)-h(ni));if partial >= b then return MAXNUM;else return partial;nnode(n, b)ifFigure 3.1: The pseudo code of DFSCHAPTER 3. DECISION GRAPH SEARCH 26tree. The algorithm can be easily modified to do so. Thus, for a decision tree DGand a node n, DFS can be used either to compute h*(n) or to compute an optimalsubtree rooted at n in DG. Since DFS is a depth—first search algorithm, the size ofthe space that it needs is linear in the depth of the tree if the solution tree need notbe constructed explicitly, and is otherwise linear in the size of the largest solutiontree that the algorithm ever constructs in the search course.As an illustration of this algorithm, let us consider an example. For convenience,we assume that, for a given decision tree, the algorithm orders the tree first (usingits tree ordering functions) and then searches the ordered tree in the left—to—rightorder2 (in contrast to integrating searching and tree ordering together). A decisiontree, after being ordered by the tree ordering functions used by DFS, is shown in Fig.3.2 where we assume that all the terminals have value 10 and all the branches of anychance node have the same probability (0.5). The optimal solution is indicated bythe arrows. Its cost is 35.5. Suppose that DFS uses a heuristic function h0 thatreturns zero for every node in the tree. Clearly, this heuristic function is admissible.The search algorithm starts from the root with as the /3—value. After node n8is searched, the intermediate result for node n4 is 28. Thus when node n9 is beingexplored, its /3—value is 3. After node n18 is explored, the approximation of themin—exp value of node n9 is 5 which exceeds its /3—value, thus node n19 is pruned3.Another cutoff happens right after node n0 is explored. The intermediate back—upvalue of node n5 is 25. The /9—value for node n11 is negative, therefore, node n11,together with all the nodes below it, is pruned. The last pruned node for this problemis node fl27. Therefore, five nodes in total are pruned.2This convention is used throughout this section.3Pruning a node means that the node need not be visited at all. In the current case, even if noden19 is not a terminal, but is an interior node, it will still be pruned.CHAPTER 3. DECISION GRAPH SEARCH 27Figure 3.2: An illustration of the search algorithmLet dt be a function defined as:dt(n b) = f nnode(n, b) if n is a chance node,j dnode(n, b) otherwise.The following theorem establishes the correctness of DFS.Theorem 3.1 If the heuristic function used by DFS is admissible, then:dt b — I h*(n) if h*(n) <b,(n, )—MAXNUM otherwisefor any node n in the decision tree and a number b.An inductive proof of this theorem is given in Section 3.5.Corollary 3.2 If the heuristic function used by DFS is admissible, then dt(n, b’) <dt(n, b) for any node n in a decision tree and any two numbers b’ b. Furthermore,if dt(n, b) <b, then dt(n, b) = dt(n, b’).Corollary 3.3 If the heuristic function used by DFS is admissible, then h(n) =dt(n, cc) for any node n in the decision tree.CHAPTER 3. DECISION GRAPH SEARCH 283.1.2 The effect of heuristic functionsLet h1 and h2 be two heuristic functions, h1 is more informed than h2 for adecision tree if hi(n) h2(n) for every node n in the decision tree. Suppose thatboth heuristic functions h1 and h2 are admissible, it is clear that the performanceof DES with h1 will be no worse than that of DES with h2 for the same decisiontree if h1 is more informed than h2.As an illustration on the effect of the heuristic function, let us assume that forthe decision tree shown in Fig. 3.2, we now have a more informed heuristic function14 defined as follows: 14(n) = 16 for i = 2,...,7 and 14(n) = 7 for i = 8,...,31.When applying DFS with 14 to the decision tree, nine nodes (nodes in the subtreerooted at nodes n9, n11 and n13) will be pruned.3.1.3 Tree orderingNote that the correctness of DFS is independent of the tree ordering functions. However, like minimax tree search, the order in which the children of nodes in a decisiontree are searched may have a great effect on the execution time of the algorithm.Generally speaking, we want to search first the branch of a choice node that results inthe final (minimal) min—exp value of the choice node in the hope that as many otherbranches as possible can be pruned; and we want to search first the child of a chancenode that contributes most to the min—exp value of the chance node in the hope thatthe partial accumulation can reach the 3—value of the node as early as possible.As an illustration on the effect of tree ordering, let us consider the decision treeshown in Fig. 3.3. This is the same decision tree as the one in Fig. 3.2 except thatthe orderings of the children of some nodes are different. It can be verified that whenDES with heuristic function h0 is applied to this tree, nine nodes (nodes n27, n29,CHAPTER 3. DECISION GRAPH SEARCH 29n19, and the nodes in the subtrees rooted at nodes n10 and nn) will be pruned, butwhen DFS with heuristic function h0 is applied to the decision tree shown in 3.2, onlyfive nodes are pruned. Similarly, when DFS with h is applied to this tree, twentyone nodes (nodes in the subtrees rooted at nodes n13, n14, and n2) will be pruned,but when DFS with heuristic function h is applied to the decision tree shown in 3.2,only nine nodes are pruned.Note that a heuristic function normally contains more information than a treeordering function. In particular, we can define tree ordering functions from a heuristicfunction. For example, given a heuristic function h, we can define orderd in such away that if n, and n are two children of a choice node n andh(n)+c(n,n) h(n)+c(n,n),Figure 3.3: An illustration of the effect of tree orderingthen n, should be searched before n3. With this definition, child ri of a choice nodeCHAPTER 3. DECISION GRAPH SEARCH 30n will be pruned if there exists a child n of n, i <j, such that:h*(n) + c(n, n) <h(n) + c(n, ni).Let e = h(n) + c(n, n) — (h(n) + c(ri, ni)). The above inequality is equivalent to:h*(n)— h(n) <e.The left hand side in the above inequality is the difference between the min—exp valueof node n and its lower bound given by function h, and the right hand side is thedifference which determines the search order between n and nj. The more informedthe heuristic function is, the smaller h*(n)— h(n), thus the better the chance fornode n being pruned.We can also define order(n) in such a way that if n and nj are two childrenof a chance node n andh(n) *p(n,nj) > h(n) *p(fl,flj),then n should be chosen before n3. With this ordering, children j+i, ..., k of achance node n will all be pruned ifj kh*(ni)*p(n,nl) + h(nk) *p(n,nk) b.1=1 l=j+1In Chapter 11, when we consider U-graph based navigation problems [77], we will encounter some heuristic functions for a realistic problem. Moreover, we define orderingfunctions in terms of heuristic functions in the same way as we did above.3.1.4 Using inadmissible heuristic functionsThe previous section required that the heuristic function h be admissible. For the A*algorithm, Harris [30] has argued that the condition of admissibility is too restrictive.CHAPTER 3. DECISION GRAPH SEARCH 31His arguments are applicable to decision tree search as well. Although it may beimpractical to find a good heuristic function that never violates the admissibilitycondition, it is often easier to find a function that estimates the min—exp values well,but occasionally overestimates them. Like the case for A* [68], we have the followingtwo theorems for DFS that establish the linear relationship between the maximal errorof an inadmissible heuristic function aild the maximal error of the min—exp value ofthe resulting solution.Theorem 3.4 Suppose DFS uses heuristic function h. If there exists a number 6 0such that h satisfies h(rt) h*(n) + 6 for every node n in a decision tree, then forevery node n in the decision treeh*(n)+6> b if dt(n,b) > bandh*(n) + 6 dt(n, b) if dt(n, b) < b.Theorem 3.5 Suppose DFS uses heuristic function h. If the costs of all the arcs ina decision tree are non—negative and h*(n) 0 for each node n in the decision tree,and there exists a number 6 > 0 such that h satisfies:0 < h(n) < (1 + 6) * h*(n) for every node n in the decision tree,then for every node n in the decision tree and any non—negative number b,h*(n)*(1+6)b if dt(n,b)bandh*(n)*(1+6) > dt(n,b) if dt(n,b) < b.CHAPTER 3. DECISION GRAPH SEARCH 32The proofs of these theorems are given in Section 3.5.3.1.5 An anytime version of DFSThe time complexity of searching for an optimal solution tree, or a suboptimal solutiontree with a bounded quality, of a decision tree is exponential in the depth of the tree.Thus for a practical problem, it may take a long time to compute such a solutiontree. Note that DFS will not return anything before completing the computation ofan optimal (suboptimal) solution. However, in some situations, it would be usefulif an algorithm could return some (possibly non—optimal) solutions in the courseof computing an optimal solution. It would be even better if the quality of thoseintermediate solutions improves monotonically with the computational time spent bythe algorithm. Since an algorithm with this property can give an answer at any timeafter computing the initial solution, it is called an anytime algorithm [8].We can think of an anytime algorithm as a program which generates a stream ofsolutions, ordered according their expected costs. For decision tree search, we caneasily obtain a naive anytime algorithm from a brute—force procedure and a filter. Fora given decision tree, the brute—force procedure systematically enumerates all of thepossible solutions of the decision tree, and passes the solution stream to the filter. Thefilter maintains the minimal cost of the solutions arrived so far and discards solutionswith cost 110 smaller than the minimal cost. Unfortunately, the performance of thisalgorithm can be bad.We have developed an algorithm A—DFS (an anytime version of DFS) that incorporates the pruning mechanism we discussed in Section 3.1.1 and at the same timebehaves like an anytime algorithm. A—DFS differs from DFS in the following twoaspects:CHAPTER 3. DECISION GRAPH SEARCH 33• When searching a choice node, DFS will exhaust all of the children of the choicenode and choose the best one. But A—DFS returns the first child that resultsin an admissible solution. Furthermore, A—DFS also sets a backtrack point sothat it can continue exploring the remaining children later on.• In the course of searching a chance node, if DFS finds that partial b, itreports a cutoff. But in a similar situation, A—DFS requests a backtrack.Fig. 3.4 shows the pseudo code of A—DFS. In the figure, we have a new procedurea-search as the interface of the algorithm. The statement backtracking means tocontinue from the latest backtrack point. If there is no backtrack point, then justreturn MAXNUM. A Prolog version of this algorithm is given in Fig. 3.5.3.1.6 Exploiting shared structures in decision graphsA decision graph can be considered as a compact representation of a decision tree inwhich some subtrees are identical. Conversely, a decision graph can be “expanded”into a decision tree by duplicating those shared nodes in the decision graph.The algorithms that we presented so far are based on decision trees. These algorithms are also applicable to decision graphs in the sense that they are applicable tothe “expanded versions” (the corresponding decision trees) of the decision graphs. Anadvantage of this treatment is that the algorithms have moderate space requirements4.A disadvantage of the treatment is its inability to exploit shared structure of decisiongraphs. In other words, the algorithms may search a subgraph rooted at a sharednode more than once. In order to overcome this disadvantage, we use a “cache technique.” When the expected cost of a shared node is obtained, the node along with the4For DFS, the space requirement is linear in the depth of the decision trees if it is not requiredto construct an optimal solution graph, and is otherwise linear in the size of solution graphs; thespace requirement of the anytime algorithm is linear in the size of solution graphs.CHAPTER 3. DECISION GRAPH SEARCH 34a_search (n, b)if n is a choice node then result = a_dnode(n, b)else result a_nnodel(n, b);report (result); backtracking;a_dnode(n, b)if n is a terminal thenif v(n) >= b then return MAXNUM; else return v(n);if h(n) >= b then return MAXNUM;result = b;k = # of children of n;let ni, n2, ..., nk = order_d(n);for (i = 1 to k) donb = result cost(n, i);if nb > h(ni) thenresulti = cost(n, i) + a_nnode(ni,nb);if resulti < result thenresult = resulti;set a backtrack point; return result;return MAXNUM;a_nnode(n, b)if n is a terminal thenif v(n) >= b then return MAXNUM; else return v(n);if h(n) >= b then return MAXNUM;k = # of children of N;let ni, n2, ..., nk = order_p(n);partial = h(nl)* prob(n, 1) + ... + h(nk) * prob(n, k);i = 0;while (partial < b) and (i < k) doi = i + 1; nb = (b - partial)/prob(n, i) + h(ni);partial=partial+prob(n, i)*(a_dnode(ni, nb)-h(ni));if partial > b then backtracking else return partial;Figure 3.4: The pseudo code of A—DFSCHAPTER 3. DECISION GRAPH SEARCH 35search node N for a solution tree R with cost C such that C < B:-dynamic stop/i.search(N, B, term(V, N), V) :- terminal(N), v(N, V), V < B.search(N, B, choice(C, N, R), C) :— h(N, HV), HV < B,choice(N), children(N, L), searchchoice(L, B, R, C).search(N, B, chance(C, N, R), C) :- h(N, HV), HV < B, chance(N),children(N, L), initial(L, CO),searchchance(L, B, R, C, CO).search children list of a choice nodeWhen find a solution, we can return the solution and update the bound.searchchoice([(LC, N)IL], B, R, C) :— NB is B — LC,search(N, NB, Ri, Ci), assert(stop(L)), C2 is LC + Ci,searchchoice_with_s([(LC, N)IL], Ri, R, C2, C).searchchoice([_IL], B, R, C):— \+ stop(L), searchchoice(L, B, R, C).searchchoicewith_s([(LC, N)I_], R, (LC, R), C, C).searchchoice_withs([_IL], —, R, Cl, C):— searchchoice(L, Cl, R, C).search children list of a chance nodesearchchance([], B, U, CO, CO) :— CO < B.searchchance([(P, N)IL], B, [(P, Rl)1R2], C, CO) :— CO < B, h(N, fly),NB is (B - CO)/P + HV, search(N, NB, Ri, Cl),C2 is CO + P*(Ci— HV), searchchance(L, B, R2, C, C2).An auxiliary functioninitial([], 0).initial([(P, N)IL], C) :- h(N, HV), initial(L, Ci), C is Ci + P* HV.Figure 3,5: A Prolog version of A—DFSCHAPTER 3. DECISION GRAPH SEARCH 36cost is stored in a cache for later use. When a node is to be searched, the algorithmsfirst check whether the the node has been cached, and will search the node only ifthe node is not in the cache.In a similar vein, we can also make use of an “adaptive” heuristic function. Thatis, whenever a cutoff occurs at a node n, we obtain a new lower bound on the expectedcost of the node. If the new bound is greater than h(n), we obtain a new heuristicfunction h’ that agrees with h at all nodes except that h’(n) equals the new lowerbound. Obviously, h’ is more informed than h. This could be viewed as an exampleof “learning from failure.”By incorporating this caching technique into DFS, we obtain DFS’ as shown in Fig.3.6. When the expected cost of a node is obtained, DFS’ needs to determine whetherthe expected cost should be cached or not; when a cutoff occurs at a node, DFS’needs to determine whether the heuristic function should be updated accordingly. Ifthe “caching policy” allows no node ever to be cached, the algorithm degeneratesto DFS. On the other hand, if the “caching policy” allows the expected costs of allnodes to be cached, the algorithm exploits the shared structure of decision graphs tothe maximum degree. However, this may lead to an exponential space requirement,defeating an advantage of DFS. Clearly, there is a tradeoff between time and spacein DFS’. It is a “meta—decision” problem to decide which nodes should be cached.Generally speaking, we would like to cache those nodes that are likely to be searchedagain and that would take much time to search. The searching time of a node canbe obtained when it is searched for the first time. The probability that a node willbe searched again may be estimated based on domain dependent knowledge. Russelland Wefald’s meta—reasoning mechanism [82, 81] can play a role in deciding whichnodes’ expected costs should be cached.CHAPTER 3. DECISION GRAPH SEARCH 37dnode’(n, b)if n is a terminal thenif v(n) >= b then return MAXNUM; else return v(n);if n is cached thenlet result be the cached value;if result > b then return MAXNUM else return result;if h(n) > b then return MAXNUM;result = b;k = It of children of n;let ni, n2, ..., nk = order-d(n);for (i = I. to k) donb = result— cost(n, i);if nb > h(ni) thenresult = mm {result; cost(n, i) + nnode’(ni,nb)};if result > b then{if function h should be updated at n then update it;return MAXNUM;}else {if n should be cached then cache n; return result;}nnode’(n, b)if n is a terminal thenif v(n) > b then return MAXNUM; else return v(n);if n is cached thenlet result be the cached value;if result > b then return MAXNUM else return result;if h(n) > b then return MAXNUM;k = It of children of N;let ni, n2, ..., nk = order—n(n);partial = h(nl)* prob(n, 1) + ... + h(nk) * prob(n, k);i = 0;while (partial < b) and (i < k) doi = i + 1;nb = (b - partial)/prob(n, i) + h(ni);partialpartial+prob(n, i)*(dnode’(ni, nb)-h(ni));if partial > b then{if function h should be updated at n then update it;return MAXNUM;}else {if n should be cached then cache n; return partial;}Figure 3.6: The pseudo code of DFS’CHAPTER 3. DECISION GRAPH SEARCH 38A similar version can be obtained for A—DFS in the same way.3.2 Applying AO* to Decision Graph SearchThe AO* algorithm was first developed in [52] for searching AND/OR graphs withadditive costs [52]. The name AO* was coined in [62]. As shown in [45], AO* isapplicable to AND/OR graphs with monotone evaluation functions [45]. We saya function g(xj,..., xk, L) is monotone if g(yi, ..., y, L) g(xi, ..., xj, L) provided<x for i = 1, ..., k. An evaluation function u is monotone if there is a mollotollefunction g such that u(n) = g(n(n1),..., u(flk), L) for each non—terminal node n, whereL represents the local information (in terms of arc costs or arc probabilities for thecase of decision graphs) associated with the arcs incident from n. From the definitionof the min—exp evaluation function, we can see that the min—exp evaluation functionis monotone. Thus AO* is applicable to decision graph search problems as well. Inthis section, we illustrate how to tailor AO* so that it can apply to decision graphsearch, and present some results on the resulting algorithm.We assume that the decision graph to be searched is given in the implicit form.We use h again to denote a heuristic function. The algorithm works on an explicitgraph G which initially consists of the root node only and is gradually expanded.During the entire process, G is always a subgraph of the original graph.A node in G is called a tip node if it has no children. A solution graph of theexplicit graph is called a potential solution graph (psg). A psg is a solution graph ofthe given graph if all the tip nodes in the psg are terminals.With the help of the heuristic function h, we define a min—exp function f on theexplicit graph. Let ii be any node in G, f(n) is defined as follows.• f(n) = v(n) if n is a terminal.CHAPTER 3. DECISION GRAPH SEARCH 39• f(n) = h(n) if n is a non—terminal tip node.• f(n) = min1{c( , n) + f(n)} if n is a choice node with children n1, ..., nk.• f(n) >j1 p(n, n) * f(n1) if n is a chance node with children n1, ...,Due to the admissibility of h, the following result is obvious.Lemma 3.6 For any node n in C, f(n) h*(n).At any moment, the explicit graph can have a number of potential solution graphs.We use opsg to denote an optimal potential solution graph — a potential solutionwith the lowest cost.AO* can be intuitively understood as an iteration of two major operations. Thefirst one is the node expansion that finds a non—terminal tip node in the identifiedoptimal potential solution graph and generates the children of the node. The costof each child is given by the heuristic function, if it is generated for the first time.The second operation is the cost updating operation that, starting from the newlyexpanded node, updates the costs of the ancestors of the newly expanded node, according to the cost function. In the course of the cost updating, a new optimalpotential solution graph is identified. The termination condition for this process isthat the optimal potential solution graph has no non—terminal tip node. The basicstructure of the algorithm is as follows:CHAPTER 3. DECISION GRAPH SEARCH 401. Initially, both G and opsg consist of oniy the root node.2. If all of the tip nodes of opsg are terminals, stop and output opsg as thesolution graph.3. Select a non—terminal tip node of opsg, generate the children of the node andadd them to G (if some children are already in G, just share them).4. Set opsg to be an optimal potential solution graph in C.5. Go to step 2.The above algorithm can be further refined by using the marking technique of[52]. The following version of AO* is adapted from [10], where h denotes a heuristicfunction for the given decision graph satisfying h(n) = v(n) for all terminals in thedecision graph. In the algorithm, the marked psg rooted at the root of the decisiongraph is the same as the opsg in the above algorithm.Algorithm AO*1. Initially the explicit graph G and the potential solution graph p.sg consistsolely of the root node s. Setf(s) h(s).If s is a terminal node, then mark s SOLVED.2. Repeat the following steps until s is marked SOLVED. Then, exit with f(s)as the solution cost.CHAPTER 3. DECISION GRAPH SEARCH 412.1 Choose a tip node n of the marked psg that is not marked SOLVED. Foreach child n of n not already present in the explicit graph G, setf(n) — h(n).Mark SOLVED those children of n that are terminals.2.2 Create a set Z of nodes containing only node n.2.3 Repeat the following steps until Z is empty.2.3.1 Remove from Z a node rn that has no descendent in Z.2.3.2 (i) If m is a choice node with children m1,..., mk, then setJ(m) minl<<k{f(m) + c(m, m)}.Mark that arc (m, mj) for which the above minimum occurs. [Resolveties arbitrarily, but in favour of a SOLVED node.] Mark m SOLVEDif and only if m0 is marked SOLVED.(ii) If m is a chance node with children m1,...,m, then setp(m, m) * f(m).1<i<kMark all arcs (m, me). Mark m SOLVED if and only if every m 5marked SOLVED.2.3.3 If f(m) changes value at step 2.3.2 or if m is marked SOLVED, thenadd to Z all of the immediate predecessors of m.CHAPTER 3. DECISION GRAPH SEARCH 42The oniy difference between the above algorithm and the AO* algorithm given in[10] for AND/OR graphs with additive costs lies in the way of updating function fat step 2.3.2 (ii).Lemma 3.7 At any stage during the search process, if a node n is marked SOLVED,a solution graph with cost f(n) can be obtained by tracing down the marked arcsfrom n.Proof. By a trivial induction on the stage of the algorithm.Lemma 3.8 If there exists some e 0 such that h(n) h*(n) + e for every tipnode n in the explicit graph G, then at any stage during the search process, we have](n) <h*(n) + e for all nodes in G.Proof. Similar to the proof of Lemma 1 in [52]. We prove the lemma by inductionon the stage of the algorithm. The lemma is trivially true initially. Suppose that itis true at a certain stage and let us prove it is true at the next stage, after executionof the body of the outer loop (i.e., steps 2.1 2.3).Since during the execution of the loop body, the f values of only those nodesthat are ancestors of node n may be changed, let us consider the subgraph, C’, ofG obtained up to this stage, which consists of all the ancestors of node n. Since G’is acyclic, an index can be attached to each node of G’, starting with n0 = n, insuch a way that all paths from node n to node n0 contain only n with j < i.Now, we prove by induction on the index i that the inequality still holds for eachnode in G’ after its f value is updated.First, we prove that this is true for n0. Let n1,...,nk be the children of n. Forany child, nI, 1 1 k, if it has been generated before, we have f(ni) h*(ni) + eCHAPTER 3. DECISION GRAPH SEARCH 43by the outer induction assumption, and if nj is generated for the first time, we alsohave f(rij) = h(n1) h*(ni) + e by the hypothesis on the heuristic function h.If n is a choice node, we have:)(n)= minl{)(nl) + c(n, ni)}mini{h(ni) H-c + c(n,nj)}= e + h*(n)Similarly, if n is a chance node, we have:f(n) =1{f(nj) *p(n,nl)}<Zi{(h*(ni) + e) * p(n, ni)}= e+h*(n)Now, let us assume that the lemma is true for all nodes n with j <i, then theinequality can be proved true for node ni by repeating the above argument. Thus, wehave proved that the lemma is true after the execution of the loop body. Therefore,the lemma holds by induction. UNote that Lemma 1 in [52] is actually a special case of Lemma 3.8 above withe = 0. The above lemma will not be true for AND/OR graphs with additive costs,because if e > 0, the error may be accumulated in the search process for additivecost AND/OR graphs.Theorem 3.9 The algorithm with heuristic function h satisfying h(n) h*(n) + efor every node n that is ever in the explicit graph G, where e 0, will return asolution graph with cost less than or equal to h*(s) + e, if the algorithm terminates.Proof. Follows from Lemma 3.7 and Lemma 3.8 immediately. UCHAPTER 3. DECISION GRAPH SEARCH 44Corollary 3.10 The algorithm with an admissible heuristic function will return anoptimal solution, if it terminates.Lemma 3.11 If h*(n) > 0 and there exists some e, e > 0, 0 < h(n) <for every node ri in the explicit graph G, and the arc costs in the given graph arenon—negative, then at any stage during the search process, we have f(n) < h*(n)(1+e)for every node n in G.Proof. Similar to that for Lemma 3.8.Theorem 3.12 If the arc costs in the given graph are non—negative, then the algorithm with heuristic function h satisfying 0 < h(n) h*(n)(1 + e) for every nodethat is ever in the explicit graph, where e 0, will return a solution graph with costless than or equal to h*(s)(1 + e), if the algorithm terminates.Proof. Follows from Lemma 3.7 and Lemma 3.11 immediately. 0As the reader may have already noticed, Theorems 3.9 and 3.12 above do not assurethat the algorithm always stops even if a finite solution graph exists for a given(infinite) decision graph. Thus they are weaker than Theorem 1 in [52]. However,for finite acyclic decision graphs, the algorithm is guaranteed to terminate. Thisweakness seems to be inevitable in general, and indeed is also shared by the depthfirst heuristic algorithms that we presented in the previous section, since there doesexist some case where a finite optimal solution graph does exist but the algorithmwill not terminate. This can be illustrated by the following example. Consider thedecision tree in Fig. 3.7. Node A in the decision tree is the root of the graph. Eachchoice node has two children, and the child on the right side is a terminal with zerocost. The cost of the arc to the left child is 1 and the cost of the arc to the rightCHAPTER 3. DECISION GRAPH SEARCH 45child is 2. The left child is a chance node. Each chance node has two children, eachwith 0.5 probability. The subtree below each child of a chance node is isomorphic tothe entire decision tree. Thus the decision tree is infinite. It is easy to prove thatthe min—exp value of this decision tree is 2 and an optimal solution tree consists ofonly two nodes: the root and its right child. However, if AO* adopts a depth—firstleft—to—right strategy in selecting the next tip for expansion, the algorithm will notterminate, since the f values of the marked potential solution trees will always beless than 2.Note that the possibility of non—termination of the algorithm stems from thearbitrariness in the selection of a tip node to be expanded next. In the case ofsearching into AND/OR graphs with additive costs, no matter which tip node isselected, the expansion of the tip node will increase by a certain amount the costsof the solution graphs consisting of that node, and the increasing amount can be22Figure 3.7: An example for which AO* may not terminateCHAPTER 3. DECISION GRAPH SEARCH 46bounded from below. However, in the case of searching into decision graphs, theincreasing amount contributed by a tip node expansion can be arbitrarily small.In order to guarantee the termination of the algorithm when a finite optimalsolution graph exists, we need another heuristic function for tip node selection. Here,we propose two heuristics for tip node selection out of termination consideration.Heuristics 1: using the breadth first strategy in tip node selection. That is, ift are the tip nodes of the marked potential solution graph, the tip node withthe smallest depth should be selected for expansion.Heuristic 2: using a best first strategy in tip node selection. Suppose t1,..., tj arethe tip nodes of the marked potential solution graph. The tip node with the largestF(t) value should be expanded, where P(t) is the product of the probabilities alongthe path from the root to tip node t,.3.3 Iterative Deepening SearchThe two kinds of algorithms that we discussed so far for decision graph search arecomplementary. A major disadvantage of AO* is that it requires exponentially largespace. The advantage of AO* is that it will not stick too long to a solution graphwhich is apparently “bad.” On the other hand, in comparison with AO*, the majoradvantage of the depth first search algorithms is their moderate requirement on space.However, the price for this is that they may search down to a deep layer iii a solutiongraph that is not optimal. Thus it would be nice if we could design algorithms thatcombine the advantages of AO* and the advantage of the depth search algorithmstogether. For OR—graph search, the iterative deepening search technique [40] wasproposed for such a combination, and was proved asymptotically optimal along thefollowing three dimensions: time complexity, space complexity and the quality ofCHAPTER 3. DECISION GRAPH SEARCH 47the solution. In this section, we propose two iterative—deepening heuristic—searchstrategies for decision graph search.3.3.1 Depth—bounded iterative deepeningThe first iterative—deepening search strategy is a depth—bounded iterative—deepeningstrategy. The strategy repeatedly applies DFS to a decision graph, with increasingdepth—bounds. Whenever a non—terminal node n on the depth boundary is visited,h(n) is used as its min—exp value. After each iteration, a potential solution graph withthe minimum cost is identified. This process terminates when the optimal potentialsolution graph identified this way is actually a solution graph (all tip nodes in thepotential solution graph are terminals).Unlike iterative—deepening A* [40], our algorithm uses search depth as the cutting off criterion. In this regard, our iterative—deepening strategy is similar to theiterative—deepening depth—first search algorithm (DFID) reported in [40]. However,unlike DFID, the depth—first search in each iteration in our algorithm is a kind ofheuristic search. In fact, our algorithm is very much like the iterative—deepening gametree searching algorithms [93, 105].The following result is obvious.Theorem 3.13 The depth—bounded iterative deepening algorithm returns an optimalsolution graph if the heuristic function it uses is admissible and if it terminates.3.3.2 Cost—bounded iterative deepeningThe second iterative deepening search strategy is a cost—bounded iterative deepeningstrategy, very much like iterative deepening A* (IDA*) [40]. The idea is that successive iterations correspond not to increasing depth of search, but rather to increasingCHAPTER 3. DECISION GRAPH SEARCH 48/3—values for the search. The strategy maintains two values: the upper bound b1, andthe lower bound b1 on the decision graph and works as follows: Initially, the lowerbound b1 is set to the value given by the hellristic functioll, while the upper bound isset to the min—exp value of a solution graph that can be obtained by identifying anarbitrary solution graph. At each iteration, a new /3—value b is set to bi*a+b*(1—a)for some a (0, 1) and a depth first search (using DFS) with b as the /3—value isperformed. If a solution with cost less than b is returned, then the solution is anoptimal solution, thus the algorithm stops. Otherwise, the lower bound b1 can be setto b. This process continues until either an optimal solution is found or the lowerbound and the upper bound become close enough.Theorem 3.14 The cost—bounded iterative deepening search algorithm returns an optimal solution graph if the heuristic function it uses is admissible and if it terminates.An advantage of this algorithm over DFS is that it is less sensitive to the nodeordering in a graph. This can be best illustrated by an example. Sllppose the rootof the decision tree is a choice node with two children n1 and n2. Suppose furtherthat the subtree below n1 is very large and so is its min—exp value, but the subtreebelow n2 is very small and so is its min—exp value. Clearly, node n1 cannot bein an optimal solution tree. However, if DFS happens to search the subtree belownode n1 first, then it will not come to node n2 until an optimal solution tree for thesubtree below node n1 is found. This may take a lot of time. On the other hand, thecost—bounded iterative deepening algorithm will not stick to the subtree below noden1 for too long (because the /3—value can be very small).The cost—bounded iterative deepening algorithm discussed above is analogous tothe binary iterative deepening A* [67] in the sense that both the upper bound andCHAPTER 3. DECISION GRAPH SEARCH 49the lower bound of the problem are maintained.3.3.3 Generic iterative deepeningSo far, we have discussed two particular iterative deepening techniques. As suggestedin [18], an iterative deepening search procedure in general can be divided into twocomponents: one for deciding which portion of the given graph should be searchednext, and the other for computing an optimal solution graph in the identified sub—graph. The procedure is a simple loop alternatively calling these two components. Inthe previous two iterative deepening procedures, these two components are integrated.In general, it is not a trivial issue to decide which portion of the given graph shouldbe searched. For a more detailed discussion, the reader is referred to [17].3.3.4 Co—routinesIt is interesting to note that the cost—bounded iterative deepening algorithm and theanytime algorithm A—DFS can work as co—routines in the following way. For a givenproblem, A—DFS gradually approaches the optimal value of the decision graph fromabove, and thus can be used to update the upper bound b of the cost—boundediterative deepening algorithm. The co—routines stop when either algorithm reportsfinding an optimal solution, or when the lower bound and the upper bound becomeclose enough. I.n this way, we obtain an anytime algorithm that also uses cost—bounded iterative deepening strategy.Theorem 3.15 The co—routines return an optimal solution graph if the heuristicfunction they use is admissible and if they terminate.Finally, we conclude this section with a result on the termination of the algorithmsdiscussed so far.CHAPTER 3. DECISION GRAPH SEARCH 50Theorem 3.16 All of the algorithms presented in this chapter terminate for finiteacyclic decision graphs.Proof. A finite acyclic decision graph can be expanded into a decision tree of finitesize.The termination property of DFS and AO* can be proved by an induction on thenumber of nodes visited by the algorithms.The termination property of the anytime algorithm A—DFS follows that of DFSand the fact that a decision tree of finite size can have oniy finite number of solutiontrees (graphs).The depth—bounded iterative—deepening algorithm terminates because of two facts:(a) the number of iterations is bounded from above by the depth of the graph, and(b) each iteration just involves a call to DFS, which terminates.The termination property of the cost—bounded iterative—deepening algorithm canbe proved by establishing an upper bound on the number of iterations.The co—routines terminate because of the termination property of A—DFS andthat of the cost—bounded iterative—deepeding algorithm. 03.4 SummaryIn this chapter, we present three classes of algorithms for decision graph search.DFS and and its anytime version A—DFS belong to the first class. They are depth—first search algorithms, derived from the well—known alpha—beta search algorithm[38] for minimax tree search. These algorithms use domain dependent knowledgefor increasing search efficiency. The algorithm AO* belongs to the second class,derived directly from the AO* algorithm [68, 62] for searching AND/OR graphs withCHAPTER 3. DECISION GRAPH SEARCH 51additive costs. The iterative—deepening algorithms belong to the third class. Thesealgorithms are derived by integrating the iterative—deepening search techniques [40]into the depth—first search algorithm.Now, a natural question is still to be answered. That is, under what circumstances are these algorithms more appropriate than the folding—back—and—averaging—out method [79] for decision graph search? Our answer to the question is that itdepends on the degree of node sharing in the graphs. At one extreme where thereis a substantial sharing in the graph such that the number of nodes in the graphis polynomial in the depth of the graph, then the folding—back—and—averaging—outmethod is more appropriate, since it exhibits polynomial (in the depth of the graph)complexity while the complexity of our search—oriented algorithms is still exponential.At the other extreme where there is no sharing at all in a decision graph, the graphis essentially a tree, and our algorithms, especially those in the first and the thirdclasses, are more appropriate.CHAPTER 3. DECISION GRAPH SEARCH 523.5 Proofs of TheoremsTheorem 3.1 If the heuristic function used by DFS is admissible, thendt b — I h*(n) if h*(n) <b,(n, )—MAXINT otherwise.for any node n in the decision tree and a number b.To prove this theorem, we make two observations. First, we observe that functionh* is equivalent to dt0 defined as follows:Case 1 n is a terminal:dto(n) = h*(n) = v(n). (3.1)Case 2 n is a chance node:dto(n) = to1. (3.2)where to, 0 i < 1, is recursively defined as follows:too =1h(n) *pj; (33.)to = t_1 + pj * (dto(n) — h(n)).If h is admissible, then to, < to1 for i = 0, ..., 1 — 1.Case 3 n is a choice node:dto(n) = t01 (3.4)where t, 0 < i < 1, is recursively defined as follows:=to = min{toji, c + dto(n)}.CHAPTER 3. DECISION GRAPH SEARCH 53In the above definition, c denotes the cost of the edge from a choice node to itsi-th child, and pj denotes the probability associated with the i-th child of a chancenode. They correspond to cost(n, i) and prob(n, i) respectively in algorithmDFS. This convention will be used in the rest of this section.Second, we observe that, according to the structure of algorithm DFS, the definition of dt can be further refined as follows:Case 1 n is a terminal;dt1n b — f v(n) if v(n) < b, 3 6‘ — MAXINT otherwise. . )Case 2 n is a chance node:It ift<b,dt(n, b)=MAXINT otherwise. (3.7)where t = t1 and tj, 0 < i < 1 is recursively defined as follows:to= >:.= h(n) *= (b— (t_1 — h(n) * pj))/pj;— f t_1 if t_1 b, 3.8— 1 ti—i + p * (dt(n, b) — h(n)) otherwise.Case 3 n is a choice node:It ift<b,dt(n, b)= MAXINT otherwise. (3.9)where t = t and t, 0 < i < 1 is recursively defined as follows:=— f t_1 if t_1 — h(n), (3.10)1 min{t_i, c + dt(n, t_1 — c)} otherwiseThus, to prove the theorem, it suffices to provedtn b—J dto(n) if dto(n) <b, 3 11‘— MAXINT otherwise.CHAPTER 3. DECISION GRAPH SEARCH 54for every node n in the decision tree.Based on the definition of dt0 and the characterization of dt given above, relation(3.11) can be proved by induction on the structure of the decision graph. First, weneed two intermediate results.Lemma 3.17 Let n be a choice node with children n1, ..., nj. Let to and t bedefined by equations (3.5) and (3.12) respectively. Suppose that for any number b’,di b’ — J dto(n) if dto(n) <b’,(ni, ) — i MAXINT otherwise.for1<il.Then,foranyi,1<i<l(A) t < b if to < b;(B) if t0 < b, then tO, = tj, otherwise. t, = b.Proof We prove this lemma by induction on i. Our observation here is that, underthe given assumptions, equation (3.10) is equivalent to the following simpler one:to=b;312= min{t_i, c + dt(n, t_1 — c)} . )Basis: i = 1. t1 = rnin{b, c1 + dt(ni, b — c1)} and t01 = c1 + dto(ni).Ifto1 = c1 + dto(ni) > bthen,dto(n1) > b—c1.By the given assumptions, we have:dt(ni,b—ci) =MAXINT> b.Thereforet1 = b.CHAPTER 3. DECISION GRAPH SEARCH 55Ifto1 = c1 + dto(ni) <bthen,dto(ni) < b — c1.By the given assumptions, we have:dto(ni) = dt(ni, b — ci);c1 + dt(n1,b — ci) = Cl + dto(ni) <b.Therefore, ti = c1 + dto(ni) = 101. The induction base holds.Induction: Suppose the lemma is true for i = k. For i = k + 1, we have:tk+i = rriin{tk,ck+l + dt(nk+l,b— ck+1)};tok+i = ITlin{tok, Ck+1 + dto(nk+i)}.Now, we have two cases:(A) tok+1 > b. In this case, we have tok b and Ck+1 + dto(nk+l) b. Thus wecan obtain tk = b by the induction assumption. Furthermore, sincedto(nk+l) b—then, by the given assumptions, we have:dt(nk+i, tk — ck+1) = dt(rlk+i, b — ck+i) MAXINT.Therefore, tk+1 = b.(B) tok+1 < b, In this case, we need to consider three subcases:(a) tOk > b. In this subcase, we have:Ck+i + dto(nk+l) = tOk < b;CHAPTER 3. DECISION GRAPH SEARCH 56dto(nk+l) <b— ck+1;tok+1 = Ck+1 —i- dto(nk+l).By the induction assumption, we have: tk = b. By the given assumptions, we obtain:dt(nk+l,b— Ck+1) = dto(nk+l).Thus,k+1 = Ck+1 + dt(rik+l, b — Ck+1) = dto(nk+l) + Ck+1.Therefore, tk+1 tok+1.(b) tok < b and ok Ck4 + dto(nk+l). In this subcase, we have:tok+1 = toktok = tk (by the induction assumption).Thus,Ck1 + dto(nk+l) + dto(nk+l, b—Therefore, k+l = ok+1.(c) ok < b and tok > Ck+1 + dto(nk+l) In this subcase, we have:tok = tk (by the induction assumption);tok+1 = Ck+1 + dto(nk+l)dto(nk+l) < tok— Ck+1 = tk—Ck÷1.Thus, by the given assumptions, we have:dt(nk+1,tk— Ck+1) = dto(nk+l).CHAPTER 3. DECISION GRAPH SEARCH 57Thus,dt(nk+l, b — Ck+1) + Ck+1 = dto(nk+l) + Ck+1 < tk.Therefore, tk+1 = dt(nk+l, b — ck+1) + Ck+1 tOk+1•In summary, the claim holds for i = k + 1. Consequently, the lemma holds byinduction. DLemma 3.18 Let n be a chance node with children n1,..., n. Let to and t bedefined by equations (3.3) and (3.8) respectively. Suppose that for any number b’,dt b’ — f dto(n) if dt0(n) <b’,(ni, )—MAXINT otherwise.for1<i<l.Then,foranyi,1il(A)t,<b iffto<b;(B) if t0 < b, then to = tj.Proof: By induction on i, similar to that for Lemma 3.18 . DProof of Theorem 3.1. We prove relation (3.11) by induction on nodes in thedecision tree.1. n is a terminal. Then v(n) = h*(n) = dto(n). Thus, relation (3.11) holdstrivially.2. n is a choice node. Suppose relation (3.11) holds for all the children of n. Weneed to consider two cases.(A). dto(n) b. We have the following derivation:CHAPTER 3. DECISION GRAPH SEARCH 58t01 b (by equation (3.4))t t1 b (by Lemma 3.17)dt(n, b) = MAXINT (by equation (3.9)).(B). dto(n) b. We have the following derivation:dto(n) = to1 < b (by equations (3.4))toi > b (by Lemma 3.17)dt(n, b) = t = to1 = dto(n) (by equation 3.9).In summary, relation (3.11) holds for n.3. n is a chance node. Suppose relation (3.11) holds for all the children of n.Similarly, it can be proved that the relation holds for n as well by using Lemma3.18.In summary, the theorem holds ill general. ElTheorem 3.4 Suppose DFS uses heuristic function h. If there exists a number5’> 0 such that h satisfies:h(n) h*(n) + 6 for every node n in a decision treethenh*(n)+6> b if dt(n,b) > b;andh*(n)+6 dt(n,b) ifdt(n,b) < bCHAPTER 3. DECISION GRAPH SEARCH 59for every node n in the decision tree.Theorem 3.5 Suppose DFS ‘uses heuristic function h. If the arc costs in a decision tree are all non—negative, h*(n) > 0 for every node n in the decision tree, andthere exists a number 6 > 0 such that h satisfies:h(n) (1 + 6) * h*(n),for every node n in the decision tree,then for every node n in the decision tree and any number b 0,h*(n)*(1+6)b if dt(n,b)b;andh*(n) * (1 + 6) dt(n, b) if dt(n, b) < b.Since h* is equivalent to dt0, all the occurrences of h*(n) in the above theoremscan be replaced with dto(n). Therefore, for Theorem 3.4, it suffices to prove that forevery node n in the decision treedto(n)+6>b if dt(n,b)b;anddto(n) +6 dt(n,b) if dt(n,b) < bFor Theorem 3.5, it suffices to prove that for every node n in the decision treedto(n)*(1+6) b if dt(n,b) b;anddto(n) * (1+6)> dt(n,b) if dt(n,b) < bCHAPTER 3. DECISION GRAPH SEARCH 60The proofs of Theorems 3.4 and 3.5 are very similar. Here we just present the proofof Theorem 3.5.Proof of Theorem 3.5Case 1 n is a terminal. Since dto(n) = h*(n) 0, thus, (1 + 6) * dto(n) dto(n).If dt(n, b) > b, then v(n) = h*(n) b, therefore, dto(n) * (1 + 6) > b.If dt(n,b) < b, then dt(n,b)v(n) = h*(n) = dto(n), thus dto(n) * (1+6)dt(n,b).Therefore, the theorem holds for node n.Case 2 n is a chance node.Suppose the theorem holds for all the children of node n. We need to considerthe following two cases:(A). dt(n,b) < b. According to equation (3.7), we have dt(n,b) = t = t1 < b.Thus, t < b for i = 1,..., 1. Consequently, by equation (3.8), we have:dt(n, b) < b. By the induction assumption, we have:dt(ri, b) (1 + 6) * dto(n)for i = 1,..., 1. According to equations (3.3) and (3.2), we obtain:dto(n) = to1=* dto(n)According to equation (3.8), we obtain:1 1= Zp *dt(n,b1)< pj *dt(n) * (1+6) = toi * (1+6)Thus, dtfri, b) < dto(n) * (1 + 6).CHAPTER 3. DECISION GRAPH SEARCH 61(B). dt(n, b) b. According to equations (3.7) and (3.8), we know that t =t1 > b. This implies that either to > b or there exists k, 1 < k < 1 suchthat tk_1 < b and tk > b. In the former case, we have:1 1dto(n) * (1 +6) = pj * dto(n) * (1 +6) > * h(n) = toThus, dto(ri)*(1+6)>b.In the latter case, we can obtain:dt(n, b) < b2 for 0 <j < kdt(nk,bk) bkwhere bk = (b — (tkl— h(nk) * pk))/pk. Consequently, by the inductionassumption, we have:dto(n) * (1+6) dt(ri3,b) for 0 <j < kand(1 + 6) * dto(nk) bkTherefore:Pk * (1 +6) * dto(nk) > b — (tk_1 — h(nk) * pk))tkl +73k * (1+6) *dto(nk) — h(nk) *pk bAccording to equation (3.8), we have:k—itk_i =j=i j=kThus,k—idt(n,bj)*p+pk*(1+6)*dto(nk)+ h(n)*p>bj=k+iCHAPTER 3. DECISION GRAPH SEARCH 62k>(1 +6)dto(n) *pj + (1 + 6)dto(n) *pj > bj1 j=k+1Therefore,(1 + 6) * dto(n) = (1 + 6)dto(n) * p3 bj1Case 3 n is a choice node.Suppose the theorem holds for all the children of node n. The theorem holdstoo for node n if we can prove(1 + 6) * t0 (3.13)for i = 0,..., 1, where to aild t are defined by equations (3.5) and (3.12)respectively. This inequality can be proved by induction on i.Basis:: i = 0, trivial.Induction:. Suppose the mequality is true for i = k. Consider the case wheni=k+1.(A). tok Ck+1 + dto(nk+l). In this case, we have tok = tOk+1. Since tk+1 tk,by the inner induction assumption, we conclude tk+1 (1 + 6) * tok+1.(B). tok > Ck+1 + dto(nk+l). In this case, we have: Ck+1 + dto(nk+l) = tok+1.If tk—ck+1 < h(nk+l), then,tk+1 = tk <Ck+1 + h(nk+l) ck+1 + (1 + 6) * dto(nk+l)Since c1 > 0 and 6> 0, we have tk+1 (1 + 6) * tok+1.If tk— Ck+1 > h(nk+l), then,tk+1 = min{tk, Ck+1 + dt(nk+l, tk— Ck+1)}CHAPTER 3. DECISION GRAPH SEARCH 63When tk — Ck+1 < dt(nk+1,tk — Ck+1), by the outer induction assumption,we have:(1 + 6) * dto(nk+l) > tk —tk+i = tk (1 + 6) * dto(nk+l) + Ck+1Since Ck+1 0 and 6 0, we have tk1 (1+6)*tok+1.When tk— Ck+1 > dt(nk+1,tk — Ck+1), by the outer induction assumption,we have:(1 + 6) * dto(nk+l) dt(nk+1,tk—Ck+1)tk+1 = Ck+1 + dt(nk+l,tk — Ck+1) Ck+1 + (1 + 6) * dto(nk+l)Since Ck+1 0 and 6>0, we have tk+1 (1+6)*tok+1.By induction, inequality 3.13 holds for all i = 0, ..., 1.In summary, the theorem holds for any node n. UPart IIINFLUENCE DIAGRAMEVALUATION64Chapter 4Decision Analysis and InfluenceDiagramsIn this chapter, we review some basic concepts about decision analysis and informallyintroduce influence diagrams a framework for decision analysis. We motivateour work by discussing the advantages and disadvantages of this framework. Inthe chapters to follow, we present an extension to influence diagrams and developa method for influence diagram evaluation. Our method, which aims to overcome thedisadvantages, employs decision graphs as an intermediate representation and usesthe decision graph search algorithms developed in Chapter 3 to compute optimalsolutions to decision problems.4.1 Bayesian Decision AnalysisWe all make decisions every day. A decision problem often involves many variables,each having a number of possible outcomes. Some variables can be controlled in thesense that we can select one of its possible outcomes as its value. They are calleddecision variables. Other variables are beyond our control. They are called uncertainvariables. Among uncertain variables, some are observable whereas others are not.65CHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 66Variables are inter—related in one way or another. The problem is to choose appropriate values for decision variables so as to best meet our objective. A prescription onhow to choose values for decision variables is called a decision policy or policy. If thenon—observable uncertain variables in a decision problem are relevant to our objective,the decision problem is called a problem of decision making under uncertainty.As an example, consider a simplified version of the used car buyer problem [34]. Inthis example, Joe has to decide whether to buy a used car. Two variables are involvedin this decision problem: an uncertain variable representing the car’s condition thatcan be either “peach” or “lemon,” and a decision variable representing Joe’s purchasedecision. According to the current market, a peach is worth $1060 and a lemon isworth $900. The price of the car is $1000. The difficulty of Joe’s decision stems fromthe uncertainty of the car’s condition.Bayesian decision theory [25] is concerned with those decision problems in whichuncertain variables are random variables. A fundamental result of Bayesian decisiontheory is that any decision preference of a rational decision maker can be captured bya utility function such that his/her decision objective can be achieved by maximizingthe expected utility. A policy maximizing the utility function is called an optimalpolicy.In the used car buyer problem, if the uncertainty of the car’s condition can becharacterized by a probability distribution, then Bayesian decision theory is applicable. Suppose that the car is a peach with probability 0.8 and a lemon with probability0.2. Suppose further that Joe’s objective is to maximize the expected profit (in dollars). Then the policy maximizing the expected profit is to buy the car. This policywill bring Joe a profit of $60 with probability 0.8 and a loss of $100 with probability0.2. Thus, the expected profit is $28.CHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 67Bayesian decision analysis (or decision analysis for short) [79, 96] is a disciplineabout the application of Bayesian decision theory to decision problems. There aretwo fundamental issues: how to model a decision problem, and how to compute anoptimal policy.4.2 Decision TreesDecision trees were used as a simple tool both for problem modeling and optimalpolicy computation in the early days of decision analysis [79]. A decision tree canrepresent the order in which decisions are made and the information available to thedecision agent when he/she makes a decision. It explicitly depicts all scenarios of theproblem and specifies the “utility” the agent can get in each scenario.In the literature of decision analysis, little attention has been paid to the computation of optimal policies of decision trees. It is commonly assumed that the so—called“average—out—and—fold—back” method [79, 96] is used for the computation.From the computational point of view, a decision tree is just a decision graphwithout node sharing. Thus, the algorithms presented in Chapter 3 can be used forcomputing an optimal policy for decision trees.In this section, we illustrate by an example some basic concepts about, and theadvantages and the disadvantages of, decision trees.A decision tree for the used car buyer problem is shown in Fig. 4.1. In this tree, thechoice node corresponds to the purchase decisions and the chance nodes correspondto the car’s condition. The choice node is followed by chance nodes, indicating thatthe car’s condition is not known to Joe when he makes the purchase decision. Eachpath from the root to a leaf node represents a possible scenario. For example, thepath on the top of the decision tree in Fig 4.1 corresponds to the decision scenario inCHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 68which Joe decides to buy the car and the car turns out to be a peach. The probabilityfor this scenario is 0.8 (the product of probabilities on the arcs along the path). Thevalue for this scenario is 60.28 0.8 Peacl::) 6028lemon::-100F1K. 0.8 peach :Figure 4.1: A decision tree for the used car buyer problemIn general, a decision problem may involve many decision and random variables.For such a problem, the order of nodes in the decision tree is important; it must beconsistent with the decision making order and the constraints of information availability. More explicitly, a choice node N comes before another choice node N’ if andonly if the decision corresponding to N will be made before the decision corresponding to N’; a chance node N comes before a choice node N’ if and only if the valueof the random variable corresponding to the chance node N is known to the decisionmaker when the decision corresponding to the choice node N’ is to be made.In order to illustrate the above point, let us continue the used car buyer problem[34]. Suppose Joe knows that, of the ten major subsystems in the car, a peach has adefect in only one subsystem whereas a lemon has a defect in six subsystems (that iswhy Joe likes a peach and does not like a lemon). Joe also knows that a car withoutdefect is worth $1100 in market. Finally, Joe knows that it will cost him $40 torepair one defect and $200 to repair six defects. Observing Joe’s concern about thepossibility that the car may be a lemon, the dealer offers an “anti—lemon guarantee”CHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 69option. For an additional $60, the anti—lemon guarantee will cover the full repair costif the car is a lemon, and cover half of the repair cost otherwise. At the same time,a mechanic suggests that some mechanical examination may help Joe have a betteridea of the car’s condition. In particular, the mechanic offers three alternatives: testthe steering subsystem alone at a cost of $9; test the fuel and electrical subsystems ata total cost of $13; a two—test sequence in which the transmission subsystem will betested first at a cost of $10 and, after knowing the test result, Joe can decide whetherto test the differential subsystem at an additional cost of $4. All tests are guaranteedto detect a defect if one exists in the subsystem(s) being tested (in other words, thetests provide perfect information about the subsystems being tested.In this modified example, Joe has to make three decisions: two decisions aboutwhether to perform mechanical tests and one decision about whether to buy the car.There are two new random variables: one for the result of the first test and the otherfor the result of the second test. The decision making order is: the first test, thesecond test and then the purchase decision. Furthermore, when deciding whether todo the second test, Joe remembers his choice for the first test and knows the testresults; when deciding whether to buy the car, Joe remembers his choices for bothtests and knows the results for the tests. A complete decision tree for this problemis shown in Fig. 4.2.In the figure, the choice node labeled T1 corresponds to the first test decision. Ithas four possible olltcomes: nt for no test, st for testing the steering subsystem alone,f&e for testing the fuel and electrical subsystems, and tr for testing the transmissionsubsystem first. The chance nodes labeled R1 correspond to the random variableof the first test result, which has three possible outcomes: zero for no defect beingdetected, one for one defect being detected and two for two defects being detected.CHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 70The choice nodes labeled T2 correspond to the second test decision, which needs tobe made only when the choice for the first test decision is tr. It has two possibleoutcomes: nt for no test, and diff for testing the differential subsystem. The chancenodes labeled R2 correspond to the random variable of the second test result, whichhas two possible outcomes: zero for no defect being detected and one for one defectbeing detected. The choice nodes labeled B correspond to the purchase decision,which has three possible outcomes: b for buying the car without the anti—lemonguarantee, g for buying the car with the anti—lemon guarantee and S for not buyingthe car.Though conceptually simple, decision trees have a number of drawbacks.First, the dependence/independence relationships among the variables in a decision problem cannot be represented in a decision tree. For example, in the used carbuyer problem, the profit is conditionally independent of the test results, given thetest decisions, the purchase decision and the car condition. However, this conditionalindependence cannot be represented in the decision tree.Second, a decision tree specifies a particular order for the assessment of the probability distributions of the random variables in the decision problem. This order isin most cases not a natural assessment order. For example, in the used car buyerproblem, we need the probabilities of the first test results given the first test decision,and the conditional probabilities of the car condition given the test decisions andthe test results. These probabilities are harder to assess than the prior distributionof the car condition, the conditional distributions of the test results conditioned onthe test decisions and the car condition. In the process of constructing the completedecision tree for the used car buyer problem, we have to compute these conditionaldistributions using Bayes Theorem.CHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 71:6o‘) -10060-10060-100. 020:> 40ntb‘) -100bpeach60-1004060-1000204060-1000204040RIT2onelemon 1Figure 4.2: A complete decision tree for the used car buyer problemCHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 72Third, the size of a decision tree for a decision problem is in general exponential inthe number of variables in the decision problem. This makes decision trees appropriateonly for decision problems with a small number of variables.Finally, a decision tree is not easily adaptable to changes in a decision problem. Ifa slight change is made in a problem, one may have to draw a new decision tree. Thispoint can be illustrated by an example. Suppose that in the used car problem, Joehas $1060 disposable money and somebody offers Joe a lottery at cost of $50. Thelottery may return $120 or nothing, with equal probability. Joe has to decide whetherto buy the lottery before making any decisions about the used car. The decision treefor this variation will be very different from the one shown in Fig. 4.2.4.3 Influence DiagramsInfluence diagrams were first proposed as an alternative representation framework fordecision analysis [35, 56].Within the framework of influence diagrams, a decision problem can be specifiedat three levels: the level of relation, the level of function and the level of number [35].At the level of relation, a decision problem is depicted by a directed acyclic graphwith three types of nodes: random nodes, decision nodes and value nodes. Eachrandom node represents a random variable whose value is dictated by some probabilitydistribution. Each decision node represents a decision variable whose value is to bechosen by the decision maker. A value node represents a component of the utilityfunction. In this thesis, we use the terms decision (random) variables and decision(random) nodes interchangeably. At the level of function, all the possible outcomesof the random variables and all the possible alternatives of the decision variables areidentified. The set of all possible outcomes of a random variable is called the frame ofCHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 73the random variable. The set of all possible alternatives of a decision variable is calledthe frame of the decision variable. At the level of number, (conditional) distributionsare assigned to the random variables.In an influence diagram, an arc from a variable x to a decision variable d indicatesthat the value of the variable x is known to the decision maker at the time when thedecision d is to be made. The Cartesian product of the frames of a decision variable’sparents is the set of information states for the decision variable. An information statedenotes the information available to the decision maker when the decision is to bemade. An arc from a node x to a random node (or a value node) y indicates thatthe random variable (resp. the value function) y is probabilistically dependent onthe variable x.For example, at the level of relation, the used car buyer problem can be representedby the diagram as shown in Fig. 4.3. By convention, boxes denote decision variables,circles denote random variables and the diamond denotes a value function (the utilityfunction to be maximized).The diagram can be read as follows. There are three decision variables (T1, T2and B), three random variables (R1, R2 and cc) and one value function (profit) inthe decision problem. The first test decision T1 is to be made first. The first test resultFigure 4.3: An influence diagram for the used car buyer problemCHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 74Table 4.1: The prior probability distribution of the car’s condition P{CC}CC proipeachLlemon OR1 depends on this decision and the car’s condition CC. This dependence is indicatedby the two arcs to node R1. The decision maker remembers his choice for the firsttest decision and knows the test result when he makes the second test decision T2.This is indicated by the two arcs to node T2. The second test result R2 depends onthe first test, the result of the first test, the second test and the car condition. Whenmaking the purchase decision, the decision maker remembers his choices for the firsttwo decisions and knows the test results. The value function depends on the threedecisions and the car condition.To complete the influence diagram representation, we need to specify the frames ofthe variables and the (conditional) probability distributions for the random variables.The frame for the random variable CC contains two elements: peach and lemon.This variable has no parent in the graph. Its prior probability distribution is givenin Table 4.1.The frame for the decision variable T1 contains four elements: nt, st, f&e andtr, representing respectively the alternatives of performing no test, testing the steering subsystem alone, testing the fuel and electrical subsystems, and testing the transmission subsystem first with an option of testing the differential subsystem next.The frame for the random variable R1 contains four elements: nr, zero, one andtwo representing respectively the four possible outcomes of the first test: no result,no defect, one defect and two defects. The probability distribution of the variables,conditioned on T1 and CC, is given in Table 4.2.CHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 75Table 4.2: The probability distribution of the first test result P{R1T,CC}T1 CC R1 probnt — nr 1.0nt— others 0St— nr 0St — two 0St peach zero 0.9St peach one 0.1St lemon zero 0.4St lemon one 0.6f&e — nr 0f&e peach zero 0.8f&e peach one 0.2f&e peach two 0fe lemon zero 0.13fe lemon one 0.53f&e lemon two 0.33Table 4.3: The probability distribution of the second test result P{R2T1,R1,T2, CC}T1 Ri T2 CC R2 probnt— —— nr 1.0nt — —— others 0St— —— nr 1.0St — —— others 0fRe ——— nr 1.0fRen — —— others 0tr nr —— nr 1.0tr nr —— others 0tr two—— nr 1.0tr two—— others 0tr— nt— nr 1.0tr — nt— others 0tr zero diff peach zero 0.89tr zero diff peach one 0.11tr zero diff lemon zero 0.67tr zero diff lemon one 0.33tr one diff peach zero 1.0tr one cliff peach one 0tr one diff lemon zero 0.44tr one diff lemon one 0.56CHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 76The frame for the decision variable T2 contains two elements: nt and diff, denoting the two options of performing no test and testing the differential subsystem.The frame for the random variable R2 contains three elements: nr, zero and one,representing respectively the three possible outcomes of the second test: no result,no defect and one defect. The probability distribution of the variable, conditioned onT1, R1, T2 and CC, is given in Table 4.3.The frame for the decision node B contains three elements: S, b and g, denotingrespectively the alternatives of not buying, buying without the anti—lemon guarantee,and buying with the anti—lemon guarantee.From the above example, we observe that the influence diagram representationhas a number of advantages in comparison with decision trees: it is expressive enoughto explicitly describe the dependence/independence relationships among the variablesinvolved in the decision problem; it allows a more natural assessment order on theprobabilities of the random variables and it is compact. In addition, influence diagrams are easy to adapt to changes in problems. To illustrate this point, let usconsider again the variation of the used car problem given at the end of the previoussection. In that variation, Joe has to decide whether to buy a lottery before considering the decisions about the used car. That variation can be represented by aninfluence diagram as shown in Fig. 4.4, in which node B’ represents the decision oflottery purchase, node L represents the lottery and node R0 represents the actual return Joe gets from the lottery. This influence diagram is derived from the one shownin Fig. 4.3 by the addition of three new nodes and ten arcs.A decision function for a decision variable is a mapping from the set of the information states of the decision variable to the decision variable’s frame, prescribinga choice for the decision for each information state. A decision policy (or a policyCHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 77Figure 4.4: An influence diagram for a variation of the used car buyer problemfor short) for an influence diagram is a collection of decision functions, one for eachdecision variable. Each policy determines an expected value of the influence diagram.A policy is optimal if it maximizes the expected value of the influence diagram. Wewill give a formal definition of these concepts in the next chapter.Influence diagrams were first proposed as a “front—end” for the automation ofdecision making process [56, 35]. The actual analysis of a decision problem was carriedout in two phases. An influence diagram was first transformed into a decision treeand then the decision tree was evaluated. The idea of evaluating influence diagramsdirectly was proposed in [64]. The first complete algorithm for influence diagramevaluation was developed by Shachter [85]. We will give a review on related researchefforts in Chapter 6.4.4 Disadvantages of Influence DiagramsWhile influence diagrams have many advantages as a representation framework fordecision problems, they have a serious drawback in handling asymmetric decisionproblems [12, 26, 70, 85, 95]. Decision problems are usually asymmetric in the sensethat the set of possible outcomes of a random variable may vary depending on differentconditioning states, and the set of legitimate alternatives of a decision variable mayCHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 78vary depending on different information states. For example, in the used car buyerproblem, if the choice for the first test is nt (no test), the first test result can only benr. On the other hand, if the choice for the first test is st or tr, the test result canbe one of the two possible outcomes: zero and one, and if the choice for the first testis f&e, the test result can be one of the three possible outcomes: zero, one and two.Furthermore, the alternative of testing the differential subsystem for the second testdecision is possible only if the choice for the first test is tr (testing the transmissionsubsystem).To represent an asymmetric decision problem as an influence diagram, the problemmust be “symmetrized” by using common frames and assuming degenerate probabilitydistributions [95]. In the used car buyer problem, the frame of the variable T1 is acommon set of outcomes for all the three cases. The impossible combinations of thetest choices and the test results are characterized by assigning them zero probabilityin Table 4.2. The frame of the second test is also a common set of all alternativesin various states, while the fact that the second test option is not available in somestates is characterized by assigning unit probability to outcome nr of the variable R2conditioned on these states.This symmetrization results in two problems. First, the number of informationstates of decision variables is increased. Among the information states of a decisionvariable, many are “impossible” (having zero probability). For example, the information state corresponding toT1st,R=nr for the second test decision variable T2is an impossible state. The optimal choices for these impossible states need not becomputed at all. However, they are computed by conventional influence diagramevaluation algorithms [95]. Second, for each information state of a decision variable,because the legitimate alternatives may constitute only a subset of the frame of theCHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 79decision variable, an optimal choice is chosen from oniy a subset of the frame, insteadof the entire frame. However, conventional influence diagram algorithms have to consider each alternative in order to compute an optimal choice for a decision in any ofits information states.From the above analysis, we observe that conventional influence diagram evaluation algorithms involve unnecessary computation. We show in Section 7.5 that, for atypical decision problem, the amount of unnecessary computation can be exponentialin the number of decision variables.4.5 Previous Attempts to Overcome the DisadvantagesSeveral researchers have recently proposed alternative representations that attemptto combine the strengths of influence diagrams and decision trees. Fung and Shachter[26] propose a representation, called contingent influence diagrams, for explicitly expressing the asymmetric aspects of decision problems. In that representation, eachvariable is associated with a set of contingencies, and associated with one relation foreach contingency. These relations collectively specify the conditional distribution ofthe variable.Covaliu and Oliver [12] propose a different representatioll for representing decision problems. This representation uses a decision diagram and a formulation tableto specify a decision problem. A decision diagram is a directed acyclic graph whosedirected paths identify all possible sequences of decisions and events in a decisionproblem. In a sense, a decision diagram is a degenerate decision tree in which pathshaving a common sequence of events are collapsed into one path [12]. Numerical dataare stored in the formulation table. Covaliu and Oliver [12] also give a backward alCHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 80gorithm to compute optimal policies from decision diagrams by using the formulationtable.Shenoy [92] proposes a “factorization” approach for representing degenerate probability distributions. In that approach, a degenerate probability distribution over aset of variables is decomposed into several factors over subsets of the variables suchthat the their “product” is equivalent to the original distribution.Smith et al. [95] observe that the computation of various probabilities involved ininfluence diagram evaluation can be sped up if the degenerate probability distributions in influence diagrams are used properly. Their philosophy is analogous to theone behind various algorithms for sparse matrix computation. In their proposal, aconventional influence diagram is used to represent a decision problem at the level ofrelation. In addition, they propose to use a decision tree—like representation to describe the conditional probability distributions associated with the random variablesin the influence diagram. The decision tree—like representation is effective for economically representing degenerate conditional probability distributions. They propose amodified version of Shachter’s algorithm [85] for influence diagram evaluation, andshow how the decision tree—like representation can be used to increase the efficiencyof arc reversal, a fundamental operation used in Shachter’s algorithm. However, theiralgorithm cannot avoid computing optimal choices for decision variables with respectto impossible information states, because it, like other influence diagram evaluationmethods, also takes a reduction approach. We will come back to this point in Section6.2.3.CHAPTER 4. DECISION ANALYSIS AND INFLUENCE DIAGRAMS 814.6 Our SolutionIn this thesis, we develop an approach for overcoming the aforementioned disadvantages of influence diagrams. Our approach consists of two independent components: asimple extension to influence diagrams and a top down method for influence diagramevaluation.Our extension allows the explicit expression of the fact that some decision variableshave different frames in different information states. We achieve this by introducinga framing function for each decision variable. The framing function characterizes theavailable alternatives for the decision variable in different information states. Withthe help of framing functions, our solution algorithm can effectively ignore the unavailable alternatives when computing an optimal choice for a decision variable inany information state. Our extension is inspired by the concepts of indicator valuations and effective frames proposed by Shenoy [92]. Note that our extension isorthogonal to Smith’s. In our influence diagram representation, we can also use theirdecision—tree like representation to describe conditional distributions of random variables. Furthermore, we can also use their method to exploit asymmetry in computingconditional probabilities.A novel aspect of our method is that it avoids computing optimal choices fordecision variables in any impossible states. This method works for influence diagramswith or without our extension.Our method, similar to Howard and Matheson’s [35], evaluates an influence diagram in two conceptual steps: it first generates a decision graph from the influencediagram and then evaluates an optimal policy from the decision graph. In such adecision graph, a choice node corresponds to an information state of some decisionCHAPTER 4. DECISION ANALYSIS AND INFL UENCE DIAGRAMS 82variable. The decision graph is generated in such a way that an optimal solutiongraph of the decision graph corresponds to an optimal policy of the influence diagram. Thus, the problem of computing an optimal policy is reduced to the problemof searching for an optimal solution graph of a decision graph, which can be accomplished by the algorithms presented in Chapter 3. Our method successfully avoidsunnecessary computation by pruning those choice nodes in the decision graph thatcorrespond to impossible states, and by ignoring those children of choice nodes thatcorrespond to unavailable alternatives for the decision variables.By using heuristic search techniques, our method provides an explicit mechanismfor making use of heuristic information that may be available in a domain—specificform. The combined use of heuristic search techniques and domain—specific heuristics may result in even more computational savings. Furthermore, since our methodprovides a clean interface between influence diagram evaluation and Bayesian netevaluation, various well—established algorithms for Bayesian net evaluation can beused in influence diagram evaluation. Finally, by using decision graphs as an intermediate representation, the value of perfect information [53] can be computed moreefficiently [110].Note that our method is more efficient than Howard and Matheson’s, partly because our method generates a much smaller decision graph for the same influencediagram.Chapter 5Formal Definition of InfluenceDiagramsIn this chapter, we give a formal definition of influence diagrams and present ourextension. The definition is borrowed from [109].5.1 Influence DiagramsAn influence diagram I is a quadruple I = (X, A, P, U) where• (X, A) is a directed acyclic graph with X partitioned into random node set C,decision node set D and value node set U, such that the nodes in U have nochildren.Each decision node or random node has a set, called the frame, associated withit. The frame of a node consists of all the possible outcomes of the (decisionor random) variable denoted by the node. For any node x E X, we use 7r(x)to denote the parent set of node x in the graph. For any x E C U D, we useto denote the frame of node x. For any subset J c C U D, we use ftj todenote the Cartesian product fJ€ .83CHAPTER 5. FORMAL DEFINITION of INFLUENCE DIAGRAMS 84• P is a set of probability distributions P{cr(c)} for all c E C. For each o Eand s E (c), the distribution specifies the conditional probability of eventc = o, given that rr(c) = s. Throughout this thesis, we use J = e to denotethe set of assignments that assign an element of e to the corresponding variablein J for any variable set J and any element e j.• U is a set {g : —* Riv € U} of value functions for the value nodes, where7? is the set of the reals.For a decision node d, a mapping 6 :—f is called a decision function ford. The set of all the decision functions for d, denoted by , is called the decisionfunction space for d2. Let D = {d1, ..., d,} be the set of decision nodes in influencediagram I. The Cartesian product z\ JJ Lj is called the policy space of I.5.2 Our ExtensionWe extend the definition of influence diagrams given in the previous section by introducing framing functions.An influence diagram I is a tuple I = (X, A, P, U, F) where X, A, P, U have thesame meaning as before, and F is a set {fd : —* 2d} of framing functions forthe decision nodes.The framing functions express the fact that the legitimate alternative set for adecision variable may vary in different information states. More specifically, for adecision variable d and an information state s E Q, the set fd(s) contains thelegitimate alternatives the decision maker can choose for d in information state s.Following Shenoy [92}, we call fd(s) the effective frame of decision variable d ininformation state s.CHAPTER 5. FORMAL DEFINITION of INFLUENCE DIAGRAMS 85In the used car buyer problem, the frame for the decision variable T2 has twoelements: nt and diff, denoting the two alternatives of performing no test andtesting the differential subsystem. However, the diff alternative is available only inthe information states where the choice for the first test is transmission. This can becaptured by a framing function fT2 defined as follows:#. f {nt, diff} ifuT1(s)=tr— 1 {nt} otherwisewhere T1 (s) denotes the projection of .s on T1 (the value of the first test in information state s).Similarly, we define a decision function for a decision node d as a mapping 6—+ In addition, 6, must satisfy the following constraint: For each s Ef(s). In words, the choice prescribed by a decision function for a decisionvariable d in an information state must be a legitimate alternative.5.3 Influence Diagram EvaluationGiven a policy 6 = (6k, 62, ..., 6), each decision node d can be regarded as a randomnode whose probability distribution is given as follows:1 if6(s)=a,= air(d) = s} = 0 otherwise.Policy 6 can be considered as a collection of probability distributions for the decisionvariables.Let I = (X,A,P,U,F) be an influence diagram, and let Y = CUD. Let A bethe set of all the arcs of A that lie completely in Y. Then the triplet (Y A, PUP3)is a Bayesian net, referred to as the Bayesian net induced from I by the policy 6, andCHAPTER 5. FORMAL DEFINITION of INFLUENCE DIAGRAMS 86is written as Ia. The prior joint probability P3{Y} is given byP3{Y}= fi P{xlir(x)} fJ Pa{xir(x)}. (5.1)xEC xEDBecause the value nodes do not have children, for any value node v, K(v) contains novalue nodes. Hence K(v)cY. The expectation Es{v] of the value function g, underP3 is given byE3[v] = Ps{ir(v) = o}gu(o).OE2(v)The expected value E3[I] of I under the policy S is defined byE3[I] = E&[v] (5.2)vEUThe optimal expected value E[I] of I is defined byE[I] = maxeEs[I]. (5.3)The optimal value of a decision network that does not have any value nodes is zero.An optimal policy 30 is one that satisfiesE50[I] = E[I]. (5.4)In this thesis we will only consider variables with finite frames. Hence there are onlya finite number of policies. Consequently, there always exists at least one optimalpolicy. To evaluate an influence diagram is to find an optimal policy, and to computethe optimal expected value.5.4 No—Forgetting and Stepwise DecomposableInfluence DiagramsIn this section, we introduce two classes of influence diagrams, which are the focus ofthe literature.CHAPTER 5. FORMAL DEFINITION of INFLUENCE DIAGRAMS 87An influence diagram is regular [35, 85] if there is a directed path containingall of the decision variables. Since the diagram is acyclic, such a path defines anorder for the decision nodes. This is the order in which the decisions are made. Aninfluence diagram is “no—forgetting” if each decision node d and its parents are alsoparents of those decision nodes that are descendants of d [35, 85]. Intuitively, the“no—forgetting” property means that a decision maker remembers all the informationthat was earlier available to him and remembers all the previous decisions he made.The lack of an arc from a node a to a decision node d in a no—forgetting influencediagram means that the value of the variable a is not known to the decision makerwhen decision d is to be made. A regular and “no—forgetting” influence diagramrepresents the decision maker’s view of the world.An influence diagram is called stepwise solvable [108, 109] if its optimal policycan be computed by considering one decision node at a time. A necessary and sufficient condition for the stepwise solvability of influence diagrams, called stepwise—decomposability, is provided in [106, 108, 109].Stepwise—decomposability is defined in terms of graph separation. Informally, aninfluence diagram is stepwise decomposable if the parents of each decision node dividethe influence diagram into two parts. In order to define the property formally, weneed some notation and concepts.We use nond(x) to denote the set of nodes that are not descendants of x inthe influence diagram. Thus, nond(d) fl D is the set of decision nodes that are notdescendants of node d. For a node set Z, let ‘r(Z) = Uezir(z) and ir*(Z) =Z U ir(Z).The moral graph [47] of a directed graph C is an undirected graph rn(G) withthe same node set such that there is an edge between node x and node y in rn(G)CHAPTER 5. FORMAL DEFINITION of INFLUENCE DIAGRAMS 88if and oniy if either there is an arc x —* y or y —* x in G, or there are two arcsx —* z and y —* z in G and x y. A node x is rn—separated from a node y by anode set Z in a directed graph G if every path between x and y in the moral graphrn(G) contains at least one node in the set Z. Because the “rn—separated” relationis syrnmetric, we sometimes simply say that two nodes x and y are rn—separated byZ if x is rn—separated from y by Z. Two sets X and Y are rn—separated by setZ if x is rn—separated frorn y by set Z for each x E X and each y E Y.Let d be a decision node in G, let m(G) be the moral graph of G and let Gdbe the undirected graph obtained from m(G) by rernoving all the nodes in ir(d).The downstrearn Yd of d is the set of all the nodes that are connected to d inGd, with d excluded. The upstream Xd of d is the set of all the nodes that arenot connected to d in Gd. The upstrearn Xd and the downstream Yd of d arern—separated by ir(d). This property is important for influence diagrams because m—separation implies conditional independence [108]. This property is used in Section7.2 when we establish a stochastic dynamic prograrnrning formulation for influencediagrarn evaluation.An influence diagram is stepwise decomposable if, for each decision node d and forany node x E Tr*(norid(d) fl D), the following holds: {x} U ir(x) Xd U ‘r(d). Thisdefinition implies that for each decision node d aild any node x ir*(nond(d) fl D),{x} U ir(x) Yd are rn—separated by ir(d).Note that a no—forgetting influence diagram is stepwise decomposable. In a stepwise decornposable influence diagram, an arc into a decision node indicates bothinforrnation availability and functional dependence. More precisely, for any decisionnode d and any other node x in a stepwise decomposable influence diagram, thepresence of an arc x —* d implies that the value of variable x is available at the timeCHAPTER 5. FORMAL DEFINITION of INFLUENCE DIAGRAMS 89when decision d is to be made, and it is not known that the information is irrelevantto the decision. On the other hand, the absence of an arc x —* d in a stepwisedecomposable influence diagram implies that either the value of variable x is notavailable at the time when decision d is to be made, or it is known that the information is irrelevant to the decision. Thus, one advantage of stepwise decomposabilityover no—forgetting is that stepwise decomposable influence diagrams can representthe knowledge that a piece of information (carried by a no—forgetting informationalarc to a decision node) is irrelevant to the optimal decision function of the decision.Our method is applicable to regular stepwise decomposable influence diagramswith multiple value nodes. For simplicity of exposition, however, we will first considerregular influence diagrams with exactly one value node. We come to regular influencediagrams with multiple value nodes in Chapter 8.Chapter 6Review of Algorithms forInfluence Diagram EvaluationIn this chapter, we review related research efforts on influence diagram evaluation,and classify them into two categories. Those in the first category use an intermediate representation and evaluate an influence diagram in two conceptual steps: firsttransforming the influence diagram illto its intermediate representation and then complltrng an optimal policy from the intermediate representation. Those in the secondcategory compute optimal policies directly from influence diagrams. Our method tobe presented in the next chapter belongs to the first category.6.1 Howard and Matheson’s Two—Phase MethodHoward and Mathesoll’s method belongs to the first category. Howard and Matheson [35] discuss a way to transform a regular no—forgetting influence diagram into adecision tree. The transformation involves two steps. An influence diagram is firsttransformed into a decision tree network and then a decision tree is constructed fromthe decision tree network. An influence diagram is a decision tree network if it isregular and no—forgetting, and if all predecessors of each decision node are direct90CHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 91parents of the decision node [35]. The basic operation for transforming a regular,no—forgetting influence diagram into a decision network is arc reversal [35, 85].The arc reversal operation is illustrated in Fig. 6.1. Suppose a —* b is an arc inan influence diagram such that both a and b are random nodes and there is no otherdirected path from node a to node b. The direction of the arc can be reversed andboth nodes inherit each other’s parents. This operation is an application of BayesTheorem. In Fig. 6.1, we begin with conditional probability distributions P{ba,.}and P{a.}, and end up with conditional probability distributions P{ab,.} andP{b.}. Formally, we have:P{bx, y, z} = P{a, bx, y, z} = P{ba, y, z} * P{ax, y}— P{a, bjx, y, z} — P{ba, y, z} * P{ax, y}{a ,x,y,z}_ —P{bx,y,z} P{bx,y,z}As an example, consider the influence diagram shown in Fig. 6.2, which representsthe oil wildcatter problem from [79]. In the problem, an oil wildcatter must decidewhether to drill or not to drill an oil well on a particular site. He is not certainwhether the site is dry, wet or soaking. Before making this decision, he can ordera test on the seismic structure. If he does, the test result will be available to him atthe time when the drill decision is to be made. The profit depends on the cost of thetest, the amount of oil and the cost of drilling, which can be either low, medium, orhigh.In the influence diagram, T and D are decision variables, corresponding to the testdecision and the drill decision. 0, S, R and CD are random variables, representingthe amount of oil, the seismic structure, the test result, and the cost of drilling,respectively.The two decision variables have the same frame with two alternatives: yes and no.CHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 92The frame of random variable 0 has three values: dry, wet and soaking. The frameof random variable S has three values: ns for no—structure, cs for close—structure andos for open—structure. The frame of random variable R has four values: ns for no—structure, cs for close—structure, os for open—structure, and nobs for no observation.The frame of random variable CD has three values: 1 for low, m for medium and h forhigh.The influence diagram in Fig. 6.2 is regular and no—forgetting, but is not adecision tree network, since node S and 0 are two predecessors of node D but theyare not parents of D. In order to transform that influence diagram into a decisionnetwork, we first reverse the arc 0 —* S and then reverse the arc S —* R. The resultingdecision network is shown in Fig. 6.3. In the course of this transformation, we havealso to compute the new conditional probability distributions for nodes 0 and S.More specifically, when we reverse the arc 0 —* 5, we need to compute probabilitydistributions P{D I s} and F{s} from probability distributions P{s I o} and P{0};when we reverse the arc S —* R, we need to compute probability distributions P{R I T}and P{ S I T, R} from probability distributions F { S } and F{R I T, S }. This operationintroduces the arc from T to S.-Figure 6.1: An illustration of the arc reversal operation: reversing arc a —* bA decision tree is constructed from a decision tree network as follows. First, definea total order-< over the set C U D U {v} satisfying the following three conditions:CHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 93Figure 6.2: An influence diagram for the oil wildcatter’s problemFigure 6.3: A decision tree network derived from the influence diagram for the oilwildcatter’s problemCHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 94(1) a -< b if b is the value node; (2) a -< b if there is a directed path from a to b;(3) a - b if a is a decision node and there is no directed path from b to a. Then,construct a decision tree by considering variables one by one in the order. Each layerin the decision tree corresponds to a variable. For the decision tree network in Fig.6.3, we obtain the following order:Using Howard and Matheson’s notation [35], the decision tree for the oil wildcatterproblem is shown in Fig. 6.4. In the figure, the boxes correspond to decision variables,circles to random variables, and diamonds to the value node. This is a compactrepresentation of a full decision tree. A layer of the decision tree is indicated by thecorresponding variable and its possible values (alternatives). In the case of a randomvariable, its probability distribution is also included in the layer. The full decisiontree can be obtained by systematically expanding each layer and adding necessaryconnections in the expanded graph. Fig. 6.5 shows a partial decision tree resultingfrom the expansion of the first two layers, and Fig. 6.6 shows a partial decision treeresulting from the expansion of the first three layers of the compact representation.Note that the decision tree here is slightly different from the definition of decisiontrees (graphs) in Chapter 2. In this decision tree, chance nodes and choice nodes(RITI {SIR,T} (Ols) (CD)Figure 6.4: A compact representation of the decision tree derived for the oil wildcatterproblemCHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION{RIT}Figure 6.5: A partial decision tree after the expansion of the first two layers95(RIT){RjT}Figure 6.6: A partial decision tree after the expansion of the first three layersCHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 96are not strictly interleaved. Nevertheless, the algorithms discussed in Chapter 3can be easily adjusted for such cases. Moreover, since the algorithms can work onimplicit representations of decision trees, it is not necessary to expand the compactrepresentation into a full decision tree. The algorithms can work on the compactrepresentation directly.The major problem with this approach is that the resultant decision tree tendsto be large. The depth of the decision tree so obtained from an influence diagramis equal to the number of variables in the influence diagram. Thus, the size of thedecision tree is exponential in the number of variables in the influence diagram.6.2 Methods for Evaluating Influence DiagramsDirectlyThe idea of evaluating influence diagrams directly was proposed in [64]. The firstcomplete algorithm for influence diagram evaluation was developed by Shachter [85].6.2.1 Shachter’s algorithmShachter’s algorithm takes a reduction approach. The algorithm evaluates an influence diagram by applying a series of value—preserving reductions. A value—preservingreduction is an operation that can transform an influence diagram into another onewith the same optimal expected value.Shachter identifies four basic value—preserving reductions, namely, barren noderemoval, random node removal, decision node removal and arc reversal. The arcreversal operation has been illustrated in the previous section. The other reductionsare illustrated as follows.Barren node removal. A node in an influence diagram is called a barren node ifCHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 97it has no child in the diagram. The barren node removal reduction states that anybarren node that is not a value node can be removed together with its incoming arcs.Random node removal. If the value node is the only child of a random node x inan influence diagram, then the node x can be removed by conditional expectation.As a result of this operation, the random node x is removed and the old value node isreplaced with a new one that inherits all of the parents of both the old value node andthe random node. The reduction is illustrated in Fig. 6.7 where the value functiong’ of the new value node v’ in the resultant influence diagram is given by:g’(a, b, c) = g(x, b, c) * P{xa, b}.Decision node removal. A decision node is called a leaf decision node if it has nodecision node descendant. If a leaf decision node d has the value node v as its onlychild and 7r(v) ç {d}Uir(d),then the decision node can be removed by maximization.The reduction is illustrated in Fig. 6.8 where the value function g’ of the new valuenode v’ in the resultant influence diagram is given by:g’(b) = maxdg(cl, b).The maximizing operation also results in an optimal decision function 8d for the leafdecision node through= arg maxdg(d, b).Note that the new value node has the same parent as the old value node. Thus, someof the parents of d may become barren nodes as a result of this reduction. In Fig.6.8, node a becomes a barren node. The arc from such a node represents informationavailable to the decision maker, but the information has no effect on either the optimalexpected value or the optimal policy of the influence diagram. This kind of arc (suchCHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 98as a —* d in Fig. 6.8) is called an irrelevant arc. Irrelevant arcs can be identified andremoved in a pre—processing step [106, 109, 111].-Figure 6.7: An illustration of random node removal: x is removed by expectation-/00 0Figure 6.8: An illustration of decision node removal: d is removed by maximization6.2.2 Other developmentsAfter Shachter’s algorithm, research on influence diagram evaluation has advancedin two directions: making use of Bayesian net evaluation methods, and exploitingseparability of the value function. In this section, we discuss the first direction. Wecome to the second direction in Chapter 8.Influence diagrams are closely related to Bayesian nets [69]. Quite a few algorithmshave been developed in the literature [36, 47, 69, 87, 107] for computing marginalprobabilities and posterior probabilities in Bayesian nets. Thus, it is natural to askwhether we can make use of these Bayesian net algorithms for influence diagramevaluation. This problem is examined in [11, 61, 86, 88, 90, 91, 108, 109], and theanswer is affirmative.CHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 99Recall that a decision function for decision node d in an influence diagram is amapping from to It is observed in [11, 86, 88, 108] that given a regular,no—forgetting influence diagram, the optimal policy can be computed by sequentiallycomputing the optimal decision functions for decision nodes, one at a time, startingfrom the last one backwards. The computation of the optimal decision function of adecision node is independent of those decision nodes that precede the decision node.Cooper [11] gives a recursive formula for computing the maximal expected valuesand optimal policies of influence diagrams. To some extent, the formula serves as abridge between the evaluation of Bayesian nets and that of influence diagrams.Shachter and Peot show in [88] that the problem of influence diagram evaluationcan be reduced to a series of Bayesian net evaluations. To this end, the value node ofan influence diagram is first replaced with an “observed” probabilistic utility node v’with frame {0, 1} and a normalized probability distribution. The optimal decisionfunction 6 for the last decision node d can be computed as follows: for eachelement e E ir(dn),= arg maxaçP{dfl = a, ir(d) = ev’ 1}.The optimal decision function 6 of the decision node d is computed after the optimaldecision functions 5i ..., S, have been obtained. The decision nodes d+1,are first replaced with their corresponding deterministic random—nodes in theinfluence diagram. The decision function 6j is then computed as follows: for eachelement e= arg maxaeQdP{d = a, K(d) = e6+i, 6,, v’ = 1}.The problem of influence diagram evaluation is then reduced to a series of problemsof computing posterior probabilities in Bayesian nets. Shachter and Peot [88] alsoCHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 100point out that an influence diagram can be converted into a cluster tree, which issimilar to the clique trees [47] of Bayesian nets, and that the problem of evaluatingthe influence diagram can thus be reduced to evaluating the cluster tree. A similarapproach has also been used by Shenoy for his valuation based systems [90] and byNdilikilikesha for evaluating potential influence diagrams [61].Zhang and Poole [108] propose a divide—and—conquer method for evaluating step-wise decomposable influence diagrams. This method is further studied in [109, 106].Like Shachter and Peot’s algorithm, Zhang and Poole’s method also deals with onedecision node at a time. Unlike Shachter and Peot’s algorithm, Zhang and Poole’smethod takes a reduction approach. Suppose node d is a leaf decision node of astepwise decomposable influence diagram I. r(d) separates the influence diagraminto two parts, namely a body and a tail. The tail is a simple influence diagram withonly one decision node (d). The body’s value node is a new node whose value function is obtained by evaluating the tail. A reduction step with respect to the decisionnode d transforms I to the body. The main computation involved in a reductionstep, however, is for evaluating the tail.Since the tail is a simple influence diagram with only one decision node, its evaluation can be directly reduced to a problem of computing posterior probabilitiesin a Bayesian net, as suggested in [88, 109]. The result of the evaluation consistsof two parts: a value function g’ : —> R, and an optimal decision function6d:— .The same reduction is applicable to the resulting body.6.2.3 Some common weaknesses of the previous algorithmsOne common weakness of the influence diagram evaluation algorithms that we havereviewed is that they fail to provide any explicit mechanism to make use of domainCHAPTER 6. REVIEW OF ALGORITHMS FOR ID EVALUATION 101dependent information (e.g., a heuristic functiou estimating the optimal expectedvalues of influence diagrams), even when it is available for some problems.Another notable and common shortcoming of these algorithms is inherited fromthe disadvantage of conventional influence diagrams for asymmetric decision problems. This was also observed in [85].To be represented by an influence diagram, an asymmetric decision problem mustbe “symmetrized.” This symmetrization results in many “impossible” informationstates (they have zero probability). The optimal choices for these impossible statesneed not be computed at all.Current algorithms for directly evaluating influence diagrams have two weaknesses,stemming from the symmetrization. The first one is that, for each decision node c/,they will perform a maximization operation conditioned on each information state ineven though the marginal probabilities of some states are zero. This weaknessarises from the fact that these algorithms compute the decision functions in the reverseorder of the decision nodes in the influence diagram. At the time of computing thedecision function for decision d, the marginal probabilities of the information states inare not computed yet. The second weakness is that optimal choices for a decisionnode are chosen from the whole frame d, instead of from the corresponding effectiveframes. Thus, it is evident that these algorithms involve unnecessary computation.In the next chapter, we develop an influence diagram evaluation method that avoidssuch unnecessary computation.Chapter 7A Search—Oriented AlgorithmIn this chapter, we present our method for influence diagram evaluation. We firstformulate influence diagram evaluation as a stochastic dynamic programming problem[80]. Then we give a graphical depiction of the computational structure of the optimalexpected value by mapping an influence diagram into a decision graph. Next, wepoint out how to avoid unnecessary computation in computing the optimal policyand propose a search—oriented approach to influence diagram evaluation. Finally, weanalyze how much our method can save in evaluating an influence diagram.7.1 PreliminariesA decision node d directly precedes another decision node d’ if d precedes d’ andthere is no other decision node d” such that d precedes d” and d” precedes d’. In aregular influence diagram, a decision node can directly precede at most one decisionnode.A decision node that is preceded by no other decision node is called a root decisionnode.Let I be a regular, stepwise decomposable influence diagram with a single value102CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 103node v. Suppose the decision nodes in I are d,, ..., d in the regularity order,then d, is the only root decision node and d is the oniy leaf decision node. Foreach k with 1 k < n, dk directly precedes dk+l. Let Ydk denote the downstreamof dk. Let I(dk, dk+l) denote the subgraph consisting of dk, lr(dk), lr(dk+,) andthose nodes in Ydk that are not rn—separated from dk+, by 7r(dk+,). Procedurally,I(dk, dk+,) can be obtained from I as follows: (1) remove all nodes that are m—separated from dk by 7r(dk), excluding the nodes in r(dk); (2) remove all descendantsof dk+,; (3) remove all barren nodes not in 7r(dk+,); (4) remove all arcs among nodesin 7r(dk) U {dk} and assign uniform distributions to the root nodes’ in lr(dk) U {dk}.I(dk, dk+,) is called the sector2 of I from dk to dk+l. The sector I(—, d,)contains only the nodes in 7r(d,) and those nodes that are not rn—separated from d1by i-(d,). The sector I(d,—) contains those nodes in 7r(d) U {d} and those nodesin the downstream Y of d.Note that the sector I(—, d1) is a Bayesian net. Furthermore, because I is step-wise decomposable, dk is the only decision node in the sector I(dk, dk+l) that is notin 7r(dk). Similarly, d is the only decision node in the sector I(d,—) that is not inr (d).As an example, consider again the oil wildcatter problem. The sector I(—, T) isempty. The sectors I(T, D) and I(D,—) are shown in Fig. 7.1.Let 6 = (8,, ..., &) be any policy for I. We have the following results on I.Lemma 7.1 For any j, k with 1 <j < k n, the set r(v) is independent of r(d)1The assignment of uniform distributions to the root nodes in lr(dk) U {dk} is oniy to makeI(dk, dk+1) a Bayesian network. Since we will only be considering probabilities conditioned onr(dk) U {dk} the distributions of those nodes are irrelevant.2Previously, such a part of an influence diagram has been called a section [109].CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 1040< 0(a) X(T, D) (b) I(D,)Figure 7.1: Two sectors of the influence diagram for the oil wildcatter problem.and d, given Tr(dk). Formally, the following relation holds:Ps{7r(v) = o7r(dk)= y} = Ps{ir(v) = oIlr(dk) = y, ir(d) = x, d3 = a}for any o E 1Z(u), y E l(dk), a and for any x consistent with y(i.e., the projections of and y on the common variable are the same).Proof Immediately follows from the rn—separation property of a stepwise decomposable influence diagram.Lemma7.2 For any k with 1 <k<n, and any oEL() and XEf(dk),F5{K(v) = o(dk) = x,dk = =P5{K(v) = o(dk) = x}.Proof. Recall that, with respect to a policy 6=(6, ...6), the decision node d,for i = 1, ..., n, is characterized by the following probability distribution:F{d = xI(d) = c} = 1 if 6(c)1 0 otherwise.Thus,P5{7r(v) = oJIr(dk) = x}CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 105= P5{r(v) = or(dk) x,dk = a}*P3{dk = a7r(dk) x}aEQ(dk)= Ps{r(v) = or(dk) = x,dk =Lemma 7.3 For any x E d1), the probability Ps{7r(di) = x} depends only onnodes in the sector I(—, d1), and is independent of 6. Consequently, for any otherpolicy 6’,P3{ir(di) = x} = Psi{ir(di) = x}.Proof. Since all nodes not in I(—, d1) are non—ancestors of the nodes in K(di), theyare irrelevant to the marginal probabilities of 7r(di). Since there is no decision nodein I(—, d1), then P3{K(di) = x} is independent of 6. 0Lemma 7.4 (1) For any o E x E d) and a E the conditionalprobability P3{ir(v) = oir(d) = x, d = a} depends only on those nodes in the sectorI(d,—), and is independent of 6. In other words, for any other policy 8’,Ps{K(v)= oIir(d) = x,d = a} = Ps{7r(v) = oir(d) = x,d = a}.(2) For any y E x E (dk) and a dk, the conditional probability F3{7r(dk+l) = y1r(dk) x,dk = a} depends on only those nodes in the sectorI(dk, dk+l), and is independent of 6. In other words, for any other policy 6’,P5{7r(dk+l) = yITr(dk) = x,dk = a} = Ps’fr(dk+l) = yir(d) = x,dk = a}.(3) Suppose 6’ = (8, ..., 8) is another policy for I such that 8 = 6, ..., =6k—1 for some k, 1 k<n, then, for any j, 1 <j < k, and any XE= x} = P51 {ir(d)=x}.CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 106Proof. Follows from the definition of sectors and the rn—separation property of astepwise decomposable influence diagram. CLemmas 7.3 and 7.4 indicate that some conditional probabilities in an influencediagram are independent of policies for the influence diagram, and can be computedin a sector of the influence diagram. The computation can be carried out by anywell—established algorithms for Bayesian nets. This fact facilitates a clean interfacebetween influence diagram evaluation and Bayesian net evaluation.7.2 Influence Diagram Evaluation via StochasticDynamic ProgrammingIn this section, we establish a stochastic dynamic programming formulation for influence diagram evaluation by studying the relationship among the conditional expectedvalues of influence diagrams.Let e be any event in 13 and let Es[vle] be defined as follows:Es[vje]= g(o)*P{7r(v)=oe}.OEr(v)For each k with 1 <k <n, let Uk be a function defined as follows.Uk(x,6) = E3{v7r(dk) = x] (7.1)Informally, Uk(x, 6) is the expected value of the influence diagram with respect topolicy 6, conditioned on lr(dk) = x.Lemma 7.5 The expected value of the influence diagram with respect to policy 6 canbe expressed in terms of Uk as:E3[v]= Uk(x,6)*P5{7r(dk)=x}.CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 107Proof By the definition of E6[v], we have:g(o)*P5{ir(v =o}.OE,r(v)SinceP5{ir(v) = o} = P5{K(v) = ojlr(dk) = x} * P5{K(dk) =2Jér(dk)thus,Ea[vJ = g(o) * P5{ir(v) = or(dk) = x} * Ps{lr(dk) = x}.OE(v) xE(dk)By changing the order of the two summations, we have:E3[v] = P3{rr(dk) = x} * g(o) *P3{7r(v) = oLlr(dk) = x}.XE2(dk) OEr(v)By the definition of Uk, we have:Uk(x,6)*P5{7r dk)=x}.xE2ir(dk)Lemma 7.6 The following relation between functions Uk and Uk_i holds.Ukj(x,6) = Uk(y,6)*Ps{lr(dk) = yllr(dk_l) = x}Yec,r(dk)for any xProof By the definition of Uk_i, we have:Uk_l(x,) = g(o) *P5{rc(v) = or(dk_1)=x}.oé2( v)CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 108SinceP3{7r(v) = or(dk_l) = x} =P3{r(v) o7r(dk_l) = x,Tr(dk) = y}*P5{7r(dk) = y1r(dk_) =YE2ir(dk)then by Lemma 7.1, we have:Uk_l(x,6) = g(o)* P{K(v) = o7r(dk) = y}*Ps{ir(dk) = y7r(dk_l) =x}.OEQ() YEir(dk)By reordering the two summations, we obtain:Uk_l(x,S) = Ps{ir(dj) = y[ir(dk_l) = g(o)*F3{7r(v) = oir(dk)=y}.YE7r(dk)By the definition of Uk, we have:Uk_1(x,6)= Uk(y,6)*P3{’lr(dk) = yir(dk) = x}.YEO,r(dk)Lemma 7.7 Let 6’ = (6, ..., 6) be another policy for I such that S 6k, ..., 6 =6,, for some k, 1 < k < n. Then, U(x, 6) = U(x, 6’) for each j, k <j <n andeach x EProof By induction.Basis: Consider U,,. By the definition of Un,, we have:U(x, 6) = g(o) * P5{7r(v) = oir(d)=x}°Eir(v)andU(x, 6’) = g(o) * Psi{K(v) = oJ7r(d) = x}.OEr(v)CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 109By Lemma 7.2, we have:P3i{ir(v) = oir(d) = x} =P3{ir(v) oir(d) = x,d = 6(x)}andP5{lr(v) = or(d) = x} =P5{Tr(v) = oir(d) x,d = 6(x)}.Since 6 = 6, then by Lemma 7.4-(1), we have:Fs{K(v) = or(d) = x,d = = Py{ir(v) = or(d) = x,d = 6(x)}.Thus, U(x, 6) = U(x, 6’). Therefore, the basis holds.Induction: Suppose U1(x, 6) = U(x, 6’) for all i, k < i n. By Lemma 7.6, wehave:U_i(x, 6) = U(y, 6) *P5{7r(d) = yir(d_1)= x}YE(d)andU_i(x,6’) = U(y,6’) *P3fr(d) = yI(d_i) = x}.By the induction hypothesis, we have:U(y,6) = U(y,6’).By Lemma 7.2, we have:P5{r(d) = yir(d_1)= x} = Fs{ir(d) = y[r(d_1)= x,d_1=61(x)}and= y(d_1)=x} = Psi{K(d) = y(d_i) = x,d_1 =Since 6_ = then by Lemma 7.4-(2), we obtain:P3i{ir(d) = yr(d_i) = x,d_1 = 6_i(x)} = Pô{r(d) = yr(d_i) x,d_1 =CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 110Thus,Ps{ir(d) = y7r(di) = x} = Pi{7r(d) = yr(d_1)=Therefore,U_i(x,S) = U_i(x,6’).Therefore, the lemma holds in general. LISo far, we have developed some results on the expected values of an influencediagram with respect to an arbitrary policy. Let us now examine the properties ofan optimal policy. Let 6° = (6?, ..., 6) be an optimal policy for influence diagram Iand let Vk be a function defined asVk(x) = Uk(x,6°).Intuitively, Vk(x) is the optimal expected value of the influence diagram I conditioned on 7r(dk) x. In other words, Vk(x) is the expected value a decision makercan obtain if he starts to make optimal decisions in the situation represented by theevent 7r(dk) = x.Let V(x, a) be an auxiliary function defined as:V(x,a) = g(y)*P8o{ir(v) =yir(d) = x,d = a} (7.2)YEO7r(v)V(x,a) = V+(y) *Pso{lr(dk+l) =yr(dk) = x,dk = a} (7.3)Intuitively, V(x, a) is the optimal expected value of the influence diagram I conditioned on Tr(dk) = x, dk = a. In other words, V(x, a) is the expected value a decisionmaker can obtain if he starts to make decisions in the situation represented by theevent lr(dk) = x, and first chooses a for dk (in this situation) and then follows anoptimal policy for the remaining decisions.The next two lemmas characterize the relationship between Vk and V,’.CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 111Lemma 7.8 For all k = 1, ..., n,V(x,6(x)) = Vk(x).ProofVk’(x, 6(x))= Vk+l(y)*Pso{K(dk+l) = yr(dk) = x,dk = 6(x)}YE-Rr(dk+l)= Vk+l(y)*Pao{K(dk+l) = yr(dk) = x} by Lemma 7.2vEO7(dk+l)=U(y,6°)*P6o{K(dk+l) = yr(dk) = x} by the definition of VkYEOir(dk+l)= Uk(x, 60) by Lemma 7.6= Vk(x) by the definition of Vk.ElLemma 7.9 For all k = 1, ...,V,’(x,8(x)) > V’(x,a) for each x E f(dk) and each a E 1dk•Proof Suppose the inequality does not hold. Thus, there exist x0 E Z(dk) anda0 E dk such thatV,’(xo,ao) > V,’(xo,6(xo)).Construct a policy 6’ = such that 6 = 6 for all i, 1 < < n, i kand 6(xo) = a0, and 6(x) = 6(x), for all x E (dk), x Xo. For any xany a E dk and any y E L(dk+l), by Lemma 7.7, we haveUk+1(x,6’) = Uk+1(x,6°) = Vk+l(x);CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 112by Lemma 7.3, we have= x} = PSo{7r(dk) =by Lemma 7.4, we havePs’fr(dk+l) = y’1r(dk) = x,dk = a} = Pso{lr(dk+l) = yir(dk) = x,dk = a}.We can prove E3 [v] > E3o by the following derivation:E5i[v]= Uk(x,6’)*PSI{lr(dk)=x}= ( Uk+1(y,6’)*P3l{7r(dk+l) = yITr(dk) = x}) *PsI{7r(dk) = x}XEO(dk) YQ1r(dk+l)= ( U(y,8’) *P6’{7r(dk+l) = yr(dk) = xo}) * Ps’{lr(dk) = XO}YE-1r(dk+l)+ ( U(y,6’) *P5{7r(dk+l) = yr(dk) = x}) *Fs1{r(dk) = x}XEr(dk),XXO YEO7(dk+l)= ( Uk+1(y,6°) * Ps’{7r(dk+l) = y7r(dk) = x0}) * Pso{lr(dk) = xo}yecr(dk+l)+ ( U+i(y,6°) *Psofr(dk+l) = y7r(dk) = x})*PSo{7r(dk) = x}XEO,r(dk),XO r(dk+l)= ( V(y) * = yr(dk) = x0, dk = S’(xo)}) * Psofr(dk) = x0}YEir(dk+l)+E5o[v]— ( V+(y) * Pso{lr(dk+l) = yr(dk) = x0}) * Pso{Tr(dk) = xo}=( Vk+l(y)*Pslfr(dk+l) =yr(dk) = xo,dk = ao})*PSo{7r(dk) = xo}+E5o[v]—( Vk+1(y) * PSo{lr(dk+l) = yr(dk) = xo,dk = 6o(xo)) * Pso{lr(dk) = x0}YEr(dk+l)= ( Vk+1(y) * Pso{7r(dk+l) = yr(dk) = xo, dk = ao}) * Pso{7r(dk) = x0}YE(dk+l)CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 113+E5o[v] — V(xo,6°(xo)) * Pso{7r(dk) = xo}= V(xo, ao) * P5o {K(dk) = x0} — V(xo, 6°(xo)) * F6o {K(dk) = x0} + E5o [v]= (V(xo, ao) — V(xo, 6°(xo)) * P80 {7r(dk) = xo} + F50 [v]> E8o[v]This contradicts the optimality assumption of 6°. ElThe results we have obtained so far are summarized by the following theorem.Theorem 7.10 Let I be a regular and stepwise decomposable influence diagram, letfunctions Vk and V be defined as before, and let 6° = (6, ..., 6) be an optimalpolicy for I. For any k with 1 <k <n, and x E (dk) and a E 1dk, the followingrelations hold:V(x,a) = Vk+l(y)*F{7r(dkl) = yr(dk) x,dk = a} (7.4)Vk(x) = Vk’(x,6(x)) = maxaeOdkVk(x,a) (7.5)= arg maxaecldk{Vk(x,a)} (7.6)E3o[v] = Vo(x) * P{ir(di) = x}. (7.7)where P{r(d1)= x} =P3o{7r(di) = x} can be computed in the sector I(—,d1) andP{7r(dk+l) = yj7r(dk) = x,dk = a} = Pso{7r(dk+l) = yr(dk) = x,dk = a} can becomputed in the sector I(dk, dk+l), both independently of 6°.Equations 7.4, 7.5 and 7.6 establish the computational structure of influence diagramevaluation in the form of finite—stage stochastic dynamic programming [80]. Theyessentially describe an expectation—maximization iteration for computing the optimalCHAPTER 7. A SEARCH-ORIENTED ALGORITHM 114policy and the optimal expected value. Equations 7.4 and 7.5 collectively form avariation of Bellman’s optimality equation [5] for stochastic dynamic programming.Theorem 7.10 shows that functions 14,..., V and 6, ..., 6, as well as E3o[v] canbe computed from 14. The computation process is similar to the one implied in therecursive formula given in [11]. For an influence diagram, the computation can beroughly divided into two parts: one for computing conditional probabilities in thesectors of the influence diagram and one for the summations and maximizations asspecified in Equations 7.4 and 7.5. The definition of 14 is given in Equation 7.2. Thecomputation of 14 can be computed from the sector I(d,—).As shown in the previous example, the sectors of an influence diagram can havesome overlap. Consequently, some redundancy may be involved in computing conditional probabilities in different sectors. This problem does not arise for smooth influence diagrams [109]. A non—smooth influence diagram can be transformed into anequivalent smooth influence diagram by a series of arc reversal operations [106, 109].Therefore, we can either deal with an influence diagram directly or first transform theinfluence diagram into a smooth influence diagram and deal with the smooth one.It is now fairly clear that the amount of computation involved in the above mentioned process is comparable to that involved in the other algorithms such as thosein [88, 108].In the above mentioned process, the value 6?(x) will be computed for each d1and each x E fZr(d1). Thus, unnecessary computation has not been avoided.CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 1157.3 Representing the Computational Structure byDecision GraphsIn this section, we use decision graphs to depict the computational structures of theoptimal expected values of influence diagrams. For each influence diagram, we canconstruct a decision graph and define a max—exp evaluation function for the decisiongraph in such a way that the solution graphs of the decision graph correspond to thepolicies for the influence diagram, and the optimal solution graphs correspond to theoptimal policies.Before we discuss how to construct decision graphs for influence diagrams, weneed some terminology.Let d be a decision variable in an influence diagram. For each x E Q(d), wecall assignments of the form 7r(d) = x a parent situation for the decision variable d.For each alternative a E d, we call assignments of the form ir(d) = x, d = a aninclusive situation for the decision variable d. Two situations are consistent if thevariables common to the two situations are assigned the same values.For an influence diagram, we define a decision graph in terms of situations. In thegraph, a choice node represents a parent situation and a chance node represents aninclusive situation. The following is a specification of such a decision graph.• A chance node denoting the empty situation is the root of the decision graph.• For each information state x E d1), there is a choice node, denoting theparent situation ?r(di) = x, as a child of the root in the decision graph. Thearc from the root to the node is labeled with the probability P{ir(di) = x}.• Let N be a choice node in the decision graph denoting a parent situation 7r(d) =x for some decision variable d and some x Let fd(x) be the effectiveCHAP TER 7. A SEARCH-ORIENTED ALGORITHM 116frame for the decision variable d in the information state x. Then, N hasfd(x) children, each being a chance node corresponding to an alternative3infd(x). The node corresponding to alternative a denotes the inclusive situationir(d) = x, d = a.• The node denoting an inclusive K(d) = x, d = a is a terminal in the decisiongraph, having value x, a).• Let N be a chance node denoting an inclusive situation ir(d_i) = x, d_1 = a,and let A be the subset of the parent situations for decision variable d whichare consistent with 7r(d_i) x, d_1 = a. Node N has A children, eachbeing a choice node denoting a parent situation in A. The arc from N tothe child denoting a parent situation K(d)=y is labeled with the conditionalprobability4 P{7r(d) = yir(d_i) = x, d_1 = a}.In such a decision graph, a choice node represents a situation of the form ir(d) = xfor some i and x E L(d). In such a situation, the decision agent needs to decidewhich alternative among fd (x) should be selected for d. Thus, the choice nodehas f (x) children, each for an alternative in fd (x). The child corresponding toalternative a is a chance node, representing the probabilistic state 7r(d) = x, d = a.From this probabilistic state, one of the information states of d+1 may be reached.The probability of reaching information state y is P{7r(dIi) yir(d) = x, d = a}.As an example, consider the oil wildcatter problem. The probability distributionsand the value functions are given in Tables 7.1—7.5. A complete decision graph for theoil wildcatter problem is shown in Fig. 7.2. As we know, a probability is associated3Note that here we are using the effective frame fd(x) instead of the frame 2d.4Note that probabilities of this kind on arcs are not computed unless necessary. This point willbe clear in Section 7.4.1.CHAPTER 7. A SEARCH-ORIENTED ALGORITHMTable 7.1: The function of the value node1177 D 0 CD Vno no ———— 0no yes dry 1 -40no yes dry m -50no yes dry h -70no yes wet 1 80no yes wet m 70no yes wet h 50no yes soaking 1 230no yes soaking m 220no yes soaking h 200T D 0 CD Vyes no 10yes yes dry 1 —50yes yes dry m -60yes yes dry h -80yes yes wet 1 70yes yes wet m 60yes yes wet h 40yes yes soaking 1 220yes yes soaking m 210yes yes soaking h 190Table 7.2: The conditional probability distribution of RT S R probno —— fobs 1.0no —— others 0yes—— nobs 0yes ns ns 1.0yes cs cs 1.0yes os Os 1.0with each arc from a chance node to a choice node. These probabilities can becomputed in the sector I(T, D) as shown in Fig. 7.1.In Fig. 7.2, those arcs without labels are associated with zero probability. Non—zero probabilities are computed as follows.P{Tyes,R=ns ITyes}= P{RinslTyes}= P{R=ns IThyes , S=ns} * P{S=ns}Table 7.3: The conditional probability distribution of CDCD prob1 0.2m 0.7h 0.1CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 118Table 7.4: The prior probability distribution of 00 pro6jdry 0.5wet 0.3soaking 0.2Table 7.5: The conditional probability distribution of S0 5 probdry ns 0.6Cs 0.1os 0.3wet ns 0.3Cs 0.3os 0.4soaking ns 0.1C5 0.5Os 0.4+P{RnsITyes,S=os} * F{Sos}+P{Rns ITyes,S=cs} * P{S=cs}= 1 * F{Sns} + 0 * F{Sos} + 0 * F{S=cs}= F{Sns}Similarly, we have:P{Tyes,RosIT=yes} = F{Sos}andF{Tyes, RcsITyes} = F{Scs}.The marginal probabilities of S are computed as follows.F{Sns} = P{SnsIOdry}*F{0=dry}—j- F{Sns I Dwet} * F{Dwet}+ P{Sns I D=soaking} * F{O=soaking}CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 119= 0.6 * 0.5 + 0.3 * 0.3 + 0.1 * 0.2 = 0.41Similarly, we can obtain that P{Sos} = 0.35 and P{Scs} = 0.24.yesT=yesRnS no[ yes.‘1=yesR0snor=yesRcS no::Figure 7.2: A complete decision graph for the oil wildcatter problemLet DG be a decision graph derived from a regular stepwise—decomposable influence diagram with a single value node. We can define an evaluation function, u1, onDG as follows:• If N is a terminal representing a situation 7r(d) = x, d = a, thenui(DG,N) = (x,a)CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 120• If N is a chance node with children N1 N1, thenui(DG, N) = p(N, N) * ui(DG, N)where p(N, N) is the probability on the arc from node N to node N.• If N is a choice node with children N1, ..., N1, thenui(DG,N) = max{ui(DG,N)}.The following lemma can be easily proved by induction on nodes in the decision graph.Lemma 7.11 (1) If N is a choice node representing a parent situation K(dk) =then ni(N) = Vk(x).(2) If N is a chance node representing an inclusive situation lr(dk) = x, dk = a,then ui(N) = Vk’(x,a).(3) If N is the root, then ui(N) is equal to the optimal expected value of theinfluence diagram.The correspondence between the optimal policies of the influence diagrams and theoptimal solution graphs becomes apparent. As a matter of fact, an optimal solutiongraph of the decision graph can be viewed as a representation of decision tables inwhich all the unreachable situations are removed [111].7.4 Computing the Optimal Solution GraphAs we discussed in Chapter 3, an optimal solution graph of a decision graph can becomputed either “bottom—up” or “top—down.” If a bottom—up approach is taken, thevalues of the leaves are computed first. The max—exp values of interior nodes canbe computed when the max—exp values of all children of the node are available. TheCHAPTER 7. A SEARCH-ORIENTED ALGORITHM 121max—exp value of a choice node denoting a parent situation ?r(d) = x is computedby a maximization operation ranging over the children of the choice node, each corresponding to an alternative in the effective frame fd(x). As a result of this operation,an optimal choice for the decision variable d is also determined for the informationstate x. The computational complexity of this process is linear in the size of thedecision graph5. This method cannot avoid computing optimal choices for decisionnodes with respect to impossible states.7.4.1 Avoiding unnecessary computationWe observe that the asymmetry of an influence diagram is reflected by arcs with zeroprobability in the corresponding decision graph. As we know, the value of a chancenode in a decision graph is the weighted sum of the values of its children. If theprobability on the arc to a child is known in advance to be zero, then there is no needto compute the value of the child (as far as this chance node is concerned). In casethe probabilities on the arcs to a choice node are all zero, the value of the node willnever be required. Thus, we need not compute the max—exp value of, and the optimalchoice for, the node. In this case, the probability of the information state denotedby the choice node is zero. Thus, it is equivalent to say we need not compute theoptimal choice for the corresponding decision variable for the impossible informationstate. One way to avoid such computation for the impossible states is by pruningthose choice nodes corresponding to impossible states. The following procedure forgenerating a decision graph for an influence diagram effectively realizes this objective:(1) generate the root node and put it into set G;5Note that the size of the decision graph is normally exponential in the size of the decision node’sparents. See the analysis in the next section.CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 122(2) if G is empty then stop;(3) pick a node N from G and set G to G — {N};(4) if N is a terminal node then go to (2);(5) generate the child set C(N) of node N; if N is a choice node, set C to G UC(N), otherwise let C’(N) be the subset of C(N) such that the probabilitiesof the arcs from N to the nodes in C’(N) are non—zero; set G to G U C’(N);(6) goto (2);The above procedure will not expand a choice node unless its probability is not zero.Thus, the procedure effectively ignores subtrees rooted at nodes corresponding toimpossible states. Thus, the computation (for computing optimal choices for choicenodes and for computing the conditional probabilities) involved in the subtrees istotally avoided.Consider again the decision graph for the oil wildcatter problem shown in Fig. 7.2.In the figure, those arcs without labels have zero probability. If we apply the aboveprocedure to the influence diagram of the problem, we obtain the simpler decisiongraph shown in Fig. 7.3. With this decision graph, we no longer need to computeoptimal choices for decision D with respect to impossible states such as Tno ,R=ns.Values for the other terminals can be computed locally in the sector I(D,—) (asshown in Fig. 7.1) as follows. The solution graph in Fig. 7.4 corresponds to the policyof no test and drill. The expected value of the influence diagram with respect to thepolicy is 40, which is the optimal expected value of the influence diagram. Thus, thesolution graph corresponds to an (the) optimal policy of the influence diagram.CHAPTER 7. A SEARCH-ORIENTED ALGORITHMFigure 7.3: A simplified decision graph401234040040no40Figure 7.4: The optimal solution graph for the oil wildcatter problemCHAPTER 7. A SEARCH-ORIENTED ALGORITHM 1247.4.2 Using heuristic algorithmsThe method just described effectively exploits asymmetry in decision problems. Better performance will be achieved if the algorithms presented in Chapter 3 are usedfor computing an optimal solution graph. By using these algorithms, some subgraphsmay not be processed at all.To use these algorithms, we need a heuristic function that gives admissible estimation on ui(s) for any situation s. Note that the notion of admissibility for aheuristic function here is different from the traditional one. Because we are maximizing merit instead of minimizing cost, we define a heuristic function to be admissibleif it never under—estimates for any situation s. Formally, a function h is admissibleif h(s) ui(s) for any situation s. An obvious admissible heuristic function for aninfluence diagram evaluation problem is the one returning +oo for each situation.Performance can be further enhanced if we can obtain a more informed heuristicfunction by using domain—dependent knowledge.7.4.3 A comparison with Howard and Matheson’s methodThe relationship between our method and Howard and Matheson’s should now beclear. In both methods, an influence diagram is first transformed into a secondaryrepresentation from which an optimal policy is computed. However, there are a fewnotable differences between the two methods.First, Howard and Matheson’s method works only for no—forgetting influence diagrams while ours is applicable to regular stepwise decomposable influence diagrams(we handle influence diagrams with multiple value nodes in the next chapter).Second, the sizes of the secondary representation generated by the two methodsare different. For a given influence diagram, the depth of the decision tree obtained byCHAPTER 7. A SEARCH-ORIENTED ALGORITHM 125Howard and Matheson’s method is equal to the number of variables in the influencediagram, while the depth of the decision graph obtained by our method is 2n, where nis the number of decision variables in the influence diagram. Typically, there are morerandom variables than decision variables in a decision problem, thus the depth of thedecision tree obtained by Howard and Matheson’s method is larger than the depth ofthe decision graph obtained by ours for the same influence diagram. Furthermore, thenumber of nodes in the decision tree obtained by Howard and Matheson’s methodfrom an influence diagram is exponential in the depth of the tree, but this is notnecessarily true for the decision graph obtained by our method. In fact, the numberof nodes in a decision graph obtained by our method is:1 + (dn)I +((d) + (d)I *Third, our method provides a clean interface to those algorithms developed forBayesian net evaluation.7.5 How Much Can Be Saved?In this section, we first give a general analysis of how much can be saved by exploiting asymmetry in a typical decision problem and then examine the used car buyerproblem. Since the number of optimal choices to be computed is a relative measureon the time an algorithm takes for evaluating an influence diagram, we compare thenumber of optimal choices to be computed by our method against other methods.7.5.1 An analysis of a class of problemsIn this subsection, we analyze our algorithm against the following generalized buyingproblem:CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 126Suppose we have to decide whether to buy an expensive and complex item.Before making the decision, we can have n tests, denoted by T, ..., T, onthe item. Suppose test T has k alternatives and j possible outcomesfor i = 1, ..., n.. Among the k alternatives for test T, one stands for theoption of no—testing. Correspondingly, among the i possible outcomesof test T, one stands for no observation, resulting from the no—testingalternative.An influence diagram for this problem is shown in Fig. 7.5. In this influence diagram,decision variable T has a frame of size k, including an alternative nt for not testing,and random variable R has a frame of size j, including an outcome nr for noobservation. Let H = The size of is flZH1. Thus, the decision Thas ll:H1 information states, the decision B has HL1 information states. If wedo not exploit the asymmetry of the problem, we will have to compute an optimalchoice for every decision and each of its information states. The total number isZn+1 i—ii= 1=1 1•Let us consider for how many information states our method will compute optimalchoices for each decision variable. To do so, we simply count the number of choicenodes in the decision graph (actually, decision tree) generated by our method fromthe influence diagram.Before counting, let us first make some notes on the conditional probabilities wewill use in the process of decision graph construction. During the process, we need theconditional probabilities F{7r(dj+i)l1r(d), d}, where d denotes T for i = 1, ..., nand d+1 denotes B, the purchase decision. First, because 7r(d+i) = K(dj)U{d1,R},we have:= = F{Rir(d),d}.CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 127Second, for any given information state y with 1 < i < n, the totalnumber of test—choice/test—result combinations is Among these combinations,some have zero probability, as illustrated by the partial decision tree shown in Fig.7.6, where shadowed boxes correspond to the zero probability combinations. Thenumber of zero probability condition/outcome combinations determines the degreeof asymmetry of the problem. Let Jjr(y) denote the set of possible outcomes of Rif the choice for T in situation ir(d) = y is ar E In terms of probabilitydistributions, this is equivalent to saying that, for all y é for all ar e andfor all x— Jir(y)F{R = x(d) = y,d = ar} = 0and for all y e and for all ar E= x[ir(d) = y,d = ar} = 1.X€Jir (y)Thus, the total number of non—zero probability combinations of test—choice/test—result, conditioned on 7r(d) = y, is bounded from above by J(y)j. Let h(y)1Jir(y) and h=maxh(y).At the most conservative extreme, we can assume that for any y E thereexist some a Q and some x E such thatP{R, = xr(d) = y,d = a} = 0.In this case, the total number of non—zero probability combinations of test—choices/test—results is bounded from above by kj — 1.A reasonable assumption we can make about a practical decision problem is that,for any decision variable, the choice of do—nothing will always lead to no result whileCHAPTER 7. A SEARCH-ORIENTED ALGORITHM 128a choice other than do—nothillg will always lead to some results. We call this assumption the “no action, no result” assumption. For the generalized buying problem, thisassumption means that no-test choice nt for decision node T will lead to no observation nr as the only possible outcome of R, and other test choices will not leadto no observation nr. The partial decision tree in Fig. 7.6 depicts this case, whereshaded nodes correspond to the states with zero probability. In terms of probabilitydistributions, we have:P{R = xw(d) = y,d = nt} = 0P{R = nrir(d)=y, d = nt} = 1for any x E x nr, and any y E andP{R = nrjir(d)= y, d = a} = 0for any a e Q, a nt, and y E In this case, the total number of nonzero probability combinations of test—choices/test—results is bounded from above by1 + (k — 1)(j — 1).Now, let us count the number of choice nodes in the decision graph that correspondto information states with non—zero probabilities. Based on the above analysis of theprobability distributions, we have the following: T1 has one parent situation; T2has at most h1 parent situations with non—zero probabilities; T2 has at most h12parent situations with non—zero probabilities; etc., and B has at most ll1h parentsituatiolls with non—zero probabilities. Thus, the total number of choice nodes in thedecision graph is bounded from above by i’ ll,: hi.Let p, = We call p, the “savings factor” with respect to decision T.Since h 1 + (k — 1)j H, we have p (kj)/(1 + (k — 1)j) 1.CHAPTER 7. A SEARCH-ORIENTED ALGORITHM 129Figure 7.5: An influence diagram for a generalized buying problemThe overall savings factor is given byn+1 HH1 fl 1H llp’— \‘fl+1 H1j — ‘fl+1 fl’h — 1I_.j1 1=1 “1 Li=1 1=1 1 L_j1 fl’.h1It is reasonable to assume that h1 > 2 for i = 1, ..., n. Then we have p> 1For example, suppose k = 4 (each test has three alternatives plus the alternativeof no test) and j = 4 for i = 1, ..., n. Then we have p, > 16/15 in the most conservative extreme, and p, > 16/10 under the “no action, no result” assumption. Then,the overall savings factor is bounded from below by (16/l5Y in the most conservativeextreme, and by (16//210 under the “no action, no result” assumption.In the above analysis, we assume that test T has exactly k alternatives in everyinformation state. In fact, the set of legitimate alternatives for test decision T inan information state .s is fT(s) and may have fewer than k elements. Thus, theactual overall savings factor could be much greater than Hp/2.Finally, let us note that the above analysis is applicable to any decision problem:as long as it satisfies the “no action, no result” assumption, exploiting asymmetryCHAPTER 7. A SEARCH-ORIENTED ALGORITHM 130yxR1ntd10LI0Figure 7.6: An illustration of the asymmetry of a decision variableCHAPTER 7. A SEARCH-ORIENTED ALGORITHM 131will lead to a savings factor that is exponential in the number of asymmetric decisionsin the decision problem.It is worth pointing out that an exponential savings factor for an exponentialproblem may not change the exponential nature of the problem. More specifically,suppose that problem P will take algorithm A O(c) time units and take algorithmB O(c/a) time units with a > 1. We say that algorithm B has an exponentialsavings factor in comparison algorithm A because algorithm B runs c? times fasterthan algorithm A. Although algorithm B is still an exponential algorithm, for a fixedtime resource, algorithm B may be able to handle larger problems than algorithm A.Let na and b be the size of the largest problems algorithms A and B can handle,respectively. Then a and n& satisfy:cTa= (c/a)orlogc— logclogaFor example, let us consider again the generalized buying problem. Suppose thateach decision in the problem has two alternatives and each random node has threeoutcomes, i.e., k = 2 and j = 3 for i = 1, .., n. It takes those algorithms that do notexploit asymmetry O((2*3)) time units and it takes our algorithm O(3) time units.In this case, we have c = 6 and a = 2. Thus, we have b = a log 6/log 3 = l.63Tla.Note that for the same a, the bigger the value of c, the smaller the ratio flb/fla.7.5.2 A case analysis of the used car buyer problemWhen applying our algorithm to the used car buyer problem, a decision tree as shownin Fig. 7.7 will be generated. In the tree, the leftmost box represents the only stateCHAPTER 7. A SEARCH—ORIENTED ALGORITHM 132in which the first test decision is to be made. The boxes in the middle columncorrespond to the information states in which the second test decision is to be made.Similarly, the boxes in the right column correspond to the information states in whichthe purchase decision is to be made. From the figure we can see that among thosenodes corresponding to the information states of the second test, all but two have onlyone child because the second test in the corresponding information states has onlya single legitimate alternative no—test. Making use of the framing function thisway is equivalent to six prunings, each cutting a subtree under a node correspondingto an information state of the second test. The shadowed boxes correspond to theimpossible states. Our algorithm effectively detects these impossible states and prunesthem when they are created. Each pruning amounts to cutting a subtree under thecorresponding node. Consequently, our algorithm does not compute optimal choicesfor a decision node for those impossible states. For the used car buyer problem, ouralgorithm computes optimal choices for the purchase decision for only 12 informationstates, and optimal choices for the second test for only 8 information states (amongwhich six ca be computed trivially). These constitute the minimal information stateset one has to consider in order to compute an optimal policy for the used car buyerproblem. This suggests that, as far as decision making is concerned, our methodexploits asymmetry to the maximum extent. In contrast, those algorithms that donot exploit asymmetry will compute the optimal choices for the purchase decision for96 (4 x 4 x 2 x 3) information states and will compute the optimal choices for thesecond test for 16 information states. The overall savings factor is 5.5.By applying the analysis results we obtained in the previous section to this problem, we have: H1 = 16, H2 = 6, h1 = 8, and h2 = 3. Thus, p = 2 and p2 = 2.p = (1 + 16+96)7(1+8+24) = 3.7.32.87133Figure 7.7: A decision graph (tree) generated for the used car buyer problem1.0282809-13--‘4Chapter 8Handling Influence Diagrams withMultiple Value NodesIn the previous chapter, we developed a method for influence diagram evaluation.We assumed that the influence diagrams were regular and had one value node. Aspointed out in [101], if the value function of an influence diagram is separable, thenthe separable nature can be exploited to increase the efficiency of influence diagramevaluation. The separability of a value function can be represented by multiple valuenodes. Thus, it is desirable to develop algorithms that can be used to evaluateinfluence diagrams with multiple value nodes.A generalization of Shachter’s algorithm [85] has been developed in [101] that canexploit the separability of value functions. Zhang and Poole’s algorithm [108] and thelater study [109, 106] is developed for influence diagrams with multiple value nodes.In this chapter, we generalize the method presented in the previous chapter so thatit applies to regular influence diagrams with multiple value nodes as well.134CHAPTER 8. HANDLING MULTIPLE VALUE NODES 1358.1 Separable Value FunctionsIf the value function of the value node in an influence diagram can be expressed asthe sum of two or more functions with fewer variables, we say the value function isseparable. More precisely, let g(Z) be a value function with variable set Z, let gi,., gq be functions with variable sets Zi ... Zq respectively, where Zi ... Zq areall proper subsets of Z. Function g is separable ifg(Z) =Consider again the oil wildcatter problem. The value node depends on four variables:T, D, 0 and CD. The value function can be separated into three parts: a functionon the cost of the seismic structure test, a function g, on the drilling cost, and afunction g3 on the value of the oil. Formally, this can be expressed asg(T, D, 0, CD)=gi(T)+g2( , CD)+g3( , 0).With this separation of the value function, the value node can be split into three valuenodes, v1, v2 and v3, with gi, g and g3 as their value functions respectively. Thisseparation results in a new influence diagram as shown in Fig. 8.1.The semantics of influence diagrams with multiple value nodes is the same as thatof influence diagrams with a single value node except that, for an influence diagramwith multiple value nodes, we interpret the sum of the values of all individual valuenodes as the value of the influence diagram. Therefore, all the terminology we haveused before can be used here for influence diagrams with multiple value nodes.CHAPTER 8. HANDLING MULTIPLE VALUE NODES 136Figure 8.1: A new representation of the oil wildcatter problem by an influence diagramwith multiple value nodes8.2 Decision graphs for influence diagrams withmultiple value nodes.As for influence diagrams with a single value node, the computational structure of theoptimal expected value of an influence diagram with multiple value nodes can also berepresented as a decision graph. The difference is that in the case of a single valuenode, the values are associated only with the terminals in the decision graph while inthe case for influence diagrams with multiple value nodes, values can be associatedwith arcs as well. In order to illustrate this, consider the influence diagram shown inFig. 8.1, which is an influence diagram with three value nodes for the oil wildcatterproblem. Since the value node v1 depends only on node T, its value can be determinedin any situation where the variable T is instantiated. Thus, at the nodes representingthe situations T=yes and T=no, the value of v1 can be determined. In particular, thevalue of g for the situation T=yes is —10 and the value of g for the situation T=nois 0. The value —10 can be viewed as the value resulting directly from performingthe test action, while the value 0 can be viewed as the value resulting directly fromperforming the no—test action.CHAPTER 8. HANDLING MULTIPLE VALUE NODES 137In order to deal with general cases, we introduce a new concept. Let K(dk) xbe a parent situation for dk, 7r(dk) z, dk = a be an inclusive situation for dkand let I(dk, dk+1) be the sector of I from dk to dk+1 . Withoilt loss of generality,suppose nodes vj, ..., vj are the value nodes in I(dk, dk+l). For i 1 j, letEI(dk,dk+l)[vi7v(dk) = x, dk = a] denote the expected value of the value node Vi,conditioned on 7r(dk) = x, dk = a, in sector I(dk, dk+1). We call the sum= x,dk = a]the value of the inclusive situation 7r(dk) = x, dk = a. This value can be computedby a method in [109]. Intuitively the value can be viewed as the utility directlyresulting from selecting a as the choice for decision node dk in the parent situationlr(dk) = x. Using the terminology for decision graphs, the value can be associatedwith the arc from the choice node representing the parent situation K(dk) = x to thechance node representing the inclusive situation lr(dk) = x, dk = a. The following isa new specification of the decision graph of an influence diagram with multiple valuenodes.• A chance node denoting the empty situation is the root of the decision graph.• For each information state x d1), there is a choice node, denoting theparent situation 7r(di) = x, as a child of the root in the decision graph. Thearc from the root to the node is labeled with the probability P{7r(di) = x}.• Let N be a choice node in the decision graph denoting a parent situation7r(d) = x for some decision variable d and some x E qd). Let fd(x) be theeffective frame for the decision variable d in the information state x. Then, Nhas fd(x)I children, each being a chance node corresponding to an alternativeCHAPTER 8. HANDLING MULTIPLE VALUE NODES 138in fd(x). The node corresponding to alternative a E fd(x) denotes the inclusivesituation ir(d) = x, d = a. The arc from the choice node to the chance nodedenoting the inclusive situation Tr(d) = x, d = a is labeled with the value of theinclusive situation.• The node denoting an inclusive K(d) = x, d = a is a terminal in the decisiongraph having value 0.• Let N be a chance node denoting an inclusive situation 7r(d_1)= x, d_1 a,and let A be the subset of the parent situations for decision variable d, whichare consistent with ir(d_i) = x, d_1 = a. Node N has A children, eachbeing a choice node denoting a parent situation in A. The arc from N tothe child denoting a parent situation K(d)=y is labeled with the conditionalprobability P{7r(d) = yr(d_1)= x,d_1 = a}.As an example, consider again the influence diagram as shown in Fig. 8.1. A decisiongraph for the influence diagram is shown in Fig. 8.2.Let DG be a decision graph derived from an influence diagram with multiple valuenodes, we can define an evaluation function, u2, on it as follows:• If N is a terminal, thenu2(DG,N) = 0• If N is a chance node with children N1, ..., N1, thenu2(DG, N) = p(N, N) *u2(DG, N)where p(N, N) is the probability on the arc from node N to node N.CHAPTER 8. HANDLING MULTIPLE VALUE NODES 139047.5-11-1040107.504040Figure 8.2: A decision graph for the influence diagram shown in Fig. 8.1CHAPTER 8. HANDLING MULTIPLE VALUE NODES 140• If N is a choice node with children N1, ..., N1 ,thenu2(DG, s) = max1{c(N, N1) +u2(DG, N1)}where c(N, N1) is the value on the arc from node N to node N1.The reader may have noticed that the decision graphs corresponding to the exampleswe have considered so far are decision trees. This need not be trne in general. Herewe consider another example whose decision graph has shared structnre.Consider the following variation of the oil wildcatter problem. In the previousexamples, we implicitly assumed that the amount of oil the oil wildcatter can obtainis equal to the amount of the oil underground. Now we replace this assumption bya more realistic one, namely, the amount of oil the oil wildcatter can obtain dependsalso on the equipment status. Thus, the oil wildcatter needs also to decide whether toupgrade his equipment. Furthermore, suppose the profit by selling oil also dependson market information and the sale policy. This more elaborate problem can berepresented by the influence diagram shown in Fig. 8.3.The amount of obtained oil can be zero, low, medium or high. The decisiongraph corresponding to the problem is shown in Fig. 8.4. The decision problemis asymmetric in the following sense: if the drill decision is yes, then the amountFigure 8.3: A variation of the oil wildcatter problemCHAPTER 8. HANDLING MULTIPLE VALUE NODES 141of obtained oil must not be zero, and if the drill decision is no, the amount of obtained oil must be zero. Therefore, some of the arcs to the nodes representing thesituations for oil—obtained are labeled with zero probability. After removing thesezero—probability arcs, the decision graph becomes the one in Fig. 8.5, which is indeeda graph (not a tree).A possible heuristic function for this problem can be defined as follows: For anynode N, if N represents a situation in which the drill decision is no, return zero,otherwise, return M where M is a large integer. Obviously, if M is large enough,this heuristic function is admissible. Suppose we use Algorithm DFS presented inChapter 3 with this heuristic function to search the decision graph in Fig. 8.5. If theupgrade=yesupgrade=noFigure 8.4: A decision graph for the influence diagram in Fig. 8.3CHAPTER 8. HANDLING MULTIPLE VALUE NODES 142Figure 8.5: A decision graph, with zero probability arcs removed, for the influencediagram in Fig. 8.3upgrade=yesCHAPTER 8. HANDLING MULTIPLE VALUE NODES 143algorithm searches the branch corresponding to drillyes first, then the subgraphunder the branch corresponding to drillno could be pruned altogether, providedthat, according to the actual numerical setting, it is profitable to drill. Furthermore,if we have a stronger heuristic function asserting that the optimal expected valuefor the node representing the situation drillno equals zero, then the lower half ofthe decision graph in Fig. 8.5 need not be expanded at all. As a matter of fact,when building a decision tree for the decision problem, one will use exactly the sameheuristic information to avoid expanding the branch corresponding to drillno.8.3 SummaryWe have developed a new method for influence diagram evaluation. The basic idea ofthe method is to transform an influence diagram into a decision graph in such a waythat the optimal policies of the influellce diagram correspond to the optimal solutiongraphs of the decision graphs. In this respect, our method is similar to Howard andMatheson’s original approach to influence diagram evaluation. However, our methodis more sophisticated and efficient than theirs.To the best of our knowledge, our method is the only one enjoying all of thefollowing merits simultaneously.(1) It is applicable to a class of influence diagrams that is more general than theclass of no—forgetting influence diagrams.(2) It provides an interface to algorithms for Bayesian net evaluation.(3) It can make use of heuristic search techniques and domain dependent knowledge.(4) It exploits asymmetry in decision problems.Part IIINAVIGATION IN UNCERTAINGRAPHS144145This part of the thesis is concerned with the problem of navigation in uncertainenvironments. We propose U—graphs (uncertain graphs) as a framework for uncertainenvironment representation. A U—graph is a distance—graph in which edge (arc)weights are not constants, but are random variables. We consider this problem fortwo reasons. First, the problem is of importance in its own right and there has beenin the literature a growing interest in it. Second, as we will show, such a navigationtask can be represented by a decision graph whose optimal solution graphs correspondto the optimal plans for the navigation task. Thus it provides an ideal applicationdomain for the decision graph search algorithms discussed in Chapter 3.In the next chapter, we first outline two domains where the problem of navigationin uncertain graphs arises and show how to represent uncertain environments by U—graphs, then define the problem of U—graph based navigation, and finally discusssome related work. In Chapter 10, we develop a decision theoretic formalization forthe problem of U—graph based navigation. In Chapter 11, we discuss two algorithmicapproaches to the problem: the off—line approach and the on—lime approach. In theoff—line approach, an agent computes a complete navigation plan for the task andthen follows the plan. The algorithms presented in Chapter 3 are used for computingcomplete plans. In the on—line approach, the agent needs to decide where to go nextin each encountered situation. We present a polynomial—time algorithm for on—linenavigation. We also give some experimental data on the performance of some off—linealgorithms and the on—line algorithm.Chapter 9U—graph based navigationIn this chapter, we introduce a problem of decision making under uncertainty navigation in uncertain environments. We first discuss some motivations to the problem,and then introduce U—graphs as a framework for representing uncertain environments.Next, we define the problem of U—graph based navigation. Finally, we discuss therelated work in this area.9.1 MotivationIn this section we outline two major domains where the problem of navigation inuncertain environments arises. One is autonomous agent navigation and the other iscomputer network communication.9.1.1 High—level navigation for autonomous agentsRobotics has been an active research area in the past decade. One of the ultimategoals of robotics research is to make robots capable of autonomous navigation inpractical and large environments. A large environment is an environment whosestructure is at a significantly larger scale than the observation range of the agent[42]. Practical environments often change over time. A central part of the navigation146CHAPTER 9. U-GRAPH BASED NAVIGATION 147system of a robot is a path planning component. The primary goal of the pathplanning component is to generate a description of a route which can guide the robotto the desired destination. However, due to the complexity and the uncertainty ofthe environment, it is unreasonable to expect the path planning component to havea priori knowledge of every relevant detail necessary to generate an executable planfor a given task. Consequently, the path planning component has to appeal to someperception components for obtaining information dynamically.In order to strike a balance between the use of prior knowledge and dynamicinformation, various kinds of hierarchical structure are commonly employed in mostnavigation systems [2, 13, 55, 58, 89]. In these systems, a distinction is made betweena high—level (global) path planner and a low—level (local) path planner (as shown inFig. 9.1).The basic motivation for making this distinction between the low—level and high—level path planners is to isolate a component that can make maximllm use of information known about the environment. Mitchell et al. [58] classifies path planningtechniques into three categories: deterministic, stochastic and servo control, basedon the nature of available information and the way in which a goal is decomposedinto subgoals. A deterministic planning process is one in which the available information is relatively static and is assumed to be complete. With complete data, adeterministic planner can generate a plan which bridges the gap between the initialstate and the goal state. A stochastic planning process is one in which the availableinformation is not complete, but may be augmented as a result of plan execution. Astochastic planning process typically consists of an infinite loop of three steps: planning a primitive subgoal based on available information, achieving the subgoal andperceiving the environment. A servo control planning process is characterized by itsCHAPTER 9. U-GRAPH BASED NAVIGATION 148Global Path PlannerSensoryLocal Path PlannerSystem_<tiveFigure 9.1: The block diagram of a typical autonomous navigation systemCHAPTER 9. U-GRAPH BASED NAVIGATION 149inherent use of feedback as a mechanism for control. Servo control processes may beused to achieve a goal which can be expressed in terms of a measurable state variable.In most navigation systems, the high—level path planner takes a deterministicapproach while the low—level path planner takes a stochastic approach. For a givennavigation task in an environment, the high—level path planner generates a planconsisting of an ordered set of subgoals based on a high—level representation, usuallycalled a global map, of the environment. In order to be relatively static and complete,the granularity of the global map must be coarse. Consequently, the granularity ofthe plan generated by the high—level path planner is also coarse. In order to carryout these subgoals, the low—level path planner needs to further elaborate them. Thiselaboration may involve a stochastic loop as described above.With this distinction, the low—level path planner, along with the executive, can beregarded as a highly reactive system with limited reasoning capability [1, 7, 13, 58, 63],but able to perform local navigation, while the high—level path planner provides thesource of global rationale, but is limited in its ability to react quickly to changes inthe environment [55, 58].In most navigation systems for autonomous robots, primary attention is focusedon the low—level path planner and the executive, while little attention is paid toproblems related to high—level path planning. This is largely dile to the fact that,based on current technology, the development of a quality agent that can navigatelocally is still a challenge to researchers and practitioners. The problem of high—levelpath planning is usually modeled as a variation of computing the shortest distancepath from a distance—graph which serves as a representation of the global map.However, a major drawback of using distance—graphs as global maps is that theuncertainty arising from practical environments cannot be modeled. For example,CHAPTER 9. U-GRAPH BASED NAVIGATION 150suppose we are in a region where traffic jams may occur along some roads. Thetraveling cost (time) along a road in the event of a traffic jam can be different fromthat without a traffic jam. Furthermore, we do not know in advance whether thereis a traffic jam along a road, though we can estimate the “probability” that such ajam may happen. As another example, we know that there is a route between twoplaces, and that the route may be blocked; we are not sure whether the route isblocked now, though we know that, according to past experience, the probability forthe route being traversable is about 0.8. Clearly, distance—graphs are not sufficient toprecisely model such situations, and it is highly desirable to take this uncertainty intoconsideration when we decide whether to go to a destination via an uncertain route.Therefore, some natural questions arise: how do we model this kind of uncertainty?how do we define the path planning problem? how do we characterize the quality ofa navigation plan? how do we compute a “good” plan for a navigation task in anuncertain environment?9.1.2 Packet routing in computer networksIn a computer network, a key issue for the network layer is how messages (packets)are routed from source nodes to destinations. This issue is referred to as the networkrouting problem.A commonly used approach for network routing is called packet switching [97, 100].In this approach, each packet is routed independently. A packet going from a sourceto a destination in a network is analogous to an autonomous agent traveling from thesource to the destination in an environment. Unlike an autonomous agent, a packet isnot able to decide where to go. Instead, whenever a packet arrives at an intermediatenode, the node is responsible for deciding where the packet goes next (which link itCHAPTER 9. U-GRAPH BASED NAVIGATION 151goes through or which neighbour it reaches next). This approach is used widely inmodern computer networks.A class of commonly used routing algorithms in the computer network communityis based on the concept of shortest distance paths (e.g., [54]). In these routing algorithms, a distance—graph is used to represent the topology of a network. In such agraph, nodes represent the computer nodes and arcs represent communication links.The weight of an arc from node A to node B represents the time that it will takefor a packet to be forwarded from node A to node B through the communicationlink represented by the arc. Based on this model, a simple shortest path algorithmis used to make an optimal decision for every node—packet pair. Furthermore, if thenetwork is static (i.e., neither the network topology nor the arc weights ever change),a fixed optimal routing table, indexed by all possible destinations, can be constructedfor each node. When a packet arrives at a node, the node needs only to look up thedestination of the packet in the table in order to decide through which link the packetshould be forwarded.Despite its simplicity, the routing model described above suffers from an obviousproblem arising from the impracticality of the stasis assumption. In a network, thetime for a packet to go through a link can be roughly divided into three components:the transmission time, the CPU processing time, and the time the packet has to waitin a buffer. Although the first two components are comparatively stable, the thirdcomponent varies depending on the traffic load in the network.A standard way to alleviate this problem is “routing—information updating” [100].That is, each node periodically measures the lengths of the queues of those arcsemanating from it and broadcasts this information to the whole network. Uponreceiving new routing information, each node updates its graph representation of theCHAPTER 9. U-GRAPH BASED NAVIGATION 152network and recomputes a new routing table based on the updated graph.Experience [6, 103] shows that a routing—information updating approach can resultin good network performance when the general traffic load in the network is light.However, when a network is under heavy traffic load, the solution does not workwell. Since the weight of each arc in the graph is measured at some earlier time, thedistance—graph represents only a “snapshot” of the network at a particular moment.As the traffic load becomes heavy, the waiting times for links tend to change quickly.Thus, the weight information in the graph tends to be out—of—date quickly. Therefore,the shortest paths based on the representation of a snapshot are likely different fromthe real shortest paths.One way to compensate for this deviation is to increase the frequency of routing—information updating. Unfortunately, this leads to new problems. First, frequentrouting—information updating requires frequent re—computation of routing tables,therefore, the requirement on CPU resource is increased. Second, since the routing—information is distributed across the network by “flooding,” frequent routing updatingconsumes an unacceptable portion of network bandwidth. This in turn leads to evenheavier traffic loads. A possible consequence of frequent updating can be a wildoscillation and performance degradation of the network [6, 103]. In practice, thefrequency of routing information updating is chosen to be quite low to ensure thestability of the network. For example, in ARPANET, the typical interval betweenrouting information updating ranges from 10 to 50 seconds.Another way to ensure that the weight information in the graph remains up—to—date for a longer time is to use probabilistic information. Since the traffic load in acomputer is dynamic, the lengths of the waiting queues for links change over time.Therefore, it is no surprise that constant estimates of the link delays will be out—of—CHAPTER 9. U-GRAPH BASED NAVIGATION 153date quickly. On the other hand, the changes of the queue lengths can often exhibitsome kind of probabilistic pattern (this is supported by queuing theory [37]). It seemsreasonable to use probability distributions to represent the dynamic changes of thequeues.If computation of the routing tables is based on probabilistic information, then theoptimality of the routing tables can be robust against changes in link delays. Routinginformation updating will not be necessary as long as the link delay changes conformstatistically to the corresponding probability distributions. Thus, it can be hopedthat the frequency of routing information updating could be substantially decreased(say, to a few times per day).As in the case of autonomous navigation, the immediate questions are: how dowe represent and use probabilistic information for network routing?In the next section, we propose a framework, called U—graphs, for uncertain environment representation. In Section 9.3, we illustrate how to use U—graphs to representuncertain environmeilts. The rest of this chapter and the chapters to follow are devoted to problems of U—graph based navigation. By U—graph based navigation, wemean the process of navigatillg in an uncertain environment represented by a U—graph. Although this abstract treatment is applicable to both “real” navigation innatural environments and to packet routing in a computer network alike (see [75]),we assume in the rest of this thesis that our application context is the former whenwe describe the intuitive meanings of various concepts.9.2 U—graphsA U—graph is an extension of an ordinary distance—graph. The weights of some edges(arcs) in a U—graph are not constants, but are random variables. Clearly, probabilisticCHAPTER 9. U-GRAPH BASED NAVIGATION 154information on the uncertainty with respect to travel costs in a natural environmentor link delays in a computer network can be represented in a U—graph.A U—graph can be either directed or undirected. In this thesis, we consider undirected U—graphs only, because (1) this makes the presentation less cumbersome; (2)undirected U—graphs are sufficient in most cases encountered in navigation; and (3)the techniques and the algorithms developed for undirected U—graphs can easily beadapted to directed U—graphs.If the weight of an edge is a random variable, the edge is referred to as an uncertainedge. Otherwise, the edge is referred to as an ordinary edge, or simply as an edgeif no confusion arises. An uncertain edge between vertices A and B represents theknowledge that there exists a connection between the locations denoted by the verticesA and B, and that the weight of the connection is not certain and is given by a randomvariable.A formal definition of a U—graph is as follows.Definition 9.1 A U—graph is quadruple (VE,U,P), where:• V is a finite set of vertices;• E is a set of ordinary edges over V with constant weights;• U is a set of uncertain edges over V whose weights are discrete non—negativerandom variables;• F specifies a joint probability distribution over U.This definition is a generalized version of the one presented in [76]. Let G =(V E, U, F) be a U—graph. For each u e U, let denote the set of possibleCHAPTER 9. U-GRAPH BASED NAVIGATION 155weight values of u. We call the frame of u. For each subset U’ U, let= FIuEU’ USince P is a joint distribution over U, for each subset C U, we can talk aboutthe marginal distribution over U’, which is defined as follows:P{U}.U-U,Suppose U is indexed such that U (Ui,..., Urn). The distribution P can beregarded as a mapping from Iu to [0, 1]. For any a = (a’, ..., a) E Iu, welet U = a denote the event that each uncertain edge ‘u takes weight a, for eachi = 1, ..., m, and let P{U = a} denote the probability of that event. Let G(a)be a graph obtained from G by assuming that uncertain edge u has weight a,i = 1, ..., m. G(a) is called a possible realization of G. The probability that G(a)is the actual realization is given by P{U = a}.The joint distribution P ca be given in various forms such as a table or a Bayesiannet. In the rest of this thesis, we assume that there are algorithms to computemarginal distributions P{U’} from a joint distribution for ay subset U’ C U, andto compute a posterior distribution from a joint distribution aild a piece of evidence.As an example, a U—graph is shown in Fig. 9.2. The solid lines are edges and thedotted lines are uncertain edges. Each edge has a positive weight. The frames of theuncertain edges are given as follows:== {10, 25} and== {20, 45}.The joint distribution is given in Table 9.1.This U—graph of Fig. 9.2 can be viewed as a representation of a set of locationsconnected in a ring structure by highways. The uncertain edges model those highwayswhich may be jammed. The uncertain edges carry the following meaning: It will cost10 (in some normalized unit) to traverse the highways denoted by .s and s2 if theyare not jammed and cost 25 if they are jammed; similarly, it will cost 20 to traverseCHAPTER 9. U-GRAPH BASED NAVIGATION 156Table 9.1: The weight distribution Ps s 53 54 prob10 10 20 20 0.009610 10 20 45 0.086410 10 45 20 0.002410 10 45 45 0.021610 25 20 20 0.014410 25 20 45 0.129610 25 45 20 0.003610 25 45 45 0.032425 10 20 20 0.022425 10 20 45 0.201625 10 45 20 0.005625 10 45 45 0.050425 25 20 20 0.033625 25 20 45 0.302425 25 45 20 0.008425 25 45 45 0.0756the highways denoted by 33 and .34 if they are not jammed and cost 45 if they arejammed.• ®•.•..Figure 9.2: An example U—graphIn a navigation process, the actual states of some uncertain edges may be discovered. We distinguish two ways for discovering the states of uncertain edges. One wayis to “buy” the information. The other way is to observe.A question related to information purchasing is: for which edges is informationavailable? and at what price, in what situation? The answer to this question dependson the concrete problem setting. Another related question is: if a piece of informationCHAPTER 9. U-GRAPH BASED NAVIGATION 157Table 9.2: The weight distribution F’ after observing s1 = 102 83 84 prob10 20 20 0.032010 20 45 0.288010 45 20 0.008010 45 45 0.072025 20 20 0.048025 20 45 0.432025 45 20 0.012025 45 45 0.1080on the actual state of an uncertain edge is available at some price in some particular situations, should an agent buy this information at the price? This problem isaddressed in Section 9.4.3.If the state of an uncertain edge is observable, the agent can discover the actualstate of the uncertain edge at no cost. A related question is which uncertain edgesare observable in what situations? It is plausible to assume that an agent can observethe state of an uncertain edge when it is at a vertex of the uncertain edge1. Of course,an uncertain edge may also be observable in some other situations. For example, whenan agent is at a location with high altitude, the uncertain edges in the nearby areamay be observable as well. However, this depends on the concrete problem settingand we treat this case as a special kind of information purchase.Whenever an agent obtains some information I regarding the actual state ofsome uncertain edges, a new U—graph is needed to represent the updated knowledgeabout the environment. The difference between the new U—graph and the old onelies only in their joint probability distributions. Thus, in order to derive the newU—graph, we only need to compute a posterior probability distribution F’ = F{.I}‘For the case of directed U—graphs, it is reasonable to assume that an uncertain edge is observablein the situation when the agent is at the tail of the arc (at the entry of a one way road). This isalso the case for network routing where it is assumed that a computer node knows the buffer stateof the outgoing channel.CHAPTER 9. U-GRAPH BASED NAVIGATION 158from the joint distribution and the new information I. If the joint distribution isrepresented as a Bayesian net [69j, then the computation can be carried out by anywell—established Bayesian net evaluation algorithm. The updating would be trivial ifthe weight variables are mutually independent.For example, suppose that we discovered that the actual weight of uncertain arc51 in the previous example is 10. Let I denote this evidence. Then, we need tocompute a posterior probability F’ = P{.I} such that F’{e} = F{eI} for anyevent e. The joint distribution F’ is given in Table 9.2.In Definition 9.1, we make an explicit distinction between edges and uncertainedges. One may immediately argue that this distinction is not necessary becausean edge is a special uncertain edge. This argument is valid. Theoretically, it is notnecessary to make such a distinction. We do this for two reasons. First, the number ofuncertain edges turns out to be a crucial complexity parameter — the time complexityof some of the algorithms to be developed is exponential in the number. Second, bymaking this distinction, we encourage the user to use ordinary edges where they aresufficient, in order to obtain better computational performance.In Definition 9.1, we make the following assumption.Assumption 1: The weight distribution of any uncertain edge in a U—graph isdiscrete.While this assumption facilitates our formulation of U—graph based navigation, itdoes not limit much the expressive power of U—graphs, since any continuous distribution can, at least in theory, be approximated by a discrete distribution.CHAPTER 9. U-GRAPH BASED NAVIGATION 1599.3 Representing Uncertain Environments in U—graphsThe usefulness of U—graphs can be demonstrated from two aspects. First, as a naturalextension of distance—graphs, U—graphs inherit the expressive power of distance—graphs, in addition to the capability of expressing a kind of uncertainty. Thus, U—graphs can be useful in situations where distance—graphs can be used.Second, we can think of various practical situations where distance—graphs areinsufficient and U—graphs can be useful. We illustrate this below by considering someexamples in high—level navigation.The essential knowledge that a path planner needs to have about an environmentfor high—level path planning is the environment’s topological structure, which canbe represented by a set of “interesting places”, and the connectivity relation amongthese places. The places can be abstracted as vertices, and the connectivity relationabstracted by edges in a U—graph. If it is uncertain how much it costs an agentto traverse the route between two places, this uncertainty can be represented by anuncertain edge. Here are some examples.Example 1: Fords.Suppose an agent is at place A in an environment as shown in Fig. 9.3-(a) andwants to reach place F. In this environment there are two fords (BD and CE) alonga river. Suppose that the agent is not sure whether the fords are traversable or not,though it knows the joint probability for the traversability state of the two fords.Suppose further that the agent knows the distances between the place pairs (thenumbers in the figure). Based on all this information, we can construct a U—graphas shown in Fig. 9.3-(b) to model the environment, where the directed graph is aCHAPTER 9. U-GRAPH BASED NAVIGATION 160Bayesian net representing the joint distribution of the weights of the uncertain edges.sis2Figure 9.3: Modeling an uncertain environment by a U—graphExample 2: Traffic Jams.Consider the situation shown in Fig. 9.4. There may be a heavy traffic jam, or alight traffic jam along segment AB of a highway; and the probabilities for the eventsof heavy traffic jam, light traffic jam and no traffic jam along the segment areP2 and P3 (P1 + P2 + p3 = 1), respectively. Let c1, c2 and c3 be the costs that anagent needs to spend to go through the segment in light of the events heavy trafficjam, light traffic jam and no traffic jam, respectively. This segment can be modeledby an uncertain edge s with weight distribution that takes value c1, c2 or c3 withprobabilities P1, P2 and p3 respectively.-Figure 9.4: A segment of road that may have traffic jams(a)(b)Example 3: Rooms connected by doorsCHAPTER 9. U-GRAPH BASED NAVIGATION 161In a typical indoor environment, doors are a source of uncertainty since they maybe locked from time to time. This kind of environment can be modeled by U—graphsas well. As an example, a simple map of the LCI lab area in the old building ofthe computer science department of UBC is shown in Fig. 9.5. Suppose this map isconstructed for a mobile robot in the lab whose responsibility is to bring coffee fromthe coffee machine in Room 300. For each given task, the robot must go to the coffeemachine from the lab and then come back with coffee. To accomplish such a mission,the robot needs to plan a return route to Room 300. Since the doors shown in themap are not always open and the robot is unable to go through a locked door, therobot should take into account the uncertainty arising from these doors.For high-level path planning, the map can be represented by the U—graph shownin Fig. 9.6, where each uncertain edge models a door in the map. The frame of eachuncertain edge consists of two values: the small one representing the effort needed togo through the corresponding door when it is open, and the larger one representingthat a locked door is not traversable.Example 4: Choke Regions.Linden and Glicksman [49] partition an environment into three kinds of regions:passable regions, impassable regions, and choke regions. A region is passable if theagent concerned can travel through it, and is otherwise impassable. A choke region isa relatively narrow traversable region between impassable regions, where some kind ofblockage (hard or soft) that cannot easily be detected remotely may happen. Roadsthrough dense forests and passes through mountains are examples of choke regions.Clearly, if the probabilities of choke regions being traversable can be estimated in someway, the uncertainty induced by choke regions can be readily modeled by uncertainedges.CHAPTER 9. U-GRAPH BASED NAVIGATION 162Figure 9.5: A simple map of the LCI lab areas9 slOs5s4coffeeFigure 9.6: A U—graph representing the LCI lab areaCHAPTER 9. U-GRAPH BASED NAVIGATION 163An example environment is shown in Fig. 9.7, where the dark areas are impassable regions and those narrow areas between the dark areas are choke regions. Thetopological structure of the environment can be depicted by a U—graph like the oneembedded in the figure.Figure 9.7: A map of an environment with choke regions9.4 U—graph Based NavigationWhen we refer to a navigation task, we assume that we are given a U—graph representation of an environment. We do not consider the problem of how to obtainCHAPTER 9. U-GRAPH BASED NAVIGATION 164a U—graph representation. There is a large body of research on map acquisition byexploring environments (e.g., [14, 22, 41, 57]).9.4.1 Distance—graph based navigation vs. U—graph basednavigationThere are several differences between distance—graph based navigation and U—graphbased navigation. First, the plan structures are different. Since it is assumed thatno contingent event occurs during distance—graph based navigation, the navigationplan generated for a distance—graph based navigation task is a simple path in thedistance—graph between the start vertex and the goal vertex. On the other hand, forU—graph based navigation, since it is impossible to predict the actual weight of anuncertain edge in advance, some contingent events may occur (e.g., a plan specifiestraversal an uncertain edge only if its weight is iess than some constant). A plan for aU—graph based navigation task specifies what to do in some or all of the contingenciesencountered during the course of a traversal.Second, the interactions between the executives and the environments are different. For distance—graph based navigation, the executive need not interact with theenvironment. However, for U—graph based navigation, the executive needs to examine the status of an uncertain edge when it is at a vertex of the edge. Based on theexamination result, the environment model (i.e., the U—graph) is updated and thenext move is selected.Third, the interactions between path planners and the executives are different. Indistance—graph based navigation, the path planner computes a shortest path betweenthe start vertex and the goal vertex and passes the description of the path to theexecutive. The executive simply follows the path with no interaction with the pathCHAPTER 9. U-GRAPH BASED NAVIGATION 165planner. But in U—graph based navigation, if the executive encounters a situation forwhich no action is specified in the plan, the executive has to appeal to the path plannerfor a new plan. The practice of interleaving planning and execution is usually calledreactive planning [27]. For the extreme case where the plans generated by the plannercover all the contingencies possibly encountered during the course of navigation, noreplanning is ever needed. Schoppers called these plans Universal Plans [84]. In thisthesis, we call these plans “complete plans.” We will discuss the computational issuesof navigation both in the paradigm where a path planner is used to generate completeplans and in the paradigm where a path planner is used to perform reactive planning.9.4.2 The optimality issueAs we mentioned in the previous section, the plan generated for a given distance—graph based navigation task is a simple path between the start vertex and the goalvertex in the distance graph. Thus the optimality criterion for distance—graph basednavigation is simply to minimize the path distance. On the other hand, for U—graphbased navigation, some contingencies may arise. It is impossible to predict the exactnavigation trace for a given task. Therefore, the optimality criterion for distance—graph based navigation is not applicable to U—graph based navigation. Actually,U—graph based navigation can be thought of as a game against nature [65]. At anymoment, the concerned agent faces some uncertainty about the actual weights ofthose uncertain edges, and needs to decide where to go next. Thus the actual trace(and the actual cost) of a navigation cannot be determined based purely on the plan,but is also dependent on the actual state of the environment.An important question about U—graph based navigation is what optimality criteriato use. In the literature, three criteria [4] have been used for this kind of problem:CHAPTER 9. U-GRAPH BASED NAVIGATION 166minimizing the competitive ratio [94], minimizing the worst—case cost and minimizingthe expected cost.In this thesis, we use the criterion of minimizing the expected cost for U—graphbased navigation. However, the formalization we will develop is general enough forother optimality criteria as well. We will discuss this in Section 10.4.In order to illustrate the optimality criterion of minimizing the expected cost, letus consider a navigation task in an environment represented by the U—graph shown inFig. 9.8-(a) where the weight distribution of the uncertain edge takes values d4 andM with probabilities p and (1 — p) respectively, where M is a very large number.Intuitively, this uncertain edge denotes a segment of route that may be traversablewith probability p. Suppose that an agent is at vertex A and is asked to go tovertex B.There are two plans for the task2. The first one is to go to B through edge AB.Another one is as follows: go to C first; if the weight of CD is d4, go to D then toB; otherwise go back to A, then go to B through edge AB. The expected cost for thefirst plan is d1, the weight of edge AB. The expected cost for the second plan is givenby the following formula:d2+p*(d4d3)+(1—p)*(d2+di).The agent can choose between the two options based on the expected costs.9.4.3 The possibility of information purchaseIn the previous example, we assumed that the status of an uncertain edge can bedetermined when an agent reached either vertex of the uncertain edge. In many2Actually, there are many “silly” plans such as repeatedly going back and forth between node Aand C for an arbitrary number of times and then to node B. Here, we are not interested in thesesilly plans and just ignore them.CHAPTER 9. U-GRAPH BASED NAVIGATION(a)(b)Figure 9.8: A simple path planning case with uncertainty167d2 d3pd4/M10030200.610CHAPTER 9. U-GRAPH BASED NAVIGATION 168situations, the status of an uncertain edge can be determined remotely, possibly atsome cost. For example, there may be a telephone service that can tell an agentwhether or not there is a traffic jam along some road segment.If we assume that an agent can “buy” information on the actual status of one ormore uncertain edges, a natural problem arises: “at what price should an agent buya particular piece of information?” In order to illustrate this problem, consider againthe navigation situation shown in Fig. 9.8-(a). Suppose that the actual status of theuncertain edge is available to the agent at cost c. Then, what is the range of cost csuch that the agent would be willing to buy the information?As we know, if the agent does not buy information about the status of the edge,the minimal expected cost is:min{di,d2+p*(d4+d3)+(1 —p)*(d2+d1)}.On the other hand, if the agent buys the information at cost e, the expected cost is:c+p* (d2 + d4-j- d3) + (1 —p) *d1.Thus, the agent should buy the information if and only if:c+p*(d43)+(1—p)*di<min{ i,d((1_p)*(d}Orc < rnin{p(di— d2 —d4 —d3),2(1 —p)d2}.For the situation shown in Fig. 9.8-b, the agent should buy the information if andonly if c < 24.CHAPTER 9. U-GRAPH BASED NAVIGATION 1699.5 Related Work9.5.1 Related work in path planningA number of approaches to path planning and navigation in uncertain environmentshave been proposed and many navigation systems for mobile robots have been built(e.g., [2, 13]). However, in most of these systems, attention is focused primarilyOil the problem of how to make a robot capable of moving around in relatively smallenvironments. Some notable exceptions are Kuipers’s TOUR model [41], Kuipers andLevitt’s COGNITIVE MAP [42] for the representation of spatial knowledge of largescale environments, and Levitt and Lawton’s qualitative approach [48] to navigationin large scale environments.The emphasis of Kuipers’s TOUR model is on its power to express and assimilateknowledge gradually acquired over time. The emphasis of Levitt and Lawton’s workon qualitative navigation is on techniques for an agent to locate itself relative to thegoal position and various remote landmarks. The emphasis of our work on U—graphbased navigation is, however, on the precise formulation of optimal navigation inenvironments with uncertainty, as well as algorithmic solutions to optimal navigation.Dean et al [15] describes a navigation system that makes use of utility theoryin navigation planning. They emphasize the coordination of task—achieving activitiesand map—building activities so as to efficiently accomplish a group of navigation tasksin an uncertain environment. In contrast, our planner is concerned primarily withfinding optimal plans for accomplishing given U—graph based navigation tasks.Linden and Glicksman’s work on contingency planning [49] is also concerned withcomputing optimal plans for navigation in large and uncertain environments. Theyuse uncertain grids to represent uncertain environments. A cell of an uncertain gridCHAPTER 9. U-GRAPH BASED NAVIGATION 170has value 0 or 1 or is a binary random variable. A cell with value 1 means the pointis free; a cell with value 0 means the point is occupied. A cell with a binary randomvariable means that it is uncertain whether the point is free or not, and the actualstate of the point is determined by the random variable. A grid is partitioned intothree regions: passable regions, impassable regions and choke regions. It is assumedthat all the cells in a passable region have value 1, all the cells in an impassableregion have value 0, and the values of all the cells in a choke region are determinedby random variables. Linden and Glicksman propose a path planning algorithm forthis model. The algorithm is an extension of A*.Mobasseri has studied the problems of path planning in uncertain environmentsfrom a decision theoretic point of view [59]. He also uses uncertain grids to representuncertain environments, and formalizes the problem in a dynamic programming style.9.5.2 Canadian Traveler ProblemsPapadimitriou and Yannakakis [66] first named the problem of traveling with uncertainty the Canadian Traveler Problem (CTP). In their formulation, a traveler is givenan unreliable graph (map); some of the edges of the graph may disappear at certaintimes. They also assume that the traveler cannot know whether an edge is actuallythere unless he/she reaches an adjacent vertex of the edge, and that the status of anedge will not change after being revealed. The problem is to devise a travel plan thatresults in optimal travel (according to some predefined measure) from one vertex toanother.Papadimitriou and Yannakakis define their optimality criteria in terms of competitive ratios. Competitive ratios are used in the literature to measure the qualityof on—line algorithms [51, 94]. The competitive ratio of an on—line algorithm withCHAPTER 9. U-GRAPH BASED NAVIGATION 171respect to a problem can be informally defined as the ratio of the performance of theon—line algorithm to the best performance that may be achieved for any problem instance consistent with the given problem and with the information discovered duringthe operation [4]. Papadimitriou and Yannakakis show that devising an on—line travelplan with a bounded competitive ratio is PSPACE-complete. They also show thatthe problem of devising a plan that minimizes the expected ratio, provided that eachedge in a graph has associated with it a given probability of being present, remainshard (#P—hard and solvable in polynomial space).Bar—Noy and Schieber have studied several interesting variations of the CTP [4].One variation is the k—Canadian Traveler Problem, which is a CTP with k as theupper bound on the number of blocked roads (disappeared edges). They give arecursive algorithm to compute a travel plan that guarantees the shortest worst—casetravel time. The complexity of the algorithm is polynomial for any given constant k.They also prove that the problem is PSPACE-complete when k is non—constant.Another variation Bar—Noy and Schieber have studied is the Stochastic RecoverableCTF. In this problem, it is assumed that blocked roads can be reopened in certaintime. They present a polynomial algorithm for devising a travel plan that minimizesexpected travel time, under the assumption that for any edge e in a given graph, therecover time of e is less than the weight of any edge adjacent to it in the graph. Itis unclear how to relax this assumption.Polychronopoulos studies in his Ph.D. thesis [71] a problem called Stochastic Shortest Distance Problem with Recourse (SSDPR). There are two common aspects between his study on the SSDPR and our study on U—graph based navigation problem.First, the problem statement, is very similar. In a SSDPR, an agent is asked to reacha goal vertex from a start vertex in an uncertain graph. It is assumed that the actualCHAPTER 9. U-GRAPH BASED NAVIGATION 172weight of an uncertain edge is determined only when an adjacent vertex is visited forthe first time. The problem is to find a navigation plan minimizing expected cost.Second, Polychronopoulos also considers off—line and on—line navigation paradigms,and develops algorithms for both paradigms.There are four major differences between Polychronopoulos’ study and ours.First, Polychronopoulos considers two versions of the problem, one with and another without an independence assumption on the weight distributions of uncertainedges. In the version without the independence assumption, an uncertain graph isrepresented by a graph along with the set of possible realizations of the edges in thegraph. In contrast, our formulation treats both cases uniformly. In both cases, a U—graph is represented by a graph and a joint probability distribution over the uncertainedge set. The joint probability distribution can be given in various forms such as aBayesian net. Within our framework, the case with the independence assumption isnaturally a special case of the one without the independence assumption.Second, our algorithms are different. His off—line planning algorithm is a dynamicprogramming version while ours are search oriented. In the development of our algorithms, we emphasize the possibility of pruning the non—optimal part of the searchspace both at problem formulation stage and at plan computing stage. For on—lineplanning, our algorithm is more sophisticated than his. In his on—line planning algorithm, the weight of a substituting edge is simply the mean value of the weightdistribution of the corresponding uncertain edge. Thus, only the information localto an uncertain edge is taken into account in computing the substituting weight forthe uncertain edge. However, in our on—line algorithm, in addition to the local information, the goal position and the global structure of the U—graph of the givennavigation task are also taken into account in computing the substituting weights forCHAPTER 9. U-GRAPH BASED NAVIGATION 173uncertain edges. Initial simulation results show that our on—line algorithm results inbetter navigation quality.Third, Polychronopoulos discusses navigation problems with both directed andundirected graphs, while our attention is focused on problems with undirected graphs,by noting that algorithms developed for undirected graphs can be adapted to directedgraphs.Fourth, Polychronopoulos presents some theoretical results about the complexityof the problem. In particular, he shows that SSDPR is #—P hard and is solvable inP-SPACE.9.5.3 Related work in AT and decision theoryIn some sense, our study on U—graph based navigation can be viewed as an applicationof decision theory to planning. In the literature, much work has been reported onapplications of decisioll theory to AT problems. A common characteristic of theseapplications is that a given problem is viewed as a decision making problem andone is interested in a “good” or optimal solution, instead of an arbitrary solution tothe problem. These applications include general planning [9, 16, 29, 46], diagnosticreasoning [72, 73], reactive systems [24, 28] and domain specific problem solvingsuch as the monkey and bananas problem [23], the blocks world [39] and navigationproblems [15, 59, 60]. Our work on U—graph based navigation belongs to the lastgroup, i.e., applying decision theory to the domaill of autonomous navigation.Chapter 10A Formalization of U—GraphBased NavigationIn this chapter, we present a decision theoretic formalization for U—graph based navigation. With this formalization, a U—graph based navigation problem is representedby a decision graph whose optimal solutions represent the optimal plans. This formalization reduces the problem of computing optimal plans for navigation tasks toa decision graph search problem, for which we have developed various algorithms.Moreover, the formalization is general enough to deal with various variations of U—graph based navigation.In the previous chapter, we made an assumption about the weight distributionsof U—graphs. Before starting our formalization, we make two additional assumptionsfor U—graph based navigation.Assumption 2: The agent can determine the status of an uncertain edge of aU—graph when and only when it is at either vertex of the uncertain edge.This assumption rules out the possibility of “buying” information on the statusof uncertain edges. The sole motivation for making this assumption is to simplify ourpresentation. Our formulation can easily be extended to handle decisions by agents174CHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 175to buy or decline information. We come to this point in Section 10.2.Assumption 3: The status of any uncertain edge does not change after beingrevealed.This assumption is crucial to the formalization and the algorithms to be developedin this and the next chapters. An immediate consequence of this assumption is thatwhen the agent determines the actual weight of an uncertain edge, the uncertain edgecan be regarded as an edge with this weight.With Assumptions 2 and 3, we can precisely specify how an agent updates aU—graph after observing the states of some uncertain edges. Let G = (V E, U, F)be a U—graph. Suppose the agent has just arrived at vertex v E V. Let U(v) ={ uj, ..., u} be the set of the uncertain edges adjacent to v. Then, according toAssumption 2, the agent can observe the weights of these uncertain edges and computea new U—graph based on the observation. Suppose the observation is that the actualweights of uncertain edges u1,..., u are w1, ..., wk respectively. Let a = (wi, ..., wk).We use U(v) = a to denote the observation. The new U—graph G(a) = (17, E, U, F’)is the same as G except the joint distribution is the posterior distribution F’ obtainedfrom F based on evidence U(v) = a. More precisely, F’ is a distribution over Usuch that F’{X} = F{XU(v) = a} for any event X about the uncertain edges inU.Let= ]JueU(v) R For each element a , G(a) is one of the possible newU—graphs that the agent may have after the observation. The probability that G(a)is the actual one is given by the marginal probabilityF{U(v) = a} = F{u1 = w1,..., Uj = wk}.CHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 17610.1 PreliminariesLet G = (V, E, U, P) be a U—graph. An uncertain edge u E U is deterministic ifthere exists a value w e Q such that P{u = w} = 1. An uncertain edge is randomif it is not deterministic.Let Ud = {u E Uju is deterministic} and UT = {u E Uu is random}. U° andUT partition U.For each v E V, let UT(v) = {u E UTu is adjacent to v}. UT(v) is the set ofrandom edges adjacent to vertex v. Let vT = {v E VUT(v) ç} and Vd = V—VT.V” is the set of vertices to which at least one random edge is adjacent and Vd is theset of vertices to which no random (uncertain) edge is adjacent.If the agent comes to a vertex v in VT, it can discover the actual weights of therandom edges in UT(v). Thus, the vertices in VT are called information collectingvertices.For each u U, we say w is a possible realization of u if w E Let u° =min{w E f2P{u = w} > O} and u° = max{w e P{u = w} > O}. We call u°the optimistic realization of u and u’ the pessimistic realization of u. Notice thatu is deterministic if &‘ = u°.An induced graph of G is a variation of G in which each uncertain edge has a possible realization as its weight. Note that an induced graph is a distance graph. Eachelement a e u determines an induced graph, denoted as G(a), of G. The probability that G(a) is the actual realization of G is given by the marginal probabilityF{U = a}.The pessimistic induced graph G of G is the induced graph of G in which theweight of each uncertain edge is equal to the corresponding pessimistic realization.CHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 177The optimistic induced graph G° of C is the induced graph of G in which the weightof each uncertain edge is equal to the corresponding optimistic realization.We use configurations to represent the situations that an agent may encounterduring navigation. A configuration is a quadruple (G,v,v,c), where G is a U—graph representing the current knowledge that an agent has about the environment;v and v9 are two vertices of the U—graph representing the current position and thegoal position, respectively; c is a parameter representing the cost that the agent hasincurred so far when it reaches this configuration. For a given navigation task of goingfrom v3 to Vg in G, the configuration (G, v, v,, 0) is called the initial configuration.A terminal is a configuration (C, V, ‘vs, c) such that the shortest distance from v tov, in G is equal to that in G°. Obviously, (C, Vg, v9, c) is a terminal.These concepts about induced graphs can be generalized to configurations. Wesimply refer to the pessimistic (or optimistic) induced graph of the U—graph of aconfiguration as the pessimistic (or optimistic) induced graph of the configuration.A configuration C = (G, V, V,, c) is called an uncontrolled configuration if v isan information collecting vertex (i.e., v VT). A configuration C = (C, v, v9, c) isa controlled configuration if v is not an information collecting vertex (i.e., v V’).When an agent is in a non—terminal controlled configuration C = (C, v, v9, c)with k (k > 0) neighbours v1,..., vk, the agent has the freedom to move to any oneof these neighbours. Should the agent decide to move to a particular neighbour, theagent incurs cost equal to the weight of the edge between the current vertex and theneighbour’. Such a move leads to a new configuration. In order to model such moves,we define k controlled transitions, each corresponding to a move to one of the k‘In case there are multiple edges between the two vertices, the cost is equal to the minimum ofthe weights. In the discussion to follow, we assume that there is at most one edge connecting twovertices, without loss of generality.CHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 178neighbours. We use traris(C, t) to denote the configuration to which the transitionI can lead from configuration C. Formally, we have:trans((G,vc,vg,c),t) = (G,v’,vg,c+c(t))where I is the transition corresponding to neighbour v’, and c(t) is the cost oftransition I, equal to the weight of edge (vs, v’). Since the agent chooses whichtransition will happen, these transitions are called controlled transitions.When in a non—terminal uncontrolled configuration C = (G, v, Vg, c) with Ur(V) ={ u, ..., ujk} (k > 0), the agent can observe the states of the random edges in Ur(V)and update the U—graph based on the observation. For each a E fZUr(vc), G(a) isa possible new graph and (G(a), v, Vg, c) is a possible next configuration. Depending on the actual state of the random edges in Ur(vc), the current configuration willchange to one of the possible next configurations. Such configuration changes incur nocost. To model such configuration changes, we define a set of uncontrolled transitionsfor configuration C, each corresponding to a configuration change from C to one ofits possible next configurations. We refer to these transitions as uncontrolled transitions because which transition will actually happen is not controlled by the agent,but determined by the probability distribution over the possible next configurationsof the uncontrolled configuration. The probability distribution is the same as that ofthe new U—graphs. More specifically, the probability that C’ = (G(a), v, Vg, c) is theactual next configuration is P{Ur(V) = a}. This probability is associated with thetransition from C to C’.With these definitions, we can think of U—graph based navigation as a sequence ofdecision making and configuration transitions. When an agent is in a non—terminalcontrolled configuration, it can select a controlled transition and then reach a newconfiguration. The new configuration can be either a terminal, an uncontrolled con-CHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 179figuration or a controlled configuration. In the case that the new configuration is acontrolled configuration, the agent can repeat this transition selection. If we takea longer perspective on the transition selection, we observe that, from a controlledconfiguration, the agent will eventually either come back to the same configuration orreach an uncontrolled configuration or reach a terminal by taking a sequence of controlled transitions. The first possibility corresponds to a case where an agent takes acircular trip without discovering any new information. Obviously, any navigation planresulting in this kind of behavior must not be optimal. Thus, we want to exclude suchcircular behaviors. To do so, we use composite transitions to model “non—circular”controlled transition sequences and ignore the “circular” transition sequences.A sequence of controlled transitions is called a composite transition if it leadsto the destination v9 or to an uncontrolled configuration. Formally, a sequence ofcontrolled transitions T = t, ...,t with k > 1, is a composite transition fromcontrolled configuration C = (G, v, rig, c) to configuration C’ = (G, v’, v,, c’) if C’ =trans(...trans(C, t1),..., tj) and if v’ = Vg or C’ is an uncontrolled configuration.We use ctrans(C, T) to denote the configuration resulting from taking the compositetransition T in C. For the above case, we have ctrans(C, T) = C’. The cost for acomposite transition T, denoted by c(T), is defined as the sum of the costs of theconstituent transitions of T. Intuitively, starting from any controlled configuration,the next intermediate vertex the agent will reach is either the goal vertex or aninformation collecting vertex. Note that, for a given controlled configuration C anda configuration C’, there may exist zero, one or many composite transitions fromC to C’. A composite transition from C to C’ is optimal if it has the least cost.A configuration C’ is called a composite successor of a controlled configuration Cif there is a composite transition from C to C’. From now on, whenever we referCHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 180to a successor of a controlled configuration we mean a composite successor of theconfiguration. Similarly, whenever we refer to the transition set associated with acontrolled configuration, we mean the set of optimal composite transitions of theconfiguration.The composite successor set of a controlled configuration C = G, v, Vg, c) can becomputed as follows. Let V, = V U {ug}. V is called the set of candidate verticesfor the next move. If the shortest path from v to a vertex v E V in the pessimisticinduced graph G’ contains no vertex in V, other than v, then v is a vertex theagent can reach next by a (composite) transition. Vertex v is called a next vertexof C. Formally, let V’ denote the set of all next vertices of C and let d3(G, v, v)denote the shortest distance from v, to v in G’. The successor set of C is givenby:{(G,v,vg,c+ds(G’,vc,v))v V’}.The cost of the composite transition from C to (G, v, v2,c+d3(G°, v, v)) isd3(G, v, v).This successor set can be computed by Dijkstra’s shortest distance algorithm [21].Configuration C’ is reachable from configuration C if C’ is a successor of Cor if there exists a configuration C” such that C” is a successor of C and C’ isreachable from C”. The configurations reachable from the initial configuration alongwith the successor relationship among them form a directed graph. Such a graph iscalled the representing graph, and can be specified as follows.• The initial configuration is in the representing graph... If a non—terminal controlled configuration C is in the graph, so are all of its(composite) successors, which form its children in the graph. The arc from C toeach child configuration C’ corresponds to the (optimal composite) transitionCHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 181from C to C’ and is labeled with the cost of the transition.• If a non—terminal uncontrolled configuration C is in the graph, so are all ofits successors, which are the children of C. The arc from C to each childconfiguration C’ corresponds to the uncontrolled transition from C to C’ andis labeled with the probability of C’.We call such a graph the repre.enting graph of the navigation because it representsall the possible courses of the navigation.Lemma 10.1 The representing graph of a U—graph based navigation task is acyclic.The longest path in the representing graph is bounded by 2k+2 where k = min{rn, n},m is the number of uncertain edges and n is the number of vertices in the U—graph.Proof. The representing graph is acyclic due to the following four facts: (a) achild of a controlled configuration is either a terminal or an uncontrolled configuration,a child of an uncontrolled configuration is either a terminal or a controlled configuration; (b) a terminal has no child; (c) the number of random edges in a controlledconfiguration is equal to the number of random edges in any of its successors; (d) thenumber of random edges in an uncontrolled configuration is strictly greater than thatin any of its successors. Because of facts (a) and (b), controlled configurations anduncontrolled configurations must be interleaved along any path C1, ..., C of lengthi > 2. Without loss of generality, let us assume that i is even and C2,C4, ..., C areall uncontrolled configurations and C1,C3, ..., C_1 are all controlled configurations.Let u(C) denote the number of random edges in configuration C. According to facts(c) and (d), we have u(C) > u(Ci) and u(C) > u(G+2), for j = 1,...,i—2.Suppose the lemma is false, then there must be a cycle in the graph containing atCHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 182least one uncontrolled configuration C. Thus, we can find a path C1, ..., C suchthat G1 = C2 = C, and ji + 2 J2 < i. Therefore, we have u(C) = u(C1),u(C) = u(C2), and u(C1) > u(C2), a contradiction.From the above analysis, we can conclude that the longest path in the representing graph is bounded by 2m + 2. Furthermore, note that any two uncontrolledconfigurations along a path must have different current vertices, thus there are atmost n uncontrolled configurations along any path. Therefore, the longest path inthe representing graph is also bounded by 2n + 2. CBy the above lemma, we know that the representing graph of a navigation is aDAG with the initial configuration as the root of the graph.As an example, let us consider the U—graph shown in Fig. 10.1 in which CD isthe only uncertain edge. The weight of CD is d4 with probability p and M withprobability (l—p) where M is a very large number. Suppose an agent is asked to reachvertex B from vertex A in the U—graph. Note that if d1 d2+d34,then the initialconfiguration is a terminal. Suppose d1 > d + d3 + d4. The representing graph ofthis task is shown in Fig. 10.2, where solid boxes represent controlled configurations,dotted circles represent terminals, solid circles represent uncontrolled configurations,arcs represent transitions, the labels for controlled transitions are their costs and thelabels for uncontrolled transitions represent their probabilities. In the figure, node 1is the initial configuration, node 2 is the configuration when the agent reaches vertexB through edge AB, node 3 is the (uncontrolled) configuration when the agent reachesvertex C through edge AC.The representing graph of a navigation task represents the possible traces of thetask. In a representing graph, a controlled configuration represents a situation wherethe agent can choose the next configuration, while an uncontrolled configuration rep-CHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 183Figure 10.1: A simple U-graphdl2i-pFigure 10.2: The representing graph of a navigation taskCHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 184resents a situation where “nature” chooses the next configuration during a navigationprocess. Thus, a representing graph can be regarded as a decision graph with the controlled configurations as choice nodes and the uncontrolled configurations as chancenodes. With this analogy, a solution graph of a representing graph can be viewed asa representation of a navigation plan for the corresponding navigation task. Furthermore, the plan is complete in the sense that it covers all the possible configurationsan agent may encounter if the agent follows the plan. The evaluation function of thedecision graph will be defined in an appropriate way to reflect our optimality criterionof minimizing the expected cost. Thus, the path planning problem for U—graph basednavigation is, for any given navigation task, to compute an optimal solution graph ofthe representing graph of the task.10.2 Modeling Information PurchaseWhen information on the status of some random edges is available to the agent atsome cost, the agent must decide in each controlled configuration whether or not tobuy the information. This option can be modeled by a special (controlled) transition.Taking the transition will incur a cost equal to the price of the information and resultsin an uncontrolled configuration whose children can be determined according to thepossible states of the uncertain edges. To illustrate this, consider again the task ofgoing from vertex A to vertex B in the U—graph shown in Fig. 10.1. Suppose theagent can determine the state of uncertain edge CD at cost c. The representinggraph of this task is shown Fig. 10.3, where the initial configuration has one morechild (node 6 in the figure) than the one in Fig. 10.2. The new child results from thespecial transition modeling the option of information purchase. It has two children,one corresponding to the case where CD has weight d4 and the other correspondingCHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 185to the case where CD has weight MFigure 10.3: The representing graph of a navigation task with the information purchase option10.3 The Expected Costs of U—graph Based Navigation PlansIn order to define the expected costs of navigation plans, we need to define a costfunction on the solution graphs. The cost function should, for each solution graph,return the expected cost an agent incurs if the agent follows the plan correspondingto the solution graph. Let X denote a cost function with two parameters: a solutiongraph and a configuration in the solution graph. X can be recursively defined asfollows.Id3(G,v,v9) if C is a terminal,X(sg, C) = X(sg, C’) + c(C, C’) if C is a controlled configuration,1. X(sg, C) * q(C) if C is an uncontrolled configurationwhere G is the pessimistic induced graph of the configuration C, d3(G’, v, vg)denotes the shortest distance from the current vertex v to the goal vertex Vg in G’,q(C) denotes the probability of the successor C3 of the uncontrolled configurationC, and C’ denotes the successor configuration of C in sg. Intuitively, X(sg, C) isCHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 186the expected cost an agent incurs if the agent follows the plan corresponding to thesolution graph starting in configuration C. The above definition can be understoodas a formal expression of the following intuition, If C is a terminal, the agent cancomplete the task by following the shortest path from the current vertex to the goalvertex in the pessimistic induced graph; if C is a controlled configuration, X(sg, C)is equal to the sum of the cost of the transition from C to C’ prescribed by the planfor configuration C and the expected cost the agent incurs starting in C’; if C is anuncontrolled configuration, X(sg, C) is the average of the costs that the agent incursstarting in C’s successor configurations.The representing graph shown in Fig. 10.2 has two solution graphs as shown inFig. 10.4, corresponding to two possible plans for the corresponding navigation task.Plan (a): go to B through edge AB.Plan (b): go to C first, if CD’s weight is d4, go to B via D, otherwise, go back toA then to B through edge AB.ri(a) (b)Figure 10.4: Two solution graphs of a navigation taskThe expected costs for Plans (a) and (b) are: d1 and2-I-p(d4+d3)+(1—p)(d +d1), respectively.CHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 187In order to express the optimality criterion, we can define an evaluation functionX’ for representing graphs by extending the definition of function X. The definitionof function X’ on a representing graph rg is as follows.I d3(G’, v, v) if C is a terminal,X’(rg, C) = miri{X’( g, C) + c(C, C3)} if C is a controlled configuration,( j X’(rg, C) * q(C) if C is an uncontrolled configuration.Therefore, the problem of computing an optimal plan for a navigation task is reducedto computing an optimal solution graph of the representing graph of the task with X’as the evaluation function. This problem can be solved by applying the algorithmspresented in Chapter 3. We discuss this issue in the next chapter.10.4 Other VariationsSo far we have presented a formalization for U—graph based navigation with respect tothe optimality criterion of minimizing the expected cost. Our formalization is generalenough to deal with other optimality criteria and/or other variations of the problem.We explore this aspect in this section.10.4.1 Minimizing the competitive ratioPapadimitriou and Yannakakis [66] define optimality criteria for the Canadian Traveler Problem (CTP) in terms of competitive ratios. The competitive ratio of a planwith respect to a problem instance is informally defined as the ratio of the cost ofthe plan to the minimum cost that may be achieved for the problem instance. Thecompetitive ratio of a plan is the maximum of the ratios of the plan with respect to allpossible problem instances. Intuitively, a plan with competitive ratio r guaranteesthat for any problem instance, the cost of the plan is bounded from above by r timesthe minimal cost for the same problem instance.CHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 188We define a function r to precisely express the competitive ratio of a plan.c(C)+ds(G°,vc,vg). c t 1ds(G0,us,vg) i is a erminar(sg,C) = max{r(sg,C)q(C) > O} if C is an uncontrolled configuration,r(sg, C) if C is a controlled configurationwhere d8(G°, v, v) denotes the shortest distance from v, to Vg in G°, the optimisticinduced graph of configuration C. Note that in the second clause of the abovedefinition, the maximization is over oniy those successors with non—zero probabilities.To express the optimality criterion of minimizing the competitive ratio, we extend thedefinition of r into an evaluation function r’ for the representing graphs as follows.D(C)+ds(G°,vc,v)•f c 1d(G°,u8,ug) i is a erminar’(rg, C) = maxj{r’(rg, C)q(C) > O} if C is an uncontrolled configuration,mirlj{r’(rg, C3)} if C is a controlled configuration.Note that the representing graph with this evaluation function is a minimax graph.An optimal solution graph of a representing graph with r’ as its evaluation functionrepresents an optimal plan minimizing the competitive ratio.10.4.2 Minimizing the expected competitive ratioSimilarly, we define a function re to express the expected competitive ratio of a plan.The function is defined as follows.c(C)+ds(G°,vc,vg) f C 1ds(G0,vs,vg) i is a erminare(.sg, C) r(sg, C) * q(C) if C is an uncontrolled configuration,r(sg, C) if C is a controlled configuration.To express the optimality criterion of minimizing the expected competitive ratio, weextend the definition of re into an evaluation function r for the representing graphsas follows.c(C)+d(G° Vc V ) . .if C is a terminald3(G ,Vs,Vg)r(rg, C)= j r(rg, C) * q(Cj) if C is an uncontrolled configuration,minj{r(rg, C3)} if C is a controlled configuration.CHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 189An optimal solution graph of a representing graph with r as its evaluation functionrepresents an optimal plan minimizing the expected competitive ratio.10.4.3 Minimizing the worst case costThe quality measurement of plans in terms of worst case cost can be given as afunction c defined on solution graphs.c(C) +d8(G°, v, v9) if C is a terminal,cw(sg, C) = maxj{cw(sg, C)q(C) > O} if C is an uncontrolled configuration,L. cw(sg, C) if C is a controlled configuration.To express the optimality criterion of minimizing the worst case cost, we extend thedefinition of c into an evaluation function c for representing graphs as follows.I c(C) +d3(G°, v, v9) if C is a terminal,c(rg, C) = maxj{c(rg, C)q(C) > O} if C is an uncontrolled configuration,( min{c(rg, C) if C is a controlled configuration.Again, the representing graph with this evaluation function is a minimax graph.10.4.4 Reaching one of the goal verticesSuppose that an agent is given a U—graph G, a start vertex and a set V of goalvertices and is asked to reach any one of the goal vertices in /. The optimalitycriterion is to minimize the expected cost.This variation can be reduced to a problem with a single goal vertex. To do so,we construct a new U—graph G’ which is the same as G except that all the verticesin are “collapsed” into a single vertex Vg. The original problem is equivalent tothe problem of going to vertex Vg fl G’.10.4.5 Reaching multiple goal verticesSuppose an agent is asked to visit all of the goal vertices in 1’, (in any order),instead of just one of them. In order to deal with this variation, we need to modifyCHAPTER 10. A FORMALIZATION OF U-GRAPH BASED NAVIGATION 190our formalization.First, the definition of configurations needs to be changed as follows: a configuration is a quadruple (G, v, V, c) where G, v and c have the same meanings asbefore and V is a set of goal vertices yet to visit.Next, the definition of terminals needs to be changed as follows: a terminal is aconfiguration (G, v, {}, c).Third, the definitions of the composite successors and the composite transitionsof controlled configurations are redefined as follows: Let C = (G, v, V, c) be acontrolled configuration. Let V = V U 1’. V is called the candidate vertex setfor the next move. If the shortest path from v,, to a vertex v E V, in G containsno other vertex in V than v, then v is a vertex the agent can reach next by a(composite) transition. Vertex v is called a next vertex of C. Formally, let V’denote the set of all next vertices of C and let d8(G, v, v) denote the shortestdistance from v to v in G. The successor set of C is given by:{(G,v,V— {v},c+d3(G12, ,v))v E V’}.Finally, the function returning the expected cost of a plan is defined as follows:1 0 if C is a terminal,Y(sg, C) = j Y(sg, C) * q(C) if C is an uncontrolled configuration,1. Y(sg, C’) + c(C, C’) if C is a controlled configuration.Similarly, we can extend Y into an evaluation function Y’ for the representinggraphs.Chapter 11Computational Issues of U—graphBased NavigationThis chapter is concerned with computational issues of U—graph based navigation.We first discuss off—line and on—line paradigms, and then present algorithms for U—graph based navigation in both paradigms. Finally, we present some experimentaldata obtained for these algorithms.11.1 Planning ParadigmsA plan for a U—graph based navigation task covers some or all contingencies whichmay arise during navigation. If a plan covers all possible contingencies for a navigationtask, it is called a complete plan. The solution graphs of the representing graph of anavigation task represent complete plans.U—graph based navigation can be carried out either in an off—line paradigm or inan on—line paradigm. In the off—line paradigm, the planner computes a complete planfor each navigation task. After computing such a complete plan, the planner will notbe invoked during the navigation process.In the on—line paradigm, the planner and the executive can be considered as191CHAPTER 11. COMPUTATIONAL ISSUES 192co—routines. At any point during a navigation process, the current U—graph andthe current position, together with the goal position and the cost incurred so far,constitute a configuration. The planner acts as an “oracle,” telling the agent whereto go next. Once the agent reaches an information collecting vertex, the agent canupdate the U—graph according to the actual status of the random edges incident fromthe vertex. As the result of the updating, a new configuration is reached. Note that aconfiguration here is essentially a controlled configuration. The following procedure,Navigate, captures this kind of interaction between the executive and the planner.NavigateInput: C: a U—graph v3 and v9: a pair of vertices P: an on—line plannerOutput: the amount spent for the task1. Set vc = v3 and c = 0.2. If Vc = Vg,stop with cost c.3. Call P with arguments G, v, and Vg to obtain a plan.4. Follow the plan until either reaching vg or reaching an information collectingvertex v.5. Set v=v.6. Increase c by the amount spent on following the plan.7. Update G based on the new information on the uncertain edges adjacent to vand go to step 2.CHAPTER 11. COMPUTATIONAL ISSUES 19311.2 Computing Complete PlansIn the previous chapter, we showed that a U—graph based navigation can be represented by a decision graph. A solution graph of the decision graph is a completerepresentation of a navigation plan. Computing a complete plan for a navigationtask amounts to computing the corresponding solution graph from the decision graph.Thus, we can use the algorithms presented in Chapter 3 for off—line planning.In order to apply those algorithms, we need to define heuristic functions to estimate minimum expected costs for configurations.For a given navigation task, let rg denote the representing graph with the functionX’ defined in Section 10.3 as its min—exp evaluation function. One possible heuristicfunction, denoted f, is defined as follows:f(C) = ds(G°,Vc,Vg)where G° is the optimistic induced graph of configuration C. Intuitively, f(C) is theshortest distance between the current position and the goal position in the optimisticinduced graph of configuration C. Thus f(C) X’(rg, C) for any configuration C.Therefore, f is admissible.11.2.1 Some experimental dataIn this section, we present some experimental data obtained when applying some ofthe algorithms discussed in Chapter 3 to U—graph based navigation. Our primaryobjective for carrying out these experiments is to compare the performance of thosealgorithms.CHAPTER 11. COMPUTATIONAL ISSUES 194Problem InstancesIn our experiments, we consider two classes of U—graphs. The U—graphs in Class 1 arerandomly generated from grids (as shown in Fig.11.1) with parameters d1, d2, P1,P2 and two reference numbers r1 and r2. Here, d1 and d2 are two positive integersspecifying the numbers of rows and columns of the grids; 0 < < 1 specifies theprobability that a connection (either an ordinary edge or an uncertain edge) existsbetween any pair of neighbouring vertices on the grids; P2 specifies the probabilityof a connection, if it exists, being an uncertain edge. The reference numbers r1 > 1and r2 > 1 are used to generate uniform distributions from which the weights of aU—graph are generated. A U—graph of this kind is an abstraction of the road layoutof a city. We assume that the weight distributions of all uncertain edges are binaryand independent of one another.Given parameters d1, d2, Pi, P2, r1 and r2, a random U—graph is generated asfollows. First, d1 x d2 vertices are generated, arranged in a d1 by d2 grid. Next, withprobability Pi, a connection is generated between each pair of neighbouring vertices.Third, each connection is marked as an uncertain edge with probability P2 and asan ordinary edge with probability 1— P2 Finally, the weights for the connectionsof the graph are generated as follows. For each edge, we first randomly generate twonumbers a1 and a2; a1 is drawn from a uniform distribution between 1 and r1,and a2 from a uniform distribution between 1 and r2. Then we construct a uniformdistribution V with a lower bound a1 and an upper bound a1 * a2. The weightof the edge is a random number generated from the uniform distribution V. Foreach uncertain edge, we first generate a probability p from a uniform distributionbetween 0.01 and 0.99 and construct two uniform distributions V1 and V2 in thesame way as described above, and then generate two numbers c1 and c2 from theCHAPTER 11. COMPUTATIONAL ISSUES 195uniform distributions Vi and V2 respectively. The weight of the uncertain edge is abinary random variable having value c1 with probability p and c2 with probability1—p. For each randomly generated U—graph, we assume the navigation task is to gofrom the upper—left corner to the lower—right corner on the grid. We discard thoseU—graphs whose optimistic induced graphs are disconnected with respect to the twocorners.HQP1dO-EFigure 11.1: A U—graph in Class 1 a representation of city roadsThe U—graphs in Class 2 are randomly generated from a structure as shown inFig. 11.2, which is a model of two parallel—highway systems joined by a bipartite graph.The U—graphs are generated with four parameters: n, pi, r1 and r2. Here r1 andr2 are used in the same way as in Class 1, n is the number of parallel highways,and Pi is the probability that an ordinary edge exists between a pair of vertices, onein each half of the bipartite graph. Again, we assume the weight distributions of alluncertain edges are binary and are independent of one another.CHAPTER 11. COMPUTATIONAL ISSUES 196Figure 11.2: A U—graph in Class 2— an abstraction of a parallel highway systemGiven parameters n, p1, r1 and r2, a random U—graph is generated as follows.First, two partial graphs as shown in the boxes of Fig.11.2 are generated. Each partialgraph has 3n + 1 vertices, 2n edges and n uncertain edges. Next, with probabilityPi, an edge is generated between each pair of vertices, one on the right boundary ofthe left partial graph and one on the left boundary of the right partial graph. Finally,the weights of the ordinary edges and the weight distributions of the uncertain edgesare generated in the same way as for the U—graphs in Class 1. For each randomlygenerated U—graph, we assume that the navigation task is to go from vertex s tovertex t. Again, we discard those U—graphs whose optimistic induced graphs aredisconnected with respect to vertices .s and t.Note that in the process of generating a random U—graph, the weights are generated from uniform distributions that are also randomly constructed. We expect thatthis treatment reduces the correlation among the connection weights in a U—graph.The Lisp functions that are used in our experiments for generating random graphsare included in Appendix A.CHAPTER 11. COMPUTATIONAL ISSUES 197Experiment 1Our first experiment compares the performance of algorithms AO* and DFS. AO* is abest—first heuristic search algorithm. DFS is a depth—first heuristic search algorithm.We mentioned in Chapter 3 that the primary advantage of DFS over AO* is thatit needs only linear space (if the solution graph need not be explicitly constructed).We want to test if DFS examines fewer nodes than AO* does for randomly generated problems. Because structure sharing in a decision graph of a U—graph basednavigation task is not substantial, we did not make use of structure sharing in ourimplementation of AO*. Thus, a decision graph is essentially treated as an unfoldedtree, and we avoid high overhead for checking whether a node is already in the graph.The algorithms are implemented in Common Lisp.In our first experiment, we made two hundred successful trials. A trial consists ofgenerating a problem instance and applying DFS and AO* to the problem instance.A trial is aborted if it cannot be finished in a reasonable period of time (typically anhour on a Sparc—2 machine).Of the two hundred successful trials, one hundred are generated from Class 1. Theparameters used for generating U—graphs are set up in the following way:with probability 0.5, d1 and d2 are both set to 5; with probability 0.5,d1 is set to 4 and d2 to 6;Pi is a random number generated from the uniform distribution between0.6 and 0.9; P2 is a random number generated from the uniform distribution between 0.2 and 0.5;r1 is a random number generated from the uniform distribution between100 and 200; r2 is a random number generated from the uniform distriCHAPTER 11. COMPUTATIONAL ISSUES 198bution between 10 and 20;Another hundred problem instances are generated from Class 2. The parametersused for generating U—graphs are set up in the following way:n =5;p1 is a random number generated from the uniform distribution between0.5 and 0.8;r1 is a random number generated from the uniform distribution between100 and 200; r2 is a random number generated from the uniform distribution between 10 and 20.We measured for each (successful) trial the number of nodes examined by DFS andby AO*. For seventy—seven problems out of one hundred in Class 1, DFS examinedfewer nodes than AO* did. For fifty—two problems out of one hundred in Class 2,DFS examined fewer nodes than AO* did.Moreover, our experiments showed that, for most of the problems, DFS spent lesstime, even when it examined more nodes than AO* did. This is due to the fact thatDFS is a depth first algorithm, involving less overhead, and suggests that DFS isbetter suited for U—graph based navigation.Experiment 2If the admissibility of a heuristic function for a given problem cannot be assured,the solution computed by DFS may not be optimal. An algorithm with a particular heuristic function can be evaluated by two factors: its speed and the expectednavigation cost it induces.CHAPTER 11. COMPUTATIONAL ISSUES 199Our second experiment is for testing the effect of heuristic functions on DFS. Thisexperiment casts some light on the tradeoff between computational time and solutionquality. DFS is tested with four different heuristic functions: h0, h1, h2 and h3,defined as follows.h0(C) = f(C),h1(C) =h0(C)/(l-h2(C) =h0(C)/(1 — c)2 andh3(C) =h0(C)/(1 -where c = 0.2. Heuristic functions h, h2 and h3 are not admissible. Theorems 3.4and 3.5 give quality bounds for DFS with these heuristic functions.For this experiment, we also made two hundred successful trials using the sameapproach to trial construction as in the first experiment. For each successful trial,we measured the cost of the solution graph and the number of nodes visited by thealgorithm for each heuristic function. More specifically, for each problem instance,we measured the following data (for i = 0, 1, 2 and 3):• c,: the cost of the solution graph returned by DFS with heuristic function h.• n: the number of nodes examined by DFS with heuristic function h.From the measured data, we computed the following data:cr3 = cs/co, sr = no/ni, for j = 1,2,3Intuitively, cr means the cost ratio of the solution returned by DFS with heuristicfunction h3 to the cost of the optimal solution; sr means the speedup ratio of DFSwith heuristic function h3 to DFS with heuristic function h0. We use the cost ratiosto measure the quality of the algorithm with inadmissible heuristic functions, andCHAPTER 11. COMPUTATIONAL ISSUES 200Table 11.1: The average speedup ratios of Algorithm DFSh1 h2 h3Class 1 1.54 2.07 2.41Class 2 2.77 9.71 23.04Table 11.2: The average cost ratios of Algorithm DFS_______________h2 h3Class 1 1.011 1.031 1.041LClass 2 1.012 1.064 1.131use the speedup ratios to measure the performance of the algorithm with inadmissible heuristic functions. Table 11.1 contains the average speedup ratios. Table 11.2contains the corresponding average cost ratios.From these tables, we observe that the average cost ratios of DFS with heuristicfunctions h, 112 and 113 are all quite close to one. This implies that, even thoughthese heuristic functions are not admissible, they usually give very conservative estimates. Therefore, there is a great potential to obtain more informed heuristicfunctions.Experiment 3This experiment tests the effect of another heuristic function, h’, on DFS. Likeheuristic functions h1, h2 and 113, 11’ is not admissible. Unlike those functions, wedo not have a bound on the admissibility of heuristic function 11’.Heuristic function 11’ is defined as follows:h’(C) ds(Ga,vc,vg)where G° is an induced graph of the Ti—graph G of C in which each uncertain edgeis replaced with an edge whose weight is the mean value of the weight distribution ofCHAPTER 11. COMPUTATIONAL ISSUES 201the uncertain edge.This experiment showed that, with heuristic function h’, the average cost ratioand the average speedup ratio of DFS for the problems in Class 1 were 1.006 and39.49, respectively; the average cost ratio and the average speedup ratio of DFS forthe problems in Class 2 were 1.074 and 23.18, respectively. From these data, we madethe following observation:For the U—graphs in both classes, DFS with heuristic function h’ outperformedDFS with heuristic functions h1, h2 and h3. This was especially true for theU—graphs in Class 1. For the U—graphs in this class, the average cost ratio ofDFS with the heuristic function h’ was 1.006, less than the cost ratio of DFSwith the heuristic function h1; the average speedup ratio was almost 40, fargreater than the speedup ratios of DFS with the heuristic function h3.The good performance of DFS with the heuristic function h’ can be attributed to thefact that more domain—dependent knowledge is encoded in h’ than in h1 or h2 or113. This is another illustration of the importance of domain—dependent knowledgefor decision graph search in particular and for heuristic search in general.11.3 On—Line PlanningIn this section, we first discuss the issue of characterizing the quality of an on—lineplanner and then develop a polynomial algorithm for on—line planning. Finally, wepresent some experimental data. In our experiments, we compare the navigation quality of our algorithm with the navigation quality of another simple on—line algorithmgiven by Polychronopoulos [71]. The experimental data show that our algorithmresults in good navigation quality and is better than Polychronopoulos’s on—line alCHAPTER 11. COMPUTATIONAL ISSUES 202gorithm in terms of navigation quality.11.3.1 The optimality characterization of on—line plannersThe navigation quality of an on—line planner can be measured in terms of the expectedcost for given navigation tasks.Let J = (0, V, Vg) be the navigation task of going from v. to Vg in U—graph 0,let P be an on—line planner the agent uses for the navigation task and let Ca(P, J)denote the expected cost of J induced by planner P. Cost Ca(P, J) can be determined by simulating the navigation process under the guidance of P against all ofthe possible realizations of the U—graph G.Suppose C(J) is the minimal expected cost of J. The cost ratio of P withrespect to task J is defined as Ca(P, J)/C(J). We say that the planner P willresult in an optimal navigation for task J if the cost ratio is unity. We say theplanner is optimal if Ca(P, J)/C(J) = 1 for any navigation task J.11.3.2 A graph transformation based algorithmThe basic idea behind this algorithm is to substitute for the uncertain edges in a U—graph by edges with “appropriate weights”. In so doing, we hope that the “net effect”of the uncertain edges in a U—graph can be approximated by a set of substitutingedges. Suppose we are given the task of navigating from vertex v to vertex vgin U—graph 0. Assume that the uncertain edges of U—graph 0 can be orderedas u1,u2,..., u. Let W = (w1,w2, ..., WI) be a vector of weights. We call W asubstituting weight vector. Let G(W) denote the graph obtained by replacing theuncertain edges, u1, u2 u, in U—graph G respectively by edges e1, e2 el wheree and u have the same incident vertices and e has weight w, for 1 i 1. TheCHAPTER 11. COMPUTATIONAL ISSUES 203central problem here is to compute an “appropriate” substituting weight vector Wfor a given configuration. We first define ideal substituting weight vectors.A vector W = (wi,..., wi) is an ideal substituting weight vector for U—graph Gand the goal vertex Vg, if the following two conditions are met:1. for any vertex v in C, the shortest distance from v to vertex v9 in graphG(W) is equal to the minimal expected cost for the configuration (G, v, v9, c);2. the initial segment of the shortest path from v to Vg is consistent with theoptimal next move for the configuration (0, v, v9, C).Note that, in the above definition, the second condition is sufficient for optimal navigation. However, as we will see, the above definition facilitates a way to compute an“appropriate” substituting weight vector.It should also be noted that, if we have an algorithm to compute an ideal substituting weight vector for any configuration, we in effect have an on—line navigationalgorithm that always results in optimal navigation. However, since the problem ofoptimal navigation is #—P hard [71], it should not be surprising if the substitutingweight vectors computed by a polynomial algorithm are not ideal.Suppose we have magically obtained an ideal substituting vector W = (wi,..., w)and the corresponding substituting graph G(W); and suppose we happen to forgetthe value of w for some i, 1 i < 1. We want to make a good guess at the valueof w on the basis of the known information.Suppose u1 = (vi, v2). Let u denote the optimistic realization and let u’ denotethe pessimistic realization of ui,. For each a E , let W(a, i) = (wi, ..., w) be aweight vector such that w = a and w = w for all j with 1 j 1 and j i; letD(G, W, i, a) denote the absolute difference between the shortest distances from v1CHAPTER 11. COMPUTATIONAL ISSUES 204to Vg and from v2 to Vg in graph G(W(a, i)). Intuitively, D(G, W i, a) reflects thedifference between the expected costs for an agent to go to vertex Vg from verticesv1 and v2 in the U—graph under the condition that uncertain edge uj has weight a.We define e(G,Wi) as follows:e(G, W, i) = max{u, > P{u = a} * D(G, W, i, a)}.Intuitively, e(G, 144, i) is the weighted sum of D(W i, a) for each a ELemma 11.1 For any i, e(G,Wi) c(u) where Ca(Uj) is the expected value ofthe weight distribution of the uncertain edge u.Proof. We simply note that D(G, W, i, a) <a in the above definition of e(G, W, i).ElFunction e can be extended to a vector function ê such that ê(G, W) is a newvector W’ = (wi, ..., w) satisfying w = e(G, W, i). We use ê(G, W) as an estimateof W. We say a weight vector W is appropriate if it is a fix—point of ê, i.e.,ê(G,W) = W.The above condition imposes 1 equations with 1 variables. Since these equations arenot linear, it is hard to obtain an analytic solution. However, we can approximatetheir solution by iteration.Practically, for any initial vector Wo, we compute a sequence Wo, ..., W, Wj+1satisfying W1 = ê(G, Wi), 0. Vector W can be regarded as a good estimateto the solution if and W are close enough. An iteration algorithm based onthis idea for computing appropriate substituting weight vectors is given as follows.CHAPTER 11. COMPUTATIONAL ISSUES 205Algorithm AAInput: a U—graph G, a current vertex v and a goal vertex Vg.Output: A substituting vector.1. Set j = 0, Wo = (a(t’i),...,a(Ui)).2. Set l’l/ = ê(G, W3).3. Set j = j + 1.4. If and W are close enough, return W, otherwise, go back to step 2.The condition for termination at step 4 in the above algorithm can be tailored todifferent situations. For example, if an agent is in an urgent situation, it may adopta very loose condition; otherwise it can choose a more restricted one. One candidatecondition could be that the sum of the absolute differences between the element pairsfrom W and W_1 is less than a given threshold. Our experimental data show thata small number of iterations is good enough for the two classes of randomly generatedproblems. We will see this later in this section.The time complexity of this algorithm can be roughly estimated as follows. Ineach iteration, we need to compute a new weight for each uncertain edge u. Thiscomputation involves calls of a shortest distance algorithm. Therefore, the timecomplexity of AA is O(k * 1 * b * .s) where k is the number of iterations, 1 is thenumber of uncertain edges, b is the average size of the frames of the uncertain edgesand s is the time complexity of the shortest distance algorithm used in AA. The sizeCHAPTER 11. COMPUTATIONAL ISSUES 206of the space required by the algorithm is O(E + b * 1), which is of the same order asthe size of the U—graph.With the help of algorithm AA, an on—line planning algorithm is defined as follows.Algorithm AA1Input: C, v and VgOutput: A plan specifying where to go next1. If (G, Vc, Vg,—) is a terminal, return a shortest path from v to Vg in G’, thepessimistic induced graph of G.2. Use algorithm AA to compute a substituting vector W.3. Compute the substituting graph G(W).4. Compute a shortest path from v to Vg fl G(W).5. Output the initial segment (up to the first uncertain edge) of the path.The next issue is the size of our algorithm’s average cost ratio. Unfortunately, wedo not have theoretical results for general cases. In the next subsection, we presentsome experimental results which suggest that the cost ratio is quite low.11.3.3 Experimental resultsIn this section, we present some experimental results on our on—line planning algorithm AA1 and another simple on—line planning algorithm AA2, which is given byPolychronopoulos [71].CHAPTER 11. COMPUTATIONAL ISSUES 207AA2Input: G, v and VgOutput: A plan specifying where to go next1. Compute a shortest path between v, and Vg in 0a where G is obtainedfrom G by replacing each uncertain edge with an ordinary edge whose weightis equal to the mean value of the weight distribution of the uncertain edge.2. Output the initial segment of the path.AA2 has the same time complexity as that of the shortest distance path algorithmit uses.This experiment was conducted in the same way as were the previous experiments.The difference was that for each trial, we measured the cost ratios of AA1 and AA2for the problem instance of the trial as follows. We first generated a problem instancefor each trial, then computed the optimal expected cost of the problem by callingDFS. To determine the expected cost that AA1 (AA2) will incur for the problem, wesimulated AA1 (AA2) against all of the possible realizations of the U—graph of theproblem instance. Each simulation may involve up to 1 calls of AA1 (AA2) where1 is the number of uncertain edges of the U—graph. Thus, up to 1 * 2 calls of AA1(AA2) may be needed in order to determine the expected cost that AA1 (AA2) willincur for the problem.The experimental results are summarized in Table 11.3. The two rows of the tablecorrespond to the two problem classes. The data in column AA2 are the average costratios of AA2 for the problems in the two classes. The data in the AA1/i columnsCHAPTER 11. COMPUTATIONAL ISSUES 208are the average cost ratios of AA1 with iteration time i for the problems in the twoclasses. From these tables we observe that the average cost ratios of both algorithmsare all close to one and that the average cost ratio of AA1 is lower than that of AA2.It is not hard to explain why AA1 is better on average than AA2. In bothalgorithms, the initial segment of the shortest path between the current position andthe goal position in a substituting graph is computed as the plan. The difference,however, lies in the ways that the weights of the substituting edges are computed. InAA2, the weight of an edge substituting for an uncertain edge is the mean value ofthe weight distribution of the uncertain edge, thus no information on other uncertainedges is integrated in the computation. On the other hand, in AA1, the weightof an edge substituting for an uncertain edge is computed with the following kindsof information being taken into account: the goal vertex; the weight distributionof the uncertain edge; the structure of the entire U—graph; and the information onthe other uncertain edges. Although yet to be verified through real applications, theinitial experimental results suggest that AM exhibits good quality for U—graph basednavigation.Finally, we would like to make a note on the (lack of) convergence property of thealgorithm AA1. The main component of AA1 is AA, which is an iterative algorithmtaking a set of non—linear equations as its input and computing an approximation ofa “fix—point” of the equations. However, there is no guarantee that the outputs of AAcan converge to a “fix—point” as iteration time increases. It is not known whetherthe equations have a fix—point. The quality of AA1 is not guaranteed to improvemonotonically as the the number of iterations increases. This is also reflected inthe results in Table 11.3. The average cost ratios of AA1 fluctuate as iteration timeincreases.CT1APT1R 11 CO1WPUTATTOTVA 1, SSURR 209Table 11.3: The average cost ratios of the on—line algorithmsclass 1class 2I AA2 AA1/1 AA1/2 I AA1/3 AA1/4 AA1/5 AA1/61.060 1.044 1.042 1.042 1.041 1.032 1.0421.082 1.014 1.011 1.008- 1.010 1.009 1.010Chapter 12ConclusionsIn this thesis, we took a uniform approach to computational issues of the problemswith decision making with uncertainty. We proposed decision graphs as a simpleintermediate representation, for the decision making problems, and developed somealgorithms based on this representation.These algorithms can be readily applied to decision problems given in the formof decision trees, since decision graphs are a generalization of decision trees. It isalso straightforward to apply these algorithms to decision problems given in the formof finite stage Markov decision processes, since a problem in such form can be represented as a decision graph. In order to make use of these algorithms for solvingdecision making problems in influence diagrams, we developed a stochastic dynamicprogramming formulation of the problem of influence diagram evaluation and presented a method to systematically transform a decision making problem in influencediagram representation into a decision graph representation. In effect, we obtained atwo—phase method for influence diagram evaluation. In comparison with other algorithms in the literature for influence diagram evaluation, our method has a few notablemerits. First, it exploits asymmetry of decision problems in influence diagram evaluation, which leads to exponential savings in computation for typical decision problems.210CHAPTER 12. CONCLUSIONS 211Second, by using heuristic search techniques, it provides an explicit mechanism formaking use of heuristic information that may be available in a domain—specific form.Third, because it provides a clean interface between influence diagram evaluation andBayesian net evaluation, various well—established algorithms for Bayesian net evaluation can be used in influence diagram evaluation. Finally, by using decision graphs asan intermediate representation, the value of perfect information [53] can be computedmore efficiently [110].High—level navigation in uncertain environments can be viewed as a decision problem with uncertainty, but is given neither in the form of Markov decision processes,nor in the form of influence diagrams. We developed a decision theoretic formulationof the problem and showed how to represent such a problem in a decision graph. Asa result, we can use the decision search algorithms for off—line path planning.Since the problem is of importance in its own right, we also developed an online path planning algorithm with polynomial time complexity. Experimental resultsshow that the on—line algorithm results in satisfactory navigation quality.12.1 Contribution SummaryThe contributions of this thesis are summarized as follows.• A number of algorithms for decision graph search.• A new method for influence diagram evaluation.• A decision theoretic formalization of the problem of U—graph based navigation,and a general approach to the computation of optimal plans for U—graph basednavigation tasks.CHAPTER 12. CONCLUSIONS 212A polynomial time heuristic on—line algorithm for U—graph based navigation.12.2 Future ResearchFuture research can be carried out in several directions. For decision graph search, wewould like to have a more thorough examination of the (absolute and/or comparative)performances of the decision graph search algorithms presented in Chapter 3. Atheoretical examination, as has been taken for minimax tree search [38, 68], would bevery interesting, but is likely to be challenging. An experimental examination mayalso be valuable for a better understanding on the performances of the algorithms.Another possible research direction is, as for other search problems [43], to investigateparallel algorithms for decision graph (tree) search.Our work on influence diagram evaluation can be extended in at least two aspects.First, our method can be generalized to handle irregular stepwise decomposable influence diagrams as well. To do so, we need to introduce a new kind of node to ourdecision graphs: sum nodes to capture parallel decision threads. Second, an influence diagram evaluation system using our method is yet to be built. With such asystem, we can experimentally compare the performance of our method with otheralgorithms. This would also provide a platform for experimentally comparing variousdecision graph search algorithms.The analysis we performed in Section 7.5.1 on potential savings by exploitingasymmetry of decision problems is quite conservative. We expect that an averagecase based analysis would reveal greater potential of savings.Some interesting future work related to U—graph based navigation would be toapply the approach to a practical autonomous navigation system. By experimentingwith such a practical system, we could realistically evaluate the value of taking unCHAPTER 12. CONCLUSIONS 213certainty into consideration at high level planning stages in general, and assess thepractical value of U—graphs and U—graph based navigation theory in particular.Some theoretical questions about U—graph based navigation remain open. Forexample, can we find an interesting class of navigation problems for which optimalpolynomial algorithms exist? Can we find a good bound on the cost ratio of AA1?Bibliography[1] R. C. Arkin. Motor schema based navigation for a mobile robot: An approachto programming by behavior. In IEEE International Conference on Roboticsand Automation, pages 264—271, 1987.[2] R. C. Arkin. Navigational path planning for a vision based mobile robot. Robotica, 7:43—48, 1989.[3] B. W. Ballard. The *minimax search procedure for trees containing chancenodes. Artificial Intelligence, 21(3):327—350, 1983.[4] A. Bar-Noy and B. Schieber. The Canadian traveller problem. In Proc. 2ndAnnul ACM—SIAM Symposium on Discrete Algorithms, pages 261—270, 1991.[5] R. Bellman. Dynamic Programming. Princeton University Press, Princeton,New Jersey, 1957.[6] D. Bertsekas. Dynamic behavior of shortest path algorithms for communicationnetworks. IEEE transaction on Automatic Control, AC-27(1), Feb. 1982.[7] R. Bhatt, D. Gaw, and A. Meystel. A real time guidance system for an autonomous vehicle. In IEEE Conference on Robotics and Automation, pages1785—1791, 1987.[8] M. Boddy. Anytime problem solving using dynamic programming. In Proc. ofAAAI—91, pages 360—365, Anaham, CA., USA, 1991.[9] M. E. Bratman, D. J. Israel, and M. E. Pollack. Plans and resource—boundedpractical reasoning. Computational Intelligence, 4(4):349—255, 1988.[10] P. P. Chakrabarti, S. Ghose, and S. C. DeSarkar. Admisibility of AO* whenheuristics overestimate. Artificial Intelligence, 34(1) :97—113, 1987.[11] G. F. Cooper. A method for using belief networks as influence diagrams. InProc. of the Fourth Conference on Uncertainty in Artificial Intelligence, pages55—63, Univ. of Minnesota, Minneapolis, USA, 1988.214BIBLIOGRAPHY 215[12] Z. Covaliu and R. M. Oliver. Formulation and solution of decision problemsusing decision diagrams. Technical report, University of California at Berkeley,April 1992.[13] J. L. Crowley. Navigation for an intelligent mobile robot. IEEE Robotics andAutomation, RA-1(1):31—41, 1985.[14] E. Davis. Representing and Acquiring Geographic Knowledge. London: PitmanPublishing, 1986.[15] T. Dean, K. Basye, R. Chekaluk, S. Hyun, M. Lejter, and M. Randazza. Copingwith uncertainty in control system for navigation and exploration. In Proc. ofAAAI—90, pages 1010—1015, 1990.[16] T. Dean, R. J. Firby, and D. Miller. Hierarchical planning involving deadlines,travel time and resources. Computational Intelligence, 4(4):381—398, 1988.[17] T. Dean, L. P. Kaebling, J. Kirman, arid A. Nicholson. Deliberation schedulingfor time-critical sequential decision making. In Proc. of the Ninth Conferenceon Uncertainty in Atificial Intelligence, pages 309—316, Washington, DC, 1993.[18] T. Dean, L. P. Kaebling, J. Kirman, and A. Nicholson. Planning with deadlinesin stochastic domains. In Proc. of AAAI—93, pages 574—579, Washington, DC,1993.[19] T. L. Dean and M. P. Wellman. Planning and Control. Morgan Kaufmann,1992.[20] C. Derman. Finite State Markovian Decision Process. Academic Press NewYork and London, 1970.[21] E.W. Dijkstra. A note on two problems in connection with graphs. NumerischeMath., 1:269—271, 1959.[22] G. Dudek, M. Jenkin, E. Milios, and D. Wilkes. Robotic exploration as graphconstruction. Technical Report RBCV-TR-88-23, Research in Biological andComputational Vision, University of Toronto, November 1988.[23] J. A. Feldman and R. F. Sproull. Decision theory and Artificial Intelligence II:The hungry monkey. Cognitive Science, 1, 1977.[24] R. J. Firby. An investigation into reactive planning in complex domains. InProc. of AAAI—87, pages 202—206, 1987.BIBLIOGRAPHY 216[25] P. C. Fishburn. The Foundations of Expected Utility. Dordrecht, Holland:Reidel, 1982.[26] R. M. Fung and R. D. Shachter. Contingent influence diagrams, 1990.[27] M. P. Geogeff and A. L. Lansky. Reactive reasoning and planning: an experiment with a mobile robot. In Proc. of AAAI—87, pages 677—682, 1987.[28] M. P. Georgeff and F. F. Ingrand. Decision—making in an embedded reasoningsystem. In Proc. of IJCAI-89, pages 972—978, Detroit, USA, 1989.[29] P. Haddawy and S. Hanks. Representations for decision-theoretic planning:Utility functions for deadline goal. In B. Nebel, C. Rich, and W. Swartout,editors, Proc. of the Fourth International Conference on Knowledge Representation and Reasoning, pages 71—82, Cambridge, Mass., USA, Oct. 1992. MorganKaufmann.[30] L. R. Harris. The heuristic search under conditions of error. Artificial Intelligence, 5(3):217—234, 1974.[31] E. J. Horvitz, J. S. Breese, and M. Henrion. Decision theory in expert systemsand Artificial Intelligence. International Journal of Approximate Reasoning,2:247—302, 1988.[32] E. J. Hovitz. Reasoning about beliefs and actions under compiltational resourceconstraints. In Uncertainty in Artificial Intelligence, Volume 3. Armsterdam:North Holland, 1988.[33] R. A. Howard. Dynamic Programming and Markov Processes. Technology Press,Cambridge, Massachusetts, and Wiley New York, 1960.[34] R. A. Howard. The used car buyer problem. In R. A. Howard and J. E. Matheson, editors, The Principles and Applications of Decision Analysis, Volume ILpages 690—718. Strategic Decision Group, Mento Park, CA., 1984.[35] R. A. Howard and J. E. Matheson. Influence diagrams. In R. A. Howard andJ. E. Matheson, editors, The Principles and Applications of Decision Analysis,Volume II, pages 719—762. Strategic Decision Group, Mento Park, CA., 1984.[36] F. V. Jensen, K. G. Olesen, and K. Anderson. An algebra of Bayesian beliefuniverses for knowledge based systems. Networks, 20:637—659, 1990.[37] L. Kleinrock. Queueing Systems. John Wiley and Sons, 1976.BIBLIOGRAPHY 217[38] D. E. Knuth and R. W. Moore. An analysis of alpha beta pruning. ArtificialIntelligence, 6(4) :293—326, 1975.[39] S. Koenig. Optimal probabilistic and decision theoretic planning using markovdecision theory. Technical Report UCB-CSD-92-685, Computer Science Division (EECS), University of California, Berkeley, 1992.[40] R. E. Korf. Depth first iterative deepening: An optimal admissible tree search.Artificial Intelligence, 27(1):97—109, 1985.[41] B. J. Kuipers. Modeling spatial knowledge. Cognitive Science, 2:129—153, 1978.[42] B. J. Kuipers and Tod S. Levitt. Navigation and mapping in large—scale space.Al Magzine, 9(2):25—43, 1988.[43] V. Kumar, P.S. Gopalakrishnan, and L. N. Kanal (Eds). Parallel Algorithmsfor Machine Intelligence and Vision. Springer-Verlag, 1990.[44] V. Kumar and L. Kanal. The cdp: A unifying formulation for heuristic search,dynamic programming, and branch-and-bound. In L. Kanal and V. Kumar,editors, Search in Artificial Intelligence, pages 1—27. Springer-Verlag, 1988.[45] V. Kumar, D. S. Nau, and L. Kanal. A general branch-and-bound formulationfor AND/OR graph and game tree search. In L. Kanal and V. Kumar, editors,Search in Artificial Intelligence, pages 91—130. Springer-Verlag, 1988.[46] C. P. Langlotz and E. H. Shortliffe. Logical and decision-theoretic methods forplanning under uncertainty. Al Magazine, 10(Spring):39—47, 1989.[47] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilitieson graphical structures and their application to expert systems. J. R. Statist.Soc. Ser. B, 50:157—224, 1988.[48] T. S. Levitt and D. T. Lawton. Qualitative navigation for mobile robots. Artificial Intelligence, 44(3) :305—360, 1990.[49] T. A. Linden and J. Glicksman. Contingency planning for an autonomous landvehicle. In Proc. IJCAI—87, pages 1047—1054, Milan, Italy, 1987.[50] A. Mahanti and A. Bagchi. AND/OR graphs heuristic search methods. J.ACM, 32(1):28—51, 1985.[51] M. S. Mannasse, L. A. McGeoch, and D. D. Sleator. Competitive algorithmsfor on—line problems. In the 20th ACM Symp. on Theory of Computing, pages322—333, 1988.BIBLIOGRAPHY 218[52] A. Martelli and U. Montanan. Additive and/or graphs. In Proc. of IJCAI—73,pages 1—7, Stanford, CA., USA, 1973.[53] J. E. Matheson. Using influence diagrams to value information and control.In R. M. Oliver and J. Q. Smith, editors, Influence Diagrams, Belief Nets andDecision Analysis, pages 25—63. John Wiley and Sons, 1990.[54] J. M. McQuillan. The new routing algorithm for the arpanet. IEEE Transactions on Communications, COM-28:711—719, May 1980.[55] A. Meystel. Planning in a hierarchical nested autonomous control system. InSPIE Mobile Robots, pages 42—76, 1986.[56] A. C. Miller, M. M. Merkhofer, R. A. Howard, J. E. Matheson, and T. T. Rice.Development of automated aids for decision analysis. Technical report, StanfordResearch Institute, 1976.[57] D. P. Miller and M. G. Slack. Global symbolic maps from local navigation. InProc. of AAAI—91, pages 750—755, 1991.[58] J. S. B. Mitchell, D. W. Payton, and D. M. Keisey. Planning and reasoningfor autonomous vehicle control. International Journal of Intelligent Systems,2:129—198, 1987.[59] B. G. Mobasseri. Path planning under uncertainty: From a decision analyticperspective. In IEEE International Symposium on Intelligent Control, pages556—560, 1989.[60] B. G. Mobasseri. Decision analytic approach to weighted region problem. InSPIE Mobile Robots, V, pages 438—445, 1990.[61] P. Ndilikilikesha. Potential influence diagrams. Technical report, BusinessSchool, University of Kansas, 1991. Working paper No. 235.[62] Nils J. Nilsson. Principles of Artificial Intelligence. Springer-Verlag BerlinHeidelberg New York, 1982.[63] J. J. Nitao and A. M. Rarodi. An intelligent pilot for an autonomous vehiclesystem. In IEEE International Conference on Robotcs and Automation, pages176—183, 1985.[64] S. M. Olmsted. On representing and Solving Decision Problems. PhD thesis,Engineering Economics Department, Stanford University, 1983.BIBLIOGRAPHY 219[65] C. H. Papadimitriou. Games against nature. Journal of Computer and SystemScience, 31(2), October 1985.[66] C. H. Papadimitriou and Mihalis Yannakakis. Shortest paths without a map. InProc. the Sixteenth ICALP, Lecture Note in Comp. Sci. No. 372, pages 610—620.Spring—Verlag, July 1989.[67] B. G. Patrick. Binary iterative deepening A*: An admissible generalization ofIDA* search. In Proc. of Ninth Canadian Conference on Artificial Intelligence,pages 54—59, Vancouver, Canada, 1992.[68] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving.Addison-Wesley Publishing Company, 1984.[69] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of PlausibleInference. Morgan Kaufmann, Los Altos, CA, 1988.[70] L. D. Phillips. Discussion of ‘From Influence to Relevance to Knowledge byR. A. Howard’. In R. M. Oliver and J. Q. Smith, editors, Influence Diagrams,Belief Nets and Decision Analysis, page 22. John Wiley and Sons, 1990.[71] G. H. Polychronopoulos. Stochastic and Dynammic Shortest Distance Problems.PhD thesis, MIT, Operations Research Center, Technical Report 199, May 1992.Technical Report 199.[72] D. Poole. Probabilistic Horn Abduction and Bayesian networks. ArtificialIntelligence, 64(1):81—129, 1993.[73] G. Provan and D Poole. The utility of consistency-based diagnosis. In Proc. ofthe Third International Conference on Knowledge Representation and Reasoning, pages 461—472, 1991.[74] M. L. Puterman. Markov decision processes. In D. P. Heyman and M. J. Sobel,editors, Handbooks in Operations Research and Management Science, Volume2. Elsevier Science Publishers B. V. (North-Holland), 1990.[75] R. Qi. A new method for network routing: a preliminary report. In the Proc. ofPacific Rim Conference on Communications, and Computers and Signal Processing, 1993.[76] R. Qi and D. Poole. High level path planning with uncertainty. In B. D.D’Ambrosio, P. Smet, and P. P. Bonissone, editors, Proc. of the Seventh Conference on Uncertainty in Al, pages 287—294, UCLA, Los Angeles, USA, 1991.Morgan Kaufmann.BIBLIOGRAPHY 220[77] R. Qi and D. Poole. A framework for high level path planning with uncertainty.In Proc. of the Second Pacific Rim International Conference on Artificial Intelligence, pages 287—293, Seoul, Korea, 1992.[78] R. Qi and D. Poole. Two algorithms for decision tree search. In Proc. of theSecond Pacific Rim International Conference on Artificial Intelligence, pages121—127, Seoul, Korea, 1992.[79] H. Raiffa. Decision Analysis. Addison—Wesley Publishing Company, 1968.[80] 5. M. Ross. Introduction to Stochastic Dynamic Programming. Academic Press,1983.[81] 5. Russell. Fine—grained decision—theoretic search control. In Proc. Sixth Conference on Uncertainty in Artificial Intelligence, 1990.[82] S. Russell and E. Wefald. Principles of metareasoning. In R. J. Brachman, H. J.Levesque, and R. Reiter, editors, Proc. of the First International Conference onKnowledge Representation and Reasoning, pages 400—411, Cambridge, Mass.,USA, 1989.[83] L. J. Savage. The Foundations of Statistics. Dover, 1954.[84] Marcel J. Schoppers. Representation and automatic synthesis of reaction plans.Technical Report UIUCD C S—R—89—1 546, Department of Computer Science,University of Illinois at Urbana—Champaign, 1989.[85] R. D. Shachter. Evaluating influence diagrams. Operations Research, 34(6):871—882, 1986.[86] R. D. Shachter. An ordered examination of influence diagrams. Networks,20:535—563, 1990.[87] R. D. Shachter, B. D’Ambrosio, and B. A. Del Favero. Symbolic probabilisticinference in belief networks. In Proc. of AAAI-90, pages 126—131, 1990.[88] R. D. Shachter and M. A. Peot. Decision making using probabilistic inference methods. In Proc. of the Eighth Conference on Uncertainty in ArtificialIntelligence, pages 276—283, San Jose, CA., USA, 1992.[89] 5. Shafer and W. Whittaker. Development of an integrated mobile robot system at Carnegie Mellon University. Technical Report CNU-RI-TR-90-12, TheRobotics Institute, Carnegie Mellon University, January 1990.BIBLIOGRAPHY 221[90] P. P. Shenoy. Valuation—based systems for Bayesian decision analysis. TechnicalReport working paper No. 220, School of Business, University of Kansas, April1990.[91] P. P. Shenoy. A fusion algorithm for solving Bayesian decision problems. InB. D. D’Ambrosio, P. Smet, and P. P. Bonissone, editors, Proc. of the SeventhConference on Uncertainty in Artificial Intelligence, pages 361—369, UCLA, LosAngeles, USA, 1991. Morgan Kaufmann.[92] P. P. Shenoy. Valuation network representation and solution of asymmetricdecision problems. Technical Report working paper No. 246, School of Business,University of Kansas, April 1993.[93] D. J. Slate and L. R. Atkin. CHESS 4.5 The Northwestern UniversityUniversity Chess Program. Springer—Verlag, New York, 1977.[94] D. D. Sleator and R. E. Tarjan. Amortized efficiency of list update and pagingrules. CACM, 28: 202-208, 1985.[95] J. E. Smith, S. Holtzman, and J. E. Matheson. Structuring conditional relationships in influence diagrams. Operations Research, 41(2):280—297, 1993.[96] J. Q. Smith. Decision analysis a Bayesian approach. London ; New YorkChapman and Hall, 1988.[97] W. Stallings. Data and Computer Communications. Macmillan PublishingCompany New York and Collier Macmillan Publishers London, 1985.[98] Y. Sun and D. S. Weld. Beyond simple observation: Planning to diagnose. InProc. of the 3rd International Workshop on Principles of Diagnosis, 1992.[99] Y. Sun and D. S. Weld. A framework for model based repair. In Proc. ofAAAI-93, pages 182—187, 1993.[100] A. S. Tanenbaum. Computer networks. Englewood Cliffs, N.J. : Prentice-Hall,1989.[101] J. A. Tatman and R. D. Shachter. Dynamic programming and influence diagrams. IEEE Transactions on Systems, Man, and Cybernetics, 20(2):365—379,1990.[102] J. von Neumann and 0. Morgenstern. Theory and Games and Economic Behavior. Princeton University Press, 1947.BIBLIOGRAPHY 222[103] Z. Wang and J. Crowcroft. Analysis of shortest path routing algorithms ina dynammic network environmrnt. Computer Communication Review, 22(2),1992.[104] M. P. Weliman. Formulation of Tradeoffs in Planning Under Uncertainty. Pit-man and Morgan Kaufman, 1990.[105] P. H. Winston. Artificial Intelligence. Addison—Wesley, Reading MA, 1984.[106] L. Zhang. A Computational Theory of Decision Networks. PhD thesis, Department of Computer Science, University of British Columbia, 1993.[107] L. Zhang and D. Poole. Sidestepping the triangulation problem. In Proc. ofthe Eighth Conference on Uncertainty in Artificial Intelligence, pages 360—367,Stanford University, Standford, USA, 1992.[108] L. Zhang and D. Poole. Stepwise decomposable influence diagrams. In B. Nebel,C. Rich, and W. Swartout, editors, Proc. of the Fourth International Conference on Knowledge Representation and Reasoning, pages 141—152, Cambridge,Mass., USA, Oct. 1992. Morgan Kaufmann.[109] L. Zhang, R. Qi, and D. Poole. A computational theory of decision networks.accepted by International Journal of Approximate Reasoning, also available asa technical report 93-6, Department of Computer Science, UBC, 1993.[110] L. Zhang, R. Qi, and D. Poole. Incremental computation of the value of perfectinformation in stepwise-decomposable influence diagrams. In Proc. of the NinthConference on Uncertainty in Artificial Intelligence, pages 400—410, Washington, DC, 1993.[111] L. Zhang, R. Qi, and D. Poole. Minimizing decision tables in stepwisedecomposable influence diagrams. In Proc. of the Fourth International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, Florida, USA, Jan.1993.Appendix AFunctions for U—graph GenerationIn this appendix, the Lisp functions used in our experiments for generating randomgraphs are included for reference. The functions random-grid and random-parallel-graphare for generating random grids and random parallel graphs respectively. To generatea random graph, we first generate a list of connections and then generate appropriateweights for those connections by calling function weight-generation.(defun random-grid (dl d2 p1 p2 ri r2)(weight-generation (random-grid-connections dl d2 p1 p2) ri r2))(defun random-parallel-graph (n p1 ri r2)(weight—generation (parallel—connections n p1) ri r2))(defun random-from-uniform-distribution (a b)(+ (mm a b) (* (abs (— a b)) (random 1.0))))(defun random-switch-prob 0(I (round (* 100 (random—from—uniform—distribution 0.01 0.99))) 100.0))(defun random-weight (ri r2)(let ((al (random-from-uniform—distribution 1 ri))(a2 (random-from-uniform-distribution 1 r2)))(round (random-from-uniform—distribution al (* al a2)))))(defun weight—generation(connection-list ri r2)(let* ((switch—list (cadr connection—list))(edge—list (caddr connection—list))(edge—inform nil)223APPENDIX A: FUNCTIONS FOR U-GRAPH GENERATION 224)(dolist (e edge—list)(let ((weight (random-weight ri r2)))(push ‘(,(car e) ,(cadr e) (,weight) (1.0)) edge—inform)))(dolist (e switch—list)(let ((weightl (random-weight ri r2))(weight2 (random-weight ri r2))(p (random-switch-prob)))(push ‘(,(car e) ,(cadr e) (,weightl ,weight2) (,p ,(- 1 p)))edge-inform)))(cons (car connection—list) edge—inform)))(defun random-grid-connections (dl d2 p1 p2)(let ((switch—list nil)(edge—list nil)(ran—testi 0)(ran—test2 0))(dotimes (j dl) ;; generating “vertical connections”(dotimes (i (— d2 1))(setf ran-testi (random 1.0))(setf ran-test2 (random 1.0))(cond ((> ran—testi p1) nil)((> ran—test2 p2)(push ‘(,(+ (* i dl) j) ,(+ (* (+ i 1) dl) j)) edge—list))(T (push ‘(,(+ (* i dl) j) ,(+ (* (+ i 1) dl) j)) switch—list)))))(dotimes (i d2) ;; generating “horizontal connections”(dotimes (j (— dl 1))(setf ran—testi (random 1.0))(setf ran—test2 (random 1.0))(cond ((> ran—testi p1) nil)((> ran—test2 p2)(push ‘(,(+ (* i dl) j) ,(+ (* i dl) j 1)) edge—list))(T (push ‘(,(+ (* i dl) j) ,(+ (* i dl) j 1)) switch—list)))))(list (* dl d2) switch—list edge—list)APPENDIX A: FUNCTIONS FOR U-GRAPH GENERATION 225))(defun parallel-connections (n p1)(let* ((edge—list nil)(switch—list nil)(in (+ (* 6 n ) 1)))(dotimes (i n)(push (list 0 (+ 1 i)) edge—list)(push (list (+ 1 i n) (+ i 1 (* 2 n))) edge—list)(push (list (+ 1 1) (+ i 1 n)) switch—list)(push (list (+ i 1 (* 5 n)) m) edge—list)(push (list (+ 1 i (* 3 n)) (+ i 1 (* 4 n))) edge—list)(push (list (+ 1 i (* 4 n)) (+ i 1 (* 5 n))) switch—list)(dotimes (j n)(cond ((> p1 (random 1.0))(push (list (+ i 1 (* 2 n)) (+ 1 j (* 3 n)) edge—list)))(T nil))))(list (+ in 1) switch—list edge—list)
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Decision graphs : algorithms and applications to influence...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Decision graphs : algorithms and applications to influence diagram evaluation and high-level path planning… Qi, Runping 1994
pdf
Page Metadata
Item Metadata
Title | Decision graphs : algorithms and applications to influence diagram evaluation and high-level path planning under uncertainty |
Creator |
Qi, Runping |
Date Issued | 1994 |
Description | Decision making under uncertainty has been an active research topic in decision theory, operations research and Artificial Intelligence. The main objective of this thesis is to develop a uniform approach to the computational issues of decision making under uncertainty. Towards this objective, decision graphs have been proposed as an intermediate representation for decision making problems, and a number of search algorithms have been developed for evaluating decision graphs. These algorithms are readily applicable to decision problems given in the form of decision trees and in the form of finite stage Markov decision processes. In order to apply these algorithms to decision problems given in the form of influence diagrams, a stochastic dynamic programming formulation of influence diagram evaluation has been developed and a method to systematically transform a decision making problem from an influence diagram representation to a decision graph representation is presented. Through this transformation, a decision making problem represented as an influence diagram can be solved by applying the decision graph search algorithms. One of the advantages of our method for influence diagram evaluation is its ability to exploit asymmetry in decision problems, which can result in exponential savings in computation. Some problems that can be viewed as decision problems under uncertainty, but are given neither in the form of Markov decision processes, nor in the form of influence diagrams, can also be transformed into decision graphs, though this transformation is likely to be problem-specific. One problem of this kind, namely high level navigation in uncertain environments, has been studied in detail. As a result of this case study, a decision theoretic formulation and a class of off-line path planning algorithms for the problem have been developed. Since the problem of navigation in uncertain environments is of importance in its own right, an on-line path planning algorithm with polynomial time complexity for the problem has also been developed. Initial experiments show that the on-line algorithm can result in satisfactory navigation quality. |
Extent | 3214382 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-04-09 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051643 |
URI | http://hdl.handle.net/2429/6999 |
Degree |
Doctor of Philosophy - PhD |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 1994-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_1994-894334.pdf [ 3.07MB ]
- Metadata
- JSON: 831-1.0051643.json
- JSON-LD: 831-1.0051643-ld.json
- RDF/XML (Pretty): 831-1.0051643-rdf.xml
- RDF/JSON: 831-1.0051643-rdf.json
- Turtle: 831-1.0051643-turtle.txt
- N-Triples: 831-1.0051643-rdf-ntriples.txt
- Original Record: 831-1.0051643-source.json
- Full Text
- 831-1.0051643-fulltext.txt
- Citation
- 831-1.0051643.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051643/manifest