Open Collections: UBC Theses and Dissertations
A market-based approach to resource allocation in manufacturing Brydon, Michael 2000

A MARKET-BASED APPROACH TO RESOURCE ALLOCATION IN MANUFACTURING

by Michael Brydon

M.Eng., Engineering Management, Royal Military College of Canada, 1993
B.Eng., Engineering Management, Royal Military College of Canada, 1990

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in The Faculty of Graduate Studies, Commerce and Business Administration

We accept this thesis as conforming to the required standard

The University of British Columbia
October, 2000
© Michael Brydon, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Faculty of Commerce and Business Administration
The University of British Columbia
Vancouver, Canada

ABSTRACT

In this thesis, a framework for market-based resource allocation in manufacturing is developed and described. The most salient feature of the proposed framework is that it builds on a foundation of well-established economic theory and uses the theory to guide both the agent and market design. There are two motivations for introducing the added complexity of the market metaphor into a decision-making environment that is traditionally addressed using monolithic, centralized techniques. First, markets are composed of autonomous, self-interested agents with well-defined boundaries, capabilities, and knowledge. By decomposing a large, complex decision problem along these lines, the task of formulating the problem and identifying its many conflicting objectives is simplified.
Second, markets provide a means of encapsulating the many interdependencies between agents into a single mechanism—price. By ignoring the desires and objectives of all other agents and selfishly maximizing their own expected utility over a set of prices, the agents achieve a high degree of independence from one another. Thus, the market provides a means of achieving distributed computation. To test the basic feasibility of the market-based approach, a prototype system is used to generate solutions to small instances of a very general class of manufacturing scheduling problems. The agents in the system bid in competition with other agents to secure contracts for scarce production resources. In order to accurately model the complexity and uncertainty of the manufacturing environment, agents are implemented as decision-theoretic planners. By using dynamic programming, the agents can determine their optimal course of action given their resource requirements. Although each agent-level planning problem (like the global-level planning problem) induces an unsolvably large Markov Decision Problem, the structured dynamic programming algorithm exploits sources of independence within the problem and is shown to greatly increase the size of problems that can be solved in practice. In the final stage of the framework, an auction is used to determine the ultimate allocation of resource bundles to parts. Although the resulting combinational auctions are generally intractable, highly optimized algorithms do exist for finding efficient equilibria. In this thesis, a heuristic auction protocol is introduced and is shown to be capable of eliminating common modes of market failure in combinational auctions.
TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Dedication

Chapter 1: Introduction to the Problem
  1.1 Research Question
  1.2 Existing Approaches to the Manufacturing Problem
  1.3 Overview of the Market-Based Approach
  1.4 Proof-of-Concept Criteria
    1.4.1 Decomposition and Global Optimality
    1.4.2 The Computational Feasibility of Agent-Level Rationality
    1.4.3 Aggregation Using Competitive Markets
    1.4.4 Solution of a Benchmark Problem
  1.5 Summary of Contributions
    1.5.1 Primary Contribution
    1.5.2 Secondary Contributions
  1.6 Overview of the Thesis

Chapter 2: Planning and Scheduling in Manufacturing
  2.1 The Manufacturing Problem
  2.2 Conventional Decomposition
    2.2.1 Rough-Cut Planning
    2.2.2 Detailed Scheduling
    2.2.3 Achieving Computational Tractability
    2.2.4 Limitations of Conventional Decomposition
      2.2.4.1 Dependence Between Stages
      2.2.4.2 Constraints versus Decision Variables
      2.2.4.3 Imprecision of MRP
  2.3 Heuristic Approaches
  2.4 The Emergence of ERP Systems
  2.5 A Benchmark Problem
    2.5.1 A Special Case of the Three-Machine Job
    2.5.2 Limitations of the Exact Approach
    2.5.3 The Meaning of Optimality
  2.6 Criteria for Comparison

Chapter 3: Theoretical Foundations and Prior Research
  3.1 Scheduling
    3.1.1 Exact Versus Heuristic Approaches
    3.1.2 Scheduling in OR
      3.1.2.1 Exact Approaches
      3.1.2.2 Heuristic Approaches
    3.1.3 Scheduling and Classical Planning in AI
      3.1.3.1 Knowledge Representation
      3.1.3.2 The Situation Calculus
      3.1.3.3 The Strips Representation
      3.1.3.4 Classical Planning and Refinements
    3.1.4 Decision-Theoretic Planning
      3.1.4.1 Probabilistic Representations of Actions
      3.1.4.2 Finding Optimal Policies
      3.1.4.3 DTP Observations
      3.1.4.4 The Curse of Dimensionality
    3.1.5 Constraint-Based Planning
    3.1.6 Recent Advances in Planning
  3.2 Markets and Equilibrium
    3.2.1 Rationality
    3.2.2 Equilibrium
      3.2.2.1 Individual Consumers and Budgets
      3.2.2.2 Pareto Optimal Allocations in an Edgeworth Box Economy
      3.2.2.3 Equilibrium Prices
    3.2.3 Auction Theory
      3.2.3.1 Auction Forms
      3.2.3.2 Bidder Characteristics
      3.2.3.3 Equivalences of Forms
  3.3 Distributed Artificial Intelligence
    3.3.1 Environment
    3.3.2 Coordination Mechanisms
      3.3.2.1 Hierarchy
      3.3.2.2 Social Laws/Standard Operating Procedures
      3.3.2.3 Team Utility Functions
      3.3.2.4 Negotiation
      3.3.2.5 Markets
    3.3.3 Agent Capabilities
    3.3.4 Classification of Approaches
      3.3.4.1 Issues of Fit
      3.3.4.2 Classification of the Proposed Approach

Chapter 4: A Framework for Market-Based Resource Allocation
  4.1 Problem Decomposition
    4.1.1 Approaches to MDP Decomposition
    4.1.2 Market-Based Decomposition
    4.1.3 Definition of the Global Problem
    4.1.4 Modeling Machines
      4.1.4.1 Properties of Machine-Agents
      4.1.4.2 The Decision Problem for Machine-Agents
      4.1.4.3 Planning Without Machine Agents
    4.1.5 Modeling Parts
      4.1.5.1 The Life-Cycle of a Part-Agent
      4.1.5.2 Mating Parts and Assemblies
      4.1.5.3 The Decision Problem for Part-Agents
    4.1.6 Global Utility and the Rules of Money
      4.1.6.1 Quasi-Linear Utility Functions
      4.1.6.2 Risk Neutrality
      4.1.6.3 Additive Utility
      4.1.6.4 Common Numeraire
      4.1.6.5 From Pareto Optimality to Global Optimality
    4.1.7 Caveats and Scope
    4.1.8 A Knowledge Representation Language for Agents
      4.1.8.1 Sources of Uncertainty
      4.1.8.2 Elements of the Knowledge Representation Language
      4.1.8.3 Events and Time
      4.1.8.4 Terminal States
  4.2 Agent-Level Rationality
    4.2.1 Structured Dynamic Programming
      4.2.1.1 Policy Trees
      4.2.1.2 Relevance Relationships
      4.2.1.3 Policy Improvement
      4.2.1.4 Computational Properties of the SDP Algorithm
    4.2.2 Growing Policy Trees
      4.2.2.1 Infrastructure for Fixed Actions
      4.2.2.2 Reaping Rewards
      4.2.2.3 Policy Mapping and Evaluation
      4.2.2.4 Policy Improvement
      4.2.2.5 Assertions and Constraints
    4.2.3 Rolling Planning Horizons
      4.2.3.1 Building Horizon Estimates
      4.2.3.2 Collapsing Resource Subtrees
    4.2.4 Bidding Strategies for Agents
      4.2.4.1 The Resource State Space
      4.2.4.2 Determining Reservation Prices
      4.2.4.3 The Role of the Numeraire Good
      4.2.4.4 Price Discovery Versus the Price Convention
  4.3 Aggregation
    4.3.1 Setting the Stage: An Example Problem
    4.3.2 Choice of Auction Form
    4.3.3 Complementaries and Substitutes
    4.3.4 The Challenges of Combinational Auctions
      4.3.4.1 Consolidation of Supply
      4.3.4.2 Consolidation of Demand
      4.3.4.3 Compound Bids
      4.3.4.4 Next-Best States
      4.3.4.5 Deadlock
      4.3.4.6 Stability of Equilibria
    4.3.5 Choice of Market Protocol
      4.3.5.1 Elements of the Protocol
      4.3.5.2 Illustration
      4.3.5.3 Summary of Example Problem

Chapter 5: Empirical Results
  5.1 Achieving Agent-Level Rationality
    5.1.1 Conventional Stochastic Optimization
    5.1.2 Coping Strategies
    5.1.3 Computational Leverage
      5.1.3.1 Planning Horizon
      5.1.3.2 Stochastic Processing Times
      5.1.3.3 Number of Operations
    5.1.4 Interpretation of Results
  5.2 Distributed Combinational Auctions
    5.2.1 Searching for Next-Best States
    5.2.2 Complexity Analysis
    5.2.3 Bidding as Hill Climbing
  5.3 Evaluating the Performance of the Market
    5.3.1 Empirical Results
    5.3.2 Extensions to Multiple-Machine Problems
    5.3.3 Flexibility of the Market Approach
    5.3.4 Interpretation of Results
  5.4 Solution to the Benchmark Problem
    5.4.1 Results of the Sequential Auction
    5.4.2 The Problem of Nearsightedness
    5.4.3 Interpretation of Benchmark Results

Chapter 6: Conclusions and Recommendations for Future Research
  6.1 Conclusions
    6.1.1 Contributions of the Thesis
    6.1.2 Limitations of the Thesis
  6.2 Recommendations for Future Research
    6.2.1 Computational Issues
      6.2.1.1 Platform and Prototype Enhancements
      6.2.1.2 Additional Computational Leverage within the DTP Approach
      6.2.1.3 Variable-Granularity of Time
      6.2.1.4 Support for Learning
    6.2.2 Agent Modeling and Problem Formulation
      6.2.2.1 Support for Interdependent Parts
      6.2.2.2 Support for Setups and Other Externalities
    6.2.3 Other Areas of Application
      6.2.3.1 Structured Dynamic Programming
      6.2.3.2 Combinational Auctions

References

LIST OF TABLES

Table 2.1: Processing times for the benchmark problem
Table 2.2: Processing times for the transformed benchmark problem
Table 2.3: Holding costs for the optimal benchmark and alternative schedule
Table 3.1: A 2x2 matrix for classifying approaches to scheduling based on discipline and degree of exactness
Table 3.2: Common machine configurations and their standard notations
Table 3.3: Classification of three classes of single-machine scheduling problems
Table 3.4: The probability of each outcome for the MoveWest action
Table 4.1: State variables used to model a part-agent
Table 4.2: Determination of the state space for a three-operation, three-machine task with a decision horizon of 10 time units
Table 4.3: Details of the example scheduling problem
Table 4.4: A problem for which no equilibrium price exists
Table 5.1: A problem in which nearsightedness is potentially costly
Table 6.1: Benefits of the market-based approach to manufacturing planning
Table 6.2: Secondary contributions made by the thesis
Table 6.3: Sources of approximation in the market-based framework

LIST OF FIGURES

Figure 1.1: The proof-of-concept pyramid for the thesis
Figure 2.1: The conventional decomposition of the "manufacturing problem"
Figure 2.2: An optimal schedule for the benchmark problem
Figure 2.3: An alternative solution to the benchmark problem
Figure 3.1: An example of an optimal contingency plan for a probabilistic planner
Figure 3.2: A model of rational decision making under uncertainty
Figure 3.3: The Walrasian budget set
Figure 3.4: Equilibrium in an Edgeworth box economy
Figure 3.5: Classification of DAI systems with respect to the environment and coordination mechanism dimensions
Figure 4.1: The market-based approach to decomposition
Figure 4.2: Major milestones of the market-based approach
Figure 4.3: Agent-based decomposition in a manufacturing environment
Figure 4.4: A graphical representation of the elements of the knowledge representation language for agents
Figure 4.5: The action representation hierarchy for P-STRIPS
Figure 4.6: Action preconditions specify conditions that must be true in the world for the action to be feasible
Figure 4.7: Discriminants are used to identify the relevant features of the current state
Figure 4.8: The execution of the action will lead to one of the outcomes with a known probability
Figure 4.9: Different effects lists are associated with each outcome
Figure 4.10: An aspect is used to represent the passage of time in a temporal action
Figure 4.11: Reward trees conditioned on the value of NewTimeUnit
Figure 4.12: Action cost trees for holding costs
Figure 4.13: The schema for the Ship action
Figure 4.14: An example of the use of P-STRIPS to represent an action-dependent event
Figure 4.15: A partial action schema for the IncrementClock action
Figure 4.16: Aspects for action-independent events
Figure 4.17: Basic policy trees
Figure 4.18: The "major limbs" of a policy tree for a part agent
Figure 4.19: The core policy tree with rewards
Figure 4.20: The core policy tree with mapping information
Figure 4.21: A policy tree prior to improvement
Figure 4.22: The cumulative completion time distribution for operation Op3
Figure 4.23: A partial policy tree showing outcomes
Figure 4.24: An example of an invalid combination of state variable values
Figure 4.25: A tree-based representation of assertions
Figure 4.26: A screen-shot showing a portion of a policy tree
Figure 4.27: The backward induction approach to generating horizon estimates
Figure 4.28: A partial subtree for Time = 1
Figure 4.29: A policy tree sorted to enable extraction of price information
Figure 4.30: Indifference curves and initial endowments for the seller (a) and the buyer (b)
Figure 4.31: An Edgeworth box representation of different exchanges
Figure 4.32: The optimal solution to the 3-job, 1-machine example problem
Figure 4.33: Tree-based representations of dependence between goods
Figure 4.34: The resource tree for Agent 3 depicts an all-or-nothing situation
Figure 4.35: A pricing scenario in which a Pareto efficient exchange requires consolidation of demand
Figure 4.36: A sub-optimal 2-job, 1-machine schedule
Figure 4.37: A tree-based representation of a compound transaction
Figure 4.38: The critical elements of the proposed market protocol
Figure 4.39: The initial (sub-optimal) resource allocation for the example problem
Figure 4.40: The allocation of resources following the purchase of [T1, T2] by Agent 3
Figure 4.41: Two possible provisional transactions
Figure 5.1: The curse of dimensionality for a part-agent
Figure 5.2: The effect of the length of the planning horizon on the number of concrete and abstract states
Figure 5.3: The probability that Op3 is complete after executing Process(Op3, M3, •)
Figure 5.4: A comparison of deterministic and stochastic problems with the same number of concrete states
Figure 5.5: The relationship between the number of operations in a job and the size of the abstract state space
Figure 5.6: The sequence of transactions used by the market prototype to attain an equilibrium outcome
Figure 5.7: Processing times and the optimal schedule for a coarse-grained three-machine problem
Figure 5.8: Equilibrium allocations of resources for the coarse-grained three-machine problem
Figure 5.9: The completion time distribution and payoff function for the new job
Figure 5.10: The sequence required to attain equilibrium after the arrival of the new job
Figure 5.11: The optimal minimum cost solution to the benchmark problem (reproduced from Figure 2.3)
Figure 5.12: Equilibrium allocations of resources at the end of the auctions for the rolling horizon case
Figure 5.13: An example of a problem in which the market "recovers" from a nearsighted mistake in an early stage of the auction
Figure 6.1: The impact of initial state constraints on the size of a policy tree
Figure 6.2: Integration of machine learning (data mining) and production planning using the market-based framework

DEDICATION

To my mother, for all her efforts. And to my wife Stephanie for giving me the things that matter most in my life: Gregory and Chloe.

CHAPTER 1: INTRODUCTION TO THE PROBLEM

Despite dramatic increases in computing power over the last half century, certain classes of routine decision problems remain too difficult to be solved on even the most powerful computers. Unfortunately, one such class of hard problems—the allocation of scarce resources to competing uses—constitutes the foundation of many human decision-making activities. For example, governments collect taxes and redistribute resources in an effort to benefit society as a whole; firms allocate people, money, and other productive resources in an effort to maximize profit; individuals allocate their time and effort among a large number of competing demands in an effort to be happy. Generally, such allocation problems generate too many possible solutions to make the application of brute-force methods feasible. As a consequence, the role of computers in solving hard resource allocation problems has traditionally been limited to one of decision support—computers do some of the work but the final decisions rely on human intervention and judgement [Keen and Scott Morton, 1978]. Given the many well-known computational limitations and biases in human decision making (e.g., [Kahneman and Tversky, 1979], [Damasio, 1994]), it is worthwhile to ask whether there is a way to restructure hard problems so that computers can generate solutions without undue reliance on human intervention or guidance.
1.1 RESEARCH QUESTION

One possible means of restructuring a hard problem is to decompose it into many smaller pieces that can be solved independently. Accordingly, the research question addressed in this thesis is the following: Is it possible to generate optimal solutions to large resource allocation problems using electronic markets and artificial rational agents? The specific problem used to motivate and illustrate the thesis is manufacturing scheduling. However, the proposed techniques are generic to the broader class of resource allocation problems. The fundamental issue being addressed in this research is not how jobs should be assigned to machines, but rather how distributed computation can be used to side-step computational complexity.

Hard problems arise whenever the complexity of the problem grows faster than the computational power of the machines used to solve the problem. A problem like manufacturing scheduling is computationally troublesome precisely because it is insensitive to Moore's Law.[1] That is, although the processing power of a computer may double every 18 months or so, the number of unique schedules in a job shop with n jobs increases by a factor of n! when a new piece of production machinery is introduced. Thus, an algorithm that requires only a few seconds to yield an exact solution for a toy scheduling problem may require many centuries of deliberation to solve an industrial-scale version of the same problem. In such circumstances, dramatic increases in computational power do little to alter the basic infeasibility of the solution technique. Broadly stated, the goal of this thesis is to rethink the way in which large resource allocation problems are formulated and solved.
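The factorial growth described above is easy to make concrete. The following back-of-the-envelope sketch (my own illustration, not from the thesis) counts job sequences in an n-job, m-machine job shop under the common approximation that each machine can order its n operations independently, giving (n!)^m possibilities:

```python
# Back-of-the-envelope illustration of why scheduling is insensitive to
# Moore's Law: with n jobs, each machine can sequence its n operations in
# any order, so m machines admit roughly (n!)^m distinct schedules.
from math import factorial

def schedule_count(n_jobs: int, n_machines: int) -> int:
    """Approximate number of distinct schedules in an n-job, m-machine shop."""
    return factorial(n_jobs) ** n_machines

# Adding one machine multiplies the search space by another factor of n!.
for m in (1, 2, 3):
    print(f"10 jobs, {m} machine(s): {schedule_count(10, m):.2e} schedules")
```

Doubling processor speed shifts none of these orders of magnitude, which is the point the text makes about Moore's Law.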
The result is a problem-solving approach based on a market metaphor: large problems are decomposed into many smaller agent-level problems, the agents generate solutions to their own problems, and a market is used to combine the agent-level solutions into optimal[3] or near-optimal global-level outcomes. The fundamental advantage of the market-based approach is that conflict and resource dependency between agents is replaced by a price mechanism. The resulting inter-agent independence permits the global problem to be distributed over a large number of computers.

1.2 EXISTING APPROACHES TO THE MANUFACTURING PROBLEM

Decision makers in manufacturing environments are faced with "the manufacturing problem". The manufacturing problem involves the allocation of production resources and the coordination of activities so that goods are produced in an efficient manner. Efficiency is defined in terms of a broad cost function that includes a number of important factors such as holding costs, lateness penalties, and the opportunity cost of using the resources in an alternative manner. In any real-world manufacturing environment, the manufacturing problem bears little resemblance to the simple n-job, m-machine problems that are solved using exact algorithms[2] in production management textbooks. Instead, real-world manufacturing is characterized not only by scale, but also by rapid change and uncertainty.

[1] Moore's Law was originally coined by Intel founder Gordon Moore in the mid-1960s to describe the growth in the number of transistors (and thus the processing power) on a microchip. In its current incarnation, the law states that the power of microchips doubles every 18 months [Raymond, 1994].
[2] An exact solution technique is guaranteed to provide an optimal solution to the problem. In this chapter, the terms "exact" and "algorithmic" are used interchangeably.
[3] In this thesis, "optimality" is defined with respect to a special global utility function.
This issue is discussed in much greater detail in Chapter 4.

In the face of such complexity, algorithmic approaches to solving the manufacturing problem have little practical utility. To get around the computational difficulties created by the manufacturing problem, two approaches are used in practice. In the first approach, algorithmic techniques are applied to a greatly simplified version of the problem and an effort is made to coerce the real system to satisfy the assumptions of the simplified model. The most common example of this approach is materials requirement planning (MRP). In the underlying MRP logic, sources of complexity (such as non-deterministic lead times, high-priority orders, capacity constraints, and machine breakdowns) are assumed not to exist. These simplifying assumptions permit the MRP system to generate a rough-cut schedule that is then used to generate finer-grained schedules for particular jobs and machines. Since the problem solved by the computerized system may be quite different from the real-world problem, the quality of the plans resulting from MRP systems is questionable [Turbide, 1993], [Nahmias, 1989].

In the second approach, a satisficing strategy based on heuristics is used. For example, an increasingly popular approach is to formulate factory-wide scheduling problems as constraint satisfaction problems (CSPs). High-level constraints such as precedence between operations, the capabilities of machines, and due dates are used to guide the search through the space of all possible schedules. A good schedule (i.e., a schedule that satisfies the constraints) can be found very quickly using CSP techniques. There are two general disadvantages of heuristic approaches. First, heuristics provide no guarantee that the solutions they generate are optimal. Second, the quality of the solution often depends on the quality of inputs from users.
For example, in CSPs the constraints placed on the problem interact in complex ways to define the feasible set of solutions. In many cases, however, the initial set of constraints does not admit any feasible schedules and trade-offs in the form of constraint relaxations must be made. However, knowing which constraints to relax and by how much is a difficult decision problem in its own right; it requires judgement and a deep understanding of the overall objective function.

1.3 OVERVIEW OF THE MARKET-BASED APPROACH

Given the shortcomings of both the conventional and heuristic approaches to addressing the manufacturing problem, there are clear opportunities for alternative techniques that provide good solutions within the bounds of computational practicality. The power of markets to coordinate the activities of a large number of boundedly-rational agents contending for scarce resources is well understood. Equally well understood is the difficulty of using centralized hierarchical methods to address large resource allocation problems. Consider for example the following vignette from the Economist [Anonymous, 1995]:

When Soviet planners visited a vegetable market in London during the early days of perestroika, they were impressed to find no queues, shortages, or mountains of spoiled and unwanted vegetables. They took their hosts aside and said: "We understand, you have to say it is all done by supply and demand. But can't you tell us what is really going on? Where are your planners, and what are their methods?"

Given the current popularity of using market mechanisms to address large-scale, societal issues (such as school vouchers, pollution control, and the allocation of radio spectrum) it is surprising that greater use of markets has not been made in the design and implementation of computer-based decision support systems. Markets have three fundamental advantages over centralized control.
First, the notion of an agent provides a convenient means to decompose complex systems into self-contained chunks with goals, capabilities, beliefs, and so on [Simon, 1981]. By decomposing the problem into sub-problems that are roughly independent, distributed computation can be used to reduce the calendar time (as distinct from processor time) required to generate a solution. Second, markets are inherently scalable. When making decisions, agents need only consider their own preferences and the relative prices of goods in the economy. The addition of a new agent to the economy may result in a change in prices, but does not change the complexity of the decision problem faced by any other agent. Finally, the system of prices that emerges from the interactions of the agents in the market guides the system as a whole to a welfare-maximizing state.

The marvel of the price system has long been recognized. Consider, for example, the following observation by Nobel laureate Friedrich von Hayek [1945]:

I have deliberately used the word "marvel" to shock the reader out of the complacency with which we often take the working of this mechanism for granted. I am convinced that if [the price system] were the result of deliberate human design and if the people guided by the price changes understood that their decisions have significance far beyond their immediate aim, this mechanism would have been acclaimed as one of the greatest triumphs of the human mind.

The approach to distributed computation proposed in this research is based on a literal interpretation of classical microeconomic theory. The global-level resource allocation problem is restated in terms of a large number of agent-level problems. Rather than relying on a centralized controller, the aggregation of the agent-level solutions into a coherent global solution is achieved by the unseen hand. By selfishly maximizing their own well-being, the agents unknowingly maximize the well-being of the system as a whole.
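The "unseen hand" intuition can be made concrete with a deliberately tiny sketch of my own (agent names and valuations are hypothetical): in an ascending auction, each agent bids selfishly, only up to its private reservation price, yet the resource ends up with the agent who values it most:

```python
# Toy illustration of selfish bidding producing a welfare-maximizing outcome:
# the price rises until only the highest-valuing agent remains willing to pay.
# Agents and reservation prices are hypothetical; the thesis develops a much
# richer protocol for combinational auctions over resource bundles.

def ascending_auction(reservation: dict[str, float], increment: float = 1.0):
    """Raise the price until at most one bidder remains; return (winner, price)."""
    price = 0.0
    active = set(reservation)
    while len(active) > 1:
        price += increment
        active = {a for a in active if reservation[a] >= price}
    return (active.pop() if active else None), price

bids = {"part_1": 12.0, "part_2": 7.5, "part_3": 9.0}
print(ascending_auction(bids))  # ('part_1', 10.0)
```

No bidder needs to know any other bidder's objectives; the price alone carries the coordinating information.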
The process of transforming a resource allocation problem into a market-based problem (consisting of agents, goods, bidding protocols, and prices) can be divided into three distinct phases:

1. Decomposition — Agents are created to represent the interests of objects in the real world, such as parts and production machinery. Each agent is simply a process on a computer that makes decisions on behalf of the physical-world object it is assigned to represent.

2. Agent-level rational choice — Each agent is implemented as a decision-theoretic planner. The decision-theoretic planning algorithm generates a contingency plan that maximizes the agent's expected utility over some planning horizon. Since the objects that the agents represent exist in an imperfect physical world, rationality implies the ability to reason about uncertain outcomes.

3. Aggregation — To resolve inter-agent conflicts, a market is provided in which agents can buy and sell contracts for resources. The ultimate allocation of resources to agents is determined by the equilibrium of prices that emerge through the market interactions.

The body of economic theory that addresses the global welfare effects of competition by agents is Walrasian[4] microeconomics. The core of the Walrasian approach—the first fundamental theorem of welfare economics—provides a formal statement of the workings of the unseen hand. The theorem states that a competitive equilibrium produces an allocation of resources that is Pareto optimal[5] [Mas-Colell, Whinston and Green, 1995]. Naturally, the first theorem is based on a number of assumptions about agents and the markets in which the agents interact. Most notably, the agents are assumed to be strictly rational and the markets are assumed to be frictionless and efficient. In human economies, these assumptions are never fully satisfied and thus Walrasian microeconomics is seldom seen as anything but a first-cut descriptive model of human behavior. However, the goal of this research is to build artificial agents, not model real ones. As such, relatively simplistic economic theory can be used in the normative sense to guide the design and implementation of the computer-based agents and markets. The issue, therefore, is not whether the theory accurately models the agents, but whether the agents can be engineered to accurately model the theory.

[4] Named for Leon Walras (1834-1910), French economist and pioneer of general equilibrium theory.
[5] An allocation is Pareto optimal (or Pareto efficient) if there is no way to make one agent better off without making another agent worse off. The importance of Pareto optimality is discussed in greater detail in Section 3.2.

1.4 PROOF-OF-CONCEPT CRITERIA

To demonstrate the viability of the market-based approach presented in this thesis, the combination of analytical and empirical reasoning shown in Figure 1.1 is used. At the top of the "proof-of-concept pyramid" is the fundamental premise that it is possible to find a globally optimal solution to a large resource allocation problem using a market-based approach. The claim of optimality rests on two additional premises:

• it is possible to express global utility (i.e., the utility of the system as a whole) as a special function of agent utilities; and,

• the first fundamental theorem of welfare economics holds in the market in which the agents interact.

The first premise, which establishes a relationship between global optimality and Pareto optimality, can be established as a design feature of the system of agents. The second premise is more complex since the first theorem imposes a number of preconditions on the structure of the agents and the market.
The net result, as Figure 1.1 illustrates, is that the proof-of-concept pyramid rests on three distinct lines of reasoning that correspond to the decomposition, rational choice, and aggregation phases described in the preceding section. These lines of reasoning are introduced in more detail in the sections that follow.

[FIGURE 1.1 The proof-of-concept pyramid for the thesis: The feasibility of the market-based approach is supported by lines of analytical and empirical reasoning. At the apex is global optimality of the solution; it rests on the problem formulation (Section 4.1), under which Pareto optimality implies global optimality; on agent-level rationality via a structured dynamic programming algorithm (Section 4.2); and on market equilibrium via a protocol for combinational auctions (Section 4.3), supported by empirical results and complexity analysis (Section 5.1).]

1.4.1 DECOMPOSITION AND GLOBAL OPTIMALITY

The relationship between Pareto and global optimality is important in the context of the manufacturing problem because the agents themselves are created solely as a means of achieving a globally optimal outcome. In other words, the ultimate welfare of the agents is much less important than the ultimate allocation of resources to agents. Although the first theorem guarantees Pareto optimal outcomes for agents, Pareto optimality is generally insufficient for global optimality. Consequently, in order for the first theorem to be of any practical value, the global-level problem must be formulated in such a way as to create and maintain a sufficiency relationship between Pareto and global optimality. Fortunately, in environments such as manufacturing in which there is a well-defined global utility function, it is possible to decompose the problem such that global utility is defined as the sum of agent utilities.
The justification for, and implications of, this approach are discussed in Section 4.1.

1.4.2 THE COMPUTATIONAL FEASIBILITY OF AGENT-LEVEL RATIONALITY

Attaining outcomes that are economically efficient requires that all agents participating in the market be strictly rational. To achieve the requisite level of rationality, agents in this system are implemented as solvers of stochastic optimization problems. Consequently, the agents are subject to the complexity constraints inherent in any optimization technique. The difficulty of achieving agent-level rationality (for multiagent or single-agent systems) is summarized by [Mullen and Wellman, 1995]:

    Despite the centrality of decision-theoretic rationality in our view of computational economies, at present we have little to say [...] about how to make economic agents rational in the decision-theoretic sense. The reason is that there is no difference in the problem of achieving computational rationality in this context compared to any other context. That is, designing rational agents is the general [artificial intelligence] problem, and we are working on pieces of it just like every other research group.

The core problem is that all planning problems are in principle intractable [Chapman, 1987], [Garey and Johnson, 1979]. The critical issue in this research is whether the market-based decomposition of large problems leads to the formulation of agent-level problems that are small enough to be solvable in practice. Put another way, a well-known shortcoming of conventional stochastic optimization techniques is that despite their power and elegance, they only permit the solution of "toy" problems. This research investigates whether market-based decomposition can be used to generate a large number of toy problems that can be solved and recombined to provide the solution to much larger problems.
One of the conclusions of the thesis is that although the agent-level planning problems are much smaller than the global (factory-level) planning problem, the agent-level problems are still much too large to represent (never mind solve) using conventional stochastic optimization techniques. However, as shown in Section 4.2, a structured dynamic programming algorithm can be used to exploit sources of independence within the agent-level problem formulations. Precisely how much can be gained by using the structured dynamic programming algorithm is an empirical question; it depends on both the problem domain and the problem representation. Section 5.1 presents empirical evidence that suggests that it is possible to achieve agent-level rationality for the class of resource allocation problems addressed in the thesis.

1.4.3 AGGREGATION USING COMPETITIVE MARKETS

The second precondition of the first fundamental theorem of welfare economics is that the market be capable of attaining equilibrium. To attain equilibrium, a large number of potential sources of market failure (such as externalities, information asymmetries, transaction costs, and so on) need to be eliminated. In Section 4.3, the sources of market failure relevant to multiagent resource allocation systems are identified and an efficient incomplete protocol for distributed combinational auctions is presented. The strategies for avoiding market failure embodied in the protocol are described with the aid of a simple example. The worst-case computational complexity of the protocol is analyzed in Section 5.1.

1.4.4 SOLUTION OF A BENCHMARK PROBLEM

In addition to the analytical/empirical chain of reasoning shown in Figure 1.1, a benchmark problem is used to evaluate the ability of the market-based approach to converge on an optimal allocation of resources.
The problem used for this purpose in this thesis is a deterministic three-machine flow shop scheduling problem that can be solved using a variation of Johnson's rule, a well-known operations research technique. Although the market-based approach could be applied to much larger or more realistic problems (indeed, this is the rationale for its introduction in the first place), solutions that are known to be optimal do not exist for such problems; hence the reliance on a very simple problem as a benchmark. The details of the problem and the conventional solution technique are presented in Section 2.5.

Although the benchmark problem used in the thesis is very small, it belongs to a large, well-defined class of resource allocation problems that share the same core structure. As a consequence, conclusions made regarding the inner workings of the agent and market algorithms on small problem instances should generalize well to larger n-job, m-machine scheduling problems with the same structure. For example, in the case of the deterministic 1-machine example introduced in Section 4.3.5 to illustrate the operation of the auction protocol, the market-based approach can be shown to reduce to the shortest processing time (SPT) dispatch rule (which is known to be optimal for the entire class of problems). Since the equivalence between the market and SPT is independent of scale, the fact that the market-based system can find the optimal solution to the small instance implies that it can find the optimal solution to larger instances of the same class (assuming that there is sufficient computation time and memory).

Footnote: An incomplete algorithm is one that is not guaranteed to find the optimal solution. The combinational auctions in which the agents participate are known to be intractable [Sandholm and Suri, 2000] and therefore complete algorithms are infeasible for systems with large numbers of agents and resource goods.
1.5 SUMMARY OF CONTRIBUTIONS

This thesis brings together established theories and techniques from a number of foundation disciplines (specifically, economics, operations research, and artificial intelligence) and applies them to an important problem in production management. As such, the primary contribution of the thesis is the novel integration of existing research to address a practical problem. However, to achieve the integration and create a working prototype of a market-based system, a number of secondary incremental contributions were made in the foundation disciplines. Both the primary and secondary contributions are described in detail in Section 6.1.1. In this section, the contributions are briefly summarized.

1.5.1 PRIMARY CONTRIBUTION

The primary contribution made by this thesis is an approach to multiagent resource allocation that is built on a foundation of microeconomic theory. The theoretical foundation is important because it provides assurance that the equilibrium outcome of the system is optimal and stable with respect to a given information set. In addition, the inherent scalability of the market-based approach permits the decomposition and solution of stochastic optimization problems that are impractical to solve using conventional monolithic techniques.

1.5.2 SECONDARY CONTRIBUTIONS

The secondary contributions made by the thesis involve new or refined techniques that have been introduced in order to satisfy the preconditions of the first fundamental theorem of welfare economics.
These contributions involve technical refinements to the work of others (e.g., adopting and implementing the structured dynamic programming algorithm of Boutilier et al. ([Boutilier, 1997], [Boutilier and Dearden, 1996], [Boutilier, Dearden and Goldszmidt, 1995]) in Section 4.2.1) and the introduction of novel approaches to dealing with the technical challenges inherent in the approach (e.g., the two-phase protocol for combinational auctions in Section 4.3.5 and a rolling planning horizon technique for long-term planning in Section 4.2.3). Without rational agents and efficient markets, it is impossible to draw on classical economic theory to make predictions about the quality of equilibrium outcomes. Thus, the ability to solve large agent-level planning problems and attain equilibrium in a combinational auction can be seen as the "enabling technologies" of the market-based approach.

1.6 OVERVIEW OF THE THESIS

In Chapter 2, the manufacturing problem is described in greater detail and the shortcomings of both conventional approaches to manufacturing planning and emerging enterprise resource planning systems are identified. The objective of the chapter is to illustrate the need for better methods of addressing the manufacturing problem. The chapter closes with a description of the benchmark scheduling problems used throughout the thesis for illustration.

Chapter 3 summarizes theory and prior research in the disciplines relevant to this research: scheduling, economics, and artificial intelligence.

• In Section 3.1, approaches to planning and scheduling from two different disciplines (operations research and artificial intelligence) are reviewed. The purpose of the review is to identify the strengths and weaknesses of various exact and heuristic approaches.

• Section 3.2 contains a summary of a number of important economic concepts. First, the concept of individual rationality is defined.
This is followed by a brief review of the concept of general equilibrium in which the Edgeworth box model is used to illustrate a number of basic properties of pure exchange economies and to introduce notation used in the discussion of the implemented system in Section 4.2.4. The section on economic theory ends with a review of auction theory. Auction theory is used to establish the equivalence of the different types of markets considered in this research.

• Section 3.3 briefly reviews theory and practice from distributed artificial intelligence. In this section, a three-dimensional taxonomy is introduced and used to situate different approaches and systems reported in the literature. In addition, the market-based approach proposed in this thesis is situated within the taxonomy. The objective of this section is to evaluate the suitability of various approaches to the task of manufacturing scheduling.

Chapter 4 describes the proposed market-based approach in detail. The sections of the chapter correspond to the three phases identified in Section 1.3: decomposition, agent-level rational choice, and aggregation.

• In Section 4.1, the decomposition of a scheduling problem into a number of self-interested agents is described. The action description language for agents is summarized and illustrated with examples.

• In Section 4.2, the challenges of achieving agent-level rationality are discussed and a tree-based structured dynamic programming algorithm for solving very large stochastic optimization problems is described.

• In Section 4.3, the aggregation mechanism for the system (a form of continuous reverse auction) is introduced and illustrated with an example.

Chapter 5 contains a discussion of the empirical results used to support the proof-of-concept pyramid in Figure 1.1. First, in Section 5.1, the efficacy of the structured dynamic programming algorithm is analyzed.
An effort is made to extrapolate the results to estimate the feasibility of addressing real-world problems using the agent-level planning techniques described in Chapter 4. Section 5.2 contains an analysis of the auction protocol and derives the worst-case performance of the algorithm. In Sections 5.3 and 5.4, the market-based solutions to the benchmark and other scheduling problems are analyzed.

Chapter 6 concludes the thesis with a summary of the conclusions and limitations of the thesis. The primary and secondary contributions of the research are enumerated and described in detail. In addition, a number of areas for further research are identified.

CHAPTER 2: PLANNING AND SCHEDULING IN MANUFACTURING

2.1 THE MANUFACTURING PROBLEM

Effective management of a manufacturing enterprise requires the coordination of many types of production resources such as time, materials, machines, and personnel. The explicit goal of the coordination exercise is to simultaneously maximize a number of important performance measures (such as profitability, product quality, responsiveness, and worker satisfaction) and minimize others (such as environmental impact and waste). In short, decision makers in manufacturing environments are faced with a very large, ongoing resource allocation problem that requires judicious trade-offs between numerous conflicting objectives. Not surprisingly, the complexity of the manufacturing problem far exceeds the capacity of any human decision maker. Similarly, existing analytical methods, that is, methods that can be expressed as algorithms and executed by computers, fall far short of being able to address such problems in their entirety.

As discussed in Section 1.2, there are two broad approaches to coping with the computational intractability of the manufacturing problem. The first is to generate exact solutions to a series of simplified sub-problems (the conventional decomposition approach).
The second is to address the problem in its full complexity but to satisfice, that is, accept an approximate or "good enough" solution as long as it is found within an acceptable amount of time (the heuristic approach). In this chapter, both approaches are examined in greater detail and an effort is made to highlight some of the important shortcomings associated with each.

2.2 CONVENTIONAL DECOMPOSITION

Figure 2.1 shows the conventional decomposition of the manufacturing problem. The term "conventional" is used to distinguish this form of decomposition from other forms (such as the agent-based decomposition described in Chapter 4) and to emphasize the extent to which practitioners and academics have converged on a scheme for dividing the manufacturing problem into a standardized set of sub-problems. Indeed, most production management textbooks contain a diagram very similar to the one shown in Figure 2.1.

[FIGURE 2.1 The conventional decomposition of the "manufacturing problem": The global problem is divided into a number of distinct sub-problems (from [Pinedo, 1995, p. 4]). The stages are production planning and master scheduling; material requirements planning and capacity planning; scheduling and rescheduling; dispatching and detailed scheduling; and shop-floor management, connected by flows such as quantities and due dates, material requirements, shop orders and release dates, schedules, schedule performance, shop status, and data collection.]

2.2.1 ROUGH-CUT PLANNING

The starting point of the conventional decomposition is forecasts of future demand across all product lines. In many cases, forecasts are combined into measures of aggregate demand to facilitate the estimation of workforce requirements over the planning horizon. The forecasts are also used to create a master production schedule (MPS), which describes the target quantities and delivery dates of finished goods.
Once the MPS is complete, it is passed to the materials requirements planning (MRP) system. The role of the MRP system is to "explode" the schedule for finished goods into order release schedules for the raw materials and purchased components that constitute the finished goods.

In addition to the MPS, the inputs to an MRP system are the bill of materials (BOM) information for each finished product and estimates of the lead time required to produce or purchase each item in the BOM. The system uses a simple procedure to work backwards from the due date of the finished good to determine the timing and quantities of BOM items. The output of the MRP system is a general list of material requirements and an order release schedule for each BOM item that must be produced or purchased. The order release schedule consists simply of the date on which the order should be released to the production system and a date by which the order should be completed. Since the order release schedule does not allocate specific production resources (such as personnel or machines) to orders, MRP is sometimes called rough-cut scheduling.

2.2.2 DETAILED SCHEDULING

The detailed scheduling stage in Figure 2.1 refers to the process by which jobs are assigned to machines. In prototypical manufacturing environments, the terms "jobs" and "machines" are used literally. However, in the general case, a "job" can refer to any object or collection of objects that requires processing and a "machine" can refer to any production resource. One consequence of the sequential nature of the decomposition shown in Figure 2.1 is that the shop-floor scheduler is constrained by a number of decisions that have already been made upstream:

1. Process plan: The process plan specifies the sequence of operations that must be performed to produce the part.
In addition, the process plan may specify precedence constraints between the operations, machine requirements or preferences, setup requirements (e.g., jigs or fixtures), and personnel skill requirements.

2. Release and due dates: The rough-cut planning stages specify an order release date and a due date for each job. The task of the scheduler is to ensure the job is complete within the designated window of time. Typically, penalties are specified for both missed deadlines and early completion (e.g., the holding cost for finished goods).

3. Priority: To facilitate the process of resolving contention for resources during scheduling, jobs are often assigned priorities. These priorities may be subjective (e.g., based on the strategic importance of the customers placing the orders) or objective (e.g., proportional to the value already added to the order).

Footnote: In queueing theory, such resources are referred to as "servers". Although this term is in many ways preferable, "machine" is used throughout this thesis to remain consistent with the established scheduling nomenclature.

Despite the constraints imposed on the scheduling problem by the preceding sub-problems, the search for optimal solutions typically involves evaluation of a very large number of alternatives. Generally, the number of possible schedules increases exponentially with the number of factors considered, such as operations, machines, tools, personnel, and so on. For this reason, commercial scheduling systems typically restrict their focus to a single resource: machines [Fox, 1987]. However, the combinatorics remain challenging, even for the simplified problem. For example, the task of scheduling n jobs on m machines generates (n!)^m unique schedules. Naturally, the goal in scheduling is to find a good schedule without having to search through a massive number of alternative schedules.

2.2.3 ACHIEVING COMPUTATIONAL TRACTABILITY

The conventional decomposition achieves computational tractability in two ways. First, each sub-problem in the sequence is assumed to depend only on the output of the preceding sub-problem. For example, the master schedule generated during MPS is used to generate an order release schedule for individual parts by the MRP system. However, material planning is assumed to be independent of capacity planning, shop-floor scheduling, or any other downstream activity. Clearly, the assumption of infinite capacity greatly reduces the computational complexity of the materials planning process.

The second way in which the conventional decomposition achieves computational tractability is by using simplified, single-attribute objective functions for each sub-problem in lieu of the more complex multi-attribute utility function faced by the firm. Subjective attributes such as product quality, responsiveness, and worker satisfaction are ignored, quantified to the extent possible, or replaced with proxies. For example, in MRP the implicit objective function is to have the lateness (due date minus completion date) of all jobs equal to zero. Since the due date is assumed fixed and binding in MRP logic, the profitability and customer satisfaction implications of relaxing the due dates are never considered.

In shop-floor scheduling, decision makers have a choice of alternative objective functions to minimize. Single-attribute measures that are commonly used include tardiness, lateness, makespan, and the sum of completion times. In some cases, the objective functions are weighted according to a predetermined system of priorities. Although these objective functions are related (they are all cost functions with respect to time), the choice of which objective function to minimize can greatly impact the computational complexity of the solution process and the solution itself.
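The rough-cut MRP logic discussed in Section 2.2.1, working backward from due dates through the BOM with fixed lead times and (implicitly) infinite capacity, can be sketched as a simple recursive explosion. The product structure, quantities, and lead times below are invented for illustration.

```python
# A minimal MRP-style "explosion": work backward from a finished good's due
# date through the bill of materials, using fixed lead times to set order
# release dates. As noted in Section 2.2, capacity is implicitly infinite.
BOM = {                      # item -> list of (component, qty per parent)
    "bike":  [("frame", 1), ("wheel", 2)],
    "frame": [],
    "wheel": [("rim", 1), ("spoke", 36)],
    "rim":   [],
    "spoke": [],
}
LEAD = {"bike": 2, "frame": 5, "wheel": 3, "rim": 4, "spoke": 1}  # periods

def explode(item, qty, due, releases=None):
    if releases is None:
        releases = []
    release = due - LEAD[item]          # when the order must be released
    releases.append((item, qty, release))
    for comp, per in BOM[item]:
        # components must be available when the parent's order is released
        explode(comp, qty * per, release, releases)
    return releases

releases = explode("bike", 10, due=20)
for item, qty, release in releases:
    print(f"release {qty:4d} x {item:<6} at period {release}")
```

The sketch makes the fragility obvious: every release date is a rigid arithmetic consequence of the due date and the lead-time estimates, with no mechanism for trading off capacity, priorities, or due-date flexibility.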
In addition to the core objective functions, other measures are used at a global level to guide the choice of objective functions and solution methods at the sub-problem level. For example, it is often desirable to quantify the "robustness" of a production plan [Pinedo, 1995], [Nahmias, 1989, p. 86]. A robust plan is relatively insensitive to unexpected occurrences in the production environment and therefore limits the workload uncertainty faced by human workers. In contrast, a brittle plan may involve lower up-front costs but increase the possibility of having to cancel shifts or request overtime shifts in response to some disturbance. By maximizing the robustness measure of a schedule, the decision maker is attempting to minimize many hard-to-quantify and intangible costs. In a similar way, there have been numerous efforts in the last three decades to express the issues surrounding product quality in economic terms (e.g., [Crosby, 1979]).

2.2.4 LIMITATIONS OF CONVENTIONAL DECOMPOSITION

In this section, a number of limitations of the conventional decomposition are described. The objective is to demonstrate that the simplifications and assumptions used to support the sequential decision process shown in Figure 2.1 are seldom justified in practice.

2.2.4.1 DEPENDENCE BETWEEN STAGES

Perhaps the most obvious shortcoming of the conventional decomposition is that the stages shown in Figure 2.1 are not independent. In fact, the stages are highly dependent whenever constraints from downstream stages adversely affect the feasibility of the solutions generated at upstream stages. For example, a common criticism of MRP systems is that the order release schedules that they generate assume (among other things) infinite downstream capacity. If a particular order release schedule is found to violate capacity constraints in the capacity planning stage, the order release schedule must be fixed. The situation is similar for the scheduler: although an order release schedule may be feasible with respect to aggregate capacity, it may be impossible for the scheduler to satisfy the specific constraints it is given. If the MRP, CRP, and scheduling systems are not well integrated, the resulting
The situation is similar for the scheduler: although an order release schedule may be feasible with respect to aggregate capacity, it may be impossible for the scheduler to satisfy the specific con-straints it is given. If the MRP, CRP, and scheduling systems are not well integrated, the resulting 17 "backtracking" can be very expensive in terms of effort and disruption [Sillince and Sykes, 1993], [Turbide, 1993]. 2.2.4.2 CONSTRAINTS VERSUS D E C I S I O N VARIABLES Given the sequential dependencies between the sub-problems in Figure 2.1, it is clear that quality of the final production plan is sensitive to the accuracy of the initial demand forecasts. However, one problem with forecasts in general is that they tend to ignore the impact that a firm can have on gener-ating demand. In other words, demand is a decision variable in its own right; it can be controlled via changes in marketing and promotion [Raman and Singh, 1998], [Nahmias, 1989]. The situation is similar for other "constraints" in the problem formulation. For example, production capacity is taken to be a hard constraint; however, it may be possible to alter short term capacity by adding shifts or subcontracting certain items. Similarly, a customer may not have a firm due date but instead have a utility function over a wide range of delivery dates. As such, it might be advantageous to allow customers to pay a higher price for early delivery or give them a discount when their orders can be moved to off-peak production times. By treating decision variables as fixed constraints, deci-sion makers forego the opportunity to make trade-offs between various aspects of the problem. 2.2.4.3 IMPRECISION O F M R P The logic that underlies standard M R P systems is based on a number of exceedingly simplistic assumptions concerning lead times, capacity, and the priority of different jobs. The basic shortcom-ings of M R P can be summarized as follows: • 1. 
1. Lead times for parts making up a product are assumed to be constant and known with certainty. When actual lead times differ from the estimates used by the MRP system, the system's outputs may be misleading [Neely and Byrne, 1992].

2. Lead time is ignored as a variable in its own right. Treating lead time as an exogenously-determined factor ignores the fact that decision makers can often influence lead times, especially for products built in-house.

3. The decision horizon is discrete and there is limited look-ahead. When the order release schedule is generated, it is generated over a fixed horizon of n time periods. However, the subsequent scheduling of a very large, high-priority job at time n + 1 could render the existing order release schedule suboptimal.

4. MRP does not provide a framework for making trade-offs. Since all jobs in the master production schedule are treated equally, there is no systematic way of assessing the value of plan refinements such as expediting high-priority orders, changing due dates, adding overtime shifts, and so on.

2.3 HEURISTIC APPROACHES

In heuristic approaches, the manufacturing problem is treated as a semi-structured decision problem. The distinction between structured, semi-structured, and unstructured problems in the context of decision support systems (DSS) is due to [Keen and Scott Morton, 1978]. The classic example of an unstructured operational task is the selection of the monthly cover by a magazine editor. Selection of the cover requires human capabilities such as taste and judgement. Moreover, there is little structure on which to build an algorithmic approach. As a consequence, cover selection, like any unstructured decision problem, is left to human decision makers with little or no computer-based decision support.

At the other extreme of the continuum are structured problems. The classic example of a structured problem is inventory control.
Before the widespread use of computers in business, the task of monitoring and reordering inventory required a well-paid middle manager with experience and specialized skills [Keen and Scott Morton, 1978, p. 89]. However, the economic trade-off between holding costs and the risk of stocking out is quantitative and well understood. Since structured operational tasks are relatively easy to solve algorithmically, fully automated inventory control systems are now common.

Decision support systems that rely on a mixture of algorithmic decision making, heuristic decision making, and human participation are typically targeted at the middle ground: semi-structured decision problems. The example of a semi-structured decision problem used by Keen and Scott Morton is bond trading. The objective of a bond trader is to maximize long-run profitability by buying and selling bonds with different coupon rates and maturities. Clearly, the task itself is highly structured; it is a stochastic optimization problem with a well-defined objective function. The difficulty occurs because there is an overwhelming number of sources of relevant information for bond traders and the relationship between the market and the information is so complex and uncertain that a fully algorithmic solution is generally impractical. As a consequence, bond trading normally involves a
Given the computational infeasibility of conventional structured techniques and the shortcomings of MRP-based approaches, there is a growing number computerized systems for scheduling that adopt a semi-structured approach—that is, they combine human judgement, decision-making heuristics, and computational brute force. A n example of this trend is systems for production planning based on constraint-directed search. CSP-based scheduling can be viewed as a semi-structured approach for two reasons. First, search over the space of possible schedules is heuristic. Without exhaustive search, there is no guarantee that a schedule that satisfies the constraints is also an optimal schedule. Second, human experience and judgement is required to identify and formulate constraints. Although some constraints—such as physical precedence constraints—are binding (e.g., a hole must be bored before a bolt can be inserted), others—such due dates and minimum levels of quality—are more subjective and could involve complex trade-offs. Selection and relaxation of softer constraints involves a very important element of human intervention in the decision making process. 2.4 T H E E M E R G E N C E O F E R P S Y S T E M S Vendors of Enterprise Resource Planning (ERP) systems have recognized the dependence between the sub-problems shown in Figure 2.1, but only from an information management point of view. ERP systems provide a centralized repository for all types of manufacturing information, such as demand forecasts, raw materials inventories, order status, due dates, and so on. Although the integra-tion of information is certainly an improvement over stand-alone information systems, the funda-mental problem of dependence from a planning and coordination point of view remains unsolved. In many cases, the production planning engines used by E R P systems are based on the same simplistic assumptions and approximations as the M R P systems they replace [AMR Research, 1997]. 
To rectify the problem, most major ERP vendors are in the process of integrating so-called advanced planning system (APS) algorithms within their production planning modules. For example, RED PEPPER software, a vendor of constraint-based planning systems, was recently acquired by PEOPLESOFT; ORACLE, on the other hand, has decided to partner with i2 to sell its RHYTHM APS products. According to reports in the trade press (e.g., [Bartholomew, 1997], [Gould, 1997]), the transition from MRP-based systems to more sophisticated CSP-based algorithms has resulted in significant payoffs in a number of manufacturing organizations. In broad terms, APS products represent a shift away from the compartmentalized sub-problems in Figure 2.1 towards heuristic techniques that support more complex problem formulations.

2.5 A BENCHMARK PROBLEM

Benchmark problems are used in this thesis to facilitate concrete illustrations of techniques and to demonstrate the viability of the market-based approach on small but interesting problems. In the following sections, the primary benchmark problem is introduced and solved using a conventional operations research technique. In Chapter 5, the optimal solutions found here are compared to the solutions generated by the market-based approach.

Three jobs need to be scheduled.2 Each job consists of three operations and each operation must be performed on a different machine. Thus the scheduling environment consists of three jobs and three machines. Each job is subject to the same precedence constraints. Specifically, the completion of Operation 2 (Op2) must precede the commencement of Op3. Similarly, the completion of Op1 must precede the commencement of Op2. The processing time of each operation on each machine is shown in Table 2.1. Given the formulation above, the benchmark problem can be classified as a classic flow shop problem.
Flow shop problems have the following characteristics:
• each job consists of multiple operations and each operation can only be processed on a particular machine;
• there are precedence constraints of the form Opi → Opi+1.

2. Whenever specific operations on the benchmark problem are described, a sans serif font is used.

TABLE 2.1: Processing times for the benchmark problem.

      Op1   Op2   Op3
J1     2     3     4
J2     4     3     5
J3     4     1     3

For reasons discussed in Section 2.5.3, algorithms for flow shop problems typically focus on minimizing makespan—the total amount of time required to complete all the jobs in the problem. Thus, in operations research, the benchmark problem would be classified as an instance of the F3 | | Cmax class of scheduling problems.3 The F3 in the notation indicates that the scheduling environment is a flow shop consisting of three machines, whereas Cmax indicates that the optimal solution minimizes the maximum completion time for all operations in the problem. Although the benchmark problem might appear trivial, shop problems (such as open shop, flow shop, and job shop) are generally very difficult to solve using exact techniques. The algorithms that do exist are typically directed at special cases, such as problems with two machines, or problems for which each operation of each job is limited to a single unit of processing time (see [Brucker, 1998] for a summary of shop scheduling algorithms). An interesting feature of the three-job, three-machine flow shop problem is that it is classified as being strongly NP-hard [Pinedo, 1995, p. 101]. In other words, there is no algorithm that is known to provide the optimal solution to all instances of the problem in an amount of time that is a polynomial function of the size of the problem. Since a problem with n jobs generates n! unique permutation schedules,4 solution techniques based on enumeration of all possible schedules become computationally impractical as the number of jobs increases.
3. See Chapter 3 for more information on scheduling notation and the classification of scheduling problems.
4. In a permutation schedule, the sequence of jobs on all machines is the same. Thus, if the sequence of jobs on M1 is J1 → J2 → J3, then jobs follow the same sequence on all other machines. Permutation schedules are known to be optimal with respect to the Cmax criterion for two- and three-machine flow shop problems. However, this is not the case when more than three machines are involved [Brucker, 1998, p. 165].

For example, if a computer can examine one billion schedules a second, the solution to even a relatively modest 20-job problem requires just over 77 years of computation.

2.5.1 A SPECIAL CASE OF THE THREE-MACHINE JOB

One of the earliest results in the field of operations research was an algorithm for solving the F2 | | Cmax class of problems [Johnson, 1954]. Known as the SPT-LPT rule (or Johnson's rule), the algorithm can generate a schedule that minimizes makespan and do so in an amount of time that is a polynomial function of the number of jobs, n. Although Johnson's rule does not scale to the three-machine problem in the general case, there are certain three-machine problems that can be transformed into two-machine problems. If the processing time of job j on machine i is denoted Ci,j, then Johnson's rule can be used to schedule three machines whenever either min C1,j ≥ max C2,j or min C3,j ≥ max C2,j, where the minimum and maximum are taken over all jobs j [Nahmias, 1989]. To schedule the three jobs with three operations using Johnson's rule, composite operations (e.g., Op1' and Op2') must be defined for each job. The duration of each composite operation, C'i,j, is determined as follows:

C'1,j = C1,j + C2,j
C'2,j = C2,j + C3,j

Johnson's rule is then applied to the two composite operations in the normal manner (see Algorithm 2.1) to generate an optimal permutation schedule.
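To make the procedure concrete, the following sketch enumerates all 3! permutation schedules of the benchmark in Table 2.1 and also applies the three-machine reduction with Johnson's rule, confirming that the two methods agree. (The code is written in Python for illustration; the function and variable names are this author's shorthand, not part of the formal development.)

```python
from itertools import permutations

# Processing times from Table 2.1: job -> (Op1, Op2, Op3).
times = {"J1": (2, 3, 4), "J2": (4, 3, 5), "J3": (4, 1, 3)}

def makespan(sequence):
    """Cmax of a permutation schedule in a three-machine flow shop."""
    prev = [0, 0, 0]  # completion time of the previous job on each machine
    for job in sequence:
        ready = 0
        for m, p in enumerate(times[job]):
            # An operation starts only when both its machine and the
            # job's previous operation are finished.
            ready = max(ready, prev[m]) + p
            prev[m] = ready
    return prev[-1]

def johnsons_rule(two_machine_times):
    """Johnson's rule for F2 || Cmax; maps job -> (t1, t2)."""
    head, tail, remaining = [], [], dict(two_machine_times)
    while remaining:
        # Find the smallest remaining processing time in either column.
        job = min(remaining, key=lambda j: min(remaining[j]))
        t1, t2 = remaining.pop(job)
        if t1 <= t2:
            head.append(job)      # smallest time on the first machine: early
        else:
            tail.insert(0, job)   # smallest time on the second machine: late
    return head + tail

# Three-machine reduction: composite operations Op1' and Op2'.
c1 = [t[0] for t in times.values()]
c2 = [t[1] for t in times.values()]
c3 = [t[2] for t in times.values()]
assert min(c1) >= max(c2) or min(c3) >= max(c2), "special case does not apply"
composite = {j: (t[0] + t[1], t[1] + t[2]) for j, t in times.items()}

by_rule = johnsons_rule(composite)
by_enumeration = min(permutations(times), key=makespan)
print(by_rule, makespan(by_rule))       # ['J1', 'J2', 'J3'] 17
print(list(by_enumeration) == by_rule)  # True
```

For the benchmark, both the exhaustive enumeration and the transformed Johnson's rule produce the sequence J1, J2, J3 with a makespan of 17 time units, matching the schedule derived below.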
Since (min C3,j = 3) ≥ (max C2,j = 3) in the problem shown in Table 2.1, it is possible to use the three-machine version of Johnson's rule to generate a schedule. The composite (transformed) operation times are shown in Table 2.2. Since J3's Op2' entry is the smallest, J3 is added to JT and removed from the list. The next smallest remaining entry is J1's Op1', which is added to JH. Finally, J2 is appended to the tail of JH. The final schedule is the concatenation of JH and JT: J1, J2, J3. The Gantt chart for the final schedule is shown in Figure 2.2.

ALGORITHM 2.1: JOHNSON'S RULE
1. Define "head" and "tail" schedules JH ← ∅ and JT ← ∅
2. List the processing times of the operations in two columns: Op1 and Op2 (or, in the transformed case, Op1' and Op2')
3. Find the remaining operation in the two columns with the smallest processing time.
   a. If the operation is from Op1', append its job to the tail of JH
   b. Otherwise, append the job to the head of JT
4. Remove the job from the table and continue until no further jobs remain.
5. Return J ← JH + JT

TABLE 2.2: Processing times for the transformed benchmark problem.

      Op1'  Op2'
J1     5     7
J2     7     8
J3     5     4

FIGURE 2.2: An optimal schedule for the benchmark problem. Johnson's rule has been used to minimize the total makespan of the schedule.

2.5.2 LIMITATIONS OF THE EXACT APPROACH

Given that Johnson's rule reduces the search through n! permutation schedules to a relatively small number of sorting and search operations, it should be possible to schedule any number of jobs that one might encounter in practice. However, there are at least three obvious limitations of the approach:
1. Number of machines: The algorithm can only be applied to two-machine and a restricted class of three-machine problems.
Since manufacturing environments typically contain a much larger number of machines, the practical value of the approach is limited.
2. Deterministic processing times: Although a slightly modified version of Johnson's rule can be applied to stochastic flow shop problems, optimality is only assured if processing times are exponentially distributed [Pinedo, 1995]. However, it is difficult to imagine a case in manufacturing in which the memoryless property of the exponential distribution is satisfied.5
3. Inflexibility: The formulation of the problem assumes that all jobs have the same priority, holding costs, and release dates. Due dates are not even considered. In the more realistic case in which high-priority jobs with tight deadlines arrive while other jobs are in process, the Cmax objective function and the assumption of permutation schedules become overly restrictive.

Although F3 | | Cmax is only one type of scheduling problem, and although Johnson's rule is only one of many sequencing rules and techniques in the scheduling literature, the limitations noted above apply to virtually all conventional scheduling techniques. This problem is discussed in greater detail in Chapter 3.

2.5.3 THE MEANING OF OPTIMALITY

It is important to keep in mind that the schedule in Figure 2.2 is "optimal" with respect to a particular objective function, in this case Cmax. However, when considering the realities of modern manufacturing, Cmax might seem to be an odd measure to minimize. Most manufacturing organizations are going concerns that run their expensive production machinery at maximum capacity at all times.

5. The memoryless property states that the probability of an event occurring in the next interval of time is independent of the amount of time that has passed without it occurring.
In the context of manufacturing, the memoryless property implies that the probability of an operation being completed in the next interval of time is independent of the amount of processing it has already undergone. While this may be a legitimate assumption for an activity such as troubleshooting, it is less likely to hold for conventional machining tasks [Pinedo, 1995, p. 256].

As such, it is rare that the true objective function of the organization is to schedule a finite number of jobs so that makespan is minimized. A more realistic objective function is to minimize total costs on an ongoing basis. To illustrate the difference between minimizing makespan and minimizing total cost, an alternative schedule in which J2 and J3 have been switched is shown in Figure 2.3. Although the makespan of the total schedule has increased from 17 time units to 18 time units, the sum of the holding costs incurred by the jobs (assuming a holding cost per unit time in the system of $1) is actually less in the alternative schedule, as shown in Table 2.3.

FIGURE 2.3: An alternative solution to the benchmark problem. In this schedule, the sum of holding costs for all jobs is minimized.

TABLE 2.3: Holding costs for the optimal benchmark and alternative schedules.

Job                  Optimal schedule (Johnson's rule)   Alternative schedule
J1                   $9                                  $9
J2                   $14                                 $18
J3                   $17                                 $12
Total holding cost   $40                                 $39

The reason for the difference is straightforward: the objective of Johnson's rule is to minimize the total makespan of the schedule by minimizing the amount of time that machines sit idle at the beginning and end of the schedule. For example, precedence constraints ensure that no job can start on M2 until the job is first processed on M1.
By selecting the job with the shortest processing time on M1 first, Johnson's rule minimizes the amount of time that M2 must wait before its first job arrives. Similarly, the "other half" of Johnson's rule minimizes the amount of time that the last machine (in this case M3) runs by itself by sequencing jobs with short processing times on M3 at the end of the schedule. In this way, concurrent processing on more than one machine is maximized. The alternative schedule shown in Figure 2.3 is the solution to the F3 | | Σ wjCj problem (where wj = $1 for all jobs j). Because the benchmark problem is so small, this particular minimum cost solution can be found by inspection, or by explicit enumeration of all solutions. However, in the general case, even the two-machine version of the minimum cost problem is NP-hard [Garey and Johnson, 1979]. As such, it appears that the popularity of the Cmax objective function is due more to its feasibility than to its applicability to real-world problems. Minimizing makespan creates an interesting mathematical structure and permits certain problems to be solved exactly using polynomial-time algorithms, such as Johnson's rule. Of course, there are certain situations in which minimizing makespan does make sense. For example, when erecting a building, the "holding cost" incurred by the developer is a function of the building's occupancy date. As such, it is not rational to expedite certain operations if the overall effect is to delay the occupancy date of the building as a whole. More generally, makespan minimization is suitable for many types of project scheduling in which the final completion time of the project determines its overall costs. However, in the type of manufacturing problems considered in this thesis, the payoffs that jobs receive for being processed are assumed to be independent of one another.
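The holding-cost figures in Table 2.3 can be reproduced directly from job completion times. The small sketch below (Python, using the data from Table 2.1; the function name is illustrative) computes each job's holding cost, at $1 per unit of time in the system, under both sequences:

```python
# Processing times from Table 2.1: job -> (Op1, Op2, Op3).
times = {"J1": (2, 3, 4), "J2": (4, 3, 5), "J3": (4, 1, 3)}

def completion_times(sequence):
    """Completion time of each job on M3 for a permutation schedule."""
    prev = [0, 0, 0]  # completion time of the previous job on each machine
    done = {}
    for job in sequence:
        ready = 0
        for m, p in enumerate(times[job]):
            # A job starts on machine m only when the machine is free
            # and its preceding operation is complete.
            ready = max(ready, prev[m]) + p
            prev[m] = ready
        done[job] = ready
    return done

# Holding cost is $1 per unit of time each job spends in the system
# (all jobs are released at t = 0), i.e., the M3 completion time.
for seq in (["J1", "J2", "J3"], ["J1", "J3", "J2"]):
    costs = completion_times(seq)
    print(seq, costs, sum(costs.values()))
# ['J1', 'J2', 'J3'] {'J1': 9, 'J2': 14, 'J3': 17} 40
# ['J1', 'J3', 'J2'] {'J1': 9, 'J3': 12, 'J2': 18} 39
```

The per-job costs and the totals ($40 versus $39) match Table 2.3: the makespan-optimal sequence is not the holding-cost-optimal one.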
In this context, the "optimal" schedule shown in Figure 2.3 (minimum cost) is preferred to the "optimal" schedule shown in Figure 2.2 (minimum makespan).

2.6 CRITERIA FOR COMPARISON

In order to be considered "better" than Johnson's rule, the proposed market-based solution should be able to find the minimum cost solution to the benchmark problem shown in Figure 2.3 without resorting to exhaustive enumeration of all possible schedules. To be considered "much better" than Johnson's rule, the market-based solution should address the issues of scalability, non-determinism, and flexibility identified in Section 2.5.2.

CHAPTER 3: THEORETICAL FOUNDATIONS AND PRIOR RESEARCH

The planning system described in this thesis is built on theory and practice from a number of different disciplines, such as operations research, artificial intelligence, economics, and distributed artificial intelligence. In this chapter, relevant aspects of each of these disciplines are reviewed in an attempt to make the theoretical foundations of the system explicit and to situate the proposed system with respect to alternative approaches. In addition, the discussions of planning languages and economic theory set the stage for the more detailed descriptions in Chapter 4 of the market-based architecture developed in this thesis.

3.1 SCHEDULING

In this section, existing theory and practice in scheduling is examined from the perspective of two disciplines: operations research (OR) and artificial intelligence (AI). In both fields, a distinction is made between scheduling techniques that are guaranteed to provide optimal solutions (exact approaches) and techniques that do not provide guarantees of optimality but are computationally practical when applied to real-world problems (heuristic approaches). The distinction between exact and heuristic approaches is discussed in greater detail in Section 3.1.1.
The OR perspective on conventional and emerging scheduling techniques is reviewed in Section 3.1.2. Since much of the research in OR involves the specification of algorithms for standardized classes of scheduling problems, and since thousands of distinct problem classes have been identified and analyzed, the summary of the literature is necessarily superficial. The objective of Section 3.1.2.1 is merely to illustrate the basic hardness of scheduling problems and highlight some of the difficulties involved in creating algorithms that are both exact and computationally tractable. In Section 3.1.2.2, heuristic approaches based on dispatch rules and local search are reviewed.

In Section 3.1.3, the scheduling problem is revisited from the perspective of AI research on planning systems. Planning and scheduling are normally taken to be distinct activities in manufacturing (recall Figure 2.1 on page 14). However, in the more general case, scheduling can be seen as a special subset of planning in which the focus shifts from sequencing actions to sequencing actions in such a way as to optimize the use of time and resources [Georgeff, 1987], [Fox, 1987]. Since many of the general-purpose planning techniques developed in AI can be applied to scheduling problems, the broader AI planning literature is relevant. The review of AI planning is largely historical: it begins with the earliest formalisms and algorithms and proceeds to describe a series of increasingly complex planning systems. In all cases, the focus is on classical planning systems—systems based on deductive logic and theorem-proving techniques to reason about means and ends. In Section 3.1.4, a non-classical form of planning—decision-theoretic planning (DTP)—is examined. In contrast to the logical foundations of classical planning, decision-theoretic planning is based on the fundamental constructs of decision theory: utility and probability.
The overview in this chapter lays the groundwork for the more detailed evaluation of the relative merits of classical and decision-theoretic planning in Chapter 4. The section on AI planning methods concludes with a brief review of the constraint satisfaction problem (CSP) approach to scheduling. Constraint satisfaction is important in the context of scheduling for two reasons. First, as discussed in Section 2.3, commercial constraint-based planning systems are becoming increasingly common in manufacturing environments. Second, there has recently been dramatic success in combining theorem-proving concepts from classical planning with constraint propagation techniques from the CSP literature [Weld, 1998]. Related heuristic search techniques such as simulated annealing, tabu search, and genetic algorithms are also discussed.

3.1.1 EXACT VERSUS HEURISTIC APPROACHES

Regardless of whether a particular approach to scheduling evolved from research in OR, AI, or any other discipline, the fundamental issue is the conflict between the exactness of the solution and the computational feasibility of generating the solution in the first place. A useful distinction is often made in AI between the time required to generate a solution (the search cost) and the time required to execute the solution (the path cost) [Russell and Norvig, 1995]. The objective of a scheduler is to minimize the path cost—that is, the costs associated with execution of the schedule. Performance goals (such as minimizing makespan or maximum lateness) are functions of time. Consequently, the specific times at which actions start and end are critically important. However, when minimization of path cost is the stated goal, there is a tendency to de-emphasize search costs. This leads to what has been called the assumption (or perhaps fallacy) of calculative rationality.
Algorithms based on the assumption of calculative rationality perform an exhaustive search for a schedule that minimizes path cost. When the search space is large, however, the solution will come too late to be of any practical use [Jennings, Sycara and Woolridge, 1998]. Because of the conflict between path cost and search cost, scheduling research in both OR and AI has experienced a bifurcation between exact approaches (approaches that emphasize path cost and optimality) and heuristic approaches (approaches that seek a balance between path cost and search cost) [Fox, 1987]. As shown in Table 3.1, the scheduling research reviewed in this chapter can be situated into quadrants according to discipline and stance with respect to calculative rationality.

TABLE 3.1: A 2×2 matrix for classifying approaches to scheduling based on discipline and degree of exactness.

      Exact                               Heuristic
OR    Optimization techniques:            Heuristic approaches:
      • deterministic rules (e.g.,        • dispatch rules
        WSPT, Johnson's rule)             • bottleneck scheduling
      • integer programming               • search (local, branch and bound)
AI    Classical planning:                 Non-classical planning:
      • theorem-proving systems           • reactive planning
      • partial-order,                    • constraint-directed search
        least-commitment planners         • expert systems

3.1.2 SCHEDULING IN OR

OR as we know it today emerged from the Second World War as a distinct set of mathematical modeling and optimization techniques. Shortly following the war, these techniques were applied to the difficult problem of determining the optimal sequence of activities in manufacturing environments [Hillier and Lieberman, 1986], [Pinedo, 1995].

3.1.2.1 EXACT APPROACHES

The exact approach to scheduling in OR focuses on identifying generic classes of scheduling problems and developing tractable algorithms that are guaranteed to find optimal solutions for all instances of the class.
Informally, a tractable algorithm is one for which the amount of time required to attain a solution is a polynomial function of the size of the problem. An important contribution of OR scheduling research has been to partition the space of scheduling problems into those that are known to be easy (polynomial-time solution techniques exist) and those which are suspected of being hard (no polynomial-time solution techniques are known to exist). The latter set can be further subdivided into problems that are NP (nondeterministic polynomial) hard in the ordinary sense and those which are strongly NP-hard [Pinedo, 1995], [Garey and Johnson, 1979]. A standardized notation of the form α | β | γ has been developed to describe generic classes of scheduling problems. The first term, α, describes the number and configuration of machines in the scheduling environment. The most common machine configurations and their notation forms are listed in Table 3.2. The β term is used to specify zero or more processing characteristics of the scheduling environment. For example, the J3 | recrc notation describes a three-machine job shop in which a job may visit a machine more than once (recrc denotes "recirculation"). Finally, the γ term describes the objective function to be minimized. Different objective functions are used in an attempt to capture the different underlying cost structures of scheduling environments. For example, a common measure to be minimized is the sum of weighted completion times for each job j, Σ wjCj. An alternative measure to minimize is the lateness of each job, Lj, where lateness is defined as the difference between the job's completion time, Cj, and its due date, dj. In the first objective function, the primary source of cost is assumed to be associated with holding during production; in the latter objective function, the primary source of cost is assumed to be endogenously-determined penalties and rewards based on pre-specified due dates.
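To illustrate the two objective functions, the fragment below (Python; the completion times, weights, and due dates are invented for illustration and do not come from the benchmark) computes Σ wjCj and the maximum lateness Lmax for a hypothetical set of jobs:

```python
# Hypothetical completion times (C), weights (w), and due dates (d).
jobs = {
    "J1": {"C": 9,  "w": 1, "d": 10},
    "J2": {"C": 14, "w": 2, "d": 12},
    "J3": {"C": 17, "w": 1, "d": 20},
}

# Sum of weighted completion times: sum over all jobs j of w_j * C_j.
weighted_completion = sum(j["w"] * j["C"] for j in jobs.values())

# Lateness of job j is L_j = C_j - d_j (negative when a job is early);
# Lmax is the lateness of the latest job.
lateness = {name: j["C"] - j["d"] for name, j in jobs.items()}
l_max = max(lateness.values())

print(weighted_completion)  # 54
print(l_max)                # 2  (J2 finishes two time units after its due date)
```

Note how the weights let the first objective penalize delay on some jobs more heavily than others, while Lmax is driven entirely by the single worst job relative to its due date.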
TABLE 3.2: Common machine configurations and their standard notations (from [Pinedo, 1995]).
• Single machine (1): the simplest case: all jobs must visit the single machine for a single operation (e.g., the queue for a family physician).
• Identical machines in parallel (Pm): m functionally-identical machines are placed in parallel; a job j requires processing on one of the machines (e.g., a bank with m tellers).
• Flow shop (Fm): m different machines are placed in series; each job must be processed on each machine and the sequence of machines is the same for all jobs (e.g., an automobile assembly line with m stations).
• Job shop (Jm): the shop consists of m machines; the route taken by each job is fixed, but different jobs may have different routes (e.g., a classic manufacturing job shop with m machines).
• Open shop (Om): the shop consists of m different machines; the route taken by each job is open, that is, up to the scheduler (e.g., visiting m booths at a trade show).

The important thing to keep in mind when considering scheduling notation is that the goal is not to describe real-world problems; rather, the goal is to provide a compact and uniform way to describe the tractability of certain classes of problems. For example, Table 3.3 shows the computability of three different single-machine scheduling problems. The most obvious conclusion that one draws from examining Table 3.3 is that even seemingly minor changes to the processing characteristics or objective function of a problem can result in intractability—even in the single-machine case. Indeed, the class of scheduling problems that is considered tractable is surprisingly small. In the case of "open shop" and "job shop" configurations—which are common in practice—only the O2 | | Cmax and J2 | | Cmax problems are known to be solvable in polynomial time [Pinedo, 1995]. Even the benchmark problem introduced in Section 2.5 belongs to a class of problems (F3 | | Cmax) that is known to be strongly NP-hard.
In the case of the benchmark, it is the special structure of the problem instance that permits it to be solved in polynomial time using Johnson's rule. From the point of view of industrial-scale systems, exact approaches are of virtually no practical use due to the computational effort required to solve problems with more than a handful of jobs and machines. Although general optimization techniques such as integer programming (IP) can be used in practice to solve much larger problems, there is no guarantee that polynomial-time solution methods exist for all IP problems [Hillier and Lieberman, 1986].

TABLE 3.3: Classification of three classes of single-machine scheduling problems (from [Pinedo, 1995]).
• 1 | prec | Lmax: single-machine shop with precedence constraints between jobs (prec); the objective is to minimize the lateness of the latest job. Complexity: polynomial-time solvable.
• 1 | rj, prmp | Σ wjUj: single-machine shop in which each job is released to the system at a different time (rj) and jobs can preempt jobs already in progress (prmp); the objective is to minimize the sum of unit (i.e., fixed) penalties for late jobs. Complexity: NP-hard in the ordinary sense.
• 1 | rj | Lmax: single-machine shop in which each job is released to the system at a different time (rj); the objective is to minimize the lateness of the latest job. Complexity: strongly NP-hard.

3.1.2.2 HEURISTIC APPROACHES

As indicated in the preceding section, the range of scheduling problems for which provably-optimal polynomial-time algorithms exist is quite narrow. As a consequence, much of the applied work in OR has been directed at devising techniques that trade off guarantees of optimality for tractability. In this section, a number of heuristic approaches from OR are briefly reviewed.

1. Dispatch rules — One heuristic approach that is commonly used in scheduling practice is dispatch rules (or priority rules).
A dispatch rule is simply a means of determining which job to schedule on a machine when the machine becomes free. Many dispatch rules were originally developed as exact techniques for small problems. However, experience has shown that certain exact algorithms provide reasonably good results when used as approximations in larger, more complex problems. An example of a heuristic dispatch rule that is commonly used in practice is shortest queue at next operation (SQNO) (see [Pinedo, 1995, p. 143]). Under this rule, the jobs waiting for the free machine are examined and the one with the shortest queue at its next operation is selected. The rationale for the SQNO heuristic is that it helps to balance the queues at downstream machines and minimize starvation (a machine is said to "starve" when there are no jobs available for it to process). Note that since the rule does not take into account other issues such as job priority or due dates, it is clearly incapable of scheduling jobs such that overall costs are minimized.
2. Composite dispatch rules — One problem raised in Chapter 2 is that single-attribute objective functions may not accurately represent the complex multi-attribute objective function faced by the firm. To address the problem of dispatch rules that are too narrowly focused on one measure of quality, composite dispatch rules can be used. In a composite dispatch rule, several basic dispatch rules are combined into a single index number. Weights are applied to each basic rule to determine its contribution to the overall index, and simulation or experience can be used to fine-tune the weights for a particular scheduling environment (see [Pinedo, 1995, p. 145]).
3. Bottleneck scheduling — Another class of heuristics, which has been used to minimize the makespan of job shops (i.e., problems of the form Jm | | Cmax), is the shifting bottleneck heuristic [Pinedo, 1995].
The premise underlying the bottleneck heuristic and other bottleneck-based approaches such as Optimized Production Technology (OPT) is that the overall rate of flow through the production system is governed by a small number of critical resources (e.g., bottleneck machines) that determine overall system throughput. Once the bottlenecks are identified, all scheduling effort is directed at ensuring efficient use of the bottleneck resources only (e.g., [Goldratt and Cox, 1986]).
4. Local search — In addition to the domain-specific heuristics described above, a number of generic local search techniques have been applied to scheduling problems. In local search, a neighborhood of alternate solutions is associated with each candidate solution. For example, for a particular schedule, a neighborhood of alternatives could be defined as all the schedules that result from a single pairwise switch in the ordering of jobs at a machine. Local search proceeds by searching the neighborhood for an alternative that is better than the current solution. If one is found, it replaces the current solution, a new neighborhood is defined, and the search continues. If a search technique is guaranteed to find an optimal solution, then it is characterized as complete; in the more common case, incomplete search is used to find a solution that satisfies some minimum desirability criteria.
5. Meta-heuristics — An important issue in local search is the existence of local minima. A number of different approaches (i.e., meta-heuristics [Shaw, Brind and Prosser, 1996]) have been used to avoid premature commitment to a local minimum solution. For example, in simulated annealing, there is a non-zero probability of replacing the current solution with one that is worse. In tabu search, the current solution can be replaced by any neighbor that is not on an ever-changing list of forbidden transitions.
In genetic algorithms, random mutations are inserted into the solution strings. The non-determinism injected into the search by these techniques has been shown to reduce the problem of premature commitment to local minima and to minimize the risk of cycling over the same set of solutions (see [Russell and Norvig, 1995], [Pinedo, 1995]).

3.1.3 SCHEDULING AND CLASSICAL PLANNING IN AI

The AI community's interest in scheduling per se is relatively recent (e.g., [Fox, 1987]). However, scheduling can be seen as a special case of planning, and planning research dates back to the first "problem solving" systems in the mid-1950s (e.g., [Newell and Simon, 1956]) and to early robot planning systems (e.g., [Fikes and Nilsson, 1971]). The primary difference between a typical planning problem (such as stacking blocks so that they are in a desired order) and scheduling is the relative importance of search and path costs (recall Section 3.1.1 on page 30). In planning, the goal is to determine an ordered (or partially ordered) sequence of actions that achieves a goal. As in any problem-solving domain, the emphasis is placed on either minimizing search cost or finding a good solution within an upper bound on search cost. The exact timing of the executed actions is generally unimportant to the planner as long as the precedence constraints embodied in the plan are satisfied. In scheduling, many of the critical decisions (such as which actions to take, which resources to use, and so on) are predetermined. In this sense, scheduling is much easier than generalized planning. However, the importance of path cost in scheduling means that just any solution is not good enough—the best solution relative to a well-defined cost function is the desired outcome. The requirement (or at least desire) for optimality and the requirement to explicitly include path costs in the problem formulation make scheduling much harder than generalized planning.
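Before turning to planning formalisms, the local-search idea of Section 3.1.2.2 can be made concrete. The sketch below (Python; the data come from the Chapter 2 benchmark, and the function names are illustrative) runs a simple hill-climbing search over the pairwise-switch neighborhood, minimizing total holding cost:

```python
def completion_times(sequence, times):
    """Completion time of each job on the last machine of a flow shop.

    sequence: ordered list of job names; times: dict job -> (p1, p2, p3).
    """
    done = {}
    prev = [0, 0, 0]  # completion time of the previous job on each machine
    for job in sequence:
        ready = 0
        for m, p in enumerate(times[job]):
            ready = max(ready, prev[m]) + p
            prev[m] = ready
        done[job] = ready
    return done

def holding_cost(sequence, times):
    # $1 per unit of time each job spends in the system (released at t = 0).
    return sum(completion_times(sequence, times).values())

def local_search(start, times):
    """Hill climbing over the pairwise-switch neighborhood."""
    current = list(start)
    while True:
        # The neighborhood: every schedule reachable by switching two jobs.
        neighbors = []
        for i in range(len(current)):
            for k in range(i + 1, len(current)):
                n = list(current)
                n[i], n[k] = n[k], n[i]
                neighbors.append(n)
        best = min(neighbors, key=lambda s: holding_cost(s, times))
        if holding_cost(best, times) >= holding_cost(current, times):
            return current  # local minimum: no neighbor is strictly better
        current = best

benchmark = {"J1": (2, 3, 4), "J2": (4, 3, 5), "J3": (4, 1, 3)}
best = local_search(["J2", "J3", "J1"], benchmark)
print(best, holding_cost(best, benchmark))  # ['J1', 'J3', 'J2'] 39
```

On a three-job instance the search trivially reaches the minimum-cost schedule of Figure 2.3; on larger instances this greedy variant can stall at a local minimum, which is precisely the problem the meta-heuristics above are designed to mitigate.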
Despite these differences, the techniques used for planning can normally be adapted for use in scheduling. Within planning systems, a distinction can be made between on-line planners and off-line planners. In an on-line planning system, the agent reasons (at some level) about its decision problem before each action. Thus, each decision "epoch" (or period) includes both search costs and path costs. For an on-line planner to operate effectively in its environment, the time required for search must be small in relation to the time required to execute actions and perceive changes in the environment. For example, an on-line planner sent to get groceries from a grocery store would not be particularly effective if it required several hours of deliberation after selecting an item in order to decide which item on the grocery list to select next. In off-line planning, reasoning about actions is done beforehand and the agent has a complete plan before executing its first action. Although the amount of computation is ultimately the same regardless of whether a particular search technique is implemented on-line or off-line, the off-line approach permits a clear separation between search and path costs. For this reason, off-line planning has been the more widely used approach in AI scheduling research. The question of whether a planning system's output is guaranteed to achieve the agent's goal is a function of the problem formulation. If the planning environment is accessible (the agent knows all it needs to know to make decisions) and deterministic (the agent's actions and other events in the environment have known outcomes), then theorem-proving techniques can be used to generate a sequence of actions that is guaranteed to move the agent from its initial state to its goal state.
Such systems are known as "classical" planning systems and are reviewed in the following sections.4

2. In AI, systems that satisfy this assumption are described as being momentary because execution is assumed to occur within a single moment.
3. This assumes the deterministic case in which there is no advantage to be gained from observing the actual outcomes of actions during planning.
4. The classical approach is also described as "planning from first principles" because very general inference rules from deductive logic are used to select and sequence actions [Jennings, Sycara and Woolridge, 1998].

3.1.3.1 KNOWLEDGE REPRESENTATION

Classical planning is logic-based and is therefore inextricably linked to formalisms for representing and reasoning about knowledge. The two formalisms that are most relevant for a discussion of planning systems are propositional logic and first-order logic (alternatively, first-order predicate calculus). This section contains a brief review of both these logics as they apply to planning. For a more detailed introduction to the use of logic in AI, see [Russell and Norvig, 1995]. Propositional logic consists of constants and symbols that are taken to be either "true" or "false" in the world. For example, if the symbol P represents the proposition "it is raining", then P is true whenever it is raining and false otherwise. The symbols of propositional logic can be used to form atomic sentences (e.g., P) or combined using standard connectives to make complex sentences. For example, the sentence ¬P ∨ (Q ∧ R) ⇒ S can be interpreted as follows: if P is not true, or Q and R are both true, then S is true. Propositional logic also includes a number of inference rules for deductive reasoning about the truth value of sentences. For example, given that the two sentences P ⇒ Q and P are true, the modus ponens rule can be used to conclude that Q is also true.
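The propositional machinery above is simple enough to mechanize directly. The sketch below (the symbols, truth assignment, and knowledge-base encoding are invented for illustration) evaluates the example sentence ¬P ∨ (Q ∧ R) ⇒ S under one model and applies modus ponens to exhaustion:

```python
def implies(a, b):
    """Material implication: a => b is false only when a is true and b is false."""
    return (not a) or b

# A model assigns a truth value to each proposition symbol.
model = {"P": False, "Q": True, "R": True, "S": True}

# The example sentence: (not P) or (Q and R) => S
antecedent = (not model["P"]) or (model["Q"] and model["R"])
print(implies(antecedent, model["S"]))  # -> True

def forward_chain(implications, facts):
    """Apply modus ponens to exhaustion: from a => b and a, conclude b."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in implications:
            if a in derived and b not in derived:
                derived.add(b)
                changed = True
    return derived

# From P => Q, Q => R, and the fact P, modus ponens yields Q and then R.
print(sorted(forward_chain([("P", "Q"), ("Q", "R")], {"P"})))  # -> ['P', 'Q', 'R']
```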
First-order logic (FOL) is more expressive than propositional logic in that FOL provides a compact means of describing relations between objects. For example, the predicate On(A, B) is true whenever the object represented by symbol A is physically on top of the object represented by symbol B. In addition to relations, FOL permits the use of universal (∀) and existential (∃) quantifiers over variables in sentences. For example, the universal quantifier can be used to make general statements such as: "If the mother of x is Anne and the mother of y is also Anne but x and y are not the same person, then x and y are siblings." In FOL, this can be written:

∀x, y Mother(x, Anne) ∧ Mother(y, Anne) ∧ ¬(x = y) ⇒ Siblings(x, y)

In this way, a small number of facts and predicates can be combined via the inference rules of FOL to deduce a much larger set of facts and predicates.

3.1.3.2 THE SITUATION CALCULUS

First-order logic is sufficient for describing the state of the world. However, in planning systems, the objective is to describe changes in the world that result from events or the agent's own actions. The situation calculus [McCarthy and Hayes, 1969] provides a means of describing states, events, and outcomes using first-order logic. The primary advantage of the situation calculus is that the deductive inference rules of FOL remain valid and may therefore be used to reason about transitions from situation to situation. To represent changes in the planning environment, each relational predicate is nested within a special predicate Holds(f, s), where f is a propositional fluent (that is, a property that is either true or false in the world) and s is a particular situation or state. For example, in a "blocks world" environment the fluent On(A, B) may be true in one situation, but false in another situation in which the stacking order of the blocks has been reversed.
To represent the different situations in the situation calculus, two sentences can be written: Holds(On(A, B), S1) and Holds(On(B, A), S2). Since the situation in which each fluent is true is made explicit, the inconsistency between the sentences is eliminated (i.e., both sentences can be true at the same time as long as S1 ≠ S2). To represent actions, the special predicate Result(a, s) is used to describe the state that results when action a is performed in state s. Since the Result predicate returns a state, it is possible to represent any state in terms of a predecessor state and the actions applied to it. For example, the following sentence describes the result of executing the ReverseBlocks action on any two blocks x and y in any state s:

∀s, x, y Holds(On(x, y), s) ⇒ Holds(On(y, x), Result(ReverseBlocks(x, y), s))

In words: the property On(B, A) "holds" whenever the ReverseBlocks action is executed in any state in which On(A, B) holds. Thus, the new implied state (s′) is written in terms of the initial state and the result of an action (Result(ReverseBlocks(x, y), s)). Very complicated action descriptions (including preconditions) can be expressed using the Holds and Result constructs in combination with FOL. Given the formal representations of the situation calculus, it is possible to implement a planner using standard theorem-proving techniques. To illustrate, assume that the desired goal state for an agent is Holds(On(B, A), s) and that On(B, A) is not true in the initial state. The planner can search through the list of possible actions and find one that has On(x, y) as an effect and make the appropriate substitution of B for x and A for y.
For example, the action StackBlocks may be within the set of capabilities for the planner and defined using the situation calculus as:

∀s, x, y Holds(Clear(x), s) ∧ Holds(Clear(y), s) ∧ ¬(x = y) ∧ Holds(InGrip(x), s) ⇒ Holds(On(x, y), Result(StackBlocks(x, y), s))

The planner knows that if it can satisfy the preconditions of the action (i.e., that the two blocks are clear and that the block to be placed on top is in its grip) then it will achieve its goal. If the action's preconditions are not true in the current state, then the planner can push the preconditions onto its goal stack and set about finding an action that satisfies the preconditions. The planner repeats the process until the first precondition in the chain of action preconditions is satisfied by the initial state. This recursive process is called regression planning since the planner starts with the goal and works backwards towards the initial state. It is also possible to search forward from the initial state (progression planning). The relative merits of either search strategy depend on the structure of the planning problem being addressed [Russell and Norvig, 1995], [Georgeff, 1987]. Adoption of the situation calculus for planning systems has been hampered somewhat by the existence of the so-called frame problem. The frame problem occurs because action descriptions describe what changes in the world as a result of an action. However, the action descriptions do not describe what has not changed as a result of the action. In order to be able to reason effectively, frame axioms need to be appended to the action descriptions to make them complete [Reiter, 1991]. The computational load created by the need to reason about the action descriptions and frame axioms has, to this point, restricted the use of the situation calculus to relatively small systems [Pednault, 1989], [Georgeff, 1987].
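One way to see the Holds/Result machinery operationally is to encode situations as sets of fluents and actions as functions from situation to situation. The toy sketch below is such an encoding (not a theorem prover), and it sidesteps the frame problem by brute force, copying every unchanged fluent forward:

```python
def reverse_blocks(x, y, situation):
    """Encodes: Holds(On(x, y), s) implies
    Holds(On(y, x), Result(ReverseBlocks(x, y), s)).
    Frame axioms are handled by copying all other fluents forward unchanged."""
    if ("On", x, y) not in situation:
        raise ValueError("precondition On(%s, %s) does not hold" % (x, y))
    successor = set(situation)
    successor.discard(("On", x, y))
    successor.add(("On", y, x))
    return frozenset(successor)

def holds(fluent, situation):
    """Holds(f, s): fluent f is true in situation s."""
    return fluent in situation

s1 = frozenset({("On", "A", "B"), ("Clear", "A")})
s2 = reverse_blocks("A", "B", s1)   # plays the role of Result(ReverseBlocks(A, B), s1)

print(holds(("On", "B", "A"), s2))  # -> True
print(holds(("Clear", "A"), s2))    # unchanged fluents persist -> True
```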
3.1.3.3 THE STRIPS REPRESENTATION

The STRIPS representation [Fikes and Nilsson, 1971] was developed in the late 1960s in response to the perceived shortcomings of the situation calculus and the frame problem in particular. Although the STRIPS planning system was long ago superseded, elements of the approach—specifically, the STRIPS representation and the STRIPS assumption—remain in widespread use in the AI planning community. In the STRIPS representation, the current state is represented by a conjunction of atoms. Atoms are simply ground literals that can be either true or false. The literals are "ground" in the sense that STRIPS does not support reasoning about relationships and quantification like the situation calculus does. Although one is free to write a STRIPS state atom as On(Book, Table), the sentence has no more or less meaning than the simpler alternative OBT.5

5. To make a clear distinction between sentences in FOL and STRIPS, the latter are shown in a sans-serif font.

The action representation in STRIPS consists of three elements:

1. Action name: each action in STRIPS has a unique name. Although names that look like complex predicates are permitted, they are not interpreted as complex predicates. As such, the action name StackBlocks(A, B) could be represented by any other combination of symbols, such as StackAonB.

2. Precondition list: The preconditions for an action are represented as a conjunctive list of atoms. For example, the preconditions for the StackAonB action could be written: {ClearA, ClearB, InGripA}.

3. Effects list: The effects of an action are also represented as a conjunctive list of atoms. For example, the effects of StackAonB could be written: {OnAB, ¬ClearB}. The STRIPS assumption states that atoms that are not explicitly referred to in the effects list are assumed to be unaffected by the execution of the action.
By assuming invariance as the default, there is no need to add frame axioms stating that (for example) the color and the size of the blocks remain unaffected by the StackAonB action. Although the STRIPS representation is not as expressive as the situation calculus [Pednault, 1989], it has the virtues of being simple to understand and implement. Furthermore, as discussed in Chapter 4, the language can be extended in a straightforward manner to support rich representation of non-deterministic actions.

3.1.3.4 CLASSICAL PLANNING AND REFINEMENTS

The basic process of planning using theorem proving has been refined in a number of different ways to avoid unnecessary computational effort, decompose the problem into smaller, manageable pieces, and account for unexpected outcomes in plan execution. For a more detailed overview of advances in AI planning see [Weld, 1998], [Russell and Norvig, 1995], [Weld, 1994], [Georgeff, 1987].

1. Partial order planning: In partial order (or least commitment) planning, the planner refrains from specifying a fully-ordered sequence for actions until it is forced to by a binding precedence constraint. In all other cases, it simply maintains a list of partial orderings and causal relationships between actions. If a particular combination of actions in a particular sequence creates conflicts or inconsistencies, the sequence of the offending actions is fixed to make them consistent. To illustrate, consider the process of getting dressed in the morning. Certain actions, such as PutOnSocks and PutOnShoes, are subject to a precedence constraint. As such, the partial order planner must enforce the sequence: PutOnSocks→PutOnShoes. However, it is immaterial whether one puts on one's shirt before or after putting on one's socks and shoes. Indeed, there is nothing inconsistent with the sequence PutOnSocks→PutOnShirt→PutOnShoes.
As such, a partial order planner would leave the groups of actions PutOnShirt and PutOnSocks→PutOnShoes unordered but specify the ordering of the latter two actions.

2. Hierarchical planning: Execution of an action such as StackAonB by a robot could involve a large number of detailed steps. For example, the robot would have to move its arm to Block A, open its grip, close its grip on the block, lift the block, and so on. An action such as CloseGrip could also be further decomposed into fine-grained actions including sensing the status of the gripper, sending signals to the motor, and so on. From a computational point of view, increasing the number of actions can lead to an exponential increase in the complexity of the search [Russell and Norvig, 1995]. As such, it is advantageous to create a hierarchy of abstract actions that encapsulate more primitive actions. If such a hierarchy can be constructed, then the planner can plan using a relatively small number of abstract actions and then work out the specifics of the finer-grained actions independently of other fine-grained actions located under different branches of the hierarchy.

3. Conditional planning: Conditional planning addresses the fact that plan execution is not flawless. The agent may execute an action and find itself in a different state than expected. For example, when executing the PutOnShoes action, the agent may break a lace and be unable to complete the action. Conditional planners attempt to enumerate all possible contingencies and generate a sequence of actions for each. Of course, conditional planning presupposes the agent's ability to sense the current state of the world. If, for example, the agent broke a lace and correctly recognized the outcome, it could execute an alternate branch of actions to deal with the contingency.
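The STRIPS representation described earlier maps naturally onto simple data structures. The sketch below is a hypothetical encoding, reading the precondition list as {ClearA, ClearB, InGripA} and splitting the effects list into the conventional add and delete lists; it checks applicability of StackAonB and progresses a state under the STRIPS assumption:

```python
from collections import namedtuple

# A STRIPS action: a conjunctive precondition list plus an effects list,
# split here into add and delete lists (the atoms follow the StackAonB
# example in the text).
Action = namedtuple("Action", ["name", "preconditions", "add", "delete"])

stack_a_on_b = Action(
    name="StackAonB",
    preconditions={"ClearA", "ClearB", "InGripA"},
    add={"OnAB"},
    delete={"ClearB"},   # the effect written as "not ClearB" in the text
)

def applicable(state, action):
    """An action is applicable when all of its precondition atoms hold."""
    return action.preconditions <= state

def progress(state, action):
    """The STRIPS assumption: atoms not named in the effects persist."""
    if not applicable(state, action):
        raise ValueError("preconditions not satisfied")
    return (state - action.delete) | action.add

initial = {"ClearA", "ClearB", "InGripA", "OnTableB"}
result = progress(initial, stack_a_on_b)
print(sorted(result))  # -> ['ClearA', 'InGripA', 'OnAB', 'OnTableB']
```

A regression planner works with the same structures in reverse, pushing unsatisfied preconditions onto a goal stack, as described in Section 3.1.3.2.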
Although classical planning systems such as STRIPS work well for small "blocks world" problems, such systems do not scale well to larger problems. One important consideration is the amount of search that might be required to find a complete plan. If the number of actions available to the agent is large, or some lead to dead ends or loops (e.g., the ReverseBlocks action above), the amount of time required for the planner to satisfy the goal (assuming the goal is satisfiable) could be considerable. Indeed, Chapman [1987] has shown that planning—regardless of method—is intractable.

3.1.4 DECISION-THEORETIC PLANNING

A second problem with classical planning approaches is that the agent is assumed to have perfect information about its state and the effects of its actions. These assumptions are relaxed somewhat in conditional planning since the agent is given a contingency plan—a mapping from situations to actions for every situation in which the agent can find itself. Unfortunately, a conditional plan is not sufficient for an agent in an uncertain environment. To illustrate, consider the simple mobile robot scenario described in [Dean et al., 1993] and reproduced in Figure 3.1. A mobile robot is navigating a two-dimensional space in an effort to reach its goal (a charging station) at coordinates (1, 4). The charging station is adjacent to the open end of a stairwell (which the robot is not equipped to navigate). The remaining three sides of the stairwell are closed off by a railing which the robot cannot breach. The task of the planner is to determine the robot's best action in any state in which it might find itself.

[Figure 3.1 appears here. Legend: charging station (high positive utility); stairwell (high negative utility); railing around the stairwell; robot action for each state.]

FIGURE 3.1 An example of an optimal contingency plan for a probabilistic planner: The shaded area in the center of the two-dimensional space is a stairwell.
Clearly, the robot's best action in any state is contingent on the robot's location with respect to the charging station and the stairwell. A conditional planner would provide the robot with a contingency plan in the form of a mapping from states (in this case, x, y coordinates) to actions (such as MoveNorth, MoveSouth, and so on). However, what a conditional planner does not do is provide the agent with a means of reasoning about how its own actions affect the states in which it finds itself. In any real-world environment, an agent's actions are non-deterministic. For example, the agent may execute the MoveWest action but, due to some mechanical problem, end up instead moving one unit to the south with probability p. If the agent were to execute the non-deterministic MoveWest action in location (2, 4), it would achieve its goal with probability (1 − p) and fall down the stairs and be smashed with probability p. In such circumstances, it is important that the agent be able to take into account not only the possibility that its actions are imperfect, but also the potential consequences of all possible outcomes.

3.1.4.1 PROBABILISTIC REPRESENTATIONS OF ACTIONS

A natural way of reasoning about the desirability of uncertain outcomes is decision theory. A decision-theoretic (or probabilistic) planner assigns utilities to states of the world and uses its knowledge of probabilistic outcomes to select actions such that its expected utility is maximized. For example, assume that the mobile robot in Figure 3.1 associates a utility of 10 with reaching the charging station and a utility of −100 with falling down the stairwell. Assume as well that although the agent does not know the exact outcome of any of its actions, it knows the probability that a particular action executed in a particular situation will result in a particular outcome.
For instance, each action could have a probability distribution over outcomes similar to that shown for MoveWest in Table 3.4 below. TABLE 3.4: The probability of each outcome for the MoveWest action. Effect Probability West 0.80 No change 0.10 North 0.05 South 0.05 The utility of being in a particular state is therefore the reward (or penalty) associated with the state, plus the utility the agent can expect to receive from executing an action in the state. Returning to the example of location (2j 4) in Figure 3.1 and the probabilities in Table 3.4, the immediate reward real-ized by the robot by being in state (2, 4) is zero (in this problem formulation, only the stairwell and charging states have utilities associated with them). When contemplating whether to execute the MoveWest action in an effort to reach the very attractive state to the West (the charging station), the robot must also consider the 10% chance that it will remain in the same zero-reward state and the 5% chance that it will tumble down the stairwell. 44 To express the elements of a decision-theoretic planning problem formally, the following notation is used: • S is the finite set of mutually exclusive states that describes the agent's environment. State s is an element of the state space (s 6 S). In Figure 3.1, the state space is defined as a two-tuple (x, y) describing the robot's location on a 4 X 5 grid. • U(s) is the utility to the agent of being in state s. In the case in which the agent is risk-neu-tral, utility is a linear function of the monetary value associated with the state, V(s) and the terms "utility" and "value" are interchangeable. • r(s) is the immediate reward the agent receives for being in state s. For example, if the mobile robot moves into the location containing the charging station, it receives an immediate reward of 10 units of utility. O f course, rewards can also be negative: the states corresponding to the stairwell have large negative rewards. 
• a ∈ A_s denotes that an action a belongs to the set of feasible actions A_s associated with state s. Not all actions are feasible in every state. For example, whenever the robot is in a location such as (2, 1) bounded on the North by the stairwell railing, it cannot execute the action MoveNorth. In other words, MoveNorth ∉ A_(2,1).

• c(a, s) is the cost of executing action a in state s. The inclusion of action costs in the formulation permits the costs and benefits of decisions to be compared directly. For example, a cost of one unit of utility may be charged to the robot whenever it moves to a new location. In the mobile robot context, the action cost represents the drain on the robot's battery as it moves around. As with negative rewards, it is possible to have negative costs (i.e., execution of the action creates an increase in the agent's utility).6

• β is a constant between 0 and 1 that is used to discount costs and rewards in the future. A value of zero means that the agent fully discounts the future and considers immediate rewards only. Conversely, a β value of 1 means that the planner treats future costs and rewards and immediate costs and rewards identically. In many infinite horizon problems, the discounting factor is used to capture the intuition that a unit of utility today is worth more than a unit of utility in the future.

• P(j | a, s) is the probability of moving to state j after performing action a in state s. Essentially, this defines a transition probability matrix that contains the probability of moving between any two states in the state space as a consequence of executing action a. A complete action representation for a particular planning problem requires a |S| × |S| matrix for every action.

6. Although the semantics of negative costs and rewards can be confusing, the terms are used throughout this thesis to remain consistent with the standard DTP nomenclature.
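The action representation just described can be written down directly as data. The sketch below encodes Table 3.4's outcome distribution for MoveWest and converts it into P(j | a, s) for the robot at (2, 4); the coordinate displacements are my assumed reading of the grid's orientation, which Figure 3.1 does not pin down:

```python
# Table 3.4 as a distribution over the qualitative outcomes of MoveWest.
move_west = {"west": 0.80, "no change": 0.10, "north": 0.05, "south": 0.05}

assert abs(sum(move_west.values()) - 1.0) < 1e-9  # a proper distribution

# Assumed (x, y) displacement for each outcome (grid orientation is a guess).
step = {"west": (-1, 0), "no change": (0, 0), "north": (0, 1), "south": (0, -1)}

def transition(state, outcomes):
    """Return P(j | a, s) as a dictionary over successor states j."""
    x, y = state
    dist = {}
    for outcome, p in outcomes.items():
        dx, dy = step[outcome]
        j = (x + dx, y + dy)
        dist[j] = dist.get(j, 0.0) + p  # outcomes can share a successor
    return dist

print(transition((2, 4), move_west))
# -> {(1, 4): 0.8, (2, 4): 0.1, (2, 5): 0.05, (2, 3): 0.05}
```

Stacking one such row per state yields the |S| × |S| matrix for the action; states where an outcome is infeasible (e.g., bumping into the railing) would fold that probability back into "no change".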
Given this notation, it is possible to define the utility of being in a particular state as the immediate reward for being in the state, less the cost of executing the chosen action in the state, plus the discounted expected value of the action's outcome. Assuming the risk-neutral case:

U(s) = V(s) = r(s) − c(a, s) + β Σ_{j ∈ S} P(j | a, s) V(j)    Eq: 3.1

Since the value of each state can depend on the value of all other states, a separate equation must be written for each state in the state space. The |S| equations in |S| unknowns can be solved as a set of simultaneous equations or using iterative techniques such as value iteration [Puterman, 1994].

3.1.4.2 FINDING OPTIMAL POLICIES

A policy, π, is defined as a mapping π: s → a ∈ A_s for all s ∈ S. The question for a decision-theoretic planner is how to select actions such that the expected utility of the policy is maximized. Fortunately, this formulation of the decision-theoretic planning problem maps to a well-known class of discrete, finite state, history independent, fully-observable Markov Decision Problems (MDPs) [Boutilier, Dean and Hanks, 1999], [Boutilier and Dearden, 1996], [Boutilier, Dearden and Goldszmidt, 1995], [Puterman, 1994], [Dean et al., 1993]. One means of solving this type of MDP is policy iteration [Howard, 1960]. Policy iteration consists of two stages per iteration. In the evaluation stage, the value of each state in the state space is determined using the series of equations induced by Equation 3.1. In the refinement stage, the policy is improved on a state-by-state basis by replacing the current action with a better action (if one exists). The evaluation and refinement procedures are repeated until the refinement results in no changes being made to the policy. An important feature of the MDP formulation is that policy iteration is guaranteed to converge on the optimal policy, π*, in polynomial time [Puterman, 1994].
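As an end-to-end illustration of Equation 3.1 and the policy iteration scheme just described, the sketch below solves a toy five-state corridor MDP. The dynamics, rewards, discount, and action cost are invented for illustration, and the evaluation stage approximates the simultaneous equations by repeated substitution sweeps rather than a direct solve:

```python
# A toy corridor MDP: state 0 is a pit (large negative reward), state 4 is
# the goal (positive reward); both are absorbing. Interior moves cost one
# unit of utility and succeed with probability 0.8.
STATES = range(5)
ACTIONS = ["left", "right"]
BETA = 0.9       # discount factor (beta in the text)
MOVE_COST = 1.0  # c(a, s) for interior states

def reward(s):
    """r(s): the immediate reward for being in state s."""
    return {0: -100.0, 4: 10.0}.get(s, 0.0)

def P(j, a, s):
    """Transition probability P(j | a, s); terminal states are absorbing."""
    if s in (0, 4):
        return 1.0 if j == s else 0.0
    target = s - 1 if a == "left" else s + 1
    if j == target:
        return 0.8
    if j == s:
        return 0.2
    return 0.0

def evaluate(policy, sweeps=200):
    """Approximate the solution of Equation 3.1's simultaneous equations
    by repeated substitution sweeps under a fixed policy."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            c = 0.0 if s in (0, 4) else MOVE_COST
            V[s] = reward(s) - c + BETA * sum(
                P(j, policy[s], s) * V[j] for j in STATES)
    return V

def policy_iteration():
    """Alternate evaluation and refinement until the policy is stable."""
    policy = {s: "left" for s in STATES}
    while True:
        V = evaluate(policy)
        improved = dict(policy)
        for s in STATES:
            if s in (0, 4):
                continue  # no decision to make in absorbing states
            # r(s) and c(a, s) are action-independent here, so comparing
            # the expected next-state value suffices for the argmax.
            improved[s] = max(
                ACTIONS, key=lambda a: sum(P(j, a, s) * V[j] for j in STATES))
        if improved == policy:
            return policy, V
        policy = improved

pi, V = policy_iteration()
print(pi)  # interior states should all choose "right", toward the goal
```

Starting from an all-"left" policy, one refinement pass flips every interior state to "right" (the pit's huge negative value dominates the expectation), and the next pass makes no change, so the loop terminates with the optimal policy.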
The details of the policy iteration algorithm are summarized in Algorithm 3.1.

ALGORITHM 3.1: POLICY ITERATION
1. π′ ← any policy on S
2. While π ≠ π′:
   a. π ← π′
   b. For all s ∈ S, find Vπ(s) using Equation 3.1
   c. For all s ∈ S, evaluate a*, where
      a* ∈ argmax_{a ∈ A_s} [ r(s) − c(a, s) + β Σ_{j ∈ S} P(j | a, s) V(j) ]    Eq: 3.2
   d. If π(s) ≠ a*, then π′(s) ← a*
3. Return π

3.1.4.3 DTP OBSERVATIONS

There are two observations that can be made about DTP at this point. First, taking into account many different costs and rewards can lead to complex behaviors that may be unintuitive. For example, consider the case of the mobile robot in (2, 4). The value to the robot of executing the MoveWest action is less than one might think due to the 5% chance of falling down the stairwell and the large negative reward associated with this outcome. Indeed, the rational course of action might be to avoid the open end of the stairwell to the greatest extent possible. Thus, if the robot started in a state on the East side of the state space, it might expend extra effort to travel around the South side of the stairwell (which has a protective railing) even though an action cost is incurred for each movement. This policy is shown by the arrows in Figure 3.1. More generally, an important advantage of the decision-theoretic framework is that it provides a means of making trade-offs between different aspects of the agent's planning environment in the face of uncertainty.

1. Multiple goal states — Unlike classical planning, a decision-theoretic planner can reason about multiple goal states. In some cases (e.g., the charging station and the stairwell) the goals can be conflicting.

2. Multi-attribute goals — Since goals are represented by utility values, multi-attribute utility functions can be used to express complex relationships between states and the agent's assessment of the state's utility.

3.
Standard unit of measure — All costs and benefits faced by the agent are expressed in the same units of utility.

The second observation about decision-theoretic planning is that solution via MDP techniques such as policy iteration requires explicit enumeration of the state space. Unfortunately, the size of the state space is exponential in the number of attributes used to represent the planning problem. To illustrate, consider the following extension to the mobile robot planning problem in Figure 3.1: after some experimentation, it is determined that the robot is more reliable on a scuffed floor than on a recently polished floor. Thus, on a scuffed floor, the probability of staying in the same position given an action is 0.025 instead of 0.10. The Markovian assumption on which policy iteration is based requires that all the information needed to make a decision be encoded in the state definition. Thus, the state definition would have to be expanded to a three-tuple (x, y, f), where f describes the condition of the floor. This simple change doubles the size of the state space from 20 to 40 states.

3.1.4.4 THE CURSE OF DIMENSIONALITY

In general, a problem consisting of d propositional state variables induces a state space of size |S| = 2^d states. The exponential growth in problem size is known as the curse of dimensionality because it greatly limits the usefulness of the conventional MDP formulation in practice. Thus, although policy iteration can be used to compute the optimal policy in an amount of time that is polynomial in the number of states, the number of states is exponential in the complexity of the domain description. The net result is therefore intractability for all but the smallest problem instances. A number of approaches have been proposed in the literature to cope with the curse of dimensionality.
For example, [Dean et al., 1993] restrict the planner's attention to areas of the state space ("envelopes") that are likely to be encountered as the agent moves towards its goal state. In a similar approach, [Barto, Bradtke and Singh, 1995] interleave planning and execution in order to use information about the current state to restrict the planner's attention to likely future states. [Boutilier, Dearden and Goldszmidt, 1995] employ the concept of "abstract states" to reduce the effective number of states that are visited by the policy iteration algorithm. The notion of abstract states is based on the observation that not all state variables are relevant in all states. For example, if the mobile robot in Figure 3.1 has to execute the action GetCharged while at the charging station to get the reward, then only those state variables that affect the probability distribution over action outcomes or the utility of the state itself are relevant. In this example, the condition of the floor is irrelevant once the robot is in the abstract state (1, 4, •). The foundations of DTP and strategies for avoiding the curse of dimensionality are discussed in greater detail in Section 4.2.

3.1.5 CONSTRAINT-BASED PLANNING

Given the complexity of real-world manufacturing environments, it is sometimes better to abandon the strategy of focusing on narrow objectives (such as reducing makespan or eliminating tardiness) and reframe the task of allocating resources to jobs as a constraint satisfaction problem (CSP). In a CSP, the objective is not to find an optimal solution. Instead, the objective is to consider a very large space of possible solutions and find one that satisfies a number of domain-specific constraints. To achieve this end, the constraints are used to guide the search for solutions and eliminate large areas of the search space.
For example, if it is known that a drilling operation must be followed by a deburring operation, then all regions in the state space in which the precedence constraint is violated (i.e., deburring precedes drilling) can be eliminated from further consideration. One of the first and best-known constraint-based systems for scheduling was the ISIS system (and its successors) developed by Mark Fox [Fox, 1994], [Fox, 1987]. ISIS was used for production planning in a Westinghouse Electric turbine factory. At the time of the study, there were as many as 200 shop orders in progress, and each order consisted of ten or more operations. One of Fox's key observations about the production environment is that schedulers spend only a small fraction of their time (10% to 20%) dealing with issues such as due dates, process routings, and machine availability. The rest of the schedulers' time is spent resolving other issues such as the availability of tools and fixtures, machine breakdowns, setup time reduction, and a multitude of soft preferences and constraints. In ISIS, the term "constraint" is defined broadly to include many different types of domain knowledge such as organizational goals, physical limitations and capabilities, causal relationships, resource availability, and preferences. A frame-based knowledge representation language called SRL is used to express the constraints and their interrelationships. The constraint-representation language also includes constructs such as priority (for determining which constraints are more important), the utility of the current attribute value, possible relaxations of the constraint, and interactions between constraints. In addition to using constraints to limit the search space, ISIS uses hierarchy to partition the problem into a series of levels. In Level 1 search, orders are scheduled according to priorities and due dates.
In Level 2, capacity analysis is performed to determine the earliest start and latest finish times for each order. In Level 3, specific resources other than machines are considered and time bounds for each resource are created. In Level 4, the "partial order" specifications from Level 3 are fully ordered with the objective of minimizing work-in-progress time. In this sense, the decomposition of the manufacturing problem into levels mirrors the conventional decomposition from Figure 2.1 on page 14. [Zweben and Fox, 1994] contains a collection of papers describing several applications of CSP-based methods for scheduling. For example, the "decision-theoretic scheduler" (DTS) described in [Hansson and Mayer, 1994] uses decision-theoretic concepts (utility and probability) to guide the selection of heuristic functions for searching the space of possible solutions. The expected utility of each heuristic is used to make trade-offs between evaluation functions. [Kumar, 1992] contains a review of other techniques for local search and constraint propagation.

3.1.6 RECENT ADVANCES IN PLANNING

There have been a number of recent advances in AI planning since this thesis was started (see [Weld, 1998] for a review). For example, the BLACKBOX planner is capable of solving a 105-action planning problem over 14 time steps with 10^16 possible states in six minutes [Kautz and Selman, 1998]. Like many of the newer planning systems, BLACKBOX is based on the synthesis of classical planning techniques and constraint-directed search techniques.

3.2 MARKETS AND EQUILIBRIUM

The market-based approach to problem solving is very different from the monolithic scheduling and planning techniques described in the preceding sections. In market-based approaches, solutions emerge from the interaction of many individual agents. From a designer's point of view, the critical issues are the quality and stability of the market outcome.
Fortunately, there is a large body of economic theory that addresses precisely these issues. The basic problem investigated in economics is the "allocation of scarce means to satisfy competing ends" [Becker, 1976]. The fundamental building blocks used in economics to represent and reason about allocation problems are agents and markets. According to Becker, the economic approach reduces to three assumptions about agents and markets:

1. agents have stable, well-defined preferences,
2. agents exhibit maximizing behavior, and
3. markets exist to facilitate the allocation of resources among agents.

The first two assumptions concern the rationality of the agents that participate in the economic system. The third assumption concerns the provision of markets and the emergence of equilibria in the markets. In the following sections, the theoretical foundations of microeconomics are briefly reviewed. In Section 3.2.1, the term "rationality" is defined and illustrated with a simple model. In Section 3.2.2, the focus shifts from a normative model of individual behavior to a descriptive model of collective behavior. The concept of competitive equilibrium is reviewed through the use of a two-agent, two-good Edgeworth box economy. The objective of the section on equilibrium is to illustrate the concept of Pareto optimality and the stability of Pareto optimal outcomes. A shortcoming of standard economic models of equilibrium is that they rely on the assumption of a competitive market. In the markets used in this thesis, there is generally insufficient market depth for an equilibrium price to be discovered through tâtonnement.7 As a consequence, an alternate price discovery mechanism must be used, and Section 3.2.3 contains a short summary of auction theory. Different auction forms are reviewed and equivalence relationships between forms under certain conditions are emphasized.
The section on auction theory is included for two reasons: first, it shows how certain auctions can be relied on to yield a Pareto efficient outcome; second, it provides a theoretical justification for the use of a continuous reverse auction in Section 4.3.2.

3.2.1 RATIONALITY

According to the standard economic and decision-theoretic definitions of rationality, a rational agent is one that maximizes its expected utility subject to a consistent set of beliefs and desires [Von Neumann and Morgenstern, 1944]. Under this formulation, beliefs are represented using probability distributions and therefore a consistent set of beliefs is one that satisfies the basic axioms of probability theory. For example, a rational agent cannot believe that it will rain with a probability of 0.6 and also believe that it will not rain with a probability of 0.6. An agent's desires are represented using real-valued utility functions and therefore a consistent set of desires must satisfy the standard axioms of utility theory (e.g., transitivity, continuity, and monotonicity) [Clemen, 1996], [Holloway, 1979]. A model of rational decision making is shown in Figure 3.2. An agent starts with a belief (expressed as a probability) that it is in a particular state. As in Section 3.1.3, the assumption is often made in practice that the agent is in an accessible (or fully observable) environment and that its beliefs about its state are certain. Given that the agent is situated in a world of infinite detail and complexity, the assumption of accessibility is difficult to justify. However, the assumption is often critical because partially observable problems are much more complex mathematically [Littman, Kaelbling and Cassandra, 1995]. The compromise stance adopted here is that the state of the agent is not fully known, but that the most relevant features (such as the n-tuple descriptions of the states from the robot example in Section 3.1.4) are known with adequate certainty.
In order to select an action to execute in a state, the agent draws on a second type of belief: belief about causality. Since actions are non-deterministic, an agent does not know a priori what the precise outcome of an action will be. However, the agent does have beliefs in the form of a probability distribution over outcomes for each action in each state. Moreover, each possible outcome is associated with a known payoff for the agent and each payoff in turn induces a utility. In mapping payoffs to utility, the agent may be risk averse, risk neutral, or risk seeking. In the case in which the agent is risk neutral and all payoffs are expressed in monetary terms, the maximization of expected utility is equivalent to the maximization of expected monetary value (EMV).

7. Tâtonnement, the iterative process of signals and price adjustments, was originally described by Leon Walras in the 1800s [Cheng and Wellman, 1998].

FIGURE 3.2 A model of rational decision making under uncertainty: Agents select actions in states in order to maximize their expected utility over some decision horizon. To do this, agents require different kinds of knowledge about their state, the outcomes of actions, and their own preferences regarding outcomes.

It is important to note that the fundamental constructs in the economic/decision-theoretic definition of rationality (maximization of expected utility over some decision horizon given an information set) are identical to those of the MDP formulation of agent planning in Section 3.1.4. That is, the MDP formulation provides a computational method of implementing the model of rationality shown in Figure 3.2.
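The decision rule in Figure 3.2 can be sketched directly in code: pick the action whose probability-weighted payoff is highest. The states, actions, transition probabilities, and payoffs below are invented for illustration, and the agent is assumed to be risk neutral, so utility equals monetary payoff.

```python
# Expected-utility action selection for a risk-neutral agent.
# All probabilities and payoffs are invented illustrative numbers.

# beliefs about causality: P(outcome | state, action)
transition = {
    ("low_battery", "GoCharge"):    {"charged": 0.9, "stranded": 0.1},
    ("low_battery", "KeepWorking"): {"done": 0.4, "stranded": 0.6},
}

# payoff of each outcome; risk neutrality makes utility equal to the payoff
payoff = {"charged": 10.0, "done": 25.0, "stranded": -50.0}

def expected_utility(state, action):
    """Probability-weighted payoff of an action in a state."""
    return sum(p * payoff[o] for o, p in transition[(state, action)].items())

def best_action(state, actions):
    """Rational choice: maximize expected utility given current beliefs."""
    return max(actions, key=lambda a: expected_utility(state, a))

print(expected_utility("low_battery", "GoCharge"))     # 0.9*10 - 0.1*50 = 4.0
print(expected_utility("low_battery", "KeepWorking"))  # 0.4*25 - 0.6*50 = -20.0
print(best_action("low_battery", ["GoCharge", "KeepWorking"]))  # GoCharge
```

Swapping the identity mapping from payoff to utility for a concave (risk-averse) or convex (risk-seeking) function changes the chosen action without changing the beliefs, which is exactly the distinction drawn in the text.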
It is equally important to note, however, that a distinction is sometimes made between decision-theoretic (or thin) rationality and a broader notion of rationality that requires not only that beliefs be consistent with respect to the axioms, but also that the beliefs be grounded in available evidence [Elster, 1983]. The precision and accuracy of an agent's beliefs are important when one considers the question of optimal decision making. A course of action which leads to maximization of expected utility by an agent can be considered optimal, but only with respect to the agent's information set. Thus, although a thinly rational course of action is first-best, it is generally possible to do better by improving the degree to which the agent's beliefs reflect the reality of its environment. This corresponds to the concept of fineness: a rational agent can expect to do at least as well and possibly better with finer information [Marschak and Radner, 1972]. In the extreme case, the agent's information about cause and effect is so fine that each conditional probability is equal to 1.0. On the other hand, there are costs associated with gaining finer and better information. In the formulation of rationality shown in Figure 3.2, the costs and benefits of better information are not considered.

3.2.2 EQUILIBRIUM

In systems consisting of more than one rational agent, there is often conflict between the agents' individual goals. In a pure market environment, inter-agent conflict manifests itself in a single form: contention for scarce goods. The issue addressed in equilibrium analysis is how multiple rational agents arrive at a solution, that is, how they divide multiple goods amongst themselves. Since markets contain no element of centralized control, an equilibrium outcome (if one exists) evolves solely through the self-interested behavior of market participants.
Formally, a competitive (or Walrasian) equilibrium occurs when the demand for goods exactly equals the supply of goods in an economy. To illustrate the basic concepts of equilibrium when multiple goods are being exchanged simultaneously (i.e., general equilibrium), the simplest case of an Edgeworth box economy is used.8 The concepts and theories introduced in the Edgeworth box analysis in the following sections are revisited in Chapter 5 to support conclusions about the efficiency of certain market mechanisms.

3.2.2.1 INDIVIDUAL CONSUMERS AND BUDGETS

The critical feature of a market economy is that the goods that an agent may wish to acquire are available for trade with other goods at known rates of exchange [Mas-Colell, Whinston and Green, 1995]. To understand how agents make trade-offs between goods, consider the case of a single agent i in a two-good economy. The agent has an initial endowment of each good, ω_i = (ω_1i, ω_2i), and the objective of the agent is to find a bundle x_i* = (x_1i*, x_2i*) that maximizes its utility. In seeking to maximize its utility, the agent is subject to the constraint that its wealth is fixed. The agent's wealth, w_i, is simply defined as its initial endowment of goods multiplied by the market prices of those goods: w_i = p · ω_i = p_1 ω_1i + p_2 ω_2i, where p is a vector of prices for the L goods in the economy: p ∈ ℝ_+^L.

8. The material on general equilibrium and Edgeworth boxes can be found in most microeconomic texts. The terminology and notation used here are from [Mas-Colell, Whinston and Green, 1995].

FIGURE 3.3 The Walrasian budget set: In (a) the budget line delimits the set of feasible consumption bundles for agent i subject to the agent's initial endowment ω_i and the prices of the goods. The agent's most preferred bundle within its budget set, x_i*, is shown by the right-most indifference curve in (b).
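The wealth and budget-set definitions above reduce to a pair of dot products; the following sketch makes them concrete. The prices and endowment are arbitrary illustrative numbers, not values from the thesis.

```python
# Walrasian budget check for a two-good economy.
# Prices and the endowment below are invented for illustration.

p = (2.0, 1.0)        # prices of Good 1 and Good 2
omega_i = (3.0, 4.0)  # agent i's initial endowment

def wealth(prices, endowment):
    """w_i = p . omega_i : the value of the initial endowment at market prices."""
    return sum(p_l * w_l for p_l, w_l in zip(prices, endowment))

def affordable(bundle, prices, endowment):
    """A bundle lies in the budget set iff p . x_i <= p . omega_i."""
    cost = sum(p_l * x_l for p_l, x_l in zip(prices, bundle))
    return cost <= wealth(prices, endowment)

print(wealth(p, omega_i))                  # 2*3 + 1*4 = 10.0
print(affordable((1.0, 8.0), p, omega_i))  # True:  2*1 + 1*8 = 10 <= 10
print(affordable((4.0, 5.0), p, omega_i))  # False: 2*4 + 1*5 = 13 > 10
```

Bundles for which the cost exactly equals wealth lie on the budget line of Figure 3.3(a); bundles strictly inside the set cost less than the endowment is worth.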
The Walrasian budget set denotes the set of feasible consumption bundles for the agent subject to its budget constraint, as shown in Figure 3.3(a). An indifference curve for agent i denotes all the consumption bundles that are equally preferred by the agent. In other words, every bundle along an indifference curve provides the agent with the same utility.9 Under normal circumstances, increasing utility is represented by curves further from the origin. Thus, although the initial allocation ω_i and the alternative allocation x_i shown in Figure 3.3(b) are equally affordable (they both lie on the budget line), the indifference curves indicate that the bundle x_i* is strictly preferred by the agent. Indeed, the bundle represented by x_i* is optimal since it is the point at which the budget line is tangent to an indifference curve.

3.2.2.2 PARETO OPTIMAL ALLOCATIONS IN AN EDGEWORTH BOX ECONOMY

An Edgeworth box is a graphical tool for representing general equilibrium in pure exchange markets consisting of two agents and two commodities. In a pure exchange market, there is no production or transformation of goods. Instead, the two agents seek to maximize their utility by exchanging goods with each other.

9. Indifference curves are traditionally drawn with a concave shape to indicate decreasing marginal utility as the consumption bundle approaches either of the axes.

FIGURE 3.4 Equilibrium in an Edgeworth box economy: In (a), an Edgeworth box is used to show a Pareto optimal outcome at point x* given an initial endowment ω and a vector of prices p. The graphic labeled (b) shows the effects of a non-equilibrium price and the excess supply for Good 2 created by the price. The grey area shows the path taken by the budget line as the prices converge to their equilibrium values.
An allocation x ∈ ℝ_+^4 is a non-negative consumption bundle of each good for each agent: x = ((x_11, x_21), (x_12, x_22)). An Edgeworth box is created by taking a diagram of the form shown in Figure 3.3(b) for each agent and placing them kitty-corner, as shown in Figure 3.4(a). The origin for Agent 2 is at the top right of the box and thus points within the box describe how the total endowment of each good is split between the two agents. Since there is no production of goods within the economy, the length and width of the box are simply the sums of the initial endowments of each agent: ω̄_1 = ω_11 + ω_12 and ω̄_2 = ω_21 + ω_22. Given a vector of prices p = (p_1, p_2), a budget line with slope −p_1/p_2 can be drawn through the box. Since the initial endowment is known to be affordable for each agent (indeed, it defines the wealth of the agents), the budget line must intersect the point ω. To illustrate the interpretation of an Edgeworth box, consider the initial allocation and the indifference curves for each agent shown in Figure 3.4(a). The allocation at ω is not Pareto optimal since it is possible to find a different allocation that makes neither agent worse off and at least one agent better off. Specifically, an allocation is better for Agent 1 if it lies above and to the right of Agent 1's initial indifference curve. Conversely, an allocation is better for Agent 2 if it lies below and to the left of Agent 2's initial indifference curve. Since a feasible allocation must simultaneously satisfy the budget constraints of both agents (i.e., it must lie on the budget line), the set of allocations that are preferred by both agents to the initial allocation ω are the points on the budget line bounded by the two initial indifference curves. At the allocation represented by the point x* in Figure 3.4(a), the budget line is tangent to the indifference curves for both agents. Given the slope of the budget line, neither agent can do better than x* and thus a state of
competitive (or Walrasian) equilibrium is achieved. In addition to being stable, the equilibrium is Pareto efficient. Indeed, the relationship between equilibrium and welfare maximization holds across a wide range of economies, not just the Edgeworth box economy considered here. The first fundamental theorem of welfare economics states that under perfectly competitive conditions, any equilibrium allocation is a Pareto optimum (see [Mas-Colell, Whinston and Green, 1995], [Mukherji, 1990], [Katzner, 1988]).

3.2.2.3 EQUILIBRIUM PRICES

To this point, little has been said about the equilibrium vector of prices, p*, that determines the slope of the budget line and thus the location of the equilibrium allocation. Under the fully competitive conditions required by the first theorem, agents are price takers, that is, they have no influence on the prices of the goods in the marketplace. To illustrate the manner in which the equilibrium price p* is discovered by the market, consider the non-equilibrium case in which the prices of the two goods are set to some arbitrary values p′ ≠ p*. At these prices, the optimal allocations for each agent lie on different points of the budget line, as shown in Figure 3.4(b). To move from ω to its preferred allocation at the point marked x_1, Agent 1 would have to sell a certain quantity of Good 1 and buy a certain quantity of Good 2. Conversely, to move from ω to its preferred allocation at the point marked x_2, Agent 2 would have to sell a certain quantity of Good 2 and buy a certain quantity of Good 1. However, as the figure shows, the amount of Good 2 demanded by Agent 1 is much less than the amount that Agent 2 needs to sell (and vice-versa for Good 1). To see how an equilibrium price emerges, consider the case of Good 2 along the horizontal axis of Figure 3.4(b).
Although Agent 2 requires a significant amount of Good 1 to move to its most preferred allocation, Agent 1 is willing to sell much less than this amount given the current ratio of prices. In this case, Agent 2 should be willing to pay a little more for the good. In a competitive market, no single agent can control the prices of goods. Thus, the relative prices of the goods change so that supply exactly equals demand and the market clears. Graphically, the budget line in Figure 3.4(b) would pivot through ω, as shown by the shaded area, until no excess demand exists for either good.

3.2.3 AUCTION THEORY

In competitive markets, an ill-defined process of tâtonnement is constantly at work adjusting prices in response to excess supply or demand. The metaphor of an auction is often used to describe the process: prices evolve as if each market is controlled by an implicit Walrasian auctioneer who collects information about supply and demand and selects a price so that the market clears. In non-competitive markets (i.e., in markets in which there are not many buyers and sellers of undifferentiated goods), the auction process must be explicit. In practice, we see auctions used to sell a diverse range of goods such as financial instruments, agricultural commodities, and antiques. The key functions of an auction are the allocation of goods and price discovery. As such, an auction is especially useful when a selling agent does not know the value of a good it intends to sell but wishes to maximize its revenue from the sale. Auctions are very different from fixed-price markets. In a fixed-price market, the seller sets a price and the first agent that is willing to pay the price gets the good. If the market is not fully competitive, the seller may set the price too low. Although the good will sell if underpriced, there may be a different agent who would have paid more for the good.
In contrast, the seller may set its price too high, in which case no transaction occurs even though a Pareto efficient exchange may be possible at a lower price. In either case, a non-competitive fixed-price market does not guarantee allocative efficiency or revenue maximization for the seller. When fixed-price markets lead to market failure, a more flexible price discovery process is required. For example, bilateral negotiation (i.e., haggling between buyer and seller) may yield higher revenues for the seller on average than fixed-price markets.10 However, negotiation does not guarantee allocative efficiency and may be impractical for a large number of goods. Auctions, as discussed in the following sections, provide a balance between allocative efficiency and practicality.

3.2.3.1 AUCTION FORMS

To achieve the simultaneous objectives of allocative efficiency, revenue maximization, and practicality, different auction forms have evolved in response to different market environments. The auction form specifies the procedure used by buyers and sellers to converge on a mutually satisfactory price. Different auction forms are defined by three characteristics: bid format, allocation rule, and price function [Engelbrecht-Wiggans, 1983]. The bid format describes how bid information is submitted to the auctioneer. For example, in ascending price (or English) auctions, bidders submit progressively higher bids for the good until a single bidder remains. In a descending price (or Dutch) auction, the ask price starts high and descends until a bidder accepts the price. Since Dutch auctions tend to achieve quiescence faster than English auctions, they are often used for low-valued perishable goods, such as fish and fresh flowers [Kambil and van Heck, 1998]. There are a number of alternatives to open-outcry auctions. For example, in sealed-bid auctions, bids are submitted in secrecy to the auctioneer.
At a predefined point in time, the auctioneer ceases to accept bids and determines the allocation and transaction price of the good. Although the allocation rule and price function aspects of English and Dutch auctions are straightforward, sealed-bid auctions vary greatly along these two dimensions. For example, since identical bids can be submitted in sealed-bid auctions, an allocation rule is required to break ties. Different price functions can also be used to create different auction forms. For example, in a second-price (or Vickrey) auction, the good is allocated to the highest bidder, but the transaction price is set to the second-highest bid. When multiple identical goods are sold, other price functions are used (such as discriminating and uniform price). A third general class of auctions is double auctions. In a double auction, potential buyers submit bids and potential sellers submit asks to the auctioneer. The auctioneer's task is to match bids and asks according to some allocation rule and price function so that the market clears. In a continuous double auction (CDA), agents make exchanges amongst themselves without any type of market call. For a more complete overview of basic auction theory, see [Milgrom, 1989], [Engelbrecht-Wiggans, 1983], and [Mester, 1988]. In addition, [Varian, 1995] and [Rosenschein and Zlotkin, 1994] discuss the application of auction theory to the design of artificial agents.

10. Negotiation is attractive to sellers when there are information asymmetries that work against the buyer and when the cost of negotiating is small relative to the cost of the good. Used car sales are an example of this type of market.

3.2.3.2 BIDDER CHARACTERISTICS

The allocative and revenue maximization characteristics of the various auction forms depend to a large extent on the characteristics of the auction participants. One such characteristic is the risk profile of the bidders.
If a bidding agent is risk averse (i.e., averse to losing the auction), it will bid higher to reduce the possibility of being outbid. The reverse is true for a risk-seeking bidder. If the bidder is risk neutral, it will bid solely on the basis of its valuation of the good or goods being auctioned [Mester, 1988]. The nature of the agent's own valuation (or reservation price) is the second important bidder characteristic. In the private values case, the bidding agent is assumed to have certain knowledge of its own valuation of the good. Moreover, the agent's reservation price is assumed to be independent of the reservation price of any other agent. In real-world auctions for goods such as antiques or wine, the common values case tends to be more accurate. In the common values case, each bidder is unsure of her reservation price and the bids for the good are not independent of one another. For example, if a bidder intends to buy an antique as an investment for resale, her valuation is bound to be a function of her estimates of the valuations of other agents. Moreover, in some auction forms like English auctions, the observable signalling behavior of other bidders may cause an agent to revise her valuation during the course of the auction. The common values case leads to the problem of the winner's curse: the bidder who wins the auction is the one whose estimate of the common value of the good is higher than that of any other bidder.11 To compensate, a rational agent bids less than her reservation price, leading in some cases to allocative inefficiency [Milgrom, 1989]. Naturally, in the private values case, the bidder's certainty about her reservation price eliminates the possibility of the winner's curse.

11. In this section, auctions for selling goods are considered. The same principles apply to reverse auctions for purchasing goods (e.g., tender offers). In the purchasing case, common values refer to estimates of cost and the lowest bidder wins the auction.
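The second-price allocation rule and price function described in Section 3.2.3.1 take only a few lines to state in code. The bidder names and bid values below are invented, and ties are broken arbitrarily by sort order.

```python
# Second-price (Vickrey) sealed-bid auction: the highest bidder wins the
# good but pays the second-highest bid. Bidder names and bids are invented.

def vickrey(bids):
    """bids: dict mapping bidder -> sealed bid. Returns (winner, price)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]   # allocation rule: highest bid wins
    price = ranked[1][1]    # price function: pay the second-highest bid
    return winner, price

winner, price = vickrey({"agent_a": 120.0, "agent_b": 95.0, "agent_c": 110.0})
print(winner, price)  # agent_a 110.0
```

A well-known property of this mechanism in the private values case is that bidding one's true reservation price is a dominant strategy, which is one reason the form recurs in the agent-design literature cited above.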
3.2.3.3 EQUIVALENCES OF FORMS

There are two important characteristics of auctions in which participants satisfy the assumptions of risk neutrality and privately held, independent reservation prices. First, seemingly different auctions reduce to forms that are functionally equivalent. For example, the English and second-price sealed-bid (or Vickrey) forms are equivalent. The same is true of the Dutch and first-price sealed-bid forms. Second, the expected revenue to the seller in the risk neutral/private values case is the same regardless of form [Mester, 1988]. Thus, from the seller's point of view, the choice of auction reduces to a matter of computational (rather than economic) efficiency. For example, a CDA will achieve the same outcome as the English or Vickrey forms: the ultimate winner of the auction is the agent that is willing to pay the most (allocation rule) and the transaction price is the reservation price of the second highest bidder (price function).12 The advantage of the CDA is that it does not require a centralized auctioneer or a market call; the disadvantage is that a good may be bought and sold many times in a CDA before an equilibrium price is achieved.

3.3 DISTRIBUTED ARTIFICIAL INTELLIGENCE

There is a diverse and growing literature on the application of Distributed Artificial Intelligence (DAI) to practical problems (e.g., [Jennings, Sycara and Woolridge, 1998], [Singh, Rao and Woolridge, 1997], [Clearwater, 1996], [Singh, 1994], [Rosenschein and Zlotkin, 1994], [Woolridge, Tambe and Müller, 1995], [Parunak, 1995], [Woolridge and Jennings, 1994a], [Woolridge and Jennings, 1994b], [Gilbert and Doran, 1994], [Jennings, 1992], [Avouris and Gasser, 1992], [Huberman, 1988]). In an effort to better understand the literature and situate the market-based approach proposed in this thesis within the broader field of DAI, a simple three-dimensional taxonomy is used.
The three dimensions considered are the environment in which the DAI system operates, the coordination mechanisms used to resolve disputes or resource contention between agents, and the capabilities of the agents themselves. In the following sections, each of the dimensions is described. In Section 3.3.4, the proposed market-based approach and a number of exemplars from the DAI literature are evaluated with respect to the taxonomy.

12. In practice, the actual price paid by the winner is the reservation price of the second-highest bidder plus some small amount ε that corresponds to the minimum bid increment.

3.3.1 ENVIRONMENT

Although others have used "application domain" or "environment" to classify DAI systems (e.g., [Jennings, Sycara and Woolridge, 1998]), the notion of environment used here is narrower and reduces to a single construct: global utility. A DAI environment is characterized as "global" if a meaningful measure of global utility can be said to exist. For example, in agent-based shop floor scheduling systems, the utilities of individual agents are of little interest. Rather, the agents are created as a means to maximize a complex but well-defined global objective function for the factory as a whole. In such an environment, it is possible in theory (but perhaps not in practice) to have a central authority make trade-offs between individual agents so that global utility is maximized. In other environments, however, the concept of global utility may be meaningless or difficult to define. Utility theory exists to describe the preference orderings of individual decision makers. As such, there is no theoretical justification for making direct comparisons of utility between individuals [Holloway, 1979], [Raiffa, 1968], [Arrow, 1951], [Von Neumann and Morgenstern, 1944].
To illustrate the difficulties that arise in inter-agent comparisons of utility, consider a scenario that could plausibly occur in the multiagent aircraft scheduling application described in [Georgeff and Rao, 1998]: Due to an unfortunate coincidence of medical emergencies, two aircraft require clearance for an immediate landing on a single runway. On one aircraft, an 82 year old Nobel laureate has had a stroke; on the other aircraft, an infant is experiencing difficulty breathing. The system must determine which aircraft can land first. Although this scenario is simply an instance of a resource allocation problem, real agents (i.e., humans) facing life-or-death outcomes are involved. A dilemma arises because it is impossible to determine which outcome is preferred from the perspective of the centralized scheduler. In this example, the scheduler is asked to make a moral decision on behalf of society as a whole. Clearly, such a dilemma is an extreme example and is intractable under any formulation. However, even the more mundane case of scheduling can be transformed from an environment with a global utility function to an environment without a global utility function. To illustrate, assume that the scope of a scheduling system is broadened to include supply chain management. The objective of the system is to simultaneously maximize the utilities of all participants in the supply chain, including suppliers and customers. In such an environment, the system would have difficulty justifying a decision to delay a shipment to one customer in order to make a shipment to a different customer without making a comparison of utilities across organizations. The distinction between environments with global and non-global utility is similar to the division of DAI into two subdisciplines: cooperative problem solving (CPS) and multiagent systems (MAS) [Bond and Gasser, 1988], [Rosenschein and Zlotkin, 1994].
In CPS, agents are created by the system designer to solve well-defined problems. The rationale for introducing the agent metaphor is to simplify decomposition and permit the division and specialization of labor. Since a CPS environment is global (or assumed to be global), and since a single designer is responsible for the behavior of all the agents that participate in the system, a high level of centralized control over the agents is feasible. In contrast, agents in MAS environments typically represent real agents in the problem domain. Thus, rather than being constructions of convenience for the system designer, the utility functions of the artificial agents map to the private utility functions of the real agents [Jennings, Sycara and Woolridge, 1998]. In such non-global environments, centralized control and inter-agent comparisons of utility are not justified under the basic tenets of decision theory.

3.3.2 COORDINATION MECHANISMS

One of the main advantages of the agent-based approach is that it permits the decomposition of large problems into many smaller problems. However, since the overall behavior of the system of agents is typically the issue of prime concern in DAI, mechanisms for achieving coordination and aggregation of agent-level behaviors are a critical issue. In this section, five different approaches to agent-level coordination are identified: hierarchy, social laws/standard operating procedures, team utility functions, negotiation, and markets. These approaches can be viewed along a continuum of autonomy of action for agents: hierarchy provides agents with the least autonomy, whereas negotiation and markets provide agents with much greater autonomy.

3.3.2.1 HIERARCHY

Hierarchy results when a complex system is decomposed into a tree-like structure of parent-child relationships. Although parent-child relationships typically imply an authority structure, this need not be the case [Alchian and Demsetz, 1972].
Instead, the defining characteristic of hierarchy is that little or no interaction between sub-systems takes place except along the lines of the parent-child relationships. For example, when two agents require the same resource, they do not negotiate directly with one another for ownership. Rather, the question of ownership is passed up to a higher level of the hierarchy. Hierarchies of this sort can be said to be "nearly-decomposable" or "loosely-coupled." Simon [1981] identifies two specific requirements for nearly-decomposable systems:

1. The short-run behavior of each of the component subsystems is approximately independent of the short-run behavior of the other components.
2. In the long run, the behavior of any one of the components depends in only an aggregate way on the behavior of the other components.

Hierarchy is an extremely efficient means of exerting control over large, complex systems since all communication that takes place between the n sub-systems can be carried by (n-1) two-way channels. Moreover, hierarchy provides a mechanism for stable, intermediate systems to combine into systems of increasing complexity [Simon, 1981]. Despite its theoretical advantages, the assumption of loose coupling (on which hierarchical control is based) breaks down when there is significant conflict between agents or contention for scarce resources. Since inter-agent conflicts cannot be resolved without appealing to higher authority, the computational burden placed on the higher levels of the hierarchy can become overwhelming [Shoham and Tennenholtz, 1992]. In addition, if there is considerable uncertainty in the domain, then an increased amount of information must flow up and down the communication channels, leading to the less responsive form of decision making characteristic of bureaucracy [Galbraith, 1974].
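The communication economy of hierarchy noted above can be checked with a few lines of arithmetic. The following sketch is purely illustrative (the function names are ours, not the thesis's): a tree over n sub-systems needs only n-1 links, whereas direct negotiation between every pair of sub-systems needs n(n-1)/2.

```python
def hierarchy_channels(n):
    """Two-way channels needed when all inter-subsystem communication
    flows through parent-child links: a tree on n nodes has n - 1 edges."""
    return n - 1

def point_to_point_channels(n):
    """Channels needed if every subsystem can negotiate directly with
    every other subsystem: n choose 2 pairwise links."""
    return n * (n - 1) // 2

# For a facility with 50 sub-systems:
print(hierarchy_channels(50))       # 49
print(point_to_point_channels(50))  # 1225
```

The gap widens quadratically, which is why hierarchy scales well until, as noted above, conflict resolution overloads the upper levels.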
3.3.2.2 SOCIAL LAWS/STANDARD OPERATING PROCEDURES

One means of reducing the number of appeals to authority in a hierarchy is to implement binding conventions or policies for resolving disputes. To illustrate the approach, consider the mobile robot example used by [Shoham and Tennenholtz, 1992]: Two robots are headed on a collision course. Since both robots are on the optimal trajectories to achieve their respective goals, neither wants to slow down or alter its course. In a purely hierarchical system, the robots would appeal to a centralized traffic controller who would decide which adjustments the robots should make to avoid collision. Unfortunately, there are two problems with this approach. The first is that there is an enormous reliance on central authority. If the central traffic controller ceases to function, then so do the mobile robots. The second problem is that the computational burden on the traffic controller increases exponentially as robots are added and contention for scarce resources increases. An alternate approach is to define conventions to resolve the conflict at a local level. Highway markings, traffic lights, and right-of-way rules are common examples of such socially-defined conventions that permit agents to make local, but mutually compatible, decisions. There are two major difficulties with the social laws/standard operating procedures approach. First, as Shoham and Tennenholtz [1992] point out, the problem of specifying a complete set of laws to optimize a global utility function is NP-hard. Second, social laws and standard operating procedures can break down in the face of novel situations or uncertainty. Allison [1970] provides numerous examples of this in his analysis of governmental and military organizations.

3.3.2.3 TEAM UTILITY FUNCTIONS

An alternate approach to social laws is the theory of teams [Marschak and Radner, 1972].
In the team formulation, each agent's utility function is replaced with the utility function of the team as a whole. The objective for the system designer is to choose the agent-level information structure and decision rule that will yield the highest expected utility to the team. The main advantage of this formulation is that if each agent in the organization seeks to maximize its own utility, and the utility of the individual agents is identical to that of the team as a whole, then it is axiomatic that the organizational utility function will be maximized through the behavior of the agents. Moreover, the specification of a team utility function can be more flexible and complete than a set of discrete rules and conventions. In the team approach, coordination of activity is facilitated by the possibility of each agent predicting the behavior of every other agent. This ability to predict has a number of general preconditions, including the ability of each agent to know the state, capabilities, and action outcomes of every other agent [Marschak and Radner, 1972]. Although there have been some efforts in the DAI literature to implement team-theoretic systems of agents (e.g., [Boutilier, 1996]), the size and complexity of the team utility function can easily exceed the computational capabilities of a single agent. Thus, many of the benefits of distributed computation are not achieved.

3.3.2.4 NEGOTIATION

Negotiation, in the context of DAI, refers to the process by which agents search for (and hopefully find) an agreement [Rosenschein and Zlotkin, 1994]. The underlying assumption on which the negotiation approach is based is that agents are autonomous and that the type of inter-agent comparisons made in the hierarchical approach is not justified. In economics, the concept of negotiation or bargaining arises whenever the preconditions of a perfectly competitive market are not satisfied.
In a market with many buyers, many sellers, and undifferentiated goods, a Walrasian equilibrium price will emerge, as discussed in Section 3.2.2.3. However, in cases in which a competitive market cannot be said to exist (e.g., bilateral exchange of a unique good between two agents), competitive forces cannot be relied on to determine an equilibrium price [Rasmusen, 1989, p. 227]. Much of the early work on negotiation in multiagent systems (e.g., [Smith and Davis, 1981]) relied on ad hoc negotiation protocols [Jennings, Sycara and Woolridge, 1998]. More recent work, however, has drawn heavily on non-cooperative game theory [Rosenschein and Zlotkin, 1994], [Sandholm and Lesser, 1995b], microeconomics [Sandholm and Lesser, 1995a], [Clearwater, 1996], multi-attribute utility theory [Sycara, 1989], and speech-act theory [Chang and Woo, 1994], [Shoham, 1993] to structure the process by which agents negotiate an agreement. Muller [1996] provides a more detailed overview of different approaches to negotiation in DAI. Given the lack of a global utility function by which to evaluate outcomes, a central question in negotiation research is: what constitutes a good negotiated outcome between agents? In addition to specifying the details of a negotiation protocol, theories of negotiation typically provide a means of characterizing the quality and stability of the negotiation process. For example, Rosenschein and Zlotkin [1994] use game theory to identify five design goals for negotiation mechanisms: efficiency, stability, simplicity, distribution, and symmetry. Of these, efficiency and symmetry refer to the social welfare characteristics of the mechanism; the remainder of the design goals refer to the mechanism's computational characteristics. In any system in which inter-agent comparisons of utility are not permitted—such as the runway assignment example used in Section 3.3.1—it is very difficult or impossible to define a socially optimal outcome.
Clearly, a minimal requirement is that a good negotiated outcome be Pareto optimal—that is, no agent should be able to improve its position without adversely affecting other agents. A stronger requirement is that the outcome be fair in some sense. In the context of a two-agent negotiation, fairness implies that the surplus created by the negotiated settlement is divided evenly between the agents. The classic criterion for fairness is the solution that maximizes the product of the agents' utilities [Nash, 1950]. It is important to note that fairness and global optimality are not synonymous. To illustrate, consider the table of outcomes generated by a cooperative delivery problem in [Rosenschein and Zlotkin, 1994]:

Deal   Utility for Agent 1   Utility for Agent 2   Product of utilities   Sum of utilities
A      0                     10                    0                      10
B      1                     3                     3                      4
C      2                     2                     4                      4
D      3                     1                     3                      4
E      10                    0                     0                      10

The deal that maximizes the product of the utilities is Deal C. However, if global optimality is a linear function of agent utilities, then the globally optimal deal is one which maximizes the sum of agent utilities (Deal A or Deal E). The lack of equivalence between a fair outcome and a globally optimal outcome creates an important issue of fit between the environment of the problem and the coordination mechanism used by a DAI system. If the delivery example can be characterized as a global environment, then the choice of a fair (in the Nash sense) negotiation mechanism leads to an outcome that is clearly sub-optimal. This limitation of the Nash criterion applies in any situation in which the payoffs to agents are not zero-sum.

3.3.2.5 MARKETS

Markets have become an increasingly popular aggregation mechanism for DAI systems, especially in the context of resource allocation problems. Examples include:

• Clearwater [1996] contains a number of papers describing the use of market-based resource allocation systems;
• the market-oriented programming group at the University of Michigan [Walsh et al., 1998], [Wellman, 1996], [Mullen and Wellman, 1995] has examined many theoretical and practical issues surrounding the use of markets, and specifically auctions, for coordination in MAS environments;
• the collection on computational ecologies in [Huberman, 1988] describes the use of agents and markets to build predictive simulations.

The popularity of markets has much to do with the existence of a well-established theoretical foundation in economics and the ease with which the atomic units of market participation—self-interested agents—can be implemented in a heterogeneous and distributed computing environment [Jennings, Sycara and Woolridge, 1998]. The essential difference between "negotiation-based" and "market-based" approaches as they are characterized here is the method of price discovery. In negotiation, price is determined through one or more rounds of bilateral bidding and counter-bidding. Neither agent is assumed to be a price taker, and therefore complex negotiation strategies can evolve in which one agent extracts a disproportionate share of the surplus created by the transaction. In markets, more than one agent is typically involved and thus a form of auction can be used to determine a transaction price. In extreme cases, in which the preconditions for a fully competitive market are satisfied, general equilibrium theory can be used to predict the market outcomes. Under the assumptions of fully competitive markets, agents have no control over prices—both buyers and sellers are price takers. Despite the elegance and simplicity of economic metaphors, many of the same problems that affect real markets affect the simulated markets used in DAI. The existence of externalities between agents or interdependencies between goods can lead to pricing distortions and market failure.
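To make the contrast between bilateral negotiation and auction-based price discovery concrete, the following sketch simulates multilateral price discovery under assumed ascending (English) auction rules. This is not the thesis's own protocol; the agent names and valuations are invented for illustration.

```python
def ascending_auction(valuations, start=0, increment=1):
    """The posted price rises until only one agent still values the
    good at or above it; no agent bargains bilaterally over the price.
    `valuations` maps agent name -> private reservation value."""
    price = start
    active = set(valuations)
    while len(active) > 1:
        price += increment
        # Agents drop out as soon as the price exceeds their valuation.
        active = {a for a in active if valuations[a] >= price}
    winner = active.pop() if active else None
    return winner, price

winner, price = ascending_auction({"a1": 12, "a2": 9, "a3": 17})
print(winner, price)  # a3 wins at a price of 13
```

Note that the winning price lands just above the second-highest valuation, so the outcome resembles a Vickrey result even though no agent reveals its valuation directly.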
To return to the example of runway allocation, an externality could take the form of turbulence created by large jets. If an aircraft with a large jet wash makes certain runways temporarily unusable by smaller aircraft, then the cost to the smaller aircraft must be factored into the price the large aircraft pays for landing. However, the process of internalizing the externalities (e.g., through taxes or other pricing schemes) adds significant complexity to the market model. A second problem that has attracted attention in the literature is the issue of stability. [Thomas and Sycara, 1998] show that under certain circumstances, prices can oscillate as agents abandon some resources in favor of others which they believe to be underutilized. This phenomenon is exemplified by the "El Farol" problem described by Arthur [1994]. In the El Farol problem, the utility to agents of attending a bar on a particular night is a function of the number of other agents in attendance. Specifically, agents prefer the bar when its utilization is below a certain threshold (say 60%). The number of agents that actually attend on a given night tends to oscillate wildly as agents attempt to predict the aggregate behaviors of other agents. Of course, in well-behaved markets in which agents' valuations are transitive and independent and prices rise monotonically, such instability does not occur.

3.3.3 AGENT CAPABILITIES

The third dimension considered in the DAI taxonomy is the capability of the agents. At the core of DAI is the notion of weak agency.
Woolridge and Jennings [1994a] identify four properties that any computer-based process should exhibit to be considered an agent:

• autonomy — the agent should have control over its own internal state and behavior;
• social ability — the ability to communicate in some way with other agents is the sine qua non of DAI;
• situatedness or reactivity — the agent should receive sensory input from its environment and be able to act to change its environment; and,
• flexibility — agents should exhibit goal-directed behavior in response to their changing environment.

Of course, more complex systems require a stronger notion of agency. For example, in order for agents to be able to plan in the future, they require beliefs about their current state and preferences over future states [Shoham, 1993]. Computer-based processes exhibiting strong agency are often classified as BDI (belief-desire-intention) agents [Georgeff and Rao, 1998], [Fikes and Rao, 1991]. Agents in multiagent systems can have very simple local utility functions (e.g., [Steiglitz, Honig and Cohen, 1996]) or be maximizers of complex expected utility functions over a planning horizon. The agents may be simple consumers of resources or may be able to transform one resource into another. The agents may be trustworthy or untrustworthy, selfish or altruistic, omniscient or ignorant. The critical issue considered here is not what capabilities an agent possesses, but whether the capabilities are consistent with the properties of the agent's environment and coordination mechanisms. A more detailed classification of agent properties can be found in [Goodwin, 1993].

3.3.4 CLASSIFICATION OF APPROACHES

In this section, a number of DAI systems for resource allocation described in the literature are situated with respect to the dimensions of the taxonomy. Figure 3.5 shows a tabular representation of the environment and coordination mechanism dimensions.
Certain cells of the table contain references to exemplars from the literature on multiagent systems. Two levels of fit are considered: the fit of the coordination mechanism to the environment and the fit of the agent's capabilities to the coordination mechanism.

FIGURE 3.5 Classification of DAI systems with respect to the environment and coordination mechanism dimensions: the rows are the coordination mechanisms (market, negotiation, team utility, social laws/SOPs, hierarchy), the columns are the environment types (global utility, non-global utility), and the cells contain exemplars such as the NASA EOS system [Schwartz and Kraus, 1997], UMDL [Mullen and Wellman, 1995], the Contract Net Protocol [Smith and Davis, 1981], the automated contracting system [Sandholm and Lesser, 1995a], multi-agent DTP [Boutilier, 1996], the social laws of [Shoham and Tennenholtz, 1992], information agents [Moore et al., 1997], and OASIS [Georgeff and Rao, 1998]. The shaded area corresponds to "coerced" systems (see Section 3.3.4.1).

3.3.4.1 ISSUES OF FIT

To illustrate the issue of environment/coordination mechanism fit, consider the shaded cells in the right-hand column of the table in Figure 3.5. The shaded cells correspond to situations in which the environment cannot be said to have a global utility function; however, the coordination mechanisms associated with the cells are centralized and therefore presuppose a measure of global utility. In such situations, it may be possible to coerce the agents into accepting a proxy global utility function. For example, in the OASIS multiagent air traffic control system described in [Georgeff and Rao, 1998], a hierarchical coordination mechanism (in the form of "coordinator" and "sequencer" modules) seeks to minimize the total lateness of all aircraft. The utility function can be considered coerced because it is questionable whether the utilities of the agents representing individual aircraft should be pooled and compared in this manner.
An airline with a large stake in maintaining its reputation for always being on time may value lateness in a different manner than an airline that is most interested in providing low-cost fares to holiday destinations. Moreover, within the aircraft, each passenger has his or her own mapping from landing time to utility. At the other extreme, it is possible to model an environment that possesses a global utility function using autonomous, self-interested agents and negotiation or market protocols. An example of this approach is the system for allocating large data sets to servers described in [Schwartz and Kraus, 1997]. The environment in which the multiagent system operates is classified as global because the data sources and servers are owned and maintained by NASA as part of the Earth Observing System (EOS). Indeed, the dynamic, market-based allocation system is proposed as a replacement for an existing centralized allocation system. The global utility function to be maximized is that of NASA, rather than that of the data sets and servers. The advantage of using agents rather than centralized hierarchical approaches for problems with well-defined global utility functions is that the agent metaphor provides a convenient means of decomposing large, complex systems. There are, however, two dangers associated with using highly autonomous coordination protocols in global environments. The first, as discussed in Section 3.3.2.4, is that non-global coordination mechanisms do not automatically lead to outcomes that maximize global utility. The second danger is that the agents themselves will be more complex than what is required by the environment. To illustrate the latter danger, consider the EOS system. Despite the existence of a global utility function in the DAI environment, no assumptions are made regarding the guile of the agents created for the allocation system.
A second-price sealed-bid (Vickrey) auction is used to induce the agents to reveal their true bidding information, and penalties are defined for situations in which agents breach the terms of the bidding protocol. Although the resulting system has the virtue of generalizability, the capabilities of the agents and the complexity of the auction mechanism can be considered excessive for the problem at hand. In cooperative problem solving environments, there is nothing to prevent the system designer from dictating certain agent-level behaviors (for example, truthful price revelation). According to the criteria proposed by [Rosenschein and Zlotkin, 1994], the primary objectives of the system designer are the quality and stability of the solution and the efficiency of the mechanism used to compute it. The "realism" of the agents is typically not a concern in CPS.

3.3.4.2 CLASSIFICATION OF THE PROPOSED APPROACH

The market-based approach proposed in this thesis can be classified along the environment, coordination mechanism, and agent capability dimensions in the following manner:

• Environment = global: The objective of the resource allocation system is to maximize total revenue by processing and shipping individual products. The agents themselves are merely artifacts introduced to facilitate the process. Since the division of resources and a numeraire good (money) between agents is zero-sum, global utility (defined as the sum of agent utilities) is independent of the precise terms of the deals struck between agents. The only requirement for an optimal equilibrium is that all Pareto optimal exchanges that can occur do occur. The key assumptions and the rationale underlying them are presented in greater detail in Section 4.1.
• Coordination mechanism = market: The exchange of resource goods between agents occurs in a continuous reverse auction.
A reverse auction differs from bilateral negotiation in that an auctioneer (or in this case, a simple price convention) determines the transfer price of the good being exchanged. A description of the market protocol is provided in Section 4.3.
• Agent capabilities = rational with restricted autonomy of action: A fundamental requirement for any market to achieve a Pareto optimal equilibrium is strict rationality on the part of the participating agents. Indeed, the relationship between maximizing behavior by rational agents and Pareto optimality is tautological. However, it is important to note that rationality is defined in relation to not only a particular set of beliefs, but also to a particular set of capabilities. Thus, although it may be rational (in a colloquial sense) for a particular agent to buy a contract for which it has no requirement in the expectation of making a speculator's profit, agents in the proposed system are not endowed with the capability to reason about speculation. A similar situation arises in the method used for price discovery: the bidding agent is asked to make a rational choice between buying a contract for a resource at a particular price and not buying it. The agent has no capability to lie, barter, steal, or otherwise change the structure of its simple decision problem. It is in this sense that the autonomy of the agents is restricted.

The objective of this chapter has been to review the theoretical foundations of both the motivating problem (manufacturing scheduling) and the proposed solution (a market-based system for cooperative problem solving). In Chapter 4, the focus shifts from describing theory to describing the technical challenges posed by the proposed framework and the solutions developed as part of the thesis.
CHAPTER 4: A FRAMEWORK FOR MARKET-BASED RESOURCE ALLOCATION

As in other forms of cooperative problem solving, the objective of this thesis is to provide a method for solving large, well-defined problems by decomposing the global-level problem into smaller problems that can be computed independently and simultaneously. Given the size and complexity of the manufacturing problem, it is clear that some form of decomposition must be used. However, as discussed in Chapter 2, the fundamental problem with the conventional decomposition used in manufacturing is that the sub-problems into which the problem is decomposed are not independent. When a large degree of dependence exists between elements of a decomposed problem, the dependence must either be ignored (at the expense of solution quality) or resolved (thereby nullifying many of the advantages of decomposing the problem in the first place). Markets offer a form of decomposition in which a high degree of interdependence can be transformed into complete independence through the use of a price mechanism. The framework for the market-based approach to problem solving proposed in this research is shown in Figure 4.1. Unlike the sequence of stages conventionally used to address the manufacturing problem, market-based decomposition does not occur along functional lines. Instead, the metaphor of a large number of agents interacting within an economy is used to provide a basic blueprint for the system. Following Coleman's [1990] model of social outcomes, the framework is divided into three distinct phases:

1. Decomposition — The first phase involves transformation of a global problem into a number of agent-level problems. In other words, the task is to model a complex system using agent-based constructs such as beliefs, desires, capabilities, and so on. The result of the decomposition phase is that each part and machine in the physical system is modeled and represented by a computer-based agent.
2.
Rational choice — The second phase involves the off-line computation of each agent's willingness to pay for scarce production resources. In the case of an agent representing a part to be manufactured (a part agent), the rational choice phase requires the agent to plan an optimal trajectory through the production system and determine its reservation prices for various combinations of resources in different situations. To determine the optimal trajectory, the agent must consider a number of factors such as the costs and rewards facing the object, the probabilistic effects of various actions and events, and the unknown prices of the resources.

FIGURE 4.1 The market-based approach to decomposition (adapted from [Coleman, 1990]): Whereas the conventional approach decomposes the manufacturing problem into a sequence of functional stages (MPS, MRP, CRP, scheduling), the market-based approach decomposes the global-level problem (production planning) into many agent-level planning problems. The agents make decisions that are strictly rational given the information they have available. Finally, a market is used to resolve contention for resources and aggregate the agent-level plans into a global-level outcome.

3. Aggregation — In the aggregation phase, the agents use the reservation prices computed in the rational choice phase to participate in a market for production resources. The role of the aggregation mechanism (in this case, a continuous reverse auction) is to ensure that each production resource is ultimately owned by the agent that is willing to pay the most for it. In this way, the self-interested interaction of the agents within the market determines the solution to the manufacturing problem.
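As a toy illustration of the end state the aggregation phase is meant to reach, the sketch below assigns each resource to the agent with the highest precomputed reservation price. The actual mechanism is a continuous reverse auction over bundles (Section 4.3); the agent names, resource names, and prices here are invented, and interdependencies between resources are ignored.

```python
def allocate(resources, reservation_prices):
    """Assign each resource to the agent whose reservation price for
    it is highest. `reservation_prices[agent][resource]` is the price
    the agent computed in the rational choice phase."""
    owners = {}
    for r in resources:
        owners[r] = max(reservation_prices,
                        key=lambda a: reservation_prices[a].get(r, 0))
    return owners

prices = {"part1": {"mill": 30, "lathe": 5},
          "part2": {"mill": 12, "lathe": 20}}
print(allocate(["mill", "lathe"], prices))
# {'mill': 'part1', 'lathe': 'part2'}
```

In the thesis's framework this greedy, one-shot assignment is replaced by an auction precisely because resource values are interdependent and state-contingent, but the sketch captures the intended outcome: each resource ends up with the agent that values it most.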
The objective in the remainder of this chapter is to identify the critical elements of the market-based approach and describe the degree to which the preconditions to the economic theory have been satisfied in the prototype system. Figure 4.2 provides a simplified roadmap of the major milestones encountered in moving from problem definition to a market-based system. First, as described in Section 4.1, the global manufacturing problem is decomposed into agent-level problems and modeled using the P-STRIPS knowledge representation language. Next, the P-STRIPS representation is transformed into a special data structure called a policy tree. The structured dynamic programming algorithm presented in Section 4.2 is used to iteratively refine the policy tree until it contains the agent's optimal course of action for all its possible states. In addition to containing the agent's optimal plan, the policy tree contains the agent's reservation price for all relevant bundles of resource goods. This price information is extracted into a price list that the agent uses in the final stage of the process—exchanging contracts with other agents. Finally, Section 4.3 describes the continuous reverse auction protocol that aggregates the agent-level behaviors into a coherent global solution.

FIGURE 4.2 Major milestones of the market-based approach: model objects in the manufacturing environment using P-STRIPS (actions, effects, rewards, costs); build the core policy tree (temporal transitions, state-dependent transitions, terminal reward estimates); improve the policy using the SDP algorithm; extract the agent's reservation price for all relevant resource bundles; and, finally, auction the production resources.

4.1 PROBLEM DECOMPOSITION

A problem that one typically encounters when modeling industrial-scale systems is their size and complexity. The agent-based decomposition advocated in this thesis serves two purposes:

1.
Create independent subproblems — Market-based decomposition splits the large, complex problem into many smaller problems that can be solved independently and simultaneously. The independence property of market-based agents is discussed in greater detail in subsequent sections.
2. Simplify the modeling of large systems — Agent-based decomposition provides modelers with a conceptual tool to manage the otherwise overwhelming complexity of large manufacturing systems. In this regard, the agent-oriented approach to modeling relies on many of the same "ease-of-modeling" arguments as object-oriented modeling (e.g., [Jacobson, Jacobson and Ericsson, 1995], [Shoham, 1993]). Specifically, the assumption is made that it is easier for a modeler to express the states, capabilities, and goals of a particular part or machine in isolation than it is to express the joint state, capability, and goal of all parts and machines in a manufacturing facility. An empirical test of the ease-of-modeling assumption is beyond the scope of the thesis, however.

4.1.1 APPROACHES TO MDP DECOMPOSITION

In stochastic optimization, the objective is to create a contingency plan (or policy) for every possible state of the system. In this type of optimization, no fixed sequence of actions is generated and thus there is no requirement to assume that all actions have deterministic outcomes. Stochastic optimization is especially well suited to manufacturing environments in which broken bits, delays in machine setups, late arrival of raw materials, and so on are facts of life. The ability to reason over such contingencies is essential for high-quality and robust plans. Despite its power, stochastic optimization is difficult to exploit in industrial environments because the Markov Decision Process (MDP) representations of state spaces become unmanageably large for even small problem instances.
One approach to coping with large MDPs in manufacturing and other resource allocation environments is to decompose the large problem into a number of smaller, loosely-coupled subproblems that induce their own subMDPs (e.g., [Meuleau et al., 1998], [Dean and Lin, 1995]). To illustrate the potential payoff from MDP decomposition, consider a problem domain consisting of n completely independent subproblems. Assume that the number of states in the state space for subproblem i ∈ 1, ..., n is |Si|. If the independence between the subproblems is not recognized and the entire system is formulated as a single MDP, the size of the joint state space is |S1| × |S2| × ... × |Sn|. In contrast, if the independence property is exploited and each subproblem is solved in isolation, the total number of states considered is only |S1| + |S2| + ... + |Sn|. Thus, methods of decomposition and solution that exploit independence can result in an exponential decrease in problem size. The difficulty in decomposing MDPs in practice arises from the fact that contention for finite resources creates dependencies between otherwise independent subproblems. To illustrate, consider the case of two numerically-controlled milling workstations. Under ideal circumstances, each workstation can process its own jobs and remain oblivious to the operation of the other machine. If, however, both workstations are fed raw materials by the same industrial robot, then the workstations cease to be independent. Specifically, the optimal policy of one workstation depends on the availability of the industrial robot and therefore on the optimal policy of the other workstation. In [Meuleau et al., 1998], the problem of joint dependency on resources is addressed in the following way:

1. solve each subproblem in isolation, ignoring the issue of resource contention;
2. use the value functions provided by the subproblem solutions to guide the heuristic allocation of resources to jobs.
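The payoff from exploiting independence, described above, is easy to quantify. A small sketch (the subproblem sizes are invented for illustration):

```python
from math import prod  # requires Python 3.8+

def joint_states(sizes):
    """State count if the subproblems are folded into a single MDP:
    the product |S1| x |S2| x ... x |Sn|."""
    return prod(sizes)

def decomposed_states(sizes):
    """Total states considered if each subproblem is solved in
    isolation: the sum |S1| + |S2| + ... + |Sn|."""
    return sum(sizes)

# Ten independent subproblems of 20 states each:
sizes = [20] * 10
print(joint_states(sizes))       # 10240000000000 (20**10)
print(decomposed_states(sizes))  # 200
```

Thirteen orders of magnitude separate the two formulations even for this modest example, which is the "exponential decrease in problem size" the text refers to.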
The key to this particular approach is that the optimal policies generated for the subproblems can provide hints as to what might be a good global allocation of resources. Although the approach achieves an exponential decrease in problem size and provides solutions very quickly, the quality of the solution is sensitive to the quality of the assignment heuristic and to the amount of resource contention in the problem environment.

The market-based approach to MDP decomposition uses prices over bundles of resources to eliminate the dependencies created by resource contention. The cost of achieving independence in this manner is that the subproblems must incorporate the price information and are therefore much larger than in the non-market-based case. In order to create sufficient information for the market-based allocation of resources, the subMDPs treat the ownership of resources as random variables, rather than as decision variables. To illustrate the difference between market-based and non-market-based MDP decomposition for resource allocation problems, consider the form of the optimal policy generated in the non-market-based case: the optimal allocation of resources to subproblem i is the vector Xi; the value of the optimal policy is Vi.

In contrast, the market-based decomposition requires a policy of the form: given an allocation of resources Xj ∈ Ωj, where Ωj is the set of all valid allocations, the value of the optimal policy is Vj. As might be expected, creating a contingency plan over all possible allocations leads to a massive increase in the size of the state space of the subMDPs. In practice, the explosion is mitigated somewhat by restricting the subMDP to relevant bundles of resources, as discussed in Section 4.2.4. However, the effort required to generate a "reservation price" for each relevant resource bundle is considerable.
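The bundle-contingent values just described can be sketched as a table mapping each relevant bundle to the value Vj of the optimal policy under that allocation; a reservation price for a contract then falls out as a value difference. The bundles, machine/interval names, and dollar values below are hypothetical:

```python
# Sketch: an agent's value function over relevant resource bundles.
# bundle_values plays the role of Vj(Xj) for Xj in the (restricted) set
# of relevant bundles; all numbers are hypothetical.
bundle_values = {
    frozenset(): 0.0,                               # no contracts: part cannot ship
    frozenset({("M1", "T1")}): 40.0,                # first operation covered
    frozenset({("M1", "T1"), ("M2", "T3")}): 100.0  # both operations covered
}

def reservation_price(bundle, contract):
    """Most the agent would rationally pay to add `contract` to `bundle`:
    the change in the value of its optimal policy."""
    return bundle_values[bundle | {contract}] - bundle_values[bundle]

print(reservation_price(frozenset({("M1", "T1")}), ("M2", "T3")))  # 60.0
```

In the thesis, each entry of such a table requires solving the agent's subMDP for that bundle, which is the "considerable effort" noted above; the dictionary here simply stands in for those solved values.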
The advantage of the market-based decomposition is that the optimal policy for each subproblem is dependent only on the local characteristics of the subproblem and the vector of prices for the resources. In this way, full subMDP independence is achieved.

4.1.2 MARKET-BASED DECOMPOSITION

In the decomposition used in this thesis, agents are used to represent identifiable real-world objects in the manufacturing environment, such as parts to be processed, production machines, and so on. An important feature of this particular form of agent-based decomposition is that the term "represent" is used in two senses:

1. Modeling — Agents are used to model objects in the manufacturing environment. Thus, an agent consists of data structures that contain information about a real-world object such as
a. state — properties of the object, such as its physical location and completion status;
b. capabilities — the actions that the object can perform and the effects of the actions; and,
c. ascribed goals — the sources of utility and disutility that the object encounters in the manufacturing facility.

2. Agency — Agents are called upon to make decisions on behalf of the real-world objects they represent. Thus, an agent representing a part may decide when and how the part should wend its way through the production system. To make these decisions, the agents solve stochastic optimization problems and thus there is a one-to-one correspondence between agents and the subproblem construct used in Section 4.1.1.

It is important to emphasize that, despite the use of the term "agency", the type of agent-theoretic issues that typically arise in the economic analysis of institutions are absent from this particular formulation. Specifically, agency theory (e.g., [Alchian and Demsetz, 1972]) posits a contractual relationship between two self-interested entities: a principal and an agent.
Given the divergent goals and different information sets possessed by the two entities, agency theory addresses the problem of writing contracts such that agency costs (e.g., loafing, monitoring, misaligned incentives) are minimized [Gurbaxani and Whang, 1992]. In the form of agency considered here, however, such issues do not exist. Instead, the computerized agents are designed and implemented so that they faithfully pursue the interests of the objects they represent.

4.1.3 DEFINITION OF THE GLOBAL PROBLEM

For the purpose of this thesis, the manufacturing problem is defined in the following way: A manufacturing facility consists of a number of production resources {M1, ..., Mm}. Although the production resources are typically referred to as "machines", Mi could be any resource, such as a jig, automated guided vehicle, or even a person. The manufacturing facility exists for the purpose of transforming raw materials into "parts" {J1, ..., Jn} through one or more value-added processing "operations".1 An example of a part might be a component of an aircraft wing that requires five operations: milling, grinding, drilling, polishing, and quality assurance. Once a part leaves the manufacturing facility, it is "shipped" to either the final customer or to another stage in the supply chain. In either case, the selling/transfer price is assumed to be known and exogenously determined. Moreover, the selling/transfer price is typically a function of the time at which the part ships. As a result, lateness penalties and any other time-dependencies are reflected in the payoff function for each part. The basic elements of the modeling approach are shown in Figure 4.3. The objective of the market-based system is to allocate production resources to parts such that the total profit of the manufacturing facility is maximized.
The profit-based formulation of the problem subsumes many of the objective functions used in conventional scheduling (e.g., minimizing late or tardy jobs) and is superior to other proxy objective functions (e.g., maximizing machine utilization).2 The difficulty that arises in the profit maximization formulation occurs because each production resource has finite capacity. Thus, by deciding to schedule a part on a particular machine at a particular time, the manufacturing facility incurs an additional cost: the opportunity cost of using the resource in an alternate manner. Indeed, the global objective of maximizing profit in the facility is equivalent to minimizing the opportunity cost of all scheduling decisions.

1 Parts are designated by the letter "J" (for jobs) to remain consistent with the standard notation used in the scheduling literature.

FIGURE 4.3 Agent-based decomposition in a manufacturing environment: Objects in the physical world (such as parts and machines) are represented by computer-based agents.

In the following sections, the decomposition of a manufacturing environment into agents representing machines ("machine-agents") and agents representing parts to be manufactured ("part-agents") is described in detail. In Section 4.1.6, the properties of the agents are stated in a more formal manner.

4.1.4 MODELING MACHINES

The types of machines considered in this research include milling machines, drill presses, and the like. Each machine is assumed to be autonomous (i.e., there is no complementarity between the machine and an operator) and setup costs are assumed to be zero.
Both of these assumptions are used to simplify exposition of the approach; however, as discussed in Section 4.1.7, it is possible to internalize complementarities and other economic externalities using the market-based constructs described in the thesis.

2 Maximization of machine utilization—although common in practice—ignores Goldratt and Cox's [1984] observation that no revenue is realized by running machines. Instead, revenue is realized exclusively by shipping products.

4.1.4.1 PROPERTIES OF MACHINE-AGENTS

Time plays an important role in scheduling in general and thus time is an important element in the formulation of machine-agents. The relationship between time and production resources is characterized as follows:

1. Discrete time intervals — The availability of a resource is expressed in terms of discrete and atomic intervals of time. For example, the interval T1 could be defined as the period between 05 September, 2000, at 09:10 GMT and 05 September, 2000, at 09:15 GMT. In a similar manner, the interval T2 could be defined as the five-minute interval that immediately follows T1, and so on.

2. Exclusivity — A resource may not be shared during a particular unit of time. For example, if a particular part is being processed on a particular machine at a particular time, then no other part can be processed on that same machine at the same time.

3. Uniqueness — In general, no two units of processing time on a machine are directly interchangeable. Each unit of time on each machine represents a unique "good" (in the economic sense) and thus contracts for resources make explicit reference to both the time and machine for which the contract is valid (e.g., Contract(M1, T5)).

4. Perishability — Once a time interval has passed, all contracts for processing during that unit of time are worthless. In other words, contracts for processing time are perishable goods that expire at a known time.
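The four properties above can be captured in a small data structure. This is an illustrative sketch, not part of the thesis's implementation; the class and field names are assumptions:

```python
# Minimal sketch of a contract good with the four properties above:
# discrete (integer interval index), unique (tied to one machine and one
# interval), exclusive (exactly one holder), perishable (expires).
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the good's identity is immutable
class Contract:
    machine: str         # uniqueness: references a specific machine...
    interval: int        # ...and a specific discrete time interval
    holder: str          # exclusivity: one owning agent at a time

    def expired(self, now):
        """Perishability: worthless once its interval has passed."""
        return now > self.interval

c = Contract("M1", 5, "A1")   # Contract(M1, T5), held by part-agent A1
print(c.expired(now=4))       # False: still usable
print(c.expired(now=6))       # True: the interval has passed
```

Because each (machine, interval) pair identifies a distinct good, two contracts are interchangeable only if they reference exactly the same machine and interval, which mirrors the uniqueness property.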
In addition to time, there are a number of modeling issues concerning the costs and configuration of the production resources:

1. Fixed configuration in the short-run — In microeconomic analysis, a common assumption is that the aggregate supply of a good is fixed in the short-run. Indeed, the short-run is defined as the period in which there is insufficient time to make changes to the supply of inputs [Lipsey, Purvis and Steiner, 1985]. The corresponding assumption made in this thesis is that the supply of production resources within the manufacturing facility is known and stable over the entire planning horizon. If the configuration of the facility changes (e.g., new machines are brought on-line), new schedules must be generated.

2. Sunk fixed costs — A common practice in cost accounting is to make a distinction between a production resource's fixed costs (e.g., procurement and installation) and its variable costs (e.g., energy requirements, wear, consumables, maintenance) [Deakin and Maher, 1987]. The important feature of fixed costs is that they have already been incurred and are therefore sunk. Variable costs, in contrast, vary with production. Given that the task at hand is to allocate resources to parts in the short-run, fixed costs can safely be ignored.

3. Independence — The independence property of resource goods is a corollary to the uniqueness property above: each unique processing interval can have its own independently-determined cost and market price. The independence between goods is readily observable in other environments in which time plays an important role. For example, the price a spectator is willing to pay for the exclusive use of a seat in a stadium depends critically not only on the seat's physical location, but also on its temporal location. Thus, the rental price of the seat depends on whether there is an event in the stadium at that particular time and also on whether the event is a boat show or the Superbowl.
Although it is certainly possible for consumers' valuations of goods to be correlated, this type of interdependence arises from consumer preference, not from any inherent dependence relationship between the goods.

4.1.4.2 THE DECISION PROBLEM FOR MACHINE-AGENTS

Given the foregoing, a machine-agent i can be seen to face a real-valued variable cost, c_it, whenever it performs a processing operation during the interval of time t ∈ {1, ..., T}. Typically, the per-unit-time cost is stationary with respect to time (i.e., it does not change as a function of time in response to, for example, inflation); however, the cost may depend on many other factors such as the nature of the machining operation, the hardness of the part to be machined, the feedrate used, and so on. If the machine-agent chooses not to process a part during the time interval t, it incurs no variable cost but forgoes any revenue it could have realized by selling a contract for the interval. To capture the binary decision facing the machine-agent, the variable x_it ∈ {0, 1} can be used: x_it takes the value 1 when the machine-agent has contracted to perform processing on a part for price p_it and zero otherwise. Under this formulation, a machine-agent's objective function can be written:

maximize Σ_{t=1..T} (x_it p_it − x_it c_it)    Eq: 4.1

Thus, the basic decision problem faced by a machine-agent is whether to sell contracts for its machining time at prices bid by other agents. Note that this decision problem is not equivalent to the much broader notion of profit maximization since the machine-agent has no ability to maximize its revenue by maximizing the value of p_it. For example, a machine-agent is not endowed with the ability to initiate a call auction for its processing times. Nor is a machine-agent capable of speculating about market prices and refusing bids in the expectation of higher bids to come.
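A toy version of this sell decision (Eq. 4.1, with the agent simply accepting any bid that covers the interval's variable cost) can be sketched as follows; the bid and cost figures are hypothetical:

```python
# Sketch of the machine-agent's myopic sell decision and the resulting
# realized value of Eq. 4.1: sum over t of x_it * (p_it - c_it).
def accept_bid(bid_price, variable_cost):
    """Restricted rationality: accept iff the bid exceeds variable cost.
    No speculation, no call auctions, no revenue maximization."""
    return bid_price > variable_cost

def realized_profit(decisions):
    """Eq. 4.1 evaluated for a set of (p_it, c_it, x_it) decisions."""
    return sum(p - c for (p, c, sold) in decisions if sold)

bids = [(12.0, 10.0), (9.0, 10.0), (10.5, 10.0)]  # (bid, cost) per interval
decisions = [(p, c, accept_bid(p, c)) for p, c in bids]
print(realized_profit(decisions))  # 2.5 (the 9.0 bid is rejected)
```

Note that the agent never holds out for a better price: each bid is evaluated against cost alone, which is exactly the restricted form of rationality the text goes on to justify.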
Instead, a machine-agent, as it is defined here, practices a very restricted form of rationality: if the bid price for a block of processing time, p_it, exceeds the variable cost of operating during that block, then the agent accepts the bid.3 There are at least two reasons for implementing machine-agent rationality in this way. First, as discussed in greater detail in Section 4.2.4.4, the division of economic surplus between buyer and seller is irrelevant in a cooperative problem solving (CPS) environment (recall Section 3.3.1 on page 62). The sole requirement under the first fundamental theorem of welfare economics is that each transaction be Pareto efficient; whether the seller receives the highest possible selling price is irrelevant from the global point of view. The second reason for relying on a simple form of rationality for machine-agents is that the alternative—a revenue-maximizing machine-agent—could lead to gaming behavior by bidding agents and possible market failure.

3 The decision process faced by machine-agents resembles that of a firm engaging in an initial public offering (IPO) through an underwriter. Since the initial seller of the stock has no access to the secondary market, its decision to go public boils down to whether it should accept the underwriter's IPO price or withdraw the IPO. To use a specific example, investors were willing to pay up to $140 for shares of Palm Computing on the day of its IPO. However, Palm Computing itself only received $38 per share in the primary market. Despite the discrepancy, the IPO can be considered rational as long as Palm Computing's original owners believed that the IPO price was fair at the time.

4.1.4.3 PLANNING WITHOUT MACHINE-AGENTS

The formulation of machine-agents is flexible enough to permit the modeling of a large number of production management issues, such as the optimal scheduling of preventative maintenance. However,
in the problems considered in this thesis, such issues are not a consideration. Moreover, the variable cost of each resource (that is, the machine-agent's reservation price) is assumed to be zero. In such circumstances, there is no requirement to include machine-agents in the market-based system. For this reason, the in-depth discussion of modeling issues in Section 4.2 focuses exclusively on part-agents.

4.1.5 MODELING PARTS

Parts refer to physical items that are manufactured in the production facility being scheduled. Typically, a part is created by transforming one or more raw materials into something of value to an end customer. Part-agents are responsible for coordinating all aspects of a part's transformation from a line item on an order to a finished good. To accomplish their coordination task, part-agents enter into contracts with other agents for the provision of production resources.

4.1.5.1 THE LIFE-CYCLE OF A PART-AGENT

Unlike machine-agents (which represent production resources over relatively long periods), part-agents are ephemeral: they are created when an order arrives and are destroyed when the order is shipped. The different stages in a part-agent's life-cycle are summarized below:

1. Order — Each order that arrives at a manufacturing facility can be viewed as a contract. In exchange for each part in the order, the ordering entity agrees to pay the manufacturing facility a predetermined amount of money.4 In most cases, the amount paid by the ordering entity is a function of time. For example, the order may specify the part's due date and a schedule of lateness penalties if the ship date exceeds the due date. The payoff may also be contingent on certain attributes of the part, such as conformance to specifications, finish quality, and so on. As is typically the case in manufacturing firms, the precise way in which the payoff function is determined is exogenous to the scheduling system.

2.
Process plan — As discussed in Section 2.2.2, a number of decisions about a part's path through the production facility have been determined prior to the instantiation of a scheduling problem. For example, the part's process plan specifies the materials, operations, and machines required to manufacture the part. Once an order for a particular part arrives, the process plan can be used as a template to provide the skeleton of the part-agent's policy.

4 Money is used here to refer to any medium of exchange that is (a) divisible and (b) valued by both parties in a transaction.

3. Holding costs — Holding costs are typically used in manufacturing environments to capture the intuition that resources locked up in work-in-process (WIP) inventory could be put to productive use elsewhere in the firm. In addition to the cost of capital, holding costs are used to capture time-dependent risks, such as spoilage, breakage, deterioration, and obsolescence [Nahmias, 1989]. A part-agent is assessed a holding cost penalty, h_t, for each unit of time that the part it represents remains in the system. Typically, the actual per-unit-time charge is a function of the value added to the part. To illustrate, consider the case in which an order has been placed for a part and an agent has been created to represent the part. If no raw materials have been marshalled on the part's behalf and no processing has occurred, then the holding cost is taken to be zero. That is, a part that has not yet been started incurs no holding cost. In contrast, if a part has undergone many operations and is only a single operation away from being shipped, then a much larger per-unit-time holding cost should be used. Naturally, once an item is shipped, it no longer incurs a holding cost.

4.
Contracting — Agents are provided with an infrastructure for buying and selling contracts for production resources, and the agents' activity within the market for resource contracts ultimately determines the schedule of the parts they represent. As discussed in Section 4.3, the market is implemented as a continuous double auction, which means that a contract for a particular unit of machining time may be bought and sold by many different agents before it is actually used. The price p_it that a part-agent pays for a unit of processing time on machine Mi at time t is determined by the interaction of agents in the market.

4.1.5.2 MATING PARTS AND ASSEMBLIES

In the general case, the payment received from the ordering entity may depend in a direct manner on the ship date (or some other property) of one or more other parts. For example, a part created as a component of an aircraft wing might not be shippable until it is mated with other parts into an assembly. Under these circumstances, an externality exists between the two parts: the contracting behavior of one part-agent directly affects the payoffs realized by other part-agents. In order to facilitate exposition in this thesis, all parts are assumed to be stand-alone and independent (that is, non-mating).
There are two ways in which a part-agent j can increase the revenue side of its profit function: 1. earn its terminal reward, rj, by successfully shipping the part it represent, and 2. sell a contract for machining on machine M,- at time t for price t (in order to sell a particular contract, the agent must first own the contract). A n implication of the price convention introduced and explained in Section 4.2.4.2 is that part-agents sell resources for their indifference price. In other words, if an agent values a resource at $10, then it is indif-ferent between keeping the resource and selling it to another agent for exactly $10. Given the price convention (combined with the fact that part-agents never start with an endowment of production resources) the revenue opportunities available to the agent through the sale of contracts is exacdy zero. That is, since the revenue generated by a sale is exacdy offset by a drop in the agent's expected value, no part-agent can deliberately or inadvertentiy increase its wealth by selling a contract. The ownership of a production resource i at time t by agent j is represented by the variable xi,j,t ~ {0' 1 }• Under this formulation, the objective function of a part-agent is to simply maximize its terminal reward net of contract costs and holding costs: The interesting feature of Equation 4.2 is that it is independent of the profit functions of all other agents. Thus, although each part-agent is in direct competition with other part-agents for scarce pro-decide whether to purchase a contract, a part-agent does not need to consider the impact of its deci-Eq: 4.2 duction resources, the market replaces direct agent-to-agent conflict with a price mechanism. To 87 sion on other agents in the system since this impact is already embodied in the market price of the resource. 
As a consequence, the price mechanism permits the agents to make their decisions independent of one another—herein lies the ultimate value of the market-based approach [Hayek, 1945].

4.1.6 GLOBAL UTILITY AND THE RULES OF MONEY

As discussed in Section 3.2, the first fundamental theorem of welfare economics states that a competitive equilibrium induces an optimal state. In other words, if self-interested, rational agents are permitted to buy and sell resources in a properly-functioning market, then the final outcome is a Pareto efficient allocation of resources. The problem with the first theorem in practice is that Pareto efficiency is not a sufficient condition for global optimality unless certain preconditions are satisfied. Thus, in order for the market-based system to converge on a global (rather than merely Pareto) optimal allocation of resources, the agents must be designed such that they satisfy these preconditions. In the following sections, the preconditions are described and justified in the context of the manufacturing problem.

4.1.6.1 QUASI-LINEAR UTILITY FUNCTIONS

In the objective functions in Equation 4.1 and Equation 4.2, agents are defined as profit maximizers: they select a course of action so that the difference between the revenue they earn and the costs they incur is maximized. A more general formulation of the agent-level problems is to state the agents' objectives in terms of utility maximization. Specifically, the utility U_j for agent j takes the form

U_j = M_j + φ_j(X_j)    Eq: 4.3

where M_j is the amount of numeraire good (money) held by the agent and φ_j(X_j) is the utility the agent receives from its consumption of all other goods in the L-good economy, X_j = {x_1j, x_2j, ..., x_(L−1)j} [Walsh et al., 1998].
The advantage of a quasi-linear utility function is that it permits a distinction to be made between the intrinsic utility that accrues from the ownership of goods, φ_j(X_j), and the utility that accrues from the ownership of a special numeraire good that exists solely to facilitate inter-agent transactions. The precise form of φ_j(X_j) can vary from agent to agent. However, the values of the function are determined by solving the agent's MDP for the allocation of goods X_j. The implication of Equation 4.3 is that the utility an agent receives from owning goods (e.g., production resources) is directly expressible in terms of a quantity of good M and vice-versa.

It is important to note that inter-agent transfers of good M are zero-sum. Thus, from a global point of view, the total amount of good M in the system is constant. That is, like all other goods in the production economy, money is neither created nor destroyed during the operation of the market. It is merely exchanged between agents and thus the net effect of transfers of M between agents on global utility is zero. In contrast, the changes in intrinsic utility facilitated by transfers of M do have an effect on global utility. To illustrate, consider the following example: A machine, M1, is considering the sale of a contract for production at time T1 to a part-agent, A1. By processing the part, M1 will incur a loss of intrinsic utility due to wear and other variable costs. Thus, M1 will only agree to sell the contract for processing if the buyer (A1) indemnifies it for its loss of utility. From A1's perspective, purchasing the contract for machining increases the probability that it will ship. As a consequence, ownership of the contract results in an increase in intrinsic utility. If the increase in intrinsic utility is greater than M1's ask price for the contract, then it is rational for A1 to purchase the contract by transferring the appropriate quantity of M to M1.
Under the quasi-linear utility function, the exchange is Pareto efficient since M1 is at least as well off after the sale of the contract and A1 is strictly better off. Put another way, if the gain in expected profit to A1 exceeds the cost of wear and tear to M1, the part represented by A1 will be processed on M1 at T1.

4.1.6.2 RISK NEUTRALITY

Agents are defined to be risk neutral. As such, an agent's expected utility equals the agent's expected monetary value, V_j:

U_j = V_j    Eq: 4.4

The purpose of defining agents as risk-neutral is to establish equivalency between expected utility and expected monetary value (EMV). Whereas Equation 4.3 establishes the existence of an "exchange rate" between money and intrinsic utility, Equation 4.4 states that the exchange rate is independent of the amount of good M held by the agent. Thus, the actual initial endowment of M to the agents can be arbitrarily large or small (as long as it is finite) without affecting the agents' reservation prices for resource goods. In addition, the property of risk neutrality is extended to the system as a whole so that global utility is taken to equal global expected monetary value: U_G = V_G.

The absence of wealth effects follows as a consequence of quasi-linear utility for agents. An agent knows its willingness to pay a certain amount of good M in exchange for other goods and is therefore unwilling to buy resources simply because it has money. To illustrate, consider the effect of changing the initial endowment of agent j by some amount a:

M_j + a + φ_j(X_j) = U_j + a    Eq: 4.5

Since money and utility are expressed in the same units, any transfer of money to other agents results in a decrease in utility for the buying agent. The decrease for the buyer corresponds exactly to the dollar amount of the sale. Thus, for an exchange to be rational, the increase in intrinsic utility, φ_j(X_j), that accrues to the buyer as a result of the purchase must exceed the price it pays.
But since the price the buying agent pays is no less than the selling agent's intrinsic utility for the same good, the net effect of the transfer is that the agent with the highest intrinsic utility for the good gets the good. The selling agent is no worse off because the good M appears in its utility function and the increase in M resulting from the sale offsets its loss of the good.

4.1.6.3 ADDITIVE UTILITY

Global utility, U_G, is a linear function of the agent utilities, U_j:

U_G = Σ_j ω_j U_j    Eq: 4.6

Equation 4.6 states that the utility of the manufacturing organization as a whole is a linear function of the utility of each agent j that has been created during decomposition of the system. To better understand the additive property, it is useful to decompose the utility function into its components and recognize that the sum of M_j for all j is a constant, M. Rewriting Equation 4.6 yields:

U_G = Σ_j ω_j [M_j + φ_j(X_j)] = M + Σ_j ω_j φ_j(X_j)    Eq: 4.7

To verify the last term of Equation 4.7, recall that the function φ_j(X_j) is simply the expected value to agent j of owning the allocation of goods X_j. Moreover, the expected value for each agent is a function of exogenously-determined costs and rewards expressed in monetary units. Specifically, recall the sources of cost: the variable costs of processing incurred by machines and the holding costs incurred by parts. On the revenue side, there is only one real source of revenue: the terminal rewards received by parts for successfully shipping. The important feature of these sources of cost and revenue is that they are directly attributable to individual parts and machines (and hence attributable to individual agents). Moreover, as established in the preceding sections, the sources of costs and revenues for each agent are independent of other agents.
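The contract sale between M1 and A1 described in Section 4.1.6.1 can be worked through numerically to check two of the claims above: transfers of M are zero-sum, while a rational exchange strictly increases global utility. The cost, price, and value figures are hypothetical:

```python
# Toy check of a Pareto-efficient exchange under quasi-linear utility
# (Eq. 4.3: U_j = M_j + phi_j(X_j)). M1 sells a processing contract to
# A1 at a price between M1's variable cost and A1's intrinsic value.
def utility(money, intrinsic):
    return money + intrinsic

m1_money, a1_money = 100.0, 100.0             # arbitrary endowments of M
price, m1_cost, a1_value = 30.0, 20.0, 50.0   # cost < price < value

u_before = utility(m1_money, 0.0) + utility(a1_money, 0.0)
# After the sale: money moves from A1 to M1 (zero-sum); M1 bears the
# wear cost; A1 gains intrinsic value from owning the contract.
u_after = (utility(m1_money + price, -m1_cost)
           + utility(a1_money - price, a1_value))

print((m1_money + price) + (a1_money - price))  # 200.0: total M unchanged
print(u_after - u_before)                       # 30.0 = a1_value - m1_cost
```

The gain in global utility equals a1_value − m1_cost regardless of the price, which is why the division of surplus between buyer and seller is irrelevant from the global point of view; and because utility is quasi-linear, adding any constant to either endowment leaves the gain unchanged.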
4.1.6.4 COMMON NUMERAIRE

The measure of utility for all agents is identical and interchangeable:

ω_j = 1 for all j    Eq: 4.8

The fourth property of agents states that a dollar is a dollar regardless of how the dollar is earned or which agent earns it. In other words, no job has a predefined, non-monetary priority over any other job.

4.1.6.5 FROM PARETO OPTIMALITY TO GLOBAL OPTIMALITY

Combining the four properties above with the first fundamental theorem of welfare economics yields the following result:

max(U_G) = max(V_G) = max Σ_j V_j = Σ_j max(V_j)    Eq: 4.9

The final term in Equation 4.9 restates the first fundamental theorem of welfare economics and is therefore critical: the utility of the system as a whole can be maximized as a consequence of the maximizing behavior of each independent agent j. The price mechanism and the requirement that all exchanges be Pareto efficient ensure that the increase in utility for the buying agent is greater than the price paid to the seller in terms of the common numeraire good, M. As a consequence, the maximizing behavior of the agents leads to a strictly monotonic increase in global utility.

4.1.7 CAVEATS AND SCOPE

Before discussing the details of a system that implements Equation 4.9, two lurking issues regarding the "rules of money" and the properties of agents should be addressed. First, the additive utility property is not appropriate when inter-agent dependencies exist. For example, when two parts must come together simultaneously to create an assembly, neither part receives a reward until the assembly is completed. Although such dependencies tend to be the rule rather than the exception in manufacturing environments, the case of mating parts is taken to be beyond the scope of this thesis. However, as discussed in Section 6.2.2.1, there is a relatively straightforward way to address the problem of mating parts that is consistent with economic theory.
Briefly, the relationship between assemblies and parts can be modeled by introducing hierarchical agent relationships analogous to contractor and subcontractor roles. The contractor agent receives the entire terminal reward for the finished assembly; however, the contractor must use side payments to subcontractors to coordinate the production of the individual parts. In effect, the contractor's side payments are used to internalize the externality created by the synchronization issue.

A related issue is the externality created by setups. If a production machine is set up for Operation A and an agent incurs the cost of setting up the machine for Operation B, then all agents that require Operation B benefit from the changeover without paying for it in any way. In a sense, setups are analogous to public goods in real economies. The defining characteristic of goods such as parks and clean air is that everyone benefits if the goods are provided, but no one is willing to incur the cost of provision alone. The problem created by setups is beyond the scope of the thesis. However, there are well-established techniques for internalizing public good externalities. For example, one approach is to introduce agents whose sole role is to consolidate demand for the good and ensure all agents who benefit assume their share of the cost (e.g., governments in real economies). The issues surrounding setups and other externalities are discussed in greater detail in Section 6.2.2.2.

4.1.8 A KNOWLEDGE REPRESENTATION LANGUAGE FOR AGENTS

In order to model complex physical systems in terms of agent-oriented constructs—such as beliefs, desires, and capabilities—a knowledge representation language is required. This section provides an overview of the language used in this thesis.
A knowledge representation language consists of two elements: syntax and semantics [Russell and Norvig, 1995]:

• Syntax describes the symbols and the allowable configurations of symbols that may be used to make statements about the world. The syntax may be graphical and consist of boxes and arrows. Alternatively, it may be mathematical or logical in flavor and consist of textual symbols (such as the examples from the situation calculus and STRIPS in Chapter 3). For any formal language, the syntax is ultimately a series of bits inside a computer's memory and thus the exact nature of the symbols used is not important as long as the different representations are isomorphic—that is, as long as there is a means of making lossless transitions from one set of symbols to the other. For example, it is a simple matter to translate decision trees (which have a graphical syntax) into productions (which have a textual IF-THEN syntax).

• Semantics determines the relationship between sentences in the knowledge representation language and the world the language is meant to represent. In other words, semantics determines the meaning of sentences [Winston, 1992].

For a formal knowledge representation language to be used for planning, it must also include mechanisms for inference. In the classical planning languages discussed in Section 3.1.3, inference is achieved through the well-known inference rules of first-order logic (such as modus ponens, resolution, and so on). However, in order to make decisions in the real world, an agent must be able to represent and reason about its own imperfect knowledge and the uncertain outcomes of its actions.

4.1.8.1 SOURCES OF UNCERTAINTY

The model of rationality used in this research (recall Figure 3.2 on page 53) admits two distinct forms of uncertainty:

1. Uncertainty about state: An agent might not know precisely what state it is in.
Indeed, the purpose of conventional management information systems is to help real agents refine their beliefs about their actual state (e.g., "Are we in a state with good quarterly performance?"; "Are we in a state in which Employee X has achieved her sales objectives?"). An environment in which agents are assumed to have perfect sensing and can be relied on to know their current state is called accessible (or fully observable) [Russell and Norvig, 1995], [Puterman, 1994]. An environment in which agents are permitted to be unsure is called inaccessible (or partially observable).

2. Uncertainty about outcomes: There are two distinct types of uncertainty about outcomes. Event uncertainty captures the agent's imperfect knowledge about how its environment evolves and changes over time. For example, an agent in a manufacturing environment may be adversely affected by events such as a work stoppage due to labor unrest or a power outage. However, the agent has only imperfect knowledge about if and when such events will occur. The important feature of event uncertainty is that it has nothing to do with the agent. That is, the uncertainty is a feature of the environment exclusively. Action uncertainty, in contrast, arises from the agent's imperfect knowledge of how its own actions affect its environment. For example, a mobile robot may execute a PickUpBlock action. However, there may be a small chance of the block slipping through the gripper and remaining on the table after execution of the action. Thus, the primary difference between an event and an action as they are defined here is that agents have no control over events.

Note that in either case, all the uncertainty in this formulation arises as a result of the agent's inability to perfectly sense the world or its inability to perfectly predict outcomes. The world itself, however, is assumed to be a certain and unequivocal place.
For example, if a block is on a table, it is on the table.⁵ Moreover, in this research, a special accessibility assumption is made: all relevant features of an agent's state are known with certainty. Returning to the example of the mobile robot in Figure 3.1 on page 43, the robot is assumed to have complete and perfect knowledge of the value of its x and y state variables. However, the agent is not assumed to have any knowledge of state variables that are not relevant to the planning task at hand. For example, the mobile robot is not assumed to have any knowledge of whether it is raining outside, the current stock price of IBM, and so on.

Given these assumptions, the only remaining source of uncertainty is due to the agent's imperfect knowledge of causality. If the agent had better knowledge of causality, it would be able to refine its probabilistic assessments and make better decisions. For example, assume that a mobile robot believes that the sentence InGripper will be true following the PickUpBlock action in six out of ten instances. However, as noted in the discussion of fineness in Section 3.2.1, the agent could make better estimates with better information. For example, assume that a wet block is more likely to slip than a dry block when gripped. In such circumstances, the agent's belief about the truth value of InGripper following PickUpBlock could be expressed using conditional probabilities:

    P(InGripper | PickUpBlock, DryBlock) = 0.9
    P(InGripper | PickUpBlock, ¬DryBlock) = 0.1

Other factors that contribute to a block slipping could include whether the grip is oily, the hardness of the block, and so on. In principle, it should be possible to enumerate all the causal factors so that the truth value of an outcome following execution of an action is known with virtual certainty.

⁵ This assumption contrasts with the fuzzy-theoretic stance (e.g., [Zadeh, 1965]) in which the reality of the world itself is subject to nuances and degrees of truth.
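The refinement described above can be encoded directly. The probabilities below come from the text; the dict-based state and lookup structure are assumptions for illustration, not the thesis's notation.

```python
# Illustrative encoding of a conditional outcome probability:
# P(InGripper | PickUpBlock, DryBlock = d), conditioned on the one state
# feature the agent considers relevant.

P_INGRIPPER_GIVEN_PICKUP = {True: 0.9, False: 0.1}

def p_in_gripper_after_pickup(state):
    """Look up the outcome probability for the discriminating feature."""
    return P_INGRIPPER_GIVEN_PICKUP[state["DryBlock"]]

# A finer state description sharpens the estimate relative to the
# unconditional 6-in-10 belief cited in the text.
p_dry = p_in_gripper_after_pickup({"DryBlock": True})    # 0.9
p_wet = p_in_gripper_after_pickup({"DryBlock": False})   # 0.1
```

Adding further conditioning variables (oily grip, block hardness, and so on) would extend the dictionary key, at the cost of the dimensionality problems discussed later.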
In practice, however, such certainty is unachievable and thus the reasoning done by agents is necessarily approximate.

4.1.8.2 ELEMENTS OF THE KNOWLEDGE REPRESENTATION LANGUAGE

In this thesis, agents are created to make decisions on behalf of objects (often inanimate) in the real world. As such, each agent must represent its object in both the agency sense of the word and the modeling sense of the word. In this section, the focus is the modeling sense. In addition to representing the object's state, the agent must be able to represent any relevant aspects of the environment in which the object is situated. The agent's own state is defined as the union of information about its object, the object's environment, and any other information that the agent requires to fulfil its decision-making objectives. In addition to information about state, the agent requires information about transitions between states and the costs and rewards associated with the transitions so that it can plan in dynamic environments.

The language advocated for this purpose is a variant of the STRIPS representation that permits compact representations of actions with uncertain outcomes. The language is based on the work of Boutilier et al. (e.g., [Dearden and Boutilier, 1997], [Boutilier and Dearden, 1996], [Boutilier, Dearden and Goldszmidt, 1995]), which in turn incorporates concepts from ADL [Pednault, 1989] and BURIDAN [Kushmerick, Weld and Hanks, 1994]. The primary difference between the language developed in this thesis and its ancestors developed by Boutilier et al. is that the language for scheduling agents must represent and reason about time explicitly. In contrast, the planning domain generally used within the decision-theoretic planning community (at least for illustrative purposes) is the control of mobile robots. The core constructs of the action representation language are shown graphically in Figure 4.4.
FIGURE 4.4 A graphical representation of the elements of the knowledge representation language for agents.

An agent starts in a state s, which may have an intrinsic utility to the agent represented by the immediate reward r(s). When the agent executes action a ∈ A_s in state s, two things happen: first, the agent incurs an action cost c(a, s); second, the action has an effect e(a, s) ∈ E_a. The agent's knowledge of effects is probabilistic and therefore it knows only which effects are in the set E_a and the probability associated with each e(a, s). The application of a set of effects to state s results in an outcome—that is, a transition to a new state s'. In the following sections, the essential elements of this action representation framework are discussed in greater detail and illustrated with respect to a part agent that requires three units of processing on three machines.

States

In STRIPS, states are represented by conjunctions of ground propositional atoms, such as s = (raining, ¬umbrella). In the state description language used here, the notion of an atomic sentence is expanded to include predicates and the equality symbol. This notation permits parameterized sentences of the form variable name = value (e.g., OpStatus(Op1) = complete). Although the domain of value is not restricted to binary variables, it is assumed to be finite, discrete, and mutually exclusive.

It is important to recognize that the addition of predicates and the equality symbol is simply "syntactic sugar" [Russell and Norvig, 1995, p. 200]. That is, although the syntax resembles first-order logic, the semantics remain purely propositional (at least for the purposes of the scheduling algorithms considered here). The same sentence could be written in propositional form as Op1Complete or ¬Op1Complete. However, the use of the predicate form simplifies parameterization over related state variables (e.g., OpStatus(Op1), OpStatus(Op2), ...).
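Stepping back to the cycle of Figure 4.4 (state, action cost, probabilistic effects, outcome state), the one-step expected value of executing an action can be sketched as follows. This is a hedged illustration only; the function names and the toy numbers are invented, not taken from the thesis.

```python
# Sketch of the one-step semantics: executing action a in state s costs
# c(a, s), produces one of several effects with known probability, and lands
# in a successor state s' carrying immediate reward r(s').

def expected_one_step_value(s, a, r, c, effects):
    """effects(a, s) -> list of (probability, successor_state) pairs."""
    return -c(a, s) + sum(p * r(s2) for p, s2 in effects(a, s))

# Toy instantiation: one action with two possible outcome states.
r = lambda s: {"done": 10.0, "working": 0.0}[s]          # immediate rewards
c = lambda a, s: 1.0                                     # flat action cost
effects = lambda a, s: [(0.2, "done"), (0.8, "working")] # outcome distribution

ev = expected_one_step_value("working", "Process", r, c, effects)
# ev = -1 + 0.2 * 10 + 0.8 * 0 = 1.0
```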
The equality sign permits compact representations of multi-valued attributes. For example, a propositional representation of time would require an atom for each time unit (T1, T2, ...). Moreover, since the time values are mutually exclusive, representation of a particular instant in time would require a t-tuple in which t is the number of distinct time atoms and t - 1 time atoms are negated. The name = value notation makes it clear that the values of Time are mutually exclusive.

The question of which variables to include in the agent's description of state is complicated by the inherent conflict between accuracy and computational feasibility discussed in Section 4.1.8.1. On one hand, the precision of conditional probability estimates increases monotonically with the amount of information in the conditioning term. On the other hand, the curse of dimensionality (recall Section 3.1.4.4) provides clear incentives to minimize the number of state variables in the problem formulation. Thus a judicious trade-off must be made between the decision-making value of additional information and the computational cost of using the information. Approaches to making this trade-off are suggested in the discussion of learning in Section 6.2.1.4.

The number of state variables required to implement the simple part agents considered here is relatively modest, as shown in Table 4.1. Naturally, as in any scheduling system, the agent will need variables for representing the passage of time (Time), the completion status of operations (OpStatus(Opi)), and whether the agent owns contracts for specific production resources (Contract(Mi, Tj)). In addition, it is typically the case that the probability of completing processing in the next unit of time is conditional on the amount of processing that has already been performed on the operation. The state variable ElapsedTime(Opi) is used to retain this information from state to state so that it may be used when estimating outcomes.
Encoding ElapsedTime(Opi) within the state description has obvious implications for the size of the problem state space. However, by making elapsed time information available to the planning algorithm, any discrete probability distribution for completion times can be used.

Two special state variables are used to encode the fixed costs and benefits encountered by the agent as it works through the system. HoldingCost represents the penalty per unit time that the agent incurs whenever a unit of time passes. The standard practice in manufacturing is to make the holding cost a function of the value added to the part or the time it has spent in the system. Thus, as the part nears completion, its holding cost per unit time increases. The ProductValue variable represents the value of the product when it leaves the production system. Since the terminal rewards in this system are assumed to be exogenously determined, the variable can be treated like a constant in most cases. However, it may be desirable to account for specific changes to the terminal reward by assigning new values to ProductValue during processing. For example, certain measurable quality problems may occur during processing which result in the requirement to downgrade the final value of the part.

The last two state variables shown in Table 4.1 are used by the agent for bookkeeping purposes. Shipped is a propositional variable which is used to determine whether the part has left the production system being scheduled. NewTimeUnit is used to simplify the representation of actions with temporal effects. Both these variables are discussed in greater detail in subsequent sections.
TABLE 4.1: State variables used to model a part agent

State variable | Values | Description
Time | 1, 2, ..., t | the current time, where t is the number of time units in the current planning horizon
OpStatus(Opi) | complete, incomplete | the status of each processing operation, Opi
Contract(Mi, Tj) | yes, no | whether the part owns a contract with machine Mi at time Tj
ElapsedTime(Opi) | 0, 1, ..., m | the number of units of processing time already invested in Opi
HoldingCost | one real value, or multiple real values that depend on the value of some other state variable (e.g., ElapsedTime(Opi) or OpStatus(Opi)) | the cost per unit time of holding one unit of value added in work-in-process inventory (typically a function of value-added or time in the system)
ProductValue | one or more real values | the exogenously-determined value of the finished product
Shipped | yes, no | whether the product has been shipped
NewTimeUnit | yes, no | a flag to indicate whether a unit of time on the real clock has passed

Actions

In STRIPS [Fikes and Nilsson, 1971], action descriptions consist of three lists (where a list is simply a conjunction of propositional atoms):

1. precondition list: specifies atoms that must be true in the world for the action to be executed,
2. add list: specifies literals (possibly negated) that are known to be true in the world after the action is executed,
3. delete list: specifies atoms about which nothing is known following the action.⁶

⁶ In the fully accessible case considered here, a delete list is not required. It is important to recognize the distinction between atoms that are deleted via the delete list and atoms that are negated via the add list. In the former case, knowledge of state is lost.
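As a concrete illustration of Table 4.1, a part agent's state can be held in an ordinary dictionary. This is a sketch only; the specific values (a three-operation part, a contract on M2 at time unit 4, and so on) are invented for the example and do not come from the thesis.

```python
# Illustrative initial state for a three-operation part agent, using the
# state variables of Table 4.1.
initial_state = {
    "Time": 1,
    "OpStatus": {"Op1": "incomplete", "Op2": "incomplete", "Op3": "incomplete"},
    "Contract": {("M2", 4): "yes"},     # owns machine M2 at time unit 4
    "ElapsedTime": {"Op1": 0, "Op2": 0, "Op3": 0},
    "HoldingCost": 1.0,                 # per-unit-time cost of work in process
    "ProductValue": 100.0,              # exogenously determined terminal value
    "Shipped": "no",
    "NewTimeUnit": "no",
}

def owns_contract(state, machine, t):
    """Contract entries not listed default to "no" (a closed-world reading)."""
    return state["Contract"].get((machine, t), "no") == "yes"
```

The closed-world default in `owns_contract` mirrors the name = value convention: only the contracts the agent actually holds need to be stored.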
FIGURE 4.5 The action representation hierarchy for P-STRIPS: Action preconditions determine whether the action is executable; outcome discriminants influence the probability distributions over outcomes.

In probabilistic STRIPS (P-STRIPS), the basic STRIPS representation is expanded to support the description of actions with probabilistic outcomes. Since probability typically involves notions of dependence and independence, it is useful to introduce new constructs to allow dependence relationships to be represented parsimoniously. In what follows, each of the constructs in the P-STRIPS hierarchy shown in Figure 4.5 is described and illustrated with respect to a particular action: Process(Op2, M2, 04). The action describes processing operation Op2 on machine M2 at time unit 04 for exactly one unit of time. The illustrations of the constructs are set in a sans-serif font to help distinguish the general case from the specific example.

Following the recommendation of [Pednault, 1989], the notion of preconditions is divided into two distinct constructs:

1. Action preconditions — An action precondition is a set of sentences (possibly empty) that specifies what must be true in the world before a particular action can be executed. In other words, the action preconditions define the set of actions A_s.

In order to execute Process(Op2, M2, 04), three preconditions must be satisfied: First, the current time must be equal to time 04; second, the agent must own the contract for machine M2 at time 04. Finally, a physical precedence constraint dictates that Op2 cannot be commenced until Op1 is complete. These preconditions are shown in Figure 4.6.
Action: Process(Op2, M2, 04)
Preconditions:
  Time = 04
  Contract(M2, 04) = yes
  OpStatus(Op1) = complete

FIGURE 4.6 Action preconditions specify conditions that must be true in the world for the action to be feasible.

2. Outcome discriminants — When an action is executed, its effects may depend on features of the current state. For example, actions of the form Process(Opi, Mj, Tk) are likely to depend on the amount of processing that has already been invested in Opi. A discriminant [Boutilier and Dearden, 1996] is a set of sentences (possibly empty) that corresponds to distinct sets of conditional effects. In other words, a discriminant γ is used to construct conditional probabilities of the form P(φ|γ), where φ is a set of effects.

The probability of completing an operation after a single unit of processing time may be conditioned on a number of state variables. In this example, only ElapsedTime(Op2) is considered. The relationship between ElapsedTime(Op2) and the probability of completing the operation is shown in Figure 4.7 by the cumulative histogram on the right.

Action: Process(Op2, M2, 04)
Preconditions:
  Time = 04
  Contract(M2, 04) = yes
  OpStatus(Op1) = complete
Discriminants:
  D1: ElapsedTime(Op2) = 0
  D2: ElapsedTime(Op2) = 1
  D3: ElapsedTime(Op2) = 2

FIGURE 4.7 Discriminants are used to identify the relevant features of the current state: When the action is executed, its outcome depends on the outcome discriminant.

The notion of effects must also be expanded in P-STRIPS to account for the possibility that a single action can have multiple possible outcomes:

3. Outcomes — The representation of non-deterministic effects requires the introduction of an additional construct that is used to batch together sets of effects.
An outcome is a set of effects, φ, that can be assigned a conditional probability, P(φ|γ). The probabilities for the outcomes within a discriminant must sum to one.

There are two basic outcomes of interest for any processing action: either the operation is complete at the end of the unit of time of processing or it remains in progress. As discussed above, the probability of either outcome is conditional on the amount of processing that has already been invested in the operation. When ElapsedTime = 0, the probability of the operation being complete at the end of the unit of processing time is zero (i.e., the minimum processing time is greater than one unit). Similarly, the assumption is made that the maximum amount of time that Op2 could require on M2 is three units of time. As such, the probability that the operation is complete after executing the Process(Op2, M2, 04) action in any state in which ElapsedTime = 2 is 1.0. When only one unit of processing has already been applied to the operation, there is a 20% chance that the operation will be complete at the end of the unit of processing time. The P-STRIPS representation of outcomes is shown in Figure 4.8.

Action: Process(Op2, M2, 04)
Preconditions:
  Time = 04
  Contract(M2, 04) = yes
  OpStatus(Op1) = complete
Discriminants:
  D1: ElapsedTime(Op2) = 0
    Outcomes:
      O1 (not completed): P = 1.0
  D2: ElapsedTime(Op2) = 1
    Outcomes:
      O1 (completed): P = 0.2
      O2 (not completed): P = 0.8
  D3: ElapsedTime(Op2) = 2
    Outcomes:
      O1 (completed): P = 1.0

FIGURE 4.8 The execution of the action will lead to one of the outcomes with a known probability.

4. Effects — The effects construct in P-STRIPS is similar to the add lists in conventional STRIPS. Effects are sentences that are known to be true in the world following execution of actions.

The effects associated with the "completed" and "not completed" outcomes are the same for every discriminant of Process(Op2, M2, 04). When the "completed" outcome occurs, the effect is OpStatus(Op2) = complete.
When the outcome is "not completed", the OpStatus(Op2) variable remains unchanged. However, the ElapsedTime(Op2) variable is incremented by one to reflect the work that has been done during the unit of processing time, as shown in Figure 4.9.

Action: Process(Op2, M2, 04)
Preconditions:
  Time = 04
  Contract(M2, 04) = yes
  OpStatus(Op1) = complete
Discriminants:
  D1: ElapsedTime(Op2) = 0
    Outcomes:
      O1 (not completed): P = 1.0
        Effects: ElapsedTime(Op2) = 1
  D2: ElapsedTime(Op2) = 1
    Outcomes:
      O1 (completed): P = 0.2
        Effects: OpStatus(Op2) = complete
      O2 (not completed): P = 0.8
        Effects: ElapsedTime(Op2) = 2
  D3: ElapsedTime(Op2) = 2
    Outcomes:
      O1 (completed): P = 1.0
        Effects: OpStatus(Op2) = complete

FIGURE 4.9 Different effects lists are associated with each outcome.

Some actions have multiple sets of effects that are independent of each other. For example, all actions of the form Process(Opi, Mj, Tk) are temporal—that is, their execution implies the passage of time. Moreover, the passage-of-time aspect of the actions occurs regardless of whether there is a change in the completion status of the operation. Rather than including the passage of time within the effects of every discriminant and outcome, a factored representation based on action aspects is used [Boutilier and Dearden, 1996].

5. Aspects — An aspect is used to separate action effects into independent groups. Because the outcomes associated with each aspect are independent, the probability of joint outcomes can be determined by multiplying the probabilities of each outcome. For example, if the probability of outcome O1i from Aspect 1 is 0.4 and the probability of O2j from Aspect 2 is 0.6, then the probability of O1i ∧ O2j is 0.4 × 0.6 = 0.24.

FIGURE 4.11 Reward trees conditioned on the value of NewTimeUnit: The tree in (b) is the "closed world" version in which paths not shown explicitly are assumed to be zero.
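The aspect construct can be sketched in code: because aspects are independent, the joint outcome distribution is the product of the per-aspect distributions, and the effect sets simply merge. The data layout below is an assumption made for illustration; the probabilities are those of the Process(Op2, M2, 04) example in the state where ElapsedTime(Op2) = 1.

```python
# Sketch of composing independent action aspects into a joint outcome
# distribution. Each aspect lists (label, probability, effects) triples for
# the discriminant matching the current state.
from itertools import product

temporal = [("new time unit", 1.0, {"NewTimeUnit": "yes"})]
processing = [
    ("completed", 0.2, {"OpStatus(Op2)": "complete"}),
    ("not completed", 0.8, {"ElapsedTime(Op2)": 2}),
]

joint = {}
for combo in product(temporal, processing):
    labels = tuple(outcome[0] for outcome in combo)
    prob = 1.0
    effects = {}
    for _, p, eff in combo:
        prob *= p            # independence: joint probability multiplies
        effects.update(eff)  # independence: effect sets simply merge
    joint[labels] = (prob, effects)
```

With more aspects, the same loop extends naturally, and the joint probabilities within the matched discriminants always sum to one.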
As a temporal action, Process(Op2, M2, 04) has an aspect that describes the passage of time. To simplify this, the NewTimeUnit state variable is used. Since the passage of time is unconditional, only a single (empty) discriminant is required. The use of aspects to account for the passage of time is shown in Figure 4.10.

Action: Process(Op2, M2, 04)
Preconditions:
  Time = 04
  Contract(M2, 04) = yes
  OpStatus(Op1) = complete
Aspects:
  A1 (temporal):
    Discriminants:
      D1: (empty)
        Outcomes:
          O1 (new time unit): P = 1.0
            Effects: NewTimeUnit = yes
  A2 (processing):
    Discriminants:
      D1: ElapsedTime(Op2) = 0
        Outcomes:
          O1 (not completed): P = 1.0
            Effects: ElapsedTime(Op2) = 1
      D2: ElapsedTime(Op2) = 1
        Outcomes:
          O1 (completed): P = 0.2
            Effects: OpStatus(Op2) = complete
          O2 (not completed): P = 0.8
            Effects: ElapsedTime(Op2) = 2
      D3: ElapsedTime(Op2) = 2
        Outcomes:
          O1 (completed): P = 1.0
            Effects: OpStatus(Op2) = complete

FIGURE 4.10 An aspect is used to represent the passage of time in a temporal action.

Rewards

In the conventional representation of rewards in the MDP literature, an agent receives an immediate reward r(s) whenever it enters the state s [Puterman, 1994]. The problem with the extensional formulation⁷ is that it requires a vector of rewards of size |S|. Moreover, it ignores the fact that rewards may be feature-based rather than state-based. A feature-based reward is triggered whenever certain preconditions are satisfied, regardless of what state the agent is in. A compact way to represent such rewards is to use trees in which interior nodes correspond to state variables, branches correspond to values of the state variables, and leaf nodes correspond to reward values.

To illustrate, consider the issue of holding cost. A simple means of representing holding cost is to use the NewTimeUnit state variable as a reward precondition, as shown in Figure 4.11(a). The tree states that the agent receives a "reward" of -$1 whenever it enters a state in which NewTimeUnit = yes. Conversely, if the agent enters a state in which NewTimeUnit = no, it receives a reward of zero. The tree in (a) is exhaustive—that is, every s ∈ S satisfies exactly one branch and is therefore associated with exactly one reward. However, a variant of the closed world assumption⁸ can be applied to the reward tree to reduce the tree's bushiness. Under this assumption, only branches with non-zero leaf nodes are included, as shown in Figure 4.11(b).

⁷ In an extensional representation, each state in the state space is explicitly and uniquely named. In an intensional representation, states are described by sets of features. See [Boutilier, Dean and Hanks, 1999] for a more detailed discussion of the distinction between the two types of representations.

⁸ The closed world assumption is often used to clarify incomplete representations of state. The assumption states that any proposition not explicitly known to be true in the initial state can be presumed false [Weld, 1998].

FIGURE 4.12 Action cost trees for holding costs: In (a), the value of the HoldingCost variable is used to condition the value of the rewards. In (b), there is a perfect correlation between HoldingCost and OpStatus(Opi) and thus HoldingCost is removed to simplify the tree.
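A feature-based reward tree with a closed-world default can be sketched as a short function: interior nodes become conditionals, and any state matching no explicit branch earns zero. The branch values below follow one plausible reading of the holding-cost trees discussed here; treat the exact structure and numbers as illustrative rather than as the thesis's definitive tree.

```python
def holding_cost_reward(state):
    """Sketch of a feature-based reward tree with a closed-world default.

    The escalating per-period costs (-1, -2, -3) mirror the idea in
    Figure 4.12(b): holding cost grows as operations are completed and
    value is added to the part (values illustrative)."""
    if state.get("NewTimeUnit") != "yes":
        return 0.0                       # closed-world default branch
    if state.get("OpStatus(Op1)") != "complete":
        return -1.0                      # little value added yet
    if state.get("OpStatus(Op2)") != "complete":
        return -2.0
    return -3.0                          # early operations complete

# Example: a fresh part at the start of a new time unit incurs -1.
```

Only the non-zero branches need to be written down, which is exactly the bushiness reduction the closed-world variant provides.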
Although all the rewards in the part-agent formulation used here are negative, the term "reward" is used instead of alternatives (such as "penalty") in order to remain consistent with the standard MDP nomenclature.

The same tree structure can be used to represent more complex reward preconditions. For example, Figure 4.12(a) shows a reward tree for the case in which the state variable HoldingCost takes on different values. This type of tree would be useful whenever certain milestone events or actions are used to increment the value of HoldingCost. For example, assume HoldingCost is incremented whenever an operation is completed. Interestingly, given the direct correspondence between HoldingCost and the OpStatus(Opi) variables in this example, it is possible to eliminate HoldingCost from the reward representation, as shown in Figure 4.12(b). This latter representation is preferable given that variables of the form OpStatus(Opi) are bound to appear in the tree anyway.

Action costs

An action cost c(a, s) can be associated with the execution of an action in a particular state. For example, in the mobile robot example in Section 3.1.4, an action cost is associated with each movement action to reflect the drain on the robot's battery. To achieve a parsimonious representation, the same feature-based approach used for rewards is used for action costs. The primary difference between rewards and action costs is that a separate action cost tree is required for each action.

For the part-agents considered in this system, the "terminal reward" the agent receives when leaving the system is implemented as an action cost that is triggered whenever the agent executes the Ship action. As shown in the action schema for the Ship action in Figure 4.13(a), the action can only be executed when OpStatus(Op3) = complete and Shipped = no.
Since the only effect of the action is to set Shipped = yes, and since there is no action with a Shipped = no effect, the action can only be executed once within a particular planning problem. The action cost is triggered by the execution of the action, so there is no need to repeat the preconditions in the action cost tree. For this reason, the action cost tree shown in Figure 4.13(b) is unconditional—that is, it consists of a single leaf node only. However, it is possible to have more complex action cost trees. For example, a terminal reward that is contingent on shipping date could be implemented by adding the Time variable to the action cost tree. As with rewards, the nomenclature used for action costs can be confusing. Although all the action costs in the part-agent formulation used here are negative, the term "action cost" is used instead of alternatives (such as "action reward") in order to remain consistent with the standard MDP nomenclature.

4.1.8.3 EVENTS AND TIME

In order to model the physical world accurately, a means of representing non-deterministic events (such as broken bits, power outages, and so on) is required. Recalling the distinction made between actions and events in Section 4.1.8.1, an event is a state transition that is not a direct consequence of an agent's actions. However, events can occur as indirect consequences of actions, and thus a distinction is made between two types of events in P-STRIPS: action-dependent and action-independent. Action-dependent events provide agents with mechanisms to reason about events that may occur while they are engaged in some action. For example, while executing an action of the form Process(Opi, Mj, Tk), the bit of the machine may break.
Although the breakage event is independent of the primary aspect of the action (i.e., whether the product is completed in the current processing step), the event only occurs during processing actions. To represent action-dependent events, event aspects are added to the action descriptions.

Action: Ship
Preconditions:
  Shipped = no
  OpStatus(Op3) = complete
Aspects:
  A1:
    Discriminants:
      D1: (empty)
        Outcomes:
          O1 (ship part): P = 1.0
            Effects: Shipped = yes

FIGURE 4.13 The schema for the Ship action: The P-STRIPS representation is shown in (a) and the (unconditional) action cost tree associated with the action is shown in (b); the tree consists of the single leaf value = -$100.

To illustrate, consider the augmented schema for the Process(Op2, M2, 04) action shown in Figure 4.14. For machine M2 and operation Op2, there is a 0.0028 probability of the bit breaking during a unit of machining. If the bit breaks, the machine requires a new setup. The probability of the part or the machine being damaged as a result of a bit breaking is assumed to be negligible.

The discriminants within the event aspects can be used to condition the probability of the event occurring on the value of any number of state variables. For example, the probability of a bit breaking may depend on a number of operating parameters, such as the hardness of the stock being machined, the age of the bit, the feedrate of the stock into the machine, and so on. In the example used here, most operating parameters are determined by the interaction of the part, operation, and machine and thus there is no need to introduce a large number of state variables into the discriminants. Instead, a different probability distribution can be associated with each Process(Opi, Mj, Tk) action.
(The action cost tree in Figure 4.13(b) consists of a single leaf with value = -$100.)

Action: Process(Op2, M2, 04)
  Preconditions:
  Aspects:
    A1 (temporal):
    A2 (processing):
    A3 (bit breakage):
      Discriminants:
        D1: (empty)
          Outcomes:
            O1 (bit breaks): P = 0.0028
              Effects: Setup(M2, Op2) = False
            O2 (bit okay): P = 0.9972
              Effects: (empty)

FIGURE 4.14 An example of the use of P-STRIPS to represent an action-dependent event.

Action-independent events are more difficult to represent since actions form the basis of all state transitions in STRIPS-like formalisms. As their name implies, action-independent events should not be included as aspects of particular actions in the same way as action-dependent events. For example, events such as labor disputes and power outages are not caused or influenced by a particular action performed by the agent. However, since such events can have a significant impact on the agent's utility, the agent should be able to reason about and plan around their possible occurrence. Another important consideration is that an action-dependent event for one agent is often an action-independent event for other agents. Assume for example that the part represented by Agent 1 is being processed on Machine M2. Agent 1 can use an action-dependent representation of the bit breaking event since the event is linked to its processing action. However, for all agents waiting in the queue behind Agent 1, the bit breaking event is an action-independent event. In order to understand how action-independent events are handled in P-STRIPS, a brief explanation of the IncrementClock action is required. IncrementClock is a special fixed action that is automatically assigned to certain states. A fixed action is simply an action that the planning algorithm cannot replace with another action. A portion of the action schema for IncrementClock is shown in Figure 4.15. As the preconditions indicate, the action is executed in any state in which the state variable NewTimeUnit is true.
Since the "reset NTU" aspect is used to set the value of the NewTimeUnit state variable to false, the passage of time in the system can be seen as a series of transitions between two halves of the state space: the half in which NewTimeUnit is true and the half in which NewTimeUnit is false.

Action: IncrementClock
  Preconditions:
    NewTimeUnit = True
  Aspects:
    A1 (reset NTU):
      Discriminants:
        D1: (empty)
          Outcomes:
            O1: P = 1.0
              Effects: NewTimeUnit = no
    A2 (advance clock):
      Discriminants:
        D1: Time = 1
          Outcomes:
            O1: P = 1.0
              Effects: Time = 2
        D2: Time = 2
          Outcomes:
            O1: P = 1.0
              Effects: Time = 3
        ...
        Dn: Time = n
          Outcomes:
            O1: P = 1.0
              Effects: Time = long

FIGURE 4.15 A partial action schema for the IncrementClock action: The discriminant Dn is used to advance the clock past the planning horizon of t time units.

The primary rationale for introducing the NewTimeUnit and IncrementClock constructs is to simplify the process by which the Time state variable is incremented. The "advance clock" aspect in Figure 4.15 shows how discriminants and effects are used to increment the value of Time whenever the IncrementClock action is executed. However, the IncrementClock action is also used to implement action-independent events. For example, the probability of a particular machine requiring unscheduled maintenance can be added as an aspect: Completion of the part requires machines M2 and M3 to be operational. The probability of M2 breaking down in any unit of time is estimated to be 0.003. The probability of M3 breaking down in any unit of time is 0.00012. Since the aspects are assumed to be independent, the probability of both machines breaking down within a unit of time is 0.003 × 0.00012 = 0.00000036. Since both breakdown events are taken to be independent of a particular action and independent of each other, they can be added to the IncrementClock action as separate aspects, as shown in Figure 4.16.
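The independence assumption behind the two breakdown aspects can be checked with a line of arithmetic; a minimal sketch, using the probabilities from the example above:

```python
# Independent per-time-unit breakdown probabilities from the example.
p_m2 = 0.003      # P(M2 breaks down in a unit of time)
p_m3 = 0.00012    # P(M3 breaks down in a unit of time)

p_both = p_m2 * p_m3                     # both break in the same time unit
p_neither = (1 - p_m2) * (1 - p_m3)      # the clock advances uneventfully

print(f"{p_both:.2e}")   # 3.60e-07
```

Because the joint probability of both events is vanishingly small, modeling the two aspects as separate, independently sampled outcomes of IncrementClock introduces no practical distortion.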
Like any other discriminant, the discriminants for action-independent events can be refined by conditioning the outcome probability distributions on state variables. Thus, the IncrementClock action provides a natural way of describing time-dependent events—such as any type of decay process—in a parsimonious manner. Although it is certainly possible from a computational point of view to embed "exogenous events" into the agent's basic action descriptions without the overhead of a special IncrementClock action (e.g., [Boutilier, Dean and Hanks, 1999]), the clear distinction between action-independent and action-dependent events is adopted to simplify the modeling process.

4.1.8.4 TERMINAL STATES

There are two situations in which the state transitions for an agent should cease: when the part has been shipped (Shipped = yes) and when the agent's planning horizon has been exceeded (Time = long). The fixed action End is associated with both of these "terminal states" to prevent any further state transitions. In MDP terminology, the End action is used to implement absorbing states. Since the End action has no action cost and the absorbing states have no immediate reward, the agent's utility does not change once a terminal state has been entered.

Action: IncrementClock
  Preconditions:
    NewTimeUnit = True
  Aspects:
    A1 (reset NTU):
    A2 (advance clock):
    A3 (M2 breakdown):
      Discriminants:
        D1: (empty)
          Outcomes:
            O1: P = 0.003
              Effects: Setup(M2) = false
    A4 (M3 breakdown):
      Discriminants:
        D1: (empty)
          Outcomes:
            O1: P = 0.00012
              Effects: Setup(M3) = false

FIGURE 4.16 Aspects for action-independent events: The IncrementClock action is used to "host" all action-independent events.

4.2 AGENT-LEVEL RATIONALITY

Formulating resource allocation problems as MDPs and solving them using stochastic dynamic programming techniques such as policy iteration provides policies that are provably optimal.
However, the curse of dimensionality (see Section 3.1.4.4 on page 48) prevents all but the smallest problems from being formulated in this way. Moreover, although the agent-level problems addressed in this research are exponentially smaller than the global-level problem for an entire manufacturing facility, the curse of dimensionality remains problematic. To illustrate, recall the set of state variables for a part-agent shown in Table 4.1 on page 99. This set can be considered "minimal" in that it is sufficient for representing fundamental issues such as the passage of time, the completion status of jobs, and so on. However, the formulation in Table 4.1 does not include state variables for representing more complex issues such as machine setups and material availability. In addition, the size of the problem is relatively modest: three operations, three machines, and a planning horizon of 10 time units. Despite its modest size, the problem induces a massive explicit state space of 2.06 × 10^13 states, as shown in Table 4.2.

TABLE 4.2: Determination of the state space for a three-operation, three-machine task with a decision horizon of 10 time units.

  Variable class      Number of         Number of values          Number of unique
                      instances, I      in domain, |d|            values, |d|^I
  Time                1                 10                        10
  OpStatus(Opi)       3                 2                         8
  Contract(Mi, Tj)    30                2                         1,073,741,824
  ElapsedTime         3                 Op1 = 4; Op2 = 3;         60
                                        Op3 = 5
  HoldingCost         1                 1                         1
  ProductValue        1                 1                         1
  Shipped             1                 2                         2
  NewTimeUnit         1                 2                         2
  Total number of unique states: 2.06 × 10^13

A large portion of the state space in the problem formulation is the result of parameterized state variables such as Contract(Mi, Tj). The contract status for a certain machine at a certain time is a binary variable (either the agent owns the contract or it does not). However, the need to consider each of T units of time on each of M machines as a separate resource means that 2^(M×T) unique combinations of contracts are possible in any state.
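The totals in Table 4.2 follow from multiplying the domain size of every state-variable instance; a short sketch reproducing the arithmetic (the helper function and its name are my own, not part of the thesis):

```python
# Reproduces the state-space count in Table 4.2 by multiplying |d|^I for
# each variable class in the part-agent formulation.
def state_space_size(horizon, n_ops, n_machines, elapsed_domains):
    time = horizon                            # Time: 10 values
    op_status = 2 ** n_ops                    # OpStatus(Opi): binary, one per operation
    contracts = 2 ** (n_machines * horizon)   # Contract(Mi, Tj): binary, M x T instances
    elapsed = 1
    for d in elapsed_domains:                 # ElapsedTime: 4 x 3 x 5 = 60
        elapsed *= d
    shipped, new_time_unit = 2, 2             # two further binary variables
    return time * op_status * contracts * elapsed * shipped * new_time_unit

size = state_space_size(horizon=10, n_ops=3, n_machines=3,
                        elapsed_domains=[4, 3, 5])
print(f"{size:.2e}")   # 2.06e+13
```

The 2^30 contract combinations dominate the product, which is why the parameterized Contract(Mi, Tj) variables are singled out in the text.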
Clearly, as the number of machines that are relevant to the agent increases or as the planning horizon is extended, the number of states becomes extremely large. An important goal of this research is to identify ways in which the power of the MDP formulation can be retained without triggering an explosion in the size of the problem's state space. To achieve this end, four techniques have been employed: structured dynamic programming, assertions and constraints, rolling planning horizons, and distinct valuation and bidding phases. In the following sections, each of these "coping strategies" is described in greater detail.

4.2.1 STRUCTURED DYNAMIC PROGRAMMING

The feature-based representations used in P-STRIPS provide a convenient and parsimonious means of representing actions, rewards, and action costs. By describing the antecedents and consequences of actions in terms of specific state variables (rather than in terms of states themselves) an enormous amount of irrelevant information can be eliminated from the action description. For example, rather than using a matrix of size |S| to map each state to a reward value, P-STRIPS employs simple tree structures such as those shown in Figure 4.12 on page 107. Despite the compactness of the representation language, difficulties occur when an attempt is made to use the representation language to reason about agent actions. The conventional MDP-based solution methods used in decision-theoretic planning require the agent's state space to be enumerated explicitly. Unfortunately, as the example in the preceding section illustrates, explicit enumeration of states is impractical, even for a small problem. In structured dynamic programming (SDP), the feature-based techniques used for problem representation are carried over into the reasoning and solution phase.
For example, [Tatman and Shachter, 1990] show how dynamic programming algorithms can be applied directly to influence diagrams and decision trees. [Boutilier, Dearden and Goldszmidt, 1995] apply a similar approach to the planning domain. Indeed, the work of Boutilier et al. provides the foundation for the SDP algorithm developed and implemented in this thesis (see [Boutilier, Dean and Hanks, 1999] for a recent review).

4.2.1.1 POLICY TREES

The foundation of the SDP algorithm used in this thesis is a policy tree. A policy tree has the same basic structure as the reward and action cost trees considered in Section 4.1.8.2—interior nodes correspond to state variables and the branches emanating from nodes correspond to values of the state variable. In addition to the basic tree structure, policy trees contain two types of special nodes:

1. Policy nodes — A policy node attaches to a node of the policy tree and indicates which action the agent should execute when it is in a state that satisfies the branches leading to the policy node.

2. Value leaves — A value leaf contains the utility the agent expects to receive from being in a state that satisfies the branches leading to the value leaf.

To illustrate, consider the simple policy tree shown in Figure 4.17(a). The tree consists of a root node corresponding to the state variable Shipped and two leaf nodes corresponding to the domain values of the variable, {yes, no}. The policy nodes are represented by the action names adjacent to nodes of the tree. In this example, the policy for the agent is simply "execute the action End in any state in which Shipped = yes and the action Wait in any state in which Shipped = no." The value leaves can be interpreted in a similar manner: the utility an agent expects to receive from being in a state in which Shipped = yes is $0, and -$10 otherwise.
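The simple tree just described is only a decision tree whose leaves carry an action and a value. A minimal lookup sketch, using the two-branch tree of Figure 4.17(a) (the dictionary encoding is my own illustration, not the thesis implementation):

```python
# Minimal policy-tree lookup for the tree in Figure 4.17(a): interior nodes
# test one state variable; leaves carry the prescribed action and its value.
def node(var, branches):
    return {"var": var, "branches": branches}

def leaf(action, value):
    return {"action": action, "value": value}

tree = node("Shipped", {
    "yes": leaf("End", 0),     # terminal state, utility $0
    "no":  leaf("Wait", -10),  # waiting is worth -$10 in this policy
})

def lookup(tree, state):
    """Follow the state's variable values from the root down to a leaf."""
    n = tree
    while "var" in n:
        n = n["branches"][state[n["var"]]]
    return n["action"], n["value"]

print(lookup(tree, {"Shipped": "no"}))   # ('Wait', -10)
```

Because the lookup only tests the variables that actually appear in the tree, a state description with thousands of other variables resolves to an action in a handful of comparisons.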
To make a clear distinction between different types of nodes, interior nodes of policy trees are denoted by a black dot (●) and leaf nodes are denoted by a hollow dot (○). It is important to note that policy nodes need not be associated with leaf nodes. For example, in the tree shown in Figure 4.17(b), the Ship action is associated with an internal node corresponding to the branches {Shipped = no, OpStatus(Op3) = complete}. In other words, if the current state satisfies the conditions leading to the policy node, then the prescribed action is Ship, regardless of the value of subsequent branches (such as Time in this example). Of course, the value to the agent given the action that it has chosen may depend on other variables. Consequently, there may be many branches between a policy node and the value leaves associated with the policy node. In Figure 4.17(b), the binary state variable Quality is introduced to illustrate the case in which the value of executing the Ship action depends on the final quality of the product. If Quality = high the agent receives $100, whereas if the quality is determined to be low the agent receives only $50. Interior policy nodes are denoted by a partially blackened dot (◐).

9. The assumption of risk neutrality from Section 4.1.6 permits the terms "value" and "utility" to be used interchangeably.
10. The Quality state variable is used for illustrative purposes in this example only.

4.2.1.2 RELEVANCE RELATIONSHIPS

The basic observation underlying the SDP approach is that not all state variables are necessarily relevant when deciding on an action. The relevance relationships that do exist between actions and certain state variables can be efficiently represented using a policy tree. To illustrate, recall the simple policy tree in Figure 4.17(a). The overall policy depends on the value of the Shipped variable only. In any state in which Shipped = yes, the End action is executed. Alternatively, if Shipped = no, the Wait
action is executed. Thus, with only two branches, the policy tree is complete—it specifies an action to be executed in every possible state in the agent's state space.

FIGURE 4.17 Basic policy trees: The value of certain leaves in the policy tree in (a) can be improved by executing the Ship action whenever operation Op3 is complete.

The leaf nodes of the policy tree are called abstract states [Boutilier and Dearden, 1996]. An abstract state contains just enough information to group together a set of concrete states. To use the problem formulation from Table 4.2 on page 114, the left-hand leaf of the policy tree in which Shipped = yes corresponds to half of the 2.06 × 10^13 concrete states. However, from the perspective of the SDP planning algorithm, all the concrete states have the same utility value and are associated with the same action. As a consequence, there is no need to distinguish between the states on the basis of other state variables, such as whether the agent owns certain combinations of contracts, and so on.

4.2.1.3 POLICY IMPROVEMENT

Although the policy tree in Figure 4.17(a) is complete, it is clearly not optimal. In general, the SDP algorithm can be seen as an "anytime planner" since the policy tree provides a feasible policy at every point in the improvement algorithm. Moreover, the value of the policy to the agent rises monotonically with each iteration, so it is conceivable that a fixed number of iterations could be used to set an upper bound on computation time. Of course, using the algorithm in anytime mode comes at the cost of optimality. In this research, the objective is to examine the interaction of strictly rational agents in a market and therefore no hard constraints on computational time are assumed. The SDP algorithm is permitted to run to quiescence for every agent-level planning problem, regardless of the amount of computation required.
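Each improvement iteration replaces a node's current ("defender") action whenever some alternative ("challenger") action yields a higher-valued leaf. A minimal sketch of that selection rule, with values taken from the running example (the function and its name are my own illustration):

```python
# Sketch of one improvement step: keep the challenger whose best leaf most
# exceeds the defender's value; keep the defender if nothing improves.
def pick_best_challenger(defender_value, challengers):
    """challengers maps an action name to the values of its leaves."""
    best_action, best_gain = None, 0.0
    for action, leaf_values in challengers.items():
        gain = max(leaf_values) - defender_value
        if gain > best_gain:
            best_action, best_gain = action, gain
    return best_action   # None means the defender survives

# Running example: defender Wait is worth -10; Ship yields $100 (high
# quality) or $50 (low quality), so Ship wins.
print(pick_best_challenger(-10, {"Ship": [100, 50]}))   # Ship
```

Because any adopted challenger strictly improves at least one leaf, the overall policy value rises monotonically, which is what makes the anytime interpretation above sound.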
To improve a policy tree, the SDP algorithm examines the defender actions (i.e., the actions already assigned to policy nodes) and attempts to find challenger actions that increase the value of the leaf or leaves beneath the policy node. Recalling the terms of the value function in Equation 3.1 on page 46, there are two ways in which an action can influence the value of a state:

1. Action cost: An action can involve an action cost (possibly negative) that is triggered by execution of the action.

2. Outcomes: The value of the current state includes the expected value of the states reachable by executing the action.

To illustrate the basic process by which defender actions are replaced by challengers, consider the abstract state on the Shipped = no branch of the policy tree in Figure 4.17(a). One challenger action that the agent is bound to consider is Ship, which involves a large (negative) action cost (i.e., the agent receives $100 for executing the Ship action). Since the Ship action has two preconditions, a direct comparison cannot be made between Ship and the defender action, Wait, without first adding the preconditions to the tree. The first precondition, Shipped = no, is already part of the tree and is satisfied by the policy node under consideration. The second precondition (OpStatus(Op3) = complete) must be grafted to the tree, as shown in Figure 4.17(b). Since policy trees must be exhaustive, the addition of the OpStatus(Op3) node creates the requirement for a new branch for all the states under the policy node in which OpStatus(Op3) ≠ complete. The new branch is called a precondition complement. Obviously, the challenger action is not feasible in the abstract state in which its preconditions are not satisfied. As a consequence, a new policy node containing the defender action is attached to the precondition complement, as shown by the Wait action on the far right of Figure 4.17(b).
Once the challenger and all the necessary precondition complements are added, the tree is evaluated. If a leaf has a higher expected value under the challenger action than the defender action, a policy node containing the challenger action is attached to the leaf. Alternatively, if the challenger provides no improvement over the defender action for any leaf under the policy node, then the policy tree is restored to its pre-challenger state. In the example shown in Figure 4.17(b), the Ship action leads to a value of $100 or $50 (depending on the final quality of the product). In either case, the value is greater than that of the defender action: -$10. As such, the changes made to the tree are committed and the algorithm continues to search for other improvements. When no further improvements are possible, the policy improvement phase stops.

11. Fixed actions such as End (see Section 4.1.8.3) cannot be changed. As such, branches of the policy tree associated with fixed actions are ignored by the SDP algorithm.

4.2.1.4 COMPUTATIONAL PROPERTIES OF THE SDP ALGORITHM

The only difference between the SDP algorithm sketched out above and the policy iteration algorithm discussed in Section 3.1.4 is that the SDP algorithm works on abstract states rather than concrete states. Although there is a certain amount of overhead required to maintain the tree structure, the policies generated by SDP and policy iteration are equivalent (keeping in mind that the representation of the policy in SDP is much more compact). However, since policy iteration is guaranteed to converge on the optimal policy in time bounded by a polynomial function of the effective size of the state space, and since the number of abstract states in a policy tree is typically much smaller than the number of explicit states in a conventional MDP formulation, SDP can yield significant performance gains.
Moreover, since only irrelevant information is ignored by the SDP algorithm, there is no loss of optimality or approximation associated with the technique. One important shortcoming of the SDP algorithm is that the size of the effective state space (i.e., the number of leaf nodes) is not known a priori. Indeed, the final size of the policy tree is not known until the optimal policy is found. In the worst case, every state variable is relevant to every choice of action and the number of abstract states in the policy tree is equal to the number of concrete states in the conventional MDP formulation of the problem. However, in the problem formulations encountered in this research, the SDP algorithm is able to exploit independence relationships between state variables and actions. The question of how much computational leverage can be gained is an empirical question that depends on the problem's underlying structure. In Chapter 5, this question is addressed in greater detail. At this point, it is sufficient to conclude that, in the context of the part-agents considered here, the SDP algorithm can lead to an exponential decrease in the number of states that would be required for conventional policy iteration. Although the exponential decrease does not fully offset the exponential increase in state space caused by adding state variables, it does greatly increase the size of MDPs that can be solved given a set of computational resources.

4.2.2 GROWING POLICY TREES

In the example in the previous section, the policy tree grew to accommodate an action precondition. In general, there are five additional sources of tree growth: fixed actions, rewards, action costs, discriminants, and supplemental discriminants. In this section, the different sources of tree growth are illustrated by stepping through a small number of iterations of the SDP algorithm. In the first steps, the foundation of the policy tree is built by adding the fixed actions and reward values.
In subsequent steps, a tree evaluation and improvement process is repeated until the optimal policy is found. Since the number of leaf nodes in the optimal policy tree for this example is approximately 1000, the entire SDP process is not illustrated.

4.2.2.1 INFRASTRUCTURE FOR FIXED ACTIONS

Fixed actions provide the "major limbs" of the policy tree, as shown in Figure 4.18. In the part-agent example considered here, two fixed actions—IncrementClock and End—are used to provide basic infrastructure for the planning algorithm. For example, in any state in which the product has been shipped (Shipped = yes), the End action is executed to ensure that no further state transitions are made by the agent. The IncrementClock action is used to manage time-based transitions for the agent. Since the overall objective of a part-agent is to determine its preferences for resource usage over time, accounting for time is a fundamental issue in the design of this type of agent. Figure 4.18 also introduces some new notation for policy trees:

1. The pie-wedge branch under the Time variable is used to represent a large number of values in the variable's domain. In the case of Time, the size of the domain is determined by the granularity of the time units and the length of the planning horizon. In this example, a generic unit of time is used and the planning horizon is assumed to be 10 time units.

2. The special branch value "*" is introduced as a means of simplifying the representation of non-binary state variables—it denotes the complement of a specific branch or set of branches.

FIGURE 4.18 The "major limbs" of a policy tree for a part-agent: The IncrementClock action is used to manage time-based transitions. The End action is used to indicate the end of the planning process.

Thus, the branch Time = * adjacent to a branch labeled Time = long in Figure 4.18 denotes all states in which the value of Time is not known to be "long".
If a *-valued branch were not used in this case, all non-long values of time would have to be enumerated on the right-hand side of the tree. The special time value long is used to represent the "long term"—that is, all time values greater than the planning horizon. As the schema for IncrementClock in Figure 4.15 on page 111 shows, if the agent's planning horizon is 10 (i.e., t = 10) and the IncrementClock action is executed when Time = 10, the value of Time is set to "long". The action associated with the Time = long branch is End, indicating a terminal state. In any state in which the planning horizon has been exceeded, the algorithm enters an absorbing state and all planning activity ends. The concept of rolling planning horizons is discussed in Section 4.2.3.

4.2.2.2 REAPING REWARDS

The leaves of the policy tree contain values corresponding to the utility the agent expects to receive by being in that particular abstract state. As such, any element of the problem definition that can affect the value of the leaves must be included in the tree. Recalling the discussion regarding the rewards in Figure 4.11 on page 105, the agent incurs an immediate reward equal to the per-unit-time holding cost of the part whenever it enters a NewTimeUnit = yes state. However, since the size of the holding cost is typically a function of the amount of value added to the part, the value of the reward is conditioned on the OpStatus(Opi) family of variables. The expanded reward tree must be appended to each leaf on the NewTimeUnit = yes side of the policy tree, as shown in Figure 4.19.

FIGURE 4.19 The core policy tree with rewards: The reward tree for the agent is appended to all the leaf nodes on the NewTimeUnit = yes side of the policy tree. To keep the diagram readable, only the reward tree rooted at Time = 10 is shown.
For example, execution of the IncrementClock action from the Time = 10 state on the left-hand side of the tree in Figure 4.19 leads to an absorbing state. Since there is no action cost associated with the IncrementClock action, the values of the leaves under the Time = 10 node are determined exclusively by the immediate rewards. In this case, the agent incurs a holding cost based on the number of operations that have been completed and then enters a zero-valued terminal state. The tree shown in Figure 4.19 constitutes the agent's core policy tree. The core policy tree includes all fixed actions and rewards and therefore constitutes a complete policy. Moreover, since the core policy tree is independent of actions, it does not change as the tree is improved.

4.2.2.3 POLICY MAPPING AND EVALUATION

In order for the value of a policy tree to be determined, each abstract state must be mapped to its post-action outcome(s). If an action is deterministic, it has exactly one outcome with a probability of 1.0. However, if the action is non-deterministic, its execution will lead to a transition to one of many possible states. Consequently, the expected value of executing a non-deterministic action is determined by summing the value of each outcome state multiplied by the probability of the outcome occurring. In this example, the IncrementClock action is deterministic. As such, all the leaves under the Time = 10 node on the left-hand side of the tree map to a single Time = long leaf on the right-hand side of the tree, as shown by the dotted lines in Figure 4.20. In contrast, consider the Wait action on the right-hand side of the tree in Figure 4.19. The agent's state description before the Wait action consists of the following atoms: {NewTimeUnit = no, Shipped = no, Time ≠ long}. The Wait action only has one effect—NewTimeUnit = yes—so the agent's new state following execution of the action is simply {NewTimeUnit = yes, Shipped = no, Time ≠ long}.
To map to an outcome state, the SDP algorithm starts at the root node of the policy tree and works down the tree until a leaf node is encountered. In this example, it is possible to resolve the first branching node (NewTimeUnit = yes); however, it is not possible to select a branch under the Time node or determine the correct values of the OpStatus(Opi) variables. Since these variables do not currently appear in the branches leading to the original policy node, there is insufficient information to complete the mapping. To eliminate the incomplete mapping problem, the description of the abstract state in which the action is executed is augmented with additional information, as shown under the Wait policy node in Figure 4.20. The state variables added to eliminate mapping ambiguity are called supplemental discriminants. Although the abstract states below the original Wait policy node inherit the action, the values of the leaf nodes are now conditioned on specific values of Time and the OpStatus(Opi) variables. For example, the value of the state marked S1 can be calculated using the non-discounted (β = 1) version of the value function from Equation 3.1:

    V(S1) = r(S1) + c(Wait, S1) + Σ_{j ∈ S} P(j | Wait, S1) V(j)    Eq. 4.10

FIGURE 4.20 The core policy tree with mapping information: The policy tree is augmented with supplemental discriminants to enable mapping of outcomes of the Wait action.

Since Wait is deterministic, there is only one outcome state j (labeled S2 in Figure 4.20). Furthermore, there are no relevant action costs or rewards associated with the transition and thus Equation 4.10 reduces to

    V(S1) = 0 + 0 + [1.0 × V(S2)] = -2.    Eq. 4.11

Each leaf node generates a separate value equation of the form shown above. However, the value equations are interdependent and must therefore be solved either as a set of simultaneous linear equations or through successive approximation (see [Puterman, 1994] for a description of both approaches in the context of policy iteration). The result of the evaluation stage under a given policy π is a value Vπ(s) for all s ∈ S.
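Equations 4.10 and 4.11 amount to a single non-discounted Bellman backup; a sketch with the numbers from the example, taking V(S2) = -2 as given (the function is my own illustration):

```python
# One non-discounted (beta = 1) Bellman backup, as in Equation 4.10:
#   V(s) = r(s) + c(a, s) + sum_j P(j | a, s) * V(j)
def backup(reward, action_cost, transitions):
    """transitions: list of (probability, value-of-outcome-state) pairs."""
    return reward + action_cost + sum(p * v for p, v in transitions)

v_s2 = -2.0   # value of the outcome state S2, given in the example
v_s1 = backup(reward=0.0, action_cost=0.0, transitions=[(1.0, v_s2)])
print(v_s1)   # -2.0
```

In the full evaluation stage, one such equation is generated per leaf and the coupled system is solved simultaneously or by successive approximation, as noted above.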
The result of the evaluation stage under a given policy 71 is a value F ^ s ) for Vs E S. 124 FIGURE 4.21 A policy tree prior to improvement: A state transition mapping is shown for the Wait action. Note that the Ship action has already been selectedfor the state in which 0pStatus(0p3) = complete and its leaf node reflects the large negative action cost associated with the Ship action. 4.2.2.4 P O L I C Y I M P R O V E M E N T In the policy improvement phase, each policy node is evaluated with respect to the feasible actions available at the node. To illustrate the improvement phase, consider the node labeled S] in Figure 4.21. In this example, the policy node under consideration is associated with the Time = 9 node on the right-hand-side of the tree. The node S} is defined by the following atoms: {NewTimeUnit = no, Shipped = no, Time = 9, OpStatus(Opl) = complete, OpStatus(Op2) = complete, OpStatus(Op3) = incomplete}. The defender action at the node is Wait and the challenger action that is currently being evaluated is Process(Op3, M3, 09). The challenger's completion time discriminant is shown graphically in Figure 4.22 below. 125 ElaspsedTime(0p3) FIGURE 4.22 The cumulative completion time distribution for operation Op3: There is a probability of 0.1 the that operation will be complete after executing the Process(Op3, M3, 09) action in a state in which ElapsedTime(Op3) = 1. In contrast, j/"ElapsedTime(Op3) = 3, the probability of completion is 0.7. The set of feasible actions in an abstract state consists of all actions for which the action precondi-tions are not contradicted by branches that define the state. For example, the policy node under con-sideration is at the end of a Time = 9 branch. Consequently, all actions of the form Process(Opj, Mj, T k ) in which T k ^  9 are infeasible. The SDP algorithm considers each action in the feasible set and selects the one with the largest single-leaf increase over the defender action. 
This action is designated as the best challenger. If it turns out that the best challenger does not have any leaves with higher values than the defender action, then no changes are made to the tree. The defender action at S1, Wait (not shown), results in a single leaf with a value of -4 (this value corresponds to a holding cost penalty of -2 to move from Time = 9 to Time = 10 and a second holding cost penalty of -2 to move from Time = 10 to Time = long). The tree for the challenger action Process(Op3, M3, 09) is shown in Figure 4.23. Although two of the action's preconditions (Time = 9 and OpStatus(Op2) = complete) are already satisfied by the branches leading to S1, the third precondition (Contract(M3, 09) = yes) must be added to the tree above the challenger's policy node. The policy tree must be exhaustive—that is, it must provide an action for every concrete state—and therefore the addition of a new precondition branch means that a precondition complement (Contract(M3, 09) = *) must also be added to the tree. A policy node containing the defender action (Wait) is associated with the precondition complement.

FIGURE 4.23 A partial policy tree showing outcomes: The challenger action Process(Op3, M3, 09) is appended to the policy tree. The value of the ElapsedTime = 2 discriminant reflects a 0.2 probability of moving to the OpStatus(Op3) = complete state and a 0.8 probability of remaining in an OpStatus(Op3) = incomplete state.

Any discriminant value of an action that can affect the value of an outcome must be included in the tree. For example, the "processing" aspect of the Process(Opi, Mj, Tk) family of actions is conditioned on values of ElapsedTime(Opi). Since these discriminants influence outcomes (and therefore influence values), discriminant nodes must be added to the tree. Both aspects of the Process(Op3, M3, 09) action can lead to different outcome values and are therefore deemed relevant.
The "temporal" aspect results in a single unconditional effect: NewTimeUnit = yes. The "processing" aspect is conditioned on values of ElapsedTime(Op3). The cross-product of the effects of the two aspects for ElapsedTime = 2 is shown in Figure 4.23 as two dotted arrows leading to different outcomes. The outcome {NewTimeUnit = yes, ElapsedTime(Op3) = 3} occurs with probability 1.0 × 0.8 = 0.8. The outcome {NewTimeUnit = yes, OpStatus(Op3) = complete} occurs with a probability of 1.0 × 0.2 = 0.2. The expected value of executing the Process(Op3, M3, 09) action when ElapsedTime(Op3) = 2 is therefore 0.2 × $97 (operation completes) + 0.8 × -$4 (operation continues) = $16.20. For this particular node, the challenger's expected value is clearly better than the defender's expected value (-$4).

Given that at least one of the challenger's leaf values is greater than the corresponding defender value, the Process(Op3, M3, 09) challenger tree is set aside as the best challenger and a new challenger tree rooted at S1 is created for the next action in the feasible set. If the new challenger tree contains a leaf value that is greater than the maximum leaf value in the best challenger, then the new challenger becomes the best challenger. Once all actions in the feasible set for a particular policy node have been evaluated, the best challenger is permanently merged with the policy tree and the tree is evaluated. Tree evaluation is necessary since there may be value leaves in other parts of the policy tree that map to obsolete leaves in the defender tree (an obsolete leaf is one that has been replaced during the policy improvement stage). Iterations of improvement and evaluation continue until the defender policy is better than all challengers for every policy node in the tree.

4.2.2.5 ASSERTIONS AND CONSTRAINTS

One shortcoming of the P-STRIPS language is that its semantics are incomplete.
It is possible, for example, to create subtrees (conjunctions of variable = value atoms) that are syntactically correct but nonsensical in the world being modeled. To illustrate, consider the following defender-challenger scenario:

A defender action at policy node S1 in Figure 4.24 is being challenged by action Process(Op3, M3, 01). In order to add the challenger policy node, the precondition OpStatus(Op2) = complete has been appended below S1. The branch leading to the new policy node now contains the combination: {OpStatus(Op1) ≠ complete, OpStatus(Op2) = complete}. However, precedence constraints in the manufacturing domain prevent such a situation from occurring in the physical world.

FIGURE 4.24 An example of an invalid combination of state variable values: There is no way for operation Op2 to be complete if operation Op1 is incomplete.

The existence of nonsensical combinations of state variable values does not affect the validity of the policy generated by the SDP algorithm. Since the physical system never enters the nonsensical states, they are effectively ignored. However, the nonsensical states do create an unnecessary computational burden (in the form of unnecessary abstract states) and should be eliminated to the greatest extent possible. In this thesis, two constructs are introduced into the P-STRIPS language for this purpose: assertions and constraints.

An assertion is a statement of valid combinations of state variable values that is applied to the core policy tree. In short, assertions coerce the core policy tree into representing only meaningful abstract states. To illustrate, recall the core policy tree shown in Figure 4.19 on page 122.
Since the tree already contains various combinations of values from the OpStatus(Opj) family of state variables, it is possible to extend the branches of these variables with assertions before the policy improvement stage takes place. For example, the tree fragment in Figure 4.25 shows the addition of singleton OpStatus(Op2) = incomplete and OpStatus(Op3) = incomplete branches to the end of the OpStatus(Op1) = incomplete branch. The singleton branches bind OpStatus(Op2) and OpStatus(Op3) to known values and ensure that no inconsistent precondition can be added to the branch during the improvement phase. Thus, the Process(Op3, M3, 01) action that caused the problem in Figure 4.24 would never be considered by the SDP algorithm because the action is not feasible from the assertion-extended state S1 in Figure 4.25.

FIGURE 4.25 A tree-based representation of assertions: Assertions are added to the OpStatus(Opj) branches to restrict the growth of the policy tree to meaningful combinations of values.

Although assertions can be used to eliminate a large number of nonsensical branches, they can only be applied to state variables that exist in the core policy tree. Constraints, in contrast, are more general rules of the form:

IF variable = value [AND | OR] THEN variable [= | ≠] value [AND | OR]

When a branch is being added to the tree during the policy improvement stage, it is first checked against the constraint base. If the antecedent of the constraint is satisfied, then the consequents are enforced by the SDP algorithm. Although no constraints are required for the simple agents addressed in this research, they could play an important role in more complex agent formulations. For example, in an injection molding environment, P-STRIPS constraints could be used to express the operational constraint that a light color cannot follow a dark color on a machine unless the machine is first cleaned.
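A constraint check of this kind can be sketched in a few lines. The following is a minimal illustration, not the prototype's implementation: the function names and the dict/tuple encoding are hypothetical, and the precedence rule encodes the Op1/Op2 example from Figure 4.24.

```python
# Hypothetical sketch of a P-STRIPS-style constraint base. Each constraint
# pairs antecedent atoms with consequent (variable, op, value) triples that
# any branch satisfying the antecedent must respect.

def violates(branch, constraint):
    """branch: {variable: value}; constraint: (antecedent, consequents)."""
    antecedent, consequents = constraint
    if not all(branch.get(var) == value for var, value in antecedent):
        return False  # the constraint does not apply to this branch
    for var, op, value in consequents:
        if var in branch:
            if op == "=" and branch[var] != value:
                return True
            if op == "!=" and branch[var] == value:
                return True
    return False

# Precedence rule from Figure 4.24: Op2 cannot be complete unless Op1 is.
precedence = ([("OpStatus(Op2)", "complete")],        # IF
              [("OpStatus(Op1)", "=", "complete")])   # THEN

bad_branch = {"OpStatus(Op1)": "incomplete", "OpStatus(Op2)": "complete"}
print(violates(bad_branch, precedence))  # True: this branch would be rejected
```

During policy improvement, a candidate precondition branch that violates any constraint in the base would simply not be added to the tree.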
An example of a policy tree generated by the prototype planning system is shown in Figure 4.26. Since the trees used in this research typically contain many thousands of nodes, only a small part of the tree is shown.

4.2.3 ROLLING PLANNING HORIZONS

The number of values of the Time atom affects the number of contract variables and therefore has an exponential impact on the size of both the concrete and abstract state spaces. Since parts may spend a long time in a manufacturing system, a means of accounting for long planning horizons that does not trigger exponential growth in the size of the problem is required. The approach proposed here is based on the concept of a rolling planning horizon:

1. The agent plans over a finite horizon from Time = 1 … t.
2. A special value, Time = long, is used to represent the interval from the end of Time = t to infinity.
3. As the agent nears its planning horizon, it replans over the next t units of time.

By using rolling horizons, a very large infinite horizon planning problem is replaced by a sequence of smaller finite horizon planning problems. A similar approach is used by [Barto, Bradtke and Singh, 1995] in their real-time dynamic programming algorithm.

For the finite horizon plan to be accurate, the value realized by the agent when moving from Time = t to Time = long must be a good approximation to the infinite horizon case. At its simplest, the horizon estimate could be a single value. For example, if the long-run expected value to the agent of moving to a Time = long state was known to be some value V_long, the value could be associated with the terminal state at the end of the Time = long branch. The value of this state would be propagated backwards in time by the SDP algorithm in the same manner as the terminal reward is propagated backwards in Figure 4.23 on page 127. A more realistic approach is to assume that the long-run expected value of the agent depends on various features of the agent's state at Time = t.
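The three-step rolling-horizon procedure above can be sketched as a simple loop. Here `solve_finite_horizon` is a hypothetical stand-in for the SDP planner; it merely records the interval each sub-problem would cover.

```python
# A minimal sketch of the rolling planning horizon (steps 1-3 above).
# Names are illustrative; the real planner would build and solve a policy
# tree for each interval, using terminal_reward to value Time = long.

def solve_finite_horizon(start, horizon, terminal_reward):
    # ... SDP policy construction/evaluation/improvement would go here ...
    return {"covers": list(range(start, start + horizon))}

def rolling_plan(total_time, horizon, terminal_reward):
    """Replace one large problem over Time = 1 .. total_time with a
    sequence of finite-horizon problems of length `horizon`."""
    policies = []
    t = 1
    while t <= total_time:
        policies.append(solve_finite_horizon(t, horizon, terminal_reward))
        t += horizon  # replan as the agent nears its current horizon
    return policies

# Three 5-unit problems replace one 15-unit problem.
plans = rolling_plan(total_time=15, horizon=5, terminal_reward=lambda s: 0.0)
print(len(plans))  # 3
```

The quality of the overall plan then hinges on how well `terminal_reward` approximates the value of states beyond each horizon, which is the subject of the discussion that follows.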
Accordingly, the horizon estimates used by the SDP algorithm are implemented as reward trees rooted at the end of the Time = long branch.

FIGURE 4.26 A screen-shot showing a portion of a policy tree: The policy nodes are indicated by the prefix "P*" whereas the values of the leaf nodes are shown in parentheses.

The branches of the reward tree can contain any state variables, although OpStatus(Opj) and ElapsedTime(Opj) are natural candidates. Generally, the larger the number of branches in the terminal reward tree rooted at the Time = long node, the better the approximation is to the infinite horizon case. However, as the size of the reward tree approaches the size of the Time = t + 1 subtree of the infinite horizon planning problem, the advantage of the rolling horizon approach is eliminated.

4.2.3.1 BUILDING HORIZON ESTIMATES

There are two ways of building the terminal reward trees for the planning horizon: learning and backward induction. In the learning approach, historical information is used to identify the state variables that best discriminate between different horizon values. Although this is straightforward in practice, it assumes that large amounts of historical data are available and that outcomes in the past are representative of outcomes in the future.

The backward induction approach exploits the fact that states in the past are unreachable. In other words, there is no way for an agent to make a state transition from the t+1th term to the tth term, so more distant planning problems can always be solved independently of their antecedents. It is therefore possible to solve a planning problem for some interval in the future and use the resulting tree as horizon estimates for the current planning interval. To illustrate, consider the sequence shown in Figure 4.27.
Once the Term t+1 problem is solved, a fully evaluated subtree exists for all abstract states in which Time_t+1 = 1. However, the start of Term t+1 is also the end of Term t. Thus, the Time_t+1 = 1 values can be used as the Time_t = t_long values. This process can be repeated for multiple terms.

FIGURE 4.27 The backward induction approach to generating horizon estimates: Since the start of the t+1th term can be used as the end of the tth term, all times greater than 1 in the t+1th problem can be ignored once the problem is solved.

Although the basic backward induction approach described above provides the same outcomes as solving all three terms as a single large problem, it also retains much of the dimensionality of the single large problem. For example, if a leaf value in a Time_t+1 = 1 state is dependent on the value of a resource state variable (e.g., Contract(M3, 19)), then the variable needs to be carried back into the Term t state space. Given the large number of combinations of resource state variables, this approach is generally not feasible.

4.2.3.2 COLLAPSING RESOURCE SUBTREES

To avoid the problem of carrying a large number of state variables backwards from term to term, certain parts of the Time_t+1 = 1 subtree can be replaced with their expected values before transforming it into the Time_t = t_long reward tree. To illustrate, consider the partial policy tree shown in Figure 4.28.

FIGURE 4.28 A partial subtree for Time = 1: The resource variables are sorted to the bottom of the tree.

In this tree, only the subtree for Time = 1 is considered. In addition, all the nodes for the resource variables are sorted to the bottom of the tree. The value realized by the agent depends on whether it owns contracts for processing on machine M3.
For example, if the agent does not own Contract(M3, 09) but owns Contract(M3, 10), its expected value is $89.

The probability of owning a contract for a resource is a function of two things: the price of the resource and the agent's willingness to pay. Although it is impossible for an agent to know the final selling prices of resources a priori, the agent's willingness to pay is easily derived from the values in the leaf nodes of its policy tree. If the agent does not own Contract(M3, 10), it is willing to pay up to $89 - $80 = $9 to move from the state in which Contract(M3, 10) = no to the state in which Contract(M3, 10) = yes. If the agent purchases the contract, its expected utility is the value on the left side of the subtree ($89) less the price it pays for the contract. Clearly, if the price of the contract is greater than $9, it is not rational for the agent to participate in the transaction. In such a case, the agent remains in its original "resource state" and receives the value on the right side of the subtree ($80). Given this information, the expected value of being in the state marked S1 can be expressed as a function of the probability that the market price of the contract is less than the agent's valuation of the contract:

E[V(S1)] = (89 − E[Price(Contract(M3, 10)) | Price(Contract(M3, 10)) < 9]) × Prob(Price(Contract(M3, 10)) < 9)
         + 80 × (1 − Prob(Price(Contract(M3, 10)) < 9))

More generally, let s_PARENT represent a node for resource R that is to be collapsed. Furthermore, let s_R and s_¬R represent the branches emanating from the resource node corresponding to owning the resource and not owning the resource respectively. Finally, let Δ be the difference in the expected values of the state in which the agent owns the resource and the state in which it does not. In other words, Δ = V(s_R) − V(s_¬R) is the agent's reservation price for the resource.
The expected value of the parent node is simply the value the agent expects to receive if it buys the contract multiplied by the probability that it buys the contract, plus the value the agent expects to receive if it does not buy the contract multiplied by the probability that it does not buy the contract:

E[V(s_PARENT)] = (V(s_R) − E[Price(R) | Price(R) < Δ]) × Prob(Price(R) < Δ)
              + V(s_¬R) × (1 − Prob(Price(R) < Δ))   Eq: 4.12

An important feature of the expression above is that the expected price of the contract is conditional on the maximum the agent is willing to pay: the expression E[Price(R) | Price(R) < Δ] is the expected value of the price of the contract given that the price of the contract is less than Δ. Given historical data on the selling prices of the resources, it is possible to estimate the probabilities in Equation 4.12. Starting from the bottom of the policy tree, each resource tree can be collapsed and replaced by a single value estimate until no resource nodes remain. The resulting reward tree is approximate; however, it is greatly reduced in size relative to the combined physical tree and resource tree.

4.2.4 BIDDING STRATEGIES FOR AGENTS

In the discussion of the action representation language in Section 4.1.8, no mention is made of Buy or Sell actions for agents even though the agents are expected to interact within a market. Although such capabilities are important for the agents considered in this research, it is possible to make a distinction between the valuation and bidding phases of the agents' behavior [Mullen and Wellman, 1995], [Meuleau et al., 1998]. The planning and scheduling algorithms discussed to this point have been concerned with determining the agent's best policy given its cost and reward structure. The ownership of resources has been treated as a set of random variables over which the agent has no control.
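Returning briefly to the subtree collapse of Section 4.2.3.2, the estimate in Equation 4.12 can be computed from historical selling prices. The following sketch assumes, as the text does, that past prices are representative; the function name and sample data are hypothetical.

```python
# Sketch of the Equation 4.12 collapse for a single resource node.
# Prob(Price(R) < delta) and E[Price(R) | Price(R) < delta] are estimated
# from a list of historical selling prices for the resource.

def collapse_node(v_own, v_not_own, price_history):
    delta = v_own - v_not_own                      # reservation price
    below = [p for p in price_history if p < delta]
    prob = len(below) / len(price_history)
    if not below:
        return v_not_own                           # the agent never buys
    expected_price = sum(below) / len(below)
    return (v_own - expected_price) * prob + v_not_own * (1 - prob)

# Owning Contract(M3, 10) is worth $89, not owning it $80 (delta = $9).
history = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]        # hypothetical past prices
print(round(collapse_node(89.0, 80.0, history), 2))  # 81.5
```

Applying this bottom-up until no resource nodes remain yields the reduced (approximate) reward tree described above.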
In this section, the policy trees generated by the SDP algorithm are used to determine the agents' willingness to pay for resources. In Section 4.3, the mechanisms by which the agents use their private valuations of resources to achieve an equilibrium outcome are described.

4.2.4.1 THE RESOURCE STATE SPACE

When a policy tree is sorted so its resource nodes are at the bottom (given that the root node is defined as the top), a distinction can be made between the agent's physical state and its resource state. Physical state is determined by variables such as Time, OpStatus(Opj), ElapsedTime(Opj), and so on. In contrast, the agent's resource state is determined by the values of the variables corresponding to the ownership of resources—in this case, variables of the form Contract(Mj, Tj). The actions identified in the P-STRIPS language and reasoned about using the SDP algorithm in the agent's valuation phase cause transitions in physical state. The Buy and Sell actions available to the agent in the bidding stage cause state transitions in the agent's resource state.

To illustrate the relationship between the valuation and bidding phases, recall the partial policy tree shown in Figure 4.28 on page 134. After the SDP planning algorithm is applied to the policy tree, each leaf node in the tree contains the value that the agent expects to achieve by being in that particular abstract state and executing its optimal policy. In this formulation, resources, such as the Contract(Mj, Tj) variables, are not affected by any actions in the agent's repertoire and are therefore treated as uncontrollable random variables. However, by sorting the resource variables to the bottom of the tree, it is relatively straightforward to determine the agent's reservation price for certain resources or combinations of resources.
For example, Figure 4.29 shows a policy tree from the benchmark problem that has been sorted so that the resource state space is distinct from the physical state space. In the example from the previous section, the agent is willing to pay any amount less than $9 to move from a state defined by Contract(M3, 10) = no to one defined by Contract(M3, 10) = yes. Conversely, the agent's asking price for the resource can be determined by moving in the other direction. To willingly move from a leaf with a value of $89 to one with a value of $80, the agent must be given compensation of at least $9. Given valuation information of this form, it is possible to determine a rational bidding policy for the agent.

4.2.4.2 DETERMINING RESERVATION PRICES

To illustrate the issues surrounding a rational bidding policy for an agent, the Edgeworth box notation introduced in Section 3.2.2 is revisited. For the two-agent, two-resource equilibrium analysis considered here, however, a number of modifications must be made to the conventional Edgeworth formulation. First, good x1 is designated as a numeraire good, M, such that x1 = M ∈ ℝ+. A numeraire good is a unit of exchange (such as money) that has value for all agents. Indeed, given the conditions on risk neutrality and global utility set out in Section 4.1.6, all agents are assumed to derive the exact same utility from each unit of M. The second good in the Edgeworth box economy considered in this section is a discrete, indivisible resource (such as a contract for a unit of processing time on a particular machine). The domain of x2 is therefore restricted to x2 ∈ {0, 1}. In other words, the agent either owns or does not own the resource.

To provide specific numbers for constructing an Edgeworth box, the policy tree from Figure 4.28 on page 134 is used. For the other agents considered in this section, optimal policy trees of the same basic form are assumed to exist.
Given that one agent owns the resource and the others do not, the agents are identified by their potential roles within a single transaction as pure seller (S) and pure buyer (B):

FIGURE 4.29 A policy tree sorted to enable extraction of price information: By sorting the resource nodes to the bottom of the tree, transitions in the resource state space resulting from buy and sell transactions can be isolated. In the figure, the value of owning Contract(M3, 8), given that the agent owns the bundle of resources shown in the antecedent branches, is $68 - $62 = $6.

Agent S currently owns the contract for machine M3 at Time = 10. Based on the information in its policy tree, Agent S is better off if it can sell Contract(M3, 10) for any amount greater than $9. At a selling price of exactly $9, Agent S is indifferent between keeping the resource and selling it.

The axes for the selling agent are drawn so that they bound the indifference curve through the agent's initial endowment, ω_S. For reasons discussed in Section 4.2.4.3 below, it is both possible and convenient to set the length of the M axis of the box to the marginal cost of good x2, as shown for the selling agent in Figure 4.30(a). Of course, since x2 is an indivisible good, values of x2 within the interior of the Edgeworth box are infeasible. The true indifference "curves" for x2 reduce to pairs of points of the form (M_i, 0) and (0, 1). In order to facilitate exposition, however, the conventional representation of continuous indifference curves is used.

FIGURE 4.30 Indifference curves and initial endowments for the seller (a) and the buyer (b): The length of the M axes is assumed to be the indifference price (reservation price) of the resource good for each agent.

Agent B does not currently own Contract(M3, 10). However, according to its optimal policy tree, Agent B is willing to pay up to $11 to acquire the resource.
As such, Agent B is indifferent between the status quo ω_B = (11, 0) and the allocation that results if Agent B buys Contract(M3, 10) for $11: x_B' = (0, 1). The indifference curve for Agent B is shown in Figure 4.30(b).

The indifference curves for the two agents can be arranged to form an Edgeworth box, as shown in Figure 4.31(a). The box provides a convenient means of evaluating the rationality of transferring the resource good, x2, between the two agents in exchange for the numeraire good, M. An exchange is considered rational if the outcome is at least as good as the initial endowment for both agents and strictly better for one of the agents. Since this is identical to the definition of Pareto efficiency, any Pareto efficient exchange is considered rational for all participants, and vice-versa. To construct an Edgeworth box for a discrete good, the indifference curves for the two agents must intersect at the initial endowment ω. Since the resource good is indivisible, all feasible allocations must lie on either vertical edge of the box. In Figure 4.31, the range of values along the x2 = 0 axis between the points at which the two agents' indifference curves intersect the axis is known as the contract curve because any voluntary exchange that takes place between the two agents must do so along the line.

FIGURE 4.31 An Edgeworth box representation of different exchanges: The figure in (a) shows the contract curve and budget line for the two agents. In (b) an allocation x' that is weakly preferred by the seller but strongly preferred by the buyer is shown. In (c), the indifference prices of the agents are reversed. The allocation x' is weakly preferred by Agent S but is inferior to ω from Agent B's perspective.

A unique equilibrium allocation, x*, should occur at the intersection of the contract curve and the Walrasian budget line. Recall from Section 3.2.2.3 that the slope of the budget line is determined by the equilibrium price of the two goods.
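For the indivisible good considered here, the joint rationality of an exchange reduces to a comparison of the two agents' indifference prices: a transfer at price p is a Pareto improvement only if the seller's ask is no greater than p and p is no greater than the buyer's willingness to pay. A small sketch using the $9/$11 values above (the function name and encoding are illustrative only):

```python
def rational_trade(seller_ask, buyer_wtp, price):
    """A transfer of the indivisible good at `price` is jointly rational
    only if it lies on the contract curve: seller_ask <= price <= buyer_wtp
    (at an endpoint, one agent is merely indifferent)."""
    return seller_ask <= price <= buyer_wtp

# Seller is indifferent at $9; buyer is willing to pay up to $11.
print(rational_trade(seller_ask=9.0, buyer_wtp=11.0, price=9.0))   # True
print(rational_trade(seller_ask=9.0, buyer_wtp=11.0, price=12.0))  # False

# Surplus created by trading at the seller's indifference price:
print(11.0 - 9.0)  # 2.0
```

When the inequality fails for every price, as when the ask exceeds the buyer's willingness to pay, no transaction occurs and the allocation remains at the initial endowment.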
The problem in the economy considered here is that the resource goods do not satisfy the assumptions on which the theory of competitive equilibrium is based. Specifically, the theory assumes that the goods are infinitely divisible and that the agents are price takers. However, a contract for a particular unit of time on a particular machine is both unique and indivisible. (In this economy, good M is a numeraire good. The price of a numeraire good is 1 by definition and thus the slope of the budget line is determined by the price of x2 in terms of M.) As such, the assumption of a large number of sellers is not satisfied and competitive forces cannot be relied on to produce an equilibrium price.

To get around the problem of a non-competitive market and ensure the existence of a unique equilibrium allocation, the following price convention is introduced: sellers always sell at their indifference price. Note that for reasons discussed in Section 4.2.4.4 below, the price convention could specify any price along the contract curve. For example, the net surplus created by the exchange could be split equally between the two agents. However, the seller's indifference price has the virtue of being easy to calculate whereas a "fair" price that splits the surplus evenly would require additional information such as the reservation price of the buyer.

Given the price convention, a unique budget line can be drawn through the points at which the seller's initial indifference curve intersects the two axes, as shown in Figure 4.31(a). The implications of the price convention on the equilibrium allocations of resources can be illustrated with the following scenarios:

In Figure 4.31(b), an allocation x' is shown that lies on the indifference curve for the selling agent. Although the budget line is omitted for clarity, the point x' corresponds to the indifference price for the seller ($9).
The buyer's indifference curve through x' is to the left of its initial indifference curve and thus the allocation x' is a Pareto improvement over the initial endowment ω. It is therefore jointly rational for the agents to exchange Contract(M3, 10) for the seller's asking price of $9. In this exchange, the utility of Agent B is increased by the distance between its indifference curves along the x1 axis: $11 - $9 = $2. Thus, whenever the buyer is willing to pay more than the seller's indifference price, a Pareto efficient transaction will occur in which the buyer captures all the surplus created by the transaction. The price convention used here ensures this outcome.

In Figure 4.31(c) the indifference prices for the agents are reversed. According to the price convention, Agent S is willing to sell the resource for $11 and Agent B is willing to buy the resource for any amount less than $9. As before, an allocation x' is placed on the seller's indifference curve where it intersects the x2 = 0 axis. However, as the diagram shows, the indifference curve for Agent B through allocation x' is inferior to its initial endowment. As such, it is weakly rational for Agent S to sell but it is not rational for Agent B to buy at the ask price. In such a case, no transaction occurs and the allocation remains at ω.

The Edgeworth box analysis in this section allows a number of observations to be made about the bidding and selling behavior of agents:

1. The indifference curves through the initial endowments of resources are provided for each agent by their respective optimal policy trees. Each agent knows precisely how much it is willing to pay (in terms of the numeraire good) for resources that it does not own and how much it is willing to accept in exchange for resources that it does own. In other words, the policy tree provides a preference ordering over resources (and bundles of resources) that is both transitive and complete.

2.
Since the indifference prices are known by the agents with certainty, they correspond to the private values case described in Section 3.2.3.2. This has implications for the choice of auction form, as discussed in Section 4.3. Specifically, the English, second-price sealed-bid, and continuous double auctions are functionally equivalent given the characteristics of the bidding agents.

3. The price convention ensures that a stable equilibrium allocation exists for each resource good. As Figure 4.31(b) and (c) illustrate, once a good has been exchanged between two agents, the same good cannot be exchanged again without a change in indifference prices. In other words, all other things being unchanged, the prices in the market are guaranteed to rise monotonically. Of course, changes in an agent's physical state typically result in different valuations and a new equilibrium. (If there were no changes in the underlying physical state of the world, the New York Stock Exchange would open, achieve equilibrium, and close its doors for good.) The point of the Edgeworth box analysis is to describe how the equilibrium will be attained in a given physical state.

The importance of the distinction between the valuation and bidding phases is that the information generated by the agent during its valuation phase can be combined with a price convention to create stable, Pareto efficient bidding and selling policies. Thus, the operation of the market depends critically on the agents' ability to determine their private values for resources.

4.2.4.3 THE ROLE OF THE NUMERAIRE GOOD

The Edgeworth box representation helps to illuminate the role of the numeraire good in an agent's bidding and selling policy. Note that the agent's absolute endowment of M never appears in the analysis. That is, the rationality of a particular transaction is assumed to be independent of the total wealth of either agent.
Instead, rationality is determined at the margin: the cost of the transaction must be no higher than the benefit in order for the transaction to be considered rational. This approach is similar to the "marginal cost based contracting" approach described in [Sandholm, 1993]. Under these circumstances, there is nothing to be gained by giving each agent a constraining "budget" of good M since the optimization problem the agent solves includes all the costs and rewards the agent faces.14 To illustrate, consider the case of an agent representing a part with a large terminal reward and a tight deadline. Since the agent seeks to maximize its expected value net of all costs and rewards, it is clear that the agent should be willing to bid high (relative to jobs with lower terminal rewards) in order to complete processing before its deadline. There is no need to allocate a budget of M to the agent to reflect its priority over other agents since the priority structure is defined already by the agent's problem formulation. If an artificially-imposed budget constraint prevents an agent from bidding for a different allocation of goods X'j when φj(X'j) ≻ φj(Xj), then the opportunity for a Pareto improvement is lost.

By eliminating the issue of absolute wealth from the bidding phase, a number of important sources of complexity and possible market failure are also eliminated. First, an agent cannot use the revenue realized by the sale of one good to subsidize a Pareto inefficient exchange at some point in the future. The important message in Figure 4.31(c) is that global wealth is destroyed by an inefficient transaction, regardless of how the transaction is funded. Similarly, agents are unable to speculate because their private values are determined during a planning process that is free of information about other agents.
To illustrate, consider the following scenario:

14. The use of a finite but sufficiently large allocation of M to each agent is equivalent to the provision of free and unlimited credit. For example, if the agent has an opportunity to increase its utility by some amount α by purchasing a contract for a resource, it can "borrow" any amount less than α without incurring interest or transaction costs. Since the net effect of such a borrow-and-buy transaction is an increase in the agent's net utility, the transaction is rational.

Agent 1's process plan indicates that Machine M4 is unsuitable for any of the operations on the part that the agent represents. As such, the value of a unit of processing time on M4 is worth exactly zero to Agent 1. However, if Agent 1 notices in passing that contracts for processing on M4 historically trade for $20 and that the current price is $5, then the agent might be tempted to buy at $5 in the hope of reselling the contract later for a speculator's profit. However, the agents in this system are not capable of this type of broader "rational" behavior. Agent 1's reservation price for a contract on M4 is $0 and, according to the operationalization of rationality used here, the agent is unwilling to pay more than this amount.

4.2.4.4 PRICE DISCOVERY VERSUS THE PRICE CONVENTION

The overall objective of this particular market-based system is to solve a global-level resource allocation problem.
According to the formulation in Section 4.1.6 for cooperative problem solving environments, the objective function to be maximized is simply the sum of agent utilities; the actual final utilities achieved by the agents do not matter. Thus, as long as Pareto efficient exchanges take place and Pareto inefficient exchanges do not take place, the exact manner in which the total economic surplus is divided between the buyer and seller is irrelevant.

To illustrate, recall the exchange in Figure 4.31(b). The transaction price is arbitrarily set to the seller's reservation price ($9) and therefore the surplus realized by Agent S as a result of the exchange is exactly zero. The surplus realized by Agent B is its reservation price less the transaction price: $11 - $9 = $2. More importantly, the increase in global utility resulting from the exchange is $0 + $2 = $2. Note, however, that the exchange of good M is zero-sum and thus the increase in global utility is independent of the transaction price. The same increase in global utility could have been realized with any transaction price on the contract curve (i.e., between $9 and $11). It is on this basis that the use of the price convention is justified. Although it is possible to devise a more elaborate price discovery mechanism that determines a "fair" transaction price, there is little benefit to be gained from the additional computational cost and complexity.

4.3 AGGREGATION

In this section, the auction mechanism used to implement the market is described in detail. As discussed previously, the bidding policies of the agents in this system are greatly simplified (relative to real markets) by the existence of private values, risk neutrality, and the ultimate irrelevance of the division of surplus between the buying and selling agents. However, the nature of the goods themselves—contracts for processing time on production machinery—introduces additional complexity in the form of dependence between resources.
In the following sections, these sources of complexity are described in detail and an auction mechanism that addresses the problem is proposed.

4.3.1 SETTING THE STAGE: AN EXAMPLE PROBLEM

To illustrate the essential characteristics of the auction mechanism used in the thesis, a simplified version of the benchmark problem introduced in Section 2.5 is used. Instead of each of the three jobs requiring three operations on three different machines, each of the three jobs in the simplified version requires a single operation on a single machine. Thus, the problem is an instance of the 1 | | ΣwjCj class of scheduling problems.

The duration of the lone operation, Cj, is known with certainty for each job j.15 Once the job has been processed by the machine M1, it leaves the system and receives an exogenously-determined terminal reward, rj. For each unit of time that the part is in the system, it incurs a per-unit-time holding cost of wj. The specific values of these parameters for the example problem are summarized in Table 4.3.

TABLE 4.3: Details of the example scheduling problem

Job      Processing time (Cj)    Terminal reward (rj)    Holding cost (wj)
Job 1    3                       $100                    $1
Job 2    5                       $100                    $1
Job 3    2                       $100                    $1

Problems of the 1 | | ΣwjCj class are solvable in polynomial time using the weighted shortest processing time (WSPT) rule. Of course, when n = 3 jobs, it is a simple matter to find the optimal schedule by inspection, as shown in Figure 4.32. In this case, the equilibrium allocation of resources in the market should yield the same global outcome as the WSPT rule: $95 + $90 + $98 = $283.

15. The terms "job", "part" and "agent" are used interchangeably in this section since a job corresponds to the processing of a part and an agent is created to represent the job in the planning system.
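As an illustrative aside (not code from the thesis), the WSPT computation for the Table 4.3 instance can be sketched in a few lines; the function and field names (`wspt_schedule`, `p`, `r`, `w`) are assumptions made for this example.

```python
# Illustrative sketch (not code from the thesis): scheduling the Table 4.3
# jobs with the weighted shortest processing time (WSPT) rule. The function
# and field names are assumptions made for this example.

def wspt_schedule(jobs):
    """Sequence jobs by descending w/p ratio and return, for each job,
    (name, completion time, net revenue = r - w * completion)."""
    order = sorted(jobs, key=lambda j: j["w"] / j["p"], reverse=True)
    t, schedule = 0, []
    for job in order:
        t += job["p"]                       # completion time under WSPT
        schedule.append((job["name"], t, job["r"] - job["w"] * t))
    return schedule

jobs = [
    {"name": "Job 1", "p": 3, "r": 100, "w": 1},
    {"name": "Job 2", "p": 5, "r": 100, "w": 1},
    {"name": "Job 3", "p": 2, "r": 100, "w": 1},
]

schedule = wspt_schedule(jobs)              # Job 3, then Job 1, then Job 2
print(sum(net for _, _, net in schedule))   # 283, the global value above
```

With equal holding costs, WSPT reduces to shortest-processing-time-first, which reproduces the $283 global value quoted above.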
[Figure: Gantt chart of the optimal sequence—J3 in periods 1–2, J1 in periods 3–5, J2 in periods 6–10.]

Job      Time in system    Revenue (net of holding costs)
Job 1    5                 $95
Job 2    10                $90
Job 3    2                 $98

Global expected value: $283

FIGURE 4.32 The optimal solution to the 3-job, 1-machine example problem.

4.3.2 CHOICE OF AUCTION FORM

The type of market in which agents in this system participate is a continuous reverse auction. In a reverse auction, the buyer submits a request to the auctioneer for a good. The auctioneer then collects ask prices from sellers and matches the buyer with the low-cost seller. Thus, apart from the fact that auction participants are competing on the basis of lowest cost instead of highest price, the essentials of the auction setting are identical to those discussed in Section 3.2.3. In a continuous auction, there is no batching of bids or market call. The NASDAQ market is an example of a real-world market structured as a continuous double auction [Schwartz, 1988]. Unlike NASDAQ, however, agents participating in the market described here have known private values. As such, each agent has no uncertainty about the maximum it is willing to pay for a particular resource. In addition, an agent in this system will experience no regret if it buys the resource for any amount up to and including its reservation price.

There are two important implications of the private values case. First, since there is no risk of winner's curse, the dominant strategy for all bidders is to reveal their true reservation prices to the auctioneer [Milgrom, 1989]. The revelation of true reservation prices greatly simplifies the task of matching bidders and sellers.16 The second implication is that private values (and risk neutrality) lead to equivalence between various forms of auctions. The advantage of a continuous auction in the context of manufacturing is that it permits jobs to join the auction at any time.
If a call auction were used, the arrival of a new job following the closure of the auction would necessitate the start of a new auction. The disadvantage of the continuous auction is that each good might be bought and sold multiple times before the most efficient allocation is attained. However, as discussed in the following sections, auctions for resource allocation can place a massive computational burden on the auctioneer. Consequently, any means of distributing the burden across multiple auctioneers (e.g., market makers, individuals) is advantageous.

16. Since the auction participants in this situation are artificial and can therefore be programmed to reveal their true reservation prices regardless, the existence of a dominant strategy is interesting but unnecessary for implementation.

4.3.3 COMPLEMENTARITIES AND SUBSTITUTES

One condition for attainment of a Pareto optimal equilibrium that has not been addressed to this point is the condition of market completeness. A market is complete if every good in the market is traded at publicly quoted prices [Mas-Colell, Whinston and Green, 1995]. A market that is incomplete will fail to achieve equilibrium and render the predictions of the first fundamental theorem of welfare economics invalid. In the context of scheduling, incompleteness is a pervasive source of market failure [Walsh et al., 1998]. The failure occurs because the dependencies that may exist between goods are not explicitly reflected in market prices. To illustrate the problem, consider the two forms of dependence below: substitutes and complements.

• Substitutes — Agent 1 requires a single unit of processing time (for notational convenience, assume that resources of the form Contract(Mj, Tj) can be denoted Tj). The amounts the agent
is willing to bid for the units of processing time—both individually and collectively—are shown below:

Resource    Maximum bid price
T1          $10
T2          $10
[T1, T2]    $10

Since Agent 1 only requires one unit of time, and is indifferent whether it consumes T1 or T2, the two resources are substitutes. Therefore, the price the agent is willing to pay for one good (say T2) depends on whether the agent already owns a substitute good (T1). In this example, the agent would bid zero for T2 if it already owned T1 (and vice-versa).

• Complements — Agent 1 requires two units of processing time and has a binding deadline dj = 2. If the part ships before the deadline, the agent receives a terminal reward rj = 10:

    rj = { $10   if Cj ≤ dj
         { $0    otherwise

The amounts the agent is willing to bid for units of processing time—both individually and collectively—are shown below:

Resource    Maximum bid price
T1          $0
T2          $0
[T1, T2]    $10

Again, the value of one good is dependent on whether the agent owns the other. Resources T1 and T2 are said to be complements in this context since the agent is willing to pay more for both than it is for either one individually.

The dependence relationships between goods can also be expressed using a compact tree notation identical to that used throughout Section 4.2.

[Figure: two resource trees with leaf values $10/$10/$0 in (a) and $0/$0/$10 in (b); one branch denotes owning a contract, the other not owning it.]

FIGURE 4.33 Tree-based representations of dependence between goods: In (a), the items are perfect substitutes. In (b), the items are perfect complements.

The amount the agent is willing to pay (or must receive as compensation) to move from one allocation to another is the difference between the target leaf node and the starting leaf node. For example, in the substitute case shown in Figure 4.33(a), the agent is willing to pay $10 - $0 = $10 to move from the allocation R = {} to either R' = {T1} or R' = {T2}.
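This leaf-value arithmetic can be sketched directly. In the snippet below, the dictionaries stand in for the leaves of the two trees just described, and all names are illustrative assumptions rather than thesis code.

```python
# Illustrative sketch of reading bid prices off resource trees like those in
# Figure 4.33: each dictionary maps an allocation (a set of contracts) to its
# leaf value, and the maximum bid for a move is the difference between the
# target and starting leaf values. All names are assumptions for the example.

substitutes = {frozenset(): 0, frozenset({"T1"}): 10,
               frozenset({"T2"}): 10, frozenset({"T1", "T2"}): 10}
complements = {frozenset(): 0, frozenset({"T1"}): 0,
               frozenset({"T2"}): 0, frozenset({"T1", "T2"}): 10}

def max_bid(tree, start, target):
    # Willingness to pay to move from allocation `start` to `target`.
    return tree[frozenset(target)] - tree[frozenset(start)]

print(max_bid(substitutes, [], ["T1"]))            # 10
print(max_bid(substitutes, ["T1"], ["T1", "T2"]))  # 0: T2 adds nothing
print(max_bid(complements, [], ["T1"]))            # 0: worthless alone
print(max_bid(complements, [], ["T1", "T2"]))      # 10
```

The second and third calls make the dependence concrete: a substitute is worthless once one copy is owned, and a complement is worthless until its partner is owned.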
In a market in which the dependence relationships between goods are not priced, the market is incomplete and may fail to attain an equilibrium. To illustrate, consider the simple market scenario used by McAfee and McMillan [1996] to illustrate market failure:

Agent 1 requires two units of processing time and has a binding deadline of two time units in order to receive its terminal reward of $3. If the job is completed after the deadline, Agent 1 receives no terminal reward. In such a situation, Agent 1 regards T1 and T2 as complements—one is worthless without the other. Agent 2, in contrast, faces the same deadline but only requires a single unit of processing time. In the absence of a holding cost, Agent 2 regards T1 and T2 as perfect substitutes. The details of the scenario are summarized in Table 4.4.

TABLE 4.4: A problem for which no equilibrium price exists.

Agent      Processing time (Cj)    Deadline (dj)    Terminal reward (rj)
Agent 1    2                       2                $3
Agent 2    1                       2                $2

Since Agent 2's terminal reward is only $2, the optimal global outcome is to assign both units of processing time to Agent 1. In a market, however, processing time is not "assigned" and thus the agents themselves must converge on the optimal outcome via their bidding behavior. If markets are made for T1 and T2 as distinct goods, the following problem occurs:

Assume Agent 1 and Agent 2 bid competitively for T1. Agent 2 is willing to pay up to $2 and thus Agent 1 will have to bid $2 to secure the resource. When the agents bid for T2, the situation is the same except that Agent 1 has already paid $2 for T1. Given that its terminal reward for being processed is $3, it is only willing to bid up to $1 to secure the second unit of processing time. However, Agent 2 is willing to bid up to $2 and therefore wins the auction for T2.

To avoid this problem, Agent 1 would have to be able to bid on T1 and T2 jointly in an all-or-nothing deal.
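A minimal sketch of this failure, assuming simple ascending-auction logic (the function name and structure are assumptions for illustration, not the thesis implementation):

```python
# Illustrative sketch of the Table 4.4 market failure when T1 and T2 are
# sold in separate sequential auctions. Names are assumptions, not the
# thesis implementation.

def sequential_auctions():
    a1_reward, a2_reward = 3, 2  # Agent 1 needs both units; Agent 2 needs one

    # Auction for T1: Agent 2 bids up to $2, so Agent 1 must pay $2 to win.
    a1_paid_t1 = a2_reward
    a1_max_bid_t2 = a1_reward - a1_paid_t1   # only $1 of surplus is left

    # Auction for T2: Agent 2's $2 beats Agent 1's remaining $1.
    t2_winner = "Agent 2" if a2_reward > a1_max_bid_t2 else "Agent 1"

    # Global reward: Agent 1 holds T1 alone (worth $0); Agent 2 earns $2.
    global_reward = a2_reward if t2_winner == "Agent 2" else a1_reward
    return t2_winner, global_reward

print(sequential_auctions())   # ('Agent 2', 2): $2 < the optimal $3
```

The sketch reproduces the outcome described above: the units are split between the agents and the global reward falls from the optimal $3 to $2.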
The issues surrounding this combinational approach are described in the following section.

4.3.4 THE CHALLENGES OF COMBINATIONAL AUCTIONS

A combinational auction17 is an auction in which bidders may choose to bid on aggregations or bundles of goods. As Rothkopf, Pekec and Harstad [1998] point out, there has been relatively little scholarly work in the area of combinational auctions, due primarily to the belief that the computational disadvantages of such auctions outweigh their benefits. However, the recent auctions for radio spectrum in the United States and other countries have increased interest in the use of combinational auctions in situations in which there is significant complementarity or substitutability between the goods being offered for sale. Interestingly, the FCC declined to permit combinational bids in its spectrum auctions because of concerns about the computational feasibility of considering every possible aggregation of licenses [McAfee and McMillan, 1996], [Rothkopf, Pekec and Harstad, 1998].

17. Some authors use the term combinational auction (e.g., [Rothkopf, Pekec and Harstad, 1998], [McAfee and McMillan, 1996]) whereas others use the equivalent term combinatorial auction (e.g., [Sandholm and Suri, 2000], [Rassenti, Smith and Bulfin, 1982]).

Given that there is a maximum of 2ⁿ − 1 unique aggregations that must be quoted on the market in a combinational auction of n goods, it is clear that the feasibility of combinational designs is a legitimate concern. Indeed, the problem of winner determination for combinational auctions is known to be NP-complete and it has been shown that no polynomial time algorithm can be constructed for achieving an allocation that is guaranteed to be at least as good as a specified lower bound [Sandholm, 1999].
In the case of the FCC spectrum auctions, a system of parallel, multi-round auctions was used in which bidders could observe whether they were likely to win a particular good and adjust their bids for other goods accordingly [McAfee and McMillan, 1996]. However, such auctions require bidders to anticipate the behavior of other agents and may lead to allocations that are economically inefficient.

In addition to the spectre of coping with an unmanageably large number of aggregations, continuous combinational auctions create several complexity issues that do not occur in single-round, single-item auctions. In the following sections, each of these complexity issues is described in detail and, where appropriate, general strategies for addressing the issues are suggested.

4.3.4.1 CONSOLIDATION OF SUPPLY

Consolidation of supply must occur whenever an agent makes a bid for an aggregation that is not currently quoted on the market. For example, if Agent 1 bids for [T1, T2] but the current allocation has T1 owned by Agent 2 and T2 owned by Agent 3, then the market must consolidate ask prices from the two potential sellers in order to provide the bidding agent with a single ask price. Although consolidation of supply creates additional overhead for the auctioneer, the task itself consists of straightforward search and summation operations.

4.3.4.2 CONSOLIDATION OF DEMAND

Consider the partial resource tree for Agent 3 shown in Figure 4.34 and assume that the agent owns T1 and T2. Since Agent 3 faces a hard deadline, it treats [T1, T2] as pure complements (i.e., its ask price for T2 is identical to that of [T1, T2]). Although the tree-based representation of resource ownership generated by the SDP algorithm provides a parsimonious means of expressing interdependence relationships (complements and substitutes) between resource goods, the trees also create a subtle completeness anomaly in the agents' preference orderings for resources.
Specifically, the tree does not reveal the agent's ask price for T1 in isolation.

To illustrate the difficulties created by the anomaly, assume two agents, Agent 1 and Agent 2, can bid for the resources owned by Agent 3 at the prices shown in Figure 4.35. Although the optimal allocation of resources is clearly R1 = {T1}, R2 = {T2}, and R3 = {}, neither bidder can unilaterally afford the $98 to initiate the transaction. This situation is known by different names in the combinational auction literature: Branco [1997] refers to it as the "free rider" problem; Rothkopf, Pekec and Harstad [1998] refer to it as the "threshold problem". Regardless of nomenclature, the issue is essentially the inverse of supply consolidation—hence the introduction of another name: the demand consolidation problem. The demand consolidation problem arises whenever a Pareto efficient exchange requires an existing aggregation of goods with a known price to be split into individual goods or sub-aggregations for which prices are not known.

[Figure: a two-leaf resource tree for Agent 3 with values $98 and $0.]

FIGURE 4.34 The resource tree for Agent 3 depicts an all-or-nothing situation: The two resources T1 and T2 are complementary and a deadline eliminates all substitutes.

Ask prices:

Resource    Owner      Price
T2          Agent 3    $98
[T1, T2]    Agent 3    $98

Bid prices:

Resource    Bidder     Price
T1          Agent 1    $70
T2          Agent 2    $70

FIGURE 4.35 A pricing scenario in which a Pareto efficient exchange requires consolidation of demand.

One means of addressing the demand consolidation problem is to introduce the concept of residual supply. Consider the bids put forward by the potential buying agents and the sequence in which they may be considered by Agent 3. If Agent 2 makes a bid for T2 in isolation, no transaction can occur because Agent 2's maximum bid of $70 is well below Agent 3's ask price of $98. However, if Agent 1 makes a bid for T1, the situation is different.
Since Agent 3 does not have an ask price for T1 in isolation, it must provide the ask price for the smallest aggregation of resources that contains T1 (in this case, [T1, T2]). Since Agent 1 has not bid for T2, the "residual" resource can be sold to another agent (Agent 2, for example) and the revenue resulting from the sale can be factored into the overall transaction. Consolidating supply in this way enables surplus-creating n-way transactions between agents. The question of exactly how much each of the buying agents contributes to the $98 selling price of the bundle is an open question. However, in a cooperative problem solving environment, a simple scheme in which Agent 1 pays its reservation price ($70) for T1 and Agent 2 pays any amount greater than or equal to $28 for T2 is sufficient for a Pareto efficient outcome.

4.3.4.3 COMPOUND BIDS

An agent's objective in bidding is to move to a leaf in the resource portion of its policy tree with a higher value. However, in a continuous or multi-round auction, each auction participant normally has an existing allocation of resources when it submits bid prices to the auctioneer. The agent's ability to pay for one set of resources is therefore contingent on its ability to recoup the investment it has already made in substitute resources. To illustrate, consider the sub-optimal schedule faced by Agent 3 in Figure 4.36. Based on the SPT sequencing rule, it is clear that global utility is maximized when the shorter job (J3) is processed first. Figure 4.37 shows that the existing allocation of resources to Agent 3 is R3 = {T4, T5} and the expected utility associated with the allocation is $95. By moving to the optimal allocation, R'3 = {T1, T2}, Agent 3 would avoid three units of holding cost and thereby realize a value of $98. As a consequence, Agent 3 is willing to bid any amount up to $3 to move from R3 to R'3.
However, since [T1, T2] and [T4, T5] are substitutes, the agent could dispose of [T4, T5] once in its new state with no corresponding change in utility. As a consequence, $3 understates Agent 3's willingness to pay for [T1, T2].

As shown by the arrows in Figure 4.37, Agent 3's transition from the node marked R3 to the node R'3 can be viewed as a compound bid consisting of a bid and a linked ask. A linked ask permits the agent to divest itself of resources made redundant by the bid half of the transaction. The ask is "linked" to the bid in the sense that both the buying and the selling parts of the transaction are required to achieve the agent's intended outcome and therefore the feasibility of both parts is evaluated jointly.

[Figure: Gantt chart of the sub-optimal sequence—J1 in periods 1–3, J3 in periods 4–5.]

Job      Time in system    Holding cost (wj)    Revenue (net of holding costs)
Job 1    3                 $1                   $97
Job 3    5                 $1                   $95

FIGURE 4.36 A sub-optimal 2-job, 1-machine schedule.

FIGURE 4.37 A tree-based representation of a compound transaction: A linked ask and a bid are required to move from state R3 to state R'3. The net surplus generated by the transaction is Δ(R3 → R'3) = $3.

Returning to the example, the question facing Agent 3 is how much it is willing to pay to move to node R'3. Its reservation price for the bundle [T1, T2], denoted Price_Res([T1, T2]), is the gain in surplus realized by the agent in moving to the new allocation—in this case, Δ(R3 → R'3) = $3—plus any revenue realized by selling the less desirable substitute bundle [T4, T5] on the open market:

    Price_Res([T1, T2]) = Δ(R3 → R'3) + Price_Mkt([T4, T5])        Eq. 4.13

In the limiting case in which the agent receives its asking price for its linked ask resources (i.e., Price_Mkt([T4, T5]) = Price_Ask([T4, T5]) = $95), Agent 3's willingness to pay for the bundle [T1, T2] attains its maximum value: $3 + $95 = $98. Thus, by recognizing the issue of existing allocations and substitutability, Agent 3's willingness to pay increases dramatically.
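Eq. 4.13 can be checked numerically. The sketch below uses the Figure 4.36 values; the function name and arguments are assumptions made for illustration, not the thesis implementation.

```python
# Numerical check of Eq. 4.13 using the Figure 4.36 values. The function
# name and arguments are assumptions made for this sketch.

def reservation_price(value_target, value_current, linked_ask_revenue):
    # Eq. 4.13: surplus from the move plus revenue from the linked ask.
    return (value_target - value_current) + linked_ask_revenue

naive = reservation_price(98, 95, 0)   # ignore the redundant [T4, T5]
full = reservation_price(98, 95, 95)   # sell [T4, T5] at its ask of $95

print(naive, full)   # 3 98: the linked ask raises the agent's bid ceiling
```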
This increase is important since a Pareto efficient transaction can occur at a bid of $98 (Agent 1's ask price for [T1, T2] is $97) whereas the same transaction cannot occur if Agent 3 is only willing to bid $3. In short, ignoring the possibility of agents divesting their redundant resources can create a source of market failure.

The compound bid construct is similar to Sandholm's [1998] S-contract (swap contract). In an S-contract, an agent is permitted to exchange a single good with another agent. Compound bids are more general since they permit the simultaneous exchange of multiple goods (including the numeraire good M) and may involve more than two agents.18 However, Sandholm shows that some form of swap mechanism—even in its simplest form as the S-contract—is necessary for attaining the optimal allocation.

18. Since linked ask resources can be purchased by one or more agents (through consolidation of demand), the compound bid construct also subsumes Sandholm's [1998] M-contract (multiagent contract) type.

4.3.4.4 NEXT-BEST STATES

The "next-best" state issue is essentially the seller's version of the compound bid issue described above. Specifically, the price at which an agent is willing to sell a resource is dependent on two things: the agent's ability to buy substitute resources and the difference in utility between the original state and the state in which the agent owns the substitute resources.

Assume, for example, that Agent 3 owns T1 and T2 and is therefore in the position to sell the resources. Based on the information in Figure 4.37 on page 154, the quoted ask price for the bundle [T1, T2] is $98. In other words, Agent 3 is willing to relinquish ownership of [T1, T2] for any amount greater than $98 and is indifferent when the bid price is exactly $98. However, the quoted ask price ignores the possibility of Agent 3 selling one set of resources and subsequently being able to buy a close substitute.
If a close substitute is available, the actual ask price for the original resources is merely the seller's net change in utility caused by the compound transaction plus the cost of the resources to get to the next-best state. To illustrate, assume that Agent 3 sells T1 and T2 and moves to R'3 = {}. In addition, assume that it can move to R''3 = {T4, T5} for $10. The change in utility associated with the compound transaction is $95 - $98 = -$3. If the cost of the substitute resources is added to this loss, then it is clear that it is rational for Agent 3 to accept any amount greater than $13 in exchange for [T1, T2] given that it has the opportunity to buy [T4, T5] for $10. If Agent 3's next-best state is not taken into account, then exchanges based on bids between $13 and $98 would not occur even though such exchanges would be Pareto efficient.

Despite the fact that a seller choosing a next-best state is similar to a buyer choosing a compound bid, the seller's case is considerably more complex than the buyer's case considered in Section 4.3.4.3. In the buyer's case, the ultimate target state of the compound bid is known. As such, there is a unique pair of buy and sell transactions that moves the agent from its current state to the target state. In the seller's case, however, the target state is unspecified and many candidate next-best states may have to be considered. The heuristics used to address this problem, and the rationale behind the heuristics, are presented in Section 4.3.5.2.

4.3.4.5 DEADLOCK

Deadlock occurs when the market for a particular good is thin and external demand cannot be relied on to complete compound transactions. In such cases, agents within the transaction must play dual roles as buyers and sellers simultaneously. Returning to the allocations shown in Figure 4.36 on page 154, recall that the net surplus available to Agent 3 in moving from R3 = {T4, T5} to R'3 = {T1, T2} is only $3.
In addition, recall that if Agent 3 can obtain its asking price for its existing allocation, {T4, T5}, then a Pareto efficient exchange can occur. In the example in Section 4.3.4.3, an unspecified agent in the market was willing to pay Agent 3's reservation price for its linked ask. However, in the two-agent case considered here, no external demand exists for T4 or T5. Although it is true that Agent 1 will be in a position to bid on [T4, T5] once it has sold [T1, T2], this demand does not arise until after the transaction is complete. In its current resource state, R1 = {T1, T2}, Agent 1 is willing to pay exactly zero for [T4, T5]. As a consequence, the only way the transaction can occur is if one of the participants bids as if the transaction has already occurred. The inability of agents to look forward to complete a Pareto efficient exchange is defined here as deadlock.

Interestingly, the same type of difficulty arises in real markets. For example, it is rare that an existing homeowner can buy a new home without knowing for certain the selling price of her existing home. Because of this, a tradition of provisional contracts (e.g., "conditional offers") has evolved. The conditional offer concept can be implemented in artificial markets in the form of a two-phase protocol: In the first phase, each participant in the exchange assumes that the provisional transaction has occurred and responds to requests for price information accordingly. In the second phase, the initiator of the transaction evaluates the transaction's overall desirability and decides whether to commit or roll back all parts of the transaction.

In the situation in Figure 4.36, the bid for [T1, T2] by Agent 3 causes Agent 1 to make a provisional transition from R1 = {T1, T2, T3} to R'1 = {T3}. The provisional transition permits Agent 1 to recognize its demand for Agent 3's linked ask resource.
Assuming that Agent 1 pays Agent 3 some amount P in exchange for [T4, T5], Agent 1 makes a second provisional transition to the state R''1 = {T3, T4, T5}. Given the cost of moving to this next-best state, Agent 1's asking price for [T1, T2] becomes Δ(R1 → R''1) = $97 - $95 = $2 plus the cost of the substitute resources, P. On the other side of the transaction, the amount Agent 3 is willing to pay for [T1, T2] is Δ(R3 → R'3) = $98 - $95 = $3 plus the revenue it receives from the sale of its linked ask, P. The overall transaction is desirable if the following inequality is satisfied:

    Δ(R3 → R'3) + P > Δ(R1 → R''1) + P        Eq. 4.14

Since P appears on both sides of the inequality, its actual value is irrelevant and the feasibility of the two-phase transaction reduces to the marginal change in utility for each agent caused by the transaction. In this example, Agent 1 is worse off by $2 after the transaction but Agent 3 is better off by $3, so the transaction should occur regardless of the value of P. In this system, the transaction price for the linked ask is assumed to be the selling agent's ask price; however, this is merely a convention adopted for consistency with the price convention introduced in Section 4.2.4.

4.3.4.6 STABILITY OF EQUILIBRIA

As its name implies, equilibrium does not change without some form of exogenous disturbance such as the entrance of a new market participant, the arrival of new information, or a change in the underlying physical state of the system. In manufacturing environments, such disturbances are common and may cause agents to revalue their current resource allocations. To illustrate, recall the agent-level policy tree shown in Figure 4.28 on page 134. Each change in the agent's physical state results in a transition to a new resource subtree.
If the values on the leaf nodes of the "new" resource tree are different from those on the "old" tree, the agent will need to buy or sell resources to restore equilibrium. In a dynamic manufacturing environment, agents will be constantly buying and selling resources, much like in a real continuous double auction such as the NASDAQ market for equities. In the artificial case of the simplified benchmark problem (in which the outcomes of actions in the physical world are deterministic), the values on the policy trees do not change from physical state to physical state. Consequently, the equilibrium reached in the first round of bidding does not change unless new agents enter the market as buyers or sellers. In the non-deterministic case in which agents are unsure of the outcomes of their actions, agents need to continuously revalue their resource allocations based on actual (rather than expected) outcomes. For example, an agent that expects to require no more than two units of processing time for an operation is willing to pay very little for a third unit of time. However, if the agent unexpectedly encounters a machine breakdown midway through an operation, it would reevaluate its willingness to pay for additional units of processing time. As decision-theoretic planners, the agents in this system have some look-ahead capability. Consequently, if the expected cost of a machine breakdown (i.e., the cost of the breakdown × the probability of the breakdown occurring) exceeds the cost of an otherwise redundant resource, then the agent—being rational—ensures that the backup resource is in place. Indeed, the better the agent's information about probabilistic events, the more stable the initial equilibrium.

4.3.5 CHOICE OF MARKET PROTOCOL

There are three special features of the production resource market considered in this research that distinguish it from the general case of combinatorial auctions: 1.
Sparseness of bids — The tree-based policies generated by the agents during their planning stage include only the aggregations of resources that they deem to be relevant. For example, if an agent knows with certainty that it requires a maximum of three units of processing time, it will have no reason to bid on a bundle containing more than three units of time. 2. Price convention — The reservation price for each aggregation is known to the agent and is truthfully revealed to the auctioneer under the terms of the price convention (recall Section 4.2.4). Thus, rather than seeking to maximize its surplus on a particular sale, an agent will sell a resource contract for an amount exactly equal to its indifference price for the resource. In this way, a Pareto efficient exchange can occur without an exhaustive search over all agents for the best bid price. 3. Substitutability of resource goods — Because of the time-dependent formulation of resource goods, many substitutes for a particular good are typically available. For example, an agent that has a non-zero reservation price for Contract(M1, T4) will almost certainly have a non-zero reservation price for a later contract on the same machine, say Contract(M1, T5). Of course, each resource may be valued differently by the agent—that is, the contracts are unlikely to be perfect substitutes. However, unless there is a massive discontinuity in the expected payoff resulting from a delay of one unit of time, the substitution cost in accepting Contract(M1, T5) in exchange for Contract(M1, T4) will be relatively small. For example, it may simply be equal to the incremental increase in holding cost. A number of complete algorithms exist for allocating bundles of goods to agents. For example, [Parkes, 2000] has implemented a number of complete and incomplete algorithms for multi-round combinatorial auctions.
Sandholm ([Sandholm, 1999], [Sandholm and Suri, 2000]) has implemented a number of provably optimal algorithms for combinatorial auctions that exploit sparseness of bids and advanced search techniques to solve "reasonably" large problems (e.g., 200 goods) in an "acceptable" amount of time. Although the underlying allocation problem is intractable and thus the scalability of any optimal algorithm is limited, the value of the solutions generated by Sandholm's algorithms increases monotonically with the amount of computation. As a consequence, the algorithms can be used in anytime mode. Unlike the general case of combinatorial auctions addressed by Sandholm's algorithms, however, the auction protocol developed in this thesis exploits the two features unique to this particular market: the price convention and the high degree of substitutability between resource goods. Although the resulting allocation algorithm is not complete (that is, it is not guaranteed to find the optimal allocation), it does possess the following desirable characteristics: 1. Distributed — The data and computation required to match buyers and sellers is delegated to the individual agents. Equilibrium is reached through an iterative process in which individual agents initiate bidding and continue to bid until no utility-increasing exchanges can be made at current market prices. The auctioneer (or auctioneers, since more than one may be used) fulfills a simple brokerage role: it keeps a list of resource ownership and uses this information to redirect requests for price information to the appropriate agent or agents. There is no market call, so equilibrium prices emerge only after a number of bidding rounds in which agents are free to buy resources or resell resources bought in previous rounds. Prices are computed "just in time" and are disseminated as required. 2.
Monotonic — Since all transactions that occur in the market are Pareto efficient, the value of the global objective function is guaranteed to rise monotonically. 3. Polynomial — The maximum number of bids required to attain equilibrium (which may be a local maximum) is a polynomial function of the number of agents and the number of bids in the agents' bid lists (see Section 5.2.2 for the complexity analysis). Note, however, that the agents' bid lists are extracted from their final policy trees and are therefore exponential in the dimensionality of the agent-level problems. Rapid computation of allocations is important in the manufacturing environment because a new equilibrium must be computed after each change in the physical state of the system. This approach relies on the fact that the agents are networked entities and that the cost of inter-agent communication is negligible.

FIGURE 4.38 The critical elements of the proposed market protocol.

4.3.5.1 ELEMENTS OF THE PROTOCOL

The basic operation of the market protocol is shown graphically in Figure 4.38. In order to better understand the flows between the agents and the auctioneer, several critical elements are defined below: 1. Candidate bid: Each agent seeks to maximize its utility given a set of prices for resources. In most cases, an agent j with an allocation of resources Rj will be able to identify an alternative allocation of resources, R'j, with a higher utility (the exception is when the agent already owns its most preferred allocation of resources). The candidate bid list is an ordered list of all the target states with higher absolute utility values than the agent's current state. If the agent already owns resources, candidate bids may involve both bid and linked ask components, as discussed in Section 4.3.4.3. 2.
Request for quotation: The term "bid" implies that the agent submits a list of desired resources and a bid price to the auctioneer. The auctioneer is then expected to use the bid price to determine whether the bid can be filled at current market prices. In this protocol, the submitting agent is responsible for making the ultimate decision whether a bid is feasible. All that is submitted to the auctioneer is a list of desired resources, analogous to a request for quotation (RFQ) in real markets. The auctioneer is expected to respond to each RFQ with a market price. The choice of whether to commit to the transaction is made by the bidding agent based on its own reservation price for the bid resources and the market price of the resources. 3. Seller-specific RFQs (Steps 3 and 4 in Figure 4.38): When an agent submits an RFQ, it has no knowledge of which agents own the resources in the request. However, the auctioneer does have this information and partitions the original RFQ into multiple seller-specific RFQs. The process of creating seller-specific RFQs is equivalent to the consolidation of supply role described in Section 4.3.4.1. 4. Next-best state and substitute resources: Before responding to an RFQ with an ask price, a selling agent must determine its next-best state. The resources that permit the agent to attain its next-best state are called substitute resources. In this protocol, selling agents do not perform an exhaustive search for the most cost-effective next-best state. Instead, a constraint and two heuristics are used to determine a suitable set of substitute resources: a. Out-of-bounds resources (constraint): The substitute resources selected by the selling agent cannot be members of the out-of-bounds list (see below). b. Priority resources (heuristic): If possible, the substitute resources are selected from the priority list of resources (see below). c.
Minimal change (optional heuristic): If the resources in the priority list are insufficient to provide a next-best state, the agent buys additional resources on the open market such that its next-best state has a similar expected value to the current (pre-RFQ) state. The purpose of this heuristic is to minimize the impact of the substitute transaction on the cost of the overall transaction. 5. Out-of-bounds (OOB) list: When an agent becomes involved in a provisional transaction by submitting a candidate bid or by responding to an RFQ, it makes certain price information available to other agents. Since the other agents use this information to make their decisions, it is important that all participants refrain from becoming involved in other buying and selling activity until the existing transaction is completed. The OOB list explicitly contains all the resources that are currently "in play" as well as all the other resources owned by agents participating in the transaction. As other agents are brought into the transaction, the OOB list grows accordingly. 6. Priority list: The priority list is essentially the opposite of the OOB list. When a selling agent is determining its next-best states, it starts by considering resources in the priority list. A resource is added to the priority list if it matches one of three criteria: a. Unowned: Unowned resources have an ask price of zero. As such, the addition of "free" resources to the priority list biases the selection of next-best states towards slack resources. b. Linked ask: As discussed in Section 4.3.4.5, the feasibility of a candidate bid often depends on the existence of sufficient demand for the initiating agent's linked ask. Adding linked ask resources to the priority list helps generate this demand and eliminates the problem of deadlock. c.
Residual supply: In the discussion of consolidation of demand in Section 4.3.4.2, it was shown how demand for residual supply could enable Pareto efficient transactions that would not otherwise occur. Adding residual supply resources to the priority list increases the probability that they will be used as substitute resources elsewhere in the transaction. 7. Firm ask price: There are three situations in which determination of the next-best state is not required in order to respond to an RFQ: a. Unowned: The auctioneer does not create seller-specific RFQs for unowned resources. The ask price for such resources is taken to be zero. b. Linked ask: Since the owner of the resources in a linked ask is the initiator of the transaction, the price for the resources can be considered firm. In this protocol, the transaction price for the linked ask is taken to be the ask price. c. Residual supply: The existence of residual supply implies that the owner has already executed a provisional transaction in response to a previous RFQ. As such, residual supply resources are similar to unowned resources. Since the prices of the resources exchanged in this way appear on both sides of the feasibility inequality (see below), the actual transaction price is irrelevant. The convention in this protocol is that the price of residual supply is zero. 8. Feasibility inequality: When determining whether it should commit or roll back a particular candidate bid, an agent uses the feasibility inequality:

Δ(R → R′) > Price_Mkt(X_BID) − Revenue(X_LINKED ASK)    (Eq. 4.15)

A candidate bid is considered feasible when the increase in surplus created by the transaction, Δ(R → R′), is greater than the market price of the bid resources, X_BID, less any revenue accruing from the sale of the linked ask, X_LINKED ASK. 9. Transaction list: The transaction list is simply a list of the agents and provisional transactions required to determine the market price of a particular candidate bid.
When the feasibility of the candidate bid is determined, the transaction list is used to commit or roll back the provisional changes made by the participating agents. An important computational consideration, apparent from the description above, is that determination of the market price of a candidate bid is a recursive process. Each RFQ may require the receiving agent to select a next-best state and spawn a follow-on RFQ. The recursion stops when the RFQ contains resources with firm ask prices. In the worst case, the submission of an RFQ would result in a new RFQ being spawned for each agent participating in the market. In practice, however, the bias towards resources on the priority list causes the system to converge to equilibrium with a minimal amount of recursion.

4.3.5.2 ILLUSTRATION

In this section, the simplified benchmark problem introduced in Section 4.3.1 is used to illustrate the functioning of the market protocol. As shown graphically in Figure 4.39, the initial allocation of resources across all three jobs is assumed to be R1 = {T6, T7, T8}, R2 = {T1, T2, T3, T4, T5} and R3 = {}. This allocation is clearly suboptimal and Agent 3 is assumed to join the market after the allocation process has already commenced. In the following sections, each numbered step in Figure 4.38 is described in general terms and illustrated through application to the simplified benchmark problem.

FIGURE 4.39 The initial (sub-optimal) resource allocation for the example problem: Although Job 3 has entered the system, it has not purchased resources and is incurring holding cost over the decision horizon of 12 time units.

Job   | Time in system | Revenue (net of holding costs)
Job 1 | 8              | $92
Job 2 | 5              | $95
Job 3 | 12             | -$12

Step 1: Select candidate bid

A candidate bid represents a transaction in resource space that will move the agent to a higher-valued state.
According to the feasibility criterion set out in Equation 4.15, the agent must determine whether the market price of the bid resources is lower than the sum of the surplus the agent obtains from the transition to the target state plus the revenue that accrues from the sale of the linked ask (if one exists). Since a feasible bid implies that the buyer values the resources more than the seller (or sellers), buyer-initiated transactions are sufficient to move the market towards a Pareto optimal allocation of resources. An agent in a given state typically has a number of alternative candidate bids. To determine the sequence in which candidate bids are submitted to the auctioneer, the bids are sorted in decreasing order of value. The agent starts with the first candidate bid and stops when either a feasible bid is found or the end of the list is encountered. In this sense, the selection rule for candidate bids is greedy—an agent always starts by attempting to move to its highest-valued state. When all agents exhaust their lists without finding a feasible candidate bid, the market has achieved equilibrium. (Footnote 20: Note that since all resources are assumed to be "goods", it is impossible to move to a higher-valued state by disposing of resources. Although it is possible for certain resources (such as solvents) to become "bads" after a certain amount of use, such cases are not considered.) Agent 3 (which represents Job 3 in Figure 4.39) is assumed to bid first. The table below shows the agent's candidate bids sorted in decreasing order. In general, only target states with values larger than the current state are considered. In this case, Agent 3 would start by creating a candidate bid for [T1, T2].
Bid resources | Utility of target state | Linked ask resources | Linked ask price | Surplus (Δ)
[T1, T2]      | $98 | {} | N/A | $98
[T1, T3]      | $97 | {} | N/A | $97
[T2, T3]      | $97 | {} | N/A | $97
[T1, T4]      | $96 | {} | N/A | $96
[T2, T4]      | $96 | {} | N/A | $96
[T3, T4]      | $96 | {} | N/A | $96
...           | ... | ... | ... | ...
[T10, T12]    | $88 | {} | N/A | $88
[T11, T12]    | $88 | {} | N/A | $88

Step 2: Submit request for quotation (RFQ)

The RFQ submitted to the auctioneer consists of four lists: bid resources, out-of-bounds resources, priority resources, and provisional transactions. The initiating agent is responsible for creating the RFQ and adding resources to each list as required. To create an RFQ, Agent 3 starts by asking the auctioneer for a list of resources not owned by any agent. If such resources exist, they are added to the priority list. In this example, T9 through T12 are added to the priority list. Agent 3 then adds T1 and T2 to the RFQ's bid list. At the same time, the agent adds the bid resources to the OOB list (to indicate that the resources are in play) and, if necessary, removes them from the priority list. If the candidate bid involves a linked ask, the linked ask resources are added to the priority list. Finally, any other resources currently owned by the initiating agent but not in the linked ask are added to the OOB list.

Steps 3 and 4: Determine owners and relay seller-specific RFQs

On receipt of an RFQ, the auctioneer consults its lookup table of resources and owners. The bid resources are grouped by owner and the requisite number of seller-specific RFQs are created. In the case of unowned resources, no RFQ is created. The auctioneer sends the seller-specific RFQs to the appropriate agents and, on receipt of an RFQ, each agent "locks" itself. The lock prevents the agent from participating in other transactions until the overall feasibility of the current transaction has been determined. The bid resources are owned by a single agent—Agent 2.
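The grouping performed in Steps 3 and 4 amounts to partitioning the bid resources by owner, with no seller-specific RFQ generated for unowned resources. A sketch under assumed data structures (the ownership table maps each resource to its owning agent, with unowned resources absent; these names are illustrative, not the prototype's):

```python
def partition_rfq(bid_resources, owner_of):
    """Group bid resources into seller-specific RFQs; unowned resources
    (which carry a firm ask price of zero) generate no RFQ."""
    seller_rfqs = {}
    for r in bid_resources:
        owner = owner_of.get(r)
        if owner is not None:
            seller_rfqs.setdefault(owner, []).append(r)
    return seller_rfqs

# T1 and T2 belong to Agent 2; T9 is unowned, so it is priced at zero
# and no seller-specific RFQ is created for it.
owners = {"T1": "Agent2", "T2": "Agent2", "T6": "Agent1"}
assert partition_rfq(["T1", "T2", "T9"], owners) == {"Agent2": ["T1", "T2"]}
assert partition_rfq(["T9", "T10"], owners) == {}
```

In the example, the partition contains a single entry, so only one seller-specific RFQ is forwarded.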
The auctioneer creates a single seller-specific bid for [T1, T2] and forwards it to Agent 2.

Step 5: Determine next-best state

When a seller-specific RFQ is received by an agent, it selects a next-best state using the criteria described in Section 4.3.4.4. Once the agent has identified its next-best state, it appends the details for the new state to the RFQ's transaction list. In order to respond to the RFQ, Agent 2 first determines the implications of selling [T1, T2]. In this case, the structure of Agent 2's policy tree is such that it can quote a price for [T1, T2] without creating residual supply. The value of the pre-RFQ state R2 = {T1, T2, T3, T4, T5} is $95; the value of the state after selling the bid resources, R'2 = {T3, T4, T5}, is -$12. Rather than submitting an ask price of $95 - (-$12) = $107, Agent 2 searches for a next-best state. Since the priority list contains resources, these resources are checked first for suitability as substitutes. In this case, Agent 2 selects the bundle [T9, T10] as a substitute. The next-best state is therefore denoted R"2 = {T3, T4, T5, T9, T10}.

Step 6: Submit RFQ for substitute resources

In order to respond to the RFQ, the selling agent must determine the cost of the substitute resources required to achieve the next-best state. To determine the cost of achieving its next-best state, Agent 2 creates a new RFQ and adds T9 and T10 to the bid resource list. In addition, Agent 2 transfers the resources from the priority list to the OOB list and adds the provisional transaction from R2 to R"2 to the transaction list. The RFQ is then submitted to the auctioneer.

Step 3(a): Determine owners and relay seller-specific RFQs

The submission of the RFQ by Agent 2 starts a recursive procedure. However, in this case, the bundle [T9, T10] is unowned. As a consequence, no seller-specific RFQ is created by the auctioneer.
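Under the price convention, the quote Agent 2 eventually returns is its indifference price: the utility forgone in moving to the next-best state plus the market cost of the substitutes. A minimal sketch with the example's numbers (the next-best state is worth $90, consistent with the $5 utility change reported in the following steps; the function name is mine):

```python
def indifference_ask(value_current, value_next_best, substitute_cost):
    """Seller's ask price for the bid resources: the utility forgone in
    moving to the next-best state plus the cost of the substitutes."""
    return (value_current - value_next_best) + substitute_cost

# Agent 2: V(R2) = $95, V(R"2) = $90, substitutes [T9, T10] are free.
assert indifference_ask(95, 90, 0) == 5

# Without the next-best-state search, the naive quote would be $107.
assert indifference_ask(95, -12, 0) == 107
```

The search for a cheap next-best state is what turns an unsellable $107 quote into a $5 quote that the buyer can easily cover.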
Step 7: Return price for substitute resources

Since the bid resources are unowned, the fixed ask price (zero) is returned to Agent 2.

Step 8: Return ask price

The change in utility for Agent 2 caused by the transition from allocation R2 to allocation R"2 is -$5; the cost to Agent 2 to acquire [T9, T10] is $0. As such, Agent 2 is indifferent between staying at R2 and moving to R"2 if it is paid $5. According to the price convention, the ask price for the resource is set to the seller's indifference price.

Step 9: Return market price

When more than one seller-specific RFQ is created, the auctioneer must sum the ask prices from each before returning the total market price to the initiating agent. Since only one seller-specific RFQ was required in this case, the total market price for the original bid resources, [T1, T2], is $5.

Step 10: Decide whether candidate bid is feasible

The feasibility inequality (recall Equation 4.15 on page 164) for Agent 3's candidate bid evaluates to ($98 - (-$12)) > $5 - $0. In other words, the gain in surplus realized by Agent 3 ($110) is more than enough to compensate Agent 2 for the net cost of moving to a next-best state ($5). The candidate bid is therefore deemed feasible and the provisional transactions contained in the transaction list are committed. The resulting allocation of resources is shown in Figure 4.40.

FIGURE 4.40 The allocation of resources following the purchase of [T1, T2] by Agent 3.

Job   | Time in system | Revenue (net of holding costs)
Job 1 | 8              | $92
Job 2 | 10             | $90
Job 3 | 2              | $98

A round of bidding continues until every agent has had an opportunity to work down its candidate bid list in an attempt to execute a utility-increasing exchange. In the next step of the example, it is Agent 1's turn to attempt to improve its situation. The first candidate bid it considers is [T1, T2, T3], which is jointly owned by Agent 2 and Agent 3.
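The Step 10 decision, together with the transaction-list bookkeeping, can be sketched as follows (the class and method names are hypothetical, not the prototype's API; the numbers are from the example):

```python
def bid_feasible(surplus_gain, market_price, linked_ask_revenue):
    """Eq. 4.15: commit when the gain in surplus exceeds the market
    price of the bid resources net of any linked-ask revenue."""
    return surplus_gain > market_price - linked_ask_revenue

class TransactionList:
    """Provisional state changes recorded during price discovery, then
    committed or rolled back atomically by the initiating agent."""
    def __init__(self):
        self._entries = []  # (agent_id, old_state, new_state)

    def record(self, agent_id, old_state, new_state):
        self._entries.append((agent_id, old_state, new_state))

    def resolve(self, states, commit):
        # Commit installs the new states; rollback restores the old ones.
        for agent_id, old, new in self._entries:
            states[agent_id] = new if commit else old

# Agent 2 provisionally moves from {T1..T5} to {T3, T4, T5, T9, T10};
# the bid is feasible ($110 surplus gain vs. a $5 market price), so the
# provisional transaction is committed.
states = {"Agent2": frozenset({"T1", "T2", "T3", "T4", "T5"})}
txn = TransactionList()
txn.record("Agent2", states["Agent2"],
           frozenset({"T3", "T4", "T5", "T9", "T10"}))
txn.resolve(states, commit=bid_feasible(110, 5, 0))
assert states["Agent2"] == frozenset({"T3", "T4", "T5", "T9", "T10"})
```

Recording old and new states per participant is what makes the all-or-nothing commit/rollback of the two-phase protocol cheap to implement.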
The provisional allocation of resources resulting from the candidate bid depends on the order in which the seller-specific RFQs are sent out. If Agent 2's seller-specific RFQ is sent out first, the allocation shown in Figure 4.41(a) emerges. Conversely, if Agent 3's seller-specific RFQ is sent out first, then the allocation in Figure 4.41(b) emerges. In case (a), the result of the provisional allocation is a net decrease of $1 in the overall utility realized by the agents and the candidate bid is deemed infeasible. In case (b), however, there is a net increase of $2 and the candidate bid is committed. The final allocation is the same regardless of the path taken. For example, in case (a), Agent 1 continues to submit candidate bids until [T3, T4, T5] is deemed feasible and is committed. From this point forward, no further candidate bids from any of the agents are feasible and equilibrium is achieved. In case (b), Agent 3 makes a second successful bid for [T1, T2] to put the market into equilibrium.

(a) Seller-specific RFQ for Agent 2 is sent first

Job   | Utility before provisional transaction | Utility following provisional transaction | Change (Δj)
Job 1 | $92 | $97 | $5
Job 2 | $90 | $92 | $2
Job 3 | $98 | $90 | -$8

(b) Seller-specific RFQ for Agent 3 is sent first

Job   | Utility before provisional transaction | Utility following provisional transaction | Change (Δj)
Job 1 | $92 | $97 | $5
Job 2 | $90 | $90 | $0
Job 3 | $98 | $95 | -$3

FIGURE 4.41 Two possible provisional transactions: The transaction depends on the order in which the auctioneer creates seller-specific asks in response to the candidate bid by Agent 1 for [T1, T2, T3].
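The round-based bidding just illustrated can be exercised end-to-end on a toy market. The sketch below is a heavily simplified stand-in for the full protocol (no linked asks, no priority or OOB lists, and sellers' losses are computed directly rather than through RFQ recursion); it is intended only to show that greedy, Pareto-improving transfers raise global utility monotonically until equilibrium:

```python
from itertools import combinations

class Agent:
    def __init__(self, name, values):
        self.name = name
        self.values = values          # frozenset of slots -> utility
        self.held = frozenset()

    def v(self, bundle):
        return self.values.get(frozenset(bundle), 0)

def candidate_bids(buyer, slots):
    """All utility-increasing bundles, sorted by decreasing gain
    (the greedy ordering of the candidate bid list)."""
    bids = []
    for r in range(1, len(slots) + 1):
        for combo in combinations(slots, r):
            bundle = frozenset(combo)
            gain = buyer.v(buyer.held | bundle) - buyer.v(buyer.held)
            if gain > 0:
                bids.append((bundle, gain))
    return sorted(bids, key=lambda b: b[1], reverse=True)

def run_auction(agents, slots):
    """Commit a transfer whenever the buyer's gain exceeds the sellers'
    combined loss (a Pareto improvement); repeat until no agent can
    improve, i.e., the market is in equilibrium."""
    improved = True
    while improved:
        improved = False
        for buyer in agents:
            for bundle, gain in candidate_bids(buyer, slots):
                loss = sum(a.v(a.held) - a.v(a.held - bundle)
                           for a in agents if a is not buyer)
                if gain > loss:
                    for a in agents:
                        if a is not buyer:
                            a.held -= bundle
                    buyer.held |= bundle
                    improved = True
                    break

# Two agents compete for one slot; it ends up with the higher valuation.
a = Agent("A", {frozenset({"T1"}): 10})
b = Agent("B", {frozenset({"T1"}): 3})
b.held = frozenset({"T1"})
run_auction([a, b], ["T1"])
assert a.held == frozenset({"T1"}) and b.held == frozenset()
```

Because every committed transfer strictly increases total utility and utilities are bounded, the loop necessarily terminates, mirroring the monotonicity property claimed for the protocol.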
4.3.5.3 SUMMARY OF EXAMPLE PROBLEM

For the simple single-machine problem considered here, the equilibrium outcome of the market is identical to the known optimal solution. Moreover, the threats to equilibrium created by the requirement to consolidate supply and demand, the issue of compound bids and next-best states, and the problem of deadlock in thin markets are eliminated by the two-phase feature of the auction. Since ask prices are quoted just-in-time, there is no requirement for a centralized auctioneer or price list. Given the number of unique markets that need to be made in the worst case of a combinatorial auction, this eliminates the problem of an infeasibly large price list.

ALGORITHM 4.1: CONTINUOUS REVERSE AUCTION

    R_PRIORITY <- {r : r not owned by any agent}    (initialize priority list)
    1.  Repeat
    2.    continue <- FALSE
    3.    For each agent i
    4.      For each candidate bid R'_i such that V(R'_i) > V(R_i):
    5.        R_BID <- {r : r in R'_i, r not in R_i}
    6.        Get price(R_BID):
    7.          R_FIXED <- R_PRIORITY ∩ R_BID
    8.          For each r in R_FIXED
    9.            Price(R_BID) <- Price(R_BID) + Price(r)
    10.         Find selling agent j such that R_j ∩ R_BID ≠ {}
    11.         Find next-best state R"_j such that R"_j ∩ R_BID = {}
    12.         Get price(R"_j)
    13.       If V(R'_i) > V(R_i) + Price(R_BID) then
    14.         commit transaction
    15.         continue <- TRUE
    16.       Else
    17.         rollback transaction
    18.       next candidate bid
    19.    Next agent i
    20. Until continue = FALSE

Algorithm 4.1 shows the basic steps involved in the market protocol. In Chapter 5, the computational characteristics of the auction mechanism are analyzed and the distributional advantages of the approach are described in greater detail.

CHAPTER 5: EMPIRICAL RESULTS

The proof-of-concept pyramid in Figure 1.1 on page 7 identifies a number of theoretical and empirical questions that must be resolved in order to evaluate the overall feasibility of the market-based approach.
In this research, the question of feasibility is necessarily empirical due to the nature of the algorithms employed. Specifically, the effective size of an agent's policy tree depends critically on the manufacturing environment in which the agent is situated and the way in which the environment is represented using the P-STRIPS language (see Section 5.1.3). Hence, the only way to know whether the SDP algorithm will construct an optimal policy tree in a reasonable amount of time is to formulate some problems and solve them. In this chapter, issues of feasibility are explored by solving problems using a prototype market-based system. In keeping with the framework presented in Chapter 4, the prototype consists of two major components: an agent-level planning system (rational choice) and a combinatorial auction in which agents can buy and sell contracts for resources (aggregation). The question of whether it is possible to achieve agent-level rationality using the SDP algorithm is addressed in Section 5.1. The question of whether the proposed auction protocol leads to an optimal market equilibrium is discussed in Section 5.2. In Section 5.3, the performance of the market-based prototype with respect to more complex problems is assessed, and in Section 5.4, the benchmark problem from Chapter 2 is revisited.

5.1 ACHIEVING AGENT-LEVEL RATIONALITY

Without agent-level rationality, the first fundamental theorem of welfare economics says nothing about the overall efficiency of market outcomes. Thus, the research question addressed in this phase of the thesis is: "Is it possible in practice to implement rational agents for manufacturing scheduling problems?" Of course, the general answer to the question is "no"—planning and scheduling are both known to be NP-hard and the requirement to account for non-deterministic outcomes exacerbates the computational issues considerably.
However, a fundamental advantage of the market-based approach is that it permits large problems to be decomposed into many smaller problems. Although the small problems remain NP-complete in principle, they may be small enough in practice to formulate and solve in a reasonable amount of time. Hence, the emphasis in this section is placed on the "in practice" part of the research question. Ultimately, the goal of this program of research is to solve agent-level planning problems that are large enough to permit the market-based system to be implemented in industrial settings. Stating the objective in this way requires that the term "large enough" be defined. Based on the descriptions in [Fox, 1987] and the researcher's own experience, the assumption is made that an agent in a real-world manufacturing environment should be capable of sequencing 5 to 15 operations. Of course, "operation" is not an objective measure since atomic units of work can be bundled together in many different ways. For example, an operation can be defined to include setup and teardown of a machine. Alternatively, setup, processing, and teardown can be considered as distinct operations in their own right. The point of binding "large enough" to a specific range of values is simply to emphasize that the planning requirements for a single agent in a manufacturing environment are surprisingly modest. The complexity that does exist in industrial scheduling generally arises as a consequence of the sheer number of jobs and the different costs and rewards associated with each, not because of the complexity of the jobs themselves.

5.1.1 CONVENTIONAL STOCHASTIC OPTIMIZATION

One possible means of achieving agent-level rationality in uncertain environments is to formulate the agent-level problem as a Markov Decision Problem (MDP) and solve it using an established technique such as policy iteration.
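The scale of an explicit MDP formulation can be illustrated with a rough, hypothetical state-variable count. The sketch below does not reproduce the thesis's exact formulation or its state counts; it assumes one binary Contract(M, T) variable per machine-time slot plus a small per-operation status component, and shows only the exponential trend:

```python
def concrete_states(horizon, machines=3, ops=3, status_values=3):
    """Hypothetical count of concrete states: the product of the
    domains of all state variables grows exponentially in the
    planning horizon."""
    contract_vars = 2 ** (machines * horizon)   # own/not-own each slot
    status_vars = status_values ** ops          # e.g., status per operation
    return contract_vars * status_vars

# Each extra time unit multiplies the state space by 2**machines = 8.
assert concrete_states(11) == 8 * concrete_states(10)
assert concrete_states(10) > 10**10   # already astronomically large
```

Even this toy count makes clear why explicit policy iteration is hopeless for useful horizons and why structured representations are needed.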
As discussed previously, however, the curse of dimensionality greatly limits the use of the MDP approach, even for the small agent-level problems considered here. To illustrate, consider the number of states needed to represent the state space of an agent in the benchmark problem: Agent 2 requires three operations (Op1, Op2, and Op3) on three different machines (M1, M2, and M3). The amount of processing time for each operation, p_i, is assumed to be deterministic, as shown in the table below:

Operation (i) | Processing time (p_i)
Op1           | 4
Op2           | 3
Op3           | 5

FIGURE 5.1 Number of states vs. planning horizon (MDP formulation). The curse of dimensionality for a part-agent: As the length of Agent 2's planning horizon approaches a "useful" size, the concrete state space becomes unmanageably large. (y-axis: number of states, log scale from 1 to 10^18; x-axis: planning horizon length t, in time units, from 4 to 12.)

Recall from the analysis in Table 4.1 on page 99 that approximately 10^13 states are required to represent the agent's planning problem over a decision horizon of 10 units of time. More generally, the size of the concrete (or explicit) state space is shown (on a logarithmic scale) over a range of planning horizons in Figure 5.1.

5.1.2 COPING STRATEGIES

As discussed in Section 4.2, three "coping strategies" have been used to attenuate, or at least delay, the explosive growth of the state space shown in Figure 5.1: 1. Structured dynamic programming (SDP) — Policy iteration is performed over a policy tree rather than an explicit state space. The SDP algorithm is designed to exploit sources of independence within the problem formulation. 2. Assertions and constraints — Assertions are used to eliminate branches of the policy tree that correspond to invalid states in the physical world.
Constraints were not used for the simple problem formulations considered here.

3. Rolling planning horizons — In the rolling planning horizon approach, a planning horizon of t units of time is used to generate a policy tree. However, since the agent's planning problem may extend beyond the horizon, point estimates of expected utility are used for the interval [t+1, ∞]. As the agent approaches the end of its planning horizon, it replans for the next t units of time.

5.1.3 COMPUTATIONAL LEVERAGE

The SDP algorithm is highly sensitive to the existence of independence relationships within the problem formulation. For example, if the planning algorithm is deciding whether to execute the action Process(Op2, M2, T5) in a particular state, a number of factors combine to define the sets of relevant and irrelevant information for the decision process:

• The one-way arrow of time ensures that all state variables of the form Contract(Mj, Tj) for Tj < 5 are irrelevant. Although the SDP formulation does not use a conventional MDP representation of states, the Markovian property of history independence [Puterman, 1994] remains an important feature of the approach.

• If there is a precedence constraint that states that Op1 must be completed before Op2 can be started, then the value of the state variable OpStatus(Op1) is certainly relevant. However, since the amount of time actually used to complete Op1 has no impact on Op2, all state variables of the form ElapsedTime(Op1) are irrelevant.

• If there is a precedence constraint that states that Op2 must be completed before Op3 can be started, then an assertion binds the state variable ElapsedTime(Op3) to a known value of zero. Known values create singleton branches and have no effect on the number of abstract states in the solution.
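The leverage that such irrelevance provides can be illustrated with a toy computation (schematic only: this is not the thesis's P-STRIPS representation, and the variable names are invented). When a policy tests only the relevant variables, every assignment to the remaining variables collapses into the same abstract state.

```python
from itertools import product

# Eight binary state variables, but a decision rule that tests only two of
# them (stand-ins for, say, OpStatus(Op1) and Contract(M2, T5)).
variables = ["v%d" % i for i in range(8)]

def decision(state):                      # ignores v2..v7 entirely
    return "process" if state["v0"] and state["v1"] else "wait"

concrete = [dict(zip(variables, bits))
            for bits in product([0, 1], repeat=len(variables))]

# Group concrete states by their projection onto the relevant variables;
# every state in a group necessarily receives the same action.
groups = {}
for s in concrete:
    groups.setdefault((s["v0"], s["v1"]), set()).add(decision(s))
assert all(len(acts) == 1 for acts in groups.values())

print(len(concrete), len(groups))  # → 256 4
```

The 256-to-4 collapse here is the same effect, in miniature, that lets an SDP policy tree stay orders of magnitude smaller than the concrete state space in the experiments below.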
Since relevance relationships are specific to a particular problem formulation, the efficacy of the coping strategies can only be evaluated empirically within a class of problems. For the agent-level planning problems considered here, the method used for evaluating the performance of the coping strategies is as follows:

1. Formulate agent-level planning problems that differ along a number of dimensions:
   a. length of the planning horizon
   b. deterministic vs. stochastic processing times
   c. single operation vs. multiple operations

2. Solve the problems using the SDP algorithm.

3. Compare the number of abstract states in the optimal policy tree to the number of concrete states in the conventional MDP formulation.

The number of abstract states in the agent's policy tree is used as a measure of computational leverage for two reasons. First, the modified policy iteration algorithm used in SDP is polynomial in the size of the abstract state space. As such, the problem's effective size (as measured by the number of leaf nodes in the final policy tree) is the primary indicator of computational performance. Second, as the number of states becomes large, the issue of representation becomes important. For most of the explicit MDP formulations considered here, the limiting factor is not computational time but rather the number of bits required to store the state space description.

In the following sections, the effects of different problem formulations on the effective size of the agent-level planning problems are explored. The purpose of this exploration is not to provide a comprehensive analysis of the computational performance of the SDP algorithm. Instead, the objective is to provide a broad indication of what is gained by using the coping strategies and to highlight the remaining sources of complexity.
5.1.3.1 PLANNING HORIZON

The importance of time in scheduling problems means that the length of the planning horizon, t, has a large effect on the size of both the concrete and abstract state spaces. Although the rolling planning horizon approach permits problems with arbitrarily long horizons to be addressed (at the cost of periodic replanning), the estimates of terminal rewards used in the rolling horizon approach introduce elements of approximation and "nearsightedness" into the technique. It is preferable, therefore, that the planning horizon for an agent span as much of its decision problem as possible. For example, if an agent's expected processing requirement is 10 units of time (suitably defined), then a planning horizon longer than 10 units should be used to reduce the agent's reliance on the accuracy of the terminal reward estimates.

FIGURE 5.2 The effect of the length of the planning horizon on the number of concrete and abstract states. [Chart: number of states (logarithmic scale, 10^0 to 10^15) vs. planning horizon length, t (time units); the concrete states of the MDP formulation grow far more steeply than the abstract states of the SDP formulation.]

A comparison between the explicit MDP formulation and the SDP formulation for different planning horizons is shown in Figure 5.2. The upward slope of the SDP line indicates that the number of abstract states increases exponentially with t. Moreover, the slight upward curve in the line suggests that the rate of growth is actually worse than exponential. However, despite the rapid growth characteristics of the abstract state space, the important result shown in Figure 5.2 is the dramatic decrease in both the absolute number of states and the rate of state space growth (indicated by the slope of the line) provided by the SDP algorithm.
For example, whereas almost 10^14 concrete states would be required for the MDP formulation of the t = 11 problem, the SDP algorithm requires just under 16,000 abstract states.

5.1.3.2 STOCHASTIC PROCESSING TIMES

Outcomes in manufacturing environments are characterized by uncertainty. However, most of the problems considered to this point in the thesis are deterministic. In order to better understand the impact of stochastic operation times on the effective size of the problem's state space, experiments were conducted to compare the state space growth of deterministic and stochastic versions of the same problem. In the stochastic version of the problem, each unit of processing time for each operation results in an equal likelihood of finishing and continuing. For example, the cumulative finishing time distribution for actions of the form Process(Op3, M3, •) is shown in Figure 5.3.¹ Note that the maximum processing time for each operation is the same for both the deterministic and stochastic versions. As a result, the number of state variables of the form ElapsedTime(Opj), and hence the number of concrete states, is constant.

FIGURE 5.3 The probability that Op3 is complete after executing Process(Op3, M3, •): The probabilities are conditioned on different values of the state variable ElapsedTime(Op3). The maximum amount of processing required for the operation is assumed to be five units of time. [Chart: cumulative probability of completion vs. ElapsedTime(Op3).]

Figure 5.4 shows the size of the state space plotted against the length of the planning horizon for the deterministic and stochastic versions of the problem. The size of the concrete state space is also

¹ Since the probability of completion for any value of ElapsedTime(Op3) is constant, the completion times in this example approximate an exponential distribution. However, as discussed in Section 4.1.8, P-STRIPS permits the use of any discrete probability distribution.
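The distribution described above is easy to reproduce. Under the stated assumption (equal likelihood of finishing and continuing after each unit of processing, with completion guaranteed by the fifth unit), the cumulative completion probabilities for Op3 can be computed directly:

```python
# Cumulative finishing-time distribution for Op3, as described for
# Figure 5.3: after each unit of processing the operation is equally
# likely to finish or continue, and the fifth unit always completes it.
p_continue, max_units = 0.5, 5

cdf, not_done = [], 1.0
for unit in range(1, max_units + 1):
    not_done = 0.0 if unit == max_units else not_done * p_continue
    cdf.append(1.0 - not_done)

print(cdf)  # → [0.5, 0.75, 0.875, 0.9375, 1.0]
```

Truncating the geometric tail at five units is what keeps the number of ElapsedTime(Op3) values, and hence the number of concrete states, identical to the deterministic case.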
shown for reference.

FIGURE 5.4 A comparison of deterministic and stochastic problems with the same number of concrete states: For the shorter planning horizons, non-determinism has virtually no effect on the size of the final policy tree. [Chart: number of abstract states vs. planning horizon length, t, from 0 to 7 time units, for the deterministic and stochastic formulations, with the concrete state space shown for reference.]

For shorter planning horizons, there is virtually no difference between the deterministic and stochastic case. However, as the planning horizon is increased, the two lines diverge as the number of states in the stochastic case jumps dramatically. The main reason for the jump is an artifact of the problem formulation. Specifically, the mean makespan for the stochastic case is shorter than the known makespan (4 + 3 + 5 = 12) for the deterministic case. Because of this difference, the planner in the stochastic case has more choice in determining a schedule. For example, in the stochastic case there is a 0.5³ = 0.125 probability of all three operations being completed after three units of processing. Since there are many ways to schedule three units of time within a planning horizon of five or six units, the bushiness of the policy tree naturally increases with the length of the planning horizon.

The conclusion based on Figure 5.4 is that non-determinism per se does not significantly affect the effective size of an agent-level planning problem. However, the ratio of mean makespan to the length of the decision horizon does impact the bushiness of the policy tree.

5.1.3.3 NUMBER OF OPERATIONS

To examine the effect of the number of operations on problem size, a family of problems with a finite planning horizon (six units of time) but a different number of operations summing to a makespan of six was solved.
As the results in Figure 5.5 suggest, there is no clear, monotonic relationship between the number of jobs and the effective size of the problem for a given planning horizon. Of course, this result must be interpreted cautiously given the small range of values considered here. Unfortunately, limitations of the current prototype system prevent a more comprehensive exploration and thus the issue remains an area for future investigation.

    problem         concrete states    abstract states
    1 operation     9,216              200
    2 operations    884,736            784
    3 operations    50,331,648         251

FIGURE 5.5 The relationship between the number of operations in a job and the size of the abstract state space: Note that the number of jobs is not an accurate predictor of state space size for the range of problems considered here. [Chart: abstract states (SDP formulation) vs. concrete states (MDP formulation, logarithmic scale) for the one-, two-, and three-operation problems.]

5.1.4 INTERPRETATION OF RESULTS

To recapitulate, the advantage of the decision-theoretic planning (DTP) formulation of the agent problems is that it permits the precondition of strict agent-level rationality to be satisfied. Moreover, the DTP formulation supports reasoning about non-deterministic outcomes. Given that manufacturing environments are characterized by uncertainty about completion times, breakdowns, resource availability, and so on, there is a good fit between the agent-level problem formulation and the environment in which the agents are situated.

The primary challenge of the DTP formulation is the "curse of dimensionality". In standard DTP (e.g., [Dean et al., 1993], [Russell and Norvig, 1995]), the agent-level problems are transformed into MDPs and solved using well-known techniques such as policy iteration.
However, as the agent-level problems become more complex and realistic, the size of the agent's state space grows to an unmanageably large size. As Figure 5.1 on page 174 illustrates, there is absolutely no possibility of using conventional MDP solution techniques to address the agent-level planning problems required for market-based resource allocation. The results using coping strategies such as SDP are more encouraging, however.

The key result from the prototype implementation is that the coping strategies employed in this thesis permit the solution of agent-level planning problems that are infeasible to represent (never mind solve) using conventional MDP solution techniques. Although the example problems solved using the prototype remain too small to be considered "useful" in industrial environments, this is largely a result of limitations posed by the languages and platform used for prototype implementation (see Section 6.2.1.1). If more efficient internal algorithms (e.g., sorting, search, tree joining), better languages, and faster hardware were used to implement a prototype, one would expect a performance increase of several orders of magnitude over the existing prototype.

By extrapolating from the results in this section, it is estimated that a better implementation could be used to solve problems with several million abstract states (versus the tens of thousands considered here) in an acceptable amount of time. Agents with this type of planning capability could be of practical use in industrial environments and thus exploration of additional sources of computational leverage and prototype enhancements are identified in Chapter 6 as areas for further research.

5.2 DISTRIBUTED COMBINATIONAL AUCTIONS

In Section 4.3, a protocol for a continuous combinational auction was introduced and illustrated with a simple example.
An important feature of the proposed protocol is that it supports combinational bids without the need to explicitly quote prices for every possible bundle of goods. That is, agents can submit candidate bids for any resource or aggregation of resources and the auctioneer will return a market price. Although the auctioneer is responsible for controlling the flow of price information between agents, the bulk of the work to determine the market prices is done by the agents themselves. One means of visualizing the operation of the market is to situate it with respect to other "hill climbing" techniques. In Section 5.2.3, the market framework is characterized as a distributed hill climbing algorithm and its computational properties are briefly discussed in the hill climbing context.

5.2.1 SEARCHING FOR NEXT-BEST STATES

A basic issue that arises in practice is whether the selling agent must conduct a "shallow" or "medium" search for a next-best state. To illustrate the difference, consider the case of a "deep" search (which has not been implemented in the prototype system). In a deep search for its next-best state, the potential seller of a resource is given the opportunity to evaluate all the transitions in its candidate bid list (with the exception of those containing resources in the initial request for quotation (RFQ) and those on the out-of-bounds list) in order to find its best post-sale alternative. Thus, in a deep search, each RFQ results in a recursive chain of optimal bidding attempts.

At the other end of the continuum is shallow search. In shallow search, the selling agent is permitted to search for substitute resources in the priority list only. If no utility-increasing transition can be constructed from priority list resources, the seller simply quotes its reservation price for the resources in the RFQ.
Although shallow search eliminates recursion, a potential source of market failure could arise if the seller—for some reason unique to that particular agent—places no value on priority list resources. For example, an agent with a binding deadline of Time = 10 could be asked to submit a price for a subset of its resources and consider those on the priority list as possible substitutes. If all the resources on the priority list correspond to processing contracts that occur after the Time = 10 deadline, then the seller's shallow search for a next-best state fails. However, if some other agent with a more forgiving deadline can supply the selling agent with appropriate substitute resources and then partially offset its own losses from the priority list, two beneficial outcomes accrue. First, the price for the original RFQ resources is lower due to the seller's ability to find substitute resources. Second, demand is created for the priority list resources.

Facilitating simple n-way trades for substitute resources (such as in the example above) is what is meant by "medium" depth search for a next-best state. The selling agent performs a neighborhood (rather than exhaustive) search for substitute resources. Although this results in recursion, the recursion ends as soon as an agent can fulfil its substitute resource requirements from the priority list.

For the problems addressed in this chapter and in Chapter 4, a shallow search for the seller's next-best state proved to be sufficient for finding the correct equilibrium (that is, the equilibrium corresponding to the optimal allocation of resources). It is clear, however, that in the general case, deep search is required to avoid all sources of market failure and ensure an optimal allocation. Thus, although the priority list and the shallow search heuristic greatly reduce the complexity of the algorithm, the resulting allocations may be arbitrarily far from the optimal outcome.
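A minimal sketch of shallow search may help fix ideas. The model below is far simpler than the thesis prototype (an agent's state is reduced to a set of time slots and a scalar value function, and all names and numbers are invented for illustration), but it reproduces both behaviors described above: quoting the full reservation price when the priority list is useless, and quoting a low price when it contains substitutes.

```python
from itertools import combinations

def shallow_quote(value, holding, rfq, priority_list):
    """Ask price for the slots in `rfq`: the seller's loss after replacing
    them with the best same-size bundle drawn ONLY from the priority list
    (no recursive RFQs to other agents)."""
    rest = holding - rfq
    best_after = value(rest)                      # fallback: no substitute
    for sub in combinations(priority_list, len(rfq)):
        best_after = max(best_after, value(rest | set(sub)))
    return value(holding) - best_after

# A job worth $10 if it gets two units of processing by a Time = 10 deadline:
value = lambda slots: 10 if len(slots) >= 2 and max(slots) <= 10 else 0

print(shallow_quote(value, {3, 4}, {3, 4}, [11, 12]))  # → 10 (search fails)
print(shallow_quote(value, {3, 4}, {3, 4}, [5, 6]))    # → 0  (substitutes found)
```

In the first call the priority list holds only post-deadline slots, so the shallow search fails and the buyer faces the seller's full reservation price; deep or medium search would instead look for the kind of n-way trade described above.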
Precisely how much approximation results from shallow search in this type of market is an empirical question that is beyond the scope of the thesis.

5.2.2 COMPLEXITY ANALYSIS

The core difficulty with medium-depth search is that the process used to determine market prices is potentially recursive. Rather than simply consult a central repository of current market prices, potential buyers in this system must submit an RFQ to the auctioneer. The auctioneer relays relevant portions of the original RFQ to the agents that own the resources under consideration and these agents typically generate their own RFQs for substitute resources. Thus, a single RFQ could conceivably set off a chain of follow-on RFQs from each agent in the system.

To get some idea of the worst-case performance of the process, consider Algorithm 4.1 on page 171. Assume that each agent i = 1, ..., n has the same number of bids, b, in its candidate bid list and each agent is initially in the lowest-valued state of its resource tree. Since each agent starts searching at the top of its candidate bid list for a feasible bid, it could be that all b − 1 states worth more than the current state are evaluated before a feasible candidate bid is found. Moreover, assume that the situation is the same for all agents so that each agent has to consider b − 1 bids in a single round of bidding. Finally, assume that each candidate bid generates n follow-on RFQs so that every agent is involved in every provisional transaction. Note that the protocol prevents infinite recursion since no agent can submit more than one RFQ within the context of a single conditional transaction. Under these assumptions, the total number of RFQs for each round of bidding is proportional to n²(b − 1).

To estimate the total number of auction rounds required to attain equilibrium, assume that only one Pareto efficient transaction is executed in each round.
Although a single transaction is sufficient to require another round of bidding, the fact that prices increase monotonically ensures that there are no cycles and that the total number of rounds is no greater than b × n. Consequently, the maximum number of RFQs that could be generated when finding equilibrium is O(b²n³). In other words, the auction protocol is polynomial in the number of bids on the agents' candidate bid lists and the number of agents in the system.

Although the number of candidate bids, b, is a function of the size of the agent's policy tree and is therefore exponential in the complexity of the agent-level problem, the SDP algorithm ensures that only relevant bundles of resources are considered. Thus, the set of candidate bids considered by each agent is complete but minimal. In addition, the actual number of RFQs generated during an auction tends to be much smaller than the worst case due to the bias introduced by the priority list, the greedy bidding of the agents, and the monotonicity of prices. As illustrated in Section 5.3.1 below, agents move up their bid lists very quickly. Since an agent's utility cannot decrease as the result of a transaction, early gains are "locked in". Because of this, the number of candidate bids that actually result in a committed transaction tends to be very small.

5.2.3 BIDDING AS HILL CLIMBING

The process of buying and selling contracts for resources in a market can be seen as a special form of hill climbing. In hill climbing, the objective function to be maximized (in this case, global utility) is visualized as a surface in which the height dimension is the objective function's value. A hill climbing algorithm attempts to reach the solution by considering its current location and moving along the steepest possible path to a higher location. The process is iterative and ends when the algorithm cannot identify any further value-increasing moves.
In a market that is complete and efficient, all transactions that are committed lead to a monotonic increase in global utility. Moreover, in such a market, all Pareto efficient exchanges that can occur do occur. Thus, the surface created by an efficient market formulation has no local maxima (although multiple global maxima may exist). Sandholm [1998] identifies four atomic types of contracts including ordinary bilateral contracts, cluster contracts (more than one good is exchanged), swap contracts (in which each agent gains and loses a good in the transaction), and multiagent contracts. An OCSM-contract—which combines aspects of all four atomic contract types—is shown to be sufficient for implementing an efficient combinational auction. In other words, a hill climbing algorithm can attain the globally optimal allocation in a finite number of exchanges without backtracking (see [Sandholm, 1998] for the proof).

The auction protocol developed in the thesis supports a rich set of inter-agent exchanges that mirrors the functionality of OCSM-contracts. In addition, the use of buyer-initiated RFQs delegates much of the hill climbing work to individual agents. When agents submit candidate bids to the auctioneer, they do so in an effort to maximize their own utility net of any payments they must make to other agents. Thus, by submitting their best bids first, the agents are implicitly identifying paths of steep ascent. Similarly, when the agents decide whether to commit or roll back the candidate bid based on the market's response to their RFQ, they are implicitly determining the direction (uphill or downhill) of the transaction. Since prices can only increase, gains made by a particular exchange are locked in for all subsequent exchanges.

The difference between the auction protocol proposed here and Sandholm's optimal algorithm is that the auction protocol relies on the use of a priority list and incomplete search (i.e., shallow or medium-depth search).
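The hill climbing interpretation can be made concrete with a deliberately stripped-down market. This is an illustrative model only (one good per trade, additive valuations, and an ask price equal to the seller's own value, so each committed trade is Pareto efficient) and is far simpler than the OCSM-style exchanges discussed above:

```python
values = {                       # values[agent][slot]: value of owning slot
    "A1": {"T1": 5, "T2": 1},
    "A2": {"T1": 2, "T2": 4},
}
owner = {"T1": "A2", "T2": "A1"}          # arbitrary initial allocation

def global_utility():
    return sum(values[owner[slot]][slot] for slot in owner)

trace = [global_utility()]
changed = True
while changed:                            # one pass = one bidding round
    changed = False
    for slot in owner:
        for agent in values:              # buyers greedily seek gains
            ask = values[owner[slot]][slot]     # seller's reservation price
            if values[agent][slot] > ask:       # strict gain, so commit
                owner[slot] = agent
                changed = True
    trace.append(global_utility())

print(owner, trace)  # each committed trade moves strictly uphill
```

Because every trade requires a strict gain over the seller's valuation, the trace of global utility is monotonically non-decreasing and the loop terminates at a fixed point, mirroring the locked-in, no-backtracking ascent that monotonic prices give the full protocol.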
Since the search for the next-best state is incomplete, the "task allocation graph" is not fully connected. Thus, although the types of contracts available to the agents are equivalent to Sandholm's OCSM-contracts, only a small subset of the possible transactions are actually considered during the auction. As a consequence, the auction protocol described in this thesis may attain equilibrium at an economically inefficient local optimum.

The loss of optimality resulting from incomplete search depends to a large degree on idiosyncratic properties of the manufacturing environment, such as the extent to which resource contracts are substitutable. As a consequence, the performance of the protocol needs to be evaluated empirically in different manufacturing environments. For example, in [Parkes, 2000], the performance of several complete and incomplete auction protocols is evaluated against a standardized problem set. Such an analysis, however, is beyond the scope of the thesis and is left for future work.

5.3 EVALUATING THE PERFORMANCE OF THE MARKET

In this section, the operation of the market-based system as a whole is examined in greater detail. The objective is to gain a better understanding of the framework's computational strengths and weaknesses and to identify the type of optimization problems that can be addressed using markets. Section 5.3.1 provides empirical results from a simple resource allocation task. In Section 5.3.2, the generalizability of the framework is assessed by addressing multiple machine problems with more complex objective functions. Finally, in Section 5.3.3, the flexibility of the market in the face of new opportunities and uncertainty is examined.

5.3.1 EMPIRICAL RESULTS

To illustrate the performance of the proposed market protocol, consider the typical sequence of transactions used to achieve equilibrium shown in Figure 5.6.
The problem setup used to generate the sequence is identical to that used in Section 4.3.5.2: Three jobs each consisting of a single operation require different amounts of processing time on a single machine. The holding cost faced by the agents for each unit of time in the system is $1. The initial allocation of resources to agents is arbitrary and suboptimal, as shown in Figure 5.6. After the second round of bidding, the optimal allocation of resources is found by the market.

    Round  Initiating agent  Allocation at time t           A1  A2  A3  Total
    0      —                 2 1 2 3 2 1 3 2 1 2            93  89  92  274
    1      1                 1 1 1 2 3 2 2 3 2 2            97  89  92  278
           2                 1 1 1 2 2 2 2 2 3 3            97  92  90  279
           3                 3 3 1 2 2 2 2 2 1 1            90  92  98  280
    2      1                 3 3 1 1 1 2 2 2 2 2            95  90  98  283
           2                 3 3 1 1 1 2 2 2 2 2 (no change)            283
           3                 3 3 1 1 1 2 2 2 2 2 (no change)            283
    3      (all)             3 3 1 1 1 2 2 2 2 2 (no change)            283

FIGURE 5.6 The sequence of transactions used by the market prototype to attain an equilibrium outcome.

By analyzing the sequence of exchanges leading to the equilibrium outcome shown in Figure 5.6, a number of insights concerning the operation of the protocol can be gained.

• In Round 1, Agent 1 displaces Agent 2 from T2. Note, however, that Agent 2's next-best allocation uses T7, which is the linked ask for Agent 1. By consuming resources that will ultimately be redundant for Agent 1, the transaction can occur without involving Agent 3 (i.e., the amount of recursion is limited by the priority list).

• Unowned resources are also on the priority list and thus Agent 2 could have selected T12 instead of T7 for its next-best state. However, note that each agent incurs a $1 holding cost for each unit of time that it is in the system. The "minimal change" heuristic (recall Section 4.3.5.1) biases Agent 2's selection of its next-best state to T7 because it leaves the agent's net value unchanged.
If Agent 2 had used T12, its ship time would be delayed by one unit and its net value would have decreased accordingly.

• In Round 1, Agent 2 purchases [T5, T8] from Agent 3. The transaction makes Agent 2 better off by $3 and Agent 3 worse off by $2. Since Agent 2 pays Agent 3 its ask price of $2, the exchange is Pareto efficient.

• Under the current implementation of the bidding protocol, the agents bid in succession. More generally, bidding could occur simultaneously on different threads as long as each chain of bids is compartmentalized in some way. As an artifact of the sequence used in this example, the agent that can afford to pay the most for processing—Agent 3—bids last.

• At the end of Round 1, Agent 3 bids for and purchases [T1, T2]. However, rather than shift all jobs down two time units, the seller (Agent 1) selects its next-best state from the priority list and purchases Agent 3's linked ask [T9, T10]. Agent 2 remains unaffected by the transaction.

• The algorithm iterates until a round occurs in which there are no changes. Since several transactions occurred in Round 1, a second round is required.

• In Round 2, Agent 1 is given the opportunity to improve the allocation it accepted at the end of Round 1. It starts by bidding for [T1, T2] but cannot pay Agent 3's ask price since Agent 1 sold Agent 3 the identical bundle in the previous round (prices rise). The best allocation that Agent 1 can afford involves the purchase of [T4, T5] from Agent 2. Since buying [T4, T5] makes Agent 1 better off by $5, it can afford to compensate Agent 2 for the $2 loss it incurs by selling the resources.

• After Agent 1 has completed its transaction in Round 2, Agent 2 and Agent 3 take their turns. In Agent 2's case, it submits candidate bids for all the bundles that would permit it to finish before T10. However, in each case, the agent cannot afford the ask price returned by the auctioneer and is therefore unable to initiate a transaction.
In Agent 3's case, its allocation of [T1, T2] is already maximally preferred and it has no incentive to initiate a transaction.

• A third round is required because a transaction occurred in Round 2. However, since Agent 1 cannot afford to do better than [T3, T4, T5], no transactions occur in the third round. Equilibrium is therefore attained.

As the sequence in Figure 5.6 shows, each committed transaction results in a strict increase in global utility. This property of the market coupled with the priority list heuristic permits the market to converge very quickly to the optimal allocation of resources.

5.3.2 EXTENSIONS TO MULTIPLE-MACHINE PROBLEMS

In a simple single-machine problem, such as the one used for illustration in Section 5.3.1, the profit maximization behavior of the market reduces to the shortest processing time (SPT) sequencing rule. Although it is interesting that the market can "discover" the SPT sequencing rule via a process of iterative improvement, the outcome is of little practical value given the simplicity of implementing SPT. The situation is very different in the case of multiple machines, however, since no optimal sequencing rule for the general problem is known to exist. Simple sequencing rules like SPT do not work well in multiple machine environments due to the requirement to consider the cost implications over all operations simultaneously and to account for the possibility of concurrent activity on multiple machines. To illustrate the basic issues, consider the coarse-grained multiple-machine problem shown in Figure 5.7. This particular problem is referred to as "coarse-grained" because the units of time used in the problem formulation are longer than the finer-grained time units used in the benchmark problem.
It may be that the two problems are expected to span the same amount of calendar time; however, the coarse-grained problem contains less information and is therefore easier to solve in a single planning horizon.

    Job    Op1    Op2    Op3
    J1     1      1      2
    J2     2      1      2
    J3     2      1      1

FIGURE 5.7 Processing times and the optimal schedule for a coarse-grained three-machine problem. [Gantt chart: the optimal schedule of jobs J1, J2, and J3 on machines M1-M3 over seven units of time.]

The sequence of transactions used by the market to attain the equilibrium allocation of jobs to machines is shown in Figure 5.8 (the optimality of the final schedule with respect to total profit can be confirmed by inspection). As expected, the greedy bidding behavior of the agents and the monotonicity of prices ensures that the solution is found in a relatively small number of iterations (two plus one to confirm that no further exchanges are possible). The interesting feature of the solution is that it demonstrates that the market does not simply apply the SPT rule to each machine. For example, the total amount of processing required by J2 is greater than that required by J3; however, J2 appears first in the final permutation schedule.

Another interesting feature of the solution is its stability in the face of multiple optimal solutions. Although swapping the order of J2 and J3 increases the total makespan of the schedule (from 7 units of time to 8 units of time), it leaves its total cost unchanged at $17. Since the market minimizes the total cost of the solution regardless of makespan, the schedule in which J3 precedes J2 is also an equilibrium outcome. However, because of the monotonicity of prices and the requirement that at least one agent be strictly better off as the result of an exchange, the market does not oscillate between the two solutions.
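As a point of comparison for these multiple-machine results, the single-machine reduction to SPT noted at the start of this section can be verified by brute force, using the three jobs of Section 5.3.1 (processing times of 3, 5, and 2 units and a $1 holding cost per unit of time in the system):

```python
from itertools import permutations

jobs = {"A1": 3, "A2": 5, "A3": 2}   # processing times (units of time)

def total_holding_cost(order):
    """At $1 per unit of time in the system, total cost is the sum of
    the jobs' completion times."""
    t = cost = 0
    for job in order:
        t += jobs[job]                # job completes at time t...
        cost += t                     # ...having paid $1 per elapsed unit
    return cost

best = min(permutations(jobs), key=total_holding_cost)
spt = tuple(sorted(jobs, key=jobs.get))

print(best, total_holding_cost(best))  # → ('A3', 'A1', 'A2') 17
assert best == spt                     # shortest-processing-time order wins
```

No such one-line rule recovers the Figure 5.7 schedule: with three machines, the cost of a sequence depends on queueing interactions across all operations, which is precisely why the market's iterative search is needed.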
5.3.3 FLEXIBILITY OF THE MARKET APPROACH

In Section 2.2.4, a number of limitations of the conventional (MRP-based) decomposition of the manufacturing problem were identified. In general, the most important shortcomings of MRP-based approaches are their inability to reason about non-deterministic outcomes and their inflexibility in the face of new information, opportunities, and constraints. In this section, the single-machine problem from Section 5.3.1 is revisited and a new job is inserted into the system after an equilibrium allocation has been attained. The purpose of this exercise is to illustrate how the market reacts to uncertainty, time-dependent payoff functions, and undercapacity. The essential elements of the newly arrived job are described below.

At Time = 2, a new job, J4, enters the system and is assigned an agent (Agent 4). J4 is more complex than the existing jobs in the system for two reasons. First, its completion time is not known with certainty. Second, the transfer price received by the manufacturing system for processing the part is contingent on its completion time. The completion time distribution and the payoff function for the job are shown in Figure 5.9.

An important aspect of the problem formulation to keep in mind is that the payoff function shown in Figure 5.9 is determined exogenously. For example, the customer for the part may have specified different prices for different delivery times in response to its own requirements and preferences.
Alternatively, the payoff function could have been determined subjectively by taking into account loss of customer goodwill, and so on.

[Round-by-round table of allocations and agent values omitted.]

FIGURE 5.8 Equilibrium allocations of resources for the coarse-grained three-machine problem: If the problem is solved in a single planning horizon, the market finds the minimum cost schedule.

The interesting issue raised by this example revolves around determination of Agent 4's willingness to pay for production resources. On one hand, the new job has a much higher payoff than the incumbents if it can be processed before Time = 6. On the other hand, in the worst case, the job may require five units of processing and thereby prevent one or more of the incumbent jobs from finishing by its deadline (Time = 11).

The sequence of transactions leading to the new equilibrium is shown in Figure 5.10. When Agent 4 joins the market, it has the opportunity to bid on any resource currently owned by the incumbent agents (of course, since the first unit of time has already passed, no further bidding occurs for T1). Given the magnitude of Agent 4's payoff for early delivery and the infeasibility of processing all the jobs in the time available, it is not surprising that the new job displaces one of the other jobs. At first, J1 is displaced, but J1 eventually displaces J2, which is longer and therefore less profitable. By the end of the first round, the set of jobs to be processed is fixed. In the second and third rounds, the issue becomes Agent 4's willingness to pay for a fifth unit of processing time.
[Plots of the completion time distribution (probability and cumulative) and of the payoff functions for J4 and for J1, J2, and J3 as functions of completion time omitted.]

FIGURE 5.9 The completion time distribution and payoff function for the new job.

[Round-by-round table of allocations and agent values omitted.]

FIGURE 5.10 The sequence required to attain equilibrium after the arrival of the new job: Agent 4's purchase of a fifth unit of processing time can be seen as insurance. The price the agent is willing to pay for the insurance is a complex function of its completion probabilities and payoff function.

According to the problem formulation, the probability of Agent 4's operation being complete after four units of processing time is 0.95. Thus, the agent's purchase of a fifth unit of processing time can be seen as "insurance" to cover the risk of being incomplete after four units of processing. However, the value of the insurance depends on its timing with respect to the agent's payoff function. For example, the difference between having no insurance and having T11 as insurance is worth about five cents to Agent 4. In contrast, the expected value of T6 as insurance is $5.72 and only slightly less ($5.67) for T7.
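The "insurance" interpretation can be made concrete with a small calculation. The payoff breakpoints and dollar values below are assumptions chosen for illustration only; they are a crude step-function stand-in for Figure 5.9 and do not reproduce the $5.72 and $5.67 figures quoted above, which come from the actual distribution and payoff curve:

```python
def payoff(t):
    """Illustrative, exogenously specified transfer price as a function of
    completion time. The breakpoints and values here are assumptions, not
    the actual curve from Figure 5.9."""
    if t is None:       # job never completes
        return 0.0
    if t <= 6:
        return 300.0
    if t <= 11:
        return 100.0
    return 0.0

def insurance_value(p_need_fifth_unit, t_insurance):
    """Expected marginal value of holding one extra ("insurance") slot at
    time t_insurance: with probability p_need_fifth_unit the slot is
    actually used, and the job then completes at t_insurance instead of
    not completing at all."""
    return p_need_fifth_unit * (payoff(t_insurance) - payoff(None))
```

Under these assumptions, an insurance slot at T6 is worth 0.05 x $300 = $15.00 in expectation, a slot at T7 is worth $5.00, and a slot past the deadline is worth nothing; the prototype's agents perform the same comparison over their full completion-time distributions and payoff functions.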
Although it might seem odd that Agent 4 ultimately decides to purchase its fifth unit of processing at Time = 7 rather than Time = 6, the discontinuity is due to the interaction of Agent 4's payoff function and the opportunity cost of T6. Specifically, Agent 3 is willing to pay $1 for T6 whereas Agent 4's expected value is reduced by only five cents if it purchases T7 instead.

The relatively complex schedule that emerges from the interaction of the agents in Figure 5.10 is especially interesting when one considers that what emerges is a price for Agent 4's option to use the machine at Time = 7. If, during execution of the plan, Agent 4 finds that it has completed processing after two units of time (probability = 0.325), the OpStatus(Op1) = complete outcome will lead to a new physical state. In the resource subtree corresponding to the new physical state, Agent 4 will value its remaining contracts for processing on Machine M1 at exactly $0. This revaluation will trigger a new auction and the excess capacity will be absorbed by the remaining jobs (including J2). The ability of the agents to price real options in this manner may have important applications in other domains, as discussed in Section 6.2.3.1.

5.3.4 INTERPRETATION OF RESULTS

The distributed hill climbing algorithm implemented by the market delegates the task of identifying value-increasing exchanges to the individual agents. Thus, the role of the centralized market is simply to iterate over the exchanges suggested by the agents until no agent is willing to suggest an exchange at current market prices. Although the resulting equilibrium is not guaranteed to be optimal with respect to the global utility function, the results presented here indicate that the algorithm is capable of avoiding local optima and finding the optimal solutions in certain cases. As stated above, the generalizability of this observation needs to be explored more systematically.
In the problems addressed in the thesis, the market was shown to be capable of discovering simple sequencing rules such as SPT. In addition, the market can provide good solutions in cases in which no optimal sequencing rules are known to exist, such as generalized multi-machine problems (Section 5.3.2) and problems with complex payoff functions and non-deterministic processing times (Section 5.3.3). Of course, in both cases, the problems used for illustration consist of a finite number of jobs and are small enough to be solved within a single planning horizon. Given the ultimate objective of industrial-scale applicability, a more interesting class of problems are those with an indeterminate number of jobs and an infinite planning horizon. In the following section, the benchmark problem from Chapter 2 is addressed using a rolling planning horizon approach in order to gauge the broader, real-world relevance of the market-based framework developed in this thesis.

5.4 SOLUTION TO THE BENCHMARK PROBLEM

Recall from Section 2.5.3 that although Johnson's rule provides the optimal solution to certain instances of the F3 || Cmax class of problems, the resulting schedule is optimal with respect to makespan, not total profit. Given that maximization of total profit (or, in the case of the simple problems considered here, minimization of cost) is a more realistic goal in many manufacturing environments, a method for addressing the F3 || ΣwjCj class of problems is preferred. Unfortunately, there is no known polynomial time solution for multiple-machine, minimum cost scheduling problems. Indeed, the optimal minimum cost solution to the benchmark problem in Section 2.5.3 (reproduced as Figure 5.11 below) was found by inspection. The objective of this section is to assess the market-based prototype's ability to find a solution to an F3 || ΣwjCj problem using a rolling horizon approach.
According to the Johnson's rule solution (recall Figure 2.2 on page 24), the minimum makespan for the benchmark problem is 17 time units. Given that the prototype system is not capable of solving problems of this size and the general desirability of examining the impact of rolling planning horizons on auction outcomes, the agent-level problems were decomposed into four rolling horizon problems: Time = [1, 6], [7, 11], [12, 15], and [16, 24].

[Gantt chart of the optimal schedule on machines M1-M3 over Time = 1 to 18 omitted.]

FIGURE 5.11 The optimal minimum cost solution to the benchmark problem (reproduced from Figure 2.3).

When the rolling planning horizon technique is used for solving the agent-level planning problems, a sequence of auctions must also be used. In this particular case, the first auction is held for resources in the interval Time = [1, 6]. At Time = 7, a second auction is held for resources in the interval Time = [7, 11], and so on. Note that in the general case of non-deterministic actions, the second auction cannot be held prior to Time = 7 because the agents do not know what physical state they are actually in until they can observe the outcomes of their actions at that time. Since each physical state has a different resource subtree with a different set of value leaves, knowledge of physical state is required to determine the agents' reservation prices. In the deterministic case considered here, however, each agent's physical state at Time = 7 can be deduced as soon as the first auction attains equilibrium. For example, if Agent 3 purchases contracts for four units of processing on Machine M1 during the first interval, then it is known with certainty that the state variable OpStatus(Op1) = complete at the end of the interval since the operation requires precisely four units of processing.
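In the deterministic case this deduction is mechanical: an operation's status at the end of an auction window follows directly from the number of contracts the agent holds inside the window. A minimal sketch (the function and variable names are illustrative, not taken from the prototype):

```python
def op_status(units_required, contract_times, window_start, window_end):
    """An operation is complete at the end of the auction window iff the
    agent owns at least units_required processing slots inside
    [window_start, window_end]. This shortcut is valid only when
    processing times are deterministic; with stochastic outcomes the
    state must instead be observed during execution."""
    owned = sum(1 for t in contract_times
                if window_start <= t <= window_end)
    return "complete" if owned >= units_required else "incomplete"
```

For instance, if Agent 3's first operation requires four units and the agent holds contracts for T3 through T6, the state variable can be set to complete as soon as the first auction attains equilibrium, before the second auction opens.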
5.4.1 RESULTS OF THE SEQUENTIAL AUCTION

Figure 5.12 shows the resource allocations at equilibrium for the four stages of the sequential auction. In the first stage of the auction, each agent has the ability to alter its current expected value by buying contracts for production resources in the time interval under consideration. However, production resources beyond Time = 6 are not priced in the first stage and therefore the agents must rely on their terminal reward estimates (recall Section 4.2.3) to guide their decisions. Consider, for example, the case in which Agent 1 owns no resources in the interval Time = [1, 6]. The expected value of being at Time = 7 without any processing is (according to the agent's policy tree) $46. The expected value reflects the agent's beliefs about its ability to purchase resources in the future, complete its processing before any hard deadlines, and receive its reward for shipping (i.e., the product's transfer price). Naturally, the expected value is net of both holding cost and the cost of buying contracts for production resources.

Being in a different physical state at the end of the auction interval induces a different expected value. If, at the end of the first interval, Agent 1 has undergone two units of processing on M1 and has thereby completed its first operation, its expected value jumps to $58. As a result, Agent 1 is willing to pay any amount up to $12 to procure contracts for two units of processing on M1 in the first stage of the auction. Similarly, if the agent can make its best-case purchase in each time unit so that it has started its third and final operation by the end of the interval, its expected value is $82. Of course, if the estimates of future prices are not very good or if the prices are not stable over time, the accuracy of the rolling horizon/sequential auction approach will suffer.

[Stage-by-stage table of equilibrium allocations omitted.]

FIGURE 5.12 Equilibrium allocations of resources at the end of the auctions for the rolling horizon case: Since the agents are "nearsighted", they do not converge on the minimum cost schedule.

The other agents in the auction face their own cost and reward structures; however, they have access to the same historical price information and, in this example, face the same per-unit holding cost ($1) as Agent 1. Under these circumstances, the market converges on a straightforward SPT allocation over the first auction interval: Since it costs Agent 1 $4 to wait for Agent 2 or Agent 3 on M1, but it only costs the other agents $2 to wait for Agent 1 on the same machine, Agent 1 is willing to pay the most to go first.

5.4.2 THE PROBLEM OF NEARSIGHTEDNESS

Although the agent's estimates of prices for resources in the future (i.e., beyond the current planning horizon) impact its willingness to pay for resources in the current auction, a problem occurs because future prices are considered only in an aggregate way. In this sense, bidders in a sequential auction are nearsighted: The agents will schedule operations optimally in the current auction stage without being able to reason at a specific level about the downstream impacts of their early decisions. To illustrate the problem, consider the decision whether to schedule J2 or J3 following J1 in the first stage of the auction.
Since both J2 and J3 require all the remaining time on M1 in the interval Time = [1, 6], both agents are willing to pay the same for the contracts. To make the example interesting, assume that Agent 2 bids first and purchases the resources, as shown in Figure 5.12. However, what the market does not know in the first stage of the auction is that Agent 3's second operation only requires a single unit of processing time whereas Agent 2's second operation requires three units of processing time. Thus, had the first auction stage been slightly longer (e.g., over the interval Time = [1, 8]), Agent 3 would have recognised its willingness to pay more to be scheduled first on M1. As it turns out, by scheduling J2 first, the total cost of the schedule ($40) is higher than the known minimum cost ($39).

5.4.3 INTERPRETATION OF BENCHMARK RESULTS

The conclusion to be drawn from Figure 5.12 is that in the infinite horizon case, the use of rolling planning horizons introduces another source of approximation and can therefore impact the optimality of the final solution. However, even when the planning horizons are short relative to the makespan of the schedule (such as the case considered here), the rolling horizon/sequential auction approach appears to provide reasonably good solutions. One reason for this is that the market is flexible and is not necessarily limited to permutation schedules. To illustrate, consider the simple two-machine, two-job problem shown in Table 5.1 below:

TABLE 5.1: A problem in which nearsightedness is potentially costly.

Job    Processing time for Op1    Processing time for Op2
J1               5                         100
J2               6                           2

Assuming that the first auction stage is six time units long, the agent representing J1 is willing to pay more and is therefore scheduled first on M1.
If a strict permutation schedule were used, the decision to schedule J1 first would be extremely costly since J2 would have to wait until J1's very long second operation is complete before being processed on M2. In this worst-case outcome, the total cost of the schedule (assuming a holding cost per-unit-time of $1) is $105 + $107 = $212. In contrast, the cost of the permutation schedule in which J2 is sequenced first is only $111 + $8 = $119.

An important feature of the market is that Agent 2 eventually recognizes the scheduling error made in the first stage of the auction. As the sequence in Figure 5.13 shows, once J2 has been processed on M1, it can preempt J1 on M2. Thus, although agents in sequential auctions are prone to make nearsighted errors, they are also capable of "recovering" from the errors in subsequent auction stages. In this example, the total cost resulting from the allocation shown in Figure 5.13 is $14 + $107 = $121. Although this result is suboptimal, it is still a "good" schedule relative to the worst-case allocation selected in the first stage of the auction. In the problems considered in this thesis, setup and changeover costs are assumed to be zero. In cases in which there are significant changeover costs, preemption and non-permutation schedules require consideration of these costs.

[Stage-by-stage table of equilibrium allocations omitted.]

FIGURE 5.13 An example of a problem in which the market "recovers" from a nearsighted mistake in an early stage of the auction: In the third stage of the auction, the market recognises its sequencing error and J2 preempts J1 on M2.
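The dollar figures in this example can be reproduced with a few lines of arithmetic, taking the holding cost of $1 per job per unit of time from the text and charging each job until its completion time:

```python
def flow_cost(completion_times, holding_cost=1):
    """Total holding cost when each job accrues holding_cost per unit of
    time until it completes."""
    return holding_cost * sum(completion_times)

# J1: Op1 = 5 on M1, Op2 = 100 on M2.  J2: Op1 = 6 on M1, Op2 = 2 on M2.

# Permutation schedule, J1 first: J1 occupies M2 until t = 105, so J2
# (off M1 at t = 11) must wait and completes at t = 107.
j1_first = flow_cost([5 + 100, max(5 + 6, 5 + 100) + 2])   # 105 + 107

# Permutation schedule, J2 first: J2 completes at t = 8; J1 follows on
# both machines and completes at t = 111.
j2_first = flow_cost([6 + 5 + 100, 6 + 2])                 # 111 + 8

# Non-permutation recovery (Figure 5.13): J1 starts on M2 at t = 5; J2
# preempts M2 for two units, completing at t = 14, which pushes J1's
# completion to t = 107.
recovered = flow_cost([107, 14])                           # 107 + 14
```

The three totals ($212, $119, and $121) match the worst-case permutation schedule, the optimal permutation schedule, and the preemptive schedule discussed above.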
CHAPTER 6: CONCLUSIONS AND RECOMMENDATIONS FOR FUTURE RESEARCH

The overall vision of this program of research is to provide decision makers with a means of exploiting powerful but inexpensive computing devices to solve large, complex problems. For example, in a manufacturing environment in which the processing of 300 jobs needs to be coordinated at any given time, a decision support system consisting of 300 agents running independently and simultaneously on a large number of computers could be responsible for determining the allocation of scarce production resources to competing uses. The agents, by constantly reacting to new information and seeking to further their own self-interest, would unknowingly drive the system as a whole to an efficient outcome.

In environments such as manufacturing, allocative inefficiency creates an important source of deadweight social loss. Indeed, the magnitude of the gains realizable through better management of manufacturing resources is reflected in the corporate mantra of i2 Technologies Inc., a provider of advanced planning systems (APS) software for manufacturing: "i2 Technologies will add $50 billion of value, in growth and savings, for our customers by the year 2005." In short, better decision making within manufacturing and other sectors of the economy can generate significant economic benefits. The market-based approach described in this thesis is directed at achieving this end.

Footnote: Sanjiv Sidhu, CEO and founder of i2 Technologies Inc., as quoted on the i2.com website.

6.1 CONCLUSIONS

The research question addressed in the thesis is: Is it possible to generate optimal solutions to large resource allocation problems using electronic markets and artificial rational agents? The framework for market-based resource allocation that emerged as a response to the research question attempts to satisfy two seemingly contradictory design objectives: fidelity to existing economic and operations research theory and applicability to real-world, industrial-scale problems. With these objectives in mind, the methodology used to address the research question can be summarized as follows:

1. Identify the theoretical foundations of the approach — In this research, a literal interpretation of well-known economic theory is used to motivate and guide the entire framework. The key prediction provided by Walrasian microeconomics is that a socially optimal outcome can be attained by permitting agents to selfishly maximize their own utility. The practical benefit of formulating the agent system in this way is that the price mechanism permits the agent-level problems to be decoupled and solved independently. Thus, economic metaphor and theory is used as a means of achieving distributed computation.

2. Develop an overall framework for market-based multi-agent planning — The three phases of the proposed framework are decomposition of the large problem into agents, rational choice at the agent level, and aggregation of the agent-level solutions using a market. The theoretical foundations of the approach—specifically, the first fundamental theorem of welfare economics—impose strict preconditions on the design choices made in each phase.

3. Select appropriate techniques for each phase of the framework — Given the preconditions of the first theorem, techniques from operations research, artificial intelligence, and economics are identified, refined, and synthesized to provide a concrete implementation of the proposed framework.

4. Evaluate the performance of the implemented system with respect to the design objectives — With respect to the first design objective (fidelity to existing economic theory), the prototype system is used to demonstrate the general feasibility of satisfying the preconditions imposed by the first theorem:

a.
Rationality — At the agent level of decision making, variations of well-established techniques from operations research and computer science are used to implement agents that are capable of generating rational policies in the face of uncertainty. The result of the planning stage is a preference ordering over resource bundles that is transitive and complete on one hand and compact on the other.

b. Efficiency — The auction protocol used in the aggregation phase achieves market completeness by permitting agents to bid on bundles of resources. The protocol is shown to be capable of finding the correct equilibrium given various initial conditions in a very small number of auction rounds. Although the simple market protocol developed in the thesis does not result in complete search over the entire space of possible contracts, the results indicate that incomplete search can lead to good outcomes when there is a high degree of substitutability among resources.

With regard to the second design objective (applicability to real-world, industrial-scale problems) there is clearly much work to be done before strong claims of practicality can be made. As discussed in the section on future research later in this chapter, more sophisticated proof-of-concept prototypes are required to demonstrate the benefits of market-based distributed computation for real manufacturing problems.

6.1.1 CONTRIBUTIONS OF THE THESIS

The primary contribution made by the thesis is the development of a novel framework for using agents and markets to solve large stochastic optimization problems. The framework is concrete in the sense that in each phase, the design challenges are identified, solution techniques are proposed, and an implemented system is used to evaluate the proposed solutions. To the best of the author's knowledge, no other system for multi-agent resource allocation has been constructed entirely on a foundation of economic theory.
In addition, to the best of the researcher's knowledge, no comparable implementations of viable rational agents (strictly defined) and a viable combinational auction have been reported in the literature.

In the context of manufacturing planning, the proposed framework offers a number of important benefits over existing approaches. These benefits are enumerated and described in Table 6.1.

TABLE 6.1: Benefits of the market-based approach to manufacturing planning.

• scalability: Markets are inherently flexible since the complexity of the decision process of each agent representing a part is independent of the number of other part-agents in the system.

• flexibility: Since the market runs continuously, agents can join or leave the system at any time. A high-priority product can enter the system at any point and schedules will be adjusted accordingly. Moreover, the agents are "contingency planners" and can react very quickly to changes in their environments.

• modularity: The agent metaphor provides a modular and flexible means of decomposing and modeling large, complex manufacturing systems.

• sensitivity to increases in computational power: Since the agent-level problems are solved in parallel, any performance increase that permits larger problems to be solved or longer planning horizons to be considered has a significant impact on the overall quality of the solutions generated by the market-based system.

• optimality: The market-based approach provides an approach that is more realistic than MRP-based approaches and more reliable (in terms of optimality) than heuristic-based approaches. As the quality of the information provided to the planners increases, so too does the global utility of the outcome.

• price information: The equilibrium prices attained in the market represent the value of the resource to the manufacturing system at a particular point in time. Unlike the standard cost estimates typically used in cost accounting, the market prices take into account all costs and rewards (including opportunity costs) faced by all of the agents in the system. The information provided by the resource prices can be used for long-term planning activities, such as capacity planning (e.g., purchasing new machines to increase the supply of production resources that prices show to be in high demand).

• non-deterministic actions and events: In the physical world of manufacturing, outcomes are subject to uncertainty. In this system, it is possible to represent outcomes using arbitrary probability distributions. This feature is especially important in environments in which low probability/high impact outcomes must be factored into decisions.

• multi-attribute utility functions: The agent utility functions are more flexible than the single-attribute criteria typically used by conventional OR scheduling techniques.

• integrated framework for decision support: In the benchmark problem used here, only two types of agents are used: part-agents and an auctioneer. However, the market framework permits many different types of agents to interact. For example, it is possible to introduce machine-agents to represent the interests of production machines. In this way, other decision problems (such as preventative maintenance planning) can be incorporated into the system.

In addition to the primary contribution, a number of secondary contributions have been made by the thesis, as summarized in Table 6.2. The secondary contributions are novel techniques or refinements to existing techniques that were introduced to achieve the primary objectives of the market-based system. For example, the requirement to implement market-based agents necessitated a number of incremental changes to the SDP algorithm proposed by Boutilier et al. ([Boutilier, Dean and Hanks, 1999], [Dearden and Boutilier, 1997], [Boutilier, Dearden and Goldszmidt, 1995]). As a consequence, the SDP algorithm was used to solve very large MDPs (i.e., problems consisting of more than 10^14 states). A secondary contribution was also made in the field of auction theory in the form of a novel heuristic protocol for combinational auctions.

TABLE 6.2: Secondary contributions made by the thesis.

Structured dynamic programming (Section 4.2):

• non-propositional and *-valued branches: State variables are not restricted to binary values. When certain values of a state variable can be grouped together, a *-valued branch is introduced to reduce the bushiness of the policy tree.

• assertion and constraint constructs: Assertions and constraints are introduced as a means of specifying additional semantic content about the domain being modeled. By using assertions and constraints, the bushiness of the policy tree is reduced.

• rolling planning horizons: Through a process of backward induction, planning problems in the future are solved and then collapsed into terminal reward estimates for planning problems in the present. In this way, discontinuous "end effects" are avoided.

• representation of action-independent events: The IncrementClock action permits action-independent events to be represented and reasoned about during the course of planning.

Protocol for combinational auctions (Section 4.3):

• extraction of private values from the policy tree: By sorting the resource nodes of a policy tree to the bottom, the private values for resources and combinations of resources can be determined.

• reverse auction design: The use of an RFQ-based protocol greatly simplifies the informational requirements of the combinational auction since it permits the calculation of just-in-time ask prices.

• marginal utility formulation and price convention: The total monetary endowment of each agent is ignored and only the marginal benefit from each candidate transaction is considered. This simplifies the auction and eliminates sources of market failure such as speculation.

• two-phase bidding protocol: The two-phase protocol permits agents to engage in n-way (rather than merely 2-way) transactions.

• priority list: The priority list for assigning next-best states helps to limit the amount of recursion that occurs during n-way transactions.

6.1.2 LIMITATIONS OF THE THESIS

As discussed in Section 6.1, the principal limitation of the thesis is that it falls short of making a convincing argument that the proposed market-based framework is appropriate for industrial-scale problems. The limiting factor in this regard is clearly the size of the agent-level problems that can be solved using the current prototype. For example, since the original benchmark problem requires a decision horizon of at least 17 units of time, it had to be decomposed and solved using a combination of rolling planning horizons and sequential auctions. Although sequential auctions are shown to lead to "good" schedules, they do not provide the same guarantee of optimality that is provided in the single-horizon case.

A large part of the problem is the current prototype. In development of the prototype system, a number of binding (but poorly documented) limitations of the VISUAL BASIC 6.0 development platform were encountered. These limitations imposed upper bounds on the size of problems that could be solved that had little to do with computation time or memory. Of course, prototype performance is not the only issue. As the results in Chapter 5 show unequivocally, decision-theoretic planning (with or without the SDP algorithm) is plagued by the curse of dimensionality.
The failure to achieve enough computational leverage with the coping strategies presented in the thesis does not in itself invalidate the market-based approach. The market and the agents participating in the market are separate computational issues and thus any method of solving the agent-level problems that satisfies the preconditions of rationality can be used. Decision-theoretic planning was used in the thesis—despite its well-documented scalability issues—because of its elegance and conceptual fit with the model of rationality used in the economics literature. The critical research challenge that lies ahead is to reexamine the decision-theoretic planning formulation to determine whether it should be abandoned in favor of one of many alternative approximate approaches or whether there are other sources of computational leverage within the decision-theoretic approach that can be exploited. This issue is addressed in greater detail in Section 6.2.1.2.

It is important to note that, in this early stage of research, approximation techniques and heuristics for the agent-level problem have been avoided to the greatest extent possible. The obvious reason for avoiding approximation is that it nullifies the strong claim of global optimality provided by the first fundamental theorem. A second reason for de-emphasizing practical results and focusing on the theoretical foundations of the approach is that doing so helps highlight issues that—for whatever reason—have not been widely addressed in the distributed AI literature. For example, until recently, there has been surprisingly little research activity around the issue of combinational auctions in resource allocation environments. The same can be said about the issue of estimating horizon effects (or "end effects") for agent-level planning problems.
If approximation techniques were used more extensively in this thesis, it is unlikely that either of these important issues would have come to the fore.

A second limitation of the market-based approach is that heuristics and approximation are required to address very large, industrial-scale problems. The sources of approximation introduced by the framework are summarized in Table 6.3.

TABLE 6.3: Sources of approximation in the market-based framework

Source: Rolling planning horizons
Description: Rolling planning horizons are required because increasing the length of the planning horizon has an exponential effect on the problem size (primarily through variables of the form Contract(Mj, Tk)). However, by using historical price information to estimate the value of being in a certain physical state at the end of a shorter planning horizon, the problem of nearsightedness (see Section 5.4.2) is introduced.
Mitigation strategies: The effects of the approximation can be minimized by exploiting increases in algorithmic efficiency and computational power at the agent level to plan over longer horizons. However, the effectiveness of this approach is bounded by the exponential growth of the problem. In addition, finer-grained reasoning by agents about future prices of resources could lead to more realistic estimates of terminal rewards.

Source: Incomplete search in the combinational auction protocol
Description: Shallow or medium search depth is used in the combinational auction to eliminate or reduce the amount of recursion that occurs in response to each request for quotation. By failing to consider every possible transaction, there may be instances in which the equilibrium attained in the auction is economically inefficient.
Mitigation strategies: For certain problem domains, such as manufacturing scheduling, the high degree of resource substitutability may mean that the impact of incomplete search is minimal.
In other domains, standard approaches to avoiding local maxima (e.g., simulated annealing, tabu search, and so on) may lead to better results.

Footnote: Notable exceptions include the work of Sandholm (e.g., [Sandholm, 1998]) and United States patent 5,640,569, "Diverse Goods Arbitration System and Method for Allocating Resources In a Distributed Computer System," which was developed by employees of AGORICS, Inc. and is assigned to SUN MICROSYSTEMS [Miller et al., 1997].

6.2 RECOMMENDATIONS FOR FUTURE RESEARCH

In this section a number of areas for further research are identified based on issues that emerged during the course of the thesis. Three broad categories of future research topics are identified: possible improvement of the computational characteristics of the framework; enhancements to the modeling language to increase its expressiveness and ability to accurately represent industrial scheduling environments; and application to other research domains in which the techniques developed in the thesis can be used to address unresolved problems.

6.2.1 COMPUTATIONAL ISSUES

Although the SDP algorithm is shown in Chapter 5 to yield a significant decrease in the effective size of decision-theoretic planning problems, the growth of SDP problems remains exponential in the dimensionality of the agent-level problem. To determine conclusively whether the SDP algorithm delays exponential growth long enough that agent-level problems of a "useful size" can be solved, a new prototype based on different technologies and techniques needs to be constructed. The objective of developing a second prototype is to be able to test the approach on a small real-world problem.

6.2.1.1 PLATFORM AND PROTOTYPE ENHANCEMENTS

VISUAL BASIC and MICROSOFT'S database object models (DAO and ADO) are the primary technologies used to implement the existing prototype system.
Although VISUAL BASIC is not known for its elegance as a programming language, version 6.0 provides support for object-based programming and a convenient development environment. Given that the SDP algorithm was far from fully specified at the start of the prototype phase, VISUAL BASIC was an excellent language for rapid and iterative development. More importantly, VISUAL BASIC provides good support through its database object models for programmatic access to both the JET and SQL SERVER databases. Access to a high-performance relational database was determined to be a critical issue early in the development effort due to the large amounts of information (e.g., hundreds of thousands of tree nodes) that would have to be manipulated and persisted to disk.

Unfortunately, VISUAL BASIC retains many vestigial structures and was ultimately determined to be ill-suited for developing programs that create and destroy many objects during execution. Specifically, it appears that VISUAL BASIC has an upper bound on the number of objects that can be created within a single scope, even though the objects may be destroyed and garbage collected in the prescribed manner. The prototype implementation ran up against this upper bound on a routine basis and required many time-consuming and inelegant work-arounds to achieve the results presented in Chapter 5.

Given that the core of the SDP algorithm is now known and stable, the next prototype can be engineered from the bottom up for performance and stability. Moreover, better data structures and internal algorithms can be used to speed up critical operations. Such an implementation should achieve a performance increase over the current prototype of at least two orders of magnitude while avoiding the pitfalls associated with the VISUAL BASIC runtime environment.
6.2.1.2 ADDITIONAL COMPUTATIONAL LEVERAGE WITHIN THE DTP APPROACH

The SDP planning algorithm described in Section 4.2 performs a complete search over the set of feasible actions in each state to determine the agent's optimal policy. Although the search algorithm (in this case, policy iteration) is extremely efficient, it is efficient only because the problem is formulated as a Markov Decision Process. The cost of the Markovian formulation is that all the relevant information required to make a decision is encoded within each state's description. As a consequence, the state space of the agent-level problems can grow to become unmanageably large.

One means of reducing the need for complete search over the agent's state space is to make better use of constraints, as discussed briefly in Section 4.2.2.5. Although assertions were used to yield dramatic decreases in the bushiness of the policy trees, constraints were not exploited to the same extent. An area of research that is potentially worth exploring is the incorporation of constraints into the SDP algorithm at a fundamental level. Such constraints would limit growth of the policy tree to states that are likely to be feasible. This type of reachability analysis has been used by other researchers (e.g., [Boutilier, Brafman and Geib, 1998], [Dean et al., 1993]) to greatly increase the computational feasibility of solving large stochastic optimization problems. One obvious application of constraints to policy trees is to incorporate information about the agent's actual state at the start of the planning process. For example, if no operations have been started at Time = 1, then the values of certain state variables are bound to known values, for example:

• OpStatus(Opi) = incomplete for all operations, Opi
• ElapsedTime(Opi) = 0 for all operations, Opi
• Shipped = No
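The initial-state bindings listed above suggest a simple forward reachability filter: sweep forward from the known initial state, enumerate the states that feasible actions can actually produce within the horizon, and prune any policy-tree branch whose state combination falls outside that set. The following is a rough sketch under invented assumptions; the set-of-completed-operations state encoding and the successor function are placeholders, not the thesis's P-STRIPS machinery.

```python
def reachable_states(initial_state, successors, max_depth):
    """Breadth-first forward sweep from the known initial state.
    Any state not visited within the planning horizon can be
    pruned from the policy tree as unreachable."""
    frontier = {initial_state}
    seen = {initial_state}
    for _ in range(max_depth):
        frontier = {s2 for s in frontier for s2 in successors(s)} - seen
        seen |= frontier
    return seen

# Toy successor function: each time unit may complete at most one of
# two operations (a state is the frozenset of completed operations).
def successors(state):
    pending = {"Op1", "Op2"} - set(state)
    return [frozenset(state | {op}) for op in pending] or [state]

start = frozenset()
print(sorted(len(s) for s in reachable_states(start, successors, 1)))
# [0, 1, 1]: the initial state plus the two one-operation states
```

Intersecting such a reachable set with the policy tree would eliminate exactly the bushy early-horizon branches discussed below in connection with Figure 6.1.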
The potential gains from application of this type of constraint are represented graphically in Figure 6.1. The policy trees generated by the agents in this research tend to be bushiest at the start of the time interval and exponentially less bushy as the decision horizon approaches. Intuitively, this is because the agent has fewer possibilities to consider when planning over the last couple of time units compared to planning over the entire decision horizon. However, by knowing the initial state of the agent and propagating the constraints forward to prune any branches that contain combinations of state variables that are impossible to reach from the current state, the bushiest part of the tree can be eliminated.

FIGURE 6.1: The impact of initial state constraints on the size of a policy tree. The shaded area shows how the bushiness of the policy increases as it grows backwards from its objectives. The lines radiating from the known initial state demarcate "possible" future states from "impossible" future states. (Figure labels: "known initial state", "high-utility states".)

Based on this reasoning, it is hypothesized that a hybrid algorithm that draws on concepts from both decision-theoretic planning and constraint satisfaction could facilitate the solution of much larger agent-level problems.

6.2.1.3 VARIABLE GRANULARITY OF TIME

One of the results from Chapter 5 is that it is possible to generate "optimal" solutions to multi-machine problems if the granularity of time is coarse enough to permit the problem to be addressed within a single planning horizon. However, a notion of time that is too coarse-grained can lead to its own sources of imprecision. For example, if the action descriptions for an agent's processing operations provide estimates of completion time to the nearest 10-minute interval, then an atomic unit of time that maps to an eight-hour shift renders the completion time information useless.
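The information loss described above is easy to demonstrate: when the schedule's clock ticks once per eight-hour shift, durations estimated to the nearest 10 minutes collapse into the same number of atomic time units. A toy illustration (the numbers are invented):

```python
import math

ATOM_MINUTES = 8 * 60  # one atomic time unit = an eight-hour shift

def atomic_units(estimate_minutes):
    """Number of atomic time units an operation occupies when the
    schedule's clock ticks once per shift."""
    return math.ceil(estimate_minutes / ATOM_MINUTES)

# Operations estimated at 30, 250, and 470 minutes all become
# indistinguishable one-unit operations:
print(atomic_units(30), atomic_units(250), atomic_units(470))  # 1 1 1
print(atomic_units(490))  # 2: only past the shift boundary do they differ
```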
To strike a balance between the loss of optimality due to short planning horizons and the loss of optimality due to an excessively coarse-grained notion of time, it may be possible to use a variable-granularity approach to time. Under the variable-granularity approach, the next n units of time are fine grained (the short term); the subsequent m units of time map to much longer intervals of calendar time (the medium term); finally, horizon estimates are used (as always) for the long term. One possible advantage of such an approach is that it could permit agents to avoid the type of nearsighted sequencing errors encountered in Section 5.4 without requiring a massive increase in the size of the agent-level problems.

6.2.1.4 SUPPORT FOR LEARNING

In the P-STRIPS language, uncertainty is modeled by associating probabilities with action outcomes. Thus, execution of an action of the form Process(Opi, Mj, Tk) could lead to a number of outcomes including completion of the operation (effects list = {OpStatus(Opi) = complete, NewTimeUnit = yes}) or continuation of the operation (effects list = {ElapsedTime(Opi) = ElapsedTime(Opi) + 1, NewTimeUnit = yes}). As described in Section 4.1.8.2, the probabilities associated with the outcomes may be conditioned on one or more state variables. In the case of Process(Opi, Mj, Tk) actions, the outcomes are typically conditioned on ElapsedTime(Opi) to reflect the intuition that the cumulative probability of completing a manufacturing operation increases as processing time is invested into the operation. In other circumstances, a much larger set of conditioning state variables might be used, including the identity of the machine operator, the day of the week, the phase of the moon, and so on. As the number of relevant conditioning variables grows, the probabilities associated with the outcomes should approach 1.0.
That is, there should exist a set of variables that is able to explain virtually all the observed variance in processing outcomes. The downside of adding this set of explanatory variables to the action descriptions is that more variables lead to bushier policy trees. Recall that conditioning variables appear in P-STRIPS as outcome discriminants. Since outcome discriminants are exhaustive and mutually exclusive, the addition of a single state variable with a domain of three values triples the size of the representation. Given the inherent trade-off between the compactness of the representation and the accuracy of its predictions, there is an opportunity to draw on theory and techniques from machine learning to strike

Footnote: The example assumes that each value in the domain contributes some information to the outcome discriminant. If this is not the case, the bushiness of the tree can be minimized by grouping together similar domain values using the *-branch construct introduced in Section 4.2.2.
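The branching arithmetic described above (a three-valued discriminant variable triples the representation, while a *-branch collapses uninformative values) can be sketched as follows; the variable names and domain sizes are invented for illustration and are not taken from the thesis's action descriptions.

```python
from math import prod

def discriminant_branches(domain_sizes):
    """Outcome discriminants are exhaustive and mutually exclusive,
    so each conditioning variable multiplies the number of branches
    by the effective size of its domain."""
    return prod(domain_sizes)

# Conditioning on ElapsedTime alone (say, a 5-value domain):
print(discriminant_branches([5]))  # 5

# Adding a hypothetical 3-valued variable (e.g., machine operator)
# triples the representation:
print(discriminant_branches([5, 3]))  # 15

# If the operator values carry no outcome information, a *-branch
# collapses them to one effective value, restoring the original size:
print(discriminant_branches([5, 1]))  # 5
```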