Open Collections

UBC Theses and Dissertations

A market-based approach to resource allocation in manufacturing Brydon, Michael 2000

Full Text

A MARKET-BASED APPROACH TO RESOURCE ALLOCATION IN MANUFACTURING

by Michael Brydon

M.Eng., Engineering Management, Royal Military College of Canada, 1993
B.Eng., Engineering Management, Royal Military College of Canada, 1990

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in The Faculty of Graduate Studies, Commerce and Business Administration

We accept this thesis as conforming to the required standard

The University of British Columbia
October, 2000
© Michael Brydon, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Faculty of Commerce and Business Administration
The University of British Columbia
Vancouver, Canada

ABSTRACT

In this thesis, a framework for market-based resource allocation in manufacturing is developed and described. The most salient feature of the proposed framework is that it builds on a foundation of well-established economic theory and uses the theory to guide both the agent and market design. There are two motivations for introducing the added complexity of the market metaphor into a decision-making environment that is traditionally addressed using monolithic, centralized techniques. First, markets are composed of autonomous, self-interested agents with well defined boundaries, capabilities, and knowledge. By decomposing a large, complex decision problem along these lines, the task of formulating the problem and identifying its many conflicting objectives is simplified.
Second, markets provide a means of encapsulating the many interdependencies between agents into a single mechanism: price. By ignoring the desires and objectives of all other agents and selfishly maximizing their own expected utility over a set of prices, the agents achieve a high degree of independence from one another. Thus, the market provides a means of achieving distributed computation.

To test the basic feasibility of the market-based approach, a prototype system is used to generate solutions to small instances of a very general class of manufacturing scheduling problems. The agents in the system bid in competition with other agents to secure contracts for scarce production resources. In order to accurately model the complexity and uncertainty of the manufacturing environment, agents are implemented as decision-theoretic planners. By using dynamic programming, the agents can determine their optimal course of action given their resource requirements. Although each agent-level planning problem (like the global-level planning problem) induces an unsolvably large Markov Decision Problem, the structured dynamic programming algorithm exploits sources of independence within the problem and is shown to greatly increase the size of problems that can be solved in practice.

In the final stage of the framework, an auction is used to determine the ultimate allocation of resource bundles to parts. Although the resulting combinational auctions are generally intractable, highly optimized algorithms do exist for finding efficient equilibria. In this thesis, a heuristic auction protocol is introduced and is shown to be capable of eliminating common modes of market failure in combinational auctions.
TABLE OF CONTENTS

Abstract
List of Tables
List of Figures
Dedication

Chapter 1: Introduction to the Problem
  1.1 Research Question
  1.2 Existing Approaches to the Manufacturing Problem
  1.3 Overview of the Market-Based Approach
  1.4 Proof-of-Concept Criteria
    1.4.1 Decomposition and Global Optimality
    1.4.2 The Computational Feasibility of Agent-Level Rationality
    1.4.3 Aggregation Using Competitive Markets
    1.4.4 Solution of a Benchmark Problem
  1.5 Summary of Contributions
    1.5.1 Primary Contribution
    1.5.2 Secondary Contributions
  1.6 Overview of the Thesis

Chapter 2: Planning and Scheduling in Manufacturing
  2.1 The Manufacturing Problem
  2.2 Conventional Decomposition
    2.2.1 Rough-Cut Planning
    2.2.2 Detailed Scheduling
    2.2.3 Achieving Computational Tractability
    2.2.4 Limitations of Conventional Decomposition
      Dependence Between Stages
      Constraints versus Decision Variables
      Imprecision of MRP
  2.3 Heuristic Approaches
  2.4 The Emergence of ERP Systems
  2.5 A Benchmark Problem
    2.5.1 A Special Case of the Three-Machine Job
    2.5.2 Limitations of the Exact Approach
    2.5.3 The Meaning of Optimality
  2.6 Criteria for Comparison

Chapter 3: Theoretical Foundations and Prior Research
  3.1 Scheduling
    3.1.1 Exact Versus Heuristic Approaches
    3.1.2 Scheduling in OR
      Exact Approaches
      Heuristic Approaches
    3.1.3 Scheduling and Classical Planning in AI
      Knowledge Representation
      The Situation Calculus
      The Strips Representation
      Classical Planning and Refinements
    3.1.4 Decision-Theoretic Planning
      Probabilistic Representations of Actions
      Finding Optimal Policies
      DTP Observations
      The Curse of Dimensionality
    3.1.5 Constraint-Based Planning
    3.1.6 Recent Advances in Planning
  3.2 Markets and Equilibrium
    3.2.1 Rationality
    3.2.2 Equilibrium
      Individual Consumers and Budgets
      Pareto Optimal Allocations in an Edgeworth Box Economy
      Equilibrium Prices
    3.2.3 Auction Theory
      Auction Forms
      Bidder Characteristics
      Equivalences of Forms
  3.3 Distributed Artificial Intelligence
    3.3.1 Environment
    3.3.2 Coordination Mechanisms
      Hierarchy
      Social Laws/Standard Operating Procedures
      Team Utility Functions
      Negotiation
      Markets
    3.3.3 Agent Capabilities
    3.3.4 Classification of Approaches
      Issues of Fit
      Classification of the Proposed Approach

Chapter 4: A Framework for Market-Based Resource Allocation
  4.1 Problem Decomposition
    4.1.1 Approaches to MDP Decomposition
    4.1.2 Market-Based Decomposition
    4.1.3 Definition of the Global Problem
    4.1.4 Modeling Machines
      Properties of Machine-Agents
      The Decision Problem for Machine-Agents
      Planning Without Machine Agents
    4.1.5 Modeling Parts
      The Life-Cycle of a Part-Agent
      Mating Parts and Assemblies
      The Decision Problem for Part-Agents
    4.1.6 Global Utility and the Rules of Money
      Quasi-Linear Utility Functions
      Risk Neutrality
      Additive Utility
      Common Numeraire
      From Pareto Optimality to Global Optimality
    4.1.7 Caveats and Scope
    4.1.8 A Knowledge Representation Language for Agents
      Sources of Uncertainty
      Elements of the Knowledge Representation Language
      Events and Time
      Terminal States
  4.2 Agent-Level Rationality
    4.2.1 Structured Dynamic Programming
      Policy Trees
      Relevance Relationships
      Policy Improvement
      Computational Properties of the SDP Algorithm
    4.2.2 Growing Policy Trees
      Infrastructure for Fixed Actions
      Reaping Rewards
      Policy Mapping and Evaluation
      Policy Improvement
      Assertions and Constraints
    4.2.3 Rolling Planning Horizons
      Building Horizon Estimates
      Collapsing Resource Subtrees
    4.2.4 Bidding Strategies for Agents
      The Resource State Space
      Determining Reservation Prices
      The Role of the Numeraire Good
      Price Discovery Versus the Price Convention
  4.3 Aggregation
    4.3.1 Setting the Stage: An Example Problem
    4.3.2 Choice of Auction Form
    4.3.3 Complementaries and Substitutes
    4.3.4 The Challenges of Combinational Auctions
      Consolidation of Supply
      Consolidation of Demand
      Compound Bids
      Next-Best States
      Deadlock
      Stability of Equilibria
    4.3.5 Choice of Market Protocol
      Elements of the Protocol
      Illustration
      Summary of Example Problem

Chapter 5: Empirical Results
  5.1 Achieving Agent-Level Rationality
    5.1.1 Conventional Stochastic Optimization
    5.1.2 Coping Strategies
    5.1.3 Computational Leverage
      Planning Horizon
      Stochastic Processing Times
      Number of Operations
    5.1.4 Interpretation of Results
  5.2 Distributed Combinational Auctions
    5.2.1 Searching for Next-Best States
    5.2.2 Complexity Analysis
    5.2.3 Bidding as Hill Climbing
  5.3 Evaluating the Performance of the Market
    5.3.1 Empirical Results
    5.3.2 Extensions to Multiple-Machine Problems
    5.3.3 Flexibility of the Market Approach
    5.3.4 Interpretation of Results
  5.4 Solution to the Benchmark Problem
    5.4.1 Results of the Sequential Auction
    5.4.2 The Problem of Nearsightedness
    5.4.3 Interpretation of Benchmark Results

Chapter 6: Conclusions and Recommendations for Future Research
  6.1 Conclusions
    6.1.1 Contributions of the Thesis
    6.1.2 Limitations of the Thesis
  6.2 Recommendations for Future Research
    6.2.1 Computational Issues
      Platform and Prototype Enhancements
      Additional Computational Leverage within the DTP Approach
      Variable-Granularity of Time
      Support for Learning
    6.2.2 Agent Modeling and Problem Formulation
      Support for Interdependent Parts
      Support for Setups and Other Externalities
    6.2.3 Other Areas of Application
      Structured Dynamic Programming
      Combinational Auctions

References

LIST OF TABLES

Table 2.1: Processing times for the benchmark problem
Table 2.2: Processing times for the transformed benchmark problem
Table 2.3: Holding costs for the optimal benchmark and alternative schedule
Table 3.1: A 2 × 2 matrix for classifying approaches to scheduling based on discipline and degree of exactness
Table 3.2: Common machine configurations and their standard notations
Table 3.3: Classification of three classes of single-machine scheduling problems
Table 3.4: The probability of each outcome for the MoveWest action
Table 4.1: State variables used to model a part agent
Table 4.2: Determination of the state space for a three-operation three-machine task with a decision horizon of 10 time units
Table 4.3: Details of the example scheduling problem
Table 4.4: A problem for which no equilibrium price exists
Table 5.1: A problem in which nearsightedness is potentially costly
Table 6.1: Benefits of the market-based approach to manufacturing planning
Table 6.2: Secondary contributions made by the thesis
Table 6.3: Sources of approximation in the market-based framework

LIST OF FIGURES

Figure 1.1: The proof-of-concept pyramid for the thesis
Figure 2.1: The conventional decomposition of the "manufacturing problem"
Figure 2.2: An optimal schedule for the benchmark problem
Figure 2.3: An alternative solution to the benchmark problem
Figure 3.1: An example of an optimal contingency plan for a probabilistic planner
Figure 3.2: A model of rational decision making under uncertainty
Figure 3.3: The Walrasian budget set
Figure 3.4: Equilibrium in an Edgeworth box economy
Figure 3.5: Classification of DAI systems with respect to the environment and coordination mechanism dimensions
Figure 4.1: The market-based approach to decomposition
Figure 4.2: Major milestones of the market-based approach
Figure 4.3: Agent-based decomposition in a manufacturing environment
Figure 4.4: A graphical representation of the elements of the knowledge representation language for agents
Figure 4.5: The action representation hierarchy for P-STRIPS
Figure 4.6: Action preconditions specify conditions that must be true in the world for the action to be feasible
Figure 4.7: Discriminants are used to identify the relevant features of the current state
Figure 4.8: The execution of the action will lead to one of the outcomes with a known probability
Figure 4.9: Different effects lists are associated with each outcome
Figure 4.10: An aspect is used to represent the passage of time in a temporal action
Figure 4.11: Reward trees conditioned on the value of NewTimeUnit
Figure 4.12: Action cost trees for holding costs
Figure 4.13: The schema for the Ship action
Figure 4.14: An example of the use of P-STRIPS to represent an action-dependent event
Figure 4.15: A partial action schema for the IncrementClock action
Figure 4.16: Aspects for action-independent events
Figure 4.17: Basic policy trees
Figure 4.18: The "major limbs" of a policy tree for a part agent
Figure 4.19: The core policy tree with rewards
Figure 4.20: The core policy tree with mapping information
Figure 4.21: A policy tree prior to improvement
Figure 4.22: The cumulative completion time distribution for operation Op3
Figure 4.23: A partial policy tree showing outcomes
Figure 4.24: An example of an invalid combination of state variable values
Figure 4.25: A tree-based representation of assertions
Figure 4.26: A screen-shot showing a portion of a policy tree
Figure 4.27: The backward induction approach to generating horizon estimates
Figure 4.28: A partial subtree for Time = 1
Figure 4.29: A policy tree sorted to enable extraction of price information
Figure 4.30: Indifference curves and initial endowments for the seller (a) and the buyer (b)
Figure 4.31: An Edgeworth box representation of different exchanges
Figure 4.32: The optimal solution to the 3-job, 1-machine example problem
Figure 4.33: Tree-based representations of dependence between goods
Figure 4.34: The resource tree for Agent 3 depicts an all-or-nothing situation
Figure 4.35: A pricing scenario in which a Pareto efficient exchange requires consolidation of demand
Figure 4.36: A sub-optimal 2-job, 1-machine schedule
Figure 4.37: A tree-based representation of a compound transaction
Figure 4.38: The critical elements of the proposed market protocol
Figure 4.39: The initial (sub-optimal) resource allocation for the example problem
Figure 4.40: The allocation of resources following the purchase of [T1, T2] by Agent 3
Figure 4.41: Two possible provisional transactions
Figure 5.1: The curse of dimensionality for a part-agent
Figure 5.2: The effect of the length of the planning horizon on the number of concrete and abstract states
Figure 5.3: The probability that Op3 is complete after executing Process(Op3, M3, •)
Figure 5.4: A comparison of deterministic and stochastic problems with the same number of concrete states
Figure 5.5: The relationship between the number of operations in a job and the size of the abstract state space
Figure 5.6: The sequence of transactions used by the market prototype to attain an equilibrium outcome
Figure 5.7: Processing times and the optimal schedule for a coarse-grained three-machine problem
Figure 5.8: Equilibrium allocations of resources for the coarse-grained three-machine problem
Figure 5.9: The completion time distribution and payoff function for the new job
Figure 5.10: The sequence required to attain equilibrium after the arrival of the new job
Figure 5.11: The optimal minimum cost solution to the benchmark problem (reproduced from Figure 2.3)
Figure 5.12: Equilibrium allocations of resources at the end of the auctions for the rolling horizon case
Figure 5.13: An example of a problem in which the market "recovers" from a nearsighted mistake in an early stage of the auction
Figure 6.1: The impact of initial state constraints on the size of a policy tree
Figure 6.2: Integration of machine learning (data mining) and production planning using the market-based framework

DEDICATION

To my mother, for all her efforts. And to my wife Stephanie for giving me the things that matter most in my life: Gregory and Chloe.

CHAPTER 1: INTRODUCTION TO THE PROBLEM

Despite dramatic increases in computing power over the last half century, certain classes of routine decision problems remain too difficult to be solved on even the most powerful computers. Unfortunately, one such class of hard problems, the allocation of scarce resources to competing uses, constitutes the foundation of many human decision-making activities. For example, governments collect taxes and redistribute resources in an effort to benefit society as a whole; firms allocate people, money, and other productive resources in an effort to maximize profit; individuals allocate their time and effort among a large number of competing demands in an effort to be happy. Generally, such allocation problems generate too many possible solutions to make the application of brute force methods feasible. As a consequence, the role of computers in solving hard resource allocation problems has traditionally been limited to one of decision support: computers do some of the work but the final decisions rely on human intervention and judgement [Keen and Scott Morton, 1978].
Given the many well-known computational limitations and biases in human decision making (e.g., [Kahneman and Tversky, 1979], [Damasio, 1994]), it is worthwhile to ask whether there is a way to restructure hard problems so that computers can generate solutions without undue reliance on human intervention or guidance.

1.1 RESEARCH QUESTION

One possible means of restructuring a hard problem is to decompose it into many smaller pieces that can be solved independently. Accordingly, the research question addressed in this thesis is the following:

Is it possible to generate optimal solutions to large resource allocation problems using electronic markets and artificial rational agents?

The specific problem used to motivate and illustrate the thesis is manufacturing scheduling. However, the proposed techniques are generic to the broader class of resource allocation problems. The fundamental issue being addressed in this research is not how jobs should be assigned to machines, but rather how distributed computation can be used to side-step computational complexity.

Hard problems arise whenever the complexity of the problem grows faster than the computational power of the machines used to solve the problem. A problem like manufacturing scheduling is computationally troublesome precisely because it is insensitive to Moore's Law.¹ That is, although the processing power of a computer may double every 18 months or so, the number of unique schedules in a job shop with n jobs increases by a factor of n! when a new piece of production machinery is introduced. Thus, an algorithm that requires only a few seconds to yield an exact² solution for a toy scheduling problem may require many centuries of deliberation to solve an industrial-scale version of the same problem. In such circumstances, dramatic increases in computational power do little to alter the basic infeasibility of the solution technique.
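The gap between factorial growth and Moore's Law can be made concrete with a small calculation. The sketch below is illustrative only: it assumes, as a rough upper bound, that each of m machines can sequence the n jobs in any of n! orders (so adding a machine multiplies the count by n!), and it takes the 18-month doubling period quoted above at face value. The function names are hypothetical and are not part of the thesis.

```python
import math


def schedule_count(n_jobs: int, n_machines: int) -> int:
    """Rough upper bound on job-shop sequences (assumption: n! orders per machine)."""
    return math.factorial(n_jobs) ** n_machines


def years_to_absorb_one_machine(n_jobs: int, doubling_period_years: float = 1.5) -> float:
    """Years of hardware doubling needed to offset the n!-fold growth from one more machine."""
    doublings_needed = math.log2(math.factorial(n_jobs))
    return doublings_needed * doubling_period_years


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(f"n={n:2d}  schedules on 3 machines={schedule_count(n, 3):.3e}  "
              f"years to absorb one more machine={years_to_absorb_one_machine(n):.1f}")
```

Under these assumptions, a single additional machine in a ten-job shop multiplies the search space by 10! = 3,628,800, which takes roughly 33 years of hardware doubling to offset.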
Broadly stated, the goal of this thesis is to rethink the way in which large resource allocation problems are formulated and solved. The result is a problem-solving approach based on a market metaphor: large problems are decomposed into many smaller agent-level problems, the agents generate solutions to their own problems, and a market is used to combine the agent-level solutions into optimal³ or near-optimal global-level outcomes. The fundamental advantage of the market-based approach is that conflict and resource dependency between agents are replaced by a price mechanism. The resulting inter-agent independence permits the global problem to be distributed over a large number of computers.

¹ Moore's Law was originally coined by Intel founder Gordon Moore in the mid-1960s to describe the growth in the number of transistors (and thus its processing power) on a microchip. In its current incarnation, the law states that the power of microchips doubles every 18 months [Raymond, 1994].
² An exact solution technique is guaranteed to provide an optimal solution to the problem. In this chapter, the terms "exact" and "algorithmic" are used interchangeably.
³ In this thesis, "optimality" is defined with respect to a special global utility function. This issue is discussed in much greater detail in Chapter 4.

1.2 EXISTING APPROACHES TO THE MANUFACTURING PROBLEM

Decision makers in manufacturing environments are faced with "the manufacturing problem". The manufacturing problem involves the allocation of production resources and the coordination of activities so that goods are produced in an efficient manner. Efficiency is defined in terms of a broad cost function that includes a number of important factors such as holding costs, lateness penalties, and the opportunity cost of using the resources in an alternative manner. In any real-world manufacturing environment, the manufacturing problem bears little resemblance to the simple n-job, m-machine problems that are solved using exact algorithms in production management textbooks. Instead, real-world manufacturing is characterized not only by scale, but also by rapid change and uncertainty. In the face of such complexity, algorithmic approaches to solving the manufacturing problem have little practical utility.

To get around the computational difficulties created by the manufacturing problem, two approaches are used in practice. In the first approach, algorithmic techniques are applied to a greatly simplified version of the problem and an effort is made to coerce the real system to satisfy the assumptions of the simplified model. The most common example of this approach is materials requirement planning (MRP). In the underlying MRP logic, sources of complexity (such as non-deterministic lead times, high-priority orders, capacity constraints, and machine breakdowns) are assumed not to exist. These simplifying assumptions permit the MRP system to generate a rough-cut schedule that is then used to generate finer-grained schedules for particular jobs and machines. Since the problem solved by the computerized system may be quite different from the real-world problem, the quality of the plans resulting from MRP systems is questionable [Turbide, 1993], [Nahmias, 1989].

In the second approach, a satisficing strategy based on heuristics is used. For example, an increasingly popular approach is to formulate factory-wide scheduling problems as constraint satisfaction problems (CSPs). High-level constraints such as precedence between operations, the capabilities of machines, and due dates are used to guide the search through the space of all possible schedules. A good schedule (i.e., a schedule that satisfies the constraints) can be found very quickly using CSP techniques.

There are two general disadvantages of heuristic approaches. First, heuristics provide no guarantee that the solutions they generate are optimal.
Second, the quality of the solution often depends on the quality of inputs from users. For example, in CSPs, the constraints placed on the problem interact in complex ways to define the feasible set of solutions. In many cases, however, the initial set of constraints does not admit any feasible schedules and trade-offs in the form of constraint relaxations must be made. However, knowing which constraints to relax and by how much is a difficult decision problem in its own right; it requires judgement and a deep understanding of the overall objective function.

1.3 OVERVIEW OF THE MARKET-BASED APPROACH

Given the shortcomings of both the conventional and heuristic approaches to addressing the manufacturing problem, there are clear opportunities for alternative techniques that provide good solutions within the bounds of computational practicality. The power of markets to coordinate the activities of a large number of boundedly-rational agents contending for scarce resources is well understood. Equally well understood is the difficulty of using centralized hierarchical methods to address large resource allocation problems. Consider for example the following vignette from the Economist [Anonymous, 1995]:

    When Soviet planners visited a vegetable market in London during the early days of perestroika, they were impressed to find no queues, shortages, or mountains of spoiled and unwanted vegetables. They took their hosts aside and said: "We understand, you have to say it is all done by supply and demand. But can't you tell us what is really going on? Where are your planners, and what are their methods?"

Given the current popularity of using market mechanisms to address large-scale, societal issues (such as school vouchers, pollution control, and the allocation of radio spectrum), it is surprising that greater use of markets has not been made in the design and implementation of computer-based decision support systems.
Markets have three fundamental advantages over centralized control. First, the notion of an agent provides a convenient means to decompose complex systems into self-contained chunks for goals, capabilities, beliefs, and so on [Simon, 1981]. By decomposing the problem into sub-problems that are roughly independent, distributed computation can be used to reduce the calendar time (as distinct from processor time) required to generate a solution. Second, markets are inherently scalable. When making decisions, agents need only consider their own preferences and the relative prices of goods in the economy. The addition of a new agent to the economy may result in a change in prices, but does not change the complexity of the decision problem faced by any other agent. Finally, the system of prices that emerges from the interactions of the agents in the market guides the system as a whole to a welfare-maximizing state. The marvel of the price system has long been recognized. Consider, for example, the following observation by Nobel laureate Friedrich von Hayek [1945]:

    I have deliberately used the word "marvel" to shock the reader out of the complacency with which we often take the working of this mechanism for granted. I am convinced that if [the price system] were the result of deliberate human design and if the people guided by the price changes understood that their decisions have significance far beyond their immediate aim, this mechanism would have been acclaimed as one of the greatest triumphs of the human mind.

The approach to distributed computation proposed in this research is based on a literal interpretation of classical microeconomic theory. The global-level resource allocation problem is restated in terms of a large number of agent-level problems.
Rather than relying on a centralized controller, the aggregation of the agent-level solutions into a coherent global solution is achieved by the unseen hand. By selfishly maximizing their own well-being, the agents unknowingly maximize the well-being of the system as a whole.

The process of transforming a resource allocation problem into a market-based problem (consisting of agents, goods, bidding protocols, and prices) can be divided into three distinct phases:

1. Decomposition: Agents are created to represent the interests of objects in the real world, such as parts and production machinery. Each agent is simply a process on a computer that makes decisions on behalf of the physical-world object it is assigned to represent.
2. Agent-level rational choice: Each agent is implemented as a decision-theoretic planner. The decision-theoretic planning algorithm generates a contingency plan that maximizes the agent's expected utility over some planning horizon. Since the objects that the agents represent exist in an imperfect physical world, rationality implies the ability to reason about uncertain outcomes.
3. Aggregation: To resolve inter-agent conflicts, a market is provided in which agents can buy and sell contracts for resources. The ultimate allocation of resources to agents is determined by the equilibrium prices that emerge through the market interactions.

The body of economic theory that addresses the global welfare effects of competition by agents is Walrasian microeconomics.⁴ The core of the Walrasian approach, the first fundamental theorem of welfare economics, provides a formal statement of the workings of the unseen hand. The theorem states that a competitive equilibrium produces an allocation of resources that is Pareto optimal⁵ [Mas-Colell, Whinston and Green, 1995]. Naturally, the first theorem is based on a number of assumptions about agents and the markets in which the agents interact.
Most notably, the agents are assumed to be strictly rational and the markets are assumed to be frictionless and efficient. In human economies, these assumptions are never fully satisfied and thus Walrasian microeconomics is seldom seen as anything but a first-cut descriptive model of human behavior. However, the goal of this research is to build artificial agents, not model real ones. As such, relatively simplistic economic theory can be used in the normative sense to guide the design and implementation of the computer-based agents and markets. The issue, therefore, is not whether the theory accurately models the agents, but whether the agents can be engineered to accurately model the theory.

⁴ Named for Leon Walras (1834-1910), French economist and pioneer of general equilibrium theory.
⁵ An allocation is Pareto optimal (or Pareto efficient) if there is no way to make one agent better off without making another agent worse off. The importance of Pareto optimality is discussed in greater detail in Section 3.2.

1.4 PROOF-OF-CONCEPT CRITERIA

To demonstrate the viability of the market-based approach presented in this thesis, the combination of analytical and empirical reasoning shown in Figure 1.1 is used. At the top of the "proof-of-concept pyramid" is the fundamental premise that it is possible to find a globally optimal solution to a large resource allocation problem using a market-based approach. The claim of optimality rests on two additional premises:

• it is possible to express global utility (i.e., the utility of the system as a whole) as a special function of agent utilities; and,
• the first fundamental theorem of welfare economics holds in the market in which the agents interact.

The first premise, which establishes a relationship between global optimality and Pareto optimality, can be established as a design feature of the system of agents.
The second premise is more complex since the first theorem imposes a number of preconditions on the structure of the agents and the market. The net result, as Figure 1.1 illustrates, is that the proof-of-concept pyramid rests on three distinct lines of reasoning that correspond to the decomposition, rational choice, and aggregation phases described in the preceding section. These lines of reasoning are introduced in more detail in the sections that follow.

[Figure 1.1: The proof-of-concept pyramid for the thesis. The feasibility of the market-based approach is supported by lines of analytical and empirical reasoning. Box labels: problem solution; global optimality?; first fundamental theorem of welfare economics; Pareto optimality → global optimality; problem formulation (Section 4.1); agent-level rationality; market equilibrium; structured dynamic programming algorithm (Section 4.2); protocol for combinational auctions (Section 4.3); empirical results (Section 5.1); complexity analysis (Section 5.1).]

1.4.1 DECOMPOSITION AND GLOBAL OPTIMALITY

The relationship between Pareto and global optimality is important in the context of the manufacturing problem because the agents themselves are created solely as a means of achieving a globally optimal outcome. In other words, the ultimate welfare of the agents is much less important than the ultimate allocation of resources to agents. Although the first theorem guarantees Pareto optimal outcomes for agents, Pareto optimality is generally insufficient for global optimality. Consequently, in order for the first theorem to be of any practical value, the global-level problem must be formulated in such a way as to create and maintain a sufficiency relationship between Pareto and global optimality. Fortunately, in environments such as manufacturing in which there is a well-defined global utility function, it is possible to decompose the problem such that global utility is defined as the sum of agent utilities.
The justification for, and implications of, this approach are discussed in Section 4.1.

1.4.2 THE COMPUTATIONAL FEASIBILITY OF AGENT-LEVEL RATIONALITY

Attaining outcomes that are economically efficient requires that all agents participating in the market be strictly rational. To achieve the requisite level of rationality, agents in this system are implemented as solvers of stochastic optimization problems. Consequently, the agents are subject to the complexity constraints inherent in any optimization technique. The difficulty of achieving agent-level rationality (for multiagent or single-agent systems) is summarized by [Mullen and Wellman, 1995]:

"Despite the centrality of decision-theoretic rationality in our view of computational economies, at present we have little to say [...] about how to make economic agents rational in the decision-theoretic sense. The reason is that there is no difference in the problem of achieving computational rationality in this context compared to any other context. That is, designing rational agents is the general [artificial intelligence] problem, and we are working on pieces of it just like every other research group."

The core problem is that all planning problems are in principle intractable [Chapman, 1987], [Garey and Johnson, 1979]. The critical issue in this research is whether the market-based decomposition of large problems leads to the formulation of agent-level problems that are small enough to be solvable in practice. Put another way, a well-known shortcoming of conventional stochastic optimization techniques is that despite their power and elegance, they only permit the solution of "toy" problems. This research investigates whether market-based decomposition can be used to generate a large number of toy problems that can be solved and recombined to provide the solution to much larger problems.
One of the conclusions of the thesis is that although the agent-level planning problems are much smaller than the global (factory-level) planning problem, the agent-level problems are still much too large to represent (never mind solve) using conventional stochastic optimization techniques. However, as shown in Section 4.2, a structured dynamic programming algorithm can be used to exploit sources of independence within the agent-level problem formulations. Precisely how much can be gained by using the structured dynamic programming algorithm is an empirical question: it depends on both the problem domain and the problem representation. Section 5.1 presents empirical evidence that suggests that it is possible to achieve agent-level rationality for the class of resource allocation problems addressed in the thesis.

1.4.3 AGGREGATION USING COMPETITIVE MARKETS

The second precondition of the first fundamental theorem of welfare economics is that the market be capable of attaining equilibrium. To attain equilibrium, a large number of potential sources of market failure (such as externalities, information asymmetries, transaction costs, and so on) need to be eliminated. In Section 4.3, the sources of market failure relevant to multiagent resource allocation systems are identified and an efficient incomplete protocol for distributed combinational auctions is presented (see footnote 6). The strategies for avoiding market failure embodied in the protocol are described with the aid of a simple example. The worst-case computational complexity of the protocol is analyzed in Section 5.1.

1.4.4 SOLUTION OF A BENCHMARK PROBLEM

In addition to the analytical/empirical chain of reasoning shown in Figure 1.1, a benchmark problem is used to evaluate the ability of the market-based approach to converge on an optimal allocation of resources.
The problem used for this purpose in this thesis is a deterministic three-machine flow shop scheduling problem that can be solved using a variation of Johnson's rule, a well-known operations research technique. Although the market-based approach could be applied to much larger or more realistic problems (indeed, this is the rationale for its introduction in the first place), solutions that are known to be optimal do not exist for such problems, hence the reliance on a very simple problem as a benchmark. The details of the problem and the conventional solution technique are presented in Section 2.5.

Although the benchmark problem used in the thesis is very small, it belongs to a large, well-defined class of resource allocation problems that share the same core structure. As a consequence, conclusions made regarding the inner workings of the agent and market algorithms on small problem instances should generalize well to larger n-job, m-machine scheduling problems with the same structure. For example, in the case of the deterministic 1-machine example introduced in Section 4.3.5 to illustrate the operation of the auction protocol, the market-based approach can be shown to reduce to the shortest processing time (SPT) dispatch rule (which is known to be optimal for the entire class of problems). Since the equivalence between the market and SPT is independent of scale, the fact that the market-based system can find the optimal solution to the small instance implies that it can find the optimal solution to larger instances of the same class (assuming that there is sufficient computation time and memory).

[Footnote 6: An incomplete algorithm is one that is not guaranteed to find the optimal solution. The combinational auctions in which the agents participate are known to be intractable [Sandholm and Suri, 2000] and therefore complete algorithms are infeasible for systems with large numbers of agents and resource goods.]
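The optimality of the SPT dispatch rule mentioned above can be checked by brute force on a small instance. The following is only a sketch: it assumes the objective is the sum of job completion times on a single machine (the classic setting in which SPT is optimal), and the processing times and function names are hypothetical, not taken from the thesis.

```python
from itertools import permutations


def total_completion_time(sequence):
    """Sum of completion times when jobs run back to back on one machine."""
    clock = 0
    total = 0
    for processing_time in sequence:
        clock += processing_time   # this job finishes at the current clock
        total += clock
    return total


def spt_order(processing_times):
    """Shortest processing time first: sort jobs by processing time."""
    return sorted(processing_times)


# Hypothetical processing times for four jobs (illustration only).
jobs = [3, 1, 4, 2]

spt_total = total_completion_time(spt_order(jobs))
# Exhaustive check over all 4! = 24 sequences.
best_total = min(total_completion_time(p) for p in permutations(jobs))

print(spt_total, best_total)
```

On this instance the SPT sequence attains the same total as the best of all 24 sequences. The exhaustive check is only practical for very small n, which is exactly why a scale-independent dispatch rule is valuable.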
1.5 SUMMARY OF CONTRIBUTIONS

This thesis brings together established theories and techniques from a number of foundation disciplines (specifically, economics, operations research, and artificial intelligence) and applies them to an important problem in production management. As such, the primary contribution of the thesis is the novel integration of existing research to address a practical problem. However, to achieve the integration and create a working prototype of a market-based system, a number of secondary incremental contributions were made in the foundation disciplines. Both the primary and secondary contributions are described in detail in Section 6.1.1. In this section, the contributions are briefly summarized.

1.5.1 PRIMARY CONTRIBUTION

The primary contribution made by this thesis is an approach to multiagent resource allocation that is built on a foundation of microeconomic theory. The theoretical foundation is important because it provides assurance that the equilibrium outcome of the system is optimal and stable with respect to a given information set. In addition, the inherent scalability of the market-based approach permits the decomposition and solution of stochastic optimization problems that are impractical to solve using conventional monolithic techniques.

1.5.2 SECONDARY CONTRIBUTIONS

The secondary contributions made by the thesis involve new or refined techniques that have been introduced in order to satisfy the preconditions of the first fundamental theorem of welfare economics.
These contributions involve technical refinement to the work of others (e.g., adopting and implementing the structured dynamic programming algorithm of Boutilier et al. ([Boutilier, 1997], [Boutilier and Dearden, 1996], [Boutilier, Dearden and Goldszmidt, 1995]) in Section 4.2.1) and the introduction of novel approaches to dealing with the technical challenges inherent in the approach (e.g., the two-phase protocol for combinational auctions in Section 4.3.5 and a rolling planning horizon technique for long-term planning in Section 4.2.3). Without rational agents and efficient markets, it is impossible to draw on classical economic theory to make predictions about the quality of equilibrium outcomes. Thus, the ability to solve large agent-level planning problems and attain equilibrium in a combinational auction can be seen as the "enabling technologies" of the market-based approach.

1.6 OVERVIEW OF THE THESIS

In Chapter 2, the manufacturing problem is described in greater detail and the shortcomings of both conventional approaches to manufacturing planning and emerging enterprise resource planning systems are identified. The objective of the chapter is to illustrate the need for better methods of addressing the manufacturing problem. The chapter closes with a description of the benchmark scheduling problems used throughout the thesis for illustration.

Chapter 3 summarizes theory and prior research in the disciplines relevant to this research: scheduling, economics, and artificial intelligence.

• In Section 3.1, approaches to planning and scheduling from two different disciplines (operations research and artificial intelligence) are reviewed. The purpose of the review is to identify the strengths and weaknesses of various exact and heuristic approaches.
• Section 3.2 contains a summary of a number of important economic concepts. First, the concept of individual rationality is defined.
This is followed by a brief review of the concept of general equilibrium in which the Edgeworth box model is used to illustrate a number of basic properties of pure exchange economies and to introduce notation used in the discussion of the implemented system in Section 4.2.4. The section on economic theory ends with a review of auction theory. Auction theory is used to establish the equivalence of the different types of markets considered in this research.
• Section 3.3 briefly reviews theory and practice from distributed artificial intelligence. In this section, a three-dimensional taxonomy is introduced and used to situate different approaches and systems reported in the literature. In addition, the market-based approach proposed in this thesis is situated within the taxonomy. The objective of this section is to evaluate the suitability of various approaches to the task of manufacturing scheduling.

Chapter 4 describes the proposed market-based approach in detail. The sections of the chapter correspond to the three phases identified in Section 1.3: decomposition, agent-level rational choice, and aggregation.

• In Section 4.1, the decomposition of a scheduling problem into a number of self-interested agents is described. The action description language for agents is summarized and illustrated with examples.
• In Section 4.2, the challenges of achieving agent-level rationality are discussed and a tree-based structured dynamic programming algorithm for solving very large stochastic optimization problems is described.
• In Section 4.3, the aggregation mechanism for the system (a form of continuous reverse auction) is introduced and illustrated with an example.

Chapter 5 contains a discussion of the empirical results used to support the proof-of-concept pyramid in Figure 1.1. First, in Section 5.1, the efficacy of the structured dynamic programming algorithm is analyzed.
An effort is made to extrapolate the results to estimate the feasibility of addressing real-world problems using the agent-level planning techniques described in Chapter 4. Section 5.2 contains an analysis of the auction protocol and derives the worst-case performance of the algorithm. In Section 5.3 and Section 5.4, the market-based solutions to the benchmark and other scheduling problems are analyzed.

Chapter 6 concludes the thesis with a summary of conclusions and limitations of the thesis. The primary and secondary contributions of the research are enumerated and described in detail. In addition, a number of areas for further research are identified.

CHAPTER 2: PLANNING AND SCHEDULING IN MANUFACTURING

2.1 THE MANUFACTURING PROBLEM

Effective management of a manufacturing enterprise requires the coordination of many types of production resources such as time, materials, machines, and personnel. The explicit goal of the coordination exercise is to simultaneously maximize a number of important performance measures (such as profitability, product quality, responsiveness, and worker satisfaction) and minimize others (such as environmental impact and waste). In short, decision makers in manufacturing environments are faced with a very large, ongoing resource allocation problem that requires judicious trade-offs between numerous conflicting objectives.

Not surprisingly, the complexity of the manufacturing problem far exceeds the capacity of any human decision maker. Similarly, existing analytical methods (that is, methods that can be expressed as algorithms and executed by computers) fall far short of being able to address such problems in their entirety. As discussed in Section 1.2, there are two broad approaches to coping with the computational intractability of the manufacturing problem.
The first is to generate exact solutions to a series of simplified sub-problems (the conventional decomposition approach). The second is to address the problem in its full complexity but to satisfice, that is, to accept an approximate or "good enough" solution as long as it is found within an acceptable amount of time (the heuristic approach). In this chapter, both approaches are examined in greater detail and an effort is made to highlight some of the important shortcomings associated with each.

2.2 CONVENTIONAL DECOMPOSITION

Figure 2.1 shows the conventional decomposition of the manufacturing problem. The term "conventional" is used to distinguish this form of decomposition from other forms (such as the agent-based decomposition described in Chapter 4) and to emphasize the extent to which practitioners and academics have converged on a scheme for dividing the manufacturing problem into a standardized set of sub-problems. Indeed, most production management textbooks contain a diagram very similar to the one shown in Figure 2.1.

[FIGURE 2.1 The conventional decomposition of the "manufacturing problem": The global problem is divided into a number of distinct sub-problems (from [Pinedo, 1995, p. 4]). Stages shown: production planning, master scheduling; material requirements planning, capacity planning; scheduling and rescheduling; dispatching; shopfloor management; shopfloor. Flows shown: orders, demand forecasts; capacity status; quantities, due dates; schedule constraints; material requirements; shop orders, release dates; schedule performance; schedule; detailed scheduling; shop status; data collection; job loading.]

2.2.1 ROUGH-CUT PLANNING

The starting point of the conventional decomposition is forecasts of future demand across all product lines. In many cases, forecasts are combined into measures of aggregate demand to facilitate the estimation of workforce requirements over the planning horizon.
The forecasts are also used to create a master production schedule (MPS), which describes the target quantities and delivery dates of finished goods. Once the MPS is complete, it is passed to the materials requirements planning (MRP) system. The role of the MRP system is to "explode" the schedule for finished goods into order release schedules for the raw materials and purchased components that constitute the finished goods.

In addition to the MPS, the inputs to an MRP system are the bill of materials (BOM) information for each finished product and estimates of the lead time required to produce or purchase each item in the BOM. The system uses a simple procedure to work backwards from the due date of the finished good to determine the timing and quantities of BOM items. The output of the MRP system is a general list of material requirements and an order release schedule for each BOM item that must be produced or purchased. The order release schedule consists simply of the date on which the order should be released to the production system and a date by which the order should be completed. Since the order release schedule does not allocate specific production resources (such as personnel or machines) to orders, MRP is sometimes called rough-cut scheduling.

2.2.2 DETAILED SCHEDULING

The detailed scheduling stage in Figure 2.1 refers to the process by which jobs are assigned to machines. In prototypical manufacturing environments, the terms "jobs" and "machines" are used literally. However, in the general case, a "job" can refer to any object or collection of objects that requires processing and a "machine" can refer to any production resource. One consequence of the sequential nature of the decomposition shown in Figure 2.1 is that the shop-floor scheduler is constrained by a number of decisions that have already been made upstream:

1. Process plan: The process plan specifies the sequence of operations that must be performed to produce the part. In addition, the process plan may specify precedence constraints between the operations, machine requirements or preferences, setup requirements (e.g., jigs or fixtures), and personnel skill requirements.
2. Release and due dates: The rough-cut planning stages specify an order release date and a due date for each job. The task of the scheduler is to ensure the job is complete within the designated window of time. Typically, penalties are specified for both missed deadlines and early completion (e.g., the holding cost for finished goods).
3. Priority: To facilitate the process of resolving contention for resources during scheduling, jobs are often assigned priorities. These priorities may be subjective (e.g., based on the strategic importance of the customers placing the orders) or objective (e.g., proportional to the value already added to the order).

[Footnote: In queueing theory, such resources are referred to as "servers". Although this term is in many ways preferable, "machine" is used throughout this thesis to remain consistent with the established scheduling nomenclature.]

Despite constraints imposed on the scheduling problem by the preceding sub-problems, the search for optimal solutions typically involves evaluation of a very large number of alternatives. Generally, the number of possible schedules increases exponentially with the number of factors considered, such as operations, machines, tools, personnel, and so on. For this reason, commercial scheduling systems typically restrict their focus to a single resource: machines [Fox, 1987]. However, the combinatorics remain challenging, even for the simplified problem. For example, the task of scheduling n jobs on m machines generates $(n!)^m$ unique schedules. Naturally, the goal in scheduling is to find a good schedule without having to search through a massive number of alternative schedules.
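To give a sense of how quickly the schedule count grows, the expression for n jobs on m machines (n factorial raised to the power m) can be evaluated directly. This is a minimal sketch; the function name `schedule_count` and the sample problem sizes are illustrative assumptions rather than anything defined in the thesis.

```python
import math


def schedule_count(n_jobs, n_machines):
    """Number of unique schedules, (n!)^m, when each of the m machines
    can process the n jobs in any of n! orders."""
    return math.factorial(n_jobs) ** n_machines


# Illustrative sizes only: three machines, increasing numbers of jobs.
for n in (3, 5, 10):
    print(f"{n} jobs on 3 machines: {schedule_count(n, 3):,} schedules")
```

Even the three-job, three-machine case yields 216 candidate schedules, and the count grows far faster than any exhaustive search can follow.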
2.2.3 ACHIEVING COMPUTATIONAL TRACTABILITY

The conventional decomposition achieves computational tractability in two ways. First, each sub-problem in the sequence is assumed to depend only on the output of the preceding sub-problem. For example, the master schedule generated during MPS is used to generate an order release schedule for individual parts by the MRP system. However, material planning is assumed to be independent of capacity planning, shop-floor scheduling or any other downstream activity. Clearly, the assumption of infinite capacity greatly reduces the computational complexity of the materials planning process.

The second way in which the conventional decomposition achieves computational tractability is by using simplified, single-attribute objective functions for each sub-problem in lieu of the more complex multi-attribute utility function faced by the firm. Subjective attributes such as product quality, responsiveness, and worker satisfaction are ignored, quantified to the extent possible, or replaced with proxies. For example, in MRP the implicit objective function is to have the lateness (completion date minus due date) of all jobs equal to zero. Since the due date is assumed fixed and binding in MRP logic, the profitability and customer satisfaction implications of relaxing the due dates are never considered.

In shop-floor scheduling, decision makers have a choice of alternative objective functions to minimize. Single-attribute measures that are commonly used include tardiness, lateness, makespan, and the sum of completion times. In some cases, the objective functions are weighted according to a predetermined system of priorities. Although these objective functions are related (they are all cost functions with respect to time), the choice of which objective function to minimize can greatly impact the computational complexity of the solution process and the solution itself.
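The single-attribute measures named above can be computed from the same schedule, which makes the differences between them concrete. The sketch below uses the common convention that lateness is completion time minus due date and that tardiness is lateness truncated at zero; the completion times, due dates, and the function name `schedule_measures` are hypothetical and not taken from the thesis.

```python
def schedule_measures(completion, due):
    """Compute common time-based scheduling measures for one schedule.

    completion and due map each job name to its completion time and due date.
    """
    lateness = {job: completion[job] - due[job] for job in completion}
    tardiness = {job: max(0, late) for job, late in lateness.items()}
    return {
        "max_lateness": max(lateness.values()),
        "total_tardiness": sum(tardiness.values()),
        "makespan": max(completion.values()),
        "sum_completion": sum(completion.values()),
    }


# Hypothetical schedule outcome for three jobs (illustration only).
completion_times = {"J1": 9, "J2": 14, "J3": 17}
due_dates = {"J1": 10, "J2": 12, "J3": 17}

measures = schedule_measures(completion_times, due_dates)
print(measures)
```

In this example J1 finishes one period early, so it has negative lateness but zero tardiness. A schedule that looks acceptable under one measure can look poor under another, which is why the choice of objective matters.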
In addition to the core objective functions, other measures are used at a global level to guide the choice of objective functions and solution methods at the sub-problem level. For example, it is often desirable to quantify the "robustness" of a production plan [Pinedo, 1995], [Nahmias, 1989, p. 86]. A robust plan is relatively insensitive to unexpected occurrences in the production environment and therefore limits the workload uncertainty faced by human workers. In contrast, a brittle plan may involve lower up-front costs but increase the possibility of having to cancel shifts or request overtime shifts in response to some disturbance. By maximizing the robustness measure of a schedule, the decision maker is attempting to minimize many hard-to-quantify and intangible costs. In a similar way, there have been numerous efforts in the last three decades to express the issues surrounding product quality in economic terms (e.g., [Crosby, 1979]).

2.2.4 LIMITATIONS OF CONVENTIONAL DECOMPOSITION

In this section, a number of limitations of the conventional decomposition are described. The objective is to demonstrate that the simplifications and assumptions used to support the sequential decision process shown in Figure 2.1 are seldom justified in practice.

DEPENDENCE BETWEEN STAGES

Perhaps the most obvious shortcoming of the conventional decomposition is that the stages shown in Figure 2.1 are not independent. In fact, the stages are highly dependent whenever constraints from downstream stages adversely affect the feasibility of the solutions generated at upstream stages. For example, a common criticism of MRP systems is that the order release schedules that they generate assume (among other things) infinite downstream capacity. If a particular order release schedule is found to violate capacity constraints in the capacity planning stage, the order release schedule must be fixed. The situation is similar for the scheduler: although an order release schedule may be feasible with respect to aggregate capacity, it may be impossible for the scheduler to satisfy the specific constraints it is given. If the MRP, CRP, and scheduling systems are not well integrated, the resulting
"backtracking" can be very expensive in terms of effort and disruption [Sillince and Sykes, 1993], [Turbide, 1993].

CONSTRAINTS VERSUS DECISION VARIABLES

Given the sequential dependencies between the sub-problems in Figure 2.1, it is clear that the quality of the final production plan is sensitive to the accuracy of the initial demand forecasts. However, one problem with forecasts in general is that they tend to ignore the impact that a firm can have on generating demand. In other words, demand is a decision variable in its own right; it can be controlled via changes in marketing and promotion [Raman and Singh, 1998], [Nahmias, 1989]. The situation is similar for other "constraints" in the problem formulation. For example, production capacity is taken to be a hard constraint; however, it may be possible to alter short-term capacity by adding shifts or subcontracting certain items. Similarly, a customer may not have a firm due date but instead have a utility function over a wide range of delivery dates. As such, it might be advantageous to allow customers to pay a higher price for early delivery or give them a discount when their orders can be moved to off-peak production times. By treating decision variables as fixed constraints, decision makers forego the opportunity to make trade-offs between various aspects of the problem.

IMPRECISION OF MRP

The logic that underlies standard MRP systems is based on a number of exceedingly simplistic assumptions concerning lead times, capacity, and the priority of different jobs. The basic shortcomings of MRP can be summarized as follows:

1. Lead times for parts making up a product are assumed to be constant and known with certainty. When actual lead times differ from the estimates used by the MRP system, the system's outputs may be misleading [Neely and Byrne, 1992].
2. Lead time is ignored as a variable in its own right. Treating lead time as an exogenously-determined factor ignores the fact that decision makers can often influence lead times, especially for products built in-house.
3. The decision horizon is discrete and there is limited look-ahead. When the order release schedule is generated, it is generated over a fixed horizon of n time periods. However, the subsequent scheduling of a very large, high-priority job at time n + 1 could render the existing order release schedule suboptimal.
4. MRP does not provide a framework for making trade-offs. Since all jobs in the master production schedule are treated equally, there is no systematic way of assessing the value of plan refinements such as expediting high-priority orders, changing due dates, adding overtime shifts, and so on.

2.3 HEURISTIC APPROACHES

In heuristic approaches, the manufacturing problem is treated as a semi-structured decision problem. The distinction between structured, semi-structured, and unstructured problems in the context of decision support systems (DSS) is due to [Keen and Scott Morton, 1978]. The classic example of an unstructured operational task is the selection of the monthly cover by a magazine editor. Selection of the cover requires human capabilities such as taste and judgement. Moreover, there is little structure on which to build an algorithmic approach. As a consequence, cover selection, like any unstructured decision problem, is left to human decision makers with little or no computer-based decision support. At the other extreme of the continuum are structured problems. The classic example of a structured problem is inventory control.
Before the widespread use of computers in business, the task of monitoring and reordering inventory required a well-paid middle manager with experience and specialized skills [Keen and Scott Morton, 1978, p. 89]. However, the economic trade-off between holding costs and the risk of stocking out is quantitative and well understood. Since structured operational tasks are relatively easy to solve algorithmically, fully automated inventory control systems are now common.

Decision support systems that rely on a mixture of algorithmic decision making, heuristic decision making, and human participation are typically targeted at the middle ground: semi-structured decision problems. The example of a semi-structured decision problem used by Keen and Scott Morton is bond trading. The objective of a bond trader is to maximize long-run profitability by buying and selling bonds with different coupon rates and maturities. Clearly, the task itself is highly structured: it is a stochastic optimization problem with a well-defined objective function. The difficulty occurs because there is an overwhelming number of sources of relevant information for bond traders and the relationship between the market and the information is so complex and uncertain that a fully algorithmic solution is generally impractical. As a consequence, bond trading normally involves a mixture of algorithmic analysis, rules of thumb, and human judgement.

The manufacturing problem has much in common with bond trading and is also used as an exemplar of the semi-structured class of problems (e.g., [Turban, 1995]). However, unlike bond trading, the manufacturing environment is highly structured and relatively predictable. The semi-structured designation arises more from the difficulty of applying conventional algorithmic techniques to the problem than from any inherent lack of structure in the problem itself.
Given the computational infeasibility of conventional structured techniques and the shortcomings of MRP-based approaches, there is a growing number of computerized systems for scheduling that adopt a semi-structured approach, that is, they combine human judgement, decision-making heuristics, and computational brute force. An example of this trend is systems for production planning based on constraint-directed search. Scheduling based on constraint satisfaction problems (CSP) can be viewed as a semi-structured approach for two reasons. First, search over the space of possible schedules is heuristic. Without exhaustive search, there is no guarantee that a schedule that satisfies the constraints is also an optimal schedule. Second, human experience and judgement are required to identify and formulate constraints. Although some constraints, such as physical precedence constraints, are binding (e.g., a hole must be bored before a bolt can be inserted), others, such as due dates and minimum levels of quality, are more subjective and could involve complex trade-offs. Selection and relaxation of softer constraints involves a very important element of human intervention in the decision-making process.

2.4 THE EMERGENCE OF ERP SYSTEMS

Vendors of Enterprise Resource Planning (ERP) systems have recognized the dependence between the sub-problems shown in Figure 2.1, but only from an information management point of view. ERP systems provide a centralized repository for all types of manufacturing information, such as demand forecasts, raw materials inventories, order status, due dates, and so on. Although the integration of information is certainly an improvement over stand-alone information systems, the fundamental problem of dependence from a planning and coordination point of view remains unsolved. In many cases, the production planning engines used by ERP systems are based on the same simplistic assumptions and approximations as the MRP systems they replace [AMR Research, 1997].
To rectify the problem, most major ERP vendors are in the process of integrating so-called advanced planning systems (APS) algorithms within their production planning modules. For example, RED PEPPER software, a vendor of constraint-based planning systems, was recently acquired by PEOPLESOFT; ORACLE, on the other hand, has decided to partner with i2 to sell its RHYTHM APS products. According to reports in the trade press (e.g., [Bartholomew, 1997], [Gould, 1997]), the transition from MRP-based systems to more sophisticated CSP-based algorithms has resulted in significant payoffs in a number of manufacturing organizations. In broad terms, APS products represent a shift away from the compartmentalized sub-problems in Figure 2.1 towards heuristic techniques that support more complex problem formulations.

2.5 A BENCHMARK PROBLEM

Benchmark problems are used in this thesis to facilitate concrete illustrations of techniques and to demonstrate the viability of the market-based approach on small but interesting problems. In the following sections, the primary benchmark problem is introduced and solved using a conventional operations research technique. In Chapter 5, the optimal solutions found here are compared to the solution generated by the market-based approach.

Three jobs need to be scheduled. Each job consists of three operations (see footnote 2) and each operation must be performed on a different machine. Thus the scheduling environment consists of three jobs and three machines. Each job is subject to the same precedence constraints. Specifically, the completion of Operation 2 (Op2) must precede the commencement of Op3. Similarly, the completion of Op1 must precede the commencement of Op2. The processing time of each operation on each machine is shown in Table 2.1.

Given the formulation above, the benchmark problem can be classified as a classic flow shop problem.
Flow shop problems have the following characteristics:

•  each job consists of multiple operations and each operation can only be processed on a particular machine;
•  there are precedence constraints of the form \(Op_i \rightarrow Op_{i+1}\).

Footnote 2: Whenever specific operations on the benchmark problem are described, a sans serif font is used.

TABLE 2.1: Processing times for the benchmark problem.

Job    Op1    Op2    Op3
J1     2      3      4
J2     4      3      5
J3     4      1      3

For reasons discussed in Section 2.5.3, algorithms for flow shop problems typically focus on minimizing makespan: the total amount of time required to complete all the jobs in the problem. Thus, in operations research, the benchmark problem would be classified as an instance of the \(F3 \mid\mid C_{\max}\) class of scheduling problems (footnote 3). The F3 in the notation indicates that the scheduling environment is a flow shop consisting of three machines, whereas \(C_{\max}\) indicates that the optimal solution minimizes the maximum completion time for all operations in the problem.

Although the benchmark problem might appear trivial, shop problems (such as open shop, flow shop, and job shop) are generally very difficult to solve using exact techniques. The algorithms that do exist are typically directed at special cases, such as problems with two machines, or problems for which each operation for each job is limited to a single unit of processing time (see [Brucker, 1998] for a summary of shop scheduling algorithms). An interesting feature of the three-job, three-machine flow shop problem is that it is classified as being strongly NP-hard [Pinedo, 1995, p. 101]. In other words, there is no algorithm that is known to provide the optimal solution to all instances of the problem in an amount of time that is a polynomial function of the size of the problem.
Since a problem with n jobs generates n! unique permutation schedules (footnote 4), solution techniques based on enumeration of all possible schedules become computationally impractical as the number of jobs increases. For example, if a computer can examine one billion schedules a second, solution to even a relatively modest 20-job problem requires just over 77 years of computation.

Footnote 3: See Chapter 3 for more information on scheduling notation and the classification of scheduling problems.

Footnote 4: In a permutation schedule, the sequence of jobs on all machines is the same. Thus, if the sequence of jobs on M1 is J1 → J2 → J3, then jobs follow the same sequence on all other machines. Permutation schedules are known to be optimal with respect to the \(C_{\max}\) criterion for two and three-machine flow shop problems. However, this is not the case when more than three machines are involved [Brucker, 1998, p. 165].

2.5.1  A SPECIAL CASE OF THE THREE-MACHINE JOB

One of the earliest results in the field of operations research was an algorithm for solving the \(F2 \mid\mid C_{\max}\) class of problems [Johnson, 1954]. Known as the SPT-LPT rule (or Johnson's rule), the algorithm can generate a schedule that minimizes makespan and do so in an amount of time that is a polynomial function of the number of jobs, n. Although Johnson's rule does not scale to the three-machine problem in the general case, there are certain three-machine problems that can be transformed into two-machine problems. If the processing time of job j on machine i is denoted \(C_{i,j}\), then Johnson's rule can be used to schedule three machines whenever at least one of the conditions \(\min_j C_{1,j} \ge \max_j C_{2,j}\) or \(\min_j C_{3,j} \ge \max_j C_{2,j}\) is true [Nahmias, 1989].

To schedule the three jobs with three operations using Johnson's rule, composite operations (e.g., Op1' and Op2') must be defined for each job.
The duration of each composite operation, \(C'_{i,j}\), is determined as follows:

\(C'_{1,j} = C_{1,j} + C_{2,j}\)
\(C'_{2,j} = C_{2,j} + C_{3,j}\)

Johnson's rule is then applied to the two composite operations in the normal manner (see Algorithm 2.1) to generate an optimal permutation schedule.

Since \((\min_j C_{3,j} = 3) \ge (\max_j C_{2,j} = 3)\) in the problem shown in Table 2.1, it is possible to use the three-machine version of Johnson's rule to generate a schedule. The composite (transformed) operation times are shown in Table 2.2. Since J3's Op2' entry is the smallest, J3 is added to \(J_T\) and removed from the list. The next smallest remaining entry is J1's Op1', which is added to \(J_H\). Finally, J2 is appended to the tail of \(J_H\). The final schedule is the concatenation of \(J_H\) and \(J_T\): J1, J2, J3. The Gantt chart for the final schedule is shown in Figure 2.2.

ALGORITHM 2.1: JOHNSON'S RULE

1. Define "head" and "tail" schedules \(J_H \leftarrow \emptyset\) and \(J_T \leftarrow \emptyset\).
2. List the processing times of the operations in two columns: Op1 and Op2 (or, in the transformed case, Op1' and Op2').
3. Find the remaining operation in the two columns with the smallest processing time.
   a. If the operation is from Op1 (or Op1'), append its job to the tail of \(J_H\).
   b. Otherwise, add the job to the head of \(J_T\).
4. Remove the job from the table and continue until no further jobs remain.
5. Return \(J \leftarrow J_H + J_T\).

TABLE 2.2: Processing times for the transformed benchmark problem.

Job    Op1'   Op2'
J1     5      7
J2     7      8
J3     5      4

FIGURE 2.2: An optimal schedule for the benchmark problem: Johnson's rule has been used to minimise the total makespan of the schedule. [Gantt chart; time axis from 1 to 17]

2.5.2  LIMITATIONS OF THE EXACT APPROACH

Given that Johnson's rule reduces the search through n! permutation schedules to a relatively small number of sorting and search operations, it should be possible to schedule any number of jobs that one might encounter in practice.
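As a rough illustration of how little computation Algorithm 2.1 involves, the following Python sketch applies the three-machine variant of Johnson's rule to the processing times in Table 2.1. It is a minimal sketch rather than the implementation used in this thesis: the function names, the dictionary layout, and the tie-breaking order (by column, then by job name) are assumptions made for the example.

```python
# Processing times from Table 2.1: job -> (Op1 on M1, Op2 on M2, Op3 on M3).
TIMES = {"J1": (2, 3, 4), "J2": (4, 3, 5), "J3": (4, 1, 3)}


def johnson_three_machine(times):
    """Sequence jobs with the three-machine variant of Johnson's rule."""
    m1 = [t[0] for t in times.values()]
    m2 = [t[1] for t in times.values()]
    m3 = [t[2] for t in times.values()]
    # The transformation applies only if M2 is dominated by M1 or by M3.
    if not (min(m1) >= max(m2) or min(m3) >= max(m2)):
        raise ValueError("three-machine condition not satisfied")

    # Composite operations: Op1' = Op1 + Op2 and Op2' = Op2 + Op3.
    remaining = {job: (t[0] + t[1], t[1] + t[2]) for job, t in times.items()}
    head, tail = [], []
    while remaining:
        # Smallest remaining entry in either column. Ties are broken by
        # column and then job name, which is an arbitrary choice.
        _, column, job = min(
            (value, column, job)
            for job, pair in remaining.items()
            for column, value in enumerate(pair)
        )
        if column == 0:
            head.append(job)      # from Op1': schedule as early as possible
        else:
            tail.insert(0, job)   # from Op2': schedule as late as possible
        del remaining[job]
    return head + tail


def makespan(sequence, times):
    """Completion time of the last job in a permutation schedule."""
    machine_free = [0] * 3
    for job in sequence:
        ready = 0  # time at which the job's previous operation finished
        for machine, duration in enumerate(times[job]):
            ready = max(ready, machine_free[machine]) + duration
            machine_free[machine] = ready
    return machine_free[-1]


sequence = johnson_three_machine(TIMES)
print(sequence, makespan(sequence, TIMES))  # ['J1', 'J2', 'J3'] 17
```

The sketch returns the sequence J1, J2, J3 with a makespan of 17 time units, in agreement with the schedule described for Figure 2.2.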
However, there are at least three obvious limitations of the approach:

1. Number of machines: The algorithm can only be applied to two-machine and a restricted class of three-machine problems. Since manufacturing environments typically contain a much larger number of machines, the practical value of the approach is limited.
2. Deterministic processing times: Although a slightly modified version of Johnson's rule can be applied to stochastic flow shop problems, optimality is only assured if processing times are exponentially distributed [Pinedo, 1995]. However, it is difficult to imagine a case in manufacturing in which the memoryless property of the exponential distribution is satisfied (footnote 5).
3. Inflexibility: The formulation of the problem assumes that all jobs have the same priority, holding costs, and release dates. Due dates are not even considered. In the more realistic case in which high-priority jobs with tight deadlines arrive while other jobs are in process, the \(C_{\max}\) objective function and the assumption of permutation schedules become overly restrictive.

Although \(F3 \mid\mid C_{\max}\) is only one type of scheduling problem, and although Johnson's rule is only one of many sequencing rules and techniques in the scheduling literature, the limitations noted above apply to virtually all conventional scheduling techniques. This problem is discussed in greater detail in Chapter 3.

Footnote 5: The memoryless property states that the probability of an event occurring in the next interval of time is independent of the amount of time that has passed without it occurring. In the context of manufacturing, this implies that the probability of an operation being completed in the next interval of time is independent of the amount of processing it has already undergone. While this may be a legitimate assumption for an activity such as troubleshooting, it is less likely for conventional machining tasks [Pinedo, 1995, p. 256].

2.5.3  THE MEANING OF OPTIMALITY

It is important to keep in mind that the schedule in Figure 2.2 is "optimal" with respect to a particular objective function, in this case \(C_{\max}\). However, when considering the realities of modern manufacturing, \(C_{\max}\) might seem to be an odd measure to minimize. Most manufacturing organizations are going concerns that run their expensive production machinery at maximum capacity at all times. As such, it is rare that the true objective function of the organization is to schedule a finite number of jobs so that makespan is minimized. A more realistic objective function is to minimize total costs on an on-going basis. To illustrate the difference between minimizing makespan and minimizing total cost, an alternative schedule in which J2 and J3 have been switched is shown in Figure 2.3. Although the makespan of the total schedule has increased from 17 time units to 18 time units, the sum of the holding costs incurred by each job (assuming a holding cost per unit time in the system of $1) is actually less in the alternative schedule, as shown in Table 2.3.

FIGURE 2.3: An alternative solution to the benchmark problem: In this schedule, the sum of holding costs for all jobs is minimised. [Gantt chart for machines M1, M2, and M3; time axis from 1 to 18]

TABLE 2.3: Holding costs for the optimal benchmark and alternative schedule.

Job                   Optimal schedule (Johnson's rule)   Alternative schedule
J1                    $9                                  $9
J2                    $14                                 $18
J3                    $17                                 $12
Total holding cost    $40                                 $39

The reason for the difference is straightforward: the objective of Johnson's rule is to minimize the total makespan of the schedule by minimizing the amount of time that machines are sitting idle at the beginning and end of the schedule.
For example, precedence constraints ensure that no job can start on M2 until the job is first processed on M1. By selecting the job with the shortest processing time on M1 first, Johnson's rule minimizes the amount of time that M2 must wait before its first job arrives. Similarly, the "other half" of Johnson's rule minimizes the amount of time that the last machine (in this case M3) is running by itself by sequencing jobs with short processing times on M3 at the end of the schedule. In this way, concurrent processing on more than one machine is maximized.

The alternate schedule shown in Figure 2.3 is the solution to the \(F3 \mid\mid \sum w_j C_j\) problem (where \(w_j = 1\) dollar for all jobs \(j\)). Because the benchmark problem is so small, this particular minimum cost solution can be found by inspection, or by explicit enumeration of all solutions. However, in the general case, even the two-machine version of the minimum cost problem is NP-hard [Garey and Johnson, 1979]. As such, it appears that the popularity of the \(C_{\max}\) objective function is due more to its feasibility than its applicability to real-world problems. Minimizing makespan creates an interesting mathematical structure and permits certain problems to be solved exactly using polynomial time algorithms, such as Johnson's rule.

Of course, there are certain situations in which minimizing makespan does make sense. For example, when erecting a building, the "holding cost" incurred by the developer is a function of the building's occupancy date. As such, it is not rational to expedite certain operations if the overall effect is to delay the occupancy date of the building as a whole. More generally, makespan minimization is suitable for many types of project scheduling in which the final completion time of the project determines its overall costs. However, in the type of manufacturing problems considered in this thesis, the payoffs that jobs receive for being processed are assumed to be independent of one another.
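The explicit enumeration mentioned above is easy to carry out for a problem of this size. The following Python sketch evaluates the six permutation schedules of the benchmark problem for both makespan and total holding cost. It is an illustration only and rests on assumptions made for the example: only permutation schedules are considered, all jobs are taken to be available at time zero, the holding cost is $1 per job per unit time, and the function and variable names are hypothetical.

```python
from itertools import permutations

# Processing times from Table 2.1: job -> (Op1 on M1, Op2 on M2, Op3 on M3).
TIMES = {"J1": (2, 3, 4), "J2": (4, 3, 5), "J3": (4, 1, 3)}
HOLDING_COST = 1  # dollars per job per unit time in the system (assumed)


def completion_times(sequence, times):
    """Completion time of each job when all machines use the same sequence."""
    machine_free = [0] * 3
    done = {}
    for job in sequence:
        ready = 0  # time at which the job's previous operation finished
        for machine, duration in enumerate(times[job]):
            ready = max(ready, machine_free[machine]) + duration
            machine_free[machine] = ready
        done[job] = ready
    return done


def evaluate(sequence, times=TIMES):
    """Return (makespan, total holding cost) for one permutation schedule."""
    done = completion_times(sequence, times)
    return max(done.values()), HOLDING_COST * sum(done.values())


# Enumerate all 3! = 6 permutation schedules.
results = {seq: evaluate(seq) for seq in permutations(TIMES)}
min_makespan = min(m for m, _ in results.values())
min_cost = min(c for _, c in results.values())
fastest = [seq for seq, (m, _) in results.items() if m == min_makespan]
cheapest = [seq for seq, (_, c) in results.items() if c == min_cost]

for seq, (span, cost) in results.items():
    print(seq, "makespan:", span, "holding cost:", cost)
```

Under these assumptions, the sequence J1, J2, J3 has a makespan of 17 and a holding cost of $40, while J1, J3, J2 has a makespan of 18 and a holding cost of $39, matching Table 2.3. Collecting every minimizer in `fastest` and `cheapest`, rather than a single one, avoids assuming that either optimum is unique.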
In this context, the "optimal" schedule shown in Figure 2.3 (minimum cost) is preferred to the "optimal" schedule shown in Figure 2.2 (minimum makespan).

2.6  CRITERIA FOR COMPARISON

In order to be considered "better" than Johnson's rule, the proposed market-based solution should be able to find the minimum cost solution to the benchmark problem shown in Figure 2.3 without resorting to exhaustive enumeration of all possible schedules. To be considered "much better" than Johnson's rule, the market-based solution should address the issues of scalability, non-determinism, and flexibility identified in Section 2.5.2.

CHAPTER 3: THEORETICAL FOUNDATIONS AND PRIOR RESEARCH

The planning system described in this thesis is built on theory and practice from a number of different disciplines such as operations research, artificial intelligence, economics, and distributed artificial intelligence. In this chapter, relevant aspects of each of these disciplines are reviewed in an attempt to make the theoretical foundations of the system explicit and to situate the proposed system with respect to alternative approaches. In addition, the discussions of planning languages and economic theory set the stage for the more detailed descriptions in Chapter 4 of the market-based architecture developed in this thesis.

3.1  SCHEDULING

In this section, existing theory and practice in scheduling is examined from the perspective of two disciplines: operations research (OR) and artificial intelligence (AI). In both fields, a distinction is made between scheduling techniques that are guaranteed to provide optimal solutions (exact approaches) and techniques that do not provide guarantees of optimality but are computationally practical when applied to real-world problems (heuristic approaches). The distinction between exact and heuristic approaches is discussed in greater detail in Section 3.1.1.
The OR perspective on conventional and emerging scheduling techniques is reviewed in Section 3.1.2. Since much of the research in OR involves the specification of algorithms for standardized classes of scheduling problems, and since thousands of distinct problem classes have been identified and analyzed, the summary of the literature is necessarily superficial. The objective of that section is merely to illustrate the basic hardness of scheduling problems and highlight some of the difficulties involved in creating algorithms that are both exact and computationally tractable. In the same section, heuristic approaches based on dispatch rules and local search are also reviewed.

In Section 3.1.3, the scheduling problem is revisited from the perspective of AI research on planning systems. Planning and scheduling are normally taken to be distinct activities in manufacturing (recall Figure 2.1 on page 14). However, in the more general case, scheduling can be seen as a special subset of planning in which the focus shifts from sequencing actions to sequencing actions in such a way as to optimize the use of time and resources [Georgeff, 1987], [Fox, 1987]. Since many of the general-purpose planning techniques developed in AI can be applied to scheduling problems, the broader AI planning literature is relevant.

The review of AI planning is largely historical: it begins with the earliest formalisms and algorithms and proceeds to describe a series of increasingly complex planning systems. In all cases, the focus is on classical planning systems, that is, systems based on deductive logic and theorem proving techniques to reason about means and ends. In Section 3.1.4, a non-classical form of planning, decision-theoretic planning (DTP), is examined. In contrast to the logic foundations of classical planning, decision-theoretic planning is based on the fundamental constructs of decision theory: utility and probability.
The overview in this chapter lays the groundwork for the more detailed evaluation of the relative merits of classical and decision-theoretic planning in Chapter 4.

The section on AI planning methods concludes with a brief review of the constraint satisfaction problem (CSP) approach to scheduling. Constraint satisfaction is important in the context of scheduling for two reasons. First, as discussed in Section 2.3, commercial constraint-based planning systems are becoming increasingly common in manufacturing environments. Second, there has been dramatic success recently in combining theorem-proving concepts from classical planning with constraint propagation techniques from the CSP literature [Weld, 1998]. Related heuristic search techniques such as simulated annealing, tabu search, and genetic algorithms are also discussed.

3.1.1  EXACT VERSUS HEURISTIC APPROACHES

Regardless of whether a particular approach to scheduling evolved from research in OR, AI, or any other discipline, the fundamental issue is the conflict between the exactness of the solution and the computational feasibility of generating the solution in the first place. A useful distinction is often made in AI between the time required to generate a solution (the search cost) and the time required to execute the solution (the path cost) [Russell and Norvig, 1995]. The objective of a scheduler is to minimize the path cost, that is, the costs associated with execution of the schedule. Performance goals (such as minimizing makespan or maximum lateness) are functions of time. Consequently, the specific times at which actions start and end are critically important. However, when minimization of path cost is the stated goal, there is a tendency to de-emphasize search costs. This leads to what has been called the assumption (or perhaps fallacy) of calculative rationality. Algorithms based on the assumption of calculative rationality perform an exhaustive search for a schedule that minimizes path cost.
When the search space is large, however, the solution will come too late to be of any practical use [Jennings, Sycara and Woolridge, 1998]. Because of the conflict between path cost and search cost, scheduling research in both OR and AI has experienced a bifurcation between exact approaches (approaches that emphasize path cost and optimality) and heuristic approaches (approaches that seek a balance between path cost and search cost) [Fox, 1987]. As shown in Table 3.1, the scheduling research reviewed in this chapter can be situated into quadrants according to discipline and stance with respect to calculative rationality.

TABLE 3.1: A 2x2 matrix for classifying scheduling research based on discipline and degree of exactness.

       Exact                                   Heuristic
OR     Optimization techniques                 Heuristic approaches
       •  deterministic rules (e.g.,           •  bottleneck scheduling
          WSPT, Johnson's rule)                •  search (local, branch and bound)
       •  integer programming                  •  dispatch rules
AI     Classical planning                      Non-classical planning
       •  theorem-proving systems              •  reactive planning
       •  partial-order                        •  constraint-directed search
          least-commitment planners            •  expert systems

3.1.2  SCHEDULING IN OR

OR as we know it today emerged from the Second World War as a distinct set of mathematical modeling and optimization techniques. Shortly following the war, these techniques were applied to the difficult problem of determining the optimal sequence of activities in manufacturing environments [Hillier and Lieberman, 1986], [Pinedo, 1995].

EXACT APPROACHES

The exact approach to scheduling in OR focuses on identifying generic classes of scheduling problems and developing tractable algorithms that are guaranteed to find optimal solutions for all instances of the class.
Informally, a tractable algorithm is one for which the amount of time required to attain a solution is a polynomial function of the size of the problem. An important contribution of OR scheduling research has been to partition the space of scheduling problems into those that are known to be easy (polynomial-time solution techniques exist) and those which are suspected of being hard (no polynomial-time solution techniques are known to exist). The latter set can be further subdivided into problems that are NP-hard (nondeterministic polynomial) in the ordinary sense and those which are strongly NP-hard [Pinedo, 1995], [Garey and Johnson, 1979].

A standardized notation of the form \(\alpha \mid \beta \mid \gamma\) has been developed to describe generic classes of scheduling problems. The first term, \(\alpha\), describes the number and configuration of machines in the scheduling environment. The most common machine configurations and their notation forms are listed in Table 3.2. The \(\beta\) term is used to specify zero or more processing characteristics of the scheduling environment. For example, the \(J3 \mid recrc\) notation describes a three-machine job shop in which a job may visit a machine more than once (recrc denotes "recirculation"). Finally, the \(\gamma\) term describes the objective function to be minimized. Different objective functions are used in an attempt to capture the different underlying cost structures of scheduling environments. For example, a common measure to be minimized is the sum of weighted completion times over all jobs \(j\): \(\sum w_j C_j\). An alternative measure to minimize is the lateness of each job, \(L_j\), where lateness is defined as the difference between the job's completion time, \(C_j\), and its due date, \(d_j\). In the first objective function, the primary source of cost is assumed to be associated with holding during production; in the latter objective function, the primary source of cost is assumed to be endogenously-determined penalties and rewards based on pre-specified due dates.
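The two objective functions just described can be made concrete with a small Python sketch. The completion times below are those of the Johnson's rule schedule for the Chapter 2 benchmark (Table 2.3, at a holding cost of $1 per unit time); the due dates are hypothetical values invented for this illustration, as are the function names.

```python
def weighted_completion(completion, weights):
    """Sum of weighted completion times: sum over jobs j of w_j * C_j."""
    return sum(weights[job] * completion[job] for job in completion)


def max_lateness(completion, due):
    """Maximum lateness L_max, where L_j = C_j - d_j for each job j."""
    return max(completion[job] - due[job] for job in completion)


# Completion times C_j of the Johnson's rule schedule (see Table 2.3).
completion = {"J1": 9, "J2": 14, "J3": 17}
# Unit weights, so the weighted sum equals the total holding cost.
weights = {job: 1 for job in completion}
# Hypothetical due dates d_j, assumed only for this example.
due = {"J1": 10, "J2": 12, "J3": 20}

print(weighted_completion(completion, weights))  # 40
print(max_lateness(completion, due))             # 2
```

With these assumed due dates, J2 is the only late job (by two time units), so a schedule that scores well on weighted completion time need not score well on lateness, and vice versa.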
The important thing to keep in mind when considering scheduling notation is that the goal is not to describe real-world problems; rather, the goal is to provide a compact and uniform way to describe the tractability of certain classes of problems. For example, Table 3.3 shows the computational complexity of three different single-machine scheduling problems.

TABLE 3.2: Common machine configurations and their standard notations (from [Pinedo, 1995]).

Configuration: Single machine (1)
Description: the simplest case: all jobs must visit the single machine for a single operation
Example: the queue for a family physician

Configuration: Identical machines in parallel (Pm)
Description: m functionally-identical machines are placed in parallel; a job j requires processing on one of the machines
Example: a bank with m tellers

Configuration: Flow shop (Fm)
Description: m different machines are placed in series; each job must be processed on each machine and the sequence of machines is the same for all jobs
Example: an automobile assembly line with m stations

Configuration: Job shop (Jm)
Description: the shop consists of m machines; the route taken by each job is fixed, however different jobs may have different routes
Example: a classic manufacturing job shop with m machines

Configuration: Open shop (Om)
Description: the shop consists of m different machines; the route taken by each job is open (up to the scheduler)
Example: visiting m booths at a trade show

The most obvious conclusion that one draws from examining Table 3.3 is that even seemingly minor changes to the processing characteristics or objective function of a problem can result in intractability, even in the single-machine case. Indeed, the class of scheduling problems that is considered tractable is surprisingly small.
In the case of "open shop" and "job shop" configurations, which are common in practice, only the \(O2 \mid\mid C_{\max}\) and \(J2 \mid\mid C_{\max}\) problems are known to be solvable in polynomial time [Pinedo, 1995]. Even the benchmark problem introduced in Section 2.5 belongs to a class of problems (\(F3 \mid\mid C_{\max}\)) that is known to be strongly NP-hard. In the case of the benchmark, it is the special structure of the problem instance that permits it to be solved in polynomial time using Johnson's rule.

From the point of view of industrial-scale systems, exact approaches are of virtually no practical use due to the computational effort required to solve problems with more than a handful of jobs and machines. Although general optimization techniques such as integer programming (IP) can be used in practice to solve much larger problems, there is no guarantee that polynomial time solution methods exist for all IP problems [Hillier and Lieberman, 1986].

TABLE 3.3: Classification of three classes of single-machine scheduling problems (from [Pinedo, 1995]).

Problem class: \(1 \mid prec \mid L_{\max}\)
Description: single machine shop with precedence constraints between jobs (prec); the objective is to minimize the lateness of the latest job
Computational complexity: polynomial time solvable

Problem class: \(1 \mid r_j, prmp \mid \sum w_j U_j\)
Description: single machine shop in which each job is released to the system at a different time (\(r_j\)) and jobs can preempt jobs already in progress (prmp); the objective is to minimize the sum of unit (i.e., fixed) penalties for late jobs
Computational complexity: NP-hard in the ordinary sense

Problem class: \(1 \mid r_j \mid L_{\max}\)
Description: single machine shop in which each job is released to the system at a different time (\(r_j\)); the objective is to minimize the lateness of the latest job
Computational complexity: strongly NP-hard

HEURISTIC APPROACHES

As indicated in the preceding section, the range of scheduling problems for which provably-optimal polynomial time algorithms exist is quite narrow. As a consequence, much of the applied work in OR has been directed at devising techniques that trade off guarantees of optimality for tractability. In this section, a number of heuristic approaches from OR are briefly reviewed.

1. Dispatch rules: One heuristic approach that is commonly used in scheduling practice is dispatch rules (or priority rules). A dispatch rule is simply a means of determining which job to schedule on a machine when the machine becomes free. Many dispatch rules were originally developed as exact techniques for small problems. However, experience has shown that certain exact algorithms provide reasonably good results when used as approximations in larger, more complex problems. An example of a heuristic dispatch rule that is commonly used in practice is shortest queue at next operation (SQNO) (see [Pinedo, 1995, p. 143]). Under this rule, the jobs waiting for the free machine are examined and the one with the shortest queue for its next operation is selected. The rationale for the SQNO heuristic is that it helps to balance the queues at downstream machines and minimize starvation (footnote 1). Note that since the rule does not take into account other issues such as job priority or due dates, it is clearly incapable of scheduling jobs such that overall costs are minimized.

2. Composite dispatch rules: One problem raised in Chapter 2 is that single-attribute objective functions may not accurately represent the complex multi-attribute objective function faced by the firm. To address the problem of dispatch rules that are too narrowly focused on one measure of quality, composite dispatch rules can be used. In a composite dispatch rule, several basic dispatch rules are combined into a single index number.
Weights are applied to each basic rule to determine its contribution to the overall index, and simulation or experience can be used to fine-tune the weights for a particular scheduling environment (see [Pinedo, 1995, p. 145]).

3. Bottleneck scheduling: Another class of heuristics, which has been used to minimize the makespan of job shops (i.e., problems of the form \(Jm \mid\mid C_{\max}\)), is the shifting bottleneck heuristic [Pinedo, 1995]. The premise underlying the bottleneck heuristic and other bottleneck-based approaches such as Optimized Production Technology (OPT) is that the overall rate of flow through the production system is governed by a small number of critical resources (e.g., bottleneck machines) that determine overall system throughput. Once the bottlenecks are identified, all scheduling effort is directed at ensuring efficient use of the bottleneck resources only (e.g., [Goldratt and Cox, 1986]).

4. Local search: In addition to the domain-specific heuristics described above, a number of generic local search techniques have been applied to scheduling problems. In local search, a neighborhood of alternate solutions is associated with each candidate solution. For example, for a particular schedule, a neighborhood of alternatives could be defined as all the schedules that result from a single pairwise switch in the ordering of jobs at a machine. Local search proceeds by searching the neighborhood for an alternative that is better than the current solution. If one is found, it replaces the current solution, a new neighborhood is defined, and the search continues. If a search technique is guaranteed to find an optimal solution, then it is characterized as complete; in the more common case, incomplete search is used to find a solution that satisfies some minimum desirability criteria.

5. Meta-heuristics: An important issue in local search is the existence of local minima.
A number of different approaches (i.e., meta-heuristics [Shaw, Brind and Prosser, 1996]) have been used to avoid premature commitment to a local minimum solution. For example, in simulated annealing, there is a non-zero probability of replacing the current solution with one that is worse. In tabu search, the current solution can be replaced by any neighbor that is not on an ever-changing list of forbidden transitions. In genetic algorithms, random mutations are inserted in the solution strings. The non-determinism injected into the search by these techniques has been shown to reduce the problem of premature commitment to local minima and minimize the risk of cycling over the same set of solutions (see [Russell and Norvig, 1995], [Pinedo, 1995]).

Footnote 1: A machine is said to "starve" when there are no jobs available for it to process.

3.1.3  SCHEDULING AND CLASSICAL PLANNING IN AI

The AI community's interest in scheduling per se is relatively recent (e.g., [Fox, 1987]). However, scheduling can be seen as a special case of planning, and planning research dates back to the first "problem solving" systems in the mid 1950s (e.g., [Newell and Simon, 1956]) and to early robot planning systems (e.g., [Fikes and Nilsson, 1971]). The primary difference between a typical planning problem (such as stacking blocks so that they are in a desired order) and scheduling is the relative importance of search and path costs (recall Section 3.1.1 on page 30). In planning, the goal is to determine an ordered (or partially ordered) sequence of actions that achieves a goal. As in any problem-solving domain, the emphasis is placed on either minimizing search cost or finding a good solution within an upper bound on search cost. The exact timing of the executed actions is generally unimportant to the planner as long as precedence constraints embodied in the plan are satisfied.
In scheduling, many of the critical decisions (such as which actions to take, which resources to use, and so on) are predetermined. In this sense, scheduling is much easier than generalized planning. However, the importance of path cost in scheduling means that any solution is not good enough: the best solution relative to a well-defined cost function is the desired outcome. The requirement (or at least desire) for optimality and the requirement to explicitly include path costs in the problem formulation make scheduling much harder than generalized planning. Despite these differences, the techniques used for planning can normally be adapted for use in scheduling.

Within planning systems, a distinction can be made between on-line planners and off-line planners. In an on-line planning system, the agent reasons (at some level) about its decision problem before each action. Thus, each decision "epoch" (or period) includes both search costs and path costs. For an on-line planner to operate effectively in its environment, the time required for search must be small in relation to the time required to execute actions and perceive changes in the environment. For example, an on-line planner to get groceries from a grocery store would not be particularly effective if it required several hours of deliberation after selecting an item in order to decide which item to select next on the grocery list. In off-line planning, reasoning about actions is done beforehand and the agent has a complete plan before executing its first action. Although the amount of computation is ultimately the same regardless of whether a particular search technique is implemented on-line or off-line, the off-line approach permits a clear separation between search and path costs. For this reason, off-line planning has been the more widely used approach in AI scheduling research.
The question of whether a planning system's output is guaranteed to achieve the agent's goal is a function of the problem formulation. If the planning environment is accessible (the agent knows all it needs to know to make decisions) and deterministic (the agent's actions and other events in the environment have known outcomes), then theorem proving techniques can be used to generate a sequence of actions that is guaranteed to move the agent from its initial state to its goal state. Such systems are known as "classical" planning systems and are reviewed in the following sections.⁴

2  In AI, systems that satisfy this assumption are described as being momentary because execution is assumed to occur within a single moment.

3  This assumes the deterministic case in which there is no advantage to be gained from observing the actual outcomes of actions during planning.

4  The classical approach is also described as "planning from first principles" because very general inference rules from deductive logic are used to select and sequence actions [Jennings, Sycara and Woolridge, 1998].

KNOWLEDGE REPRESENTATION

Classical planning is logic-based and is therefore inextricably linked to formalisms for representing and reasoning about knowledge. The two formalisms that are most relevant for a discussion of planning systems are propositional logic and first-order logic (alternatively, first-order predicate calculus). This section contains a brief review of both these logics as they apply to planning. For a more detailed introduction to the use of logic in AI, see [Russell and Norvig, 1995].

Propositional logic consists of constants and symbols that are taken to be either "true" or "false" in the world. For example, if the symbol P represents the proposition "it is raining", then P is true whenever it is raining and false otherwise.
The symbols of propositional logic can be used to form atomic sentences (e.g., P) or combined using standard connectives to make complex sentences. For example, the sentence ¬P ∨ (Q ∧ R) ⇒ S can be interpreted as follows: if P is not true, or Q and R are both true, then S is true. Propositional logic also includes a number of inference rules for deductive reasoning about the truth value of sentences. For example, given that the two sentences P ⇒ Q and P are true, the modus ponens rule can be used to conclude that Q is also true.

First-order logic (FOL) is more expressive than propositional logic in that FOL provides a compact means of describing relations between objects. For example, the predicate On(A, B) is true whenever the object represented by the symbol A is physically on top of the object represented by the symbol B. In addition to relations, FOL permits the use of universal (∀) and existential (∃) quantifiers over variables in sentences. For example, the universal quantifier can be used to make general statements such as: "If the mother of x is Anne and the mother of y is also Anne but x and y are not the same person then x and y are siblings." In FOL, this can be written:

    ∀x, y  Mother(x, Anne) ∧ Mother(y, Anne) ∧ ¬(x = y) ⇒ Siblings(x, y)

In this way, a small number of facts and predicates can be combined via the inference rules of FOL to deduce a much larger set of facts and predicates.

THE SITUATION CALCULUS

First-order logic is sufficient for describing the state of the world. However, in planning systems, the objective is to describe changes in the world that result from events or the agent's own actions. The situation calculus [McCarthy and Hayes, 1969] provides a means of describing states, events, and outcomes using first-order logic. The primary advantage of the situation calculus is that the deductive inference rules of FOL remain valid and may therefore be used to reason about transitions from situation to situation.
To represent changes in the planning environment, each relational predicate is nested within a special predicate Holds(f, s), where f is a propositional fluent (that is, a property that is either true or false in the world) and s is a particular situation or state. For example, in a "blocks world" environment the fluent On(A, B) may be true in one situation, but false in another situation in which the stacking order of the blocks has been reversed. To represent the different situations in the situation calculus, two sentences can be written: Holds(On(A, B), S1) and Holds(On(B, A), S2). Since the situation in which each fluent is true is made explicit, the inconsistency between the sentences is eliminated (i.e., both sentences can be true at the same time as long as S1 ≠ S2).

To represent actions, the special function Result(a, s) is used to describe the state that results when action a is performed in state s. Since Result returns a state, it is possible to represent any state in terms of a predecessor state and the actions applied to it. For example, the following sentence describes the result of executing the ReverseBlocks action on any two blocks x and y in any state s:

    ∀s, x, y • Holds(On(x, y), s) ⇒ Holds(On(y, x), Result(ReverseBlocks(x, y), s))

In words: the property On(B, A) "holds" whenever the ReverseBlocks action is executed in any state in which On(A, B) holds. Thus, the new implied state (s') is written in terms of the initial state and the results of an action (Result(ReverseBlocks(x, y), s)). Very complicated action descriptions (including preconditions) can be expressed using the Holds and Result constructs in combination with FOL.

Given the formal representations of the situation calculus, it is possible to implement a planner using standard theorem proving techniques. To illustrate, assume that the desired goal state for an agent is Holds(On(B, A), s) and that On(B, A) is not true in the initial state.
The planner can search through the list of possible actions, find one that has On(x, y) as an effect, and make the appropriate substitution of B for x and A for y. For example, the action StackBlocks may be within the set of capabilities for the planner and defined using the situation calculus as:

    ∀(s, x, y) • Holds(Clear(x), s) ∧ Holds(Clear(y), s) ∧ ¬(x = y) ∧ Holds(InGrip(x), s)
                 ⇒ Holds(On(x, y), Result(StackBlocks(x, y), s))

The planner knows that if it can satisfy the preconditions of the action (i.e., that the two blocks are clear and that the block to be placed on top is in its grip) then it will achieve its goal. If the action's preconditions are not true in the current state, then the planner can push the preconditions on to its goal stack and set about finding an action that satisfies the preconditions. The planner repeats the process until the first precondition in the chain of action preconditions is satisfied by the initial state. This recursive process is called regression planning since the planner starts with the goal and works backwards towards the initial state. It is also possible to search forward from the initial state (progression planning). The relative merits of either search strategy depend on the structure of the planning problem being addressed [Russell and Norvig, 1995], [Georgeff, 1987].

Adoption of the situation calculus for planning systems has been hampered somewhat by the existence of the so-called frame problem. The frame problem occurs because action descriptions describe what changes in the world as a result of an action. However, the action descriptions do not describe what has not changed as a result of the action. In order to be able to reason effectively, frame axioms need to be appended to the action descriptions to make them complete [Reiter, 1991].
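The Holds and Result constructs, and the role played by a frame axiom, can be sketched in a few lines of Python. This is only an illustrative sketch and not a theorem prover: the tuple encoding of terms, the Color fluent, the initial facts, and the particular frame axiom used (every fluent other than On(x, y) persists through ReverseBlocks(x, y)) are all assumptions introduced for the example.

```python
# Situations are terms: the initial situation S0, or ("Result", action, situation).
S0 = "S0"
# Hypothetical initial facts; Color(A, Red) is included only to expose the frame problem.
INITIAL_FACTS = {("On", "A", "B"), ("Color", "A", "Red")}


def result(action, situation):
    """Build the term Result(a, s). Nothing is executed; a new term is returned."""
    return ("Result", action, situation)


def holds(fluent, situation, use_frame_axiom=True):
    """Return True if Holds(fluent, situation) is derivable from the axioms below."""
    if situation == S0:
        return fluent in INITIAL_FACTS
    _, (name, x, y), previous = situation
    if name != "ReverseBlocks":
        raise ValueError(f"no action description for {name}")
    # Effect axiom: Holds(On(x, y), s) => Holds(On(y, x), Result(ReverseBlocks(x, y), s))
    if fluent == ("On", y, x) and holds(("On", x, y), previous, use_frame_axiom):
        return True
    # Assumed frame axiom: any fluent other than On(x, y) persists through the action.
    if use_frame_axiom and fluent != ("On", x, y):
        return holds(fluent, previous, use_frame_axiom)
    return False  # not derivable from the axioms supplied


s1 = result(("ReverseBlocks", "A", "B"), S0)
print(holds(("On", "B", "A"), s1))                              # derived from the effect axiom
print(holds(("Color", "A", "Red"), s1, use_frame_axiom=False))  # underivable without a frame axiom
print(holds(("Color", "A", "Red"), s1))                         # derivable once the frame axiom is added
```

Without the frame axiom, nothing can be concluded about the color of block A after the action, even though the action plainly does not affect it.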
The computational load created by the need to reason about the action descriptions and frame axioms has so far restricted the use of the situation calculus to relatively small systems [Pednault, 1989], [Georgeff, 1987].

THE STRIPS REPRESENTATION

The STRIPS representation [Fikes and Nilsson, 1971] was developed in the late 1960s in response to the perceived shortcomings of the situation calculus and the frame problem in particular. Although the STRIPS planning system has long ago been superseded, elements of the approach (specifically, the STRIPS representation and the STRIPS assumption) remain in widespread use in the AI planning community. In the STRIPS representation, the current state is represented by a conjunction of atoms. Atoms are simply ground literals that can be either true or false. The literals are "ground" in the sense that STRIPS does not support reasoning about relationships and quantification like the situation calculus does. Although one is free to write a STRIPS state atom as On(Book, Table), the sentence has no more or less meaning than the simpler alternative OBT.⁵

5  To make a clear distinction between sentences in FOL and STRIPS, the latter are shown in a sans-serif font.

The action representation in STRIPS consists of three elements:

1.  Action name: Each action in STRIPS has a unique name. Although names that look like complex predicates are permitted, they are not interpreted as complex predicates. As such, an action name such as StackBlocks(A, B) could be represented by any other combination of symbols, such as StackAonB.

2.  Precondition list: The preconditions for an action are represented as a conjunctive list of atoms. For example, the preconditions for the StackAonB action could be written: {ClearA, ClearB, InGripA}.

3.  Effects list: The effects of an action are also represented as a conjunctive list of atoms. For example, the effects of StackAonB could be written: {OnAB, ¬ClearB}.
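The three elements of a STRIPS action can be sketched as a small Python data structure. The sketch below is illustrative only: the class and function names are hypothetical, the effects list is split into atoms made true and atoms made false (the negated entries), and the atom RedA is an assumption added to show that atoms not named in the effects list are carried over unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripsAction:
    name: str                  # unique action name, not interpreted as a predicate
    preconditions: frozenset   # atoms that must be true before execution
    add_effects: frozenset     # atoms made true by the action
    delete_effects: frozenset  # atoms made false (the negated entries in the effects list)


def apply_action(action, state):
    """Return the successor state, or raise if a precondition is not satisfied."""
    if not action.preconditions <= state:
        missing = sorted(action.preconditions - state)
        raise ValueError(f"{action.name}: unsatisfied preconditions {missing}")
    # Atoms not mentioned in the effects are simply carried over.
    return (state - action.delete_effects) | action.add_effects


stack_a_on_b = StripsAction(
    name="StackAonB",
    preconditions=frozenset({"ClearA", "ClearB", "InGripA"}),
    add_effects=frozenset({"OnAB"}),
    delete_effects=frozenset({"ClearB"}),
)

state = frozenset({"ClearA", "ClearB", "InGripA", "RedA"})  # RedA is a hypothetical atom
new_state = apply_action(stack_a_on_b, state)
print(sorted(new_state))  # ['ClearA', 'InGripA', 'OnAB', 'RedA']
```

Because atoms are opaque symbols, the sketch needs only set operations; no unification or quantifier handling is involved.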
The STRIPS assumption states that atoms that are not explicitly referred to in the effects list are assumed to be unaffected by the execution of the action. By assuming invariance as the default, there is no need to add frame axioms that state that (for example) the color and the size of the blocks remain unaffected by the StackAonB action. Although the STRIPS representation is not as expressive as the situation calculus [Pednault, 1989], it has the virtues of being simple to understand and implement. Furthermore, as discussed in Chapter 4, the language can be extended in a straightforward manner to support rich representation of non-deterministic actions.

CLASSICAL PLANNING AND REFINEMENTS

The basic process of planning using theorem proving has been refined in a number of different ways to avoid unnecessary computational effort, decompose the problem into smaller, manageable pieces, and account for unexpected outcomes in plan execution. For a more detailed overview of advances in AI planning see [Weld, 1998], [Russell and Norvig, 1995], [Weld, 1994], [Georgeff, 1987].

1.  Partial order planning: In partial order (or least commitment) planning, the planner refrains from specifying a fully-ordered sequence for actions until forced to by a binding precedence constraint. In all other cases, it simply maintains a list of partial orderings and causal relationships between actions. If a particular combination of actions in a particular sequence creates conflicts or inconsistencies, the sequence of the offending actions is fixed to make them consistent. To illustrate, consider the process of getting dressed in the morning. Certain actions, such as PutOnSocks and PutOnShoes, are subject to a precedence constraint. As such, the partial order planner must enforce the sequence: PutOnSocks → PutOnShoes. However, it is immaterial whether one puts on one's shirt before or after putting on one's socks and shoes.
Indeed, there is nothing inconsistent with the sequence PutOnSocks → PutOnShirt → PutOnShoes. As such, a partial order planner would leave the groups of actions PutOnShirt and PutOnSocks → PutOnShoes unordered with respect to each other but specify the ordering of the latter two actions.

2.  Hierarchical planning: Execution of an action such as StackAonB by a robot could involve a large number of detailed steps. For example, the robot would have to move its arm to Block A, open its grip, close its grip on the block, lift the block, and so on. An action such as CloseGrip could also be further decomposed into fine-grained actions including sensing the status of the gripper, sending signals to the motor, and so on. From a computational point of view, increasing the number of actions can lead to an exponential increase in the complexity of the search [Russell and Norvig, 1995]. As such, it is advantageous to create a hierarchy of abstract actions that encapsulate more primitive actions. If such a hierarchy can be constructed, then the planner can plan using a relatively small number of abstract actions and then work out the specifics of the finer-grained actions independently of other fine-grained actions located under different branches of the hierarchy.

3.  Conditional planning: Conditional planning addresses the fact that plan execution is not flawless. The agent may execute an action and find itself in a different state than expected. For example, when executing the PutOnShoes action, the agent may break a lace and be unable to complete the action. Conditional planners attempt to enumerate all possible contingencies and generate a sequence of actions for each. Of course, conditional planning presupposes the agent's ability to sense the current state of the world. If, for example, the agent broke a lace and correctly recognized the outcome, it could execute an alternate branch of actions to deal with the contingency.
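The least-commitment idea behind partial order planning (item 1 above) can be illustrated with the dressing example. The sketch below is not a planner; it only enumerates the fully ordered sequences that remain consistent with the single precedence constraint, which is the set of options a partial order planner keeps open. The function names are hypothetical.

```python
from itertools import permutations

ACTIONS = ["PutOnShirt", "PutOnSocks", "PutOnShoes"]
PRECEDENCE = [("PutOnSocks", "PutOnShoes")]  # (earlier action, later action)


def consistent(sequence, precedence):
    """True if every precedence constraint is respected by the sequence."""
    position = {action: index for index, action in enumerate(sequence)}
    return all(position[first] < position[second] for first, second in precedence)


def linearizations(actions, precedence):
    """All total orderings of the actions that respect the partial order."""
    return [seq for seq in permutations(actions) if consistent(seq, precedence)]


for sequence in linearizations(ACTIONS, PRECEDENCE):
    print(" -> ".join(sequence))
```

Three of the six possible orderings survive, including PutOnSocks -> PutOnShirt -> PutOnShoes; the planner need not choose among them until some other constraint forces the choice.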
Although classical planning systems such as STRIPS work well for small "blocks world" problems, such systems do not scale well to larger problems. One important consideration is the amount of search that might be required to find a complete plan. If the number of actions available to the agent is large or some lead to dead ends or loops (e.g., the ReverseBlocks action above), the amount of time required for the planner to satisfy the goal (assuming the goal is satisfiable) could be considerable. Indeed, Chapman [1987] has shown that planning, regardless of method, is intractable.

3.1.4  DECISION-THEORETIC PLANNING

A second problem with classical planning approaches is that the agent is assumed to have perfect information about its state and the effects of its actions. These assumptions are relaxed somewhat in conditional planning since the agent is given a contingency plan, that is, a mapping from situations to actions for every situation in which the agent can find itself. Unfortunately, a conditional plan is not sufficient for an agent in an uncertain environment. To illustrate, consider the simple mobile robot scenario described in [Dean et al., 1993] and reproduced in Figure 3.1. A mobile robot is navigating a two-dimensional space in an effort to reach its goal (a charging station) at coordinates (1, 4). The charging station is adjacent to the open end of a stairwell (which the robot is not equipped to navigate). The remaining three sides of the stairwell are closed off by a railing which the robot cannot breach. The task of the planner is to determine the robot's best action in any state in which it might find itself.

FIGURE 3.1  An example of an optimal contingency plan for a probabilistic planner: The shaded area in the center of the two-dimensional space is a stairwell. Legend: charging station (high positive utility); stairwell (high negative utility); railing around the stairwell; arrows indicate the robot action for each state.
Clearly, the robot's best action to perform in any state is contingent on the robot's location with respect to the charging station and the stairwell. A conditional planner would provide the robot with a contingency plan in the form of a mapping from states (in this case, x, y coordinates) to actions (such as MoveNorth, MoveSouth, and so on). However, what a conditional planner does not do is provide the agent with a means of reasoning about how its own actions affect the states in which it finds itself. In any real-world environment, an agent's actions are non-deterministic. For example, the agent may execute the MoveWest action but, due to some mechanical problem, end up instead moving one unit to the south with probability p. If the agent were to execute the non-deterministic MoveWest action in the location (2, 4), it would achieve its goal with probability (1 - p) and fall down the stairs and be smashed with probability p. In such circumstances, it is important that the agent be able to take into account not only the possibility that its actions are imperfect, but also all the potential consequences of all possible outcomes.

PROBABILISTIC REPRESENTATIONS OF ACTIONS

A natural way of reasoning about the desirability of uncertain outcomes is decision theory. A decision-theoretic (or probabilistic) planner assigns utilities to states of the world and uses its knowledge of probabilistic outcomes to select actions such that its expected utility is maximized. For example, assume that the mobile robot in Figure 3.1 associates a utility of 10 with reaching the charging station and a utility of -100 with falling down the stairwell. Assume as well that although the agent does not know the exact outcome of any of its actions, it knows the probability that a particular action executed in a particular situation will result in a particular outcome.
For instance, each action could have a probability distribution over outcomes similar to that shown for MoveWest in Table 3.4 below.

TABLE 3.4: The probability of each outcome for the MoveWest action.

    Effect       Probability
    West         0.80
    No change    0.10
    North        0.05
    South        0.05

The utility of being in a particular state is therefore the reward (or penalty) associated with the state, plus the utility the agent can expect to receive from executing an action in the state. Returning to the example of location (2, 4) in Figure 3.1 and the probabilities in Table 3.4, the immediate reward realized by the robot by being in state (2, 4) is zero (in this problem formulation, only the stairwell and charging states have utilities associated with them). When contemplating whether to execute the MoveWest action in an effort to reach the very attractive state to the West (the charging station), the robot must also consider the 10% chance that it will remain in the same zero-reward state and the 5% chance that it will tumble down the stairwell.

To express the elements of a decision-theoretic planning problem formally, the following notation is used:

•  S is the finite set of mutually exclusive states that describes the agent's environment. State s is an element of the state space (s ∈ S). In Figure 3.1, the state space is defined as a two-tuple (x, y) describing the robot's location on a 4 × 5 grid.

•  U(s) is the utility to the agent of being in state s. In the case in which the agent is risk-neutral, utility is a linear function of the monetary value associated with the state, V(s), and the terms "utility" and "value" are interchangeable.

•  r(s) is the immediate reward the agent receives for being in state s. For example, if the mobile robot moves into the location containing the charging station, it receives an immediate reward of 10 units of utility. Of course, rewards can also be negative: the states corresponding to the stairwell have large negative rewards.
•  a ∈ A_s denotes that an action a belongs to the set of feasible actions A_s associated with state s. Not all actions are feasible in every state. For example, whenever the robot is in a location such as (2, 1) bounded on the North by the stairwell railing, it cannot execute the action MoveNorth. In other words, MoveNorth ∉ A_(2,1).

•  c(a, s) is the cost of executing action a in state s. The inclusion of action costs in the formulation permits the costs and benefits of decisions to be compared directly. For example, a cost of one unit of utility may be charged to the robot whenever it moves to a new location. In the mobile robot context, the action cost represents the drain on the robot's battery as it moves around. As with negative rewards, it is possible to have negative costs (i.e., execution of the action creates an increase in the agent's utility).⁶

•  β is a constant between 0 and 1 that is used to discount costs and rewards in the future. A value of zero means that the agent fully discounts the future and considers immediate rewards only. Conversely, a β value of 1 means that the planner treats future costs and rewards and immediate costs and rewards identically. In many infinite horizon problems, the discounting factor is used to capture the intuition that a unit of utility today is worth more than a unit of utility in the future.

•  P(j | a, s) is the probability of moving to state j after performing action a in state s. Essentially, this is a transition probability matrix that contains the probability of moving between any two states in the state space as a consequence of executing action a. A complete action representation for a particular planning problem requires a |S| × |S| matrix for every action.

6  Although the semantics of negative costs and rewards can be confusing, the terms are used throughout this thesis to remain consistent with the standard DTP nomenclature.
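As a small worked example of this notation, the immediate expected reward of executing MoveWest in location (2, 4) can be computed from Table 3.4. The sketch below rests on assumptions that go beyond the text: the South outcome is taken to land in the stairwell (reward -100), the North and No change outcomes are taken to land in zero-reward states, and both the action cost and the discounted value of later states are ignored. The dictionary and function names are hypothetical.

```python
# Outcome probabilities for MoveWest, taken from Table 3.4.
MOVE_WEST = {"West": 0.80, "No change": 0.10, "North": 0.05, "South": 0.05}

# Assumed reward r(j) of the state reached by each outcome when starting in (2, 4):
# West reaches the charging station, South is assumed to fall into the stairwell,
# and the remaining outcomes are assumed to end in zero-reward states.
REWARD_OF_OUTCOME = {"West": 10.0, "No change": 0.0, "North": 0.0, "South": -100.0}


def expected_reward(outcome_probabilities, rewards):
    """Probability-weighted sum of the rewards of the reachable states."""
    total_probability = sum(outcome_probabilities.values())
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * rewards[outcome] for outcome, p in outcome_probabilities.items())


print(expected_reward(MOVE_WEST, REWARD_OF_OUTCOME))
```

Under these assumptions the expected reward is 0.80 × 10 + 0.05 × (-100) = 3, well below the reward of 10 attached to the charging station itself.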
Given this notation, it is possible to define the utility of being in a particular state as the immediate reward for being in the state, less the cost of executing the chosen action in the state, plus the discounted expected value of the states that may result from executing the action. Assuming the risk neutral case:

$$U(s) = V(s) = r(s) - c(a, s) + \beta \sum_{j \in S} P(j \mid a, s)\, V(j) \qquad \text{(Eq. 3.1)}$$

Since the value of each state can depend on the value of all other states, a separate equation must be written for each state in the state space. The |S| equations in |S| unknowns can be solved as a set of simultaneous equations or using iterative techniques such as value iteration [Puterman, 1994].

FINDING OPTIMAL POLICIES

A policy, π, is defined as a mapping π: s → a ∈ A_s for all s ∈ S. The question for a decision-theoretic planner is how to select actions such that the expected utility of the policy is maximized. Fortunately, this formulation of the decision-theoretic planning problem maps to a well-known class of discrete, finite state, history independent, fully-observable Markov Decision Problems (MDPs) [Boutilier, Dean and Hanks, 1999], [Boutilier and Dearden, 1996], [Boutilier, Dearden and Goldszmidt, 1995], [Puterman, 1994], [Dean et al., 1993]. One means of solving this type of MDP is policy iteration [Howard, 1960]. Policy iteration consists of two stages per iteration. In the evaluation stage, the value of each state in the state space is determined using the series of equations induced by Equation 3.1. In the refinement stage, the policy is improved on a state-by-state basis by replacing the current action with a better action (if one exists). The evaluation and refinement procedures are repeated until the refinement results in no changes being made to the policy. An important feature of the MDP formulation is that policy iteration is guaranteed to converge on the optimal policy, π*, in polynomial time [Puterman, 1994].
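The evaluation and refinement stages can be sketched on a deliberately tiny problem. Everything in the example below is hypothetical and chosen only for illustration: the four states (goal, pit, near, far), the transition probabilities, the action costs, the discount factor of 0.9, and the treatment of goal and pit as terminal states whose value equals their reward. Action costs are treated as reductions in utility, and the evaluation stage uses repeated sweeps rather than a direct solution of the simultaneous equations.

```python
BETA = 0.9  # assumed discount factor
REWARD = {"goal": 10.0, "pit": -100.0, "near": 0.0, "far": 0.0}  # r(s); goal and pit are terminal

# P[s][a] maps each successor state j to P(j | a, s); COST[s][a] is c(a, s).
P = {
    "near": {"direct": {"goal": 0.80, "near": 0.15, "pit": 0.05},
             "retreat": {"far": 0.90, "near": 0.10}},
    "far": {"approach": {"near": 0.90, "far": 0.10},
            "detour": {"goal": 0.70, "far": 0.30}},
}
COST = {"near": {"direct": 1.0, "retreat": 1.0},
        "far": {"approach": 1.0, "detour": 3.0}}


def q_value(s, a, V):
    """Reward, less action cost, plus discounted expected value of the successors."""
    return REWARD[s] - COST[s][a] + BETA * sum(p * V[j] for j, p in P[s][a].items())


def evaluate(policy, tolerance=1e-12):
    """Evaluation stage: sweep the state values until they stop changing."""
    V = dict(REWARD)  # terminal states keep their reward as their value
    while True:
        delta = 0.0
        for s, a in policy.items():
            updated = q_value(s, a, V)
            delta = max(delta, abs(updated - V[s]))
            V[s] = updated
        if delta < tolerance:
            return V


def policy_iteration():
    policy = {s: next(iter(actions)) for s, actions in P.items()}  # any initial policy
    while True:
        V = evaluate(policy)
        changed = False
        for s in P:  # refinement stage: switch only to a strictly better action
            best = max(P[s], key=lambda a: q_value(s, a, V))
            if q_value(s, best, V) > q_value(s, policy[s], V) + 1e-9:
                policy[s] = best
                changed = True
        if not changed:
            return policy, V


policy, values = policy_iteration()
print(policy)
```

With these made-up numbers the procedure settles on retreating from the risky state and paying for the more expensive detour, because the small chance of reaching the pit outweighs the extra action cost.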
The details of the policy iteration algorithm are summarized in Algorithm 3.1.

ALGORITHM 3.1: POLICY ITERATION

1.  π' ← any policy on S
2.  While π ≠ π'
    a.  π ← π'
    b.  For all s ∈ S, find V_π(s) using Equation 3.1
    c.  For all s ∈ S, evaluate a* where

        $$a^* \in \arg\max_{a \in A_s} \Big\{ r(s) - c(a, s) + \beta \sum_{j \in S} P(j \mid a, s)\, V_\pi(j) \Big\} \qquad \text{(Eq. 3.2)}$$

    d.  If π(s) ≠ a* then π'(s) ← a*
3.  Return π

DTP OBSERVATIONS

There are two observations that can be made about DTP at this point. First, taking into account many different costs and rewards can lead to complex behaviors that may be unintuitive. For example, consider the case of the mobile robot in (2, 4). The value to the robot of executing the MoveWest action is less than one might think due to the 5% chance of falling down the stairwell and the large negative reward associated with this outcome. Indeed, the rational course of action might be to avoid the open end of the stairwell to the greatest extent possible. Thus, if the robot started in a state on the East side of the state space, it might expend extra effort to travel around the South side of the stairwell (which has a protective railing) even though an action cost is incurred for each movement. This policy is shown by the arrows in Figure 3.1.

More generally, an important advantage of the decision-theoretic framework is that it provides a means of making trade-offs between different aspects of the agent's planning environment in the face of uncertainty.

1.  Multiple goal states: Unlike classical planning, a decision-theoretic planner can reason about multiple goal states. In some cases (e.g., the charging station and the stairwell) the goals can be conflicting.

2.  Multi-attribute goals: Since goals are represented by utility values, multi-attribute utility functions can be used to express complex relationships between states and the agent's assessment of the state's utility.
3.  Standard unit of measure: All costs and benefits faced by the agent are expressed in the same units of utility.

The second observation about decision-theoretic planning is that solution via MDP techniques such as policy iteration requires explicit enumeration of the state space. Unfortunately, the size of the state space is exponential in the number of attributes used to represent the planning problem. To illustrate, consider the following extension to the mobile robot planning problem in Figure 3.1:

    After some experimentation, it is determined that the robot is more reliable on a scuffed floor than a recently polished floor. Thus, on a scuffed floor, the probability of staying in the same position given an action is 0.025 instead of 0.10.

The Markovian assumption on which policy iteration is based requires that all the information that is required to make a decision be encoded in the state definition. Thus, the state definition would have to be expanded to a three-tuple (x, y, f), where f describes the condition of the floor. This simple change doubles the size of the state space from 20 to 40 states.

THE CURSE OF DIMENSIONALITY

In general, a problem consisting of d propositional state variables induces a state space of size |S| = 2^d states. The exponential growth in problem size is known as the curse of dimensionality because it greatly limits the usefulness of the conventional MDP formulation in practice. Thus, although policy iteration can be used to compute the optimal policy in an amount of time that is polynomial in the number of states, the number of states is exponential in the complexity of the domain description. The net result is therefore intractability for all but the smallest problem instances. A number of approaches have been proposed in the literature to cope with the curse of dimensionality.
For example, [Dean et al., 1993] restrict the planner's attention to areas of the state space ("envelopes") that are likely to be encountered as the agent moves towards its goal state. In a similar approach, [Barto, Bradtke and Singh, 1995] interleave planning and execution in order to use information about the current state to restrict the planner's attention to likely future states. [Boutilier, Dearden and Goldszmidt, 1995] employ the concept of "abstract states" to reduce the effective number of states that are visited by the policy iteration algorithm. The notion of abstract states is based on the observation that not all state variables are relevant in all states. For example, if the mobile robot in Figure 3.1 has to execute the action GetCharged while at the charging station to get the reward, then only those state variables that affect the probability distribution over action outcomes or the utility of the state itself are relevant. In this example, the condition of the floor is irrelevant once the robot is in the abstract state (1, 4, •). The foundations of DTP and strategies for avoiding the curse of dimensionality are discussed in greater detail in Section 4.2.

3.1.5  CONSTRAINT-BASED PLANNING

Given the complexity of real-world manufacturing environments, it is sometimes better to abandon the strategy of focusing on narrow objectives (such as reducing makespan or eliminating tardiness) and reframe the task of allocating resources to jobs as a constraint satisfaction problem (CSP). In a CSP, the objective is not to find an optimal solution. Instead, the objective is to consider a very large space of possible solutions and find one that satisfies a number of domain-specific constraints. To achieve this end, the constraints are used to guide the search for solutions and eliminate large areas of the search space.
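The way constraints prune a search can be sketched with a small backtracking routine. The job, its three operations, the single machine with three time slots, and the function names are all hypothetical; the sketch returns the first assignment that satisfies the constraints and makes no attempt at optimality, in keeping with the CSP view described above.

```python
OPERATIONS = ["drill", "deburr", "paint"]                # hypothetical job
PRECEDENCE = [("drill", "deburr"), ("deburr", "paint")]  # (earlier, later)
SLOTS = [0, 1, 2]                                        # one machine, three time slots


def violates(assignment):
    """True if a partial assignment already breaks a constraint."""
    slots = list(assignment.values())
    if len(slots) != len(set(slots)):  # the machine handles one operation per slot
        return True
    return any(
        first in assignment and second in assignment
        and assignment[first] >= assignment[second]
        for first, second in PRECEDENCE
    )


def solve(assignment=None):
    """Depth-first search that abandons a branch as soon as a constraint fails."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(OPERATIONS):
        return assignment
    operation = OPERATIONS[len(assignment)]
    for slot in SLOTS:
        candidate = {**assignment, operation: slot}
        if violates(candidate):
            continue  # prune: no completion of this branch can be feasible
        solution = solve(candidate)
        if solution is not None:
            return solution
    return None


print(solve())  # {'drill': 0, 'deburr': 1, 'paint': 2}
```

Each time a partial assignment violates a constraint, every complete schedule that extends it is discarded without being generated.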
For example, if it is known that a drilling operation must be followed by a deburring operation, then all regions in the state space in which the precedence constraint is violated (i.e., deburring precedes drilling) can be eliminated from further consideration.

One of the first and best-known constraint-based systems for scheduling was the ISIS system (and its successors) developed by Mark Fox [Fox, 1994], [Fox, 1987]. ISIS was used for production planning in a Westinghouse Electric turbine factory. At the time of the study, there were as many as 200 shop orders in progress and each order consisted of ten or more operations. One of Fox's key observations about the production environment is that schedulers spend only a small fraction of their time (10% to 20%) dealing with issues such as due dates, process routings and machine availability. The rest of the schedulers' time is spent resolving other issues such as the availability of tools and fixtures, machine breakdowns, setup time reduction, and a multitude of soft preferences and constraints.

In ISIS, the term "constraint" is defined broadly to include many different types of domain knowledge such as organizational goals, physical limitations and capabilities, causal relationships, resource availability, and preferences. A frame-based knowledge representation language called SRL is used to express the constraints and their interrelationships. The constraint-representation language also includes constructs such as priority (for determining which constraints are more important), utility of the current attribute value, possible relaxations of the constraint, and interactions between constraints.

In addition to using constraints to limit the search space, ISIS uses hierarchy to partition the problem into a series of levels. In Level 1 search, orders are scheduled according to priorities and due dates.
In Level 2, capacity analysis is performed to determine the earliest start and latest finish times for each order. In Level 3, specific resources other than machines are considered and time bounds for each resource are created. In Level 4, the "partial order" specifications from Level 3 are fully ordered with the objective of minimizing work-in-progress time. In this sense, the decomposition of the manufacturing problem into levels mirrors the conventional decomposition from Figure 2.1 on page 14.

[Zweben and Fox, 1994] contains a collection of papers describing several applications of CSP-based methods for scheduling. For example, the "decision-theoretic scheduler" (DTS) described in [Hansson and Mayer, 1994] uses decision-theoretic concepts (utility and probability) to guide the selection of heuristic functions for searching the space of possible solutions. The expected utility of each heuristic is used to make trade-offs between evaluation functions. [Kumar, 1992] contains a review of other techniques for local search and constraint propagation.

3.1.6  RECENT ADVANCES IN PLANNING

There have been a number of recent advances in AI planning since this thesis was started (see [Weld, 1998] for a review). For example, the BLACKBOX planner is capable of solving a 105-action planning problem over 14 time steps with 10^[…] possible states in six minutes [Kautz and Selman, 1998]. Like many of the newer planning systems, BLACKBOX is based on the synthesis of classical planning techniques and constraint-directed search techniques.

3.2  MARKETS AND EQUILIBRIUM

The market-based approach to problem solving is very different from the monolithic scheduling and planning techniques described in the preceding sections. In market-based approaches, solutions emerge from the interaction of many individual agents. From a designer's point of view, the critical issues are the quality and stability of the market outcome.
Fortunately, there is a large body of economic theory that addresses precisely these issues. The basic problem investigated in economics is the "allocation of scarce means to satisfy competing ends" [Becker, 1976]. The fundamental building blocks used in economics to represent and reason about allocation problems are agents and markets. According to Becker, the economic approach reduces to three assumptions about agents and markets:

1. agents have stable, well-defined preferences,
2. agents exhibit maximizing behavior, and
3. markets exist to facilitate the allocation of resources among agents.

The first two assumptions concern the rationality of the agents that participate in the economic system. The third assumption concerns the provision of markets and the emergence of equilibria in the markets. In the following sections the theoretical foundations of microeconomics are briefly reviewed. In Section 3.2.1, the term "rationality" is defined and illustrated with a simple model. In Section 3.2.2, the focus shifts from a normative model of individual behavior to a descriptive model of collective behavior. The concept of competitive equilibrium is reviewed through the use of a two-agent, two-good Edgeworth box economy. The objective of the section on equilibrium is to illustrate the concept of Pareto optimality and the stability of Pareto optimal outcomes. A shortcoming of standard economic models of equilibrium is that they rely on the assumption of a competitive market. In the markets used in this thesis, there is generally insufficient market depth for an equilibrium price to be discovered through tâtonnement. [7] As a consequence, an alternate price discovery mechanism must be used and Section 3.2.3 contains a short summary of auction theory. Different auction forms are reviewed and equivalence relationships between forms under certain conditions are emphasized.
The section on auction theory is included for two reasons: first, it shows how certain auctions can be relied on to yield a Pareto efficient outcome; second, it provides a theoretical justification for the use of a continuous reverse auction in Section 4.3.2.

3.2.1  RATIONALITY

According to the standard economic and decision-theoretic definitions of rationality, a rational agent is one that maximizes its expected utility subject to a consistent set of beliefs and desires [Von Neumann and Morgenstern, 1944]. Under this formulation, beliefs are represented using probability distributions and therefore a consistent set of beliefs is one that satisfies the basic axioms of probability theory. For example, a rational agent cannot believe that it will rain with a probability of 0.6 and also believe that it will not rain with a probability of 0.6. An agent's desires are represented using real-valued utility functions and therefore a consistent set of desires must satisfy the standard axioms of utility theory (e.g., transitivity, continuity, and monotonicity) [Clemon, 1996], [Holloway, 1979]. A model of rational decision making is shown in Figure 3.2. An agent starts with a belief (expressed as a probability) that it is in a particular state. As in Section 3.1.3, the assumption is often made in practice that the agent is in an accessible (or fully observable) environment and that its beliefs about its state are certain. Given that the agent is situated in a world of infinite detail and complexity, the assumption of accessibility is difficult to justify. However, the assumption is often critical because partially-observable problems are much more complex mathematically [Littman, Kaelbling and Cassandra, 1995]. The compromise stance adopted here is that the state of the agent is not fully known, but that the most relevant features (such as the n-tuple descriptions from the robot example in Section 3.1.4) are known with adequate certainty.
In order to select an action to execute in a state, the agent draws on a second type of belief: belief about causality. Since actions are non-deterministic, an agent does not know a priori what the precise outcome of an action will be. However, the agent does have beliefs in the form of a probability distribution over outcomes for each action in each state. Moreover, each possible outcome is associated with a known payoff for the agent and each payoff in turn induces a utility. In mapping payoffs to utility, the agent may be risk averse, risk neutral, or risk seeking. In the case in which the agent is risk neutral and all payoffs are expressed in monetary terms, the maximization of expected utility is equivalent to the maximization of expected monetary value (EMV).

FIGURE 3.2  A model of rational decision making under uncertainty: Agents select actions in states in order to maximize their expected utility over some decision horizon. In order to do this, agents require different kinds of knowledge about their state, the outcome of actions, and their own preferences regarding outcomes. (Figure elements: states, actions, outcomes, and payoffs, informed respectively by beliefs about state, knowledge of alternatives and capabilities, beliefs about causality, and knowledge of preferences and objectives.)

[7] Tâtonnement, the iterative process of signals and price adjustments, was originally described by Léon Walras in the 1800s [Cheng and Wellman, 1998].

It is important to note that the fundamental constructs in the economic/decision-theoretic definition of rationality (maximization of expected utility over some decision horizon given an information set) are identical to those of the MDP formulation of agent planning in Section 3.1.4. That is, the MDP formulation provides a computational method of implementing the model of rationality shown in Figure 3.2.
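The decision model of Figure 3.2 can be sketched in a few lines of code. The following is a minimal illustration rather than part of the thesis: the two actions, their payoffs, the square-root utility function used for the risk-averse agent, and all function names are assumptions chosen for the example. The sketch rejects inconsistent beliefs (probabilities that do not sum to one, as in the rain example) and shows that a risk-neutral agent maximizes EMV whereas a risk-averse agent may prefer a certain payoff with a lower EMV.

```python
import math


def expected_utility(outcomes, utility):
    """Expected utility of one action.

    outcomes is a list of (probability, payoff) pairs representing the
    agent's beliefs about causality for that action.
    """
    total = sum(p for p, _ in outcomes)
    if not math.isclose(total, 1.0):
        # Beliefs that violate the probability axioms are inconsistent.
        raise ValueError("beliefs over outcomes must sum to 1")
    return sum(p * utility(payoff) for p, payoff in outcomes)


def best_action(actions, utility):
    """Return the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))


def risk_neutral(payoff):
    # Linear utility: expected utility equals expected monetary value.
    return payoff


def risk_averse(payoff):
    # Concave utility (assumed form); payoffs must be non-negative.
    return math.sqrt(payoff)


# Hypothetical decision: a certain payoff versus a fair gamble.
actions = {
    "safe": [(1.0, 40.0)],
    "gamble": [(0.5, 100.0), (0.5, 0.0)],
}

print(best_action(actions, risk_neutral))  # gamble (EMV 50 versus 40)
print(best_action(actions, risk_averse))   # safe (sqrt(40) exceeds 0.5 * 10)
```

The same information set leads to different choices only because the mapping from payoffs to utility differs; the maximization step itself is unchanged.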
It is equally important to note, however, that a distinction is sometimes made between decision-theoretic (or thin) rationality and a broader notion of rationality that requires that beliefs not only be consistent with respect to axioms, but also that the beliefs be grounded in available evidence [Elster, 1983]. The precision and accuracy of an agent's beliefs are important when one considers the question of optimal decision making. A course of action which leads to maximization of expected utility by an agent can be considered optimal, but only with respect to the agent's information set. Thus, although a thinly rational course of action is first-best, it is generally possible to do better by improving the degree to which the agent's beliefs reflect the reality of its environment. This corresponds to the concept of fineness: a rational agent can expect to do at least as well and possibly better with finer information [Marschak and Radner, 1972]. In the extreme case, the agent's information about cause and effect is so fine that each conditional probability is equal to 1.0. On the other hand, there are costs associated with gaining finer and better information. In the formulation of rationality shown in Figure 3.2, the costs and benefits of better information are not considered.

3.2.2  EQUILIBRIUM

In systems consisting of more than one rational agent, there is often conflict between the agents' individual goals. In a pure market environment, inter-agent conflict manifests itself in a single form: contention for scarce goods. The issue addressed in equilibrium analysis is how multiple rational agents arrive at a solution, that is, how they divide multiple goods amongst themselves. Since markets contain no element of centralized control, an equilibrium outcome (if one exists) evolves solely through the self-interested behavior of market participants.
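As a rough illustration of how an equilibrium can evolve without centralized control, the following sketch simulates price adjustment in a two-agent, two-good pure exchange economy of the kind analyzed in the remainder of this section. It is an illustration only: the Cobb-Douglas utility functions, the endowments, the step size, and all function names are assumptions introduced here and do not come from the thesis. Each agent spends a fixed share of its wealth on each good; the price of Good 1 (with Good 2 as numeraire) is raised when Good 1 is in excess demand and lowered when it is in excess supply.

```python
def demand(alpha, endowment, prices):
    """Utility-maximizing bundle for Cobb-Douglas utility x1**alpha * x2**(1 - alpha)."""
    wealth = prices[0] * endowment[0] + prices[1] * endowment[1]
    return (alpha * wealth / prices[0], (1.0 - alpha) * wealth / prices[1])


def excess_demand_good1(agents, p1):
    """Total demand for Good 1 minus total endowment of Good 1 (price of Good 2 is 1)."""
    prices = (p1, 1.0)
    demanded = sum(demand(alpha, endowment, prices)[0] for alpha, endowment in agents)
    supplied = sum(endowment[0] for _, endowment in agents)
    return demanded - supplied


def tatonnement(agents, p1=1.0, step=0.1, tol=1e-10, max_iter=10000):
    """Adjust the price of Good 1 in the direction of excess demand until the market clears."""
    for _ in range(max_iter):
        z = excess_demand_good1(agents, p1)
        if tol > abs(z):
            break
        p1 += step * z
    return p1


# Hypothetical agents: (share of wealth spent on Good 1, (endowment of Good 1, endowment of Good 2)).
agents = [(0.6, (10.0, 2.0)), (0.3, (2.0, 8.0))]

p1_star = tatonnement(agents)
allocation = [demand(alpha, endowment, (p1_star, 1.0)) for alpha, endowment in agents]

print(round(p1_star, 4))  # 0.6667
```

For these assumed preferences the clearing price can also be found in closed form (3.6 / 5.4 = 2/3), which provides a check on the iteration. At the clearing price for Good 1 the market for Good 2 clears as well, so the resulting allocation exhausts the total endowment of both goods.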
Formally, a competitive (or Walrasian) equilibrium occurs when the demand for goods exactly equals the supply of goods in an economy. To illustrate the basic concepts of equilibrium when multiple goods are being exchanged simultaneously (i.e., general equilibrium), the simplest case of an Edgeworth box economy is used. [8] The concepts and theories introduced in the Edgeworth box analysis in the following sections are revisited in Chapter 5 to support conclusions about the efficiency of certain market mechanisms.

[8] The material on general equilibrium and Edgeworth boxes can be found in most microeconomic texts. The terminology and notation used here are from [Mas-Colell, Whinston and Green, 1995].

INDIVIDUAL CONSUMERS AND BUDGETS

The critical feature of a market economy is that the goods that an agent may wish to acquire are available for trade with other goods at known rates of exchange [Mas-Colell, Whinston and Green, 1995]. To understand how agents make trade-offs between goods, consider the case of a single agent $i$ in a two-good economy. The agent has an initial endowment of each good, $\omega_i = (\omega_{1i}, \omega_{2i})$, and the objective of the agent is to find a bundle $x_i^* = (x_{1i}^*, x_{2i}^*)$ that maximizes its utility. In seeking to maximize its utility, the agent is subject to the constraint that its wealth is fixed. The agent's wealth, $w_i$, is simply defined as its initial endowment of goods multiplied by the market prices of those goods: $w_i = p \cdot \omega_i = p_1 \omega_{1i} + p_2 \omega_{2i}$, where $p$ is a vector of prices for the $L$ goods in the economy: $p \in \mathbb{R}^L$.

FIGURE 3.3  The Walrasian budget set: In (a) the budget line delimits the set of feasible consumption bundles for agent $i$ subject to the agent's initial endowment $\omega_i$ and the prices of the goods. The agent's most preferred bundle within its budget set, $x_i^*$, is shown by the right-most indifference curve in (b).

The Walrasian budget set denotes the set of feasible consumption bundles for the agent subject to its budget constraint, as shown in Figure 3.3(a). An indifference curve for agent $i$, $\succsim_i$, denotes all the consumption bundles that are equally preferred by the agent. In other words, every bundle along an indifference curve provides the agent with the same utility. Under normal circumstances, increasing utility is represented by curves further from the origin. [9] Thus, although the initial allocation $\omega_i$ and the alternative allocation $x_i^*$ shown in Figure 3.3(b) are equally affordable (they both lie on the budget line), the indifference curves indicate that the bundle $x_i^*$ is strictly preferred by the agent. Indeed, the bundle represented by $x_i^*$ is optimal since it is the point at which the budget line is tangent to an indifference curve.

[9] Indifference curves are traditionally drawn with a concave shape to indicate decreasing marginal utility as the consumption bundle approaches either of the axes.

PARETO OPTIMAL ALLOCATIONS IN AN EDGEWORTH BOX ECONOMY

FIGURE 3.4  Equilibrium in an Edgeworth box economy: In (a), an Edgeworth box is used to show a Pareto optimal outcome at point $x^*$ given an initial endowment $\omega$ and a vector of prices $p$. The graphic labeled (b) shows the effects of a non-equilibrium price and the excess supply for Good 2 created by the price. The grey area shows the path taken by the budget line as the prices converge to their equilibrium values.

An Edgeworth box is a graphical tool for representing general equilibrium in pure exchange markets consisting of two agents and two commodities. In a pure exchange market, there is no production or transformation of goods. Instead, the two agents seek to maximize their utility by exchanging goods with each other.
An allocation $x \in \mathbb{R}^4_+$ is a non-negative consumption bundle of goods for each agent: $x = (x_1, x_2) = ((x_{11}, x_{21}), (x_{12}, x_{22}))$.

An Edgeworth box is created by taking a diagram of the form shown in Figure 3.3(b) for each agent and placing them kitty-corner, as shown in Figure 3.4(a). The origin for Agent 2 is at the top right of the box and thus points within the box describe how the total endowment of each good is split between the two agents. Since there is no production of goods within the economy, the length and width of the box are simply the sum of the initial endowments of each agent: $\bar{\omega}_1 = \omega_{11} + \omega_{12}$ and $\bar{\omega}_2 = \omega_{21} + \omega_{22}$. Given a vector of prices $p = (p_1, p_2)$, a budget line with slope $-p_1/p_2$ can be drawn through the box. Since the initial endowment is known to be affordable for each agent (indeed, it defines the wealth of the agents), the budget line must pass through the point $\omega$.

To illustrate the interpretation of an Edgeworth box, consider the initial allocation and the indifference curves for each agent shown in Figure 3.4(a). The allocation at $\omega$ is not Pareto optimal since it is possible to find a different allocation that makes neither agent worse off and at least one agent better off. Specifically, an allocation is better for Agent 1 if it lies above and to the right of Agent 1's initial indifference curve, $\succsim_1$. Conversely, an allocation is better for Agent 2 if it lies below and to the left of Agent 2's original indifference curve, $\succsim_2$. Since a feasible allocation must simultaneously satisfy the budget constraints of both agents (i.e., it must lie on the budget line), the set of allocations that are preferred by both agents to the initial allocation $\omega$ are the points on the budget line bounded by $\succsim_1$ and $\succsim_2$.

At the allocation represented by the point $x^*$ in Figure 3.4(a), the budget line is tangent to the indifference curves for both agents.
Given the slope of the budget line, neither agent can do better than $x^*$ and thus a state of competitive (or Walrasian) equilibrium is achieved. In addition to being stable, the equilibrium is Pareto efficient. Indeed, the relationship between equilibrium and welfare maximization holds across a wide range of economies, not just the Edgeworth box economy considered here. The first fundamental theorem of welfare economics states that under perfectly competitive conditions, any equilibrium allocation is a Pareto optimum (see [Mas-Colell, Whinston and Green, 1995], [Mukherji, 1990], [Katzner, 1988]).

EQUILIBRIUM PRICES

To this point, little has been said about the equilibrium vector of prices, $p$, that determines the slope of the budget line and thus the location of the equilibrium allocation. Under the fully competitive conditions required by the first theorem, agents are price takers; that is, they have no influence on the prices of the goods in the marketplace. To illustrate the manner in which the equilibrium price $p^*$ is discovered by the market, consider the non-equilibrium case in which the prices of the two goods are set to some arbitrary values $p' \neq p^*$. At these prices, the optimal allocations for each agent lie on different points of the budget line, as shown in Figure 3.4(b). To move from $\omega$ to its preferred allocation at the point marked $x_1$, Agent 1 would have to sell a certain quantity of Good 1 and buy a certain quantity of Good 2. Conversely, to move from $\omega$ to its preferred allocation at the point marked $x_2$, Agent 2 would have to sell a certain quantity of Good 2 and buy a certain quantity of Good 1. However, as the figure shows, the amount of Good 2 demanded by Agent 1 is much less than the amount that Agent 2 needs to sell (and vice-versa for Good 1).

To see how an equilibrium price emerges, consider the case of Good 1 along the horizontal axis of Figure 3.4(b).
Although Agent 2 requires a significant amount of Good 1 to move to its most preferred allocation, Agent 1 is willing to sell much less than this amount given the current ratio of prices. In this case, Agent 2 should be willing to pay a little more for the good. In a competitive market, no single agent can control the prices of goods. Thus, the relative prices of the goods change so that supply exactly equals demand and the market clears. Graphically, the budget line in Figure 3.4(b) would pivot through $\omega$, as shown by the shaded area, until no excess demand exists for either good.

3.2.3  AUCTION THEORY

In competitive markets, an ill-defined process of tâtonnement is constantly at work adjusting prices in response to excess supply or demand. The metaphor of an auction is often used to describe the process: prices evolve as if each market is controlled by an implicit Walrasian auctioneer who collects information about supply and demand and selects a price so that the market clears. In non-competitive markets (i.e., in a market in which there are not many buyers and sellers of undifferentiated goods), the auction process must be explicit. In practice, we see auctions used to sell a diverse range of goods such as financial instruments, agricultural commodities, and antiques. The key functions of an auction are allocation of goods and price discovery. As such, an auction is especially useful when a selling agent does not know the value of a good it intends to sell but wishes to maximize its revenue from the sale.

Auctions are very different from fixed-price markets. In a fixed-price market, the seller sets a price and the first agent that is willing to pay the price gets the good. If the market is not fully competitive, the seller may set the price too low. Although the good will sell if underpriced, there may be a different agent who would have paid more for the good.
In contrast, the seller may set its price too high, in which case no transaction occurs even though a Pareto efficient exchange may be possible at a lower price. In either case, a non-competitive fixed-price market does not guarantee allocative efficiency or revenue maximization for the seller. When fixed-price markets lead to market failure, a more flexible price discovery process is required. For example, bilateral negotiation (i.e., haggling between buyer and seller) may yield higher revenues for the seller on average than fixed-price markets. [10] However, negotiation does not guarantee allocative efficiency and may be impractical for a large number of goods. Auctions, as discussed in the following sections, provide a balance between allocative efficiency and practicality.

AUCTION FORMS

To achieve the simultaneous objectives of allocative efficiency, revenue maximization, and practicality, different auction forms have evolved in response to different market environments. The auction form specifies the procedure used by buyers and sellers to converge on a mutually satisfactory price. Different auction forms are defined by three characteristics: bid format, allocation rule, and price function [Engelbrecht-Wiggans, 1983]. The bid format describes how bid information is submitted to the auctioneer. For example, in ascending price (or English) auctions, bidders submit progressively higher bids for the good until a single bidder remains. In descending price (or Dutch) auctions, the ask price starts high and descends until a bidder accepts the price. Since Dutch auctions tend to achieve quiescence faster than English auctions, they are often used for low-valued perishable goods, such as fish and fresh flowers [Kambil and van Heck, 1998]. There are a number of alternatives to open-outcry auctions. For example, in sealed-bid auctions, bids are submitted in secrecy to the auctioneer.
At a predefined point in time, the auctioneer ceases to accept bids and determines the allocation and transaction price of the good. Although the allocation rule and price function aspects of English and Dutch auctions are straightforward, sealed-bid auctions vary greatly along these two dimensions. For example, since identical bids can be submitted in sealed-bid auctions, an allocation rule is required to break ties. Different price functions can also be used to create different auction forms. For example, in a second-price (or Vickrey) auction, the good is allocated to the highest bidder, but the transaction price is set to the second-highest bid. When multiple identical goods are sold, other price functions are used (such as discriminating and uniform price).

A third general class of auctions is double auctions. In a double auction, potential buyers submit bids and potential sellers submit asks to the auctioneer. The auctioneer's task is to match bids and asks according to some allocation rule and price function so that the market clears. In a continuous double auction (CDA), agents make exchanges amongst themselves without any type of market call. For a more complete overview of basic auction theory, see [Milgrom, 1989], [Engelbrecht-Wiggans, 1983], and [Mester, 1988]. In addition, [Varian, 1995] and [Rosenschein and Zlotkin, 1994] discuss the application of auction theory to the design of artificial agents.

[10] Negotiation is attractive to sellers when there are information asymmetries that work against the buyer and when the cost of negotiating is small relative to the cost of the good. Used car sales are an example of this type of market.

BIDDER CHARACTERISTICS

The allocative and revenue maximization characteristics of the various auction forms depend to a large extent on the characteristics of the auction participants. One such characteristic is the risk profile of the bidders.
If a bidding agent is risk averse (i.e., averse to losing the auction), it will bid higher to reduce the possibility of being outbid. The reverse is true for a risk-seeking bidder. If the bidder is risk neutral, it will bid solely on the basis of its valuation of the good or goods being auctioned [Mester, 1988].

The nature of the agent's own valuation (or reservation price) is the second important bidder characteristic. In the private values case, the bidding agent is assumed to have certain knowledge of its own valuation of the good. Moreover, the agent's reservation price is assumed to be independent of the reservation price of any other agent. In real-world auctions for goods such as antiques or wine, the common values case tends to be more accurate. In the common values case, each bidder is unsure of her reservation price and the bids for the good are not independent of one another. For example, if a bidder intends to buy an antique as an investment for resale, her valuation is bound to be a function of her estimates of the valuations of other agents. Moreover, in some auction forms like English auctions, the observable signalling behavior of other bidders may cause an agent to revise her valuation during the course of the auction. The common values case leads to the problem of the winner's curse. The bidder who wins the auction is the one whose estimate of the common value of the good is higher than that of any other bidder. [11] To compensate, a rational agent bids less than her reservation price, leading in some cases to allocative inefficiency [Milgrom, 1989]. Naturally, in the private values case, the bidder's certainty about her reservation price eliminates the possibility of the winner's curse.

[11] In this section, auctions for selling goods are considered. The same principles apply to reverse auctions for purchasing goods (e.g., tender offers). In the purchasing case, common values refer to estimates of cost and the lowest bidder wins the auction.
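The allocation rule and price function of the sealed-bid forms described under AUCTION FORMS can be made concrete with a short sketch. The code below is illustrative only: the function name, the bidder names, the bid values, and the alphabetical tie-breaking rule are assumptions introduced for the example and are not taken from the thesis. The good is allocated to the highest bidder in both cases; only the price function differs between the first-price and second-price (Vickrey) forms.

```python
def sealed_bid_auction(bids, second_price=False):
    """Resolve a single-good sealed-bid auction.

    bids maps a bidder name to a bid. The highest bid wins; ties are
    broken alphabetically by bidder name (an assumed allocation rule).
    Returns a (winner, transaction price) pair.
    """
    if 2 > len(bids):
        raise ValueError("at least two bids are required")
    # Sort by descending bid, then by name so that ties are resolved deterministically.
    ranked = sorted(bids.items(), key=lambda item: (-item[1], item[0]))
    winner, highest_bid = ranked[0]
    second_highest_bid = ranked[1][1]
    price = second_highest_bid if second_price else highest_bid
    return winner, price


# Hypothetical bids submitted in secrecy to the auctioneer.
bids = {"a": 100, "b": 80, "c": 60}

print(sealed_bid_auction(bids))                     # ('a', 100)
print(sealed_bid_auction(bids, second_price=True))  # ('a', 80)
```

The sketch takes the bids as given. How the bids relate to the bidders' reservation prices depends on the bidder characteristics discussed above and is outside the scope of the example.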
EQUIVALENCES OF FORMS

There are two important characteristics of auctions in which participants satisfy the assumptions of risk neutrality and privately held, independent reservation prices. First, seemingly different auctions reduce to forms that are functionally equivalent. For example, the English and second-price sealed bid (or Vickrey) forms are equivalent. The same is true of the Dutch and first-price sealed bid forms. Second, the expected revenue to the seller in the risk neutral/private values case is the same regardless of form [Mester, 1988]. Thus, from the seller's point of view, the choice of auction reduces to a matter of computational (rather than economic) efficiency. For example, a CDA will achieve the same outcome as the English or Vickrey forms: the ultimate winner of the auction is the agent that is willing to pay the most (allocation rule) and the transaction price is the reservation price of the second highest bidder (price function). [12] The advantage of the CDA is that it does not require a centralized auctioneer or a market call; the disadvantage is that a good may be bought and sold many times in a CDA before an equilibrium price is achieved.

3.3  DISTRIBUTED ARTIFICIAL INTELLIGENCE

There is a diverse and growing literature on the application of Distributed Artificial Intelligence (DAI) to practical problems (e.g., [Jennings, Sycara and Woolridge, 1998], [Singh, Rao and Woolridge, 1997], [Clearwater, 1996], [Singh, 1994], [Rosenschein and Zlotkin, 1994], [Woolridge, Tambe and Müller, 1995], [Parunak, 1995], [Woolridge and Jennings, 1994a], [Woolridge and Jennings, 1994b], [Gilbert and Doran, 1994], [Jennings, 1992], [Avouris and Gasser, 1992], [Huberman, 1988]). In an effort to better understand the literature and situate the market-based approach proposed in this thesis within the broader field of DAI, a simple three-dimensional taxonomy is used.
The three dimensions considered are the environment in which the DAI system operates, the coordination mechanisms used to resolve disputes or resource contention between agents, and the capabilities of the agents themselves. In the following sections, each of the dimensions is described. In Section 3.3.4, the proposed market-based approach and a number of exemplars from the DAI literature are evaluated with respect to the taxonomy.

[12] In practice, the actual price paid by the winner is the reservation price of the second-highest bidder plus some small amount $\delta$ that corresponds to the minimum bid increment.

3.3.1  ENVIRONMENT

Although others have used "application domain" or "environment" to classify DAI systems (e.g., [Jennings, Sycara and Woolridge, 1998]), the notion of environment used here is narrower and reduces to a single construct: global utility. A DAI environment is characterized as "global" if a meaningful measure of global utility can be said to exist. For example, in agent-based shop floor scheduling systems, the utilities of individual agents are of little interest. Rather, the agents are created as a means to maximize a complex but well-defined global objective function for the factory as a whole. In such an environment, it is possible in theory (but perhaps not in practice) to have a central authority make trade-offs between individual agents so that global utility is maximized. In other environments, however, the concept of global utility may be meaningless or difficult to define. Utility theory exists to describe the preference orderings of individual decision makers. As such, there is no theoretical justification for making direct comparisons of utility between individuals [Holloway, 1979], [Raiffa, 1968], [Arrow, 1951], [Von Neumann and Morgenstern, 1944].
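The difficulty with comparing utilities across individuals can be illustrated with a small numerical sketch. The example is an assumption introduced here and is not part of the thesis: two hypothetical agents contend for one indivisible resource, and a central authority picks the outcome with the largest sum of utilities. Because a utility function represents the same preferences after any positive affine rescaling, the "globally optimal" outcome can be reversed without changing either agent's preference ordering.

```python
def rescale(utility, scale, shift):
    """Positive affine transformation of a utility function (same preferences)."""
    if not scale > 0:
        raise ValueError("scale must be positive")
    return {outcome: scale * value + shift for outcome, value in utility.items()}


def ranking(utility):
    """Outcomes ordered from most to least preferred by one agent."""
    return sorted(utility, key=utility.get, reverse=True)


def utilitarian_choice(utilities, outcomes):
    """Outcome chosen by a central authority that sums utilities across agents."""
    return max(outcomes, key=lambda o: sum(u[o] for u in utilities))


# Hypothetical outcomes: the resource goes to agent A or to agent B.
outcomes = ["A_gets", "B_gets"]
u_a = {"A_gets": 5.0, "B_gets": 0.0}
u_b = {"A_gets": 0.0, "B_gets": 3.0}

# The same preferences for agent B, expressed on a different scale.
u_b_rescaled = rescale(u_b, scale=10.0, shift=2.0)

print(utilitarian_choice([u_a, u_b], outcomes))           # A_gets
print(utilitarian_choice([u_a, u_b_rescaled], outcomes))  # B_gets
print(ranking(u_b) == ranking(u_b_rescaled))              # True
```

The choice made by the central authority therefore depends on an arbitrary choice of scale rather than on anything contained in the agents' preferences.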
To illustrate the difficulties that arise in inter-agent comparisons of utility, consider a scenario that could plausibly occur in the multiagent aircraft scheduling application described in [Georgeff and Rao, 1998]:

Due to an unfortunate coincidence of medical emergencies, two aircraft require clearance for an immediate landing on a single runway. On one aircraft, an 82 year old Nobel laureate has had a stroke; on the other aircraft, an infant is experiencing difficulty breathing. The system must determine which aircraft can land first.

Although this scenario is simply an instance of a resource allocation problem, real agents (i.e., humans) facing life-or-death outcomes are involved. A dilemma arises because it is impossible to determine which outcome is preferred from the perspective of the centralized scheduler. In this example, the scheduler is asked to make a moral decision on behalf of society as a whole. Clearly, such a dilemma is an extreme example and is intractable under any formulation. However, even the more mundane case of scheduling can be transformed from an environment with a global utility function to an environment without a global utility function. To illustrate, assume that the scope of a scheduling system is broadened to include supply chain management. The objective of the system is to simultaneously maximize the utilities of all participants in the supply chain including suppliers and customers. In such an environment, the system would have difficulty justifying a decision to delay a shipment to one customer in order to make a shipment to a different customer without making a comparison of utilities across organizations. The distinction between environments with global and non-global utility is similar to the division of DAI into two subdisciplines: cooperative problem solving (CPS) and multiagent systems (MAS) [Bond and Gasser, 1988], [Rosenschein and Zlotkin, 1994].
In CPS, agents are created by the system designer to solve well-defined problems. The rationale for introducing the agent metaphor is to simplify decomposition and permit the division and specialization of labor. Since a CPS environment is global (or assumed to be global), and since a single designer is responsible for the behavior of all the agents that participate in the system, a high level of centralized control over the agents is feasible. In contrast, agents in MAS environments typically represent real agents in the problem domain. Thus, rather than being constructions of convenience for the system designer, the utility functions of the artificial agents map to the private utility functions of the real agents [Jennings, Sycara and Woolridge, 1998]. In such non-global environments, centralized control and inter-agent comparisons of utility are not justified under the basic tenets of decision theory.

3.3.2  COORDINATION MECHANISMS

One of the main advantages of the agent-based approach is that it permits the decomposition of large problems into many smaller problems. However, since the overall behavior of the system of agents is typically the issue of prime concern in DAI, mechanisms for achieving coordination and aggregation of agent-level behaviors are a critical issue. In this section, five different approaches to agent-level coordination are identified: hierarchy, social laws/standard operating procedures, team utility functions, negotiation, and markets. These approaches can be viewed along a continuum of autonomy of action for agents: hierarchy provides agents with the least autonomy whereas negotiation and markets provide agents with much greater autonomy.

HIERARCHY

Hierarchy results when a complex system is decomposed into a tree-like structure of parent-child relationships. Although parent-child relationships typically imply an authority structure, this need not be the case [Alchian and Demsetz, 1972].
Instead, the defining characteristic of hierarchy is that little or no interaction between sub-systems takes place except along the lines of the parent-child relationships. For example, when two agents require the same resource, they do not negotiate directly with one another for ownership. Rather, the question of ownership is passed up to a higher level of the hierarchy. Hierarchies of this sort can be said to be "nearly-decomposable" or "loosely-coupled." Simon [1981] identifies two specific requirements for nearly-decomposable systems:

1. The short-run behavior of each of the component subsystems is approximately independent of the short-run behavior of the other components.
2. In the long run, the behavior of any one of the components depends in only an aggregate way on the behavior of the other components.

Hierarchy is an extremely efficient means of exerting control over large, complex systems since all communication that takes place between the n sub-systems can be carried by (n-1) two-way channels. Moreover, hierarchy provides a mechanism for stable, intermediate systems to combine in systems of increasing complexity [Simon, 1981]. Despite its theoretical advantages, the assumption of loose-coupling (on which hierarchical control is based) breaks down when there is significant conflict between agents or contention for scarce resources. Since inter-agent conflicts cannot be resolved without appealing to higher authority, the computational burden placed on the higher levels of the hierarchy can become overwhelming [Shoham and Tennenholtz, 1992]. In addition, if there is considerable uncertainty in the domain, then an increased amount of information must flow up and down the communication channels, leading to the less responsive form of decision making characteristic of bureaucracy [Galbraith, 1974].
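The channel count and the escalation of disputes described above can be illustrated with a small sketch. The factory hierarchy, the agent names, and the function names below are hypothetical and are introduced only for the example. A hierarchy is represented as a map from each sub-system to its parent; the number of two-way channels equals the number of parent-child links, and a dispute between two agents is resolved by the lowest sub-system that is an ancestor of both.

```python
def hierarchy_channels(parent):
    """Number of two-way channels in a hierarchy: one per parent-child link."""
    return sum(1 for p in parent.values() if p is not None)


def pairwise_channels(n):
    """Channels needed if every pair of n sub-systems communicated directly."""
    return n * (n - 1) // 2


def path_to_root(parent, node):
    """The node followed by its ancestors, up to the root of the hierarchy."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def arbiter(parent, a, b):
    """Lowest level of the hierarchy to which a dispute between a and b is passed."""
    ancestors_of_a = path_to_root(parent, a)
    for node in path_to_root(parent, b):
        if node in ancestors_of_a:
            return node
    raise ValueError("agents do not belong to the same hierarchy")


# Hypothetical factory: one plant, two cells, three machines (n = 6 sub-systems).
parent = {
    "plant": None,
    "cell1": "plant",
    "cell2": "plant",
    "m1": "cell1",
    "m2": "cell1",
    "m3": "cell2",
}

print(hierarchy_channels(parent))      # 5, i.e. n - 1
print(pairwise_channels(len(parent)))  # 15
print(arbiter(parent, "m1", "m2"))     # cell1
print(arbiter(parent, "m1", "m3"))     # plant
```

Disputes between machines in different cells reach the top of the hierarchy, which suggests how the burden on the higher levels grows as contention between loosely related sub-systems increases.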
SOCIAL LAWS/STANDARD OPERATING PROCEDURES

One means of reducing the number of appeals to authority in a hierarchy is to implement binding conventions or policies for resolving disputes. To illustrate the approach, consider the mobile robot example used by Shoham and Tennenholtz [1992]:

Two robots are headed on a collision course. Since both robots are on the optimal trajectories to achieve their respective goals, neither wants to slow down or alter its course.

In a purely hierarchical system, the robots would appeal to a centralized traffic controller who would decide which adjustments the robots should make to avoid collision. Unfortunately, there are two problems with this approach. The first is that there is an enormous reliance on central authority. If the central traffic controller ceases to function, then so do the mobile robots. The second problem is that the computational burden on the traffic controller increases exponentially as robots are added and contention for scarce resources increases.

An alternative approach is to define conventions to resolve the conflict at a local level. Highway markings, traffic lights, and right-of-way rules are common examples of such socially-defined conventions that permit agents to make local, but mutually compatible, decisions.

There are two major difficulties with the social laws/standard operating procedures approach. First, as Shoham and Tennenholtz [1992] point out, the problem of specifying a complete set of laws to optimize a global utility function is NP-hard. Second, social laws and standard operating procedures can break down in the face of novel situations or uncertainty. Allison [1970] provides numerous examples of this in his analysis of governmental and military organizations.

TEAM UTILITY FUNCTIONS

An alternative to social laws is the theory of teams [Marschak and Radner, 1972].
In the team formulation, each agent's utility function is replaced with the utility function of the team as a whole. The objective for the system designer is to choose the agent-level information structure and decision rule that will yield the highest expected utility to the team. The main advantage of this formulation is that if each agent in the organization seeks to maximize its own utility, and the utility of the individual agents is identical to that of the team as a whole, then it is axiomatic that the organizational utility function will be maximized through the behavior of the agents. Moreover, the specification of a team utility function can be more flexible and complete than a set of discrete rules and conventions.

In the team approach, coordination of activity is facilitated by the possibility of each agent predicting the behavior of every other agent. This ability to predict has a number of general preconditions, including the ability of each agent to know the state, capabilities, and action outcomes of every other agent [Marschak and Radner, 1972]. Although there have been some efforts in the DAI literature to implement team-theoretic systems of agents (e.g., [Boutilier, 1996]), the size and complexity of the team utility function can easily exceed the computational capabilities of a single agent. Thus, many of the benefits of distributed computation are not achieved.

NEGOTIATION

Negotiation, in the context of DAI, refers to the process by which agents search for (and hopefully find) an agreement [Rosenschein and Zlotkin, 1994]. The underlying assumptions on which the negotiation approach is based are that agents are autonomous and that the type of inter-agent comparisons made in the hierarchical approach are not justified. In economics, the concept of negotiation or bargaining arises whenever the preconditions of a perfectly competitive market are not satisfied.
In a market with many buyers, many sellers, and undifferentiated goods, a Walrasian equilibrium price will emerge, as discussed elsewhere in this thesis. However, in cases in which a competitive market cannot be said to exist (e.g., bilateral exchange of a unique good between two agents), competitive forces cannot be relied on to determine an equilibrium price [Rasmusen, 1989 p. 227].

Much of the early work on negotiation in multiagent systems (e.g., [Smith and Davis, 1981]) relied on ad hoc negotiation protocols [Jennings, Sycara and Woolridge, 1998]. More recent work, however, has drawn heavily on theory from non-cooperative game theory [Rosenschein and Zlotkin, 1994], [Sandholm and Lesser, 1995b], microeconomics [Sandholm and Lesser, 1995a], [Clearwater, 1996], multi-attribute utility theory [Sycara, 1989], and speech-act theory [Chang and Woo, 1994], [Shoham, 1993] to structure the process by which agents negotiate an agreement. Muller [1996] provides a more detailed overview of different approaches to negotiation in DAI.

Given the lack of a global utility function by which to evaluate outcomes, a central question in negotiation research is what constitutes a good negotiated outcome between agents. In addition to specifying the details of a negotiation protocol, theories of negotiation typically provide a means of characterizing the quality and stability of the negotiation process. For example, Rosenschein and Zlotkin [1994] use game theory to identify five design goals for negotiation mechanisms: efficiency, stability, simplicity, distribution, and symmetry. Of these, efficiency and symmetry refer to the social welfare characteristics of the mechanism; the remainder of the design goals refer to the mechanism's computational characteristics.

In any system in which inter-agent comparisons of utility are not permitted (such as the runway assignment example used in Section 3.3.1), it is very difficult or impossible to define a socially optimal outcome.
Clearly, a minimal requirement is that a good negotiated outcome be Pareto optimal; that is, no agent should be able to improve its position without adversely affecting other agents. A stronger requirement is that the outcome be fair in some sense. In the context of a two-agent negotiation, fairness implies that the surplus created by the negotiated settlement is divided evenly between the agents. The classic criterion for fairness selects the solution that maximizes the product of the agents' utilities [Nash, 1950]. It is important to note that fairness and global optimality are not synonymous. To illustrate, consider the table of outcomes generated by a cooperative delivery problem in [Rosenschein and Zlotkin, 1994]:

Deal   Utility for Agent 1   Utility for Agent 2   Product of utilities   Sum of utilities
A      0                     10                    0                      10
B      1                     3                     3                      4
C      2                     2                     4                      4
D      3                     1                     3                      4
E      10                    0                     0                      10

The deal that maximizes the product of the utilities is Deal C. However, if global optimality is a linear function of agent utilities, then the globally optimal deal is one which maximizes the sum of agent utilities (Deal A or Deal E). The lack of equivalence between a fair outcome and a globally optimal outcome creates an important issue of fit between the environment of the problem and the coordination mechanism used by a DAI system. If the delivery example can be characterized as a global environment, then the choice of a fair (in the Nash sense) negotiation mechanism leads to an outcome that is clearly sub-optimal. This limitation of the Nash criterion applies in any situation in which the payoffs to agents are not zero-sum.

MARKETS

Markets have become an increasingly popular aggregation mechanism for DAI systems, especially in the context of resource allocation problems. Examples include:

• Clearwater [1996] contains a number of papers describing the use of market-based resource allocation systems;
• the market-oriented programming group at the University of Michigan [Walsh et al., 1998], [Wellman, 1996], [Mullen and Wellman, 1995] has examined many theoretical and practical issues surrounding the use of markets, and specifically auctions, for coordination in MAS environments;
• the collection on computational ecologies in [Huberman, 1988] describes the use of agents and markets to build predictive simulations.

The popularity of markets has much to do with the existence of a well-established theoretical foundation in economics and the ease with which the atomic units of market participation (self-interested agents) can be implemented in a heterogeneous and distributed computing environment [Jennings, Sycara and Woolridge, 1998].

The essential difference between "negotiation-based" and "market-based" approaches as they are characterized here is the method of price discovery. In negotiation, price is determined through one or more rounds of bilateral bidding and counter-bidding. Neither agent is assumed to be a price taker, and therefore complex negotiation strategies can evolve in which one agent extracts a disproportionate share of the surplus created by the transaction. In markets, more than one agent is typically involved and thus a form of auction can be used to determine a transaction price. In extreme cases, in which the preconditions for a fully competitive market are satisfied, general equilibrium theory can be used to predict the market outcomes. Under the assumptions of fully competitive markets, agents have no control over prices: both buyers and sellers are price takers.

Despite the elegance and simplicity of economic metaphors, many of the same problems that affect real markets affect the simulated markets used in DAI. The existence of externalities between agents or interdependencies between goods can lead to pricing distortions and market failure.
To return to the example of runway allocation, an externality could take the form of turbulence created by large jets. If an aircraft with a large jet wash makes certain runways temporarily unusable by smaller aircraft, then the cost to the smaller aircraft must be factored into the price the large aircraft pays for landing. However, the process of internalizing the externalities (e.g., through taxes or other pricing schemes) adds significant complexity to the market model.

A second problem that has attracted attention in the literature is the issue of stability. Thomas and Sycara [1998] show that under certain circumstances, prices can oscillate as agents abandon some resources in favor of others which they believe to be underutilized. This phenomenon is exemplified by the "El Farol" problem described by Arthur [1994]. In the El Farol problem, the utility to agents of attending a bar on a particular night is a function of the number of other agents in attendance. Specifically, agents prefer the bar when its utilization is below a certain threshold (say 60%). The number of agents that actually attend on a given night tends to oscillate wildly as agents attempt to predict the aggregate behaviors of other agents. Of course, in well-behaved markets in which agents' valuations are transitive and independent and prices rise monotonically, such instability does not occur.

3.3.3 AGENT CAPABILITIES

The third dimension considered in the DAI taxonomy is the capability of the agents. At the core of DAI is the notion of weak agency.
Woolridge and Jennings [1994a] identify four properties that any computer-based process should exhibit to be considered an agent:

• autonomy: the agent should have control over its own internal state and behavior;
• social ability: the ability to communicate in some way with other agents is the sine qua non of DAI;
• situatedness or reactivity: the agent should receive sensory input from its environment and be able to act to change its environment; and,
• flexibility: agents should exhibit goal-directed behavior in response to their changing environment.

Of course, more complex systems require a stronger notion of agency. For example, in order for agents to be able to plan in the future, they require beliefs about their current state and preferences over future states [Shoham, 1993]. Computer-based processes exhibiting strong agency are often classified as BDI (belief-desire-intention) agents [Georgeff and Rao, 1998], [Fikes and Rao, 1991].

Agents in multiagent systems can have very simple local utility functions (e.g., [Steiglitz, Honig and Cohen, 1996]) or be maximizers of complex expected utility functions over a planning horizon. The agents may be simple consumers of resources or may be able to transform one resource into another. The agents may be trustworthy or untrustworthy, selfish or altruistic, omniscient or ignorant. The critical issue considered here is not what capabilities an agent possesses, but whether the capabilities are consistent with the properties of the agent's environment and coordination mechanisms. A more detailed classification of agent properties can be found in [Goodwin, 1993].

3.3.4 CLASSIFICATION OF APPROACHES

In this section, a number of DAI systems for resource allocation described in the literature are situated with respect to the dimensions of the taxonomy. Figure 3.5 shows a tabular representation of the environment and coordination mechanism dimensions.
Certain cells of the table contain references to exemplars from the literature on multiagent systems. Two levels of fit are considered: the fit of the coordination mechanism to the environment and the fit of the agent's capabilities to the coordination mechanism.

Coordination mechanism   Global utility                                    Non-global utility
market                   NASA EOS system [Schwartz and Kraus, 1997]        UMDL [Mullen and Wellman, 1995]
negotiation              Contract Net Protocol [Smith and Davis, 1981]     Automated contracting system [Sandholm and Lesser, 1995a]
team utility             Multi-agent DTP [Boutilier, 1996]
social laws/SOPs         [Shoham and Tennenholtz, 1992]
hierarchy                Information agents [Moore et al., 1997]           OASIS [Georgeff and Rao, 1998]

FIGURE 3.5 Classification of DAI systems with respect to the environment and coordination mechanism dimensions: The shaded area corresponds to "coerced" systems (see the discussion of fit below).

ISSUES OF FIT

To illustrate the issue of environment/coordination mechanism fit, consider the shaded cells in the right-hand column of the table in Figure 3.5. The shaded cells correspond to situations in which the environment cannot be said to have a global utility function; however, the coordination mechanisms associated with the cells are centralized and therefore presuppose a measure of global utility. In such situations, it may be possible to coerce the agents into accepting a proxy global utility function. For example, in the OASIS multiagent air traffic control system described in [Georgeff and Rao, 1998], a hierarchical coordination mechanism (in the form of "coordinator" and "sequencer" modules) seeks to minimize the total lateness of all aircraft. The utility function can be considered coerced because it is questionable whether the utilities of the agents representing individual aircraft should be pooled and compared in this manner.
An airline with a large stake in maintaining its reputation for always being on time may value lateness in a different manner than an airline that is most interested in providing low-cost fares to holiday destinations. Moreover, within the aircraft, each passenger has his or her own mapping from landing time to utility.

At the other extreme, it is possible to model an environment that possesses a global utility function using autonomous, self-interested agents and negotiation or market protocols. An example of this approach is the system for allocating large data sets to servers described in [Schwartz and Kraus, 1997]. The environment in which the multiagent system operates is classified as global because the data sources and servers are owned and maintained by NASA as part of the Earth Observing System (EOS). Indeed, the dynamic, market-based allocation system is proposed as a replacement for an existing centralized allocation system. The global utility function to be maximized is that of NASA, rather than that of the data sets and servers.

The advantage of using agents rather than centralized hierarchical approaches for problems with well-defined global utility functions is that the agent metaphor provides a convenient means of decomposing large, complex systems. There are, however, two dangers associated with using highly autonomous coordination protocols in global environments. The first, as discussed above, is that non-global coordination mechanisms do not automatically lead to outcomes that maximize global utility. The second danger is that the agents themselves will be more complex than what is required by the environment.

To illustrate the latter danger, consider the EOS system. Despite the existence of a global utility function in the DAI environment, no assumptions are made regarding the guile of the agents created for the allocation system.
A second-price sealed-bid (Vickrey) auction is used to induce the agents to reveal their true bidding information, and penalties are defined for situations in which agents breach the terms of the bidding protocol. Although the resulting system has the virtue of generalizability, the capabilities of the agents and the complexity of the auction mechanism can be considered excessive for the problem at hand. In cooperative problem solving environments, there is nothing to prevent the system designer from dictating certain agent-level behaviors (for example, truthful price revelation). According to the criteria proposed by [Rosenschein and Zlotkin, 1994], the primary objectives of the system designer are the quality and stability of the solution and the efficiency of the mechanism used to compute it. The "realism" of the agents is typically not a concern in CPS.

CLASSIFICATION OF THE PROPOSED APPROACH

The market-based approach proposed in this thesis can be classified along the environment, coordination mechanism, and agent capability dimensions in the following manner:

• Environment = global: The objective of the resource allocation system is to maximize total revenue by processing and shipping individual products. The agents themselves are merely artifacts introduced to facilitate the process. Since the division of resources and a numeraire good (money) between agents is zero-sum, global utility (defined as the sum of agent utilities) is independent of the precise terms of the deals struck between agents. The only requirement for an optimal equilibrium is that all Pareto optimal exchanges that can occur do occur. The key assumptions and rationale underlying the assumptions are presented in greater detail in Section 4.1.

• Coordination mechanism = market: The exchange of resource goods between agents occurs in a continuous reverse auction.
A reverse auction differs from bilateral negotiation in that an auctioneer (or in this case, a simple price convention) determines the transfer price of the good being exchanged. A description of the market protocol is provided in Section 4.3.

• Agent capabilities = rational with restricted autonomy of action: A fundamental requirement for any market to achieve a Pareto optimal equilibrium is strict rationality on the part of the participating agents. Indeed, the relationship between maximizing behavior by rational agents and Pareto optimality is tautological. However, it is important to note that rationality is defined in relation to not only a particular set of beliefs, but also to a particular set of capabilities. Thus, although it may be rational (in a colloquial sense) for a particular agent to buy a contract for which it has no requirement in the expectation of making a speculator's profit, agents in the proposed system are not endowed with the capability to reason about speculation. A similar situation arises in the method used for price discovery: The bidding agent is asked to make a rational choice between buying a contract for a resource at a particular price and not buying it. The agent has no capability to lie, barter, steal, or otherwise change the structure of its simple decision problem. It is in this sense that the autonomy of the agents is restricted.

The objective of this chapter has been to review the theoretical foundations of both the motivating problem (manufacturing scheduling) and the proposed solution (a market-based system for cooperative problem solving). In Chapter 4, the focus shifts from describing theory to describing the technical challenges posed by the proposed framework and the solutions developed as part of the thesis.
CHAPTER 4: A FRAMEWORK FOR MARKET-BASED RESOURCE ALLOCATION

As in other forms of cooperative problem solving, the objective of this thesis is to provide a method for solving large, well-defined problems by decomposing the global-level problem into smaller problems that can be computed independently and simultaneously. Given the size and complexity of the manufacturing problem, it is clear that some form of decomposition must be used. However, as discussed in Chapter 2, the fundamental problem with the conventional decomposition used in manufacturing is that the sub-problems into which the problem is decomposed are not independent. When a large degree of dependence exists between elements of a decomposed problem, the dependence must either be ignored (at the expense of solution quality) or resolved (thereby nullifying many of the advantages of decomposing the problem in the first place). Markets offer a form of decomposition in which a high degree of interdependence can be transformed into complete independence through the use of a price mechanism.

The framework for the market-based approach to problem solving proposed in this research is shown in Figure 4.1.

[Figure 4.1, two panels. 1) Conventional decomposition of the manufacturing problem: problem, MPS, MRP, CRP, scheduling, outcome (global-level decision process). 2) Market-based decomposition: problem, rational choice, outcome (global-level decision process).]

FIGURE 4.1 The market-based approach to decomposition (adapted from [Coleman, 1990]): The global-level problem (production planning) is decomposed into many agent-level planning problems. The agents make decisions that are strictly rational given the information they have available. Finally, a market is used to resolve contention for resources and aggregate the agent-level plans into a global-level outcome.

Unlike the sequence of stages conventionally used to address the manufacturing problem, market-based decomposition does not occur along functional lines. Instead, the metaphor of a large number of agents interacting within an economy is used to provide a basic blueprint for the system. Following Coleman's [1990] model of social outcomes, the framework is divided into three distinct phases:

1. Decomposition: The first phase involves transformation of a global problem into a number of agent-level problems. In other words, the task is to model a complex system using agent-based constructs such as beliefs, desires, capabilities, and so on. The result of the decomposition phase is that each part and machine in the physical system is modeled and represented by a computer-based agent.

2. Rational choice: The second phase involves the off-line computation of each agent's willingness to pay for scarce production resources. In the case of an agent representing a part to be manufactured (a part agent), the rational choice phase requires the agent to plan an optimal trajectory through the production system and determine its reservation prices for various combinations of resources in different situations. To determine the optimal trajectory, the agent must consider a number of factors such as the costs and rewards facing the object, the probabilistic effects of various actions and events, and the unknown prices of the resources.

3. Aggregation: In the aggregation phase, the agents use the reservation prices computed in the rational choice phase to participate in a market for production resources. The role of the aggregation mechanism (in this case, a continuous reverse auction) is to ensure that each production resource is ultimately owned by the agent that is willing to pay the most for it. In this way, the self-interested interaction of the agents within the market determines the solution to the manufacturing problem.
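As a rough illustration of the goal stated for the aggregation phase, the following Python sketch assigns each resource to the agent with the highest reservation price for it. It is a deliberately simplified stand-in and not the continuous reverse auction described in Section 4.3: it ignores bundles of resources and the transfer price convention, and it breaks ties in favor of the first agent encountered. The function name and the agent and resource labels are hypothetical.

```python
from typing import Dict


def allocate_to_highest_valuation(
    reservation_prices: Dict[str, Dict[str, float]]
) -> Dict[str, str]:
    """Map each resource to the agent with the highest reservation price for it.

    reservation_prices[agent][resource] is the most that agent would pay.
    Assumption: prices are for single resources, not bundles.
    """
    owners: Dict[str, str] = {}
    for agent, prices in reservation_prices.items():
        for resource, price in prices.items():
            current = owners.get(resource)
            # Replace the current owner only if this agent values the resource strictly more.
            if current is None or price > reservation_prices[current][resource]:
                owners[resource] = agent
    return owners


if __name__ == "__main__":
    prices = {
        "part_A": {"mill@t1": 12.0, "drill@t2": 3.0},
        "part_B": {"mill@t1": 8.0, "drill@t2": 5.0},
    }
    print(allocate_to_highest_valuation(prices))
```

A market reaches this kind of end state through exchanges between agents rather than through a central pass over everyone's prices, which is the point of the decomposition. The centralized loop here only shows the target condition.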
The objective in the remainder of this chapter is to identify the critical elements of the market-based approach and describe the degree to which the preconditions to the economic theory have been satisfied in the prototype system. Figure 4.2 provides a simplified roadmap of the major milestones encountered in moving from problem definition to a market-based system. First, as described in Section 4.1, the global manufacturing problem is decomposed into agent-level problems and modeled using the P-STRIPS knowledge representation language. Next, the P-STRIPS representation is transformed into a special data structure called a policy tree. The structured dynamic programming algorithm presented in Section 4.2 is used to iteratively refine the policy tree until it contains the agent's optimal course of action for all its possible states. In addition to containing the agent's optimal plan, the policy tree contains the agent's reservation price for all relevant bundles of resource goods. This price information is extracted into a price list that the agent uses in the final stage of the process: exchanging contracts with other agents. Finally, Section 4.3 describes the continuous reverse auction protocol that aggregates the agent-level behaviors into a coherent global solution.

[Figure 4.2, flow of stages: manufacturing problem; model objects in the manufacturing environment using P-STRIPS (actions, effects, rewards, costs); build core policy tree (temporal transitions, state-dependent transitions, terminal reward estimates); improve policy using SDP algorithm; extract agent's reservation price for all relevant resource bundles; auction for production resources.]

FIGURE 4.2 Major milestones of the market-based approach.

4.1 PROBLEM DECOMPOSITION

A problem that one typically encounters when modeling industrial-scale systems is their size and complexity. The agent-based decomposition advocated in this thesis serves two purposes:

1. Create independent subproblems: Market-based decomposition splits the large, complex problem into many smaller problems that can be solved independently and simultaneously. The independence property of market-based agents is discussed in greater detail in subsequent sections.

2. Simplify the modeling of large systems: Agent-based decomposition provides modelers with a conceptual tool to manage the otherwise overwhelming complexity of large manufacturing systems. In this regard, the agent-oriented approach to modeling relies on many of the same "ease-of-modeling" arguments as object-oriented modeling (e.g., [Jacobson, Jacobson and Ericsson, 1995], [Shoham, 1993]). Specifically, the assumption is made that it is easier for a modeler to express the states, capabilities, and goals of a particular part or machine in isolation than it is to express the joint state, capability, and goal of all parts and machines in a manufacturing facility. An empirical test of the ease-of-modeling assumption is beyond the scope of the thesis, however.

4.1.1 APPROACHES TO MDP DECOMPOSITION

In stochastic optimization, the objective is to create a contingency plan (or policy) for every possible state of the system. In this type of optimization, no fixed sequence of actions is generated and thus there is no requirement to assume that all actions have deterministic outcomes. Stochastic optimization is especially well suited to manufacturing environments in which broken bits, delays in machine setups, late arrival of raw materials, and so on are facts of life. The ability to reason over such contingencies is essential for high-quality and robust plans. Despite its power, stochastic optimization is difficult to exploit in industrial environments because the Markov Decision Process (MDP) representations of state spaces become unmanageably large for even small problem instances.
One approach to coping with large MDPs in manufacturing and other resource allocation environments is to decompose the large problem into a number of smaller, loosely-coupled subproblems that induce their own subMDPs (e.g., [Meuleau et al., 1998], [Dean and Lin, 1995]). To illustrate the potential payoff from MDP decomposition, consider a problem domain consisting of n completely independent subproblems. Assume that the number of states in the state space for subproblem $i \in \{1, \ldots, n\}$ is $|S_i|$. If the independence between the subproblems is not recognized and the entire system is formulated as a single MDP, the size of the joint state space is $|S_1| \times |S_2| \times \cdots \times |S_n|$. In contrast, if the independence property is exploited and each subproblem is solved in isolation, the total number of states considered is only $|S_1| + |S_2| + \cdots + |S_n|$. Thus, methods of decomposition and solution that exploit independence can result in an exponential decrease in problem size.

The difficulty in decomposing MDPs in practice arises from the fact that contention for finite resources creates dependencies between otherwise independent subproblems. To illustrate, consider the case of two numerically-controlled milling workstations. Under ideal circumstances, each workstation can process its own jobs and remain oblivious to the operation of the other machine. If, however, both workstations are fed raw materials by the same industrial robot, then the workstations cease to be independent. Specifically, the optimal policy of one workstation depends on the availability of the industrial robot and therefore on the optimal policy of the other workstation.

In [Meuleau et al., 1998], the problem of joint dependency on resources is addressed in the following way:

1. solve each subproblem in isolation, ignoring the issue of resource contention;
2. use the value functions provided by the subproblem solutions to guide the heuristic allocation of resources to jobs.
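The two steps above can be sketched in Python as follows. This is only an illustration of the general idea of letting subproblem value functions guide allocation: the greedy marginal-value rule used here is an assumption chosen for simplicity and should not be read as the specific heuristic of [Meuleau et al., 1998]. The function name, the job labels, and the tabulated values are hypothetical, and the value tables are assumed to have been produced by solving each subproblem in isolation for each possible resource holding.

```python
from typing import Dict, List, Optional


def greedy_allocate(value_functions: Dict[str, List[float]], units: int) -> Dict[str, int]:
    """Allocate identical resource units one at a time by largest marginal value.

    value_functions[job][k] is the (assumed precomputed) value of the job's
    isolated optimal policy when the job holds k units of the resource.
    """
    allocation = {job: 0 for job in value_functions}
    for _ in range(units):
        best_job: Optional[str] = None
        best_gain = 0.0
        for job, values in value_functions.items():
            k = allocation[job]
            if k + 1 < len(values):
                gain = values[k + 1] - values[k]
                if gain > best_gain:
                    best_job, best_gain = job, gain
        if best_job is None:
            break  # no job gains anything from another unit
        allocation[best_job] += 1
    return allocation


if __name__ == "__main__":
    tables = {"job_1": [0.0, 10.0, 14.0, 15.0], "job_2": [0.0, 6.0, 11.0, 12.0]}
    print(greedy_allocate(tables, units=3))
```

Because the rule is myopic, the quality of the result depends on the shape of the value tables, which is consistent with the sensitivity to the assignment heuristic noted in the discussion that follows.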
The key to this particular approach is that the optimal policies generated for the subproblems can provide hints as to what might be a good global allocation of resources. Although the approach achieves an exponential decrease in problem size and provides solutions very quickly, the quality of the solution is sensitive to the quality of the assignment heuristic and to the amount of resource contention in the problem environment.

The market-based approach to MDP decomposition uses prices over bundles of resources to eliminate the dependencies created by resource contention. The cost of achieving independence in this manner is that the subproblems must incorporate the price information and are therefore much larger than in the non-market-based case. In order to create sufficient information for the market-based allocation of resources, the subMDPs treat the ownership of resources as random variables, rather than as decision variables. To illustrate the difference between market-based and non-market-based MDP decomposition for resource allocation problems, consider the form of the optimal policy generated in the non-market-based case:

    the optimal allocation of resources to subproblem $i$ is vector $X_i$; the value of the optimal policy is $V_i$.

In contrast, the market-based decomposition requires a policy of the form:

    given an allocation of resources $X_i \in \Omega_i$, where $\Omega_i$ is the set of all valid allocations, the value of the optimal policy is $V_i$.

As might be expected, creating a contingency plan over all possible allocations leads to a massive increase in the size of the state space of the subMDPs. In practice, the explosion is mitigated somewhat by restricting the subMDP to relevant bundles of resources, as discussed in Section 4.2.4. However, the effort required to generate a "reservation price" for each relevant resource bundle is considerable.
The advantage of the market-based decomposition is that the optimal policy for each subproblem is dependent only on the local characteristics of the subproblem and the vector of prices for the resources. In this way, full subMDP independence is achieved.

4.1.2 MARKET-BASED DECOMPOSITION

In the decomposition used in this thesis, agents are used to represent identifiable real-world objects in the manufacturing environment, such as parts to be processed, production machines, and so on. An important feature of this particular form of agent-based decomposition is that the term "represent" is used in two senses:
1. Modeling: Agents are used to model objects in the manufacturing environment. Thus, an agent consists of data structures that contain information about a real-world object such as
   a. state: properties of the object, such as its physical location and completion status;
   b. capabilities: the actions that the object can perform and the effects of the actions; and,
   c. ascribed goals: the sources of utility and disutility that the object encounters in the manufacturing facility.
2. Agency: Agents are called upon to make decisions on behalf of the real-world objects they represent. Thus, an agent representing a part may decide when and how the part should wend its way through the production system. To make these decisions, the agents solve stochastic optimization problems and thus there is a one-to-one correspondence between agents and the subproblem construct used in Section 4.1.1.

It is important to emphasize that, despite the use of the term "agency", the type of agent-theoretic issues that typically arise in the economic analysis of institutions are absent from this particular formulation. Specifically, agency theory (e.g., [Alchian and Demsetz, 1972]) posits a contractual relationship between two self-interested entities: a principal and an agent.
Given the divergent goals and different information sets possessed by the two entities, agency theory addresses the problem of writing contracts such that agency costs (e.g., loafing, monitoring, misaligned incentives) are minimized [Gurbaxani and Whang, 1992]. In the form of agency considered here, however, such issues do not exist. Instead, the computerized agents are designed and implemented so that they faithfully pursue the interests of the objects they represent.

4.1.3 DEFINITION OF THE GLOBAL PROBLEM

For the purpose of this thesis, the manufacturing problem is defined in the following way: A manufacturing facility consists of a number of production resources $\{M_1, \dots, M_m\}$. Although the production resources are typically referred to as "machines", $M_i$ could be any resource, such as a jig, automated guided vehicle, or even a person. The manufacturing facility exists for the purpose of transforming raw materials into "parts"¹ $\{J_1, \dots, J_n\}$ through one or more value-added processing "operations". An example of a part might be a component of fuselage from an aircraft wing that requires five operations: milling, grinding, drilling, polishing, and quality assurance. Once a part leaves the manufacturing facility, it is "shipped" to either the final customer or to another stage in the supply chain. In either case, the selling/transfer price is assumed to be known and exogenously determined. Moreover, the selling/transfer price is typically a function of the time at which the part ships. As a result, lateness penalties and any other time-dependencies are reflected in the payoff function for each part. The basic elements of the modeling approach are shown in Figure 4.3. The objective of the market-based system is to allocate production resources to parts such that the total profit of the manufacturing facility is maximized.
The profit-based formulation of the problem subsumes many of the objective functions used in conventional scheduling (e.g., minimizing late or tardy jobs) and is superior to other proxy objective functions (e.g., maximizing machine utilization). The difficulty that arises in the profit maximization formulation occurs because each production resource has finite capacity. Thus, by deciding to schedule a part on a particular machine at a particular time, the manufacturing facility incurs an additional cost: the opportunity cost of using the resource in an alternate manner. Indeed, the global objective of maximizing profit in the facility is equivalent to minimizing the opportunity cost of all scheduling decisions. In the following sections, the decomposition of a manufacturing environment into agents representing machines ("machine-agents") and agents representing parts to be manufactured ("part-agents") is described in detail. In Section 4.1.6, the properties of the agents are stated in a more formal manner.

1 Parts are designated by the letter "J" (for jobs) to remain consistent with the standard notation used in the scheduling literature.

FIGURE 4.3: Agent-based decomposition in a manufacturing environment. Objects in the physical world (such as parts and machines) are represented by computer-based agents. (Diagram labels: manufacturing environment; market-based system for resource allocation; part-agent; machine-agent.)

4.1.4 MODELING MACHINES

The types of machines considered in this research include milling machines, drill presses, and the like. Each machine is assumed to be autonomous (i.e., there is no complementarity between the machine and an operator) and setup costs are assumed to be zero.
Both of these assumptions are used to simplify exposition of the approach; however, as discussed in Section 4.1.7, it is possible to internalize complementarities and other economic externalities using the market-based constructs described in the thesis.

2 Maximization of machine utilization, although common in practice, ignores Goldratt and Cox's [1984] observation that no revenue is realized by running machines. Instead, revenue is realized exclusively by shipping products.

PROPERTIES OF MACHINE-AGENTS

Time plays an important role in scheduling in general and thus time is an important element in the formulation of machine-agents. The relationship between time and production resources is characterized as follows:
1. Discrete time intervals: The availability of a resource is expressed in terms of discrete and atomic intervals of time. For example, the interval T1 could be defined as the period between 05 September, 2000, at 09:10 GMT and 05 September, 2000, at 09:15 GMT. In a similar manner, the interval T2 could be defined as the five-minute interval that immediately follows T1, and so on.
2. Exclusivity: A resource may not be shared during a particular unit of time. For example, if a particular part is being processed on a particular machine at a particular time, then no other part can be processed on that same machine at the same time.
3. Uniqueness: In general, no two units of processing time on a machine are directly interchangeable. Each unit of time on each machine represents a unique "good" (in the economic sense) and thus contracts for resources make explicit reference to both the time and machine for which the contract is valid (e.g., Contract(M1, T5)).
4. Perishability: Once a time interval has passed, all contracts for processing during that unit of time are worthless. In other words, contracts for processing time are perishable goods that expire at a known time.
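The four time-related properties above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the representation used in the thesis: the `Contract` and `ContractLedger` names, the integer interval index, and the `now` argument are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """A unique good: one discrete interval of processing time on one machine."""
    machine: str
    interval: int  # index of a discrete, atomic time interval (assumed encoding)


class ContractLedger:
    """Tracks who owns each contract (hypothetical helper, not from the thesis)."""

    def __init__(self):
        self._owner = {}  # Contract -> name of the owning agent

    def assign(self, contract, agent, now):
        # Perishability: a contract for an interval that has passed is worthless.
        if contract.interval < now:
            raise ValueError("contract has expired")
        # Exclusivity: at most one owner per (machine, interval) pair.
        if contract in self._owner:
            raise ValueError("contract already owned")
        self._owner[contract] = agent

    def transfer(self, contract, seller, buyer, now):
        if contract.interval < now:
            raise ValueError("contract has expired")
        if self._owner.get(contract) != seller:
            raise ValueError("seller does not own this contract")
        self._owner[contract] = buyer

    def owner(self, contract):
        return self._owner.get(contract)


ledger = ContractLedger()
ledger.assign(Contract("M1", 5), "J1", now=1)
# Uniqueness: the same machine at a different interval is a different good.
print(Contract("M1", 5) == Contract("M1", 6))  # False
print(ledger.owner(Contract("M1", 5)))         # J1
```

Making the contract an immutable value keyed on both machine and interval means that exclusivity reduces to a dictionary lookup, which is one reason this encoding was chosen for the sketch.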
In addition to time, there are a number of modeling issues concerning the costs and configuration of the production resources:
1. Fixed configuration in the short-run: In microeconomic analysis, a common assumption is that the aggregate supply of a good is fixed in the short-run. Indeed, the short-run is defined as the period in which there is insufficient time to make changes to the supply of inputs [Lipsey, Purvis and Steiner, 1985]. The corresponding assumption made in this thesis is that the supply of production resources within the manufacturing facility is known and stable over the entire planning horizon. If the configuration of the facility changes (e.g., new machines are brought on-line), new schedules must be generated.
2. Sunk fixed costs: A common practice in cost accounting is to make a distinction between a production resource's fixed costs (e.g., procurement and installation) and its variable costs (e.g., energy requirements, wear, consumables, maintenance) [Deakin and Maher, 1987]. The important feature of fixed costs is that they have already been incurred and are therefore sunk. Variable costs, in contrast, vary with production. Given that the task at hand is to allocate resources to parts in the short-run, fixed costs can safely be ignored.
3. Independence: The independence property of resource goods is a corollary to the uniqueness property above: each unique processing interval can have its own independently determined cost and market price. The independence between goods is readily observable in other environments in which time plays an important role. For example, the price a spectator is willing to pay for the exclusive use of a seat in a stadium depends critically not only on the seat's physical location, but also its temporal location. Thus, the rental price of the seat depends on whether there is an event in the stadium at that particular time and also whether the event is a boat show or the Superbowl.
Although it is certainly possible for a consumer's valuations of goods to be correlated, this type of interdependence arises from consumer preference, not from any inherent dependence relationship between the goods.

THE DECISION PROBLEM FOR MACHINE-AGENTS

Given the foregoing, a machine-agent $i$ can be seen to face a real-valued variable cost, $c_{i,t}$, whenever it performs a processing operation during the interval of time $t \in \{1, \dots, T\}$. Typically, the per-unit-time cost is stationary with respect to time (i.e., it does not change as a function of time in response to, for example, inflation); however, the cost may depend on many other factors such as the nature of the machining operation, the hardness of the part to be machined, the feed rate used, and so on. If the machine-agent chooses not to process a part during the time interval $t$, it incurs no variable cost but forgoes any revenue it could have realized by selling a contract for the interval. To capture the binary decision facing the machine-agent, the variable $x_{i,t} \in \{0, 1\}$ can be used: $x_{i,t}$ takes the value 1 when the machine-agent has contracted to perform processing on a part for price $p_{i,t}$ and zero otherwise. Under this formulation, a machine-agent's objective function can be written:

$$\text{maximize} \sum_{t=1}^{T} \left( x_{i,t}\, p_{i,t} - x_{i,t}\, c_{i,t} \right) \qquad \text{(Eq. 4.1)}$$

Thus, the basic decision problem faced by a machine-agent is whether to sell contracts for its machining time at prices bid by other agents. Note that this decision problem is not equivalent to the much broader notion of profit maximization since the machine-agent has no ability to maximize its revenue by maximizing the value of $p_{i,t}$. For example, a machine-agent is not endowed with the ability to initiate a call auction for its processing times. Nor is a machine-agent capable of speculating about market prices and refusing bids in the expectation of higher bids to come.
Instead, a machine-agent, as it is defined here, practices a very restricted form of rationality: if the bid price for a block of processing time, $p_{i,t}$, exceeds the variable cost of operating during that block, then the agent accepts the bid. There are at least two reasons for implementing machine-agent rationality in this way. First, as discussed in greater detail in Section, the division of economic surplus between buyer and seller is irrelevant in a cooperative problem solving (CPS) environment (recall Section 3.3.1 on page 62). The sole requirement under the first fundamental theorem of welfare economics is that each transaction be Pareto efficient; whether the seller receives the highest possible selling price is irrelevant from the global point of view. The second reason for relying on a simple form of rationality for machine-agents is that the alternative, a revenue-maximizing machine-agent, could lead to gaming behavior by bidding agents and possible market failure.

3 The decision process faced by machine-agents resembles that of a firm engaging in an initial public offering (IPO) through an underwriter. Since the initial seller of the stock has no access to the secondary market, its decision to go public boils down to whether it should accept the underwriter's IPO price or withdraw the IPO. To use a specific example, investors were willing to pay up to $140 for shares of PALM COMPUTING on the day of its IPO. However, PALM COMPUTING itself only received $38 per share in the primary market. Despite the discrepancy, the IPO can be considered rational as long as PALM COMPUTING's original owners believed that the IPO price was fair at the time.

PLANNING WITHOUT MACHINE AGENTS

The formulation of machine-agents is flexible enough to permit the modeling of a large number of production management issues, such as the optimal scheduling of preventative maintenance. However,
in the problems considered in this thesis, such issues are not a consideration. Moreover, the variable cost of each resource (that is, the machine-agent's reservation price) is assumed to be zero. In such circumstances, there is no requirement to include machine-agents in the market-based system. For this reason, the in-depth discussion of modeling issues in Section 4.2 focuses exclusively on part-agents.

4.1.5 MODELING PARTS

Parts refer to physical items that are manufactured in the production facility being scheduled. Typically, a part is created by transforming one or more raw materials into something of value to an end customer. Part-agents are responsible for coordinating all aspects of a part's transformation from a line item on an order to a finished good. To accomplish their coordination task, part-agents enter into contracts with other agents for the provision of production resources.

THE LIFE-CYCLE OF A PART-AGENT

Unlike machine-agents (which represent production resources over relatively long periods), part-agents are ephemeral: they are created when an order arrives and are destroyed when the order is shipped. The different stages in a part-agent's life-cycle are summarized below:
1. Order: Each order that arrives at a manufacturing facility can be viewed as a contract. In exchange for each part in the order, the ordering entity agrees to pay the manufacturing facility a predetermined amount of money.⁴ In most cases, the amount paid by the ordering entity is a function of time. For example, the order may specify the part's due date and a schedule of lateness penalties if the ship date exceeds the due date. The payoff may also be contingent on certain attributes of the part, such as conformance to specifications, finish quality, and so on. As is typically the case in manufacturing firms, the precise way in which the payoff function is determined is exogenous to the scheduling system.
2. Process plan: As discussed in Section 2.2.2, a number of decisions about a part's path through the production facility have been determined prior to the instantiation of a scheduling problem. For example, the part's process plan specifies the materials, operations, and machines required to manufacture the part. Once an order for a particular part arrives, the process plan can be used as a template to provide the skeleton of the part-agent's policy.
3. Holding costs: Holding costs are typically used in manufacturing environments to capture the intuition that resources locked up in work-in-process (WIP) inventory could be put to productive use elsewhere in the firm. In addition to the cost of capital, holding costs are used to capture time-dependent risks, such as spoilage, breakage, deterioration, and obsolescence [Nahmias, 1989]. A part-agent is assessed a holding cost penalty, $h_t$, for each unit of time that the part it represents remains in the system. Typically, the actual per-unit-time charge is a function of the value added to the part. To illustrate, consider the case in which an order has been placed for a part and an agent has been created to represent the part. If no raw materials have been marshalled on the part's behalf and no processing has occurred, then the holding cost is taken to be zero. That is, a part that has not yet been started incurs no holding cost. In contrast, if a part has undergone many operations and is only a single operation away from being shipped, then a much larger per-unit-time holding cost should be used. Naturally, once an item is shipped, it no longer incurs a holding cost.
4. Contracting: Agents are provided with an infrastructure for buying and selling contracts for production resources, and the agents' activity within the market for resource contracts ultimately determines the schedule of the parts they represent. As discussed in Section 4.3, the market is implemented as a continuous double auction, which means that a contract for a particular unit of machining time may be bought and sold by many different agents before it is actually used. The price $p_{i,t}$ that a part-agent pays for a unit of processing time on machine $M_i$ at time $t$ is determined by the interaction of agents in the market.

4 Money is used here to refer to any medium of exchange that is (a) divisible and (b) valued by both parties in a transaction.

MATING PARTS AND ASSEMBLIES

In the general case, the payment received from the ordering entity may depend in a direct manner on the ship date (or some other property) of one or more other parts. For example, a part created as a component of an aircraft wing might not be shippable until it is mated with other parts into an assembly. Under these circumstances, an externality exists between the two parts: the contracting behavior of one part-agent directly affects the payoffs realized by other part-agents. In order to facilitate exposition in this thesis, all parts are assumed to be stand-alone and independent (that is, non-mating). However, as discussed in Section 4.1.7, the market-based framework can also be used to internalize the externalities created by mating parts.

THE DECISION PROBLEM FOR PART-AGENTS

The objective function for a part-agent can be specified in the following way: the goal of the agent is to maximize the expected value of its policy. In most cases, one would expect the final value realized by the part-agents to be positive (if this were not the case on average, the manufacturing facility would not be covering its aggregate variable costs and would ostensibly cease operations). There are two ways in which a part-agent $j$ can increase the revenue side of its profit function:
1. earn its terminal reward, $r_j$, by successfully shipping the part it represents, and
2. sell a contract for machining on machine $M_i$ at time $t$ for price $p_{i,t}$ (in order to sell a particular contract, the agent must first own the contract).

An implication of the price convention introduced and explained in Section is that part-agents sell resources for their indifference price. In other words, if an agent values a resource at $10, then it is indifferent between keeping the resource and selling it to another agent for exactly $10. Given the price convention (combined with the fact that part-agents never start with an endowment of production resources), the revenue opportunities available to the agent through the sale of contracts are exactly zero. That is, since the revenue generated by a sale is exactly offset by a drop in the agent's expected value, no part-agent can deliberately or inadvertently increase its wealth by selling a contract. The ownership of a production resource $i$ at time $t$ by agent $j$ is represented by the variable $x_{i,j,t} \in \{0, 1\}$. Under this formulation, the objective function of a part-agent is to simply maximize its terminal reward net of contract costs and holding costs:

(Eq. 4.2)

The interesting feature of Equation 4.2 is that it is independent of the profit functions of all other agents. Thus, although each part-agent is in direct competition with other part-agents for scarce production resources, the market replaces direct agent-to-agent conflict with a price mechanism. To decide whether to purchase a contract, a part-agent does not need to consider the impact of its decision on other agents in the system since this impact is already embodied in the market price of the resource. As a consequence, the price mechanism permits the agents to make their decisions independent of one another; herein lies the ultimate value of the market-based approach [Hayek, 1945].
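The verbal description of the part-agent's objective (terminal reward net of contract costs and holding costs) can be illustrated with a minimal sketch for a single realized outcome. It is not a transcription of Equation 4.2, and it does not take expectations over uncertain outcomes. The linear lateness penalty, the function names, and all numbers are assumptions made for illustration.

```python
def terminal_reward(ship_time, base_price, due_time, penalty_per_interval):
    """Payoff as a function of ship time, with a linear lateness penalty (assumed form)."""
    lateness = max(0, ship_time - due_time)
    return base_price - penalty_per_interval * lateness


def part_agent_value(reward, contract_prices, holding_costs):
    """Terminal reward net of contract costs and holding costs for one realized outcome."""
    return reward - sum(contract_prices) - sum(holding_costs)


# Hypothetical part: due at interval 5, shipped at interval 6.
reward = terminal_reward(ship_time=6, base_price=100.0, due_time=5,
                         penalty_per_interval=10.0)
value = part_agent_value(
    reward,
    contract_prices=[12.0, 8.0],         # market prices paid for two machine intervals
    holding_costs=[0.0, 1.0, 1.0, 2.0],  # per-interval charge rises as value is added
)
print(reward, value)  # 90.0 66.0
```

Note that nothing in the calculation refers to any other agent: the only channel through which competitors appear is the list of market prices.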
4.1.6 GLOBAL UTILITY AND THE RULES OF MONEY

As discussed in Section 3.2, the first fundamental theorem of welfare economics states that a competitive equilibrium induces an optimal state. In other words, if self-interested, rational agents are permitted to buy and sell resources in a properly functioning market, then the final outcome is a Pareto efficient allocation of resources. The problem with the first theorem in practice is that Pareto efficiency is not a sufficient condition for global optimality unless certain preconditions are satisfied. Thus, in order for the market-based system to converge on a global (rather than merely Pareto) optimal allocation of resources, the agents must be designed such that they satisfy these preconditions. In the following sections, the preconditions are described and justified in the context of the manufacturing problem.

QUASI-LINEAR UTILITY FUNCTIONS

In the objective functions in Equation 4.1 and Equation 4.2, agents are defined as profit maximizers: they select a course of action so that the difference between the revenue they earn and the costs they incur is maximized. A more general formulation of the agent-level problems is to state the agents' objectives in terms of utility maximization. Specifically, the utility $U_j$ for agent $j$ takes the form

$$U_j = M_j + \phi_j(X_j) \qquad \text{(Eq. 4.3)}$$

where $M_j$ is the amount of numeraire good (money) held by the agent and $\phi_j(X_j)$ is the utility the agent receives from its consumption of all other goods in the $L$-good economy, $X_j = \{x_{1,j}, x_{2,j}, \dots, x_{L-1,j}\}$ [Walsh et al., 1998].

The advantage of a quasi-linear utility function is that it permits a distinction to be made between the intrinsic utility that accrues from the ownership of goods, $\phi_j(X_j)$, and the utility that accrues from the ownership of a special numeraire good that exists solely to facilitate inter-agent transactions. The precise form of $\phi_j(X_j)$ can vary from agent to agent. However, the values of the function are determined
However, the values of the function are determined  by solving the agent's M D P for the allocation of goods X . The implication of Equation 4.3 is that T  88  the utility an agent receives from owning goods (e.g., production resources) is direcdy expressible in terms of a quantity of good M a n d vice-versa. It is important to note that inter-agent transfers of good M a r e zero-sum. Thus, from a global point of view, the total amount of good M is the system is constant. That is, like all other goods in the production economy, money is neither created or destroyed during the operation of the market. It is merely exchanged between agents and thus the net effect of transfers of M between agents on global utility is zero. In contrast, the changes in intrinsic utility facilitated'by transfers of M do have an effect on global utility. To illustrate, consider the following example:  A machine, M l is considering the sale of a contract for production at time T l to a part-agent A l . By processing the part, M l will incur a loss of intrinsic utility due to wear and other variable costs. Thus, M l will only agree to sell the contract for processing if the • buyer ( A l ) indemnifies it for its loss of utility. From A l ' s perspective, purchasing the contract for machining increases the probability that it will ship. As a consequence, ownership of the contract results in an increase intrinsic utility. If the increase in intrinsic utility is greater than M i ' s ask price for the contract, then it is rational for A l to purchase the contract by transferring the appropriate quantity of M to M l .  Under the quasi-linear utility function, the exchange is Pareto efficient since M l is at least as well off after the sale of the contract and A l is stricdy better off. Put another way, if the gain in expected profit to A l exceeds the cost of wear and tear to the system to M l , the part represented by A l will be processed on M l at T l .  
RISK NEUTRALITY

Agents are defined to be risk neutral. As such, an agent's expected utility equals the agent's expected monetary value, $V_j$:

$$U_j = V_j \qquad \text{(Eq. 4.4)}$$

The purpose of defining agents as risk-neutral is to establish equivalency between expected utility and expected monetary value (EMV). Whereas Equation 4.3 establishes the existence of an "exchange rate" between money and intrinsic utility, Equation 4.4 states that the exchange rate is independent of the amount of good M held by the agent. Thus, the actual initial endowment of M to the agents can be arbitrarily large or small (as long as it is finite) without affecting the agents' reservation prices for resource goods. In addition, the property of risk neutrality is extended to the system as a whole so that global utility is taken to equal global expected monetary value: $U_G = V_G$.

The absence of wealth effects follows as a consequence of quasi-linear utility for agents. An agent knows its willingness to pay a certain amount of good M in exchange for other goods and is therefore unwilling to buy resources simply because it has money. To illustrate, consider the effect of changing the initial endowment of agent $j$ by some amount $a$:

$$M_j + a + \phi_j(X_j) = U_j + a \qquad \text{(Eq. 4.5)}$$

Since money and utility are expressed in the same units, any transfer of money to other agents results in a decrease in utility for the buying agent. The decrease for the buyer corresponds exactly to the dollar amount of the sale. Thus, for an exchange to be rational, the increase in intrinsic utility, $\phi_j(X_j)$, that accrues to the buyer as a result of the purchase must exceed the price it pays. But since the price the buying agent pays is no less than the selling agent's intrinsic utility for the same good, the net effect of the transfer is that the agent with the highest intrinsic utility for the good gets the good.
The selling agent is no worse off because the good M appears in its utility function and the increase in M resulting from the sale offsets its loss of the good.

ADDITIVE UTILITY

Global utility, $U_G$, is a linear function of the agent utilities, $U_j$:

$$U_G = \sum_j \omega_j U_j \qquad \text{(Eq. 4.6)}$$

Equation 4.6 states that the utility of the manufacturing organization as a whole is a linear function of the utility of each agent $j$ that has been created during decomposition of the system. To better understand the additive property, it is useful to decompose the utility function into its components and recognize that the sum of $M_j$ for all $j$ is a constant, $M$. Rewriting Equation 4.6 yields:

$$U_G = \sum_j \omega_j \left[ M_j + \phi_j(X_j) \right] = M + \sum_j \omega_j\, \phi_j(X_j) \qquad \text{(Eq. 4.7)}$$

To verify the last term of Equation 4.7, recall that the function $\phi_j(X_j)$ is simply the expected value to agent $j$ of owning the allocation of goods $X_j$. Moreover, the expected value for each agent is a function of exogenously determined costs and rewards expressed in monetary units. Specifically, recall that there are two sources of costs: the variable costs of processing incurred by machines and the holding costs incurred by parts. On the revenue side, there is only one real source of revenue: the terminal rewards received by parts for successfully shipping. The important feature of all three sources of cost and revenue is that they are directly attributable to individual parts and machines (and hence attributable to individual agents). Moreover, as established in the preceding sections, the sources of costs and revenues for each agent are independent of other agents.

COMMON NUMERAIRE

The measure of utility for all agents is identical and interchangeable:

$$\omega_j = 1 \quad \forall j \qquad \text{(Eq. 4.8)}$$

The fourth property of agents states that a dollar is a dollar regardless of how the dollar is earned or which agent earns it. In other words, no job has a predefined, non-monetary priority over any other job.
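The additive and common-numeraire properties can be checked on a toy economy. The sketch below assumes unit weights by default, as in Equation 4.8, and uses made-up agent names and figures; it is meant only to show that a pure transfer of money leaves the weighted sum of utilities unchanged when all weights are equal.

```python
def agent_utility(money, intrinsic):
    """Quasi-linear utility: money held plus intrinsic utility of the goods owned."""
    return money + intrinsic


def global_utility(agents, weights=None):
    """Weighted sum of agent utilities; unit weights give the common-numeraire case."""
    if weights is None:
        weights = {name: 1.0 for name in agents}
    return sum(weights[name] * agent_utility(money, intrinsic)
               for name, (money, intrinsic) in agents.items())


def transfer_money(agents, payer, payee, amount):
    """Return a new state after a zero-sum transfer of the numeraire good."""
    updated = dict(agents)
    payer_money, payer_intrinsic = updated[payer]
    payee_money, payee_intrinsic = updated[payee]
    updated[payer] = (payer_money - amount, payer_intrinsic)
    updated[payee] = (payee_money + amount, payee_intrinsic)
    return updated


# Hypothetical agents: name -> (money held, intrinsic utility of goods owned).
agents = {"A1": (50.0, 10.0), "A2": (20.0, 30.0)}
before = global_utility(agents)
after = global_utility(transfer_money(agents, "A1", "A2", 15.0))
print(before, after)  # 110.0 110.0
```

With unequal weights the same transfer would change the total, which is why the common-numeraire property is needed alongside additivity.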
FROM PARETO OPTIMALITY TO GLOBAL OPTIMALITY

Combining the four properties above with the first fundamental theorem of welfare economics yields the following result:

$$\max(U_G) = \max(V_G) = \max \sum_j V_j = \sum_j \max(V_j) \qquad \text{(Eq. 4.9)}$$

The final term in Equation 4.9 restates the first fundamental theorem of welfare economics and is therefore critical: the utility of the system as a whole can be maximized as a consequence of the maximizing behavior of each independent agent $j$. The price mechanism and the requirement that all exchanges be Pareto efficient ensure that the increase in utility for the buying agent is greater than the price paid to the seller in terms of the common numeraire good, M. As a consequence, the maximizing behavior of the agents leads to a strictly monotonic increase in global utility.

4.1.7 CAVEATS AND SCOPE

Before discussing the details of a system that implements Equation 4.9, two lurking issues regarding the "rules of money" and the properties of agents should be addressed. First, the additive utility property is not appropriate when inter-agent dependencies exist. For example, when two parts must come together simultaneously to create an assembly, neither part receives a reward until the assembly is completed. Although such dependencies tend to be the rule rather than the exception in manufacturing environments, the case of mating parts is taken to be beyond the scope of this thesis. However, as discussed in Section, there is a relatively straightforward way to address the problem of mating parts that is consistent with economic theory. Briefly, the relationship between assemblies and parts can be modeled by introducing hierarchical agent relationships analogous to contractor and subcontractor roles. The contractor agent receives the entire terminal reward for the finished assembly; however, the contractor must use side payments to subcontractors to coordinate the production of the individual parts.
In effect, the contractor's side payments are used to internalize the externality created by the synchronization issue.

A related issue is the externality created by setups. If a production machine is set up for Operation A and an agent incurs the cost of setting up the machine for Operation B, then all agents that require Operation B benefit from the changeover without paying for it in any way. In a sense, setups are analogous to public goods in real economies. The defining characteristic of goods such as parks and clean air is that everyone benefits if the goods are provided, but no one is willing to incur the cost of provision alone. The problem created by setups is beyond the scope of the thesis. However, there are well-established techniques for internalizing public good externalities. For example, one approach is to introduce agents whose sole role is to consolidate demand for the good and ensure all agents who benefit assume their share of the cost (e.g., governments in real economies). The issues surrounding setups and other externalities are discussed in greater detail in Section

4.1.8 A KNOWLEDGE REPRESENTATION LANGUAGE FOR AGENTS

In order to model complex physical systems in terms of agent-oriented constructs (such as beliefs, desires, and capabilities), a knowledge representation language is required. This section provides an overview of the language used in this thesis. A knowledge representation language consists of two elements: syntax and semantics [Russell and Norvig, 1995]:

• Syntax describes the symbols and the allowable configurations of symbols that may be used to make statements about the world. The syntax may be graphical and consist of boxes and arrows. Alternatively, it may be mathematical or logical in flavor and consist of textual symbols (such as the examples from the situation calculus and STRIPS in Chapter 3).
For any formal language, the syntax is ultimately a series of bits inside a computer's memory, and thus the exact nature of the symbols used is not important as long as the different representations are isomorphic, that is, as long as there is a means of making lossless transitions from one set of symbols to the other. For example, it is a simple matter to translate decision trees (which have a graphical syntax) into productions (which have a textual IF-THEN syntax).

•  Semantics determines the relationship between sentences in the knowledge representation language and the world the language is meant to represent. In other words, semantics determines the meaning of sentences [Winston, 1992].

For a formal knowledge representation language to be used for planning, it must also include mechanisms for inference. In the classical planning languages discussed in Section 3.1.3, inference is achieved through the well-known inference rules of first-order logic (such as modus ponens, resolution, and so on). However, in order to make decisions in the real world, an agent must be able to represent and reason about its own imperfect knowledge and the uncertain outcomes of its actions.

SOURCES OF UNCERTAINTY

The model of rationality used in this research (recall Figure 3.2 on page 53) admits two distinct forms of uncertainty:

1. Uncertainty about state: An agent might not know precisely what state it is in. Indeed, the purpose of conventional management information systems is to help real agents refine their beliefs about their actual state (e.g., "Are we in a state with good quarterly performance?"; "Are we in a state in which Employee X has achieved her sales objectives?"). An environment in which agents are assumed to have perfect sensing and can be relied on to know their current state is called accessible (or fully observable) [Russell and Norvig, 1995], [Puterman, 1994].
An environment in which agents are permitted to be unsure of their state is called inaccessible or partially observable.

2. Uncertainty about outcomes: There are two distinct types of uncertainty about outcomes. Event uncertainty captures the agent's imperfect knowledge about how its environment evolves and changes over time. For example, an agent in a manufacturing environment may be adversely affected by events such as a work stoppage due to labor unrest or a power outage. However, the agent has only imperfect knowledge about if and when such events will occur. The important feature of event uncertainty is that it has nothing to do with the agent. That is, the uncertainty is a feature of the environment exclusively. Action uncertainty, in contrast, arises from the agent's imperfect knowledge of how its own actions affect its environment. For example, a mobile robot may execute a PickUpBlock action. However, there may be a small chance of the block slipping through the gripper and remaining on the table after execution of the action. Thus, the primary difference between an event and an action as they are defined here is that agents have no control over events.

Note that in either case, all the uncertainty in this formulation arises as a result of the agent's inability to perfectly sense the world or its inability to perfectly predict outcomes. The world itself, however, is assumed to be a certain and unequivocal place. For example, if a block is on a table, it is on the table.⁵

⁵ This assumption contrasts with the fuzzy-theoretic stance (e.g., [Zadeh, 1965]) in which the reality of the world itself is subject to nuances and degrees of truth.

Moreover, in this research, a special accessibility assumption is made: all relevant features for an agent's states are known with certainty. Returning to the example of the mobile robot in Figure 3.1 on page 43, the robot is assumed to have complete and perfect knowledge of the value of its x and y state variables.
However, the agent is not assumed to have any knowledge of state variables that are not relevant to the planning task at hand. For example, the mobile robot is not assumed to have any knowledge of whether it is raining outside, the current stock price of IBM, and so on. Given these assumptions, the only remaining source of uncertainty is the agent's imperfect knowledge of causality. If the agent had better knowledge of causality, it would be able to refine its probabilistic assessments and make better decisions. For example, assume that a mobile robot believes that the sentence InGripper will be true following the PickUpBlock action in six out of ten instances. However, as noted in the discussion of fineness in Section 3.2.1, the agent could make better estimates with better information. For example, assume that a wet block is more likely to slip than a dry block when gripped. In such circumstances, the agent's belief about the truth value of InGripper following PickUpBlock could be expressed using conditional probabilities:

P(InGripper | PickUpBlock, DryBlock) = 0.9
P(InGripper | PickUpBlock, ¬DryBlock) = 0.1

Other factors that contribute to a block slipping could include whether the grip is oily, the hardness of the block, and so on. In principle, it should be possible to enumerate all the causal factors so that the truth value of an outcome following execution of an action is known with virtual certainty. In practice, however, such certainty is unachievable and thus the reasoning done by agents is necessarily approximate.

ELEMENTS OF THE KNOWLEDGE REPRESENTATION LANGUAGE

In this thesis, agents are created to make decisions on behalf of objects (often inanimate) in the real world. As such, each agent must represent its object in both the agency sense of the word and the modeling sense of the word. In this section, the focus is the modeling sense.
In addition to representing the object's state, the agent must be able to represent any relevant aspects of the environment in which the object is situated. The agent's own state is defined as the union of information about its object, the object's environment, and any other information that the agent requires to fulfil its decision-making objectives. In addition to information about state, the agent requires information about transitions between states and the costs and rewards associated with the transitions so that it can plan in dynamic environments. The language advocated for this purpose is a variant of the STRIPS representation that permits compact representations of actions with uncertain outcomes. The language is based on the work of Boutilier et al. (e.g., [Dearden and Boutilier, 1997], [Boutilier and Dearden, 1996], [Boutilier, Dearden and Goldszmidt, 1995]), which in turn incorporates concepts from ADL [Pednault, 1989] and BURIDAN [Kushmerick, Weld and Hanks, 1994]. The primary difference between the language developed in this thesis and its ancestors developed by Boutilier et al. is that the language for scheduling agents must represent and reason about time explicitly. In contrast, the planning domain generally used within the decision-theoretic planning community (at least for illustrative purposes) is the control of mobile robots.

The core constructs of the action representation language are shown graphically in Figure 4.4. An agent starts in a state s, which may have an intrinsic utility to the agent represented by the immediate reward r(s). When the agent executes action a ∈ A_s in state s, two things happen: first, the agent incurs an action cost c(a, s); second, the action has an effect e(a, s) ∈ E_a. The agent's knowledge of effects is probabilistic, and therefore it knows only which effects are in the set E_a and the probability associated with each e(a, s).
The application of a set of effects to state s results in an outcome, that is, a transition to a new state s'. In the following sections, the essential elements of this action representation framework are discussed in greater detail and illustrated with respect to a part agent that requires three units of processing on three machines.

States

In STRIPS, states are represented by conjunctions of ground propositional atoms, such as s = (raining, ¬umbrella). In the state description language used here, the notion of an atomic sentence is expanded to include predicates and the equality symbol. This notation permits parameterized sentences of the form variable name = value (e.g., OpStatus(Op1) = complete). Although the domain of value is not restricted to binary variables, it is assumed to be finite, discrete, and mutually exclusive.

FIGURE 4.4 A graphical representation of the elements of the knowledge representation language for agents.

It is important to recognize that the addition of predicates and the equality symbol is simply "syntactic sugar" [Russell and Norvig, 1995, p. 200]. That is, although the syntax resembles first-order logic, the semantics remain purely propositional (at least for the purposes of the scheduling algorithms considered here). The same sentence could be written in propositional form as Op1Complete or ¬Op1Complete. However, the use of the predicate form simplifies parameterization over related state variables (e.g., OpStatus(Op1), OpStatus(Op2), ...). The equality sign permits compact representations of multi-valued attributes. For example, a propositional representation of time would require an atom for each time unit (T1, T2, ...). Moreover, since the time values are mutually exclusive, representation of a particular instant in time would require a t-tuple in which t is the number of distinct time atoms and t - 1 time atoms are negated. The name = value notation makes it clear that the values of Time are mutually exclusive.
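The relationship between the name = value notation and its purely propositional reading can be sketched in a few lines of Python. The fragment below is only an illustration under assumptions that are not part of the thesis: the names DOMAINS, is_valid and to_propositions are hypothetical, and the three-unit time horizon is chosen arbitrarily.

```python
from typing import Dict

# Hypothetical finite, discrete domains for a few state variables
# (the variable names follow the text; the domains are assumptions).
DOMAINS: Dict[str, tuple] = {
    "Time": (1, 2, 3),
    "OpStatus(Op1)": ("complete", "incomplete"),
    "OpStatus(Op2)": ("complete", "incomplete"),
}

def is_valid(state: Dict[str, object]) -> bool:
    """A state assigns exactly one in-domain value to every variable."""
    return state.keys() == DOMAINS.keys() and all(
        state[name] in DOMAINS[name] for name in DOMAINS
    )

def to_propositions(state: Dict[str, object]) -> Dict[str, bool]:
    """Expand name = value sentences into propositional atoms.

    Each variable yields one true atom and |domain| - 1 false atoms,
    which is the tuple of negated atoms the compact notation avoids.
    """
    atoms = {}
    for name, domain in DOMAINS.items():
        for value in domain:
            atoms[f"{name}={value}"] = (state[name] == value)
    return atoms

state = {"Time": 2, "OpStatus(Op1)": "complete", "OpStatus(Op2)": "incomplete"}
```

Calling to_propositions(state) produces seven atoms for this small state, of which exactly one Time atom is true; the dictionary form keeps the mutual exclusivity implicit in the data structure.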
The question of which variables to include in the agent's description of state is complicated by the inherent conflict between accuracy and computational feasibility discussed in Section. On one hand, the precision of conditional probability estimates increases monotonically with the amount of information in the conditioning term. On the other hand, the curse of dimensionality (recall Section) provides clear incentives to minimize the number of state variables in the problem formulation. Thus a judicious trade-off must be made between the decision-making value of additional information and the computational cost of using the information. Approaches to making this trade-off are suggested in the discussion of learning in Section.

The number of state variables required to implement the simple part agents considered here is relatively modest, as shown in Table 4.1. Naturally, as in any scheduling system, the agent will need variables for representing the passage of time (Time), the completion status of operations (OpStatus(Op_i)), and whether the agent owns contracts for specific production resources (Contract(M_i, T_j)). In addition, it is typically the case that the probability of completing processing in the next unit of time is conditional on the amount of processing that has already been performed on the operation. The state variable ElapsedTime(Op_j) is used to retain this information from state to state so that it may be used when estimating outcomes. Encoding ElapsedTime(Op_j) within the state description has obvious implications for the size of the problem state space. However, by making elapsed time information available to the planning algorithm, any discrete probability distribution for completion times can be used.

Two special state variables are used to encode the fixed costs and benefits encountered by the agent as it works through the system. HoldingCost represents the penalty that the agent incurs whenever a unit of time passes.
The standard practice in manufacturing is to make the holding cost a function of the value added to the part or the time it has spent in the system. Thus, as the part nears completion, its holding cost per unit time increases. The ProductValue variable represents the value of the product when it leaves the production system. Since the terminal rewards in this system are assumed to be exogenously determined, the variable can be treated like a constant in most cases. However, it may be desirable to account for specific changes to the terminal reward by assigning new values to ProductValue during processing. For example, certain measurable quality problems may occur during processing which result in the requirement to downgrade the final value of the part.

The last two state variables shown in Table 4.1 are used by the agent for bookkeeping purposes. Shipped is a propositional variable which is used to determine whether the part has left the production system being scheduled. NewTimeUnit is used to simplify the representation of actions with temporal effects. Both these variables are discussed in greater detail in subsequent sections.

TABLE 4.1: State variables used to model a part agent

State variable | Values | Description
---------------|--------|------------
Time | 1, 2, ..., t | the current time, where t is the number of time units in the current planning horizon
OpStatus(Op_i) | complete, incomplete | the status of each processing operation, Op_i
Contract(M_i, T_j) | yes, no | whether the part owns a contract with machine M_i at time T_j
ElapsedTime(Op_j) | 0, 1, ..., m | the number of units of processing time already invested in Op_j
HoldingCost | one real value, or multiple real values that depend on the value of some other state variable (e.g., ElapsedTime(Op_i) or OpStatus(Op_i)) | the cost per unit time of holding one unit of value added in work-in-process inventory (typically a function of value added or time in the system)
ProductValue | one or more real values | the exogenously determined value of the finished product
Shipped | yes, no | whether the product has been shipped
NewTimeUnit | yes, no | a flag to indicate whether a unit of time on the real clock has passed

Actions

In STRIPS [Fikes and Nilsson, 1971], action descriptions consist of three lists (where a list is simply a conjunction of propositional atoms):

1. precondition list: specifies atoms that must be true in the world for the action to be executed,
2. add list: specifies literals (possibly negated) that are known to be true in the world after the action is executed,
3. delete list: specifies atoms about which nothing is known following the action.⁶ In the fully accessible case considered here, a delete list is not required.

⁶ It is important to recognize the distinction between atoms that are deleted via the delete list and atoms that are negated via the add list. In the former case, knowledge of state is lost.

FIGURE 4.5 The action representation hierarchy for P-STRIPS (an action has preconditions and one or more aspects; each aspect has outcome discriminants; each discriminant has outcomes; each outcome has effects and a probability): Action preconditions determine whether the action is executable; outcome discriminants influence the probability distributions over outcomes.
In probabilistic STRIPS (P-STRIPS), the basic STRIPS representation is expanded to support the description of actions with probabilistic outcomes. Since probability typically involves notions of dependence and independence, it is useful to introduce new constructs to allow dependence relationships to be represented parsimoniously. In what follows, each of the constructs in the P-STRIPS hierarchy shown in Figure 4.5 is described and illustrated with respect to a particular action: Process(Op2, M2, 04). The action describes processing operation Op2 on machine M2 at time unit 04 for exactly one unit of time. The illustrations of the constructs are set in a sans-serif font to help distinguish the general case from the specific example. Following the recommendation of [Pednault, 1989], the notion of preconditions is divided into two distinct constructs:

1. Action preconditions: An action precondition is a set of sentences (possibly empty) that specifies what must be true in the world before a particular action can be executed. In other words, the action preconditions define the set of actions A_s.

   In order to execute Process(Op2, M2, 04), three preconditions must be satisfied. First, the current time must be equal to time 04. Second, the agent must own the contract for machine M2 at time 04. Finally, a physical precedence constraint dictates that Op2 cannot be commenced until Op1 is complete. These preconditions are shown in Figure 4.6.

   Action: Process(Op2, M2, 04)
     Preconditions:
       Time = 04
       Contract(M2, 04) = yes
       OpStatus(Op1) = complete

   FIGURE 4.6 Action preconditions specify conditions that must be true in the world for the action to be feasible.

2. Outcome discriminants: When an action is executed, its effects may depend on features of the current state. For example, actions of the form Process(Op_i, M_j, T_k) are likely to depend on the amount of processing that has already been invested in Op_i.
A discriminant [Boutilier and Dearden, 1996] is a set of sentences (possibly empty) that corresponds to distinct sets of conditional effects. In other words, a discriminant γ is used to construct conditional probabilities of the form P(φ|γ), where φ is a set of effects.

   The probability of completing an operation after a single unit of processing time may be conditioned on a number of state variables. In this example, only ElapsedTime(Op2) is considered. The relationship between ElapsedTime(Op2) and the cumulative probability of completing the operation is shown in Figure 4.7 by the cumulative histogram on the right.

   Action: Process(Op2, M2, 04)
     Preconditions:
       Time = 04
       Contract(M2, 04) = yes
       OpStatus(Op1) = complete
     Discriminants:
       D1: ElapsedTime(Op2) = 0
       D2: ElapsedTime(Op2) = 1
       D3: ElapsedTime(Op2) = 2

   [Cumulative histogram: probability of completion (0 to 1.0) against ElapsedTime(Op2) = 0, 1, 2.]

   FIGURE 4.7 Discriminants are used to identify the relevant features of the current state: When the action is executed, its outcome depends on the outcome discriminant.

The notion of effects must also be expanded in P-STRIPS to account for the possibility that a single action can have multiple possible outcomes:

3. Outcomes: The representation of non-deterministic effects requires the introduction of an additional construct that is used to batch together sets of effects. An outcome is a set of effects, φ, that can be assigned a conditional probability, P(φ|γ). The probabilities of the outcomes within a discriminant must sum to one.

   There are two basic outcomes of interest for any processing action: either the operation is complete at the end of the unit of processing time or it remains in progress. As discussed above, the probability of either outcome is conditional on the amount of processing that has already been invested in the operation.
   When ElapsedTime = 0, the probability of the operation being complete at the end of the unit of processing time is zero (i.e., the minimum processing time is greater than one unit). Similarly, the assumption is made that the maximum amount of time that Op2 could require on M2 is three units of time. As such, the probability that the operation is complete after executing the Process(Op2, M2, 04) action in any state in which ElapsedTime = 2 is 1.0. When only one unit of processing has already been applied to the operation, there is a 20% chance that the operation will be complete at the end of the unit of processing time. The P-STRIPS representation of outcomes is shown in Figure 4.8.

   Action: Process(Op2, M2, 04)
     Preconditions:
       Time = 04
       Contract(M2, 04) = yes
       OpStatus(Op1) = complete
     Discriminants:
       D1: ElapsedTime(Op2) = 0
         Outcomes:
           O1 (not completed): P = 1.0
       D2: ElapsedTime(Op2) = 1
         Outcomes:
           O1 (completed): P = 0.2
           O2 (not completed): P = 0.8
       D3: ElapsedTime(Op2) = 2
         Outcomes:
           O1 (completed): P = 1.0

   FIGURE 4.8 The execution of the action will lead to one of the outcomes with a known probability.

4. Effects: The effects construct in P-STRIPS is similar to the add lists in conventional STRIPS. Effects are sentences that are known to be true in the world following execution of actions.

   The effects associated with the "completed" and "not completed" outcomes are the same for every discriminant of Process(Op2, M2, 04). When the "completed" outcome occurs, the effect is OpStatus(Op2) = complete. When the outcome is "not completed", the OpStatus(Op2) variable remains unchanged. However, the ElapsedTime(Op2) variable is incremented by one to reflect the work that has been done during the unit of processing time, as shown in Figure 4.9.
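The way discriminants, outcomes and effects combine for the Process(Op2, M2, 04) example can be sketched in Python. This is a minimal illustration rather than the thesis's implementation: the table PROCESS_OP2 and the functions successors and completion_time_distribution are hypothetical names, and preconditions and the temporal aspect are deliberately left out.

```python
# Hypothetical encoding of the processing discriminants for Op2:
# ElapsedTime(Op2) value -> list of (probability, outcome label).
PROCESS_OP2 = {
    0: [(1.0, "not completed")],
    1: [(0.2, "completed"), (0.8, "not completed")],
    2: [(1.0, "completed")],
}

def successors(state):
    """Return (probability, next_state) pairs for one unit of processing."""
    elapsed = state["ElapsedTime(Op2)"]
    result = []
    for prob, outcome in PROCESS_OP2[elapsed]:
        nxt = dict(state)  # effects are applied to a copy of the state
        if outcome == "completed":
            nxt["OpStatus(Op2)"] = "complete"
        else:
            nxt["ElapsedTime(Op2)"] = elapsed + 1
        result.append((prob, nxt))
    return result

def completion_time_distribution():
    """Probability that Op2 completes after exactly n units of processing."""
    dist = {}
    start = {"ElapsedTime(Op2)": 0, "OpStatus(Op2)": "incomplete"}
    frontier = [(1.0, start, 0)]
    while frontier:
        p, s, units = frontier.pop()
        for q, nxt in successors(s):
            if nxt["OpStatus(Op2)"] == "complete":
                dist[units + 1] = dist.get(units + 1, 0.0) + p * q
            else:
                frontier.append((p * q, nxt, units + 1))
    return dist
```

Under these assumptions, the discriminants imply a completion-time distribution in which the operation never finishes after one unit, finishes after two units with probability 0.2, and finishes after three units with probability 0.8. This illustrates how carrying ElapsedTime in the state lets an arbitrary discrete completion-time distribution be expressed one step at a time.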
   Action: Process(Op2, M2, 04)
     Preconditions:
       Time = 04
       Contract(M2, 04) = yes
       OpStatus(Op1) = complete
     Discriminants:
       D1: ElapsedTime(Op2) = 0
         Outcomes:
           O1 (not completed): P = 1.0
             Effects: ElapsedTime(Op2) = 1
       D2: ElapsedTime(Op2) = 1
         Outcomes:
           O1 (completed): P = 0.2
             Effects: OpStatus(Op2) = complete
           O2 (not completed): P = 0.8
             Effects: ElapsedTime(Op2) = 2
       D3: ElapsedTime(Op2) = 2
         Outcomes:
           O1 (completed): P = 1.0
             Effects: OpStatus(Op2) = complete

   FIGURE 4.9 Different effects lists are associated with each outcome.

Some actions have multiple sets of effects that are independent of each other. For example, all actions of the form Process(Op_i, M_j, T_k) are temporal, that is, their execution implies the passage of time. Moreover, the passage-of-time aspect of the actions occurs regardless of whether there is a change in the completion status of the operation. Rather than including the passage of time within the effects of every discriminant and outcome, a factored representation based on action aspects is used [Boutilier and Dearden, 1996].

5. Aspects: An aspect is used to separate action effects into independent groups. Because the outcomes associated with each aspect are independent, the probability of joint outcomes can be determined by multiplying the probabilities of each outcome. For example, if the probability of outcome O_{1,i} from Aspect 1 is 0.4 and the probability of O_{2,j} from Aspect 2 is 0.6, then the probability of O_{1,i} ∧ O_{2,j} is 0.4 × 0.6 = 0.24.

FIGURE 4.11 Reward trees conditioned on the value of NewTimeUnit (NTU): In (a), NTU = yes leads to value = -1 and NTU = no leads to value = 0. The tree in (b), which shows only NTU = yes with value = -1, is the "closed world" version in which paths not shown explicitly are assumed to be zero.

   As a temporal action, Process(Op2, M2, 04) has an aspect that describes the passage of time. To simplify this, the NewTimeUnit state variable is used.
   Since the passage of time is unconditional, only a single (empty) discriminant is required. The use of aspects to account for the passage of time is shown in Figure 4.10.

   Action: Process(Op2, M2, 04)
     Preconditions:
       Time = 04
       Contract(M2, 04) = yes
       OpStatus(Op1) = complete
     Aspects:
       A1 (temporal):
         Discriminants:
           D1: (empty)
             Outcomes:
               O1 (new time unit): P = 1.0
                 Effects: NewTimeUnit = yes
       A2 (processing):
         Discriminants:
           D1: ElapsedTime(Op2) = 0
             Outcomes:
               O1 (not completed): P = 1.0
                 Effects: ElapsedTime(Op2) = 1
           D2: ElapsedTime(Op2) = 1
             Outcomes:
               O1 (completed): P = 0.2
                 Effects: OpStatus(Op2) = complete
               O2 (not completed): P = 0.8
                 Effects: ElapsedTime(Op2) = 2
           D3: ElapsedTime(Op2) = 2
             Outcomes:
               O1 (completed): P = 1.0
                 Effects: OpStatus(Op2) = complete

   FIGURE 4.10 An aspect is used to represent the passage of time in a temporal action.

Rewards

In the conventional representation of rewards in the MDP literature, an agent receives an immediate reward r(s) whenever it enters the state s [Puterman, 1994]. The problem with the extensional formulation⁷ is that it requires a vector of rewards of size |S|. Moreover, it ignores the fact that rewards may be feature-based rather than state-based. A feature-based reward is triggered whenever certain preconditions are satisfied, regardless of what state the agent is in. A compact way to represent such rewards is to use trees in which interior nodes correspond to state variables, branches correspond to values of the state variables, and leaf nodes correspond to reward values.

⁷ In an extensional representation, each state in the state space is explicitly and uniquely named. In an intensional representation, states are described by sets of features. See [Boutilier, Dean and Hanks, 1999] for a more detailed discussion of the distinction between the two types of representations.

To illustrate, consider the issue of holding cost. A simple means of representing holding cost is to use the NewTimeUnit state variable as a reward precondition, as shown in Figure 4.11(a). The tree states that the agent receives a "reward" of -$1 whenever it enters a state in which NewTimeUnit = yes. Conversely, if the agent enters a state in which NewTimeUnit = no, it receives a reward of zero. The tree in (a) is exhaustive, that is, every s ∈ S satisfies exactly one branch and is therefore associated with exactly one reward. However, a variant of the closed world assumption⁸ can be applied to the reward tree to reduce the tree's bushiness. Under this assumption, only branches with non-zero leaf nodes are included, as shown in Figure 4.11(b).

⁸ The closed world assumption is often used to clarify incomplete representations of state. The assumption states that any proposition not explicitly known to be true in the initial state can be presumed false [Weld, 1998].

FIGURE 4.12 Action cost trees for holding costs: In (a), the value of the HoldingCost (HC) variable is used to condition the value of the rewards (under NewTimeUnit = yes, HC values of $1, $2, and $3 lead to values of -1, -2, and -3). In (b), there is a perfect correlation between HoldingCost and OpStatus(Op_i) and thus HoldingCost is removed to simplify the tree.
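The reward trees of Figure 4.11 lend themselves to a very small data structure. The following Python sketch is an illustration only; the tuple-based tree encoding and the function name reward are assumptions introduced here, not notation from the thesis.

```python
# A tree is either a numeric leaf or a pair (state variable, {value: subtree}).
# Exhaustive tree, as in Figure 4.11(a).
REWARD_TREE_FULL = ("NewTimeUnit", {"yes": -1.0, "no": 0.0})
# Closed-world tree, as in Figure 4.11(b): the zero branch is omitted.
REWARD_TREE_CLOSED = ("NewTimeUnit", {"yes": -1.0})

def reward(tree, state, default=0.0):
    """Walk the tree using the state's variable values.

    Under the closed world assumption, any path that is not listed
    explicitly evaluates to the default reward of zero.
    """
    while isinstance(tree, tuple):
        variable, branches = tree
        value = state[variable]
        if value not in branches:
            return default
        tree = branches[value]
    return tree
```

For any state, reward(REWARD_TREE_FULL, state) and reward(REWARD_TREE_CLOSED, state) return the same value, so the closed-world form loses nothing while storing fewer branches. Deeper trees, such as one that nests HoldingCost under NewTimeUnit, fit the same encoding.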
Although all the rewards in the part-agent formulation used here are negative, the term "reward" is used instead of alternatives (such as "penalty") in order to remain consistent with the standard MDP nomenclature. The same tree structure can be used to represent more complex reward preconditions. For example, Figure 4.12(a) shows a reward tree for the case in which the state variable HoldingCost takes on different values. This type of tree would be useful whenever certain milestone events or actions are used to increment the value of HoldingCost. For example, assume HoldingCost is incremented whenever an operation is completed. Interestingly, given the direct correspondence between HoldingCost and the OpStatus(Op_i) variables in this example, it is possible to eliminate HoldingCost from the reward representation, as shown in Figure 4.12(b). This latter representation is preferable given that variables of the form OpStatus(Op_i) are bound to appear in the tree anyway.

Action costs

An action cost c(a, s) can be associated with the execution of an action in a particular state. For example, in the mobile robot example in Section 3.1.4, an action cost is associated with each movement action to reflect the drain on the robot's battery. To achieve parsimonious representation, the same feature-based approach used for rewards is used for action costs. The primary difference between rewards and action costs is that a separate action cost tree is required for each action. For the part agents considered in this system, the "terminal reward" the agent receives when leaving the system is implemented as an action cost that is triggered whenever the agent executes the Ship action. As shown in the action schema for the Ship action in Figure 4.13(a), the action can only be executed when OpStatus(Op3) = complete and Shipped = no.
Since the only effect of the action is to set Shipped = yes, and since there is no action with a Shipped = no effect, the action can only be executed once within a particular planning problem. The action cost is triggered by the execution of the action, so there is no need to repeat the preconditions in the action cost tree. For this reason, the action cost tree shown in Figure 4.13(b) is unconditional, that is, it consists of a single leaf node only. However, it is possible to have more complex action cost trees. For example, a terminal reward that is contingent on shipping date could be implemented by adding the Time variable to the action cost tree. As with rewards, the nomenclature used for action costs can be confusing. Although all the action costs in the part-agent formulation used here are negative, the term "action cost" is used instead of alternatives (such as "action reward") in order to remain consistent with the standard MDP nomenclature.

EVENTS AND TIME

In order to model the physical world accurately, a means of representing non-deterministic events (such as broken bits, power outages, and so on) is required. Recalling the distinction made between actions and events in Section, an event is a state transition that is not a direct consequence of an agent's actions. However, events can occur as indirect consequences of actions, and thus a distinction is made between two types of events in P-STRIPS: action-dependent and action-independent. Action-dependent events provide agents with mechanisms to reason about events that may occur while they are engaged in some action. For example, while executing an action of the form Process(Op_i, M_j, T_k), the bit of the machine may break.
Although the breakage event is independent of the primary aspect of the action (i.e., whether the product is completed in the current processing step), the event only occurs during processing actions. To represent action-dependent events, event aspects are added to the action descriptions. To illustrate, consider the augmented schema for the Process(Op2, M2, 04) action shown in Figure 4.14.

   Action: Ship
     Preconditions:
       Shipped = no
       OpStatus(Op3) = complete
     Aspects:
       A1:
         Discriminants:
           D1: (empty)
             Outcomes:
               O1 (ship part): P = 1.0
                 Effects: Shipped = yes

   Action cost tree: a single leaf node, value = -$100

   FIGURE 4.13 The schema for the Ship action: The P-STRIPS representation is shown in (a) and the (unconditional) action cost tree associated with the action is shown in (b).

   For machine M2 and operation Op2, there is a 0.0028 probability of the bit breaking during a unit of machining. If the bit breaks, the machine requires a new setup. The probability of the part or the machine being damaged as a result of a bit breaking is assumed to be negligible.

The discriminants within the event aspects can be used to condition the probability of the event occurring on the value of any number of state variables. For example, the probability of a bit breaking may depend on a number of operating parameters, such as the hardness of the stock being machined, the age of the bit, the feedrate of the stock into the machine, and so on. In the example used here, most operating parameters are determined by the interaction of the part, operation, and machine, and thus there is no need to introduce a large number of state variables into the discriminants. Instead, a different probability distribution can be associated with each Process(Op_i, M_j, T_k) action.

   Action: Process(Op2, M2, 04)
     Preconditions: ...
     Aspects:
       A1 (temporal): ...
       A2 (processing): ...
       A3 (bit breakage):
         Discriminants:
           D1: (empty)
             Outcomes:
               O1 (bit breaks): P = 0.0028
                 Effects: Setup(M2, Op2) = False
               O2 (bit okay): P = 0.9972
                 Effects: (empty)

   FIGURE 4.14 An example of the use of P-STRIPS to represent an action-dependent event.

Action-independent events are more difficult to represent since actions form the basis of all state transitions in STRIPS-like formalisms. As their name implies, action-independent events should not be included as aspects of particular actions in the same way as action-dependent events. For example, events such as labor disputes and power outages are not caused or influenced by a particular action performed by the agent. However, since such events can have a significant impact on the agent's utility, the agent should be able to reason about and plan around their possible occurrence. Another important consideration is that an action-dependent event for one agent is often an action-independent event for other agents. Assume, for example, that the part represented by Agent 1 is being processed on Machine M2. Agent 1 can use an action-dependent representation of the bit breaking event since the event is linked to its processing action. However, for all agents waiting in the queue behind Agent 1, the bit breaking event is an action-independent event.

In order to understand how action-independent events are handled in P-STRIPS, a brief explanation of the IncrementClock action is required. IncrementClock is a special fixed action that is automatically assigned to certain states. A fixed action is simply an action that the planning algorithm cannot replace with another action. A portion of the action schema for IncrementClock is shown in Figure 4.15. As the preconditions indicate, the action is executed in any state in which the state variable NewTimeUnit is true.
Since the "reset NTU" aspect is used to set the value of the NewTimeUnit state variable to false, the passage of time in the system can be seen as a series of transitions between two halves of the state space: the half in which NewTimeUnit is true and the half in which NewTimeUnit is false.

Action: IncrementClock
Preconditions: NewTimeUnit = True
Aspects:
    A1 (reset NTU):
        Discriminants:
            D1: (empty)
                Outcomes:
                    O1: P = 1.0
                        Effects: NewTimeUnit = no
    A2 (advance clock):
        Discriminants:
            D1: Time = 1
                Outcomes:
                    O1: P = 1.0
                        Effects: Time = 2
            D2: Time = 2
                Outcomes:
                    O1: P = 1.0
                        Effects: Time = 3
            ...
            Dn: Time = n
                Outcomes:
                    O1: P = 1.0
                        Effects: Time = long

FIGURE 4.15 A partial action schema for the IncrementClock action: The discriminant Dn is used to advance the clock past the planning horizon of t time units.

The primary rationale for introducing the NewTimeUnit and IncrementClock constructs is to simplify the process by which the Time state variable is incremented. The "advance clock" aspect in Figure 4.15 shows how discriminants and effects are used to increment the value of Time whenever the IncrementClock action is executed. However, the IncrementClock action is also used to implement action-independent events. For example, the probability of a particular machine requiring unscheduled maintenance can be added as an aspect:

Completion of the part requires machines M2 and M3 to be operational. The probability of M2 breaking down in any unit of time is estimated to be 0.003. The probability of M3 breaking down in any unit of time is 0.00012. Since the aspects are assumed to be independent, the probability of both machines breaking down within a unit of time is 0.003 x 0.00012 = 0.00000036.

Since both breakdown events are taken to be independent of a particular action and independent of each other, they can be added to the IncrementClock action as separate aspects, as shown in Figure 4.16.
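Because the aspects of an action are assumed to be independent, the joint outcome distribution is the cross-product of the per-aspect distributions. The sketch below illustrates this for the two breakdown aspects. The function name joint_outcomes and the explicit "no breakdown" outcomes (with probability 1 - p) are assumptions added for illustration; the text gives only the breakdown probabilities.

```python
from itertools import product

# Each aspect is a list of (probability, effects) pairs.
# The complementary "no breakdown" outcomes are assumed here.
m2_breakdown = [(0.003, {"Setup(M2)": False}), (0.997, {})]
m3_breakdown = [(0.00012, {"Setup(M3)": False}), (0.99988, {})]


def joint_outcomes(*aspects):
    """Combine independent aspects into one joint outcome distribution."""
    joint = []
    for combo in product(*aspects):
        prob = 1.0
        effects = {}
        for p, eff in combo:
            prob *= p            # independence: probabilities multiply
            effects.update(eff)  # effects of all aspects apply together
        joint.append((prob, effects))
    return joint


outcomes = joint_outcomes(m2_breakdown, m3_breakdown)

# Probability that both machines break down in the same unit of time.
both_down = next(p for p, eff in outcomes if len(eff) == 2)
```

Two binary aspects yield four joint outcomes, and in general the number of joint outcomes is the product of the number of outcomes per aspect.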
Like any other discriminant, the discriminants for action-independent events can be refined by conditioning the outcome probability distributions on state variables. Thus, the IncrementClock action provides a natural and parsimonious way of describing time-dependent events, such as any type of decay process. Although it is certainly possible from a computational point of view to embed "exogenous events" into the agent's basic action descriptions without the overhead of a special IncrementClock action (e.g., [Boutilier, Dean and Hanks, 1999]), the clear distinction between action-independent and action-dependent events is adopted to simplify the modeling process.

TERMINAL STATES

There are two situations in which the state transitions for an agent should cease: when the part has been shipped (Shipped = yes) and when the agent's planning horizon has been exceeded (Time = long). The fixed action End is associated with both these "terminal states" to prevent any further state transitions. In MDP terminology, the End action is used to implement absorbing states. Since the End action has no action cost and the absorbing states have no immediate reward, the agent's utility does not change once a terminal state has been entered.

Action: IncrementClock
Preconditions: NewTimeUnit = True
Aspects:
    A1 (reset NTU):
    A2 (advance clock):
    A3 (M2 breakdown):
        Discriminants:
            D1: (empty)
                Outcomes:
                    O1: P = 0.003
                        Effects: Setup(M2) = false
    A4 (M3 breakdown):
        Discriminants:
            D1: (empty)
                Outcomes:
                    O1: P = 0.00012
                        Effects: Setup(M3) = false

FIGURE 4.16 Aspects for action-independent events: The IncrementClock action is used to "host" all action-independent events.

4.2 AGENT-LEVEL RATIONALITY

Formulating resource allocation problems as MDPs and solving them using stochastic dynamic programming techniques such as policy iteration provides policies that are provably optimal.
However, the curse of dimensionality (see page 48) prevents all but the smallest problems from being formulated in this way. Moreover, although the agent-level problems addressed in this research are exponentially smaller than the global-level problem for an entire manufacturing facility, the curse of dimensionality remains problematic. To illustrate, recall the set of state variables for a part-agent shown in Table 4.1 on page 99. This set can be considered "minimal" in that it is sufficient for representing fundamental issues such as the passage of time, the completion status of jobs, and so on. However, the formulation in Table 4.1 does not include state variables for representing more complex issues such as machine setups and material availability. In addition, the size of the problem is relatively modest: three operations, three machines, and a planning horizon of 10 time units. Despite its modest size, the problem induces a massive explicit state space of roughly 10^13 states, as shown in Table 4.2.

TABLE 4.2: Determination of the state space for a three-operation, three-machine task with a decision horizon of 10 time units.

Variable class        Number of instances, i    Number of values in domain, |d|    Number of unique values, |d|^i
Time                  1                         10                                 10
OpStatus(Op_i)        3                         2                                  8
Contract(M_i, T_j)    30                        2                                  1,073,741,824
ElapsedTime           3                         Op1 = 4, Op2 = 3, Op3 = 5          60
HoldingCost           1                         1                                  1
ProductValue          1                         1                                  1
Shipped               1                         2                                  2
NewTimeUnit           1                         2                                  2
Total number of unique states                                                      2.06 x 10^13

A large portion of the state space in the problem formulation is the result of parameterized state variables such as Contract(M_i, T_j). The contract status for a certain machine at a certain time is a binary variable (either the agent owns the contract or it does not). However, the need to consider each of T units of time on each of M machines as a separate resource means that 2^(M x T) unique combinations of contracts are possible in any state.
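The total in Table 4.2 is simply the product of the domain sizes of every state variable instance. A short sketch of that arithmetic follows; the dictionary layout and the function name state_space_size are illustrative choices, while the domain sizes are taken from the table.

```python
from math import prod

# Domain size of every instance in each variable class (from Table 4.2).
variable_classes = {
    "Time": [10],
    "OpStatus(Op_i)": [2, 2, 2],
    "Contract(M_i, T_j)": [2] * 30,   # 3 machines x 10 time units
    "ElapsedTime(Op_i)": [4, 3, 5],
    "HoldingCost": [1],
    "ProductValue": [1],
    "Shipped": [2],
    "NewTimeUnit": [2],
}


def state_space_size(classes):
    """Number of explicit states: product of all instance domain sizes."""
    return prod(prod(domains) for domains in classes.values())


total_states = state_space_size(variable_classes)
contract_combinations = prod(variable_classes["Contract(M_i, T_j)"])
print(f"{total_states:,} explicit states")
```

The Contract family alone contributes 2^30 of the total, which is why adding one more machine or extending the horizon multiplies the state space so quickly.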
Clearly, as the number of machines that are relevant to the agent increases or as the planning horizon is extended, the number of states becomes extremely large. An important goal of this research is to identify ways in which the power of the MDP formulation can be retained without triggering an explosion in the size of the problem's state space. To achieve this end, four techniques have been employed: structured dynamic programming, assertions and constraints, rolling planning horizons, and distinct valuation and bidding phases. In the following sections, each of the "coping strategies" is described in greater detail.

4.2.1 STRUCTURED DYNAMIC PROGRAMMING

The feature-based representations used in P-STRIPS provide a convenient and parsimonious means of representing actions, rewards, and action costs. By describing the antecedents and consequences of actions in terms of specific state variables (rather than in terms of states themselves) an enormous amount of irrelevant information can be eliminated from the action description. For example, rather than using a matrix of size |S| to map each state to a reward value, P-STRIPS employs simple tree structures such as those shown in Figure 4.12 on page 107.

Despite the compactness of the representation language, difficulties occur when an attempt is made to use the representation language to reason about agent actions. The conventional MDP-based solution methods used in decision-theoretic planning require the agent's state space to be enumerated explicitly. Unfortunately, as the example in the preceding section illustrates, explicit enumeration of states is impractical, even for a small problem.

In structured dynamic programming (SDP), the feature-based techniques used for problem representation are carried over into the reasoning and solution phase. For example, [Tatman and Shachter, 1990] show how dynamic programming algorithms can be applied directly to influence diagrams and decision trees.
[Boutilier, Dearden and Goldszmidt, 1995] apply a similar approach to the planning domain. Indeed, the work of Boutilier et al. provides the foundation for the SDP algorithm developed and implemented in this thesis (see [Boutilier, Dean and Hanks, 1999] for a recent review).

POLICY TREES

The foundation of the SDP algorithm used in this thesis is a policy tree. A policy tree has the same basic structure as the reward and action cost trees considered earlier: interior nodes correspond to state variables and the branches emanating from nodes correspond to values of the state variable. In addition to the basic tree structure, policy trees contain two types of special nodes:

1. Policy nodes: A policy node attaches to a node of the policy tree and indicates which action the agent should execute when it is in a state that satisfies the branches leading to the policy node.
2. Value leaves: A value leaf contains the utility the agent expects to receive from being in a state that satisfies the branches leading to the value leaf.

To illustrate, consider the simple policy tree shown in Figure 4.17(a). The tree consists of a root node corresponding to the state variable Shipped and two leaf nodes corresponding to the domain values of the variable, {yes, no}. The policy nodes are represented by the action names adjacent to nodes of the tree. In this example, the policy for the agent is simply "execute the action End in any state in which Shipped = yes and the action Wait in any state in which Shipped = no." The value leaves can be interpreted in a similar manner: the utility [9] an agent expects to receive from being in a state in which Shipped = yes is $0 and -$10 otherwise. To make a clear distinction between different types of nodes, interior nodes of policy trees are denoted by a black dot (●) and leaf nodes are denoted by a hollow dot (○).

It is important to note that policy nodes need not be associated with leaf nodes.
For example, in the tree shown in Figure 4.17(b), the Ship action is associated with an internal node corresponding to the branches {Shipped = no, OpStatus(Op3) = complete}. In other words, if the current state satisfies the conditions leading to the policy node, then the prescribed action is Ship, regardless of the value of subsequent branches (such as Time in this example). Of course, the value to the agent given the action that it has chosen may depend on other variables. Consequently, there may be many branches between a policy node and the value leaves associated with the policy node. In Figure 4.17(b), the binary state variable Quality [10] is introduced to illustrate the case in which the value of executing the Ship action depends on the final quality of the product. If Quality = high the agent receives $100 whereas if the quality is determined to be low the agent receives only $50. Interior policy nodes are denoted by a partially-blackened dot (◐).

[9] The assumption of risk neutrality from Section 4.1.6 permits the terms "value" and "utility" to be used interchangeably.
[10] The Quality state variable is used for illustrative purposes in this example only.

FIGURE 4.17 Basic policy trees: The value of certain leaves in the policy tree in (a) can be improved by executing the Ship action whenever operation Op3 is complete. In (b), the leaves beneath the Ship policy node carry value = 100 and value = 50.

RELEVANCE RELATIONSHIPS

The basic observation underlying the SDP approach is that not all state variables are necessarily relevant when deciding on an action. The relevance relationships that do exist between actions and certain state variables can be efficiently represented using a policy tree. To illustrate, recall the simple policy tree in Figure 4.17(a). The overall policy depends on the value of the Shipped variable only. In any state in which Shipped = yes, the End action is executed. Alternatively, if Shipped = no, the Wait action is executed.
Thus, with only two branches, the policy tree is complete: it specifies an action to be executed in every possible state in the agent's state space. The leaf nodes of the policy tree are called abstract states [Boutilier and Dearden, 1996]. An abstract state contains just enough information to group together a set of concrete states. To use the problem formulation from Table 4.2 on page 114, the left-hand leaf of the policy tree in which Shipped = yes corresponds to half of the roughly 10^13 concrete states. However, from the perspective of the SDP planning algorithm, all the concrete states have the same utility value and are associated with the same action. As a consequence, there is no need to distinguish between the states on the basis of other state variables, such as whether the agent owns certain combinations of contracts, and so on.

POLICY IMPROVEMENT

Although the policy tree in Figure 4.17(a) is complete, it is clearly not optimal. In general, the SDP algorithm can be seen as an "anytime planner" since the policy tree provides a feasible policy at every point in the improvement algorithm. Moreover, the value of the policy to the agent rises monotonically with each iteration so it is conceivable that a fixed number of iterations could be used to set an upper bound on computation time. Of course, using the algorithm in anytime mode comes at the cost of optimality. In this research, the objective is to examine the interaction of strictly rational agents in a market and therefore no hard constraints on computational time are assumed. The SDP algorithm is permitted to run to quiescence for every agent-level planning problem, regardless of the amount of computation required.

To improve a policy tree, the SDP algorithm examines the defender actions (i.e., the actions already assigned to policy nodes) and attempts to find challenger actions that increase the value of the leaf or leaves beneath the policy node.
Recalling the terms of the value function in Equation 3.1 on page 46, there are two ways in which an action can influence the value of a state:

1. Action cost: An action can involve an action cost (possibly negative) that is triggered by execution of the action.
2. Outcomes: The value of the current state includes the expected value of the states reachable by executing the action.

To illustrate the basic process by which defender actions are replaced by challengers [11], consider the abstract state on the Shipped = no branch of the policy tree in Figure 4.17(a). One challenger action that the agent is bound to consider is Ship, which involves a large (negative) action cost (i.e., the agent receives $100 for executing the Ship action). Since the Ship action has two preconditions, a direct comparison cannot be made between Ship and the defender action, Wait, without first adding the preconditions to the tree. The first precondition, Shipped = no, is already part of the tree and is satisfied by the policy node under consideration. The second precondition (OpStatus(Op3) = complete) must be grafted to the tree, as shown in Figure 4.17(b).

Since policy trees must be exhaustive, the addition of the OpStatus(Op3) node creates the requirement for a new branch for all the states under the policy node in which OpStatus(Op3) ≠ complete. The new branch is called a precondition complement. Obviously, the challenger action is not feasible in the abstract state in which its preconditions are not satisfied. As a consequence, a new policy node containing the defender action is attached to the precondition complement, as shown by the Wait action on the far right of Figure 4.17(b).

Once the challenger and all the necessary precondition complements are added, the tree is evaluated. If a leaf has a higher expected value under the challenger action than the defender action, a policy node containing the challenger action is attached to the leaf. Alternatively, if the challenger provides no improvement over the defender action for any leaf under the policy node, then the policy tree is restored to its pre-challenger state. In the example shown in Figure 4.17(b), the Ship action leads to a value of $100 or $50 (depending on the final quality of the product). In either case, the value is greater than that of the defender action: -$10. As such, the changes made to the tree are committed and the algorithm continues to search for other improvements. When no further improvements are possible, the policy improvement phase stops.

[11] Fixed actions such as End cannot be changed. As such, branches of the policy tree associated with fixed actions are ignored by the SDP algorithm.

COMPUTATIONAL PROPERTIES OF THE SDP ALGORITHM

The only difference between the SDP algorithm sketched out above and the policy iteration algorithm discussed in Section 3.1.4 is that the SDP algorithm works on abstract states rather than concrete states. Although there is a certain amount of overhead required to maintain the tree structure, the policies generated by SDP and policy iteration are equivalent (keeping in mind that the representation of the policy in SDP is much more compact). Since policy iteration is guaranteed to converge on the optimal policy in time bounded by a polynomial function of the effective size of the state space, and since the number of abstract states in a policy tree is typically much smaller than the number of explicit states in a conventional MDP formulation, SDP can yield significant performance gains. Moreover, since only irrelevant information is ignored by the SDP algorithm, there is no loss of optimality or approximation associated with the technique.

One important shortcoming of the SDP algorithm is that the size of the effective state space (i.e., the number of leaf nodes) is not known a priori.
Indeed, the final size of the policy tree is not known until the optimal policy is found. In the worst case, every state variable is relevant to every choice of action and the number of abstract states in the policy tree is equal to the number of concrete states in the conventional MDP formulation of the problem. However, for the problem formulations encountered in this research, the SDP algorithm is able to exploit independence relationships between state variables and actions. The question of how much computational leverage can be gained is an empirical question that depends on the problem's underlying structure. In Chapter 5, this question is addressed in greater detail. At this point, it is sufficient to conclude that, in the context of the part-agents considered here, the SDP algorithm can lead to an exponential decrease in the number of states that would be required for conventional policy iteration. Although the exponential decrease does not fully offset the exponential increase in state space caused by adding state variables, it does greatly increase the size of MDPs that can be solved given a set of computational resources.

4.2.2 GROWING POLICY TREES

In the example in the previous section, the policy tree grew to accommodate an action precondition. In general, there are five additional sources of tree growth: fixed actions, rewards, action costs, discriminants, and supplemental discriminants. In this section, the different sources of tree growth are illustrated by stepping through a small number of iterations of the SDP algorithm. In the first steps, the foundation of the policy tree is built by adding the fixed actions and reward values. In subsequent steps, a tree evaluation and improvement process is repeated until the optimal policy is found. Since the number of leaf nodes in the optimal policy tree for this example is approximately 1000, the entire SDP process is not illustrated.
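The growth caused by an action precondition, recalled at the start of this section, can be sketched with a very small tree structure. This is a simplified illustration rather than the thesis's implementation: the names Leaf, Node, lookup, and graft_precondition are assumptions, the action is stored on the leaf rather than on a separate policy node, the key "*" stands for the complement branch, and the Ship value of 100 assumes the high-quality case.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    action: str    # action prescribed for this abstract state
    value: float   # expected utility of being in this abstract state


@dataclass
class Node:
    variable: str
    branches: dict  # branch value -> Node or Leaf; "*" is the complement


def lookup(tree, state):
    """Walk from the root to the abstract state (leaf) covering `state`."""
    while isinstance(tree, Node):
        value = state[tree.variable]
        tree = tree.branches[value] if value in tree.branches else tree.branches["*"]
    return tree


def graft_precondition(leaf, variable, required, challenger, challenger_value):
    """Split `leaf` on a precondition: the challenger takes the satisfying
    branch and the defender keeps the precondition complement."""
    return Node(variable, {
        required: Leaf(challenger, challenger_value),
        "*": Leaf(leaf.action, leaf.value),
    })


# Policy tree of Figure 4.17(a).
tree = Node("Shipped", {"yes": Leaf("End", 0.0), "no": Leaf("Wait", -10.0)})

# Challenge the Wait defender with Ship, whose second precondition is
# OpStatus(Op3) = complete (value assumes Quality = high).
defender = tree.branches["no"]
candidate = graft_precondition(defender, "OpStatus(Op3)", "complete", "Ship", 100.0)
if candidate.branches["complete"].value > defender.value:
    tree.branches["no"] = candidate  # commit the improvement
```

After the graft the tree has three abstract states instead of two, which is the sense in which each committed challenger grows the tree.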
INFRASTRUCTURE FOR FIXED ACTIONS

Fixed actions provide the "major limbs" of the policy tree, as shown in Figure 4.18. In the part-agent example considered here, two fixed actions, IncrementClock and End, are used to provide basic infrastructure for the planning algorithm. For example, in any state in which the product has been shipped (Shipped = yes), the End action is executed to ensure that no further state transitions are made by the agent. The IncrementClock action is used to manage time-based transitions for the agent. Since the overall objective of a part-agent is to determine its preferences for resource usage over time, accounting for time is a fundamental issue in the design of this type of agent.

Figure 4.18 also introduces some new notation for policy trees:

1. The pie-wedge branch under the Time variable is used to represent a large number of values in the variable's domain. In the case of Time, the size of the domain is determined by the granularity of the time units and the length of the planning horizon. In this example, a generic unit of time is used and the planning horizon is assumed to be 10 time units.
2. The special branch value "*" is introduced as a means of simplifying the representation of non-binary state variables: it denotes the complement of a specific branch or set of branches.

FIGURE 4.18 The "major limbs" of a policy tree for a part agent: The IncrementClock action is used to manage time-based transitions. The End action is used to indicate the end of the planning process.

Thus, the branch Time = * adjacent to a branch labeled Time = long in Figure 4.18 denotes all states in which the value of Time is not known to be "long". If a *-valued branch was not used in this case, all non-long values of time would have to be enumerated on the right-hand side of the tree. The special time value long is used to represent the "long term", that is, all time values greater than the planning horizon.
As the schema for IncrementClock in Figure 4.15 on page 111 shows, if the agent's planning horizon is 10 (i.e., t = 10) and the IncrementClock action is executed when Time = 10, the value of time is set to "long". The action associated with the Time = long branch is End, indicating a terminal state. In any state in which the planning horizon has been exceeded, the algorithm enters an absorbing state and all planning activity ends. The concept of rolling planning horizons is discussed in Section 4.2.3.

REAPING REWARDS

The leaves of the policy tree contain values corresponding to the utility the agent expects to receive by being in that particular abstract state. As such, any element of the problem definition that can affect the value of the leaves must be included in the tree. Recalling the discussion regarding the rewards in Figure 4.11 on page 105, the agent incurs an immediate reward equal to the per-unit-time holding cost of the part whenever it enters a NewTimeUnit = yes state. However, since the size of the holding cost is typically a function of the amount of value added to the part, the value of the reward is conditioned on the OpStatus(Op_i) family of variables. The expanded reward tree must be appended to each leaf on the NewTimeUnit = yes side of the policy tree, as shown in Figure 4.19.

FIGURE 4.19 The core policy tree with rewards: The reward tree for the agent is appended to all the leaf nodes on the NewTimeUnit = yes side of the policy tree. To keep the diagram readable, only the reward tree rooted at Time = 10 is shown.

For example, execution of the IncrementClock action from the Time = 10 state on the left-hand side of the tree in Figure 4.19 leads to an absorbing state. Since there is no action cost associated with the IncrementClock action, the values of the leaves under the Time = 10 node are determined exclusively by the immediate rewards.
In this case, the agent incurs a holding cost based on the number of operations that have been completed and then enters a zero-valued terminal state. The tree shown in Figure 4.19 constitutes the agent's core policy tree. The core policy tree includes all fixed actions and rewards and therefore constitutes a complete policy. Moreover, since the core policy tree is independent of actions, it does not change as the tree is improved.

POLICY MAPPING AND EVALUATION

In order for the value of a policy tree to be determined, each abstract state must be mapped to its post-action outcome(s). If an action is deterministic, it has exactly one outcome with a probability of 1.0. However, if the action is non-deterministic, its execution will lead to a transition to one of many possible states. Consequently, the expected value of executing a non-deterministic action is determined by summing the value of each outcome state multiplied by the probability of the outcome occurring. In this example, the IncrementClock action is deterministic. As such, all the leaves under the Time = 10 node on the left-hand side of the tree map to a single Time = long leaf on the right-hand side of the tree, as shown by the dotted lines in Figure 4.20.

In contrast, consider the Wait action on the right-hand side of the tree in Figure 4.19. The agent's state description before the Wait action consists of the following atoms: {NewTimeUnit = no, Shipped = no, Time ≠ long}. The Wait action only has one effect, NewTimeUnit = yes, so the agent's new state following execution of the action is simply {NewTimeUnit = yes, Shipped = no, Time ≠ long}. To map to an outcome state, the SDP algorithm starts at the root node of the policy tree and works down the tree until a leaf node is encountered.
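The root-to-leaf walk used for outcome mapping can be sketched as follows. The sketch is illustrative only: the nested-dictionary tree, the tuple leaves, the function name map_to_leaf, and the leaf values are assumptions, and the real tree also branches on the OpStatus(Op_i) variables. The point it shows is that the walk stalls as soon as it reaches a node whose variable is not bound in the state description.

```python
def map_to_leaf(tree, atoms):
    """Follow the known atoms down the tree.

    Returns ("leaf", leaf) when a leaf is reached, or
    ("unresolved", variable) when the state description does not
    bind the variable tested at the current node.
    """
    node = tree
    while isinstance(node, dict):
        variable = node["variable"]
        if variable not in atoms:
            return ("unresolved", variable)
        node = node["branches"][atoms[variable]]
    return ("leaf", node)


# Simplified fragment of a core policy tree; leaves are (action, value).
# The values are hypothetical holding-cost totals.
tree = {
    "variable": "NewTimeUnit",
    "branches": {
        "yes": {
            "variable": "Time",
            "branches": {
                9: ("IncrementClock", -4.0),
                10: ("IncrementClock", -2.0),
            },
        },
        "no": ("Wait", 0.0),
    },
}

# State description after executing Wait: Time is not bound to a value.
after_wait = {"NewTimeUnit": "yes", "Shipped": "no"}
stalled = map_to_leaf(tree, after_wait)

# Binding Time completes the mapping.
resolved = map_to_leaf(tree, {**after_wait, "Time": 10})
```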
In this example, it is possible to resolve the first branching node (NewTimeUnit = yes); however, it is not possible to select a branch under the Time node or determine the correct values of the OpStatus(Op_i) variables. Since these variables do not currently appear in the branches leading to the original policy node, there is insufficient information to complete the mapping.

To eliminate the incomplete mapping problem, the description of the abstract state in which the action is executed is augmented with additional information, as shown under the Wait policy node in Figure 4.20. The state variables added to eliminate mapping ambiguity are called supplemental discriminants. Although the abstract states below the original Wait policy node inherit the action, the values of the leaf node are now conditioned on specific values of Time and the OpStatus(Op_i) variables. For example, the value of the state marked S_1 can be calculated using the non-discounted (β = 1) version of the value function from Equation 3.1:

\[ V(S_1) = r(S_1) + c(\mathit{Wait}, S_1) + \sum_{j \in S} P(j \mid \mathit{Wait}, S_1)\, V(j) \tag{4.10} \]

FIGURE 4.20 The core policy tree with mapping information: The policy tree is augmented with supplemental discriminants to enable mapping of outcomes of the Wait action.

Since Wait is deterministic, there is only one outcome state j (labeled S_2 in Figure 4.20). Furthermore, there are no relevant action costs or rewards associated with the transition and thus Equation 4.10 reduces to

\[ V(S_1) = 0 + 0 + [1.0 \times V(S_2)] = -2. \tag{4.11} \]

Each leaf node generates a separate value equation of the form shown above. However, the value equations are interdependent and must therefore be solved either as a set of simultaneous linear equations or through successive approximation (see [Puterman, 1994] for a description of both approaches in the context of policy iteration). The result of the evaluation stage under a given policy π is a value V^π(s) for all s ∈ S.
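The successive approximation option can be sketched on a toy chain of three abstract states. This is a minimal sketch under stated assumptions, not the thesis's evaluator: the function name evaluate_policy and the dictionary inputs are hypothetical, and S_2 is assumed to carry a holding-cost reward of -2 and to lead directly to a zero-valued terminal state. Under those assumptions the sweep reproduces V(S_1) = -2 from Equation 4.11.

```python
def evaluate_policy(rewards, costs, transitions, sweeps=100, tol=1e-9):
    """Non-discounted policy evaluation by successive approximation.

    Repeatedly applies V(s) = r(s) + c(s) + sum_j P(j | s) V(j) for the
    action fixed by the policy in each state, until the values settle.
    """
    values = {s: 0.0 for s in rewards}
    for _ in range(sweeps):
        delta = 0.0
        for s in rewards:
            new_value = rewards[s] + costs[s] + sum(
                p * values[j] for j, p in transitions[s].items()
            )
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
        if delta < tol:
            break
    return values


# S1 --Wait--> S2 --IncrementClock--> End (absorbing, no reward or cost).
rewards = {"S1": 0.0, "S2": -2.0, "End": 0.0}   # S2 reward is assumed
costs = {"S1": 0.0, "S2": 0.0, "End": 0.0}
transitions = {"S1": {"S2": 1.0}, "S2": {"End": 1.0}, "End": {}}

values = evaluate_policy(rewards, costs, transitions)
```

For larger trees the same equations could instead be solved as one linear system; the iterative form is shown here only because it needs no matrix library.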
FIGURE 4.21 A policy tree prior to improvement: A state transition mapping is shown for the Wait action. Note that the Ship action has already been selected for the state in which OpStatus(Op3) = complete and its leaf node reflects the large negative action cost associated with the Ship action.

POLICY IMPROVEMENT

In the policy improvement phase, each policy node is evaluated with respect to the feasible actions available at the node. To illustrate the improvement phase, consider the node labeled S_1 in Figure 4.21. In this example, the policy node under consideration is associated with the Time = 9 node on the right-hand side of the tree.

The node S_1 is defined by the following atoms: {NewTimeUnit = no, Shipped = no, Time = 9, OpStatus(Op1) = complete, OpStatus(Op2) = complete, OpStatus(Op3) = incomplete}. The defender action at the node is Wait and the challenger action that is currently being evaluated is Process(Op3, M3, 09). The challenger's completion time discriminant is shown graphically in Figure 4.22 below.

FIGURE 4.22 The cumulative completion time distribution for operation Op3 (horizontal axis: ElapsedTime(Op3)): There is a probability of 0.1 that the operation will be complete after executing the Process(Op3, M3, 09) action in a state in which ElapsedTime(Op3) = 1. In contrast, if ElapsedTime(Op3) = 3, the probability of completion is 0.7.

The set of feasible actions in an abstract state consists of all actions for which the action preconditions are not contradicted by branches that define the state. For example, the policy node under consideration is at the end of a Time = 9 branch. Consequently, all actions of the form Process(Op_i, M_j, T_k) in which T_k ≠ 9 are infeasible. The SDP algorithm considers each action in the feasible set and selects the one with the largest single-leaf increase over the defender action. This action is designated as the best challenger.
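The feasibility test described above, in which an action is excluded only when a branch atom contradicts one of its preconditions, can be sketched as follows. The function name is_feasible and the Process(Op3, M3, 08) action are hypothetical additions for illustration; the atoms of S_1 and the preconditions of Ship and Process(Op3, M3, 09) follow the text.

```python
def is_feasible(preconditions, branch_atoms):
    """An action is feasible unless a branch atom contradicts a precondition.

    Preconditions on variables that do not yet appear on the branch are
    not contradicted; they would be grafted onto the tree later.
    """
    return all(
        branch_atoms.get(variable, required) == required
        for variable, required in preconditions.items()
    )


# Atoms defining the abstract state S_1.
s1 = {
    "NewTimeUnit": "no",
    "Shipped": "no",
    "Time": 9,
    "OpStatus(Op1)": "complete",
    "OpStatus(Op2)": "complete",
    "OpStatus(Op3)": "incomplete",
}

actions = {
    "Process(Op3, M3, 09)": {
        "Time": 9, "OpStatus(Op2)": "complete", "Contract(M3, 09)": "yes",
    },
    # Hypothetical action for a different time unit.
    "Process(Op3, M3, 08)": {
        "Time": 8, "OpStatus(Op2)": "complete", "Contract(M3, 08)": "yes",
    },
    "Ship": {"Shipped": "no", "OpStatus(Op3)": "complete"},
}

feasible = [name for name, pre in actions.items() if is_feasible(pre, s1)]
```

Note that Process(Op3, M3, 09) remains feasible even though Contract(M3, 09) is not bound on the branch, because an unbound variable cannot contradict a precondition.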
If it turns out that the best challenger does not have any leaves with higher values than the defender action, then no changes are made to the tree.

The defender action at S_1, Wait (not shown), results in a single leaf with a value of -4 (this value corresponds to a holding cost penalty of -2 to move from Time = 9 to Time = 10 and a second holding cost penalty of -2 to move from Time = 10 to Time = long). The tree for the challenger action Process(Op3, M3, 09) is shown in Figure 4.23.

FIGURE 4.23 A partial policy tree showing outcomes: The challenger action Process(Op3, M3, 09) is appended to the policy tree. The value of the ElapsedTime = 2 discriminant reflects a 0.2 probability of moving to the OpStatus(Op3) = complete state and a 0.8 probability of remaining in an OpStatus(Op3) = incomplete state.

Although two of the action's preconditions (Time = 9 and OpStatus(Op2) = complete) are already satisfied by the branches leading to S_1, the third precondition (Contract(M3, 09) = yes) must be added to the tree above the challenger's policy node. The policy tree must be exhaustive, that is, it must provide an action for every concrete state, and therefore the addition of a new precondition branch means that a precondition complement (Contract(M3, 09) = *) must also be added to the tree. A policy node containing the defender action (Wait) is associated with the precondition complement.

Any discriminant value of an action that can affect the value of an outcome must be included in the tree. For example, the "processing" aspect of the Process(Op_i, M_j, T_k) family of actions is conditioned on values of ElapsedTime(Op_i). Since these discriminants influence outcomes (and therefore influence values), discriminant nodes must be added to the tree.

Both aspects of the Process(Op3, M3, 09) action can lead to different outcome values and are therefore deemed relevant. The "temporal" aspect results in a single unconditional effect: NewTimeUnit = yes.
The "processing" aspect is conditioned on values of ElapsedTime(Op3). The cross-product of the effects of the two aspects for ElapsedTime = 2 is shown in Figure 4.23 as two dotted arrows leading to different outcomes. The outcome {NewTimeUnit = yes, ElapsedTime(Op3) = 3} occurs with probability 1.0 x 0.8 = 0.8. The outcome {NewTimeUnit = yes, OpStatus(Op3) = complete} occurs with a probability of 1.0 x 0.2 = 0.2.

The expected value of executing the Process(Op3, M3, 09) action when ElapsedTime(Op3) = 2 is therefore 0.2 x $97 (operation completes) + 0.8 x -$4 (operation continues) = $16.2. For this particular node, the challenger's expected value is clearly better than the defender's expected value (-$4).

Given that at least one of the challenger's leaf values is greater than the corresponding defender value, the Process(Op3, M3, 09) challenger tree is set aside as the best challenger and a new challenger tree rooted at S_1 is created for the next action in the feasible set. If the new challenger tree contains a leaf value that is greater than the maximum leaf value in the best challenger, then the new challenger becomes the best challenger. Once all actions in the feasible set for a particular policy node have been evaluated, the best challenger is permanently merged with the policy tree and the tree is evaluated. Tree evaluation is necessary since there may be value leaves in other parts of the policy tree that map to obsolete leaves in the defender tree (an obsolete leaf is one that has been replaced during the policy improvement stage). Iterations of improvement and evaluation continue until the defender policy is better than all challengers for every policy node in the tree.

ASSERTIONS AND CONSTRAINTS

One shortcoming of the P-STRIPS language is that its semantics are incomplete. It is possible, for example, to create subtrees (conjunctions of variable = value atoms) that are syntactically correct but nonsensical in the world being modeled.
To illustrate, consider the following defender-challenger scenario:

A defender action at policy node Sj in Figure 4.24 is being challenged by action Process(M3, Op3, 01). In order to add the challenger policy node, the precondition OpStatus(Op2) = complete has been appended below Sj. The branch leading to the new policy node now contains the combination: {OpStatus(Op1) ≠ complete, OpStatus(Op2) = complete}. However, precedence constraints in the manufacturing domain prevent such a situation from occurring in the physical world.

FIGURE 4.24 An example of an invalid combination of state variable values: There is no way for operation Op2 to be complete if operation Op1 is incomplete.

The existence of nonsensical combinations of state variable values does not affect the validity of the policy generated by the SDP algorithm. Since the physical system never enters the nonsensical states, they are effectively ignored. However, the nonsensical states do create an unnecessary computational burden (in the form of unnecessary abstract states) and should be eliminated to the greatest extent possible. In this thesis, two constructs are introduced into the P-STRIPS language for this purpose: assertions and constraints.

An assertion is a statement of valid combinations of state variable values that is applied to the core policy tree. In short, assertions coerce the core policy tree into representing only meaningful abstract states. To illustrate, recall the core policy tree shown in Figure 4.19 on page 122. Since the tree already contains various combinations of values from the OpStatus(Opj) family of state variables, it is possible to extend the branches of these variables with assertions before the policy improvement stage takes place.
For example, the tree fragment in Figure 4.25 shows the addition of singleton OpStatus(Op2) = incomplete and OpStatus(Op3) = incomplete branches to the end of the OpStatus(Op1) = incomplete branch. The singleton branches bind OpStatus(Op2) and OpStatus(Op3) to known values and ensure that no inconsistent precondition can be added to the branch during the improvement phase. Thus, the Process(M3, Op3, 01) action that caused the problem in Figure 4.24 would never be considered by the SDP algorithm because the action is not feasible from assertion-extended state Sj in Figure 4.25.

FIGURE 4.25 A tree-based representation of assertions: Assertions are added to the OpStatus(Opj) branches to restrict the growth of the policy tree to meaningful combinations of values.

Although assertions can be used to eliminate a large number of nonsensical branches, they can only be applied to state variables that exist in the core policy tree. Constraints, in contrast, are more general rules of the form:

IF variable = value [AND | OR]
THEN variable [= | ≠] value [AND | OR]

When a branch is being added to the tree during the policy improvement stage, it is first checked against the constraint base. If the antecedent of the constraint is satisfied, then the consequents are enforced by the SDP algorithm. Although no constraints are required for the simple agents addressed in this research, they could play an important role in more complex agent formulations. For example, in an injection molding environment, P-STRIPS constraints could be used to express the operational constraint that a light color cannot follow a dark color on a machine unless the machine is first cleaned.

An example of a policy tree generated by the prototype planning system is shown in Figure 4.26. Since the trees used in this research typically contain many thousands of nodes, only a small part of the tree is shown.
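The check of a candidate branch against the constraint base, described above, can be illustrated with a short sketch. This is not the prototype's implementation: the names Consequent, Constraint, antecedent_satisfied, and branch_allowed are hypothetical, only AND-joined atoms are supported, and "enforcing" a consequent is assumed here to mean rejecting a branch that contradicts it. The precedence rule is used purely as an illustration, since the thesis handles precedence through assertions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Branch = Dict[str, str]  # state variable -> value bound along a tree branch


@dataclass(frozen=True)
class Consequent:
    variable: str
    value: str
    equal: bool = True  # True encodes "variable = value", False "variable != value"


@dataclass(frozen=True)
class Constraint:
    antecedent: Tuple[Tuple[str, str], ...]  # AND-joined variable = value atoms
    consequents: Tuple[Consequent, ...]      # AND-joined consequents


def antecedent_satisfied(branch: Branch, constraint: Constraint) -> bool:
    """True when every antecedent atom is bound to the stated value."""
    return all(branch.get(var) == val for var, val in constraint.antecedent)


def branch_allowed(branch: Branch, constraints: Iterable[Constraint]) -> bool:
    """Reject a candidate branch that contradicts a triggered consequent."""
    for constraint in constraints:
        if not antecedent_satisfied(branch, constraint):
            continue
        for cons in constraint.consequents:
            bound = branch.get(cons.variable)
            if bound is None:
                continue  # an unbound variable cannot contradict anything yet
            if (bound == cons.value) != cons.equal:
                return False
    return True


# Hypothetical rule: IF OpStatus(Op1) = incomplete THEN OpStatus(Op2) != complete
precedence = Constraint(
    antecedent=(("OpStatus(Op1)", "incomplete"),),
    consequents=(Consequent("OpStatus(Op2)", "complete", equal=False),),
)

invalid = {"OpStatus(Op1)": "incomplete", "OpStatus(Op2)": "complete"}
valid = {"OpStatus(Op1)": "complete", "OpStatus(Op2)": "complete"}
print(branch_allowed(invalid, [precedence]))  # False
print(branch_allowed(valid, [precedence]))    # True
```

Under these assumptions, the invalid combination from Figure 4.24 is rejected before it is added to the tree, whereas a branch that leaves OpStatus(Op2) unbound is still allowed.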
4.2.3 ROLLING PLANNING HORIZONS

The number of values of the Time atom affects the number of contract variables and therefore has an exponential impact on the size of both the concrete and abstract state spaces. Since parts may spend a long time in a manufacturing system, a means of accounting for long planning horizons that does not trigger exponential growth in the size of the problem is required. The approach proposed here is based on the concept of a rolling planning horizon:

1. The agent plans over a finite horizon from Time = 1 ... t.
2. A special value, Time = long, is used to represent the interval from the end of Time = t to infinity.
3. As the agent nears its planning horizon, it replans over the next t units of time.

By using rolling horizons, a very large infinite horizon planning problem is replaced by a sequence of smaller finite horizon planning problems. A similar approach is used by [Barto, Bradtke and Singh, 1995] in their real-time dynamic programming algorithm.

For the finite horizon plan to be accurate, the value realized by the agent when moving from Time = t to Time = long must be a good approximation to the infinite horizon case. At its simplest, the horizon estimate could be a single value. For example, if the long-run expected value to the agent of moving to a Time = long state was known to be Vt, the value could be associated with the terminal state at the end of the Time = long branch. The value of this state would be propagated backwards in time by the SDP algorithm in the same manner as the terminal reward is propagated backwards in Figure 4.23 on page 127. A more realistic approach is to assume that the long-run expected value of the agent depends on various features of the agent's state at Time = t.
Accordingly, the horizon estimates used by the SDP algorithm are implemented as reward trees rooted at the end of the Time = long branch. The branches of the reward tree can contain any state variables, although OpStatus(Opj) and ElapsedTime(Opj) are natural candidates. Generally, the larger the number of branches in the terminal reward tree rooted at the Time = long node, the better the approximation is to the infinite horizon case. However, as the size of the reward tree approaches the size of the Time = t + 1 subtree of the infinite horizon planning problem, the advantage of the rolling horizon approach is eliminated.

FIGURE 4.26 A screen-shot showing a portion of a policy tree: The policy nodes are indicated by the prefix "P*" whereas the values of the leaf nodes are shown in parentheses. [Screen-shot annotations: assertions, action preconditions, policy node, outcome discriminants, value leaf, destination node and probability.]

BUILDING HORIZON ESTIMATES

There are two ways of building the terminal reward trees for the planning horizon: learning and backward induction. In the learning approach, historical information is used to identify the state variables that best discriminate between different horizon values. Although this is straightforward in practice, it assumes that large amounts of historical data are available and that outcomes in the past are representative of outcomes in the future.

The backward induction approach exploits the fact that states in the past are unreachable. In other words, there is no way for an agent to make a state transition from the (t+1)th term to the tth term, so more distant planning problems can always be solved independently of their antecedents.
It is therefore possible to solve a planning problem for some interval in the future and use the resulting tree as horizon estimates for the current planning interval. To illustrate, consider the sequence shown in Figure 4.27. Once the Term t + 1 problem is solved, a fully evaluated subtree exists for all abstract states in which Time_{t+1} = 1. However, the start of Term t + 1 is also the end of Term t. Thus, the Time_{t+1} = 1 values can be used as the Time_t = long values. This process can be repeated for multiple terms.

FIGURE 4.27 The backward induction approach to generating horizon estimates: Since the start of the (t+1)th term can be used as the end of the tth term, all times greater than t_1 in the (t+1)th problem can be ignored once the problem is solved.

Although the basic backward induction approach described above provides the same outcomes as solving all three terms as a single large problem, it also retains much of the dimensionality of the single large problem. For example, if a leaf value in a Time_{t+1} = 1 state is dependent on the value of a resource state variable (e.g., Contract(M3, 19)), then the variable needs to be carried back into the Term t state space. Given the large number of combinations of resource state variables, this approach is generally not feasible.

FIGURE 4.28 A partial subtree for Time = 1: The resource variables are sorted to the bottom. [Diagram labels: physical state; resource state, with nodes Contract(M3, 09) and Contract(M3, 10) and leaf values of 89 and 80.]

COLLAPSING RESOURCE SUBTREES

To avoid the problem of carrying a large number of state variables backwards from term to term, certain parts of the Time_{t+1} = 1 subtree can be replaced with their expected values before transforming it into the Time_t = long reward tree. To illustrate, consider the partial policy tree shown in Figure 4.28.

In this tree, only the subtree for Time = 1 is considered.
In addition, all the nodes for the resource variables are sorted to the bottom of the tree.

The value realized by the agent depends on whether it owns contracts for processing on machine M3. For example, if the agent does not own Contract(M3, 09) but owns Contract(M3, 10), its expected value is $89.

The probability of owning a contract for a resource is a function of two things: the price of the resource and the agent's willingness to pay. Although it is impossible for an agent to know the final selling prices of resources a priori, the agent's willingness to pay is easily derived from the values in the leaf nodes of its policy tree.

If the agent does not own Contract(M3, 10), it is willing to pay up to $89 - $80 = $9 to move from the state in which Contract(M3, 10) = no to the state in which Contract(M3, 10) = yes. If the agent purchases the contract, its expected utility is the value on the left side of the subtree ($89) less the price it pays for the contract. Clearly, if the price of the contract is greater than $9, it is not rational for the agent to participate in the transaction. In such a case, the agent remains in its original "resource state" and receives the value on the right side of the subtree ($80). Given this information, the expected value of being in the state marked Sj can be expressed as a function of the probability that the market price of the contract is less than the agent's valuation of the contract:

$$E[V(S_j)] = \bigl(89 - E[\mathrm{Price}(\mathrm{Contract}(M3,10)) \mid \mathrm{Price}(\mathrm{Contract}(M3,10)) < 9]\bigr) \times \mathrm{Prob}(\mathrm{Price}(\mathrm{Contract}(M3,10)) < 9) + 80 \times \bigl(1 - \mathrm{Prob}(\mathrm{Price}(\mathrm{Contract}(M3,10)) < 9)\bigr)$$

More generally, let s_PARENT represent a node for resource R that is to be collapsed. Furthermore, let s_R and s_¬R represent the branches emanating from the resource node corresponding to owning the resource and not owning the resource respectively.
Finally, let Δ be the difference in the expected values of the states in which the agent owns the resource and does not own the resource. In other words, Δ = V(s_R) - V(s_¬R) is the agent's reservation price for the resource. The expected value of the parent node is simply the value the agent expects to receive if it buys the contract multiplied by the probability that it buys the contract, plus the value the agent expects to receive if it does not buy the contract multiplied by the probability that it does not buy the contract:

$$E[V(s_{PARENT})] = \bigl(V(s_R) - E[\mathrm{Price}(R) \mid \mathrm{Price}(R) < \Delta]\bigr) \times \mathrm{Prob}(\mathrm{Price}(R) < \Delta) + V(s_{\neg R}) \times \bigl(1 - \mathrm{Prob}(\mathrm{Price}(R) < \Delta)\bigr) \qquad \text{(Eq. 4.12)}$$

An important feature of the expression above is that the expected price of the contract is conditional on the maximum the agent is willing to pay: the expression E[Price(R) | Price(R) < Δ] is the expected value of the price of the contract given that the price of the contract is less than Δ. Given historical data on the selling prices of the resources, it is possible to estimate the probabilities in Equation 4.12. Starting from the bottom of the policy tree, each resource tree can be collapsed and replaced by a single value estimate until no resource nodes remain. The resulting reward tree is approximate; however, it is greatly reduced in size relative to the combined physical tree and resource tree.

4.2.4 BIDDING STRATEGIES FOR AGENTS

In the discussion of the action representation language in Section 4.1.8, no mention is made of Buy or Sell actions for agents even though the agents are expected to interact within a market. Although such capabilities are important for the agents considered in this research, it is possible to make a distinction between the valuation and bidding phases of the agents' behavior [Mullen and Wellman, 1995], [Meuleau et al., 1998].
The planning and scheduling algorithms discussed to this point have been concerned with determining the agent's best policy given its cost and reward structure. The ownership of resources has been treated as a set of random variables over which the agent has no control. In this section, the policy trees generated by the SDP algorithm are used to determine the agents' willingness to pay for resources. In Section 4.3, the mechanisms by which the agents use their private valuations of resources to achieve an equilibrium outcome are described.

THE RESOURCE STATE SPACE

When a policy tree is sorted so its resource nodes are at the bottom (given that the root node is defined as the top), a distinction can be made between the agent's physical state and its resource state. Physical state is determined by variables such as Time, OpStatus(Opj), ElapsedTime(Opj), and so on. In contrast, the agent's resource state is determined by the values of the variables corresponding to the ownership of resources (in this case, variables of the form Contract(Mj, Tj)). The actions identified in the P-STRIPS language and reasoned about using the SDP algorithm in the agent's valuation phase cause transitions in physical state. The Buy and Sell actions available to the agent in the bidding stage cause state transitions in the agent's resource state.

To illustrate the relationship between the valuation and bidding phases, recall the partial policy tree shown in Figure 4.28 on page 134. After the SDP planning algorithm is applied to the policy tree, each leaf node in the tree contains the value that the agent expects to achieve by being in that particular abstract state and executing its optimal policy. In this formulation, resources, such as the Contract(Mj, Tj) variables, are not affected by any actions in the agent's repertoire and are therefore treated as uncontrollable random variables.
However, by sorting the resource variables to the bottom of the tree, it is relatively straightforward to determine the agent's reservation price for certain resources or combinations of resources. For example, Figure 4.29 shows a policy tree from the benchmark problem that has been sorted so that the resource state space is distinct from the physical state space. In the example from the previous section, the agent is willing to pay any amount less than $9 to move from a state defined by Contract(M3, 10) = no to one defined by Contract(M3, 10) = yes. Conversely, the agent's asking price for the resource can be determined by moving in the other direction. To willingly move from a leaf with a value of $89 to one with a value of $80, the agent must be given compensation of at least $9. Given valuation information of this form, it is possible to determine a rational bidding policy for the agent.

DETERMINING RESERVATION PRICES

To illustrate the issues surrounding a rational bidding policy for an agent, the Edgeworth box notation introduced in Section 3.2.2 is revisited. For the two-agent, two-resource equilibrium analysis considered here, however, a number of modifications must be made to the conventional Edgeworth formulation. First, good x_1 is designated as a numeraire good, M, such that x_1 = M ∈ ℜ+. A numeraire good is a unit of exchange (such as money) that has value for all agents. Indeed, given the conditions on risk neutrality and global utility set out in Section 4.1.6, all agents are assumed to derive the exact same utility from each unit of M. The second good in the Edgeworth box economy considered in this section is a discrete, indivisible resource (such as a contract for a unit of processing time on a particular machine). The domain of x_2 is therefore restricted to x_2 ∈ {0, 1}. In other words, the agent either owns or does not own the resource.
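The reservation and asking prices read from the sorted policy tree earlier in this section are both the difference between two sibling leaf values. A minimal sketch of that arithmetic follows, assuming the leaves of a resource subtree are stored in a dictionary keyed by the bundle of contracts owned. The name marginal_value and the dictionary layout are illustrative assumptions, not part of the prototype system. The leaf values are those of Figure 4.28.

```python
from typing import Dict, FrozenSet

# Leaf values of a resource subtree, keyed by the set of contracts owned.
LeafValues = Dict[FrozenSet[str], float]


def marginal_value(leaves: LeafValues, owned: FrozenSet[str], contract: str) -> float:
    """Difference between the leaf with the contract and the leaf without it.

    For an agent that does not own the contract this is its reservation
    price; for an agent that owns it, this is its asking price.
    """
    with_contract = leaves[owned | {contract}]
    without_contract = leaves[owned - {contract}]
    return with_contract - without_contract


# Values taken from Figure 4.28 (the agent does not own Contract(M3, 09)).
figure_4_28: LeafValues = {
    frozenset({"Contract(M3, 10)"}): 89.0,
    frozenset(): 80.0,
}

buyer_view = marginal_value(figure_4_28, frozenset(), "Contract(M3, 10)")
seller_view = marginal_value(
    figure_4_28, frozenset({"Contract(M3, 10)"}), "Contract(M3, 10)"
)
print(buyer_view, seller_view)  # 9.0 9.0
```

Keying the leaves by bundle reflects the point that a contract's value depends on the other resources the agent already holds, so the same function applies to any pair of sibling leaves in a larger resource subtree.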
To provide specific numbers for constructing an Edgeworth box, the policy tree from Figure 4.28 on page 134 is used. For the other agents considered in this section, optimal policy trees of the same basic form are assumed to exist. Given that one agent owns the resource and the others do not, the agents are identified by their potential roles within a single transaction as pure seller (S) and pure buyer (B):

Agent S currently owns the contract for machine M3 at Time = 10. Based on the information in its policy tree, Agent S is better off if it can sell Contract(M3, 10) for any amount greater than $9. At a selling price of exactly $9, Agent S is indifferent between keeping the resource and selling it.

FIGURE 4.29 A policy tree sorted to enable extraction of price information: By sorting the resource nodes to the bottom of the tree, transitions in the resource state space resulting from buy and sell transactions can be isolated. [Screen-shot annotations: root of resource subtree; the value of owning Contract(M3, 8), given the agent owns the bundle of resources shown in the antecedent branches, is $68 - $62 = $6.]

The axes for the selling agent are drawn so that they bound the indifference curve through the agent's initial endowment, ω_S. For reasons discussed below, it is both possible and convenient to set the length of the M axis to the marginal cost of good x_2, as shown for the selling agent in Figure 4.30(a). Of course, since x_2 is an indivisible good, values of x_2 within the interior of the Edgeworth box are infeasible. The true indifference "curves" for x_2 reduce to pairs of points of the form (M_j, 0) and (0, 1). In order to facilitate exposition, however, the conventional representation of continuous indifference curves is used.
FIGURE 4.30 Indifference curves and initial endowments for the seller (a) and the buyer (b): The length of the M axes is assumed to be the indifference price (reservation price) of the resource good for each agent. [Panels: (a) potential seller of x_2, with $9 marked on the M axis; (b) potential buyer for x_2, with $11 marked on the M axis.]

Agent B does not currently own Contract(M3, 10). However, according to its optimal policy tree, Agent B is willing to pay up to $11 to acquire the resource. As such, Agent B is indifferent between the status quo ω_B = (11, 0) and the allocation that results if Agent B buys Contract(M3, 10) for $11: x_B = (0, 1). The indifference curve for Agent B is shown in Figure 4.30(b).

The indifference curves for the two agents can be arranged to form an Edgeworth box, as shown in Figure 4.31(a). The box provides a convenient means of evaluating the rationality of transferring the resource good, x_2, between the two agents in exchange for the numeraire good, M. An exchange is considered rational if the outcome is at least as good as the initial endowment for both agents and strictly better for one of the agents. Since this is identical to the definition of Pareto efficiency, any Pareto efficient exchange is considered rational for all participants, and vice-versa.

To construct an Edgeworth box for a discrete good, the indifference curves for the two agents must intersect at the initial endowment ω. Since the resource good is indivisible, all feasible allocations must lie on either vertical edge of the box. In Figure 4.31, the range of values along the x_2 = 0 axis between the two agents' indifference prices is known as the contract curve because any voluntary exchange that takes place between the two agents must do so along the line.

FIGURE 4.31 An Edgeworth box representation of different exchanges: The figure in (a) shows the contract curve and the budget line (drawn at the seller's indifference price) for the two agents.
In (b) an allocation x' that is weakly preferred by the seller but strongly preferred by the buyer is shown. In (c), the indifference prices of the agents are reversed. The allocation x' is weakly preferred by Agent S but is inferior to ω from Agent B's perspective.

A unique equilibrium allocation, x*, should occur at the intersection of the contract curve and the Walrasian budget line. Recall that the slope of the budget line is determined by the equilibrium price of the two goods. The problem in the economy considered here is that the resource goods do not satisfy the assumptions on which the theory of competitive equilibrium is based. Specifically, the theory assumes that the goods are infinitely divisible and that the agents are price takers. However, a contract for a particular unit of time on a particular machine is both unique and indivisible. As such, the assumption of a large number of sellers is not satisfied and competitive forces cannot be relied on to produce an equilibrium price.

Footnote 12: In this economy, good M is a numeraire good. The price of a numeraire good is 1 by definition and thus the slope of the budget line is determined by the price of x_2 in terms of M.

To get around the problem of a non-competitive market and ensure the existence of a unique equilibrium allocation, the following price convention is introduced: sellers always sell at their indifference price. Note that, for reasons discussed below, the price convention could specify any price along the contract curve. For example, the net surplus created by the exchange could be split equally between the two agents. However, the seller's indifference price has the virtue of being easy to calculate whereas a "fair" price that splits the surplus evenly would require additional information such as the reservation price of the buyer.
Given the price convention, a unique budget line can be drawn through the points at which the seller's initial indifference curve intersects the two axes, as shown in Figure 4.31(a). The implications of the price convention on the equilibrium allocations of resources can be illustrated with the following scenarios:

In Figure 4.31(b), an allocation x' is shown that lies on the indifference curve for the selling agent. Although the budget line is omitted for clarity, the point x' corresponds to the indifference price for the seller ($9). The buyer's indifference curve through x' is to the left of its initial indifference curve and thus the allocation x' is a Pareto improvement over the initial endowment ω. It is therefore jointly rational for the agents to exchange Contract(M3, 10) for the seller's asking price of $9. In this exchange, the utility of Agent B is increased by the distance between its indifference curves along the x_1 axis: $11 - $9 = $2. Thus, whenever the buyer is willing to pay more than the seller's indifference price, a Pareto efficient transaction will occur in which the buyer captures all the surplus created by the transaction. The price convention used here ensures this outcome.

In Figure 4.31(c) the indifference prices for the agents are reversed. According to the price convention, Agent S is willing to sell the resource for $11 and Agent B is willing to buy the resource for any amount less than $9. As before, an allocation x' is placed on the seller's indifference curve where it intersects the x_2 = 0 axis. However, as the diagram shows, the indifference curve for Agent B through allocation x' is inferior to its initial endowment. As such, it is weakly rational for Agent S to sell but it is not rational for Agent B to buy at the ask price. In such a case, no transaction occurs and the allocation remains at ω.
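The two scenarios above reduce to a simple decision rule, sketched below. The names Trade and exchange_under_price_convention are hypothetical, and the strict inequality is an assumption drawn from the requirement that a rational exchange be strictly better for at least one agent: when the two prices are equal, neither agent gains, so no trade is recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Trade:
    price: float
    seller_surplus: float
    buyer_surplus: float


def exchange_under_price_convention(
    seller_indifference: float, buyer_reservation: float
) -> Optional[Trade]:
    """Apply the convention that sellers always sell at their indifference price.

    Returns None when the buyer values the resource no more than the seller,
    in which case the allocation stays at the initial endowment.
    """
    if buyer_reservation <= seller_indifference:
        return None
    price = seller_indifference
    return Trade(
        price=price,
        seller_surplus=price - seller_indifference,  # zero under this convention
        buyer_surplus=buyer_reservation - price,     # buyer captures the surplus
    )


# Figure 4.31(b): seller indifferent at $9, buyer willing to pay up to $11.
print(exchange_under_price_convention(9.0, 11.0))
# Figure 4.31(c): the indifference prices are reversed, so no exchange occurs.
print(exchange_under_price_convention(11.0, 9.0))
```

In the first case the sketch yields a trade at $9 with a buyer surplus of $2 and a seller surplus of $0. In the second it yields no trade.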
The Edgeworth box analysis in this section allows a number of observations to be made about the bidding and selling behavior of agents:

1. The indifference curves through the initial endowments of resources are provided for each agent by their respective optimal policy trees. Each agent knows precisely how much it is willing to pay (in terms of the numeraire good) for resources that it does not own and how much it is willing to accept in exchange for resources that it does own. In other words, the policy tree provides a preference ordering over resources (and bundles of resources) that is both transitive and complete.

2. Since the indifference prices are known by the agents with certainty, they correspond to the private values case. This has implications for the choice of auction form, as discussed in Section 4.3. Specifically, the English, second-price sealed-bid, and continuous double auction are functionally equivalent given the characteristics of the bidding agents.

3. The price convention ensures that a stable equilibrium allocation exists for each resource good. As Figure 4.31(b) and (c) illustrate, once a good has been exchanged between two agents, the same good cannot be exchanged again without a change in indifference prices. In other words, all other things being unchanged, the prices in the market are guaranteed to rise monotonically. Of course, changes in an agent's physical state typically result in different valuations and a new equilibrium. The point of the Edgeworth box analysis is to describe how the equilibrium is attained in a given physical state.

The importance of the distinction between the valuation and bidding phases is that the information generated by the agent during its valuation phase can be combined with a price convention to create stable, Pareto efficient bidding and selling policies.
Thus, the operation of the market depends critically on the agents' ability to determine their private values for resources.

Footnote 13: If there were no changes in the underlying physical state of the world, the New York Stock Exchange would open, achieve equilibrium, and close its doors for good.

THE ROLE OF THE NUMERAIRE GOOD

The Edgeworth box representation helps to illuminate the role of the numeraire good in an agent's bidding and selling policy. Note that the agent's absolute endowment of M never appears in the analysis. That is, the rationality of a particular transaction is assumed to be independent of the total wealth of either agent. Instead, rationality is determined at the margin: the cost of the transaction must be no higher than the benefit in order for the transaction to be considered rational. This approach is similar to the "marginal cost base contracting" approach described in [Sandholm, 1993].

Under these circumstances, there is nothing to be gained by giving each agent a constraining "budget" of good M since the optimization problem the agent solves includes all the costs and rewards the agent faces. To illustrate, consider the case of an agent representing a part with a large terminal reward and a tight deadline. Since the agent seeks to maximize its expected value net of all costs and rewards, it is clear that the agent should be willing to bid high (relative to jobs with lower terminal rewards) in order to complete processing before its deadline. There is no need to allocate a budget of M to the agent to reflect its priority over other agents since the priority structure is defined already by the agent's problem formulation. If an artificially-imposed budget constraint prevents an agent from bidding for a different allocation of goods x'_i when φ_i(x'_i) ≻ φ_i(x_i), then the opportunity for a Pareto improvement is lost.
[fn. 14] The use of a finite but sufficiently large allocation of M to each agent is equivalent to the provision of free and unlimited credit. For example, if the agent has an opportunity to increase its utility by some amount α by purchasing a contract for a resource, it can "borrow" any amount less than α without incurring interest or transaction costs. Since the net effect of such a borrow-and-buy transaction is an increase in the agent's net utility, the transaction is rational.

By eliminating the issue of absolute wealth from the bidding phase, a number of important sources of complexity and possible market failure are also eliminated. First, an agent cannot use the revenue realized by the sale of one good to subsidize a Pareto inefficient exchange at some point in the future. The important message in Figure 4.31(c) is that global wealth is destroyed by an inefficient transaction, regardless of how the transaction is funded. Similarly, agents are unable to speculate because their private values are determined during a planning process that is free of information about other agents. To illustrate, consider the following scenario:

Agent 1's process plan indicates that Machine M4 is unsuitable for any of the operations on the part that the agent represents. As such, a unit of processing time on M4 is worth exactly zero to Agent 1. However, if Agent 1 notices in passing that contracts for processing on M4 historically trade for $20 and that the current price is $5, then the agent might be tempted to buy at $5 in the hope of reselling the contract later for a speculator's profit. However, the agents in this system are not capable of this type of broader "rational" behavior.
Agent 1's reservation price for a contract on M4 is $0 and, according to the operationalization of rationality used here, the agent is unwilling to pay more than this amount.

PRICE DISCOVERY VERSUS THE PRICE CONVENTION

The overall objective of this particular market-based system is to solve a global-level resource allocation problem. According to the formulation in Section 4.1.6 for cooperative problem solving environments, the objective function to be maximized is simply the sum of agent utilities and the actual final utilities achieved by the agents do not matter. Thus, as long as Pareto efficient exchanges take place and Pareto inefficient exchanges do not take place, the exact manner in which the total economic surplus is divided between the buyer and seller is irrelevant.

To illustrate, recall the exchange in Figure 4.31(b). The transaction price is arbitrarily set to the seller's reservation price ($9) and therefore the surplus realized by Agent S as a result of the exchange is exactly zero. The surplus realized by Agent B is its reservation price less the transaction price: $11 - $9 = $2. More importantly, the increase in global utility resulting from the exchange is $0 + $2 = $2. Note, however, that the exchange of good M is zero-sum and thus the increase in global utility is independent of the transaction price. The same increase in global utility could have been realized with any transaction price on the contract curve (i.e., between $9 and $11). It is on this basis that the use of the price convention is justified. Although it is possible to devise a more elaborate price discovery mechanism that determines a "fair" transaction price, there is little benefit to be gained from the additional computational cost and complexity.

4.3 AGGREGATION

In this section, the auction mechanism used to implement the market is described in detail.
As discussed previously, the bidding policies of the agents in this system are greatly simplified (relative to real markets) by the existence of private values, risk neutrality, and the ultimate irrelevance of the division of surplus between the buying and selling agents. However, the nature of the goods themselves (contracts for processing time on production machinery) introduces additional complexity in the form of dependence between resources. In the following sections, these sources of complexity are described in detail and an auction mechanism that addresses the problem is proposed.

4.3.1 SETTING THE STAGE: AN EXAMPLE PROBLEM

To illustrate the essential characteristics of the auction mechanism used in the thesis, a simplified version of the benchmark problem introduced in Section 2.5 is used. Instead of each of the three jobs requiring three operations on three different machines, each of the three jobs in the simplified version requires a single operation on a single machine. Thus, the problem is an instance of the \(1 \,||\, \sum w_j C_j\) class of scheduling problems. The duration of the lone operation, Cj, is known with certainty for each job j. [fn. 15] Once the job has been processed by the machine M1, it leaves the system and receives an exogenously determined terminal reward, rj. For each unit of time that the part is in the system, it incurs a per-unit-time holding cost of wj. The specific values of these parameters for the example problem are summarized in Table 4.3.

TABLE 4.3: Details of the example scheduling problem

Job      Processing time (Cj)   Terminal reward (rj)   Holding cost (wj)
Job 1    3                      $100                   $1
Job 2    5                      $100                   $1
Job 3    2                      $100                   $1

Problems of the \(1 \,||\, \sum w_j C_j\) class are solvable in polynomial time using the weighted shortest processing time (WSPT) rule. Of course, when n = 3 jobs, it is a simple matter to find the optimal schedule by inspection, as shown in Figure 4.32.
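The WSPT rule applied to the data in Table 4.3 can be sketched in a few lines of Python. This is only an illustrative sketch: the `Job` class and the function names are hypothetical and do not appear in the thesis, and the net value of a job is assumed to be its terminal reward less the holding cost accrued up to its completion time.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    processing_time: int   # duration of the single operation (Table 4.3)
    terminal_reward: int   # reward received when the job leaves the system
    holding_cost: int      # cost per unit of time in the system (the weight w_j)

def wspt_schedule(jobs):
    """Sequence jobs by non-increasing ratio of weight to processing time."""
    return sorted(jobs, key=lambda j: j.holding_cost / j.processing_time, reverse=True)

def net_values(sequence):
    """Map each job to its terminal reward less holding cost up to completion."""
    clock = 0
    values = {}
    for job in sequence:
        clock += job.processing_time          # completion time on the single machine
        values[job.name] = job.terminal_reward - job.holding_cost * clock
    return values

jobs = [Job("Job 1", 3, 100, 1), Job("Job 2", 5, 100, 1), Job("Job 3", 2, 100, 1)]
sequence = wspt_schedule(jobs)
values = net_values(sequence)

print([job.name for job in sequence])   # ['Job 3', 'Job 1', 'Job 2']
print(sum(values.values()))             # 283
```

Because all three jobs carry the same holding cost, the rule reduces here to shortest processing time first.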
In this case, the equilibrium allocation of resources in the market should yield the same global outcome as the WSPT rule: $95 + $90 + $98 = $283.

[fn. 15] The terms "job", "part" and "agent" are used interchangeably in this section since a job corresponds to the processing of a part and an agent is created to represent the job in the planning system.

FIGURE 4.32: The optimal solution to the 3-job, 1-machine example problem. [Gantt chart: Job 3 occupies time units 1-2, Job 1 occupies time units 3-5, and Job 2 occupies time units 6-10.]

Job                     Time in system   Revenue (net of holding costs)
Job 1                   5                $95
Job 2                   10               $90
Job 3                   2                $98
Global expected value                    $283

4.3.2 CHOICE OF AUCTION FORM

The type of market in which agents in this system participate is a continuous reverse auction. In a reverse auction, the buyer submits a request to the auctioneer for a good. The auctioneer then collects ask prices from sellers and matches the buyer with the low-cost seller. Thus, apart from the fact that auction participants are competing on the basis of lowest cost instead of highest price, the essentials of the auction setting are identical to those discussed in Section 3.2.3. In a continuous auction, there is no batching of bids or market call. The NASDAQ market is an example of a real-world market structured as a continuous double auction [Schwartz, 1988]. Unlike NASDAQ, however, agents participating in the market described here have known private values. As such, each agent has no uncertainty about the maximum it is willing to pay for a particular resource. In addition, an agent in this system will experience no regret if it buys the resource for any amount up to and including its reservation price.

There are two important implications of the private values case. First, since there is no risk of winner's curse, the dominant strategy for all bidders is to reveal their true reservation prices to the auctioneer [Milgrom, 1989].
The revelation of true reservation prices greatly simplifies the task of matching bidders and sellers. [fn. 16] The second implication is that private values (and risk neutrality) lead to equivalence between various forms of auctions.

The advantage of a continuous auction in the context of manufacturing is that it permits jobs to join the auction at any time. If a call auction were used, the arrival of a new job following the closure of the auction would necessitate the start of a new auction. The disadvantage of the continuous auction is that each good might be bought and sold multiple times before the most efficient allocation is attained. However, as discussed in the following sections, auctions for resource allocation can place a massive computational burden on the auctioneer. Consequently, any means of distributing the burden across multiple auctioneers (e.g., market makers, individuals) is advantageous.

4.3.3 COMPLEMENTARIES AND SUBSTITUTES

One condition for attainment of a Pareto optimal equilibrium that has not been addressed to this point is the condition of market completeness. A market is complete if every good in the market is traded at publicly quoted prices [Mas-Colell, Whinston and Green, 1995]. A market that is incomplete may fail to achieve equilibrium, in which case the predictions of the first fundamental theorem of welfare economics do not hold. In the context of scheduling, incompleteness is a pervasive source of market failure [Walsh et al., 1998]. The failure occurs because the dependencies that may exist between goods are not explicitly reflected in market prices. To illustrate the problem, consider the two forms of dependence below: substitutes and complements.

• Substitutes: Agent 1 requires a single unit of processing time (for notational convenience, assume that resources of the form Contract(Mj, Tj) can be denoted Tj).
The amounts the agent is willing to bid for the units of processing time, both individually and collectively, are shown below:

Resource    Maximum bid price
T1          $10
T2          $10
[T1, T2]    $10

Since Agent 1 only requires one unit of time, and is indifferent whether it consumes T1 or T2, the two resources are substitutes. Therefore, the price the agent is willing to pay for one good (say T2) depends on whether the agent already owns a substitute good (T1). In this example, the agent would bid zero for T2 if it already owned T1 (and vice-versa).

[fn. 16] Since the auction participants in this situation are artificial and can therefore be programmed to reveal their true reservation prices regardless, the existence of a dominant strategy is interesting but unnecessary for implementation.

• Complements: Agent 1 requires two units of processing time and has a binding deadline \(d_j = 2\). If the part ships before the deadline, the agent receives a terminal reward \(r_j = 10\):

\[ r_j = \begin{cases} 10 & \text{if } C_j \le d_j \\ 0 & \text{otherwise} \end{cases} \]

The amounts the agent is willing to bid for units of processing time, both individually and collectively, are shown below:

Resource    Maximum bid price
T1          $0
T2          $0
[T1, T2]    $10

Again, the value of one good is dependent on whether the agent owns the other. Resources T1 and T2 are said to be complements in this context since the agent is willing to pay more for both than it is for either one individually.

FIGURE 4.33: Tree-based representations of dependence between goods: In (a), the items are perfect substitutes. In (b), the items are perfect complements. [Legend: at each resource node, one branch denotes "owns contract" and the other "does not own contract"; each leaf carries the value of the corresponding allocation.]

The dependence relationships between goods can also be expressed using a compact tree notation identical to that used throughout Section 4.2. The amount the agent is willing to pay (or must receive
as compensation) to move from one allocation to another is the difference between the target leaf node and the starting leaf node. For example, in the substitute case shown in Figure 4.33(a), the agent is willing to pay $10 - $0 = $10 to move from the allocation R = {} to either R' = {T1} or R' = {T2}.

A market in which the dependence relationships between goods are not priced is incomplete and may fail to attain an equilibrium. To illustrate, consider the simple market scenario used by McAfee and McMillan [1996] to illustrate market failure:

Agent 1 requires two units of processing time and has a binding deadline of two time units in order to receive its terminal reward of $3. If the job is completed after the deadline, Agent 1 receives no terminal reward. In such a situation, Agent 1 regards T1 and T2 as complements: one is worthless without the other. Agent 2, in contrast, faces the same deadline but only requires a single unit of processing time. In the absence of a holding cost, Agent 2 regards T1 and T2 as perfect substitutes. The details of the scenario are summarized in Table 4.4.

TABLE 4.4: A problem for which no equilibrium price exists.

Agent      Processing time (Cj)   Deadline (dj)   Terminal reward (rj)
Agent 1    2                      2               $3
Agent 2    1                      2               $2

Since Agent 2's terminal reward is only $2, the optimal global outcome is to assign both units of processing time to Agent 1. In a market, however, processing time is not "assigned" and thus the agents themselves must converge on the optimal outcome via their bidding behavior. If markets are made for T1 and T2 as distinct goods, the following problem occurs:

Assume Agent 1 and Agent 2 bid competitively for T1. Agent 2 is willing to pay up to $2 and thus Agent 1 will have to bid $2 to secure the resource. When the agents bid for T2, the situation is the same except that Agent 1 has already paid $2 for T1. Given that its
terminal reward for being processed is $3, it is only willing to bid up to $1 to secure the second unit of processing time. However, Agent 2 is willing to bid up to $2 and therefore wins the auction for T2.

To avoid this problem, Agent 1 would have to be able to bid on T1 and T2 jointly in an all-or-nothing deal. The issues surrounding this combinational approach are described in the following section.

4.3.4 THE CHALLENGES OF COMBINATIONAL AUCTIONS

A combinational auction [fn. 17] is an auction in which bidders may choose to bid on aggregations or bundles of goods.

[fn. 17] Some authors use the term combinational auction (e.g., [Rothkopf, Pekec and Harstad, 1998], [McAfee and McMillan, 1996]) whereas others use the equivalent term combinatorial auction (e.g., [Sandholm and Suri, 2000], [Rassenti, Smith and Bulfin, 1982]).

As Rothkopf, Pekec and Harstad [1998] point out, there has been relatively little scholarly work in the area of combinational auctions due primarily to the belief that the computational disadvantages of such auctions outweigh their benefits. However, the recent auctions for radio spectrum in the United States and other countries have increased interest in the use of combinational auctions in situations in which there is significant complementarity or substitutability between goods being offered for sale. Interestingly, the FCC declined to permit combinational bids in its spectrum auctions because of concerns about the computational feasibility of considering every possible aggregation of licenses [McAfee and McMillan, 1996], [Rothkopf, Pekec and Harstad, 1998]. Given that there is a maximum of \(2^n - 1\) unique aggregations that must be quoted on the market in a combinational auction of n goods, it is clear that feasibility of combinational designs is a legitimate concern. Indeed, the problem of winner determination for combinational auctions is known to be NP-complete and it has been shown that no polynomial time algorithm can be constructed for
achieving an allocation that is guaranteed to be at least as good as a specified lower bound [Sandholm, 1999]. In the case of the FCC spectrum auctions, a system of parallel, multi-round auctions was used in which bidders could observe whether they were likely to win a particular good and adjust their bids for other goods accordingly [McAfee and McMillan, 1996]. However, such auctions require bidders to anticipate the behavior of other agents and may lead to allocations that are economically inefficient.

In addition to the spectre of coping with an unmanageably large number of aggregations, continuous combinational auctions create several complexity issues that do not occur in single-round single-item auctions. In the following sections, each of these complexity issues is described in detail and, where appropriate, general strategies for addressing the issues are suggested.

CONSOLIDATION OF SUPPLY

Consolidation of supply must occur whenever an agent makes a bid for an aggregation that is not currently quoted on the market. For example, if Agent 1 bids for [T1, T2] but the current allocation has T1 owned by Agent 2 and T2 owned by Agent 3, then the market must consolidate ask prices from the two potential sellers in order to provide the bidding agent with a single ask price. Although consolidation of supply creates additional overhead for the auctioneer, the task itself consists of straightforward search and summation operations.

CONSOLIDATION OF DEMAND

Consider the partial resource tree for Agent 3 shown in Figure 4.34 and assume that the agent owns T1 and T2. Since Agent 3 faces a hard deadline, it treats [T1, T2] as pure complements (i.e., its ask price for T2 is identical to that of [T1, T2]).
Although the tree-based representation of resource ownership generated by the SDP algorithm provides a parsimonious means of expressing interdependence relationships (complements and substitutes) between resource goods, the trees also create a subtle completeness anomaly in the agents' preference orderings for resources. Specifically, the tree does not reveal the agent's ask price for T1 in isolation.

To illustrate the difficulties created by the anomaly, assume two agents, Agent 1 and Agent 2, can bid for the resources owned by Agent 3 at the prices shown in Figure 4.35. Although the optimal allocation of resources is clearly \(R_1 = \{T1\}\), \(R_2 = \{T2\}\), and \(R_3 = \{\}\), neither bidder can unilaterally afford the $98 to initiate the transaction. This situation is known by different names in the combinational auction literature: Branco [1997] refers to it as the "free rider" problem; Rothkopf, Pekec and Harstad [1998] refer to it as the "threshold problem". Regardless of nomenclature, the issue is essentially the inverse of supply consolidation, hence the introduction of another name: the demand consolidation problem. The demand consolidation problem arises whenever a Pareto efficient exchange requires an existing aggregation of goods with a known price to be split into individual goods or sub-aggregations for which prices are not known.

FIGURE 4.34: The resource tree for Agent 3 depicts an all-or-nothing situation: The two resources T1 and T2 are complementary and a deadline eliminates all substitutes. [Leaf values: 98 and 0.]

FIGURE 4.35: A pricing scenario in which a Pareto efficient exchange requires consolidation of demand.

Ask prices
Resource    Owner      Price
T2          Agent 3    $98
[T1, T2]    Agent 3    $98

Bid prices
Resource    Bidder     Price
T1          Agent 1    $70
T2          Agent 2    $70

One means of addressing the demand consolidation problem is to introduce the concept of residual supply.
Consider the bids put forward by the potential buying agents and the sequence in which they may be considered by Agent 3. If Agent 2 makes a bid for T2 in isolation, no transaction can occur because Agent 2's maximum bid of $70 is well below Agent 3's ask price of $98. However, if Agent 1 makes a bid for T1, the situation is different. Since Agent 3 does not have an ask price for T1 in isolation, it must provide the ask price for the smallest aggregation of resources that contains T1 (in this case, [T1, T2]). Since Agent 1 has not bid for T2, the "residual" resource can be sold to another agent (Agent 2 for example) and the revenue resulting from the sale can be factored into the overall transaction. Consolidating supply in this way enables surplus-creating n-way transactions between agents. The question of exactly how much each of the buying agents contributes to the $98 selling price of the bundle remains open. However, in a cooperative problem solving environment, a simple scheme in which Agent 1 pays its reservation price ($70) for T1 and Agent 2 pays any amount greater than or equal to $28 for T2 is sufficient for a Pareto efficient outcome.

COMPOUND BIDS

An agent's objective in bidding is to move to a leaf in the resource portion of its policy tree with a higher value. However, in a continuous or multi-round auction, each auction participant normally has an existing allocation of resources when it submits bid prices to the auctioneer. The agent's ability to pay for one set of resources is therefore contingent on its ability to recoup the investment it has already made in substitute resources. To illustrate, consider the sub-optimal schedule faced by Agent 3 in Figure 4.36. Based on the SPT sequencing rule, it is clear that global utility is maximized when the shorter job (J3) is processed first.
Figure 4.37 shows that the existing allocation of resources to Agent 3 is \(R_3 = \{T4, T5\}\) and the expected utility associated with the allocation is $95. By moving to the optimal allocation, \(R'_3 = \{T1, T2\}\), Agent 3 would avoid three units of holding cost and thereby realize a value of $98. As a consequence, Agent 3 is willing to bid any amount up to $3 to move from \(R_3\) to \(R'_3\). However, since [T1, T2] and [T4, T5] are substitutes, the agent could dispose of [T4, T5] once in its new state with no corresponding change in utility. As a consequence, $3 understates Agent 3's willingness to pay for [T1, T2].

As shown by the arrows in Figure 4.37, Agent 3's transition from the node marked \(R_3\) to the node \(R'_3\) can be viewed as a compound bid consisting of a bid and a linked ask. A linked ask permits the agent to divest itself of resources made redundant by the bid half of the transaction. The ask is "linked" to the bid in the sense that both the buying and the selling parts of the transaction are required to achieve the agent's intended outcome and therefore the feasibility of both parts is evaluated jointly.

FIGURE 4.36: A sub-optimal 2-job, 1-machine schedule. [Gantt chart: Job 1 occupies time units 1-3 and Job 3 occupies time units 4-5.]

Job      Time in system   Holding cost (wj)   Revenue (net of holding costs)
Job 1    3                $1                  $97
Job 3    5                $1                  $95

FIGURE 4.37: A tree-based representation of a compound transaction: A linked ask and a bid are required to move from state \(R_3\) to state \(R'_3\) via an intermediate state. The net surplus generated by the transaction is \(\Delta_{R_3 \to R'_3} = \$3\).

Returning to the example, the question facing Agent 3 is how much it is willing to pay to move to node \(R'_3\).
Its reservation price for the bundle [T1, T2], denoted \(Price_{Res}([T1, T2])\), is the surplus gained by the agent in moving to the new allocation (in this case \(\Delta_{R_3 \to R'_3}\), the value of the leaf \(R'_3\) less the value of the leaf \(R_3\)) plus any revenue realized by selling the less desirable substitute bundle [T4, T5] on the open market:

\[ Price_{Res}([T1, T2]) = \Delta_{R_3 \to R'_3} + Price_{Mkt}([T4, T5]) \qquad \text{(Eq. 4.13)} \]

In the limiting case in which the agent receives its asking price for its linked ask resources (i.e., \(Price_{Mkt}([T4, T5])\) equals the agent's asking price of $95 for [T4, T5]), Agent 3's willingness to pay for the bundle [T1, T2] attains its maximum value: $3 + $95 = $98. Thus, by recognizing the issue of existing allocations and substitutability, Agent 3's willingness to pay increases dramatically. This increase is important since a Pareto efficient transaction can occur at a bid of $98 (Agent 1's ask price for [T1, T2] is $97) whereas the same transaction cannot occur if Agent 3 is only willing to bid $3. In short, ignoring the possibility of agents divesting their redundant resources can create a source of market failure.

The compound bid construct is similar to Sandholm's [1998] S-contract (swap contract). In an S-contract, an agent is permitted to exchange a single good with another agent. Compound bids are more general since they permit the simultaneous exchange of multiple goods (including the numeraire good M) and may involve more than two agents. [fn. 18] However, Sandholm shows that some form of swap mechanism, even in its simplest form as an S-contract, is necessary for attaining the optimal allocation.

NEXT-BEST STATES

The "next-best" state issue is essentially the seller's version of the compound bid issue described above.
Specifically, the price at which an agent is willing to sell a resource is dependent on two things: the agent's ability to buy substitute resources and the difference in utility between the original state and the state in which the agent owns the substitute resources. Assume, for example, that Agent 3 owns T1 and T2 and is therefore in the position to sell the resources. Based on the information in Figure 4.37, the quoted ask price for the bundle [T1, T2] is $98. In other words, Agent 3 is willing to relinquish ownership of [T1, T2] for any amount greater than $98 and is indifferent when the bid price is exactly $98. However, the quoted ask price ignores the possibility of Agent 3 selling one set of resources and subsequently being able to buy a close substitute. If a close substitute is available, the actual ask price for the original resources is merely the seller's net change in utility caused by the compound transaction plus the cost of the resources to get to the next-best state.

[fn. 18] Since linked ask resources can be purchased by one or more agents (through consolidation of demand), the compound bid construct also subsumes Sandholm's [1998] M-contract (multiagent contract) type.

To illustrate, assume that Agent 3 sells T1 and T2 and moves to \(R'_3 = \{\}\). In addition, assume that it can move to \(R''_3 = \{T4, T5\}\) for $10. The change in utility associated with the compound transaction is $95 - $98 = -$3. If the cost of the substitute resources is added to this loss, then it is clear that it is rational for Agent 3 to accept any amount greater than $13 in exchange for [T1, T2] given that it has the opportunity to buy [T4, T5] for $10. If Agent 3's next-best state is not taken into account, then exchanges based on bids between $13 and $98 would not occur even though such exchanges would be Pareto efficient.
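The arithmetic of the next-best state adjustment can be sketched as follows. The function name and the rule of taking the cheapest candidate are illustrative assumptions only; they are not the selection heuristics used in the thesis. Each candidate is assumed to be a pair giving the value of the substitute state and the cost of acquiring the substitute resources.

```python
def effective_ask(value_current, quoted_ask, next_best_candidates):
    """Lowest price at which selling is rational, given optional substitute states.

    next_best_candidates: iterable of (value_in_substitute_state, substitute_cost).
    With no candidates, the quoted ask price is returned unchanged.
    """
    best = quoted_ask
    for value_next, substitute_cost in next_best_candidates:
        utility_loss = value_current - value_next      # e.g. $98 - $95 = $3
        best = min(best, utility_loss + substitute_cost)
    return best

# Agent 3 holds [T1, T2] (value $98) and can buy [T4, T5] (value $95) for $10.
print(effective_ask(98, 98, [(95, 10)]))   # 13
print(effective_ask(98, 98, []))           # 98
```

The first call reproduces the $13 figure in the example above; the second shows that the quoted ask of $98 stands when no substitute is available.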
Despite the fact that a seller choosing a next-best state is similar to a buyer choosing a compound bid, the seller's case is considerably more complex than the buyer's case considered above. In the buyer's case, the ultimate target state of the compound bid is known. As such, there is a unique pair of buy and sell transactions that moves the agent from its current state to the target state. In the seller's case, however, the target state is unspecified and many candidate next-best states may have to be considered. The heuristics used to address this problem, and the rationale behind the heuristics, are presented in Section.

DEADLOCK

Deadlock occurs when the market for a particular good is thin and external demand cannot be relied on to complete compound transactions. In such cases, agents within the transaction must play dual roles as buyers and sellers simultaneously. Returning to the allocations shown in Figure 4.36, recall that the net surplus available to Agent 3 in moving from \(R_3 = \{T4, T5\}\) to \(R'_3 = \{T1, T2\}\) is only $3. In addition, recall that if Agent 3 can obtain its asking price for its existing allocation, {T4, T5}, then a Pareto efficient exchange can occur. In the compound bid example above, an unspecified agent in the market was willing to pay Agent 3's reservation price for its linked ask. However, in the two-agent case considered here, no external demand exists for T4 or T5. Although it is true that Agent 1 will be in a position to bid on [T4, T5] once it has sold [T1, T2], this demand does not arise until after the transaction is complete. In its current resource state, \(R_1 = \{T1, T2\}\), Agent 1 is willing to pay exactly zero for [T4, T5]. As a consequence, the only way the transaction can occur is if one of the participants bids as if the transaction has already occurred. The inability of agents to look forward to complete a Pareto efficient exchange is defined here as deadlock.
Interestingly, the same type of difficulty arises in real markets. For example, it is rare that an existing homeowner can buy a new home without knowing for certain the selling price of her existing home. Because of this, a tradition of provisional contracts (e.g., "conditional offers") has evolved. The conditional offer concept can be implemented in artificial markets in the form of a two-phase protocol: In the first phase, each participant in the exchange assumes that the provisional transaction has occurred and responds to requests for price information accordingly. In the second phase, the initiator of the transaction evaluates the transaction's overall desirability and decides whether to commit or roll back all parts of the transaction.

In the situation in Figure 4.36, the bid for [T1, T2] by Agent 3 causes Agent 1 to make a provisional transition from \(R_1 = \{T1, T2, T3\}\) to \(R'_1 = \{T3\}\). The provisional transition permits Agent 1 to recognize its demand for Agent 3's linked ask resource. Assuming that Agent 1 pays Agent 3 some amount P in exchange for [T4, T5], Agent 1 makes a second provisional transition to the state \(R''_1 = \{T3, T4, T5\}\). Given the cost of moving to this next-best state, Agent 1's asking price for [T1, T2] becomes \(\Delta_{R_1 \to R''_1}\) = $97 - $95 = $2 plus the cost of the substitute resources P.

On the other side of the transaction, the amount Agent 3 is willing to pay for [T1, T2] is \(\Delta_{R_3 \to R'_3}\) = $98 - $95 = $3 plus the revenue it receives from the sale of its linked ask, P. The overall transaction is desirable if the following inequality is satisfied:

\[ \Delta_{R_3 \to R'_3} + P \geq \Delta_{R_1 \to R''_1} + P \qquad \text{(Eq. 4.14)} \]

Since P appears on both sides of the inequality, its actual value is irrelevant and the feasibility of the two-phase transaction reduces to the marginal change in utility for each agent caused by the transaction.
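The two-phase evaluation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the helper `job_value` is a hypothetical stand-in for the leaf values of an agent's policy tree (terminal reward of $100 less a holding cost of $1 per time unit until the job's last required slot), and `conditional_exchange` is not the protocol implementation described in the thesis.

```python
def job_value(slots, processing_time, reward=100, holding_cost=1):
    """Hypothetical leaf value: reward less holding cost up to the completion slot."""
    if len(slots) < processing_time:
        return 0                                   # not enough time to finish the job
    completion = sorted(slots)[processing_time - 1]
    return reward - holding_cost * completion

def conditional_exchange(allocation, needs, buyer, seller, bid_bundle, linked_ask, price_p):
    """Phase 1: value the provisional allocation. Phase 2: commit or roll back."""
    provisional = {agent: set(slots) for agent, slots in allocation.items()}
    provisional[seller] = (provisional[seller] - bid_bundle) | linked_ask
    provisional[buyer] = (provisional[buyer] - linked_ask) | bid_bundle

    buyer_gain = (job_value(provisional[buyer], needs[buyer])
                  - job_value(allocation[buyer], needs[buyer]))
    seller_loss = (job_value(allocation[seller], needs[seller])
                   - job_value(provisional[seller], needs[seller]))

    # Eq. 4.14: P is added to both sides, so it cannot change the outcome.
    if buyer_gain + price_p >= seller_loss + price_p:
        return provisional, buyer_gain, seller_loss    # commit
    return allocation, buyer_gain, seller_loss         # roll back

allocation = {"Agent 1": {1, 2, 3}, "Agent 3": {4, 5}}
needs = {"Agent 1": 3, "Agent 3": 2}
result, gain, loss = conditional_exchange(
    allocation, needs, buyer="Agent 3", seller="Agent 1",
    bid_bundle={1, 2}, linked_ask={4, 5}, price_p=10)

print(gain, loss)   # 3 2
print(result)       # Agent 1 holds slots 3, 4, 5; Agent 3 holds slots 1, 2
```

Calling the function with any other value of `price_p` gives the same decision, which is the point made by Eq. 4.14.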
In this example, Agent 1 is worse off by $2 after the transaction but Agent 3 is better off by $3 so the transaction should occur regardless of the value of P. In this system, the transaction price for the linked ask is assumed to be the selling agent's ask price; however, this is merely a convention adopted for consistency with the price convention introduced in Section 4.2.4.

STABILITY OF EQUILIBRIA

As its name implies, equilibrium does not change without some form of exogenous disturbance such as the entrance of a new market participant, the arrival of new information, or a change in the underlying physical state of the system. In manufacturing environments, such disturbances are common and may cause agents to revalue their current resource allocations. To illustrate, recall the agent-level policy tree shown in Figure 4.28. Each change in the agent's physical state results in a transition to a new resource subtree. If the values on the leaf nodes on the "new" resource tree are different from those on the "old" tree, the agent will need to buy or sell resources to restore equilibrium. In a dynamic manufacturing environment, agents will be constantly buying and selling resources, much like in a real continuous double auction such as the NASDAQ market for equities.

In the artificial case of the simplified benchmark problem (in which the outcomes of actions in the physical world are deterministic), the values on the policy trees do not change from physical state to physical state. Consequently, the equilibrium reached in the first round of bidding does not change unless new agents enter the market as buyers or sellers. In the non-deterministic case in which agents are unsure of the outcomes of their actions, agents need to continuously revalue their resource allocations based on actual (rather than expected) outcomes.
For example, an agent that expects to require no more than two units of processing time for an operation is willing to pay very little for a third unit of time. However, if the agent unexpectedly encounters a machine breakdown midway through an operation, it would reevaluate its willingness to pay for additional units of processing time. As decision-theoretic planners, the agents in this system have some look-ahead capability. Consequently, if the expected cost of a machine breakdown (i.e., the cost of the breakdown × the probability of the breakdown occurring) exceeds the cost of an otherwise redundant resource, then the agent, being rational, ensures that the backup resource is in place. Indeed, the better the agent's information about probabilistic events, the more stable the initial equilibrium.

4.3.5 CHOICE OF MARKET PROTOCOL

There are three special features of the production resource market considered in this research that differ from the general case of combinational auctions:

1. Sparseness of bids: The tree-based policies generated by the agents during their planning stage include only the aggregations of resources that they deem to be relevant. For example, if an agent knows with certainty that it requires a maximum of three units of processing time, it will have no reason to bid on a bundle containing more than three units of time.

2. Price convention: The reservation price for each aggregation is known to the agent and is truthfully revealed to the auctioneer under the terms of the price convention (recall Section 4.2.4). Thus, rather than seeking to maximize its surplus on a particular sale, an agent will sell a resource contract for an amount exactly equal to its indifference price for the resource. In this way, a Pareto efficient exchange can occur without an exhaustive search over all agents for the best bid price.

3.
Substitutability of resource goods— Because of the time-dependent formulation of resource.goods, many substitutes for a particular good are typically available. For example, an agent that has a non-zero reservation price for Contract(Ml, T4) will almost certainly have a non-zero reservation price for a later contract on the same machine, say Contract(Ml, T5). O f course, each resource may be valued differendy by the agent—that is, the contracts are unlikely to be perfect substitutes. However, unless there is a massive discontinuity in the expected payoff resulting from a delay of one unit of time, the substitution cost in accepting Contract(Ml, T5) in exchange for Contract(Ml, T4) will be relatively small. For example, it may simply be equal to the incremental increase in holding cost.  A number of complete algorithms exist for allocating bundles of goods to agents. For example, [Parkes, 2000] has implemented a number of complete and complete algorithms for multi-round combinational auctions. Sandholm ([Sandholm, 1999], [Sandholm and Suri, 2000]) has implemented a number of provably optimal algorithms for combinational auctions that exploit sparseness of bids and advanced search techniques to solve "reasonably" large problems (e.g., 200 goods) in an "acceptable" amount of time. Although the underlying allocation problem is intractable and thus the scalability of any optimal algorithm is limited, the value of the solutions generated by Sandholm's algorithms  159  increase monotonically with the amount of computation. As a consequence, the algorithms can be used in anytime mode. Unlike the general case of combination auctions addressed by Sandholm's algorithms however, the auction protocol developed in this thesis exploits the two features unique to this particular market: the price convention and the high degree of substitutability between resource goods. 
Although the resulting allocation algorithm is not complete (that is, it is not guaranteed to find the optimal allocation), it does possess the following desirable characteristics:

1. Distributed: The data and computation required to match buyers and sellers are delegated to the individual agents. Equilibrium is reached through an iterative process in which individual agents initiate bidding and continue to bid until no utility-increasing exchanges can be made at current market prices. The auctioneer (or auctioneers, since more than one may be used) fulfills a simple brokerage role: it keeps a list of resource ownership and uses this information to redirect requests for price information to the appropriate agent or agents. There is no market call, so equilibrium prices emerge only after a number of bidding rounds in which agents are free to buy resources or resell resources bought in previous rounds. Prices are computed "just in time" and are disseminated as required.[19]

2. Monotonic: Since all transactions that occur in the market are Pareto efficient, the value of the global objective function is guaranteed to rise monotonically.

3. Polynomial: The maximum number of bids required to attain equilibrium (which may be a local maximum) is a polynomial function of the number of agents and the number of bids in the agents' bid lists (see Section 5.2.2 for the complexity analysis). Note, however, that the agents' bid lists are extracted from their final policy trees and are therefore exponential in the dimensionality of the agent-level problems.

Rapid computation of allocations is important in the manufacturing environment because a new equilibrium must be computed after each change in the physical state of the system.

[19] This approach relies on the fact that the agents are networked entities and that the cost of inter-agent communication is negligible.

FIGURE 4.38  The critical elements of the proposed market protocol.

ELEMENTS OF THE PROTOCOL

The basic operation of the market protocol is shown graphically in Figure 4.38. In order to better understand the flows between the agents and the auctioneer, several critical elements are defined below:

1. Candidate bid: Each agent seeks to maximize its utility given a set of prices for resources. In most cases, an agent j with an allocation of resources R_j will be able to identify an alternative allocation of resources, R'_j, with a higher utility (the exception is when the agent already owns its most preferred allocation of resources). The candidate bid list is an ordered list of all the target states with higher absolute utility values than the agent's current state. If the agent already owns resources, candidate bids may involve both bid and linked ask components, as discussed earlier.

2. Request for quotation: The term "bid" implies that the agent submits a list of desired resources and a bid price to the auctioneer. The auctioneer is then expected to use the bid price to determine whether the bid can be filled at current market prices. In this protocol, the submitting agent is responsible for making the ultimate decision whether a bid is feasible. All that is submitted to the auctioneer is a list of desired resources, analogous to a request for quotation (RFQ) in real markets. The auctioneer is expected to respond to each RFQ with a market price. The choice of whether to commit to the transaction is made by the bidding agent based on its own reservation price for the bid resources and the market price of the resources.

3. Seller-specific RFQs (Steps 3 and 4 in Figure 4.38): When an agent submits an RFQ, it has no knowledge of which agents own the resources in the request. However, the auctioneer does have this information and partitions the original RFQ into multiple seller-specific RFQs.
The process of creating seller-specific RFQs is equivalent to the consolidation of supply role described earlier.

4. Next-best state and substitute resources: Before responding to an RFQ with an ask price, a selling agent must determine its next-best state. The resources that permit the agent to attain its next-best state are called substitute resources. In this protocol, selling agents do not perform an exhaustive search for the most cost-effective next-best state. Instead, a constraint and two heuristics are used to determine a suitable set of substitute resources:

   a. Out-of-bounds resources (constraint): The substitute resources selected by the selling agent cannot be members of the out-of-bounds list (see below).

   b. Priority resources (heuristic): If possible, the substitute resources are selected from the priority list of resources (see below).

   c. Minimal change (optional heuristic): If the resources in the priority list are insufficient to provide a next-best state, the agent buys additional resources on the open market such that its next-best state has a similar expected value to the current (pre-RFQ) state. The purpose of this heuristic is to minimize the impact of the substitute transaction on the cost of the overall transaction.

5. Out-of-bounds (OOB) list: When an agent becomes involved in a provisional transaction by submitting a candidate bid or by responding to an RFQ, it makes certain price information available to other agents. Since the other agents use this information to make their decisions, it is important that all participants refrain from becoming involved in other buying and selling activity until the existing transaction is completed. The OOB list contains all the resources that are currently "in play" explicitly as well as all the other resources owned by agents participating in the transaction. As other agents are brought into the transaction, the OOB list grows accordingly.

6. Priority list: The priority list is essentially the opposite of the OOB list. When a selling agent is determining its next-best states, it starts by considering resources in the priority list. A resource is added to the priority list if it matches one of three criteria:

   a. Unowned: Unowned resources have an ask price of zero. As such, the addition of "free" resources to the priority list biases the selection of next-best states towards slack resources.

   b. Linked ask: As discussed earlier, the feasibility of a candidate bid often depends on the existence of sufficient demand for the initiating agent's linked ask. Adding linked ask resources to the priority list helps generate this demand and eliminates the problem of deadlock.

   c. Residual supply: In the earlier discussion of consolidation of demand, it was shown how demand for residual supply could enable Pareto efficient transactions that would not otherwise occur. Adding residual supply resources to the priority list increases the probability that they will be used as substitute resources elsewhere in the transaction.

7. Firm ask price: There are three situations in which determination of the next-best state is not required in order to respond to an RFQ:

   a. Unowned: The auctioneer does not create seller-specific RFQs for unowned resources. The ask price for such resources is taken to be zero.

   b. Linked ask: Since the owner of the resources in a linked ask is the initiator of the transaction, the price for the resources can be considered firm. In this protocol, the transaction price for the linked ask is taken to be the ask price.

   c. Residual supply: The existence of residual supply implies that the owner has already executed a provisional transaction in response to a previous RFQ. As such, residual supply resources are similar to unowned resources.
Since the price of the resources exchanged in this way appears on both sides of the feasibility inequality (see below), the actual transaction price is irrelevant. The convention in this protocol is that the price of residual supply is zero.

8. Feasibility inequality: When determining whether it should commit or roll back a particular candidate bid, an agent uses the feasibility inequality:

   \[ \Delta_{R \to R'} > \mathrm{Price}(R_{BID}) - \mathrm{Revenue}(R_{LINKED\ ASK}) \qquad \text{(Eq. 4.15)} \]

   A candidate bid is considered feasible when the increase in surplus created by the transaction (Δ_{R→R'}) is greater than the market price of the bid resources (R_BID) less any revenue accruing from the sale of the linked ask (R_LINKED ASK).

9. Transaction list: The transaction list is simply a list of the agents and provisional transactions required to determine the market price of a particular candidate bid. When the feasibility of the candidate bid is determined, the transaction list is used to commit or roll back the provisional changes made by the participating agents.

An important computational consideration, apparent from the description above, is that determination of the market price of a candidate bid is a recursive process. Each RFQ may require the receiving agent to select a next-best state and spawn a follow-on RFQ. The recursion stops when the RFQ contains only resources with firm ask prices. In the worst case, the submission of an RFQ would result in a new RFQ being spawned for each agent participating in the market. In practice, however, the bias towards resources on the priority list causes the system to converge to equilibrium with a minimal amount of recursion.

ILLUSTRATION

In this section, the simplified benchmark problem introduced in Section 4.3.1 is used to illustrate the functioning of the market protocol.
As shown graphically in Figure 4.39, the initial allocation of resources across all three jobs is assumed to be R_1 = {T6, T7, T8}, R_2 = {T1, T2, T3, T4, T5} and R_3 = {}. This allocation is clearly suboptimal and Agent 3 is assumed to join the market after the allocation process has already commenced. In the following sections, each numbered step in Figure 4.38 is described in general terms and illustrated through application to the simplified benchmark problem.

  Time unit:  1   2   3   4   5   6   7   8   9   10  11  12
  Job:        J2  J2  J2  J2  J2  J1  J1  J1  -   -   -   -

  Job     Time in system   Revenue (net of holding costs)
  Job 1   8                $92
  Job 2   5                $95
  Job 3   12               -$12

FIGURE 4.39  The initial (sub-optimal) resource allocation for the example problem: Although Job 3 has entered the system, it has not purchased resources and is incurring holding cost over the decision horizon of 12 time units.

Step 1: Select candidate bid

A candidate bid represents a transaction in resource space that will move the agent to a higher-valued state.[20] According to the feasibility criterion set out in Equation 4.15, the agent must determine whether the market price of the bid resources is lower than the sum of the surplus the agent obtains from the transition to the target state plus the revenue that accrues from the sale of the linked ask (if one exists). Since a feasible bid implies that the buyer values the resources more than the seller (or sellers), buyer-initiated transactions are sufficient to move the market towards a Pareto optimal allocation of resources.

An agent in a given state typically has a number of alternative candidate bids. To determine the sequence in which candidate bids are submitted to the auctioneer, the bids are sorted in decreasing order of value. The agent starts with the first candidate bid and stops when either a feasible bid is found or when the end of the list is encountered.
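The ordering rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the prototype's implementation: the names `first_feasible_bid` and `is_feasible` are hypothetical, candidate bids are represented as (resources, target-state utility) pairs, and the feasibility test, which in the protocol involves the full RFQ exchange with the auctioneer, is stood in for by a caller-supplied function.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical representation: (bid resources, utility of the target state).
CandidateBid = Tuple[Tuple[str, ...], float]


def first_feasible_bid(
    candidates: Sequence[CandidateBid],
    current_utility: float,
    is_feasible: Callable[[CandidateBid], bool],
) -> Optional[CandidateBid]:
    """Return the highest-valued feasible candidate bid, or None if none exists."""
    # Only target states valued above the current state are candidates.
    better = [bid for bid in candidates if bid[1] > current_utility]
    # Sort in decreasing order of target-state value.
    ordered = sorted(better, key=lambda bid: bid[1], reverse=True)
    for bid in ordered:
        # In the protocol this step would trigger an RFQ; here it is a stand-in.
        if is_feasible(bid):
            return bid
    return None  # list exhausted without finding a feasible bid


# Illustrative values only; every bid is treated as feasible in this call.
bids = [(("T1", "T3"), 97.0), (("T1", "T2"), 98.0), (("T11", "T12"), 88.0)]
print(first_feasible_bid(bids, current_utility=-12.0, is_feasible=lambda bid: True))
```

When the function returns None for every agent in a round, no agent has a feasible candidate bid, which corresponds to the equilibrium condition stated in the text.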
In this sense, the selection rule for candidate bids is greedy: an agent always starts by attempting to move to its highest-valued state. When all agents exhaust their lists without finding a feasible candidate bid, the market has achieved equilibrium.

[20] Note that since all resources are assumed to be "goods", it is impossible to move to a higher-valued state by disposing of resources. Although it is possible for certain resources (such as solvents) to become "bads" after a certain amount of use, such cases are not considered.

Agent 3 (which represents Job 3 in Figure 4.39) is assumed to bid first. The table below shows the agent's candidate bids sorted in decreasing order; in general, only target states with values larger than the current state are considered. In this case, Agent 3 would start by creating a candidate bid for [T1, T2].

  Bid resources   Utility of target state   Linked ask resources   Linked ask price   Surplus (Δ)
  [T1, T2]        $98                       {}                     N/A                $98
  [T1, T3]        $97                       {}                     N/A                $97
  [T2, T3]        $97                       {}                     N/A                $97
  [T1, T4]        $96                       {}                     N/A                $96
  [T2, T4]        $96                       {}                     N/A                $96
  [T3, T4]        $96                       {}                     N/A                $96
  ...
  [T10, T12]      $88                       {}                     N/A                $88
  [T11, T12]      $88                       {}                     N/A                $88

Step 2: Submit request for quotation (RFQ)

The RFQ submitted to the auctioneer consists of four lists: bid resources, out-of-bounds resources, priority resources, and provisional transactions. The initiating agent is responsible for creating the RFQ and adding resources to each list as required.

To create an RFQ, Agent 3 starts by asking the auctioneer for a list of resources not owned by any agent. If such resources exist, they are added to the priority list. In this example, T9 through T12 are added to the priority list. Agent 3 then adds T1 and T2 to the RFQ's bid list. At the same time, the agent adds the bid resources to the OOB list (to indicate that the resources are in play) and, if necessary, removes them from the priority list.
If the candidate bid involves a linked ask, the linked ask resources are added to the priority list. Finally, any other resources currently owned by Agent 1 but not in the linked ask are added to the OOB list.

Steps 3 and 4: Determine owners and relay seller-specific RFQs

On receipt of an RFQ, the auctioneer consults its lookup table of resources and owners. The bid resources are grouped by owner and the requisite number of seller-specific RFQs are created. In the case of unowned resources, no RFQ is created. The auctioneer sends the seller-specific RFQs to the appropriate agents and, on receipt of an RFQ, each agent "locks" itself. The lock prevents the agent from participating in other transactions until the overall feasibility of the current transaction has been determined.

The bid resources are owned by a single agent, Agent 2. The auctioneer creates a single seller-specific bid for [T1, T2] and forwards it to Agent 2.

Step 5: Determine next-best state

When a seller-specific RFQ is received by an agent, it selects a next-best state using the criteria described earlier. Once the agent has identified its next-best state, it appends the details for the new state to the RFQ's transaction list.

In order to respond to the RFQ, Agent 2 first determines the implications of selling [T1, T2]. In this case, the structure of Agent 2's policy tree is such that it can quote a price for [T1, T2] without creating residual supply. The value of the pre-RFQ state R_2 = {T1, T2, T3, T4, T5} is $95; the value of the state after selling the bid resources, R'_2 = {T3, T4, T5}, is -$12.

Rather than submitting an ask price of $95 - (-$12) = $107, Agent 2 searches for a next-best state. Since the priority list contains resources, these resources are checked first for suitability as substitutes. In this case, Agent 2 selects the bundle [T9, T10] as a substitute.
The next-best state is therefore denoted R''_2 = {T3, T4, T5, T9, T10}.

Step 6: Submit RFQ for substitute resources

In order to respond to the RFQ, the selling agent must determine the cost of the substitute resources required to achieve the next-best state.

To determine the cost of achieving its next-best state, Agent 2 creates a new RFQ and adds T9 and T10 to the bid resource list. In addition, Agent 2 transfers the resources from the priority list to the OOB list and adds the provisional transaction from R_2 to R''_2 to the transaction list. The RFQ is then submitted to the auctioneer.

Step 3(a): Determine owners and relay seller-specific RFQs

The submission of the RFQ by Agent 2 starts a recursive procedure. However, in this case, the bundle [T9, T10] is unowned. As a consequence, no seller-specific RFQ is created by the auctioneer.

Step 7: Return price for substitute resources

Since the bid resources are unowned, the fixed ask price (zero) is returned to Agent 2.

Step 8: Return ask price

The change in utility for Agent 2 caused by the transition from allocation R_2 to allocation R''_2 is -$5; the cost to Agent 2 to acquire [T9, T10] is $0. As such, Agent 2 is indifferent between staying at R_2 and moving to R''_2 if it is paid $5. According to the price convention, the ask price for the resource is set to the seller's indifference price.

Step 9: Return market price

When more than one seller-specific RFQ is created, the auctioneer must sum the ask prices from each before returning the total market price to the initiating agent.

Since only one seller-specific RFQ was required in this case, the total market price for the original bid resources, [T1, T2], is $5.

Step 10: Decide whether candidate bid is feasible

The feasibility inequality (recall Equation 4.15 on page 164) for Agent 3's candidate bid evaluates to ($98 - (-$12)) > $5 - $0.
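The arithmetic in Steps 8 to 10 can be checked with a short Python sketch. The function names `ask_price` and `is_feasible` are illustrative and are not part of the prototype described in this thesis. The sketch assumes that a seller's indifference price equals the utility it gives up by moving to its next-best state plus whatever it pays for substitute resources, and that feasibility follows Equation 4.15. The value of $90 for Agent 2's next-best state is inferred from the stated change of -$5 from $95.

```python
def ask_price(value_current: float, value_next_best: float,
              substitute_cost: float = 0.0) -> float:
    """Seller's indifference price under the assumed reading of the price convention."""
    # Utility lost by moving to the next-best state, plus cost of substitutes.
    return (value_current - value_next_best) + substitute_cost


def is_feasible(value_target: float, value_current: float,
                market_price: float, linked_ask_revenue: float = 0.0) -> bool:
    """Feasibility inequality (Eq. 4.15): surplus gain > price - linked ask revenue."""
    surplus_gain = value_target - value_current
    return surplus_gain > market_price - linked_ask_revenue


# Agent 2: pre-RFQ state worth $95, next-best state worth $90 (inferred),
# substitutes [T9, T10] are unowned and therefore cost $0.
price = ask_price(value_current=95, value_next_best=90, substitute_cost=0)

# Agent 3: target state worth $98, current state worth -$12, no linked ask.
feasible = is_feasible(value_target=98, value_current=-12,
                       market_price=price, linked_ask_revenue=0)

# Ask price if Agent 2 had quoted without finding substitutes: $95 - (-$12).
naive_price = ask_price(value_current=95, value_next_best=-12)

print(price, feasible, naive_price)
```

The comparison with `naive_price` shows why the search for a next-best state matters: the same bundle is quoted at $5 rather than $107 once unowned substitutes are taken into account.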
In other words, the gain in surplus realized by Agent 3 ($110) is more than enough to compensate Agent 2 for the net cost of moving to a next-best state ($5). The candidate bid is therefore deemed feasible and the provisional transactions contained in the transaction list are committed. The resulting allocation of resources is shown in Figure 4.40.

  Time unit:  1   2   3   4   5   6   7   8   9   10  11  12
  Job:        J3  J3  J2  J2  J2  J1  J1  J1  J2  J2  -   -

  Job     Time in system   Revenue (net of holding costs)
  Job 1   8                $92
  Job 2   10               $90
  Job 3   2                $98

FIGURE 4.40  The allocation of resources following the purchase of [T1, T2] by Agent 3.

A round of bidding continues until every agent has had an opportunity to work down its candidate bid list in an attempt to execute a utility-increasing exchange. In the next step of the example, it is Agent 1's turn to attempt to improve its situation. The first candidate bid it considers is [T1, T2, T3], which is jointly owned by Agent 2 and Agent 3. The provisional allocation of resources resulting from the candidate bid depends on the order in which the seller-specific RFQs are sent out. If Agent 2's seller-specific RFQ is sent out first, the allocation shown in Figure 4.41(a) emerges. Conversely, if Agent 3's seller-specific RFQ is sent out first, then the allocation in Figure 4.41(b) emerges. In case (a), the result of the provisional allocation is a net decrease of $1 in the overall utility realized by the agents and the candidate bid is deemed infeasible. In case (b), however, there is a net increase of $2 and the candidate bid is committed. The final allocation is the same regardless of the path taken. For example, in case (a), Agent 1 continues to submit candidate bids until [T3, T4, T5] is deemed feasible and is committed. From this point forward, no further candidate bids from any of the agents are feasible and equilibrium is achieved.
In case (b), Agent 3 makes a second successful bid for [T1, T2] to put the market into equilibrium.

(a) Seller-specific RFQ for Agent 2 is sent first

  Time unit:  1   2   3   4   5   6   7   8   9   10  11  12
  Job:        J1  J1  J1  J2  J2  J2  J2  J2  J3  J3  -   -

  Job     Utility before provisional transaction   Utility following provisional transaction   Change (Δ_j)
  Job 1   $92                                      $97                                         $5
  Job 2   $90                                      $92                                         $2
  Job 3   $98                                      $90                                         -$8

(b) Seller-specific RFQ for Agent 3 is sent first

  Time unit:  1   2   3   4   5   6   7   8   9   10  11  12
  Job:        J1  J1  J1  J3  J3  J2  J2  J2  J2  J2  -   -

  Job     Utility before provisional transaction   Utility following provisional transaction   Change (Δ_j)
  Job 1   $92                                      $97                                         $5
  Job 2   $90                                      $90                                         $0
  Job 3   $98                                      $95                                         -$3

FIGURE 4.41  Two possible provisional transactions: The transaction depends on the order in which the auctioneer creates seller-specific asks in response to the candidate bid by Agent 1 for [T1, T2, T3].

SUMMARY OF EXAMPLE PROBLEM

For the simple single-machine problem considered here, the equilibrium outcome of the market is identical to the known optimal solution. Moreover, the threats to equilibrium created by the requirement to consolidate supply and demand, the issue of compound bids and next states, and the problem of deadlock in thin markets are eliminated by the two-phase feature of the auction. Since ask prices are quoted just-in-time, there is no requirement for a centralized auctioneer or price list. Given the number of unique markets that need to be made in the worst case of a combinational auction, this eliminates the problem of an infeasibly large price list. Algorithm 4.1 shows the basic steps involved in the market protocol. In Chapter 5, the computational characteristics of the auction mechanism are analyzed and the distributional advantages of the approach are described in greater detail.

ALGORITHM 4.1: CONTINUOUS REVERSE AUCTION

      R_PRIORITY <- {unowned resources}        (initialize priority list)
   1. Repeat
   2.   continue <- FALSE
   3.   For each agent i
   4.     For each candidate bid R'_i such that V(R'_i) > V(R_i):
   5.       R_BID <- {r : r ∈ R'_i, r ∉ R_i}
   6.       Get price(R_BID):
   7.         R_FIXED <- R_PRIORITY ∩ R_BID
   8.         For each r ∈ R_FIXED
   9.           Price(R_BID) <- Price(R_BID) + Price(r)
  10.         Find selling agent j such that R_j ∩ R_BID ≠ ∅
  11.         Find next-best state R''_j such that R''_j ∩ R_BID = ∅
  12.         Get price(R''_j)
  13.       If V(R'_i) > V(R_i) + Price(R_BID) then
  14.         commit transaction
  15.         continue <- TRUE
  16.       Else
  17.         rollback transaction
  18.         next candidate bid
  19.   Next agent i
  20. End if continue = FALSE

CHAPTER 5: EMPIRICAL RESULTS

The proof-of-concept pyramid in Figure 1.1 on page 7 identifies a number of theoretical and empirical questions that must be resolved in order to evaluate the overall feasibility of the market-based approach. In this research, the question of feasibility is necessarily empirical due to the nature of the algorithms employed. Specifically, the effective size of an agent's policy tree depends critically on the manufacturing environment in which the agent is situated and the way in which the environment is represented using the P-STRIPS language (see Section 5.1.3). Hence, the only way to know whether the SDP algorithm will construct an optimal policy tree in a reasonable amount of time is to formulate some problems and solve them.

In this chapter, issues of feasibility are explored by solving problems using a prototype market-based system. In keeping with the framework presented in Chapter 4, the prototype consists of two major components: an agent-level planning system (rational choice) and a combinational auction in which agents can buy and sell contracts for resources (aggregation).
The question of whether it is possible to achieve agent-level rationality using the SDP algorithm is addressed in Section 5.1. The question of whether the proposed auction protocol leads to an optimal market equilibrium is discussed in Section 5.2. In Section 5.3, the performance of the market-based prototype with respect to more complex problems is assessed and in Section 5.4, the benchmark problem from Chapter 2 is revisited.

5.1 ACHIEVING AGENT-LEVEL RATIONALITY

Without agent-level rationality, the first fundamental theorem of welfare economics says nothing about the overall efficiency of market outcomes. Thus, the research question addressed in this phase of the thesis is: "Is it possible in practice to implement rational agents for manufacturing scheduling problems?" Of course, the general answer to the question is "no": planning and scheduling are both known to be NP-hard and the requirement to account for non-deterministic outcomes exacerbates the computational issues considerably. However, a fundamental advantage of the market-based approach is that it permits large problems to be decomposed into many smaller problems. Although the small problems remain NP-complete in principle, they may be small enough in practice to formulate and solve in a reasonable amount of time. Hence, the emphasis in this section is placed on the "in practice" part of the research question.

Ultimately, the goal of this program of research is to solve agent-level planning problems that are large enough to permit the market-based system to be implemented in industrial settings. Stating the objective in this way requires that the term "large enough" be defined. Based on the descriptions in [Fox, 1987] and the researcher's own experience, the assumption is made that an agent in a real-world manufacturing environment should be capable of sequencing 5 to 15 operations.
Of course, "operation" is not an objective measure since atomic units of work can be bundled together in many different ways. For example, an operation can be defined to include setup and teardown of a machine. Alternatively, setup, processing and teardown can be considered as distinct operations in their own right. The point of binding "large enough" to a specific range of values is simply to emphasize that the planning requirements for a single agent in a manufacturing environment are surprisingly modest. The complexity that does exist in industrial scheduling generally arises as a consequence of the sheer number of jobs and the different costs and rewards associated with each, not because of the complexity of the jobs themselves.

5.1.1 CONVENTIONAL STOCHASTIC OPTIMIZATION

One possible means of achieving agent-level rationality in uncertain environments is to formulate the agent-level problem as a Markov Decision Problem (MDP) and solve it using an established technique such as policy iteration. As discussed previously, however, the curse of dimensionality greatly limits the use of the MDP approach, even for the small agent-level problems considered here. To illustrate, consider the number of states needed to represent the state space of an agent in the benchmark problem:

Agent 2 requires three operations (Op1, Op2, and Op3) on three different machines (M1, M2, and M3). The amount of processing time for each operation, p_i, is assumed to be deterministic, as shown in the table below:

  Operation (i)   Processing time (p_i)
  Op1             4
  Op2             3
  Op3             5

[Plot: Number of States vs. Planning Horizon (MDP formulation); x-axis: planning horizon length, t (time units); y-axis: logarithmic, 1.E+00 to 1.E+18]

FIGURE 5.1  The curse of dimensionality for a part-agent: As the length of Agent 2's planning horizon approaches a "useful" size, the concrete state space becomes unmanageably large.

Recall from the analysis in Table 4.1 on page 99 that approximately 10^13 states are required to represent the agent's planning problem over a decision horizon of 10 units of time. More generally, the size of the concrete (or explicit) state space is shown (on a logarithmic scale) over a range of planning horizons in Figure 5.1.

5.1.2 COPING STRATEGIES

As discussed in Section 4.2, three "coping strategies" have been used to attenuate, or at least delay, the explosive growth of the state space shown in Figure 5.1:

1. Structured dynamic programming (SDP): Policy iteration is performed over a policy tree rather than an explicit state space. The SDP algorithm is designed to exploit sources of independence within the problem formulation.

2. Assertions and constraints: Assertions are used to eliminate branches of the policy tree that correspond to invalid states in the physical world. Constraints were not used for the simple problem formulations considered here.

3. Rolling planning horizons: In the rolling planning horizon approach, a planning horizon of t units of time is used to generate a policy tree. However, since the agent's planning problem may extend beyond the horizon, point estimates of expected utility are used for the interval [t+1, ∞]. As the agent approaches the end of its planning horizon, it replans for the next t units of time.

5.1.3 COMPUTATIONAL LEVERAGE

The SDP algorithm is highly sensitive to the existence of independence relationships within the problem formulation.
For example, if the planning algorithm is deciding whether to execute the action Process(Op2, M2, T5) in a particular state, a number of factors combine to define the sets of relevant and irrelevant information for the decision process:

• The one-way arrow of time ensures that all state variables of the form Contract(M_i, T_j) for T_j < 5 are irrelevant. Although the SDP formulation does not use a conventional MDP representation of states, the Markovian property of history independence [Puterman, 1994] remains an important feature of the approach.

• If there is a precedence constraint that states that Op1 must be completed before Op2 can be started, then the value of the state variable OpStatus(Op1) is certainly relevant. However, since the amount of time actually used to complete Op1 has no impact on Op2, all state variables of the form ElapsedTime(Op1) are irrelevant.

• If there is a precedence constraint that states that Op2 must be completed before Op3 can be started, then an assertion binds the state variable ElapsedTime(Op3) to a known value of zero. Known values create singleton branches and have no effect on the number of abstract states in the solution.

Since relevance relationships are specific to a particular problem formulation, the efficacy of the coping strategies can only be evaluated empirically within a class of problems. For the agent-level planning problems considered here, the method used for evaluating the performance of the coping strategies is as follows:

1. Formulate agent-level planning problems that differ along a number of dimensions:
   a. length of the planning horizon
   b. deterministic vs. stochastic processing times
   c. single operation vs. multiple operations
2. Solve the problems using the SDP algorithm.
3. Compare the number of abstract states in the optimal policy tree to the number of concrete states in the conventional MDP formulation.
The number of abstract states in the agent's policy tree is used as a measure of computational leverage for two reasons. First, the modified policy iteration algorithm used in SDP is polynomial in the size of the abstract state space. As such, the problem's effective size (as measured by the number of leaf nodes in the final policy tree) is the primary indicator of computational performance. Second, as the number of states becomes large, the issue of representation becomes important. For most of the explicit MDP formulations considered here, the limiting factor is not computational time but rather the number of bits required to store the state space description.

In the following sections, the effects of different problem formulations on the effective size of the agent-level planning problems are explored. The purpose of this exploration is not to provide a comprehensive analysis of the computational performance of the SDP algorithm. Instead, the objective is to provide a broad indication of what is gained by using the coping strategies and to highlight the remaining sources of complexity.

PLANNING HORIZON

The importance of time in scheduling problems means that the length of the planning horizon, t, has a large effect on the size of both the concrete and abstract state spaces. Although the rolling planning horizon approach permits problems with arbitrarily long horizons to be addressed (at the cost of periodic replanning), the estimates of terminal rewards used in the rolling horizon approach introduce elements of approximation and "nearsightedness" into the technique. It is preferable, therefore, that the planning horizon for an agent span as much of its decision problem as possible. For example, if an agent's expected processing requirement is 10 units of time (suitably defined), then a planning horizon longer than 10 units should be used to reduce the agent's reliance on the accuracy of the terminal reward estimates.

[Figure: Number of States vs. Planning Horizon (MDP and SDP formulations); number of states (logarithmic scale) versus planning horizon length, t (time units); series: concrete states in MDP formulation, abstract states in SDP formulation]

FIGURE 5.2  The effect of the length of the planning horizon on the number of concrete and abstract states.

A comparison between the explicit MDP formulation and the SDP formulation for different planning horizons is shown in Figure 5.2. The upward slope of the SDP line indicates that the number of abstract states increases exponentially with t. Moreover, the slight upward curve in the line suggests that the rate of growth is actually worse than exponential. However, despite the rapid growth characteristics of the abstract state space, the important result shown in Figure 5.2 is the dramatic decrease in both the absolute number of states and the rate of state space growth (indicated by the slope of the line) provided by the SDP algorithm. For example, whereas almost 10^14 concrete states would be required for the MDP formulation of the t = 11 problem, the SDP algorithm requires just under 16,000 abstract states.

STOCHASTIC PROCESSING TIMES

Outcomes in manufacturing environments are characterized by uncertainty. However, most of the problems considered to this point in the thesis are deterministic. In order to better understand the impact of stochastic operation times on the effective size of the problem's state space, experiments were conducted to compare the state space growth of deterministic and stochastic versions of the same problem. In the stochastic version of the problem, each unit of processing time for each operation results in an equal likelihood of finishing and continuing.
For example, the cumulative finishing time distribution for actions of the form Process(Op3, M3, •) is shown in Figure 5.3 [1]. Note that the maximum processing time for each operation is the same for both the deterministic and stochastic versions. As a result, the number of state variables of the form ElapsedTime(Opj), and hence the number of concrete states, is constant.

[1] Since the probability of completion for any value of ElapsedTime(Op3) is constant, the completion times in this example approximate an exponential distribution. However, as discussed in Section 4.1.8, P-STRIPS permits the use of any discrete probability distribution.

[Figure: cumulative probability of completion versus ElapsedTime(Op3)]

FIGURE 5.3  The probability that Op3 is complete after executing Process(Op3, M3, •): The probabilities are conditioned on different values of state variable ElapsedTime(Op3). The maximum amount of processing required for the operation is assumed to be five units of time.

[Figure: Number of Abstract States vs. Number of Concrete States; number of states (logarithmic scale) versus planning horizon length, t (time units)]

FIGURE 5.4  A comparison of deterministic and stochastic problems with the same number of concrete states: For the shorter planning horizons, non-determinism has virtually no effect on the size of the final policy tree.

Figure 5.4 shows the size of the state space plotted against the length of the planning horizon for the deterministic and stochastic versions of the problem. The size of the concrete state space is also shown for reference. For shorter planning horizons, there is virtually no difference between the deterministic and stochastic case. However, as the planning horizon is increased, the two lines diverge as the number of states in the stochastic case jumps dramatically.
The main reason for the jump is an artifact of the problem formulation. Specifically, the mean makespan for the stochastic case is shorter than the known makespan (4 + 3 + 5 = 12) for the deterministic case. Because of this difference, the planner in the stochastic case has more choice in determining a schedule. For example, in the stochastic case there is a 0.5^3 = 0.125 probability of all three operations being completed after three units of processing. Since there are many ways to schedule three units of time within a planning horizon of five or six units, the bushiness of the policy tree naturally increases with the length of the planning horizon.

The conclusion based on Figure 5.4 is that non-determinism per se does not significantly affect the effective size of an agent-level planning problem. However, the ratio of mean makespan to the length of the decision horizon does impact the bushiness of the policy tree.

NUMBER OF OPERATIONS

To examine the effect of the number of operations on problem size, a family of problems with a finite planning horizon (six units of time) but a different number of operations summing to a makespan of six was solved. As the results in Figure 5.5 suggest, there is no clear, monotonic relationship between the number of jobs and the effective size of the problem for a given planning horizon. Of course, this result must be interpreted cautiously given the small range of values considered here. Unfortunately, limitations of the current prototype system prevent a more comprehensive exploration and thus the issue remains an area for future investigation.

problem         concrete states    abstract states
1 operation               9,216                200
2 operations            884,736                784
3 operations         50,331,648                251

[Figure: Number of Abstract States vs. Concrete States; abstract states (0 to 900) versus concrete states (MDP formulation, logarithmic scale); points labelled 1 operation, 2 operations, 3 operations]

FIGURE 5.5  The relationship between the number of operations in a job and the size of the abstract state space: Note that the number of jobs is not an accurate predictor of state space for the range of problems considered here.

5.1.4  INTERPRETATION OF RESULTS

To recapitulate, the advantage of the decision-theoretic planning (DTP) formulation of the agent problems is that it permits the precondition of strict agent-level rationality to be satisfied. Moreover, the DTP formulation supports reasoning about non-deterministic outcomes. Given that manufacturing environments are characterized by uncertainty about completion times, breakdowns, resource availability, and so on, there is a good fit between the agent-level problem formulation and the environment in which the agents are situated.

The primary challenge of the DTP formulation is the "curse of dimensionality". In standard DTP (e.g., [Dean et al., 1993], [Russell and Norvig, 1995]), the agent-level problems are transformed into MDPs and solved using well-known techniques such as policy iteration. However, as the agent-level problems become more complex and realistic, the size of the agent's state space grows to an unmanageably large size. As Figure 5.1 on page 174 illustrates, there is absolutely no possibility of using conventional MDP solution techniques to address the agent-level planning problems required for market-based resource allocation. The results using coping strategies such as SDP are more encouraging, however.
The key result from the prototype implementation is that the coping strategies employed in this thesis permit the solution of agent-level planning problems that are infeasible to represent (never mind solve) using conventional MDP solution techniques. Although the example problems solved using the prototype remain too small to be considered "useful" in industrial environments, this is largely a result of limitations posed by the languages and platform used for prototype implementation (see Section [...]). If more efficient internal algorithms (e.g., sorting, search, tree joining), better languages, and faster hardware were used to implement a prototype, one would expect a performance increase of several orders of magnitude over the existing prototype. By extrapolating from the results in this section, it is estimated that a better implementation could be used to solve problems with several million abstract states (versus the tens of thousands considered here) in an acceptable amount of time. Agents with this type of planning capability could be of practical use in industrial environments and thus exploration of additional sources of computational leverage and prototype enhancements are identified in Chapter 6 as areas for further research.

5.2  DISTRIBUTED COMBINATIONAL AUCTIONS

In Section 4.3, a protocol for a continuous combinational auction was introduced and illustrated with a simple example. An important feature of the proposed protocol is that it supports combinational bids without the need to explicitly quote prices for every possible bundle of goods. That is, agents can submit candidate bids for any resource or aggregation of resources and the auctioneer will return a market price. Although the auctioneer is responsible for controlling the flow of price information between agents, the bulk of the work to determine the market prices is done by the agents themselves.
One means of visualizing the operation of the market is to situate it with respect to other "hill climbing" techniques. In Section 5.2.3, the market framework is characterized as a distributed hill climbing algorithm and its computational properties are briefly discussed in the hill climbing context.

5.2.1  SEARCHING FOR NEXT-BEST STATES

A basic issue that arises in practice is whether the selling agent must conduct a "shallow" or "medium" search for a next-best state. To illustrate the difference, consider the case of a "deep" search (which has not been implemented in the prototype system). In a deep search for its next-best state, the potential seller of a resource is given the opportunity to evaluate all the transitions in its candidate bid list (with the exception of those containing resources in the initial request for quotation (RFQ) and those on the out-of-bounds list) in order to find its best post-sale alternative. Thus, in a deep search, each RFQ results in a recursive chain of optimal bidding attempts.

At the other end of the continuum is shallow search. In shallow search, the selling agent is permitted to search for substitute resources in the priority list only. If no utility-increasing transition can be constructed from priority list resources, the seller simply quotes its reservation price for the resources in the RFQ. Although shallow search eliminates recursion, a potential source of market failure could arise if the seller, for some reason unique to that particular agent, places no value on priority list resources. For example, an agent with a binding deadline of Time = 10 could be asked to submit a price for a subset of its resources and consider those on the priority list as possible substitutes. If all the resources on the priority list correspond to processing contracts that occur after the Time = 10 deadline, then the seller's shallow search for a next-best state fails.
However, if some other agent with a more forgiving deadline can supply the selling agent with appropriate substitute resources and then partially offset its own losses from the priority list, two beneficial outcomes accrue. First, the price for the original RFQ resources is lower due to the seller's ability to find substitute resources. Second, demand is created for the priority list resources. Facilitating simple n-way trades for substitute resources (such as in the example above) is what is meant by "medium" depth search for a next-best state. The selling agent performs a neighborhood (rather than exhaustive) search for substitute resources. Although this results in recursion, the recursion ends as soon as an agent can fulfil its substitute resource requirements from the priority list.

For the problems addressed in this chapter and in Chapter 4, a shallow search for the seller's next-best state proved to be sufficient for finding the correct equilibrium (that is, the equilibrium corresponding to the optimal allocation of resources). It is clear, however, that in the general case, deep search is required to avoid all sources of market failure and ensure an optimal allocation. Thus, although the priority list and the shallow search heuristic greatly reduce the complexity of the algorithm, the resulting allocations may be arbitrarily far from the optimal outcome. Precisely how much approximation results from shallow search in this type of market is an empirical question that is beyond the scope of the thesis.

5.2.2  COMPLEXITY ANALYSIS

The core difficulty with medium-depth search is that the process used to determine market prices is potentially recursive. Rather than simply consult a central repository of current market prices, potential buyers in this system must submit an RFQ to the auctioneer.
The auctioneer relays relevant portions of the original RFQ to the agents that own the resources under consideration and these agents typically generate their own RFQs for substitute resources. Thus, a single RFQ could conceivably set off a chain of follow-on RFQs from each agent in the system.

To get some idea of the worst-case performance of the process, consider Algorithm 4.1 on page 171. Assume that each agent i = 1, ..., n has the same number of bids, b, in its candidate bid list and each agent is initially in the lowest-valued state of its resource tree. Since each agent starts searching at the top of its candidate bid list for a feasible bid, it could be that all b - 1 states worth more than the current state are evaluated before a feasible candidate bid is found. Moreover, assume that the situation is the same for all agents so that each agent has to consider b - 1 bids in a single round of bidding. Finally, assume that each candidate bid generates n follow-on RFQs so that every agent is involved in every provisional transaction. Note that the protocol prevents infinite recursion since no agent can submit more than one RFQ within the context of a single conditional transaction. Under these assumptions, the total number of RFQs for each round of bidding is proportional to n^2 (b - 1).

To estimate the total number of auction rounds required to attain equilibrium, assume that only one Pareto efficient transaction is executed in each round. Although a single transaction is sufficient to require another round of bidding, the fact that prices increase monotonically ensures that there are no cycles and that the total number of rounds is no greater than b × n. Consequently, the maximum number of RFQs that could be generated when finding equilibrium is O(b^2 n^3). In other words, the auction protocol is polynomial in the number of bids on the agents' candidate bid lists and the number of agents in the system.
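The worst-case tally above can be written out as a short calculation. The sketch below only restates the stated assumptions (n agents, b candidate bids per agent, n follow-on RFQs per candidate bid, at most b × n rounds); the function names are hypothetical and are not part of Algorithm 4.1.

```python
def rfqs_per_round(n: int, b: int) -> int:
    """Worst case for one round: n agents x (b - 1) bids x n follow-on RFQs."""
    return n * (b - 1) * n


def max_rounds(n: int, b: int) -> int:
    """Monotonic prices rule out cycles, so at most b * n rounds occur."""
    return b * n


def worst_case_rfqs(n: int, b: int) -> int:
    """Upper bound on RFQs generated while finding equilibrium."""
    return rfqs_per_round(n, b) * max_rounds(n, b)


for n, b in [(3, 10), (10, 100)]:
    bound = worst_case_rfqs(n, b)
    # The product n^2 (b - 1) * b n never exceeds b^2 n^3.
    print(f"n={n}, b={b}: at most {bound} RFQs (b^2 n^3 = {b**2 * n**3})")
```

The product of the per-round count and the round limit is n^3 b (b - 1), which is bounded above by b^2 n^3 and therefore polynomial in both n and b.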
Although the number of candidate bids, b, is a function of the size of the agent's policy tree and is therefore exponential in the complexity of the agent-level problem, the SDP algorithm ensures that only relevant bundles of resources are considered. Thus, the set of candidate bids considered by each agent is complete but minimal. In addition, the actual number of RFQs generated during an auction tends to be much smaller than the worst case due to the bias introduced by the priority list, the greedy bidding of the agents, and the monotonicity of prices. As illustrated in Section 5.3.1 below, agents move up their bid lists very quickly. Since an agent's utility cannot decrease as the result of a transaction, early gains are "locked in". Because of this, the number of candidate bids that actually result in a committed transaction tends to be very small.

5.2.3  BIDDING AS HILL CLIMBING

The process of buying and selling contracts for resources in a market can be seen as a special form of hill climbing. In hill climbing, the objective function to be maximized (in this case, global utility) is visualized as a surface in which the height dimension is the objective function's value. A hill climbing algorithm attempts to reach the solution by considering its current location and moving along the steepest possible path to a higher location. The process is iterative and ends when the algorithm cannot identify any further value-increasing moves.

In a market that is complete and efficient, all transactions that are committed lead to a monotonic increase in global utility. Moreover, in such a market, all Pareto efficient exchanges that can occur do occur. Thus, the surface created by an efficient market formulation has no local maxima (although multiple global maxima may exist).
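The hill climbing picture described above can be made concrete with a generic steepest-ascent sketch. This is not the auction protocol: it is a centralized neighbor-swap search over job orders on a single machine, used only to show the "move to the best higher neighbor until none exists" loop. The processing times (3, 5, and 2 units), the $100 payoff per job, and the starting order are assumptions chosen to resemble the single-machine example of Section 5.3.1, with the $1 holding cost taken from that section.

```python
from itertools import accumulate

PAYOFF = 100       # assumed payoff per completed job ($)
HOLDING_COST = 1   # $ per unit of time a job spends in the system
# Assumed processing times for the jobs of Agents 1, 2, and 3.
PROCESSING_TIME = {"A1": 3, "A2": 5, "A3": 2}


def global_utility(order):
    """Sum over jobs of payoff minus holding cost up to completion."""
    completions = accumulate(PROCESSING_TIME[a] for a in order)
    return sum(PAYOFF - HOLDING_COST * c for c in completions)


def hill_climb(order):
    """Steepest ascent over adjacent swaps; stops when no swap improves."""
    order = list(order)
    while True:
        neighbors = []
        for i in range(len(order) - 1):
            candidate = order[:]
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            neighbors.append(candidate)
        best = max(neighbors, key=global_utility)
        if global_utility(best) <= global_utility(order):
            return order  # no value-increasing move remains
        order = best


result = hill_climb(["A2", "A1", "A3"])
print(result, global_utility(result))
```

Under these assumptions the search climbs from a utility of 277 to 283 and stops at the order A3, A1, A2. For this particular objective, adjacent swaps cannot get stuck below the optimum, which mirrors the claim that an efficient market surface has no local maxima.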
Sandholm [1998] identifies four atomic types of contracts including ordinary bilateral contracts, cluster contracts (more than one good is exchanged), swap contracts (in which each agent gains and loses a good in the transaction), and multiagent contracts. An OCSM-contract, which combines aspects of all four atomic contract types, is shown to be sufficient for implementing an efficient combinational auction. In other words, a hill climbing algorithm can attain the globally optimal allocation in a finite number of exchanges without backtracking (see [Sandholm, 1998] for the proof).

The auction protocol developed in the thesis supports a rich set of inter-agent exchanges that mirrors the functionality of OCSM-contracts. In addition, the use of buyer-initiated RFQs delegates much of the hill climbing work to individual agents. When agents submit candidate bids to the auctioneer, they do so in an effort to maximize their own utility net of any payments they must make to other agents. Thus, by submitting their best bids first, the agents are implicitly identifying paths of steep ascent. Similarly, when the agents decide whether to commit or rollback the candidate bid based on the market's response to their RFQ, they are implicitly determining the direction (uphill or downhill) of the transaction. Since prices can only increase, gains made by a particular exchange are locked in for all subsequent exchanges.

The difference between the auction protocol proposed here and Sandholm's optimal algorithm is that the auction protocol relies on the use of a priority list and incomplete search (i.e., shallow or medium-depth search). Since the search for the next-best state is incomplete, the "task allocation graph" is not fully connected. Thus, although the types of contracts available to the agents are equivalent to Sandholm's OCSM-contracts, only a small subset of the possible transactions are actually considered during the auction.
As a consequence, the auction protocol described in this thesis may attain equilibrium at an economically inefficient local optimum. The loss of optimality resulting from incomplete search depends to a large degree on idiosyncratic properties of the manufacturing environment, such as the extent to which resource contracts are substitutable. As a consequence, the performance of the protocol needs to be evaluated empirically in different manufacturing environments. For example, in [Parkes, 2000], the performance of several complete and incomplete auction protocols is evaluated against a standardized problem set. Such an analysis, however, is beyond the scope of the thesis and is left for future work.

5.3  EVALUATING THE PERFORMANCE OF THE MARKET

In this section, the operation of the market-based system as a whole is examined in greater detail. The objective is to gain a better understanding of the framework's computational strengths and weaknesses and to identify the type of optimization problems that can be addressed using markets. Section 5.3.1 provides empirical results from a simple resource allocation task. In Section 5.3.2, the generalizability of the framework is assessed by addressing multiple machine problems with more complex objective functions. Finally, in Section 5.3.3, the flexibility of the market in the face of new opportunities and uncertainty is examined.

5.3.1  EMPIRICAL RESULTS

To illustrate the performance of the proposed market protocol, consider the typical sequence of transactions used to achieve equilibrium shown in Figure 5.6. The problem setup used to generate the sequence is identical to that used in Section [...]. Three jobs, each consisting of a single operation, require different amounts of processing time on a single machine. The holding cost faced by the agents for each unit of time in the system is $1. The initial allocation of resources to agents is arbitrary and suboptimal, as shown in Figure 5.6.
After the second round of bidding, the optimal allocation of resources is found by the market. By analyzing the sequence of exchanges leading to the equilibrium outcome shown in Figure 5.6, a number of insights concerning the operation of the protocol can be gained.

[Figure: for each round and initiating agent, the allocation of time units 1 to 12 to agents and the resulting values for A1, A2, and A3]

Round  Initiating agent    A1    A2    A3    Total
0      -                   93    89    92    274
1      1                   97    89    92    278
1      2                   97    92    90    279
1      3                   90    92    98    280
2      1                   95    90    98    283
2      2                   (no change)       283
2      3                   (no change)       283
3      (all)               (no change)       283

FIGURE 5.6  The sequence of transactions used by the market prototype to attain an equilibrium outcome.

•  In Round 1, Agent 1 displaces Agent 2 from T2. Note, however, that Agent 2's next-best allocation uses T7, which is the linked ask for Agent 1. By consuming resources that will ultimately be redundant for Agent 1, the transaction can occur without involving Agent 3 (i.e., the amount of recursion is limited by the priority list).

•  Unowned resources are also on the priority list and thus Agent 2 could have selected T12 instead of T7 for its next-best state. However, note that each agent incurs a $1 holding cost for each unit of time that it is in the system. The "minimal change" heuristic (recall Section [...]) biases Agent 2's selection of its next-best state to T7 because it leaves the agent's net value unchanged. If Agent 2 had used T12, its ship time would be delayed by one unit and its net value would have decreased accordingly.

•  In Round 1, Agent 2 purchases [T5, T8] from Agent 3. The transaction makes Agent 2 better off by $3 and Agent 3 worse off by $2. Since Agent 2 pays Agent 3 its ask price of $2, the exchange is Pareto efficient.
•  Under the current implementation of the bidding protocol, the agents bid in succession. More generally, bidding could occur simultaneously on different threads as long as each chain of bids is compartmentalized in some way. As an artifact of the sequence used in this example, the agent that can afford to pay the most for processing (Agent 3) bids last.

•  At the end of Round 1, Agent 3 bids for and purchases [T1, T2]. However, rather than shift all jobs down two time units, the seller (Agent 1) selects its next-best state from the priority list and purchases Agent 3's linked ask [T9, T10]. Agent 2 remains unaffected by the transaction.

•  The algorithm iterates until a round occurs in which there are no changes. Since several transactions occurred in Round 1, a second round is required.

•  In Round 2, Agent 1 is given the opportunity to improve the allocation it accepted at the end of Round 1. It starts by bidding for [T1, T2] but cannot pay Agent 3's ask price since Agent 1 sold Agent 3 the identical bundle in the previous round (prices rise). The best allocation that Agent 1 can afford involves the purchase of [T4, T5] from Agent 2. Since buying [T4, T5] makes Agent 1 better off by $5, it can afford to compensate Agent 2 for the $2 loss it incurs by selling the resources.

•  After Agent 1 has completed its transaction in Round 2, Agent 2 and Agent 3 take their turns. In Agent 2's case, it submits candidate bids for all the bundles that would permit it to finish before T10. However, in each case, the agent cannot afford the ask price returned by the auctioneer and is therefore unable to initiate a transaction. In Agent 3's case, its allocation of [T1, T2] is already maximally preferred and it has no incentive to initiate a transaction.

•  A third round is required because a transaction occurred in Round 2. However, since Agent 1 cannot afford to do better than [T3, T4, T5], no transactions occur in the third round.
Equilibrium is therefore attained.

As the sequence in Figure 5.6 shows, each committed transaction results in a strict increase in global utility. This property of the market coupled with the priority list heuristic permits the market to converge very quickly to the optimal allocation of resources.

5.3.2  EXTENSIONS TO MULTIPLE-MACHINE PROBLEMS

In a simple single-machine problem, such as the one used for illustration in Section 5.3.1, the profit maximization behavior of the market reduces to the shortest processing time (SPT) sequencing rule. Although it is interesting that the market can "discover" the SPT sequencing rule via a process of iterative improvement, the outcome is of little practical value given the simplicity of implementing SPT.

The situation is very different in the case of multiple machines, however, since no optimal sequencing rule for the general problem is known to exist. Simple sequencing rules like SPT do not work well in multiple machine environments due to the requirement to consider the cost implications over all operations simultaneously and to account for the possibility of concurrent activity on multiple machines. To illustrate the basic issues, consider the coarse-grained multiple-machine problem shown in Figure 5.7. This particular problem is referred to as "coarse-grained" because the units of time used in the problem formulation are longer than the finer-grained time units used in the benchmark problem. It may be that the two problems are expected to span the same amount of calendar time; however, the coarse-grained problem contains less information and is therefore easier to solve in a single planning horizon.

         Operations
Jobs     Op1    Op2    Op3
J1       1      1      2
J2       2      1      2
J3       2      1      1

[Figure: Gantt chart of the optimal schedule of J1, J2, and J3 on machines M1, M2, and M3 over time units 1 to 7]

FIGURE 5.7  Processing times and the optimal schedule for a coarse-grained three-machine problem.
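The processing times in Figure 5.7 are small enough that candidate schedules can be evaluated by hand or with a few lines of code. The sketch below is a centralized check, not the market mechanism. It assumes a permutation flow shop in which Op1, Op2, and Op3 are performed on M1, M2, and M3 respectively, every machine processes jobs in the same order, and each job incurs the $1 holding cost per unit of time until its last operation completes; the function names are hypothetical.

```python
from itertools import permutations

# Processing times from Figure 5.7: job -> (Op1 on M1, Op2 on M2, Op3 on M3).
# The operation-to-machine routing is an assumption of this sketch.
PROCESSING = {"J1": (1, 1, 2), "J2": (2, 1, 2), "J3": (2, 1, 1)}


def completion_times(order):
    """Completion time of each job when all machines follow `order`."""
    machine_free = [0, 0, 0]   # time at which each machine next becomes idle
    done = {}
    for job in order:
        t = 0                  # time at which the job's previous operation ended
        for machine, duration in enumerate(PROCESSING[job]):
            start = max(t, machine_free[machine])
            t = start + duration
            machine_free[machine] = t
        done[job] = t
    return done


def total_cost(order, holding_cost=1):
    """Total holding cost: each job pays per unit of time until completion."""
    return holding_cost * sum(completion_times(order).values())


def makespan(order):
    """Time at which the last job leaves the system."""
    return max(completion_times(order).values())


costs = {order: total_cost(order) for order in permutations(PROCESSING)}
best_cost = min(costs.values())

for order, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(order, "cost:", cost, "makespan:", makespan(order))
```

Under these assumptions, the orders J1, J2, J3 and J1, J3, J2 share the minimum total cost of $17, with makespans of 7 and 8 units of time respectively, while sequencing the jobs by total processing time alone does not single out either schedule.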
The sequence of transactions used by the market to attain the equilibrium allocation of jobs to machines is shown in Figure 5.8 (the optimality of the final schedule with respect to total profit can be confirmed by inspection). As expected, the greedy bidding behavior of the agents and the monotonicity of prices ensure that the solution is found in a relatively small number of iterations (two plus one to confirm that no further exchanges are possible). The interesting feature of the solution is that it demonstrates that the market does not simply apply the SPT rule to each machine. For example, the total amount of processing required by J2 is greater than that required by J3; however, J2 appears first in the final permutation schedule.

Another interesting feature of the solution is its stability in the face of multiple optimal solutions. Although swapping the order of J2 and J3 increases the total makespan of the schedule (from 7 units of time to 8 units of time), it leaves its total cost unchanged at $17. Since the market minimizes the total cost of the solution regardless of makespan, the schedule in which J3 precedes J2 is also an equilibrium outcome. However, because of the monotonicity of prices and the requirement that at least one agent be strictly better off as the result of an exchange, the market does not oscillate between the two solutions.

5.3.3  FLEXIBILITY OF THE MARKET APPROACH

In Section 2.2.4, a number of limitations of the conventional (MRP-based) decomposition of the manufacturing problem were identified. In general, the most important shortcomings of MRP-based approaches are their inability to reason about non-deterministic outcomes and their inflexibility in the face of new information, opportunities, and constraints. In this section, the single-machine problem from Section 5.3.1 is revisited and a new job is inserted into the system after an equilibrium allocation has been attained.
The purpose of this exercise is to illustrate how the market reacts to uncertainty, time-dependent payoff functions, and undercapacity. The essential elements of the newly arrived job are described below.

[Figure: for each round and initiating agent, the allocation of time units 1 to 7 on machines M1, M2, and M3 to agents and the resulting values; the value rows, in order of appearance, are listed here]

A1    A2    A3    Total
-7    -7    -7    -21
96    -7    -7    82
93    95    -7    181
93    -7    96    182
96    -7    95    184
96    94    93    283
(no change)       283
(no change)       283

FIGURE 5.8  Equilibrium allocations of resources for the coarse-grained three-machine problem: If the problem is solved in a single planning horizon, the market finds the minimum cost schedule.

At Time = 2, a new job, J4, enters the system and is assigned an agent (Agent 4). J4 is more complex than the existing jobs in the system for two reasons. First, its completion time is not known with certainty. Second, the transfer price received by the manufacturing system for processing the part is contingent on its completion time. The completion time distribution and the payoff function for the job are shown in Figure 5.9.

An important aspect of the problem formulation to keep in mind is that the payoff function shown in Figure 5.9 is determined exogenously. For example, the customer for the part may have specified different prices for different delivery times in response to its own requirements and preferences. Alternatively, the payoff function could have been determined subjectively by taking into account loss of customer goodwill, and so on. The interesting issue raised by this example revolves around determination of Agent 4's willingness to pay for production resources.
On one hand, the new job has a much higher payoff than the incumbents if it can be processed before Time = 6. On the other hand, in the worst case, the job may require five units of processing and thereby prevent one or more of the incumbent jobs from finishing by its own deadline (Time = 11). The sequence of transactions leading to the new equilibrium is shown in Figure 5.10. When Agent 4 joins the market, it has the opportunity to bid on any resource currently owned by the incumbent agents (of course, since the first unit of time has already passed, no further bidding occurs for T1). Given the magnitude of Agent 4's payoff for early delivery and the infeasibility of processing all the jobs in the time available, it is not surprising that the new job displaces one of the other jobs. At first, J1 is displaced, but J1 eventually displaces J2, which is longer and therefore less profitable. By the end of the first round, the set of jobs to be processed is set.

FIGURE 5.9  The completion time distribution and payoff function for the new job. (Axes: completion time; probability and cumulative probability; payoff for J4 and for J1, J2, and J3.)

FIGURE 5.10  The sequence required to attain equilibrium after the arrival of the new job: Agent 4's purchase of a fifth unit of processing time can be seen as insurance. The price the agent is willing to pay for the insurance is a complex function of its completion probabilities and payoff function.

In the second and third rounds, the issue becomes Agent 4's willingness to pay for a fifth unit of processing time. According to the problem formulation, the probability of Agent 4's operation being complete after four units of processing time is 0.95. Thus, the agent's purchase of a fifth unit of processing time can be seen as "insurance" to cover the risk of being incomplete after four units of processing. However, the value of the insurance depends on its timing with respect to the agent's payoff function. For example, the difference between having no insurance and having T11 as insurance is worth about five cents to Agent 4. In contrast, the expected value of T6 as insurance is $5.72 and only slightly less ($5.67) for T7. Although it might seem odd that Agent 4 ultimately decides to purchase its fifth unit of processing at Time = 7 rather than Time = 6, the discontinuity is due to the interaction of Agent 4's payoff function and the opportunity cost of T6. Specifically, Agent 3 is willing to pay $1 for T6 whereas Agent 4's expected value is reduced by only five cents if it purchases T7 instead.
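The comparison behind this choice can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the dollar values are those quoted above, whereas the assumption that no other agent places any value on T7 (so that T7 has zero opportunity cost) and the names INSURANCE_VALUE, AGENT3_VALUE, and joint_value are hypothetical and are not taken from the prototype.

```python
# Agent 4's expected value of holding a fifth unit of processing time
# in each slot (figures quoted in the text).
INSURANCE_VALUE = {"T6": 5.72, "T7": 5.67}

# Agent 3's willingness to pay for each slot. The $1 for T6 is quoted in
# the text; the $0 for T7 is an assumption made for this illustration.
AGENT3_VALUE = {"T6": 1.00, "T7": 0.00}


def joint_value(slot_for_agent4):
    """Combined value when Agent 4 takes one slot and Agent 3 gets the other."""
    other = "T7" if slot_for_agent4 == "T6" else "T6"
    return INSURANCE_VALUE[slot_for_agent4] + AGENT3_VALUE[other]


# The slot that maximizes the combined value of the two agents.
best = max(INSURANCE_VALUE, key=joint_value)
```

Under these assumptions, leaving T6 to Agent 3 raises the combined value from $5.72 to $6.67, which is consistent with Agent 4 settling on T7 as its insurance.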
The relatively complex schedule that emerges from the interaction of the agents in Figure 5.10 is especially interesting when one considers that what emerges is a price for Agent 4's option to use the machine at Time = 7. If, during execution of the plan, Agent 4 finds that it has completed processing after two units of time (probability = 0.325), the OpStatus(Op1) = complete outcome will lead to a new physical state. In the resource subtree corresponding to the new physical state, Agent 4 will value its remaining contracts for processing on Machine M1 at exactly $0. This revaluation will trigger a new auction and the excess capacity will be absorbed by the remaining jobs (including J2). The ability of the agents to price real options in this manner may have important applications in other domains, as discussed in Section

5.3.4  INTERPRETATION OF RESULTS

The distributed hill climbing algorithm implemented by the market delegates the task of identifying value-increasing exchanges to the individual agents. Thus, the role of the centralized market is simply to iterate over the exchanges suggested by the agents until no agent is willing to suggest an exchange at current market prices. Although the resulting equilibrium is not guaranteed to be optimal with respect to the global utility function, the results presented here indicate that the algorithm is capable of avoiding local optima and finding the optimal solutions in certain cases. As stated above, the generalizability of this observation needs to be explored more systematically. In the problems addressed in the thesis, the market was shown to be capable of discovering simple sequencing rules such as SPT. In addition, the market can provide good solutions in cases in which no optimal sequencing rules are known to exist, such as generalized multi-machine problems (Section 5.3.2) and problems with complex payoff functions and non-deterministic processing times (Section 5.3.3).
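The iterate-until-no-exchange loop can be illustrated with a minimal sketch. It is not the prototype's protocol: the only exchange considered is a pairwise swap of two time slots on a single machine, an exchange is accepted only if it strictly increases total value, and the job data, the job_value function (a reward less a holding cost of $1 per unit of time until completion), and all names are hypothetical.

```python
from itertools import combinations


def holdings(allocation, agent):
    """Sorted time slots currently owned by an agent."""
    return sorted(slot for slot, owner in allocation.items() if owner == agent)


def job_value(job, slots):
    """Reward minus $1 per time unit until completion; 0 if the job cannot finish."""
    if len(slots) < job["units"]:
        return 0
    return job["reward"] - slots[job["units"] - 1]


def total_value(allocation, jobs, value):
    """Sum of the agents' values for their current holdings."""
    return sum(value(job, holdings(allocation, name)) for name, job in jobs.items())


def run_market(allocation, jobs, value):
    """Apply value-increasing swaps until a full round produces none."""
    allocation = dict(allocation)
    rounds = 0
    improved = True
    while improved:
        improved = False
        rounds += 1
        for a, b in combinations(sorted(allocation), 2):
            if allocation[a] == allocation[b]:
                continue  # swapping within one agent changes nothing
            trial = dict(allocation)
            trial[a], trial[b] = allocation[b], allocation[a]
            # Strict improvement only, so equally good allocations never cycle.
            if total_value(trial, jobs, value) > total_value(allocation, jobs, value):
                allocation = trial
                improved = True
    return allocation, rounds


# Hypothetical single-machine instance: job A needs 1 unit, job B needs 2.
jobs = {"A": {"units": 1, "reward": 10}, "B": {"units": 2, "reward": 10}}
start = {1: "B", 2: "B", 3: "A"}
final, rounds = run_market(start, jobs, job_value)
```

In this instance the loop moves the shorter job to the front (the SPT order) in its first round and uses a second round only to confirm that no further exchange is worthwhile. The strict-improvement test plays the role of the requirement that at least one agent be strictly better off after an exchange.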
Of course, in both cases, the problems used for illustration consist of a finite number of jobs and are small enough to be solved within a single planning horizon. Given the ultimate objective of industrial-scale applicability, a more interesting class of problems consists of those with an indeterminate number of jobs and an infinite planning horizon. In the following section, the benchmark problem from Chapter 2 is addressed using a rolling planning horizon approach in order to gauge the broader, real-world relevance of the market-based framework developed in this thesis.

5.4  SOLUTION TO THE BENCHMARK PROBLEM

Recall from Section 2.5.3 that although Johnson's rule provides the optimal solution to certain instances of the F3 || C_max class of problems, the resulting schedule is optimal with respect to makespan, not total profit. Given that maximization of total profit (or in the case of the simple problems considered here, minimization of cost) is a more realistic goal in many manufacturing environments, a method for addressing the F3 || Σ w_j C_j class of problems is preferred. Unfortunately, there is no known polynomial time solution for multiple-machine, minimum cost scheduling problems. Indeed, the optimal minimum cost solution to the benchmark problem in Section 2.5.3 (reproduced as Figure 5.11 below) was found by inspection. The objective of this section is to assess the market-based prototype's ability to find a solution to an F3 || Σ w_j C_j problem using a rolling horizon approach. According to the Johnson's rule solution (recall Figure 2.2 on page 24), the minimum makespan for the benchmark problem is 17 time units. Because the prototype system is not capable of solving problems of this size, and because it is desirable to examine the impact of rolling planning horizons on auction outcomes, the agent-level problems were decomposed into four rolling horizon problems: Time = [1, 6], [7, 11], [12, 15], and [16, 24].
FIGURE 5.11  The optimal minimum cost solution to the benchmark problem (reproduced from Figure 2.3).

When the rolling planning horizon technique is used for solving the agent-level planning problems, a sequence of auctions must also be used. In this particular case, the first auction is held for resources in the interval Time = [1, 6]. At Time = 7, a second auction is held for resources in the interval Time = [7, 11], and so on. Note that in the general case of non-deterministic actions, the second auction cannot be held prior to Time = 7 because the agents do not know what physical state they are actually in until they can observe the outcomes of their actions at that time. Since each physical state has a different resource subtree with a different set of value leaves, knowledge of physical state is required to determine the agents' reservation prices. In the deterministic case considered here, however, each agent's physical state at Time = 7 can be deduced as soon as the first auction attains equilibrium. For example, if Agent 3 purchases contracts for four units of processing on Machine M1 during the first interval, then it is known with certainty that the state variable OpStatus(Op1) = complete at the end of the interval since the operation requires precisely four units of processing.

5.4.1  RESULTS OF THE SEQUENTIAL AUCTION

Figure 5.12 shows the resource allocations at equilibrium for the four stages of the sequential auction. In the first stage of the auction, each agent has the ability to alter its current expected value by buying contracts for production resources in the time interval under consideration.
However, production resources beyond Time = 6 are not priced in the first stage and therefore the agents must rely on their terminal reward estimates (recall Section 4.2.3) to guide their decisions. Consider, for example, the case in which Agent 1 owns no resources in the interval Time = [1, 6]. The expected value of being in Time = 7 without any processing is (according to the agent's policy tree) $46. The expected value reflects the agent's beliefs about its ability to purchase resources in the future, complete its processing before any hard deadlines, and receive its reward for shipping (i.e., the product's transfer price). Naturally, the expected value is net of both holding cost and the cost of buying contracts for production resources. Being in a different physical state at the end of the auction interval induces a different expected value. If, at the end of the first interval, Agent 1 has undergone two units of processing on M1 and has thereby completed its first operation, its expected value jumps to $58. As a result, Agent 1 is willing to pay any amount up to $12 (the difference between $58 and $46) to procure contracts for two units of processing on M1 in the first stage of the auction. Similarly, if the agent can make its best-case purchase in each time unit so that it has started its third and final operation by the end of the interval, its expected value is $82. Of course, if the estimates of future prices are not very good or if the prices are not stable over time, the accuracy of the rolling horizon/sequential auction approach will suffer.

FIGURE 5.12  Equilibrium allocations of resources at the end of the auctions for the rolling horizon case: Since the agents are "nearsighted", they do not converge on the minimum cost schedule. (Rows: initial allocation and the auction time horizons 1 to 6, 7 to 11, 12 to 16, and 17 to 24 time units; columns: allocation of M1, M2, and M3 by time unit.)

The other agents in the auction face their own cost and reward structures; however, they have access to the same historical price information and, in this example, face the same per-unit holding cost ($1) as Agent 1. Under these circumstances, the market converges on a straightforward SPT allocation over the first auction interval: Since it costs Agent 1 $4 to wait for Agent 2 or Agent 3 on M1, but it only costs the other agents $2 to wait for Agent 1 on the same machine, Agent 1 is willing to pay the most to go first.

5.4.2  THE PROBLEM OF NEARSIGHTEDNESS

Although the agent's estimates of prices for resources in the future (i.e., beyond the current planning horizon) impact its willingness to pay for resources in the current auction, a problem occurs because future prices are considered only in an aggregate way. In this sense, bidders in a sequential auction are nearsighted: The agents will schedule operations optimally in the current auction stage without being able to reason at a specific level about the downstream impacts of their early decisions. To illustrate the problem, consider the decision whether to schedule J2 or J3 following J1 in the first stage of the auction. Since both J2 and J3 require all the remaining time on M1 in the interval Time = [1, 6], both agents are willing to pay the same for the contracts. To make the example interesting, assume that Agent 2 bids first and purchases the resources, as shown in Figure 5.12. However, what the market does not know in the first stage of the auction is that Agent 3's second operation only requires a single unit of processing time whereas Agent 2's second operation requires three units of processing time.
Thus, had the first auction stage been slightly longer (e.g., over the interval Time = [1, 8]), Agent 3 would have recognised its willingness to pay more to be scheduled first on M1. As it turns out, by scheduling J2 first, the total cost of the schedule ($40) is higher than the known minimum cost ($39).

5.4.3  INTERPRETATION OF BENCHMARK RESULTS

The conclusion to be drawn from Figure 5.12 is that in the infinite horizon case, the use of rolling planning horizons introduces another source of approximation and can therefore impact the optimality of the final solution. However, even when the planning horizons are short relative to the makespan of the schedule (such as the case considered here), the rolling horizon/sequential auction approach appears to provide reasonably good solutions. One reason for this is that the market is flexible and is not necessarily limited to permutation schedules. To illustrate, consider the simple two-machine, two-job problem shown in Table 5.1 below:

TABLE 5.1: A problem in which nearsightedness is potentially costly.

Job   Processing time for Op1   Processing time for Op2
J1    5                         100
J2    6                         2

Assuming that the first auction stage is six time units long, the agent representing J1 is willing to pay more and is therefore scheduled first on M1. If a strict permutation schedule is used, the decision to schedule J1 first would be extremely costly since J2 would have to wait until J1's very long second operation is complete before being processed on M2. In this worst-case outcome, the total cost of the schedule (assuming a holding cost per-unit-time of $1) is $105 + $107 = $212. In contrast, the cost of the permutation schedule in which J2 is sequenced first is only $111 + $8 = $119. An important feature of the market is that Agent 2 eventually recognizes the scheduling error made in the first stage of the auction. As the sequence in Figure 5.13 shows, once J2 has been processed on M1, it can preempt J1 on M2.
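The two permutation-schedule costs quoted for Table 5.1 can be checked with the standard completion-time recurrence for a permutation flow shop. The sketch below is only a verification aid under stated assumptions: time is measured from zero, each job accrues a holding cost of $1 per unit of time until its last operation completes, and the function and variable names are hypothetical rather than taken from the prototype.

```python
def completion_times(order, proc):
    """Completion time of each job in a permutation flow shop.

    proc[job] lists the job's processing times on the machines in
    routing order; every job visits every machine in the same order.
    """
    n_machines = len(next(iter(proc.values())))
    machine_free = [0] * n_machines  # time at which each machine next becomes free
    done = {}
    for job in order:
        t = 0  # time at which the job leaves the previous machine
        for m, p in enumerate(proc[job]):
            # An operation starts once both the job and the machine are free.
            t = max(t, machine_free[m]) + p
            machine_free[m] = t
        done[job] = t
    return done


def total_holding_cost(order, proc, rate=1):
    """Total cost when each job pays `rate` per time unit until it completes."""
    return rate * sum(completion_times(order, proc).values())


# Processing times for Op1 (on M1) and Op2 (on M2) from Table 5.1.
TABLE_5_1 = {"J1": [5, 100], "J2": [6, 2]}

cost_j1_first = total_holding_cost(["J1", "J2"], TABLE_5_1)
cost_j2_first = total_holding_cost(["J2", "J1"], TABLE_5_1)
```

With J1 first the jobs complete at times 105 and 107, and with J2 first they complete at times 8 and 111, which reproduces the totals of $212 and $119. The recurrence covers only strict permutation schedules, so it cannot represent the preemption on M2 that the market discovers.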
Thus, although agents in sequential auctions are prone to make nearsighted errors, they are also capable of "recovering" from the errors in subsequent auction stages. In this example, the total cost resulting from the allocation shown in Figure 5.13 is $14 + $107 = $121. Although this result is suboptimal, it is still a "good" schedule relative to the worst-case allocation selected in the first stage of the auction.

In the problems considered in this thesis, setup and changeover costs are assumed to be zero. In cases in which there are significant changeover costs, preemption and non-permutation schedules require consideration of these costs.

FIGURE 5.13  An example of a problem in which the market "recovers" from a nearsighted mistake in an early stage of the auction: In the third stage of the auction, the market recognises its sequencing error and J2 preempts J1 on M2. (Rows: initial allocation and the auction time horizons 1 to 6, 7 to 12, and 13 to 18 time units; columns: allocation of M1 and M2 by time unit.)

CHAPTER 6: CONCLUSIONS AND RECOMMENDATIONS FOR FUTURE RESEARCH

The overall vision of this program of research is to provide decision makers with a means of exploiting powerful but inexpensive computing devices to solve large, complex problems. For example, in a manufacturing environment in which the processing of 300 jobs needs to be coordinated at any given time, a decision support system consisting of 300 agents running independently and simultaneously on a large number of computers could be responsible for determining the allocation of scarce production resources to competing uses. The agents, by constantly reacting to new information and seeking to further their own self-interest, would unknowingly drive the system as a whole to an efficient outcome.
In environments such as manufacturing, allocative inefficiency creates an important source of deadweight social loss. Indeed, the magnitude of the gains realizable through better management of manufacturing resources is reflected in the corporate mantra of i2 Technologies Inc., a provider of advanced planning systems (APS) software for manufacturing: "i2 Technologies will add $50 billion of value, in growth and savings, for our customers by the year 2005."¹ In short, better decision making within manufacturing and other sectors of the economy can generate significant economic benefits. The market-based approach described in this thesis is directed at achieving this end.

¹ Sanjiv Sidhu, CEO and founder of i2 Technologies Inc., as quoted on the i2.com website.

6.1  CONCLUSIONS

The research question addressed in the thesis is: Is it possible to generate optimal solutions to large resource allocation problems using electronic markets and artificial rational agents? The framework for market-based resource allocation that emerged as a response to the research question attempts to satisfy two seemingly contradictory design objectives: fidelity to existing economic and operations research theory and applicability to real-world, industrial-scale problems. With these objectives in mind, the methodology used to address the research question can be summarized as follows:

1. Identify the theoretical foundations of the approach: In this research, a literal interpretation of well-known economic theory is used to motivate and guide the entire framework. The key prediction provided by Walrasian microeconomics is that a socially optimal outcome can be attained by permitting agents to selfishly maximize their own utility. The practical benefit of formulating the agent system in this way is that the price mechanism permits the agent-level problems to be decoupled and solved independently.
Thus, economic metaphor and theory are used as a means of achieving distributed computation.

2. Develop an overall framework for market-based multi-agent planning: The three phases of the proposed framework are decomposition of the large problem into agents, rational choice at the agent level, and aggregation of the agent-level solutions using a market. The theoretical foundations of the approach (specifically, the first fundamental theorem of welfare economics) impose strict preconditions on the design choices made in each phase.

3. Select appropriate techniques for each phase of the framework: Given the preconditions of the first theorem, techniques from operations research, artificial intelligence, and economics are identified, refined, and synthesized to provide a concrete implementation of the proposed framework.

4. Evaluate the performance of the implemented system with respect to the design objectives: With respect to the first design objective (fidelity to existing economic theory), the prototype system is used to demonstrate the general feasibility of satisfying the preconditions imposed by the first theorem:

   a. Rationality: At the agent level of decision making, variations of well-established techniques from operations research and computer science are used to implement agents that are capable of generating rational policies in the face of uncertainty. The result of the planning stage is a preference ordering over resource bundles that is transitive and complete on one hand and compact on the other.

   b. Efficiency: The auction protocol used in the aggregation phase achieves market completeness by permitting agents to bid on bundles of resources. The protocol is shown to be capable of finding the correct equilibrium given various initial conditions in a very small number of auction rounds.
Although the simple market protocol developed in the thesis does not result in complete search over the entire space of possible contracts, the results indicate that incomplete search can lead to good solutions when there is a high degree of substitutability among resources.

With regard to the second design objective (applicability to real-world, industrial-scale problems), there is clearly much work to be done before strong claims of practicality can be made. As discussed in the section on future research later in this chapter, more sophisticated proof-of-concept prototypes are required to demonstrate the benefits of market-based distributed computation for real manufacturing problems.

6.1.1  CONTRIBUTIONS OF THE THESIS

The primary contribution made by the thesis is the development of a novel framework for using agents and markets to solve large stochastic optimization problems. The framework is concrete in the sense that in each phase, the design challenges are identified, solution techniques are proposed, and an implemented system is used to evaluate the proposed solutions. To the best of the author's knowledge, no other system for multi-agent resource allocation has been constructed entirely on a foundation of economic theory. In addition, to the best of the researcher's knowledge, no comparable implementations of viable rational agents (strictly defined) and a viable combinational auction have been reported in the literature. In the context of manufacturing planning, the proposed framework offers a number of important benefits over existing approaches. These benefits are enumerated and described in Table 6.1. In addition to the primary contribution, a number of secondary contributions have been made by the thesis, as summarized in Table 6.2. The secondary contributions are novel techniques or refinements to existing techniques that were introduced to achieve the primary objectives of the market-based system.
For example, the requirement to implement market-based agents necessitated a number of incremental changes to the SDP algorithm proposed by Boutilier et al. ([Boutilier, Dean and Hanks, 1999], [Dearden and Boutilier, 1997], [Boutilier, Dearden and Goldszmidt, 1995]). As a consequence, the SDP algorithm was used to solve very large MDPs (i.e., problems consisting of more than 10^14 states). A secondary contribution was also made in the field of auction theory in the form of a novel heuristic protocol for combinational auctions.

TABLE 6.1: Benefits of the market-based approach to manufacturing planning.

Benefit | Description
scalability | Markets are inherently flexible since the complexity of the decision process of each agent representing a part is independent of the number of other part-agents in the system.
flexibility | Since the market runs continuously, agents can join or leave the system at any time. A high-priority product can enter the system at any point and schedules will be adjusted accordingly. Moreover, the agents are "contingency planners" and can react very quickly to changes in their environments.
modularity | The agent metaphor provides a modular and flexible means of decomposing and modeling large, complex manufacturing systems.
sensitivity to increases in computational power | Since the agent-level problems are solved in parallel, any performance increase that permits larger problems to be solved or longer planning horizons to be considered has a significant impact on the overall quality of the solutions generated by the market-based system.
optimality | The market-based approach provides an approach that is more realistic than MRP-based approaches and more reliable (in terms of optimality) than heuristic-based approaches. As the quality of the information provided to the planners increases, so too does the global utility of the outcome.
price information | The equilibrium prices attained in the market represent the value of the resource to the manufacturing system at a particular point in time. Unlike the standard cost estimates typically used in cost accounting, the market prices take into account all costs and rewards (including opportunity costs) faced by all of the agents in the system. The information provided by the resource prices can be used for long-term planning activities, such as capacity planning (e.g., purchasing new machines to increase the supply of production resources that prices show to be in high demand).
non-deterministic actions and events | In the physical world of manufacturing, outcomes are subject to uncertainty. In this system, it is possible to represent outcomes using arbitrary probability distributions. This feature is especially important in environments in which low probability/high impact outcomes must be factored into decisions.
multi-attribute utility functions | The agent utility functions are more flexible than the single-attribute criteria typically used by conventional OR scheduling techniques.
integrated framework for decision support | In the benchmark problem used here, only two types of agents are used: part-agents and an auctioneer. However, the market framework permits many different types of agents to interact. For example, it is possible to introduce machine-agents to represent the interests of production machines. In this way, other decision problems (such as preventative maintenance planning) can be incorporated into the system.

TABLE 6.2: Secondary contributions made by the thesis.

Contribution | Feature | Description
structured dynamic programming (Section 4.2) | non-propositional and "-valued branches | State variables are not restricted to binary values. When certain values of a state variable can be grouped together, a "-valued branch is introduced to reduce the bushiness of the policy tree.
structured dynamic programming (Section 4.2) | assertion and constraint constructs | Assertions and constraints are introduced as a means of specifying additional semantic content about the domain being modeled. By using assertions and constraints, the bushiness of the policy tree is reduced.
structured dynamic programming (Section 4.2) | rolling planning horizons | Through a process of backward induction, planning problems in the future are solved and then collapsed into terminal reward estimates for planning problems in the present. In this way, discontinuous "end effects" are avoided.
structured dynamic programming (Section 4.2) | representation of action-independent events | The IncrementClock action permits action-independent events to be represented and reasoned about during the course of planning.
protocol for combinational auctions (Section 4.3) | extraction of private values from policy tree | By sorting the resource nodes of a policy tree to the bottom, the private values for resources and combinations of resources can be determined.
protocol for combinational auctions (Section 4.3) | reverse auction design | The use of an RFQ-based protocol greatly simplifies the informational requirements of the combinational auction since it permits the calculation of just-in-time ask prices.
protocol for combinational auctions (Section 4.3) | marginal utility formulation and price convention | The total monetary endowment of each agent is ignored and only the marginal benefit from each candidate transaction is considered. This simplifies the auction and eliminates sources of market failure such as speculation.
protocol for combinational auctions (Section 4.3) | two-phase bidding protocol | The two-phase protocol permits agents to engage in n-way (rather than merely 2-way) transactions.
protocol for combinational auctions (Section 4.3) | priority list | The priority list for assigning next-best states helps to limit the amount of recursion that occurs during n-way transactions.

6.1.2  LIMITATIONS OF THE THESIS

As discussed in Section 6.1, the principal limitation of the thesis is that it falls short of making a convincing argument that the proposed market-based framework is appropriate for industrial-scale problems. The limiting factor in this regard is clearly the size of the agent-level problems that can be solved using the current prototype. For example, since the original benchmark problem requires a decision horizon of at least 17 units of time, it had to be decomposed and solved using a combination of rolling planning horizons and sequential auctions. Although sequential auctions are shown to lead to "good" schedules, they do not provide the same guarantee of optimality that is provided in the single-horizon case.

A large part of the problem is the current prototype. In development of the prototype system, a number of binding (but poorly documented) limitations of the Visual Basic 6.0 development platform were encountered. These limitations imposed upper bounds on the size of the problems that could be solved, and the bounds had little to do with computation time or memory. Of course, prototype performance is not the only issue. As the results in Chapter 5 show unequivocally, decision-theoretic planning (with or without the SDP algorithm) is plagued by the curse of dimensionality. The failure to achieve enough computational leverage with the coping strategies presented in the thesis does not in itself invalidate the market-based approach. The market and the agents participating in the market are separate computational issues and thus any method of solving the agent-level problems that satisfies the preconditions of rationality can be used. Decision-theoretic planning was used in the thesis (despite its well-documented scalability issues) because of its elegance and conceptual fit with the model of rationality used in the economics literature. The critical research challenge that lies ahead is to reexamine the decision-theoretic planning formulation to determine whether it should be abandoned in favor of one of many alternative approximate approaches or whether there are other sources of computational leverage within the decision-theoretic approach that can be exploited. This issue is addressed in greater detail in Section 6.2.1.2.
It is important to note that, in this early stage of research, approximation techniques and heuristics for the agent-level problem have been avoided to the greatest extent possible. The obvious reason for avoiding approximation is that it nullifies the strong claim of global optimality provided by the combination of the first fundamental theorem. A second reason for de-emphasizing practical results and focusing on the theoretical foundations of the approach is that doing so helps highlight issues that, for whatever reason, have not been widely addressed in the distributed AI literature. For example, until recently, there has been surprisingly little research activity around the issue of combinational auctions in resource allocation environments. The same can be said about the issue of estimating horizon effects (or "end effects") for agent-level planning problems. If approximation techniques were used more extensively in this thesis, it is unlikely that either of these important issues would have come to the fore.

A second limitation of the market-based approach is that heuristics and approximation are required to address very large, industrial-scale problems. The sources of approximation introduced by the framework are summarized in Table 6.3.

TABLE 6.3: Sources of approximation in the market-based framework

Source: Rolling planning horizons
Description: Rolling planning horizons are required because increasing the length of the planning horizon has an exponential effect on the problem size (primarily through variables of the form Contract(Mi, Tj)). However, by using historical price information to estimate the value of being in a certain physical state at the end of a shorter planning horizon, the problem of nearsightedness (see Section 5.4.2) is introduced.
Mitigation strategies: The effects of the approximation can be minimized by exploiting increases in algorithmic efficiency and computational power at the agent level to plan over longer horizons. However, the effectiveness of this approach is bounded by the exponential growth of the problem. In addition, finer-grained reasoning by agents about future prices of resources could lead to more realistic estimates of terminal rewards.

Source: Incomplete search in the combinational auction protocol
Mitigation strategies: For certain problem domains (such as manufacturing scheduling), the high degree of resource substitutability may mean that the impact of incomplete search is minimal. In other domains, standard approaches to avoiding local maxima (e.g., simulated annealing, tabu search, and so on) may lead to better results.
Description: Shallow or medium search depth is used in the combinational auction to eliminate or reduce the amount of recursion that occurs in response to each request for quotation. By failing to consider every possible transaction, there may be instances in which the equilibrium attained in the auction is economical