Approximate Dynamic Programming Methods for Advance Patient Scheduling

by

Antoine Sauré

B.Sc. Engineering, Universidad de Chile, 2000
M.Sc. Operations Research, Universidad de Chile, 2002

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in THE FACULTY OF GRADUATE STUDIES (Business Administration)

The University of British Columbia (Vancouver)

October 2012

© Antoine Sauré, 2012

Abstract

This dissertation studies an advance multi-priority patient scheduling problem. Patrick et al. [44] formulated a version of this problem as a discounted infinite-horizon Markov Decision Process (MDP) and studied it using a linear programming method based on an affine value function approximation. This thesis starts by presenting an alternative solution approach for this problem based on the use of simulation, a policy iteration framework and a non-linear value function approximation. It then extends the dynamic multi-priority patient scheduling model and solution approach developed by Patrick et al. by considering patients who receive service across multiple days and for irregular lengths of time, and by allowing the possibility of using overtime on different days of the booking horizon.

The research described in this dissertation is based on the hypothesis that some patients can be booked further into the future, allowing the appointments for urgent patients to be scheduled earlier. It seeks to identify effective policies for allocating available service capacity to incoming demand while reducing patient wait times in a cost-effective manner. Through the use of Approximate Dynamic Programming (ADP) techniques, it illustrates the importance of adequately assessing the future impact of today's decisions in order to more intelligently allocate capacity.

Chapter 1 provides an overview of the multi-priority patient scheduling problem and a review of the literature relevant to it.
Chapter 2 describes a simulation-based algorithm for solving a version of this problem and compares the performance of the resulting appointment scheduling policies against the performance of four other policies, including the one derived from the linear programming method. Chapter 3 extends the dynamic multi-priority patient scheduling model and solution approach developed by Patrick et al. It presents a discounted infinite-horizon MDP model for scheduling cancer treatments in radiation therapy units and a linear programming method for solving it. The benefits from the proposed method are evaluated by simulating its performance for a practical example based on data provided by the British Columbia Cancer Agency (BCCA). Chapter 4 describes a teaching tool developed to illustrate advance patient scheduling practices to health care professionals and students. Finally, this dissertation concludes with additional discussion, extensions and further applications.

Preface

An abbreviated version of Chapter 3 has been accepted for publication in the European Journal of Operational Research: Sauré, A., J. Patrick, S. Tyldesley, M. L. Puterman. Dynamic multi-appointment patient scheduling for radiation therapy, 2012. I developed and implemented all the models and wrote most of the paper. A footnote with similar information appears on the first page of Chapter 3.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
Dedication
1 Introduction
   1.1 Thesis outline
   1.2 Literature review
2 A Simulation-based Approach for Dynamic Multi-priority Patient Scheduling
   2.1 Problem description
   2.2 MDP formulation
      2.2.1 Decision epochs
      2.2.2 State space
      2.2.3 Action sets
      2.2.4 Transition probabilities
      2.2.5 Immediate cost
      2.2.6 Optimality equations
   2.3 Solution approach
   2.4 Approximate policy iteration
      2.4.1 Post-decision state formulation
      2.4.2 Approximation architecture
   2.5 Implementation
      2.5.1 Modules
      2.5.2 The policy evaluation step
      2.5.3 The policy improvement step
      2.5.4 The algorithm
   2.6 Results
      2.6.1 A small example
      2.6.2 A small clinic
      2.6.3 A larger clinic
   2.7 Conclusion
3 Dynamic Multi-Appointment Patient Scheduling for Radiation Therapy
   3.1 Introduction
   3.2 Problem description
   3.3 MDP formulation
      3.3.1 Decision epochs
      3.3.2 State space
      3.3.3 Action sets
      3.3.4 Transition probabilities
      3.3.5 Immediate cost
      3.3.6 Optimality equations
   3.4 Solution approach
   3.5 Results
      3.5.1 Policy insights
      3.5.2 A practical example
   3.6 Conclusion
4 The Appointment Scheduling Game
   4.1 Introduction
   4.2 Literature review
   4.3 Game description
      4.3.1 Target audiences and learning objectives
      4.3.2 Rounds of the game
      4.3.3 Student evaluation
      4.3.4 Supporting materials
      4.3.5 Game variations
   4.4 Our experience using the game
   4.5 Conclusion
5 Conclusions, Extensions and Further Applications

Bibliography

A Dynamic Multi-priority Patient Scheduling with Postponements
   A.1 Decision epochs
   A.2 State space
   A.3 Action sets
   A.4 Transition probabilities
   A.5 Immediate cost
   A.6 Optimality equations
   A.7 Solution approach
B Dynamic Capacity Allocation for Long-term Care
   B.1 Decision epochs and state space
   B.2 Action sets
   B.3 Transition probabilities
   B.4 Immediate cost
   B.5 Optimality equations
   B.6 Solution approach

List of Tables

Table 2.1  Performance of Patrick et al.'s booking policy for different values of T3.
Table 2.2  Size of the equivalent linear programming model for several very small instances of the problem.
Table 2.3  A comparison of booking policies for a small example.
Table 2.4  A comparison of booking policies for a small clinic.
Table 2.5  A comparison of booking policies for a larger clinic.
Table 3.1  Service levels for patients who received radiation therapy in British Columbia between 2004 and 2008.
Table 3.2  Examples of treatment specifications.
Table 3.3  Characteristics of the treatment types used to evaluate the performance of the proposed methodology.
Table 3.4  Wait time penalties.
Table 3.5  Simulation results.
Table 4.1  Patient representation.
Table 4.2  A comparison of policies 1 to 4 for the original game setting.
Table 4.3  A comparison of policies 1 to 4 for the game setting but assuming a daily capacity of 4 appointment slots.
Table 4.4  A comparison of policies 1 to 4 for the game setting but assuming that demand is generated from a Poisson distribution with mean 3 appointment requests per day.
Table 4.5  A comparison of policies 1 to 4 for a small outpatient clinic.

List of Figures

Figure 2.1  Timeline associated with the patient scheduling decisions.
Figure 2.2  N-day rolling horizon.
Figure 2.3  Types of policy evaluation algorithms.
Figure 2.4  Decision-tree representation of the post-decision state concept.
Figure 2.5  Residual plots for data generated from the exact solution to a very small problem instance.
Figure 2.6  Alternative pre-decision state representation.
Figure 2.7  Residual plots for data generated by simulating Patrick et al.'s policy for a small outpatient clinic.
Figure 2.8  Selected partial plots from an additive regression model.
Figure 2.9  Approximation architectures.
Figure 2.10  The logistic function.
Figure 2.11  Regions of the logistic function.
Figure 2.12  Simulation-based implementation of an approximate policy iteration algorithm.
Figure 2.13  The decision generator module.
Figure 2.14  Faster implementation of the exponential function.
Figure 2.15  A policy evaluation step.
Figure 2.16  The generalized harmonic stepsize rule.
Figure 2.17  A policy improvement step.
Figure 2.18  The approximate policy iteration algorithm.
Figure 2.19  Final values of the approximation parameters for a small example.
Figure 2.20  Simulation results for a small example.
Figure 2.21  A comparison of the SS policy and LP policy for different pre-decision states.
Figure 2.22  Final values of the approximation parameters for different values of ᾱ and δ.
Figure 2.23  Final values of the approximation parameters for different values of h and m.
Figure 2.24  Final values of the approximation parameters for a small clinic.
Figure 2.25  Simulation results for a small clinic.
Figure 2.26  Final values of the approximation parameters for a larger clinic.
Figure 2.27  Simulation results for a larger clinic.
Figure 3.1  Methodological framework.
Figure 3.2  Radiation therapy treatment patterns.
Figure 3.3  Timeline associated with the treatment scheduling decisions.
Figure 3.4  Policy insights.
Figure 3.5  Graphical display of H_m.
Figure 3.6  Partial display of C_in for selected treatment types.
Figure 3.7  Booking day preferences for selected treatment types.
Figure 3.8  Policy comparison for selected treatment types.
Figure 4.1  Sample game calendar.
Figure 4.2  The game board.
Figure 4.3  Performance of policies 1 to 4 for the original game setting.
Figure 4.4  Performance of policies 1 to 4 for the game setting but assuming a daily capacity of 4 appointment slots.
Figure 4.5  Performance of policies 1 to 4 for the game setting but assuming that demand is generated from a Poisson distribution with mean 3 appointment requests per day.
Figure 4.6  Performance of policies 1 to 4 for a small outpatient clinic.

Glossary

ADP    Approximate Dynamic Programming
BC     British Columbia
BCCA   British Columbia Cancer Agency
CARO   Canadian Association of Radiation Oncologists
MDP    Markov Decision Process
ORMS   Operations Research and Management Science
VCC    Vancouver Cancer Center

Acknowledgments

I would like to thank my co-supervisors Dr. Martin L. Puterman and Dr. Jonathan Patrick for their constant encouragement and insightful direction. The research presented in this dissertation would certainly not have been the same without their invaluable contributions. I am also thankful to the two other members of my committee, Dr. Maurice Queyranne and Dr. Steven Shechter, for many useful comments and suggestions through the course of my work. My education at the University of British Columbia was enriched by interaction with fellow students. I thank Greg Werker and Dror Hermel for their advice and kind comments, but most importantly, for their friendship.
I would also like to thank all the administrators, physicians, therapists and clerks at the BCCA's Vancouver Centre for their contributions to my research. In particular, I am grateful for the valuable insights and helpful comments provided by Mike Darud, John French and especially Dr. Scott Tyldesley. Finally, I thank my wife Marianela and my family for their unconditional support throughout these years. They provided everything that I needed to succeed. I also thank my close friends: Arturo Haro for his encouraging words over the years and Pablo Santibáñez for his unselfish help.

This research was partially supported by NSERC Discovery Grant 5527-07 and by a research grant from the Canadian Institutes of Health Research (CIHR) in support of The CIHR Team in Operations Research for Improved Cancer Care.

Dedication

I dedicate this work to my wife, Marianela, who supported me each step of the way with her love, humor and patience.

Chapter 1

Introduction

Globally, as a consequence of factors such as increasing demand, constrained public and private budgets and a diminishing supply of staff, health care systems are facing increasingly lengthy wait times for a wide range of services. Although in some cases delays have little medical impact on patients, in others, excessive waits can potentially deteriorate patients' health [51]. Delays in radiation therapy, for example, are associated with tumor progression, the persistence of cancer symptoms and psychological distress [11, 35]. For this reason, health care providers and policymakers around the world are under constant political and community pressure to improve efficiency and reduce wait times. Health care systems are being seriously challenged to deliver high quality care with limited resources.
In this context, the development of effective patient scheduling methodologies has become critical for health care organizations seeking to improve patient flow, provide timely care and increase the utilization of medical resources. Through an improved match between health care resources and patient needs and a better assessment of future demand, high quality patient scheduling methods can potentially reduce patient wait times while also improving the utilization of critical resources, influencing the overall performance of health care systems.

Patient scheduling methodologies may be classified as either allocation scheduling or advance scheduling. Allocation scheduling refers to methodologies for assigning specific resources and appointment times to patients, but only once all patients for a given service day have been identified. Advance scheduling, on the other hand, refers to methodologies for scheduling patient appointments in advance of the service date, when future demand for service is still unknown. Advance scheduling methods usually assume that a second level of scheduling is needed which assigns patients to specific appointment times and resources. Although the implied decomposition, and the sequential determination of start times, is not necessarily optimal, few advance scheduling methodologies address the two levels of scheduling simultaneously [27].

Most studies in the patient scheduling literature address allocation scheduling problems. Magerlein and Martin [36], Cayirli and Veral [10], Mondschein and Weintraub [40], Gupta and Denton [27], Cardoen et al. [9] and Begen and Queyranne [5] provide comprehensive reviews of this topic. The research presented in this dissertation, however, falls into the second class of scheduling problems.

Advance patient scheduling decisions usually rely on the expertise of one or two booking agents and are made without explicitly considering the impact of current decisions on the future performance of the system.
The presence of highly variable demand, limited service capacity and multiple types of patients makes it extremely challenging for booking agents to adequately assess the real impact of their actions in order to more intelligently allocate capacity. The inability to predict the actual impact of scheduling decisions generates several inefficiencies that usually translate into unnecessary delays, an unsystematic prioritization of patients and inefficient resource utilization.

In this thesis, we acknowledge the importance of timely access to care, the need for more efficient ways of using medical resources and the difficulty of developing effective patient schedules through simpler means, and describe two Approximate Dynamic Programming (ADP) methods for solving advance patient scheduling problems. We also recognize the importance of disseminating patient scheduling concepts and techniques and present the Appointment Scheduling Game (ASG), a teaching tool we developed to illustrate advance patient scheduling practices to health care professionals and business students.

The research in this dissertation is based on the hypothesis that some patients can be booked further into the future, allowing the appointments for more urgent patients to be scheduled earlier, and seeks to identify good policies for allocating available service capacity to incoming demand while reducing patient wait times in a cost-effective manner. Rather than assigning patients to specific appointment times, our models allocate available capacity to different patient types. Once capacity is allocated, a second level of scheduling, which is not addressed in this dissertation, is needed to assign patients to specific appointment slots.

1.1 Thesis outline

The contents of this thesis are organized into five chapters. Chapter 1 includes a brief introduction, a thesis overview and a review of the literature relevant to our work.
In Chapter 2, we describe a simulation-based algorithm for solving a simple version of the advance patient scheduling problem. In Chapter 3, we present a discounted infinite-horizon Markov Decision Process (MDP) model for scheduling cancer treatments in radiation therapy units and describe a solution approach based on linear programming. In Chapter 4, we provide more details about the ASG. Finally, in Chapter 5, we summarize the contents of this thesis, present our main conclusions and suggest some potential extensions to our research.

In Chapter 2, we first characterize the patient scheduling problem under study. We then formulate a simple version of it as a discounted infinite-horizon MDP model and describe the development and implementation of a simulation-based approach for solving it. The proposed method is based on the classical dynamic programming policy iteration algorithm and uses a generalized logistic function of the post-decision state variable to approximate the discounted cost-to-go function. After providing a detailed description of the different modules and steps of the algorithm, we compare the performance of the resulting scheduling policies against the performance of four other policies, including the one derived by Patrick et al. [44] using a linear programming approach, for three different problem instances. To the best of our knowledge, this is the first time a parametric non-linear approximation architecture has been used to optimize patient scheduling policies in a dynamic setting. This is also the first study to compare the use of two popular approximate dynamic programming techniques, the linear programming approach and the simulation-based approach, on the same problem.

In Chapter 3, we extend the dynamic multi-priority patient scheduling model and solution approach developed by Patrick et al.
by considering patients who receive service across multiple days and for irregular lengths of time, and by allowing the possibility of using overtime on different days of the booking horizon. We formulate and solve a discounted infinite-horizon MDP model for scheduling cancer treatments in radiation therapy units. We use an affine architecture to approximate the value function in our formulation and solve an equivalent linear programming model through column generation to obtain an approximate optimal policy for this problem. We evaluate the benefits from the proposed method by simulating its performance for a practical example based on data provided by the British Columbia Cancer Agency (BCCA). To our knowledge, no previous research has considered multi-appointment requirements when optimizing advance multi-priority patient scheduling policies in a dynamic setting. Chapter 3 also constitutes a novel application of approximate dynamic programming to a problem that has received limited attention in the operations research literature.

Even though games, puzzles and paradoxes have long been recognized as important tools for developing educationally rich material, the literature in Operations Research and Management Science (ORMS) in general, and introductory textbooks in particular, scarcely makes use of them. This is mainly because realistic problems tend to be too complex and therefore not particularly suitable for teaching material. In Chapter 4, we describe the Appointment Scheduling Game in detail. The game introduces the problem faced by a booking agent in an appointment scheduling office and illustrates the main patient scheduling challenges in systems with multiple types of patients, uncertain demand and limited capacity. This chapter starts with an overview of the literature on the use of classroom games to teach ORMS concepts. We then describe the game's target audiences and learning objectives.
The chapter ends with some comments on our past experiences using the game and a discussion of the challenges we faced and the lessons we learned while developing it. To the best of our knowledge, this is the first game designed to focus on patient scheduling and its challenges.

1.2 Literature review

Even though patient scheduling problems have been studied extensively over the last decade, the dynamic allocation of medical capacity in advance of the service date and in the presence of multiple types of patients has received limited attention. Rising et al. [50] describe a case study in which simulation models are used to evaluate alternative decision rules for scheduling appointments in an outpatient clinic. The authors consider two types of patients, walk-ins and patients with advance appointments, and focus on the effects of different booking rules on patient throughput and physician utilization. Gerchak et al. [24] study the advance scheduling of elective surgeries in an operating room when a random portion of the daily capacity is devoted to emergency surgeries. They determine the optimal number of elective patients to accept each day and analyze the tradeoffs between capacity utilization and delays. They prove that the optimal capacity allocation policy is not a strict booking limit, but one where the number of elective patients accepted increases with the number of patients waiting, and conclude that the computational effort required to find the optimal cutoff policy appears to be considerably greater than that required to find the optimal policy. Lamiri et al. [32] extend the model developed by Gerchak et al. [24] by specifying the set of elective surgeries to be performed on each day of a planning horizon and propose a solution method combining Monte Carlo simulation with mixed-integer programming. Vermeulen et al. [62] develop a dynamic although case-specific method for scheduling CT-scan appointments at a radiology department.
They show through the use of simulation that simple rules based on comparing available and needed capacity for different patient groups allow a significant improvement in the number of patients scheduled on time. Schütz and Kolisch [53] adopt a revenue management approach to the problem of deciding whether or not to accept requests for examinations on an MRI scanner. Although they consider a single booking period, requests from different patient types may arrive both in advance and during the examination day. Most closely related to our work are the papers of Erdelyi and Topaloglu [22] and Patrick et al. [44]. They study a problem that involves allocating a fixed amount of daily capacity among entities with different priorities. Erdelyi and Topaloglu consider a finite planning horizon and focus on a class of policies characterized by a set of protection levels, whereas Patrick et al. consider an infinite planning horizon and seek optimal scheduling policies using the linear programming approach to ADP. In both cases, the authors assume that each entity to be served requires only one unit of capacity.

The majority of the contributions from the operations research community to radiation therapy, the topic of Chapter 3, are associated with treatment planning [31, 33, 39]. Some work has also been done on the scheduling of pre-treatment activities [45, 64]. The radiation therapy appointment scheduling problem, in particular, has attracted only limited attention. Conforti et al. [15] and Conforti et al. [14] describe integer programming models intended to support radiation therapy scheduling decisions; they adopt a block and a non-block strategy, respectively. Petrovic and Leite-Rocha [46] consider two scheduling rules, forward and backward, and vary the system configuration to investigate the effect of parameters such as target times, protection levels and the timing of decisions on the performance of the system.
They also describe a heuristic algorithm that attempts to improve the schedules generated by these two rules. The methodologies in these three papers include both multi-priority and multi-appointment requirements but differ from ours in that their models are static and thus provide an operational and myopic solution. By myopic we mean a solution that considers only what is known at present, ignoring the impact of today's decisions on the future. In contrast, we provide a dynamic policy that integrates future costs over an infinite horizon. We choose an infinite planning horizon because we are interested not only in generating specific patient schedules but also in understanding the properties of good booking policies.

To the best of our knowledge, the research presented in this dissertation makes a significant contribution to the patient scheduling literature. First, it describes an alternative approach for solving a dynamic multi-priority patient scheduling problem. Second, it considers the use of a parametric non-linear functional form to approximate the value function in the MDP formulation of this problem. Third, it compares the performance of patient scheduling policies derived using two alternative ADP approaches, the simulation-based approach and a linear programming approach, on the same problem. Fourth, it extends the MDP model and solution approach developed by Patrick et al. [44] by considering multi-appointment capacity requirements and by allowing the possibility of using overtime on different days of the booking horizon. The application of this model to the radiation therapy patient scheduling problem also constitutes a novel application of approximate dynamic programming to a problem that has received limited attention.
Chapter 2

A Simulation-based Approach for Dynamic Multi-priority Patient Scheduling

In this chapter, we describe a discounted infinite-horizon Markov Decision Process (MDP) for dynamically scheduling patients with different priorities at a health care facility, and a simulation-based method for solving it. The proposed solution method is based on the classical dynamic programming policy iteration algorithm and uses a parametric non-linear value function approximation architecture. A comparison of different booking policies and two solution techniques is included.

2.1 Problem description

We consider a system similar to that of Patrick et al. [44] that has the capacity to perform C fixed-length procedures a day. The capacity of the system is measured in discrete units called appointment slots. Each day, the booking agent in charge of the system receives service requests from a pre-specified set of I different patient types. Patients are classified into types according to their priorities, each priority having a different medically acceptable wait time. Each patient requires only one appointment slot. We assume the number of service requests of type i the agent receives in a given day follows an independent Poisson distribution with mean m_i requests per day (m_i > 0). Patient types and demand distributions do not change over time and are uncorrelated. Appointment scheduling decisions are made once a day and the booking agent may schedule patients at most N days in advance. To relieve excess demand, the booking agent can divert patients to an alternative capacity source at a cost of h per patient. The cost associated with booking a priority i patient on day n (counted from today) is represented by c_in. We assume that c_in is non-increasing in i (i.e., patients with a smaller index i are more urgent), non-decreasing in n, and equal to zero if n ≤ T_i, where T_i is the wait time target (medically acceptable wait) associated with a patient of priority i.
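As an illustration, the cost structure and demand process just described can be sketched as follows. All numeric values (two patient types, their targets T_i, the per-day penalties and the demand means m_i) are hypothetical choices for the sketch, not parameters from this thesis:

```python
import math
import random

# Hypothetical parameters (illustrative only, not from the thesis):
T = {1: 1, 2: 7}             # T_i: medically acceptable wait (days) per type
penalty = {1: 10.0, 2: 2.0}  # steeper per-day cost for the more urgent type 1
m = {1: 2.5, 2: 3.0}         # m_i: mean daily Poisson demand per type

def booking_cost(i, n):
    """c_in: cost of booking a type-i patient n days from today.
    Zero while n <= T_i, then non-decreasing in n."""
    return penalty[i] * max(0, n - T[i])

def sample_poisson(rng, lam):
    """Knuth's multiplicative method for Poisson(lam) samples (stdlib only)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sample_demand(rng):
    """Draw w_i, the new service requests of each type arriving in one day."""
    return {i: sample_poisson(rng, m[i]) for i in m}
```

With these choices, booking a type-1 patient three days out costs 10.0 × (3 − 1) = 20.0, while any type-2 booking within its seven-day target is free, matching the stated monotonicity of c_in.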
For simplicity, we assume that all patients waiting to be booked on a given day must be either scheduled or diverted, and consequently that there is no upper limit on the number of diversions. Neither rescheduling nor cancellations are considered. At the beginning of each day, the booking agent observes the number of appointment slots scheduled on each of the subsequent N days. He/she then waits until the end of the day to learn the number of patients of each type waiting to be booked and decides how to allocate the available service capacity among the different patient types. Figure 2.1 depicts the timeline associated with this problem.

Figure 2.1: Timeline associated with the patient scheduling decisions for a given day k.

2.2 MDP formulation

In this section, we formulate the multi-priority patient scheduling problem described above as a discounted infinite-horizon MDP model. The main objective of this model is to identify an effective way of allocating capacity to incoming demand while reducing patient wait times in a cost-effective manner. We base our formulation on the dynamic multi-priority patient scheduling model developed by Patrick et al. [44].

2.2.1 Decision epochs

Booking decisions are made at the end of each day over an infinite horizon.

2.2.2 State space

At the end of each day, the booking agent has access to the current schedule from today to the end of the booking horizon as well as the number of patients of each type waiting to be booked. A state of the system, denoted by s ∈ S, is represented by a vector s = (u, w) = (u1, . . . , uN, w1, . . . , wI), where un is the number of appointment slots already booked on day n and wi corresponds to the number of patients of type i waiting to be booked. As a consequence of considering an N-day rolling horizon, the number of appointment slots booked on day N of any valid state of the system is equal to zero. That is, uN = 0 (see Figure 2.2 for further clarification).
Figure 2.2: Relationship between the decision epochs and the finite booking horizon. The booking horizon is not static but rolling: day n at the current decision epoch becomes day n − 1 at the subsequent decision epoch.

2.2.3 Action sets

At the end of each day, the booking agent must decide on which day to schedule each of the patients waiting to be booked. Alternatively, he/she may divert some patients to an alternative capacity source. An action available to the agent is represented by a vector a = (x, y) = (x11, . . . , x1N, x21, . . . , x2N, . . . , xI1, . . . , xIN, y1, . . . , yI), where xin is the number of patients of type i booked today on day n and yi corresponds to the number of patients of type i being diverted. Note that rather than assigning patients to specific appointment slots, the model allocates available daily capacity to each patient type. Once capacity is allocated, a second level of scheduling is needed to assign patients to specific appointment times. The set of feasible actions compatible with state (u, w) ∈ S, denoted by A(u,w), must satisfy the following constraints:

∑_{n=1}^{N} xin + yi = wi   ∀i   (2.1)

un + ∑_{i=1}^{I} xin ≤ C   ∀n   (2.2)

Constraint (2.1) restricts the number of bookings and diversions for each patient type i to be equal to the number of patients of that type waiting to be booked. Constraint (2.2) limits the total number of appointment slots booked today for day n to be less than or equal to the available service capacity that day. Note that the number of patients of type i being diverted can be written as yi = wi − ∑_{n=1}^{N} xin ∀i. All actions are non-negative and integer.

2.2.4 Transition probabilities

Once actions are taken, the only source of uncertainty in the transition to the next state of the system is the new demand for service.
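A minimal sketch of the feasibility check implied by constraints (2.1) and (2.2); the function name and the small two-type, two-day example are ours, for illustration only.

```python
def is_feasible(x, y, u, w, C):
    """Check constraints (2.1)-(2.2) for bookings x[i][n], diversions y[i],
    current schedule u[n], waiting patients w[i] and daily capacity C."""
    I, N = len(w), len(u)
    # (2.1): every waiting patient of type i is either booked or diverted
    if any(sum(x[i]) + y[i] != w[i] for i in range(I)):
        return False
    # (2.2): today's bookings never exceed remaining capacity on any day
    if any(u[n] + sum(x[i][n] for i in range(I)) > C for n in range(N)):
        return False
    # non-negativity and integrality of all action components
    ok = all(v >= 0 and isinstance(v, int) for row in x for v in row)
    return ok and all(v >= 0 and isinstance(v, int) for v in y)

# Made-up toy instance: two types, two-day horizon, C = 3
assert is_feasible(x=[[1, 0], [0, 1]], y=[0, 0], u=[2, 0], w=[1, 1], C=3)
assert not is_feasible(x=[[2, 0], [0, 1]], y=[0, 0], u=[2, 0], w=[1, 1], C=3)
```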
As a result of choosing booking action a = (x, y) in state s = (u, w), a ∈ As and s ∈ S, and having qi new service requests from patients of type i, the state of the system the next day, denoted by s′ = (u′, w′), is determined by the following probability distribution:

p(s′|s, a) = ∏_{i=1}^{I} Pr(qi), if s′ satisfies equations (2.4) and (2.5); 0, otherwise.   (2.3)

u′n = u_{n+1} + ∑_{i=1}^{I} x_{i(n+1)}, n < N; 0, n = N.   (2.4)

w′i = qi   ∀i   (2.5)

Equation (2.4) defines the new number of appointment slots booked on day n as a function of the number of slots previously booked on day (n + 1) plus all new bookings that affected that day. Equation (2.5) specifies the new number of patients of each type waiting to be booked as the number of new requests of each type. The term Pr(qi) in (2.3) represents the probability of having qi new service requests from patients of type i. We assume daily demand for each patient type follows an independent Poisson distribution that does not change over time.

2.2.5 Immediate cost

The immediate cost associated with choosing action a = (x, y) in state s = (u, w), a ∈ As and s ∈ S, comes from two sources: the cost associated with the resulting patient wait times and the cost associated with possible patient diversions. We formulate the immediate cost as in (2.6), where cin represents the cost (if any) associated with booking a priority i patient on day n and h is the diversion cost.

c(s, a) = ∑_{i=1}^{I} ∑_{n=1}^{N} cin xin + ∑_{i=1}^{I} h yi   ∀s ∈ S, a ∈ As   (2.6)

Although we do not consider postponements, we adopt Patrick et al.'s definition of cin in (2.7) to make results comparable. We define cin as a function of gi, the cost associated with delaying a patient i's booking, and λ, the discount factor. The value of gi corresponds to the cost of booking a patient of type i one day late. It can be (loosely) interpreted as a daily late booking penalty.

cin = 0, n ≤ Ti;  ∑_{k=Ti+1}^{n} λ^{k−Ti−1} gi, n > Ti.   ∀i, n   (2.7)
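The booking cost in (2.7) can be computed directly. The sketch below uses the penalty values g = (20, 10, 5), targets T = (7, 14, 21) and discount factor 0.99 from the clinic example later in this chapter; the function name is ours.

```python
# Booking cost c_in from (2.7): zero up to the wait time target T_i, then a
# geometrically discounted sum of daily late-booking penalties g_i.
def booking_cost(i, n, g, T, lam):
    if n <= T[i]:
        return 0.0
    return sum(lam ** (k - T[i] - 1) * g[i] for k in range(T[i] + 1, n + 1))

# Values from the clinic example later in this chapter
g, T, lam = [20.0, 10.0, 5.0], [7, 14, 21], 0.99

assert booking_cost(0, 7, g, T, lam) == 0.0    # booked on target: no cost
assert booking_cost(0, 8, g, T, lam) == 20.0   # one day late: penalty g_1
```

Note that the sum has the closed form gi (1 − λ^{n−Ti}) / (1 − λ) for n > Ti, which is non-decreasing in n as required.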
Since gi is decreasing in i and Ti is increasing in i, the value of cin is non-increasing in i and non-decreasing in n. The values of the booking cost, although arbitrary, allow us to approximate the different wait time targets. It could be argued that the booking cost could be better represented by a cost function that increases at a faster rate in n. However, Patrick et al. experimented with such a cost function and discovered no difference in terms of the policy dictated by their model. An alternative cost function, determined in relation to existing guidelines for acceptable waits and taking into consideration estimates of the importance of timely service for different patient types, is used in Chapter 3. The function in (2.6) balances the cost to the patient in wait time and the cost to the system in having to resort to an alternative capacity source. Determining the specific values to assign to gi and Ti is difficult; it should therefore be done by a panel of medical experts in relation to existing guidelines and taking into consideration the impact of delays on patients' health. Quantifying the value of the diversion cost can also be challenging. However, the most obvious source of surge capacity is overtime, in which case there exists a specific cost value.

2.2.6 Optimality equations

The value function in our formulation, denoted by v(s), corresponds to the expected total λ-discounted cost over the infinite horizon. Given a stationary policy d∞, which uses decision rule d at every decision epoch t, v(s) can be defined as follows:

v(s) ≡ E_s^{d∞} [ ∑_{t=1}^{∞} λ^{t−1} c(st, d(st)) ]   ∀s ∈ S   (2.8)

Clearly, we are not so much interested in determining the value function for a given policy as in finding an optimal stationary policy.
To identify such a policy, we solve the following optimality equations, where c(s, a) and p(s′|s, a) are given by (2.6) and (2.3), respectively:

v(s) = min_{a ∈ As} { c(s, a) + λ ∑_{s′ ∈ S} p(s′|s, a) v(s′) }   ∀s ∈ S   (2.9)

The challenge with this formulation is that even for very small settings, the size of the state space and the size of the corresponding action sets make the problem of deriving v intractable. The state variable s = (u, w) and the action variable a = (x, y) have (N + I) and (I × N + I) dimensions, respectively. Assuming that wi can take up to Qi possible values, this means that we might have up to (C + 1)^N × ∏_{i=1}^{I} Qi different states and ∏_{i=1}^{I} Qi^{N+1} different, although not necessarily feasible, actions. For example, a very small setting with a capacity of six appointment slots per day, three patient classes with Q = (9, 6, 3) and a twelve-day booking horizon involves more than 10^12 possible states and 10^28 potential actions.

2.3 Solution approach

Fortunately, a large number of potential methods for dealing with the exponential growth of the state space and the number of available actions, collectively called Approximate Dynamic Programming (ADP), have been developed in the last couple of decades [8, 47, 59]. The literature in this field focuses primarily on two types of methods: simulation-based algorithms and linear programming methods. Simulation-based methods are classified into two types: policy evaluation algorithms, which deal with the approximation of the cost-to-go function of a single policy and can be embedded within a policy iteration scheme, and Q-learning algorithms, which deal with the approximation of the optimal cost-to-go function (not just the cost-to-go function of a single policy). Q-learning methods maintain and update, for each state-action pair, an estimate of the expression that is minimized in the right-hand side of the optimality equations.
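The state and action counts for the very small setting above can be verified in a few lines; the counting formulas come from the text, the code is ours.

```python
from math import prod

# The "very small" example from Section 2.2.6:
# C = 6 slots/day, N = 12 days, I = 3 types with Q = (9, 6, 3) demand values.
C, N, Q = 6, 12, (9, 6, 3)

n_states = (C + 1) ** N * prod(Q)            # (C+1)^N * prod_i Q_i
n_actions = prod(q ** (N + 1) for q in Q)    # prod_i Q_i^(N+1)

assert n_states > 10 ** 12                   # "more than 10^12 possible states"
assert n_actions > 10 ** 28                  # "more than 10^28 potential actions"
```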
The most common approach within policy evaluation algorithms is to use an approximation architecture to represent the value function (cost-to-go function) in the dynamic programming formulation of a problem. The approximation can be constructed or updated repeatedly for a sequence of policies, in the context of a policy iteration scheme, or, alternatively, be used to construct an approximate form of the optimality equations. The first class of policy evaluation methods, called direct methods, uses simulation to generate value function estimates for different states of a system and fits a pre-defined approximation architecture to the cost-to-go samples using some normed error (note that this differs from the sense of "direct" method used by Maxwell et al. [38], who propose "direct search" optimization methods as a potential alternative to value function fitting procedures). Direct methods allow for non-linear value function approximations and flexibility in the collection of cost-to-go samples. The second and most popular class of policy evaluation methods, called indirect methods, determines the optimal value function approximation within a specific class of functions, usually a linear combination of basis functions, by approximating the solution of the optimality equations on the subspace spanned by the approximation architecture (the resulting problem is viewed as a form of projected optimality equations). Some of the most popular indirect methods for discounted problems are the temporal differences method, the least squares temporal differences method and the least squares policy evaluation method – TD(λ), LSTD(λ) and LSPE(λ), respectively. Direct and indirect methods differ in speed of convergence and in their suitability for different problem settings. They also share a common bottleneck: the slow speed of simulation. Figure 2.3 illustrates the direct and indirect approaches for a value function approximation defined as a linear combination of K basis functions.
Starting with Tesauro [60], who studied the game of backgammon, simulation-based algorithms have been applied to several problems, including job-shop scheduling [65], preventive maintenance scheduling [16], call admission control [37, 57], revenue management [25, 55], order acceptance [29] and driver/truck allocation problems [48, 56]. Linear programming methods are based on formulating the optimality equations of a dynamic programming model as a linear program and on choosing an approximation architecture to represent the variables (value function) in it. This approach was initially proposed by Schweitzer and Seidmann [54] and has been reconsidered by de Farias and Van Roy [17, 18], Adelman and Mersereau [2], Patrick et al. [44], Desai et al. [21] and Adelman and Klabjan [1].

Figure 2.3: Two methods for approximating the value function vd of a stationary policy d as a linear combination of K basis functions (subspace V = {Φr | r ∈ R^K}). In the direct method (left), vd is projected on V. In the indirect method (right), the approximation is found by solving Φr = ΠLd(Φr), a projected form of the optimality equations. Π denotes projection with respect to a suitable norm on V and Ld is the linear operator associated with policy d (source: Bertsekas [7]).

An appropriately chosen approximation architecture reduces the number of variables in the linear program. However, due to the large number of constraints, it is often necessary to use techniques such as constraint sampling or column generation to make a solution to the linear program possible. Linear programming methods are particularly useful when the expectation in the optimality equations can be computed exactly. Patrick et al.
[44] adopted the linear programming approach to solve a slightly more general version of the multi-priority patient scheduling problem described in Section 2.1. Their version included the possibility of postponing to the next day the scheduling decisions for some patients and an upper limit on the number of diversions. In their solution approach, they made use of the affine approximation architecture shown in (2.10) to produce a linear program with a manageable number of variables and solved its dual through column generation.

v(u, w) = W0 + ∑_{n=1}^{N} Un un + ∑_{i=1}^{I} Wi wi   ∀(u, w) ∈ S,  U, W ≥ 0, W0 ∈ R   (2.10)

Using the linear programming approach to ADP, they were able to identify, under some conditions, an analytical solution for the optimal values of the approximation parameters in (2.10). The solution is given by the following equations:

W0* = hλ ( ∑_{i=1}^{I} λ^{Ti−T1} mi / (1 − λ) − T1 C − λC / (1 − λ) )   (2.11)

Un* = { h, 1 ≤ n ≤ T1;  λ U*_{n−1}, T1 < n < N;  0, n = N }   (2.12)

Wi* = U*_{Ti}   ∀i   (2.13)

Patrick et al. also concluded the following. First, if excess capacity is present and the conditions mentioned above are violated, then the myopic policy becomes optimal. Second, if insufficient capacity is provided and the conditions are violated, then there is no definable solution for the optimal values of the approximation parameters (the solution is identically zero) and the consequent patient scheduling policy becomes myopic. We refer the reader to [44] for more details. Nevertheless, we have found that the performance of the patient scheduling policy derived from the analytical solution above, which is described next, is very good independently of whether the conditions are satisfied or not. Using the optimal value function approximation given by (2.11), (2.12) and (2.13), Patrick et al. derived a booking policy which they evaluated using simulation.
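A sketch of how the analytical values of Un* and Wi* in (2.12)–(2.13) can be computed; the function name is ours, and the specific h, λ, T and N values are borrowed from the clinic example below for illustration.

```python
# U_n* equals h up to T_1, then decays geometrically, and is 0 on day N;
# W_i* is U* evaluated at the wait time target T_i -- equations (2.12)-(2.13).
def affine_parameters(h, lam, T, N):
    U = [0.0] * (N + 1)              # U[1..N]; index 0 unused for readability
    for n in range(1, N + 1):
        if n <= T[0]:
            U[n] = h                 # 1 <= n <= T_1
        elif n < N:
            U[n] = lam * U[n - 1]    # T_1 < n < N
        # n == N: stays 0
    W = [U[Ti] for Ti in T]          # W_i* = U*_{T_i}
    return U, W

U, W = affine_parameters(h=100.0, lam=0.99, T=[7, 14, 21], N=21)
```

Note that with T3 = N = 21, the boundary case Un* = 0 for n = N makes W3* = 0 in this instance.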
Their policy, applied to the problem setting described in Section 2.1, translates into the following scheduling guidelines:

(1) Book patients in increasing priority order;

(2) Book as much type 1 demand as possible into the day interval [1, T1], starting with day one and working up to day T1;

(3) For each successive patient type i, book incoming demand into any available appointment slot in the day interval [1, Ti], starting with day one, then day Ti, and working down to day two;

(4) Divert any remaining demand.

The main drawback of this booking policy is the sometimes unnecessarily long wait times for lower priority patients, especially if the wait time targets are set too far back. Unless there is available capacity on day one, lower priority patients will be booked on their target dates, regardless of the level of congestion in the system. It is unlikely that day Ti will be filled by lower priority patients since higher priority demand has yet to be booked into that day. Therefore, unless the system is significantly under-capacitated, shorter waiting times for some lower priority patients could be achieved, at the expense of a slight increase in the expected discounted cost, by booking these patients earlier or, equivalently, by choosing shorter wait time targets. This does not necessarily mean that the actual (medically defined) wait time targets should be changed; rather, the booking intervals defining the policy could be modified in order to avoid unnecessary delays. Note that most of the time, wait time targets are not a lever one can play with. For example, consider a small outpatient clinic with a capacity of ten appointment slots per day. The clinic divides demand into three priority classes with wait time targets of 7, 14 and 21 days, respectively, and chooses a 21-day booking horizon. Demand from each priority class is Poisson with means 5, 3 and 2 patients per day, respectively.
The overtime cost is 100 and the late booking penalties are 20, 10 and 5. The discount rate is 0.99. Table 2.1 shows the performance of Patrick et al.'s patient scheduling policy in terms of the mean average wait times and the mean discounted cost for different values of T3. The 95% confidence intervals provided in this table were obtained by simulating the evolution of the system for 2,500 days with statistics collected after the first 1,000 days. Each scenario was run 5,000 times. Table 2.1 shows how lower values of T3 reduce the mean average wait time for the lowest priority patient type while leaving the mean average wait times for higher priority types and the mean discounted cost almost unchanged. For example, setting T3 at 15 days instead of 21 days reduces the mean average wait time for type 3 patients from 19.96 to 14.27 days (28.5%) while slightly increasing the mean average wait time for type 2 patients from 12.41 to 12.44 days (0.2%), the mean average wait time for type 1 patients from 3.07 to 3.12 days (1.6%) and the mean discounted cost from 1,027.31 to 1,056.75 (2.9%).

Table 2.1: Performance of Patrick et al.'s policy for a small outpatient clinic and different values of T3 (T1 = 7 and T2 = 14). The Type 1–3 columns report mean average wait times [days].

T3 [days]   Type 1         Type 2          Type 3          Mean discounted cost [$]
21          3.07 ± 0.01    12.41 ± 0.02    19.96 ± 0.01    1,027.31 ± 33.32
20          3.08 ± 0.01    12.40 ± 0.02    19.01 ± 0.01    1,030.30 ± 33.29
19          3.08 ± 0.01    12.41 ± 0.02    18.06 ± 0.01    1,032.23 ± 33.23
18          3.09 ± 0.01    12.41 ± 0.02    17.11 ± 0.01    1,038.45 ± 33.29
17          3.09 ± 0.01    12.41 ± 0.02    16.16 ± 0.01    1,042.93 ± 33.34
16          3.11 ± 0.01    12.42 ± 0.02    15.21 ± 0.01    1,049.39 ± 33.34
15          3.12 ± 0.01    12.44 ± 0.02    14.27 ± 0.01    1,056.75 ± 33.39
This result allows us to conclude that simultaneous reductions in the wait time targets for lower priority patients may provide significant reductions in the mean average wait times for these patients at the expense of a slight increase in the mean discounted cost. For example, setting T2 at 12 days and T3 at 15 days, while maintaining T1 at 7 days (scenario not shown in the table), reduces the mean average wait times from 12.41 and 19.96 days to 10.62 and 14.24 days for patient types 2 and 3, respectively. This comes with an increase in the mean average wait time for type 1 patients of 0.05 days and an increase in the mean discounted cost of 4.92 (all differences are statistically significant). Patrick et al.'s policy fails to take into account the fact that the marginal discounted cost associated with booking a patient on a given day should increase as the number of available slots on that day decreases, or, more generally, the fact that the marginal discounted cost should depend on the total number of appointment slots booked throughout the system (system workload). This limitation is a direct consequence of the type of approximation architecture from which the policy was derived. Under an affine approximation, the cost associated with reducing the number of available slots on a given day from one to zero is the same as the cost associated with reducing the number of available slots from ten to nine. In the remainder of this chapter, we describe a simulation-based algorithm for solving (approximately) the dynamic multi-priority patient scheduling model presented in Section 2.2. The algorithm, a direct policy evaluation method, is based on a policy iteration framework and uses a non-linear functional form to approximate the cost-to-go function.
The method was developed as part of our efforts to answer the following questions:

• Is the policy obtained from the linear programming approach, using an affine value function approximation on the state variables, optimal?

• Can the value function be better represented by a different type of approximation architecture?

• Does a different approximation architecture provide a better booking policy for this problem?

• What are the advantages and disadvantages of using a simulation-based algorithm versus a linear programming method to approximately solve patient scheduling problems?

To the best of our knowledge, this is the first time that an approximation architecture that is explicitly non-linear in the approximation parameters and the state variables has been used to optimize patient scheduling policies in a dynamic setting. This is also the first study to compare the use of a simulation-based approach with the use of a linear programming approach for the same problem.

2.4 Approximate policy iteration

The classical dynamic programming policy iteration method is one of several algorithmic strategies for solving discounted infinite-horizon problems [49]. The algorithm starts with an arbitrary policy d^0 and iteratively generates a sequence of new policies d^1, d^2, . . . until convergence is achieved. In each iteration k, the algorithm first performs a policy evaluation step that computes v^k(s) ∀s ∈ S as the solution of the system of linear equations in (2.14), and then a policy improvement step that determines a new policy d^{k+1} according to (2.15). The process is repeated until d^{k+1}(s) = d^k(s) ∀s ∈ S, in which case the optimal policy d* is d^k.

v^k(s) = c(s, d^k(s)) + λ ∑_{s′ ∈ S} p(s′|s, d^k(s)) v^k(s′)   ∀s ∈ S   (2.14)

d^{k+1}(s) ∈ arg min_{a ∈ As} { c(s, a) + λ ∑_{s′ ∈ S} p(s′|s, a) v^k(s′) }   ∀s ∈ S   (2.15)

The general structure of an approximate policy iteration algorithm is the same as the general structure of the classical policy iteration method.
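For a generic two-state MDP (made-up transition matrices and costs, unrelated to the scheduling model), the exact policy evaluation and improvement steps (2.14)–(2.15) can be sketched as:

```python
import numpy as np

def policy_iteration(P, c, lam):
    """Exact policy iteration for a tiny finite MDP: P[a] is the |S|x|S|
    transition matrix under action a, c[s, a] the immediate cost."""
    S, A = c.shape
    d = np.zeros(S, dtype=int)                     # arbitrary initial policy d^0
    while True:
        # Policy evaluation (2.14): solve (I - lam * P_d) v = c_d
        P_d = np.array([P[d[s]][s] for s in range(S)])
        c_d = c[np.arange(S), d]
        v = np.linalg.solve(np.eye(S) - lam * P_d, c_d)
        # Policy improvement (2.15): greedy action in every state
        q = np.array([[c[s, a] + lam * P[a][s] @ v for a in range(A)]
                      for s in range(S)])
        d_new = q.argmin(axis=1)
        if np.array_equal(d_new, d):
            return d, v                            # optimal policy and value
        d = d_new

# Two states, two actions; made-up numbers for illustration only
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),           # transitions under action 0
     np.array([[0.5, 0.5], [0.5, 0.5]])]           # transitions under action 1
c = np.array([[1.0, 2.0], [4.0, 3.0]])             # immediate costs c[s, a]
d_opt, v_opt = policy_iteration(P, c, lam=0.9)
```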
In each iteration k, however, we compute an approximate value function ṽ^k(s) rather than the exact value function v^k(s), s ∈ S (we use a tilde to indicate a function that approximates the corresponding function without a tilde). This is usually done by updating the value of the approximation parameter defining the class of functions (approximation architecture) chosen to represent the value function in the problem at hand. Then, the algorithm generates a new policy d^{k+1} which is greedy with respect to ṽ^k(s), as stated in (2.16). In some cases, the policy improvement step can be done exactly; in others, an approximate or heuristic solution to (2.16) is needed.

d^{k+1}(s) ∈ arg min_{a ∈ As} { c(s, a) + λ ∑_{s′ ∈ S} p(s′|s, a) ṽ^k(s′) }   ∀s ∈ S   (2.16)

In practice, an initial stationary policy d^0 is needed to run either version of the policy iteration algorithm. A suitable choice of d^0 may be some heuristic guess based on intuition or an existing policy for the same or a similar problem. It is advisable to choose this policy carefully since the rate of convergence of the algorithm depends, among other things, on it. The implementation of a generic approximate policy iteration algorithm, as suggested by Bertsekas [7], usually consists of five modules: an initialization module, a simulator, a decision generator, a value function approximator and a least squares solver. Below is a brief description of each module.

• The initialization module. Given a probability distribution or a booking policy, it generates a state s0 ∈ S from which a sample trajectory is generated. In some cases, when a booking policy is provided, this module generates cost-to-go samples to be used by the least squares solver to obtain an initial set of approximation parameter values.

• The simulator. Given a state s ∈ S and an action a ∈ As, it determines the next state of the system s′.

• The decision generator.
Given the current state of the system s, it finds an action a* ∈ As that is greedy with respect to the current value function approximation ṽ(·). This module solves the right-hand side of the approximate optimality equations.

• The value function approximator. Given a state of the system s ∈ S, it provides the decision generator with a value function (cost-to-go) estimate ṽ(s).

• The least squares solver. Given a set of sample trajectories generated by the simulator module, it fits a pre-defined approximation architecture to the cost-to-go samples in some normed error sense. The norm usually used is the Euclidean norm, in which case this module solves a least squares problem to obtain a new value function approximation.

Before providing more details about the actual approximate policy iteration algorithm used to solve the multi-priority patient scheduling problem described in Section 2.1, we introduce an alternative dynamic programming formulation of the problem based on the definition of a post-decision state variable. We also summarize our efforts to identify a better functional form to approximate the value function for this problem.

2.4.1 Post-decision state formulation

The first step prior to implementing our algorithm was to formulate the optimality equations in (2.9) in terms of the post-decision state variable. The post-decision state variable for this problem, denoted by sx, corresponds to the state of the system immediately after decisions are made. Sutton and Barto [59] introduced this idea, as the after-state variable, in the context of a tic-tac-toe example to represent the game board after one player has made his move but right before his opponent has moved. We adopt the term post-decision state, first introduced by Van Roy et al. [61] and recently reconsidered by Powell [47], because it emphasizes that there is no new information in the post-decision state. Figure 2.4 illustrates this concept for the patient scheduling problem under study.
Figure 2.4: Decision-tree representation of the post-decision state concept. Decision nodes (squares) correspond to pre-decision states s ∈ S and outcome nodes (circles) represent post-decision states sx ∈ Sx.

A post-decision state formulation is particularly useful for patient scheduling problems because many different booking actions produce exactly the same post-decision state and because the problem of identifying the optimal booking actions at a given decision epoch becomes a deterministic optimization problem. The value of using post-decision state variables is mostly computational; it considerably reduces the size of the state space and avoids the need to compute the expectation in the right-hand side of the optimality equations (a task often intractable). The expectation is already captured in the formulation. Thus, simulation-based algorithms designed in terms of post-decision state variables are more efficient for large-scale problems like the one at hand because they are often less complex, simplifying the challenge of creating good approximations. In our setting, the post-decision state sx ∈ Sx is represented by a vector ux = (ux1, . . . , uxN), where uxn corresponds to the number of appointment slots booked on day n right after taking the optimal action a* = (x*, y*) ∈ As in pre-decision state s ∈ S. To model the evolution of the system from a pre-decision state s = (u, w) ∈ S to a post-decision state sx = ux, and from a post-decision state sx = ux ∈ Sx to a pre-decision state s = (u, w), we introduce three transition equations.

uxn = un + ∑_{i=1}^{I} x*in   ∀n   (2.17)

un = ux_{n+1}, n < N; 0, n = N.   (2.18)

wi = qi   ∀i   (2.19)

Equation (2.17) defines the next post-decision state as the current appointment schedule plus all new bookings. Equations (2.18) and (2.19) characterize the next pre-decision state as the system schedule at the subsequent decision epoch plus the new demand for service.
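The three transition equations can be sketched directly; the function names and the two-type, three-day toy numbers below are ours, for illustration.

```python
# Transition equations (2.17)-(2.19): from a pre-decision state (u, w) and
# bookings x to the post-decision schedule, then -- after rolling the horizon
# forward one day and observing new demand q -- to the next pre-decision state.
def to_post_decision(u, x):
    """(2.17): add today's bookings for every type to the existing schedule."""
    return [u[n] + sum(x_i[n] for x_i in x) for n in range(len(u))]

def to_next_pre_decision(ux, q):
    """(2.18)-(2.19): day n becomes day n-1, day N starts empty, w' = q."""
    return ux[1:] + [0], list(q)

ux = to_post_decision(u=[2, 1, 0], x=[[1, 0, 0], [0, 1, 0]])
u_next, w_next = to_next_pre_decision(ux, q=[2, 1])
```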
The value of qi in (2.19) corresponds to the number of new service requests from patients of type i. Note from (2.18) that day n at the current decision epoch becomes day n − 1 at the subsequent decision epoch, and therefore that the N-th day of a pre-decision state has no bookings in it. Let vx(sx) be the value of being in post-decision state sx ∈ Sx. The relationship between this function and v(s), the value function defined in (2.9), can be stated as follows:

v(s) = min_{a ∈ As} { c(s, a) + λ vx(sx) }   ∀s ∈ S   (2.20)

vx(sx) = ∑_{s ∈ S} p(s|sx) v(s)   ∀sx ∈ Sx   (2.21)

We will use discrete event simulation and the fact that the right-hand side of Equation (2.20) becomes a deterministic optimization problem to introduce a non-linear approximation to vx(sx) and obtain a very efficient patient scheduling policy.

2.4.2 Approximation architecture

The second step prior to implementing our algorithm was the choice of an approximation architecture to represent the value function in terms of the post-decision state variable. The following sections summarize our efforts to identify the true form of this function. We start by reviewing our conclusions from a regression analysis we performed based on data obtained from the exact solution to very small problem instances. We then describe the use of the linear programming approach and an alternative value function approximation (affine in the state of individual appointment slots) to gain some additional intuition into the actual form of the value function in terms of the pre-decision state variable. To this end, we model the state of the system as a set of binary variables ukn, where ukn is equal to one if appointment slot k on day n is already booked, and zero otherwise. Subsequently, we present our conclusions from a regression analysis involving cost-to-go estimates obtained by simulating the evolution of the scheduling system under Patrick et al.'s policy for several initial post-decision states.
Finally, this section ends with a justification of our choice of approximation architecture. All regression analyses discussed in this section were performed using R, a free software environment for statistical computing and graphics.

Our initial conjecture was that v(s) was non-decreasing and convex in the number of appointments booked on each day of the booking horizon. We expected the marginal discounted cost associated with booking a patient on a given day to increase as the number of available slots on that day decreases. In other words, we expected the cost associated with reducing the available capacity on a given day from one to zero to be higher than the cost associated with reducing the available capacity on that day from ten to nine. It also seemed reasonable to think that the marginal discounted cost should depend on the total number of appointment slots booked throughout the system and not only on the number of appointment slots booked on each day independently.

Exact solutions to very small problem instances

In an initial effort to gain some intuition into the actual form of the true value function in terms of the pre-decision state variable, we solved to optimality a number of very small instances of the original model formulated in terms of the pre-decision states. To this end, we first transformed the optimality equations in (2.9) into their equivalent linear programming form and then used a mathematical programming solver (GAMS/CPLEX) to find the optimal values v*(s) ∀s ∈ S defining the true value function.

max_v ∑_{s ∈ S} v(s)
s.t. c(s, a) + λ ∑_{s′ ∈ S} p(s′|s, a) v(s′) ≥ v(s)   ∀s ∈ S, a ∈ As   (2.22)

The implementation of the equivalent linear program in (2.22) was initially carried out in GAMS, a high-level modeling software for mathematical programming and optimization, but finally developed using Delphi, a general-purpose programming language.
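As a sketch of the same idea on a generic two-state MDP (made-up numbers, and using SciPy's linprog rather than the GAMS/CPLEX and Delphi implementations of the thesis), the equivalent linear program (2.22) can be set up and solved as:

```python
import numpy as np
from scipy.optimize import linprog

lam = 0.9
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),    # P[a][s, s'], made-up transitions
     np.array([[0.5, 0.5], [0.5, 0.5]])]
c = np.array([[1.0, 2.0], [4.0, 3.0]])      # immediate costs c[s, a]
S = 2

# One constraint per state-action pair: v(s) - lam * sum_s' p v(s') <= c(s, a)
A_ub, b_ub = [], []
for a in range(2):
    for s in range(S):
        row = -lam * P[a][s]
        row[s] += 1.0
        A_ub.append(row)
        b_ub.append(c[s, a])

# maximize sum_s v(s)  <=>  minimize -sum_s v(s); v is unbounded in sign
res = linprog(c=-np.ones(S), A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * S)
v_star = res.x                               # optimal value function v*(s)
```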
The very small scenarios solved involved two priority classes, booking horizons of at most three days and a daily capacity of no more than four appointment slots. Table 2.2 characterizes the size of some of these problem instances. Solution times did not exceed two minutes; model generation times, however, ranged from five minutes to ten hours. Note that the equivalent linear program in (2.22) has one variable for every state s ∈ S and one constraint for every feasible state-action pair (s, a), s ∈ S and a ∈ A_s, making an exact solution for larger problem instances practically impossible.

Table 2.2: Size of the resulting linear programming model as a function of the number of patient types (I), the daily capacity (C), the length of the booking horizon (N) and the maximum number of arrivals of each type (M1 and M2). |S| represents the size of the state space and |F| the number of feasible state-action pairs. The value of |F| is defined as \sum_{s \in S} |A_s|.

I   C   N   M1   M2   |S|      |F|
2   4   2   4    5    750      13,956
2   3   3   3    5    1,536    51,132
2   4   3   4    5    3,750    249,060

To gain some insight into the actual form of the true value function, we constructed first- and second-order multiple linear regression models with s as the independent variable and v^*(s) as the dependent variable. The general forms of these two models are given in (2.23) and (2.24) below. Each data point j in our data sets consisted of the value function v_j, the number of patients already booked on each day of the booking horizon {u_{jn}}_{n=1}^{N} and the number of patients of each type waiting to be booked {w_{ji}}_{i=1}^{I} associated with a specific pre-decision state. The superscript c in (2.24) indicates centered variables, meaning that the corresponding mean has been subtracted from every value of the variable. Variables were centered to avoid multi-collinearity and computational difficulties.
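As a rough consistency check (not part of the thesis code), the |S| column of Table 2.2 can be reproduced by counting pre-decision states, under the assumption that a state records the number of bookings on each of the N days (between 0 and C) and the number of waiting patients of each type (between 0 and M_i):

```java
public class StateSpaceSize {
    // |S| = (C+1)^N * (M1+1) * (M2+1): bookings per day times waiting patients per type.
    static long stateSpaceSize(int C, int N, int M1, int M2) {
        long days = 1;
        for (int n = 0; n < N; n++) days *= (C + 1);
        return days * (M1 + 1) * (M2 + 1);
    }

    public static void main(String[] args) {
        // The three instances reported in Table 2.2.
        System.out.println(stateSpaceSize(4, 2, 4, 5)); // 750
        System.out.println(stateSpaceSize(3, 3, 3, 5)); // 1536
        System.out.println(stateSpaceSize(4, 3, 4, 5)); // 3750
    }
}
```

The count grows geometrically in N, which is why an exact solution quickly becomes impractical.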
v_j = \beta_0 + \sum_{n=1}^{N} \beta_n u_{jn} + \sum_{i=1}^{I} \delta_i w_{ji} + \varepsilon_j \quad \forall j \quad (2.23) \text{ (first-order)}

v_j = \beta_0 + \sum_{n=1}^{N} \beta_n u^c_{jn} + \sum_{n=1}^{N} \sum_{k \ge n} \mu_{nk} u^c_{jn} u^c_{jk} + \sum_{i=1}^{I} \delta_i w^c_{ji} + \sum_{i=1}^{I} \sum_{k \ge i} \nu_{ik} w^c_{ji} w^c_{jk} + \sum_{n=1}^{N} \sum_{i=1}^{I} \xi_{ni} u^c_{jn} w^c_{ji} + \varepsilon_j \quad \forall j \quad (2.24) \text{ (second-order)}

Although it may seem unreasonable to draw conclusions from the exact solution to such small problem instances, some relevant observations can be made from our analysis: (1) Although a first-order multiple linear regression model results in R² values of over 0.9, a second-order multiple linear regression model (in the pre-decision state components) is statistically more appropriate for describing the true form of the value function. A first-order model tends to under-estimate the value of a pre-decision state when the system is close to empty or close to full, and to over-estimate it for intermediate utilization levels. See the left residual plot in Figure 2.5. (2) Despite the fact that the first- and second-order terms in a second-order multiple linear regression model account for over 95% of the variation in the true value function in terms of the pre-decision state variable, this model shows a poor fit, suggesting that further analyses and the development of transformations and higher-order models are required to best represent the value function. See the right residual plot in Figure 2.5.

Figure 2.5: Residual plots (residuals versus fitted values) for a first- and a second-order multiple linear regression model (left and right, respectively) obtained from the exact solution to a very small problem instance (I = 2, C = 3, N = 3, M1 = 3 and M2 = 5).

The conclusions above suggest that a better value function approximation for this problem should be non-linear in the number of appointment slots booked on each day of the booking horizon. They also suggest that the booking actions associated with an optimal scheduling policy should not depend on the available capacity on each day independently.
An affine approximation in the pre-decision state components under-estimates the value of some states and over-estimates the value of others, possibly leading to more conservative policies.

An alternative approximate linear programming model

In an effort to capture the actual form of the value function in terms of the pre-decision state variable, we approximately solved a number of small problem instances using the linear programming method and an alternative, more specific, value function approximation. To do this, we first modified the original MDP model formulated in terms of the pre-decision state variable to consider the state of each appointment slot in the system individually. A state of the system was represented by a vector s = (u, w) = (u_{11}, …, u_{CN}, w_1, …, w_I), where u_{kn} is a binary variable equal to one if appointment slot k on day n is already booked, and zero otherwise (see Figure 2.6). Accordingly, an action was defined as a vector a = (x, y) = (x_{111}, …, x_{ICN}, y_1, …, y_I), where x_{ikn} is a binary variable equal to one if a patient of type i is booked in appointment slot k of day n, and zero otherwise. All the other state and action components remained the same as in the original MDP model. The set of feasible actions compatible with state (u, w) ∈ S, denoted by A_{(u,w)}, was given by constraints (2.25) to (2.27) below.

\sum_{k=1}^{C} \sum_{n=1}^{N} x_{ikn} + y_i = w_i \quad \forall i \quad (2.25)

u_{kn} + \sum_{i=1}^{I} x_{ikn} \le 1 \quad \forall k, n \quad (2.26)

u_{kn} + \sum_{i=1}^{I} x_{ikn} \ge u_{(k+1)n} + \sum_{i=1}^{I} x_{i(k+1)n} \quad \forall n, k < C \quad (2.27)

Constraint (2.25) limits the number of bookings and diversions for each patient type i to equal the number of patients of that type waiting to be booked. Constraint (2.26) guarantees that each appointment slot in the system is assigned to at most one patient. Constraint (2.27) defines a booking order, increasing in k, for the appointment slots in each day.
We then used the affine value function approximation given in (2.28) to approximately solve the alternative model. The overall idea behind this disaggregate approximation was to capture potential non-linearities in the marginal discounted cost associated with booking a patient on a specific day as a function of the number of appointment slots already booked that day.

v(s) = W_0 + \sum_{k=1}^{C} \sum_{n=1}^{N} U_{kn} u_{kn} + \sum_{i=1}^{I} W_i w_i \quad \forall s \in S, \quad U, W \ge 0, \; W_0 \in \mathbb{R} \quad (2.28)

Figure 2.6: Alternative pre-decision state representation. Daily capacity utilization (left) versus individual capacity units (right).

Unfortunately, the solution to (2.29), the linear program obtained by substituting (2.28) into the equivalent linear programming formulation of the alternative MDP model, made no difference: the optimal values of U_{kn} remained the same as the values obtained from using the original MDP model and the affine approximation in (2.10). That is, U^*_{kn} = U^*_n ∀k, n. Thus, the linear programming method and the alternative MDP formulation failed to capture any potential non-linearities in the marginal discounted cost associated with the booking decisions. Although surprising, this result is consistent with the conclusion obtained by Patrick et al. after using a piece-wise linear, non-decreasing, convex function of the number of bookings in each day to approximate the value function in terms of the pre-decision state variable. We have not yet found a clear explanation for this result. Additional analyses presented in this chapter suggest a clear non-linear relationship between the marginal discounted cost and the pre-decision state components.
\max_{U, W \ge 0, \, W_0} \; W_0 + \sum_{k=1}^{C} \sum_{n=1}^{N} E_\alpha[u_{kn}] U_{kn} + \sum_{i=1}^{I} E_\alpha[w_i] W_i \quad (2.29)

subject to

(1 - \lambda) W_0 + \sum_{k=1}^{C} \sum_{n=1}^{N} \mu_{kn}(s, a) U_{kn} + \sum_{i=1}^{I} \omega_i(s) W_i \le c(s, a) \quad \forall s \in S, a \in A_s

where

\mu_{kn}(s, a) = u_{kn}(s) - \lambda \left( u_{k(n+1)}(s, a) + \sum_{i=1}^{I} x_{ik(n+1)}(s, a) \right) \quad \forall k, n, \; s \in S, a \in A_s
\omega_i(s) = w_i(s) - \lambda m_i \quad \forall i, \; s \in S
c(s, a) = \sum_{i=1}^{I} \sum_{k=1}^{C} \sum_{n=1}^{N} c_{in} x_{ikn} + \sum_{i=1}^{I} h y_i \quad \forall s \in S, a \in A_s

The parameter vector α in (2.29) represents the weight of each pre-decision state in the objective function of the equivalent linear program. We assumed \sum_s \alpha(s) = 1 and treated α as an exogenous probability distribution over the initial state of the system in order to compute E_\alpha[u_{kn}] ∀k, n and E_\alpha[w_i] ∀i.

Regression analysis of simulated results

In a final effort to gain a better understanding of the actual form of the true value function, this time expressed in terms of the post-decision state variable, we carried out a second regression analysis. Data for this analysis were generated by simulating Patrick et al.'s policy for the small outpatient clinic described in Section 2.3, starting from 5,000 different post-decision states. We chose to use post-decision states instead of pre-decision states in anticipation of implementing a simulation-based approach. As discussed earlier, in Section 2.4.1, writing the value function in terms of the post-decision state variable provides significant computational advantages and simplicity. The value function estimate (i.e., the expected discounted cost-to-go) for each post-decision state was computed as the average discounted cost over 300 replications of 1,500 days each. Starting post-decision states were determined as the state of the system after a warm-up period of 1,000 days. We first fit a simple linear regression model, a quadratic model without interaction terms and a quadratic model with first-order interaction terms, all of them with the post-decision state components as the independent variables and the average discounted cost as the dependent variable.
The simple linear regression model resulted in an R² value of 0.85, the quadratic model without interaction terms in an R² value of 0.91, and the quadratic model with first-order interaction terms in an R² value of 0.99. Although providing very high R² values, these models offered a very poor fit to the actual data patterns (see Figure 2.7), suggesting further analysis and the development of transformations and higher-order models. At this stage in our analysis, the non-linear nature of the relationship between the average discounted cost and the post-decision state components was quite clear (see Figure 2.8). However, the actual form of the value function in terms of the post-decision state variable was still uncertain.

Figure 2.7: Residual plots from a first-order (left) and a second-order regression model (right) obtained from data generated by simulating Patrick et al.'s policy for a small clinic. The dependent variable in the second-order model was the natural logarithm of the average discounted cost.

We then incorporated a set of additional predictors into the simple linear regression model. The additional independent variables included the differences between the numbers of bookings on consecutive days (u^x_{n+1} − u^x_n, ∀n < N), the minimum daily number of bookings over the booking horizon (\min_n \{u^x_n\}), the maximum daily number of bookings over the booking horizon (\max_n \{u^x_n\}) and the total number of bookings within the different wait-time targets (\sum_{n=1}^{T_i} u^x_n, ∀i). Our best model ended up providing an R² value of 0.87, without much improvement in terms of fit.

Figure 2.8: Partial plots from an additive regression model. These plots show the marginal relationship between the average discounted cost and the number of appointment slots booked on days 7 and 14, respectively. Dotted lines indicate 95% pointwise confidence bands.

We additionally tested several transformations of the post-decision state components (higher orders) and transformations of the average discounted cost (square root and natural logarithm) without much success. The term regularization, in mathematics and statistics, refers to methods that introduce additional information in order to solve an ill-posed problem or to prevent overfitting. This information usually takes the form of a penalty for complexity. In model selection, these methods work by implicitly or explicitly penalizing models based on the number of parameters. We plan to consider regularization techniques in our future work on approximation architectures. After constructing several other models, we discovered that a much better fit to the data was provided by polynomial regression models with the average discounted cost, or its natural logarithm, as the dependent variable and the total number of bookings over the booking horizon as the only independent variable. The best model for the relationship between the natural logarithm of the average discounted cost and the total number of bookings was given by a 4th-order polynomial (R² = 0.99). The best model for the relationship between the average discounted cost and the total number of bookings, on the other hand, was given by a 6th-order polynomial (R² = 0.99). To avoid any possible numerical bias due to the use of inverse functions to recover the values of the original variables (e.g., exponential function bias), we opted to omit any transformation of the average discounted cost. Figure 2.9 summarizes some of the models considered in this analysis.

Figure 2.9: Fit provided by different models to the data generated by simulating Patrick et al.'s policy for a small clinic. Average discounted cost estimates are plotted against the total number of bookings over the booking horizon.

The main issue with choosing a polynomial model to represent the value function in terms of the post-decision state variable is extrapolation outside the range of the data.
Scheduling policies derived from a polynomial approximation could potentially generate extreme workload levels for which the approximation could be very poor. Furthermore, there would be no guarantee of always having a non-decreasing functional form in the total number of bookings.

The logistic function

After analyzing numerous data sets and constructing several polynomial models, we observed a common S-shaped pattern across models. As a result, we were able to identify a function that provides a fit to the data similar to that of high-order polynomial models. This function, called the logistic function, has a nice interpretation and provides a set of desirable properties in the context of our problem. It is monotone (and therefore behaves well for states outside of the range of the data), it is bounded above and below, and it can be used to fit different data patterns. The general form of the logistic function is:

f(x) = b_0 + \frac{b_1}{1 + e^{(b_2 x - b_3)}}, \quad b_1 > 0. \quad (2.30)

Plots of the logistic function look like a stretched S-shape, increasing or decreasing between two horizontal asymptotes at levels b_0 and b_0 + b_1 (see Figure 2.10). The values b_0 and b_0 + b_1 are the floor and the ceiling of the curve, respectively. Data that follow an increasing logistic curve usually describe constrained growth or a cumulative quantity. The value of b_2 determines the spread of the function and its direction. The function is increasing if b_2 < 0, decreasing if b_2 > 0 and constant if b_2 = 0. For small values of x, an increasing logistic function behaves like an increasing exponential function. However, for large values of x the two functions behave quite differently (see Figure 2.10). There is only one change in curvature in the logistic function, at the point (b_3/b_2, b_0 + b_1/2). An increasing logistic function changes from convex to concave at this point, while a decreasing logistic function shifts from concave to convex.
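As a quick numerical illustration, the form in (2.30) can be coded and its key properties checked directly; the parameter values below are those of the increasing curve in the left panel of Figure 2.10:

```java
public class Logistic {
    // f(x) = b0 + b1 / (1 + exp(b2 * x - b3)); increasing when b2 < 0.
    static double f(double b0, double b1, double b2, double b3, double x) {
        return b0 + b1 / (1.0 + Math.exp(b2 * x - b3));
    }

    public static void main(String[] args) {
        double b0 = 450, b1 = 4750, b2 = -0.15, b3 = -4.50;
        // Midway point at x = b3 / b2 = 30, where f = b0 + b1 / 2.
        System.out.println(f(b0, b1, b2, b3, 30.0));           // ≈ 2825
        // The curve stays strictly between its floor b0 and ceiling b0 + b1.
        System.out.println(f(b0, b1, b2, b3, -100.0) > b0);            // true
        System.out.println(f(b0, b1, b2, b3, 160.0) < b0 + b1);        // true
    }
}
```

The midway value b_0 + b_1/2 = 2825 at x = 30 matches the inflection point (b_3/b_2, b_0 + b_1/2) described above.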
Figure 2.11 illustrates this property for an increasing logistic function. It also highlights a region in which the function can be considered approximately linear in x. We have chosen a slightly more general version of the logistic function to approximate the value function in terms of the post-decision state variable, in order to capture the impact of each post-decision state component on the discounted cost-to-go. The generalized logistic function shown in (2.31) is non-negative and includes one approximation parameter for each post-decision state component.

Figure 2.10: The logistic function. On the left, a logistic function with parameters b_0 = 450, b_1 = 4750, b_2 = −0.15 and b_3 = −4.50. On the right, a logistic function with parameters b_0 = 450, b_1 = 4750, b_2 = 0.15 and b_3 = 4.50.

Figure 2.11: Change in curvature in the logistic function. An increasing logistic function changes from convex to concave at its midway point.

v(u^x) = b_0 + \frac{b_1}{1 + \exp\left( \sum_{n=1}^{N} -b_{2n} u^x_n + b_3 \right)}, \quad b_j \ge 0 \; \forall j \quad (2.31)

We believe this functional form provides a good approximation to the expected discounted cost-to-go as a function of the post-decision state components, for the following reasons. First, it is non-negative and therefore suitable for modeling cumulative cost values. Second, it is increasing in the number of appointment slots booked on each day of the booking horizon. Third, it provides a fit to the simulation results similar to that of high-order polynomial models: it increases like an exponential function at low congestion levels and slowly when the system is more congested. Fourth, it is more flexible in fitting different types of data (see Figure 2.11).
Finally, the generalized logistic function makes the marginal expected discounted cost associated with booking an appointment slot on a given day dependent not only on the number of appointments already booked that day but also on the number of bookings on the other days of the booking horizon. This function, unlike an affine function, provides good expected cost estimates when the system is close to being empty or full. Equations (2.32) and (2.33) define the (approximate) marginal expected discounted cost associated with booking a patient of type i on day n, for an affine and a logistic value function approximation, respectively. The affine approximation is defined in terms of the pre-decision state variable as in (2.10).

m_{in}(u^x) = c_{in} - h + \lambda U_{n-1} \quad \forall i, n \quad (2.32)

m_{in}(u^x) = c_{in} - h + \lambda \left[ v(u^x_n + 1, u^x_{-n}) - v(u^x) \right] \quad \forall i, n \quad (2.33)

In (2.33), u^x_{-n} represents the number of bookings on the other days of the booking horizon. Note that the marginal discounted cost associated with an affine approximation is constant.

2.5 Implementation

Having introduced a post-decision state formulation and chosen a non-linear architecture to represent the value function in terms of the post-decision state variable, we are now able to describe our approximate policy iteration algorithm. We will start by characterizing its five basic modules and then provide full details about the policy evaluation and policy improvement steps and their interaction. There are many potential methods for dealing with intractable dynamic programming models. The algorithm detailed in this section can be classified as a direct policy evaluation method: it approximates the cost-to-go function in terms of the state variables and is embedded within a policy iteration framework.
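The marginal cost in (2.33) is just a one-slot finite difference of the logistic approximation (2.31). A minimal sketch, with illustrative parameter values of our own choosing:

```java
public class MarginalCost {
    // Generalized logistic approximation (2.31): v(ux) = b0 + b1 / (1 + exp(-sum_n b2[n]*ux[n] + b3)).
    static double v(double b0, double b1, double[] b2, double b3, int[] ux) {
        double arg = b3;
        for (int n = 0; n < ux.length; n++) arg -= b2[n] * ux[n];
        return b0 + b1 / (1.0 + Math.exp(arg));
    }

    // Approximate marginal cost (2.33) of booking a type-i patient on day n:
    // immediate cost c_in, minus the avoided diversion cost h, plus the
    // discounted increase in cost-to-go from one extra booking on day n.
    static double marginalCost(double cin, double h, double lambda,
                               double b0, double b1, double[] b2, double b3,
                               int[] ux, int n) {
        int[] uxPlus = ux.clone();
        uxPlus[n] += 1;
        return cin - h + lambda * (v(b0, b1, b2, b3, uxPlus) - v(b0, b1, b2, b3, ux));
    }
}
```

Because v is increasing in each u^x_n (for b_{2n} ≥ 0), the finite difference is non-negative, and it depends on the bookings on every day of the horizon, not just day n.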
Unlike most applications, in which models are formulated in terms of pre-decision state variables and value functions are approximated using either neural networks or architectures that are linear in the approximation parameters, our algorithm is formulated in terms of a post-decision state variable and is based on a logistic approximation to the cost-to-go function. As we will see in this section, the use of a non-linear approximation architecture (in both the state components and the approximation parameters) introduces numerous challenges in the context of a simulation-based solution approach. For clarity, the research in this chapter studies a simplified version of the multi-priority patient scheduling problem. Our algorithm, however, could easily be extended to consider postponements and a limited diversion capacity.

2.5.1 Modules

A simulation-based implementation of an approximate policy iteration algorithm, as mentioned earlier, usually consists of five modules: an initialization module, a simulator, a decision generator, a value function approximator and a least squares solver. In this section, we describe in detail the implementation of each module for the multi-priority patient scheduling problem under study. All modules, with the exception of the least squares solver, were implemented in Java, a multi-purpose programming language. The least squares solver was implemented in GAMS, with MINOS and IPOPT as non-linear solvers. Figure 2.12 illustrates the interaction between these modules.

Figure 2.12: Simulation-based implementation of an approximate policy iteration algorithm (adapted from Bertsekas and Tsitsiklis [8]).

Initialization

This module is used to generate a post-decision state from which several sample trajectories will be produced. There are many alternative ways of generating an initial post-decision state u^x_0. However, we have chosen to carry out this task by using Patrick et al.'s policy.
To this end, we first construct an initial post-decision state by sampling the number of bookings on each day of the booking horizon from an integer uniform distribution between 0 and C. We then simulate the evolution of the system under Patrick et al.'s policy for T_0 decision epochs. The parameter T_0 is called the warm-up period. The resulting post-decision state after the warm-up period is u^x_0. We also use this module at the start of the algorithm to generate expected discounted cost estimates for several initial post-decision states u^x_0. These samples are used by the least squares solver to determine b^0, the initial set of approximation parameter values.

The simulator

Given a pre-decision state s = (u, w) ∈ S and an action a = (x, y) ∈ A_s, the simulator generates the next pre-decision state of the system s′ = (u′, w′) ∈ S according to the transition functions in (2.17), (2.18) and (2.19), where q_i is generated from a Poisson distribution with mean m_i patients per day. Alternatively, given a post-decision state s^x = u^x ∈ S^x, it generates the next pre-decision state s′ = (u′, w′) ∈ S according to the transition functions in (2.18) and (2.19) and the demand distributions.

The decision generator

Given a pre-decision state s ∈ S, the decision generator finds the optimal booking action a^* ∈ A_s by solving (2.34), where c(s, a) and v_b(s, a) are the immediate cost and the approximate expected discounted cost-to-go associated with taking action a ∈ A_s in state s. The value of c(s, a) is given by (2.6), while the value of v_b(s, a) is provided by the cost-to-go approximator module described in the next subsection.

a^* \in \arg\min_{a \in A_s} \left\{ c(s, a) + \lambda v_b(s, a) \right\} \quad (2.34)

Thus, the optimal booking action a^* = (x^*, y^*) in state s = (u, w) ∈ S is given by the solution to (2.35), the optimization problem obtained by substituting the definitions of c(s, a) and v_b(s, a) into (2.34).
\min_{(x, y) \in A_{(u,w)}} \; \sum_{i=1}^{I} \sum_{n=1}^{N} c_{in} x_{in} + \sum_{i=1}^{I} h y_i + \lambda \left[ b_0 + \frac{b_1}{1 + \exp\left( \sum_{n=1}^{N} -b_{2n} \left( u_n + \sum_{i=1}^{I} x_{in} \right) + b_3 \right)} \right] \quad (2.35)

The optimization problem above is a non-linear integer program. Several commercial solvers can find an exact solution to this problem within a few seconds. Unfortunately, calling a solver every time an action needs to be determined is impractical; as we will see, a policy evaluation step in our algorithm involves many thousands of decision epochs. For this reason, we solve (2.35) heuristically. Given a pre-decision state s = (u, w) ∈ S, our heuristic first identifies the set of patient types for which there are new appointment requests, denoted by I(w), and the set of days on which there is still available capacity, denoted by N(u). It then computes (approximately) the marginal expected discounted cost associated with each possible individual capacity allocation decision with indexes in I(w) × N(u) and finds the type-day combination (i^*, n^*) ∈ I(w) × N(u) with the minimum marginal cost. If the minimum marginal cost is negative, it books a patient of type i^* on day n^* (x_{i^*n^*} = x_{i^*n^*} + 1). If not, it diverts all the remaining demand (y = w). After an individual booking decision has been determined, the list of patients waiting to be booked and the current appointment schedule are updated (w_{i^*} = w_{i^*} − 1 and u_{n^*} = u_{n^*} + 1, or w = 0, depending on the case). This is repeated until no patients are waiting to be booked. Figure 2.13 provides pseudo-code for this procedure. The procedure appears to recover one of the optimal solutions that a non-linear integer programming solver would return, at least for the problem instances considered in our analysis. To validate our heuristic, we compared the actions obtained from this procedure with the optimal actions obtained from a non-linear integer programming solver for 1,000 different pre-decision states.
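The greedy procedure just described can be sketched in Java as follows. This is a simplified illustration with our own variable names and hypothetical costs, not the thesis's production code; diversion capacity and postponements are ignored, as in the simplified model:

```java
import java.util.Arrays;

public class GreedyBooking {
    // Logistic cost-to-go approximation on the post-decision bookings ux.
    static double v(double b0, double b1, double[] b2, double b3, int[] ux) {
        double arg = b3;
        for (int n = 0; n < ux.length; n++) arg -= b2[n] * ux[n];
        return b0 + b1 / (1.0 + Math.exp(arg));
    }

    // Greedy heuristic for (2.35): repeatedly book the (type, day) pair with the
    // minimum marginal cost while that cost is negative; then divert what is left.
    // Returns x[i][n]; any demand not booked is implicitly diverted (y = remaining w).
    static int[][] schedule(int[] u, int[] w, int C, double[][] c, double h,
                            double lambda, double b0, double b1,
                            double[] b2, double b3) {
        int I = w.length, N = u.length;
        int[] ux = u.clone();            // running post-decision bookings
        int[] wLeft = w.clone();
        int[][] x = new int[I][N];
        while (Arrays.stream(wLeft).sum() > 0) {
            double vCur = v(b0, b1, b2, b3, ux);
            double bestMc = 0.0;         // only marginal costs < 0 trigger a booking
            int bi = -1, bn = -1;
            for (int i = 0; i < I; i++) {
                if (wLeft[i] == 0) continue;
                for (int n = 0; n < N; n++) {
                    if (ux[n] >= C) continue;        // day n is full
                    ux[n]++;
                    double mc = c[i][n] - h + lambda * (v(b0, b1, b2, b3, ux) - vCur);
                    ux[n]--;
                    if (mc < bestMc) { bestMc = mc; bi = i; bn = n; }
                }
            }
            if (bi < 0) break;           // nothing worth booking: divert the rest
            x[bi][bn]++; ux[bn]++; wLeft[bi]--;
        }
        return x;
    }
}
```

With a high diversion cost h, every request is booked while capacity allows; with cheap diversion and expensive bookings, everything is diverted, mirroring the if/else branches of Figure 2.13.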
The cost-to-go approximator

Given a pre-decision state s = (u, w) ∈ S, an action a = (x, y) ∈ A_s and a set of approximation parameter values b, this module approximates the expected discounted cost associated with the post-decision state defined by s and a by computing (2.36).

v_b(s, a) = b_0 + \frac{b_1}{1 + \exp\left( \sum_{n=1}^{N} -b_{2n} \left( u_n + \sum_{i=1}^{I} x_{in} \right) + b_3 \right)} \quad \forall s \in S, a \in A_s \quad (2.36)

The cost-to-go approximator is consulted by the decision generator for approximate expected discounted cost-to-go values millions of times in every policy evaluation step. For this reason, its efficient implementation was key to achieving reasonable execution times. After doing some code profiling, we were surprised to discover that the bottleneck in early versions of this module was Java's implementation of the exponential function.

while w > 0 do
  for i ∈ I(w) do
    for n ∈ N(u) do
      m_{in} ← Δ[c(s, a) + λ v_b(s, a)]/Δx_{in}
    end for
  end for
  (i^*, n^*) ← arg min_{i,n} {m_{in}}
  if m_{i^*n^*} < 0 then
    x_{i^*n^*} ← x_{i^*n^*} + 1
    u_{n^*} ← u_{n^*} + 1
    w_{i^*} ← w_{i^*} − 1
  else
    for i ∈ I do
      y_i ← w_i
      w_i ← 0
    end for
  end if
end while

Figure 2.13: The decision generator module.

The exponential function, like many other mathematical functions, cannot be computed exactly (except when its argument is equal to zero); it can only be computed to a certain degree of precision. Java's implementation, although obliged to return values as accurate as possible in double precision, is basic enough that the time it requires is considerable and varies with the input value. For this reason, we opted for a faster implementation at the expense of some loss of accuracy. Our implementation is based on the following limit:

\exp(x) = \lim_{k \to \infty} \left( 1 + \frac{x}{k} \right)^k \quad (2.37)

The larger the value of k, the closer (1 + x/k)^k will be to an accurate value of exp(x).
In practice, useful values of k are powers of two, since raising (1 + x/k) to the power of k then amounts to repeatedly squaring (1 + x/k). We chose k = 256 and programmed exp(x) as in Figure 2.14.

function exp(x, k)
  x ← 1 + x/k
  x ← x × x, x ← x × x, x ← x × x, x ← x × x
  x ← x × x, x ← x × x, x ← x × x, x ← x × x
  return x
end function

Figure 2.14: Faster implementation of the exponential function.

The function in Figure 2.14 is on average more than three times faster than Math.exp(), Java's implementation of exp(x), and certainly accurate enough for our purposes. Since we use the exponential function only to obtain approximate cost-to-go estimates, we believe there is no need for more precise values.

The least squares solver

Given a sample {v̄^r}_{r=1}^{R} of R average discounted cost observations produced by a policy evaluation step starting from R different post-decision states {u^{xr}_0}_{r=1}^{R}, this module solves the least squares problem in (2.38) to obtain a new set of parameter values b^*.

b^* \in \arg\min_{b \ge 0} \sum_{r=1}^{R} \left( \bar{v}^r - v_b(u^{xr}_0) \right)^2 \quad (2.38)

Unlike in the linear case, the solution to the least squares problem in (2.38) has no closed or recursive form: the function v_b(·), defined in (2.36), is non-linear in some of the approximation parameters. For this reason, we rely on a non-linear solver to obtain an optimal solution to this problem. Unfortunately, there is no guarantee that a non-linear solver will converge to an optimal solution from an arbitrary starting point; in fact, solvers' default starting points are often a very poor choice. Therefore, we improve the likelihood of convergence by specifying good starting values for b every time we solve the least squares problem. Starting values do not have to be perfect; they just need to be reasonable.
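The pseudo-code in Figure 2.14 translates directly to Java (k = 256, i.e. eight successive squarings). The trade-off is a relative error of roughly −x²/(2k), a few percent at the extremes of the arguments we use:

```java
public class FastExp {
    // Approximates exp(x) as (1 + x/256)^256 via eight successive squarings.
    // Relative error is roughly -x^2 / 512, i.e. under ~5% for |x| <= 5.
    static double fastExp(double x) {
        x = 1.0 + x / 256.0;
        x *= x; x *= x; x *= x; x *= x;   // now raised to the 16th power
        x *= x; x *= x; x *= x; x *= x;   // now raised to the 256th power
        return x;
    }

    public static void main(String[] args) {
        System.out.println(fastExp(1.0));   // ≈ 2.713 (Math.exp(1.0) = 2.71828...)
        System.out.println(fastExp(-2.0));  // close to Math.exp(-2.0) = 0.13534...
    }
}
```

Bounding the argument of the exponential (as done in the least squares solver below, where it is restricted to at most five) also keeps this approximation in its accurate range.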
Fortunately, the properties of the generalized logistic function allow us to specify good starting points for each approximation parameter in (2.38). As mentioned earlier, b_0 and b_0 + b_1 represent the floor and the ceiling of a logistic curve. Thus, a good starting point for b_0 is \min_r \{\bar{v}^r\}, and for b_1, \max_r \{\bar{v}^r\} - b_0. Additionally, when normalized, the middle 50% of a logistic curve spans a little more than two units on the horizontal axis. If we consider the total number of bookings in the system on the horizontal axis, that is, x = \sum_{n=1}^{N} u^x_{0n}, we can choose two divided by the interquartile range of the total number of bookings as a starting point for b_{2n} ∀n. Finally, the ratio b_3/b_2 represents the value on the horizontal axis at which a logistic curve reaches its midway point. Thus, a good starting point for b_3 is b_2 times the median of the total number of bookings.

Setting appropriate bounds on the values of the approximation parameters is also very important. Bounds help non-linear optimization algorithms by preventing them from moving far away from optimal solutions or into regions with unreasonably large function or derivative values. Non-linear optimization algorithms often take large steps early in a line search, and non-linear functions will very likely be called with some variables at their lower or upper bounds. Objective functions involving the exponential function, for example, are likely to trigger an exponentiation overflow error if the argument of the exponential has no upper bound (the exponential function overflows for large arguments). Since the definition of the least squares problem in (2.38) involves a generalized logistic function, we have chosen to model the argument of the exponential function in (2.36) as an intermediate variable and restrict its value to be less than or equal to five.
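The starting-value rules above can be sketched as follows. This is a rough illustration with our own helper names; the quantile convention is deliberately simplistic and would differ slightly from, say, R's default quantile types:

```java
import java.util.Arrays;

public class StartingValues {
    // Quantile by nearest index on a sorted array (simplified convention).
    static double quantile(double[] sorted, double q) {
        int idx = (int) Math.round(q * (sorted.length - 1));
        return sorted[idx];
    }

    // Returns starting points {b0, b1, b2, b3} from cost samples vBar and the
    // corresponding totals of bookings over the horizon, totalBookings.
    static double[] startingPoints(double[] vBar, double[] totalBookings) {
        double b0 = Arrays.stream(vBar).min().getAsDouble();        // floor
        double b1 = Arrays.stream(vBar).max().getAsDouble() - b0;   // ceiling - floor
        double[] t = totalBookings.clone();
        Arrays.sort(t);
        double iqr = quantile(t, 0.75) - quantile(t, 0.25);
        double b2 = 2.0 / iqr;              // middle 50% spans ~2 normalized units
        double b3 = b2 * quantile(t, 0.5);  // midway point at the median
        return new double[] { b0, b1, b2, b3 };
    }
}
```

These values are only handed to the non-linear solver as starting points; the solver then refines them subject to the bounds and scaling discussed above.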
Non-linear optimization algorithms use the derivatives of the objective function and the constraints to determine good search directions, and they use function values to determine whether constraints are satisfied. The scaling of variables and constraints is important because it determines the relative size of the derivatives and function values, and therefore the search path taken by the algorithms. Acknowledging this, we set scale factors of 100, 1000, 0.1 and 1 for b_0, b_1, b_2 and b_3, respectively.

2.5.2 The policy evaluation step

A policy evaluation step generates R expected discounted cost estimates starting from R different post-decision states by simulating the system under the booking policy given by the current value of b. The expected discounted cost estimate v̄^r associated with post-decision state u^{xr}_0 is computed as the average discounted cost over K replications starting from u^{xr}_0, where each replication consists of T decision epochs. Figure 2.15 provides pseudo-code for this part of the algorithm. A policy evaluation step receives as input the current value of the approximation parameters b and generates as output a set of post-decision states and expected discounted cost estimates {u^{xr}_0, v̄^r}_{r=1}^{R}.

for r = 1 to R do
  u^{xr}_0 ← Initialization
  for k = 1 to K do
    s_1 ← Simulator(u^{xr}_0)
    for t = 1 to T do
      a^* ← Decision Generator(s_t, b)
      c_t ← c(s_t, a^*)
      s_{t+1} ← Simulator(s_t, a^*)
    end for
    v_k(u^{xr}_0) ← \sum_{t=1}^{T} \lambda^{t-1} c_t
  end for
  v̄^r ← (1/K) \sum_{k=1}^{K} v_k(u^{xr}_0)
end for

Figure 2.15: A policy evaluation step.

Parameters R, K, T and T_0 play an important role in the convergence of the approximate policy iteration algorithm. Their values are therefore chosen carefully for each particular instance of the patient scheduling problem. The value of R is determined by considering ten state-discounted-cost observations per approximation parameter (a rule of thumb).
The value of K is defined as the minimum number of replications needed to stabilize the average discounted cost values associated with a small sample of post-decision states. The value of T is chosen such that the tail of the discounted cost after the T-th decision epoch in an infinite trajectory is no larger than ε. This is stated in (2.39), where c_max is an upper bound on the immediate cost. We set ε = 1 so that the truncated tail is of little practical significance. Finally, the value of T0 is determined as the minimum number of days after which the system utilization reaches a steady state.

T = min{t ∈ N : λ^{t+1} c_max / (1 − λ) ≤ ε}   (2.39)

The values of the parameters defining a policy evaluation step are determined in the following order: first, the sample size R; second, the warm-up period T0; third, the length of a simulation run T; and last, the number of replications K.

2.5.3 The policy improvement step

A policy improvement step determines a new booking policy by updating the approximation of the expected discounted cost-to-go function. It receives b^{j−1}, the current values of the approximation parameters, and a set {u^{x0}_r, v̄_r}, r = 1, …, R, of post-decision states and expected discounted cost estimates, and generates a new set of parameter values b^j. To this end, it first consults the least squares module for an optimal solution b* to the least squares problem defined in (2.38) and then uses this solution to compute b^j according to (2.40).

b^j = (1 − α_j) × b^{j−1} + α_j × b*   ∀j   (2.40)

We smooth the value of b rather than just using b* because of the noise in our estimates (random demand sampling is needed to generate expected discounted cost estimates). The value of α_j in (2.40) determines how much weight we put on b* in iteration j. We have chosen a simple and deterministic stepsize rule, called the generalized harmonic rule [47], to generate α_j. This rule is defined below.
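Condition (2.39) determines the simulation run length directly. A small sketch, assuming only the discount factor λ, the immediate cost bound c_max and the tolerance ε:

```python
def run_length(lam, c_max, eps=1.0):
    """Smallest T such that the discounted cost tail after epoch T is at
    most eps, i.e. lam**(T + 1) * c_max / (1 - lam) <= eps, as in (2.39)."""
    t = 0
    tail = lam * c_max / (1.0 - lam)  # tail bound after epoch t = 0
    while tail > eps:
        t += 1
        tail *= lam
    return t
```

For instance, with λ = 0.9, c_max = 100 and ε = 1 the rule yields T = 65; the run lengths reported later for λ = 0.99 come out of the same kind of calculation for the corresponding cost bound.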
α_j = ᾱ / (ᾱ + j − 1)   ∀j, ᾱ > 0   (2.41)

The stepsizes produced by the generalized harmonic rule depend on the value of a positive constant ᾱ. By using a reasonably large value of ᾱ we can reduce the rate at which the stepsize drops to zero, which often helps convergence in the case of poor initial parameter estimates (see Figure 2.16). If ᾱ is chosen too small, the algorithm may significantly over-predict or under-predict the true parameter values.

Figure 2.16: The generalized harmonic stepsize rule.

Figure 2.17 summarizes the structure (pseudo-code) of a policy improvement step. To obtain b^0, the initial set of parameter values, we solve the least squares problem in (2.38) for a set of post-decision states and expected discounted cost estimates generated by simulating the evolution of the system under Patrick et al.'s policy (initialization module). Since Patrick et al.'s policy provides an already very cost-efficient way of booking multi-priority patients, initial parameter values generated through this approach are very good starting points for our algorithm.

  b* ← Least Squares Solver({u^{x0}_r, v̄_r}, r = 1, …, R)
  α_j ← ᾱ / (ᾱ + j − 1)
  b^j ← (1 − α_j) b^{j−1} + α_j b*

Figure 2.17: A policy improvement step.

2.5.4 The algorithm

Our algorithm finds a booking policy for the multi-priority patient scheduling problem by alternating policy evaluation and policy improvement steps until the difference between the values of two consecutive sets of approximation parameters is less than or equal to δ percent (in the sup-norm sense), or a maximum number of iterations J is reached. The general structure of the algorithm is described in Figure 2.18.

  j ← 1
  done ← false
  while j ≤ J and not done do
    {u^{x0}_r, v̄_r}, r = 1, …, R ← Policy Evaluation(b^{j−1})
    b^j ← Policy Improvement({u^{x0}_r, v̄_r}, r = 1, …, R)
    if ‖(b^j − b^{j−1}) / b^{j−1}‖_∞ ≤ δ then
      done ← true
    end if
    j ← j + 1
  end while

Figure 2.18: The approximate policy iteration algorithm.
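The stepsize rule (2.41), the smoothing update (2.40) and the stopping test of Figure 2.18 can be sketched as follows (a minimal rendering; the thesis quotes δ in percent, whereas the test below treats it as a fraction):

```python
def harmonic_stepsize(j, alpha_bar=1.0):
    """Generalized harmonic rule (2.41): alpha_j = abar / (abar + j - 1)."""
    return alpha_bar / (alpha_bar + j - 1)

def smooth(b_prev, b_star, alpha):
    """Smoothed parameter update (2.40), applied component by component."""
    return [(1.0 - alpha) * p + alpha * s for p, s in zip(b_prev, b_star)]

def converged(b_new, b_prev, delta):
    """Stopping test of Figure 2.18: the largest relative change across
    the components of b is at most delta (delta as a fraction here)."""
    return max(abs(n - p) / abs(p) for n, p in zip(b_new, b_prev)) <= delta
```

With ᾱ = 1 the stepsizes are 1, 1/2, 1/3, …, so the first iteration adopts b* outright and later iterations average it into the running estimate.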
In addition to the parameters defining the MDP model, an instance of the algorithm is characterized by an initial set of parameter values b^0, the number R of post-decision states to consider in each policy evaluation step, the number K of replications that define the expected discounted cost estimate for each post-decision state, the number T of days in a simulation run, the length of the warm-up period T0, the stepsize parameter ᾱ, the tolerance level δ and the maximum number of iterations J.

2.6 Results

In this section, we provide some insights regarding the characteristics of the patient scheduling policies obtained through the proposed simulation-based algorithm and compare their performance to that of four other policies. We start by studying a small example. We present the final values of the approximation parameters, analyze the consequent booking policy and perform some sensitivity analysis. We then evaluate the potential benefits from the proposed algorithm by applying it to two more practical examples, one of them taken from Patrick et al. [44]. The algorithm, as mentioned before, was mostly coded in Java, a multi-purpose programming language. The computer used to solve the different examples in this section was a 3.00GHz Quad Core PC with 16GB of RAM.

2.6.1 A small example

Consider a very small clinic with a capacity of 6 appointment slots per day. The clinic divides demand into three priority classes with wait time targets of 4, 8 and 12 days, respectively, and chooses a 12-day booking horizon. Demand from each priority class is assumed to be Poisson with means 3, 2 and 1 patients per day, respectively. The overtime cost is 100 and the late booking penalties for the three patient classes are 20, 10 and 5. The discount rate is 0.99. The algorithm required 190 iterations (i.e. 190 policy evaluation and policy improvement steps) and approximately eight hours to converge to the final parameter values shown in Figure 2.19.
We used the following algorithm configuration: R = 150, K = 300, T0 = 100, T = 1,225, ᾱ = 1, δ = 0.1 and J = 1,500.

Parameter   Value
b0          411.61
b1          4,261.95
b3          4.05

Figure 2.19: Final values of the approximation parameters for a small example (the table above reports b0, b1 and b3; the accompanying plot shows the value of b2 over booking days 1–12).

Figure 2.19 displays the value of b2 for each component of the post-decision state variable. Since the larger the value of b2, the more costly it is to use capacity on a given day, the plot at the bottom of Figure 2.19 suggests that the booking agent should consider booking patients on days one and two before scheduling them later in the booking horizon.

Acknowledging that it is impossible to characterize a decision rule for every possible pre-decision state of the system, we compare the performance of the resulting booking policy to the performance of four other policies and analyze some state-action pairs to identify the characteristics that make the resulting policy the best choice in terms of the mean discounted cost. The following five policies are compared in terms of mean discounted cost, mean average wait times, mean average daily capacity utilization, mean number of patient diversions, mean average time to the first available appointment slot and mean percentage of late bookings over 1,000 simulation runs.

• Myopic policy (M). Patients are booked as soon as possible, in increasing priority order, according to the immediate cost function defined in (2.6). This policy considers only what is known at present, ignoring the impact of today's decisions on the future performance of the system. The myopic policy resorts to diversions for priority class i only when there is no available capacity within the first n̄_i days, where n̄_i = max_n{n : c_in < h}.

• Patrick et al.'s policy (LP). Patients are booked according to the booking guidelines defined in Section 2.3.
This policy does not necessarily correspond to the approximate optimal policy suggested by the linear programming approach. In some cases, when certain conditions are violated, the optimal affine value function approximation obtained through the linear programming approach is zero and the consequent scheduling policy becomes a myopic policy.

• DMB policy (DMB). Patients are booked as soon as possible, in increasing priority order, on the day associated with the minimum number of bookings within the corresponding wait time targets. Diversions occur only when there is no available capacity within the corresponding targets. This policy is intended to distribute the workload more evenly across the days of the booking horizon. DMB stands for "Day with the Minimum number of Bookings".

• AA policy (AA). The policy derived from the final solution provided by the approximate policy iteration algorithm when the affine approximation architecture in (2.42) is used. AA stands for "Affine Approximation".

v(u^x) = U^x_0 + ∑_{n=1}^{N} U^x_n u^x_n,   U^x ∈ R^{N+1},   ∀u^x ∈ S^x   (2.42)

When an affine approximation architecture in the post-decision state components is used, the least squares problem in (2.38) becomes a simple linear regression problem and the optimal booking action a* = (x*, y*) in pre-decision state s = (u, w) ∈ S is given by:

(x*, y*) ∈ arg min_{(x,y)∈A(u,w)} ∑_{i=1}^{I} ∑_{n=1}^{N} (c_in + λU^x_n) x_in + ∑_{i=1}^{I} h y_i   (2.43)

Clearly, given a final set of parameter values {U^{x*}_n}, n = 0, …, N, the AA policy will only book patients of each priority class into those days for which (c_in + λU^{x*}_n − h) < 0, diverting any remaining demand. It is important to note that the affine version of the algorithm and the linear programming method developed by Patrick et al. solve different problems. The former fits an affine approximation architecture to cost-to-go samples using a least squares approach, while the latter uses linear programming to solve an approximate form of the optimality equations.
The affine version of the algorithm (the AA policy) gives a least upper bound on the expected discounted cost in the class of affine approximations. In addition, the affine version of the algorithm is formulated in terms of the post-decision state variable while the linear programming method developed by Patrick et al. is formulated in terms of the pre-decision state variable.

• SS policy (SS). Patients are booked according to the solution to (2.34), where b is the final set of parameter values obtained through the approximate policy iteration algorithm using a non-linear approximation architecture. SS stands for "S-Shaped".

Each policy was evaluated using common random patient arrivals. The simulation length was 1,400 days with statistics collected after the first 100 days. The first 100 days were simulated under Patrick et al.'s policy. Simulation results are presented in Figure 2.20 and summarized in Table 2.3. Table 2.3 provides 95% confidence intervals for each performance metric considered in this study.

Figure 2.20: Simulation results for a small example (panels: discounted cost, average wait times, average capacity utilization, diverted patients, average time to first available slot, and late bookings, for policies M, LP, DMB, AA and SS).

Table 2.3: A comparison of the performance of the booking policies for a small example with a capacity of 6 appointment slots per day.

Criterion                          Type  M               LP              DMB             AA              SS
Discounted cost                    –     9,229 ± 431     1,390 ± 60      1,332 ± 64      1,573 ± 85      1,180 ± 64
Avg. wait times                    1     4.89 ± 0.05     1.92 ± 0.01     1.94 ± 0.01     2.30 ± 0.01     2.00 ± 0.01
                                   2     5.48 ± 0.06     6.67 ± 0.02     5.47 ± 0.02     4.63 ± 0.03     4.71 ± 0.03
                                   3     5.73 ± 0.06     10.93 ± 0.02    9.19 ± 0.02     7.89 ± 0.05     8.13 ± 0.05
Avg. capacity utilization          –     5.95 ± 0.00     5.86 ± 0.00     5.89 ± 0.00     5.92 ± 0.00     5.91 ± 0.00
Diversions                         1     70.93 ± 3.14    182.02 ± 3.30   152.88 ± 3.29   114.47 ± 3.27   106.06 ± 2.85
                                   2     0.00 ± 0.00     0.04 ± 0.02     0.00 ± 0.00     0.00 ± 0.00     15.57 ± 0.59
                                   3     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.24 ± 0.04
Avg. time to first available slot  –     4.84 ± 0.06     1.81 ± 0.01     1.48 ± 0.01     2.18 ± 0.01     1.83 ± 0.01
Percentage late                    1     54.66 ± 0.98    0.00 ± 0.00     0.00 ± 0.00     7.38 ± 0.18     0.17 ± 0.01
                                   2     15.92 ± 0.59    0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00
                                   3     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00

We can see from Table 2.3 that the SS policy outperforms all the other policies in terms of the mean discounted cost and behaves relatively well (no worse than third place) with respect to all the other performance metrics. The myopic policy is better than the SS policy in terms of the mean average wait time for the lowest priority patients, the mean average daily capacity utilization and the mean number of diversions. However, it does much worse than the SS policy with respect to the mean average wait times for higher priority patients, the mean average time to the first available appointment slot and the mean percentage of late bookings. This policy books almost 30% of the patients late.

The performance of the LP policy is slightly better than the performance of the SS policy in terms of the mean average wait time for the highest priority patients and the mean average time to the first available appointment slot. This is a direct consequence of lower capacity utilization levels. Additionally, the LP policy does not book any patients late. This policy, however, provides much longer average wait times for lower priority patients and a much larger number of patient diversions. It diverts, on average, close to 5% of the highest priority patients. Relatively speaking, the DMB policy performs similarly to the LP policy.
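The day-selection rules behind the myopic and AA policies described above can be sketched as follows. The linear late-booking cost used here is an illustrative stand-in for the immediate cost function (2.6), not its exact form:

```python
def late_cost(n, target, penalty):
    """Illustrative immediate booking cost for day n: a per-day penalty
    for each day booked beyond the wait time target (stand-in for (2.6))."""
    return penalty * max(0, n - target)

def myopic_threshold(target, penalty, h, horizon):
    """Last day the myopic policy books into before diverting a class:
    n_bar = max{ n : c_in < h }, where h is the diversion cost."""
    return max(n for n in range(1, horizon + 1)
               if late_cost(n, target, penalty) < h)

def aa_bookable_days(costs, U, lam, h):
    """Days the AA policy is willing to book a class into: those n with
    c_in + lam * U_n - h < 0 (otherwise the patient is diverted).
    costs[n] and U[n] are the day-n booking cost and affine coefficient."""
    return [n for n in range(len(costs)) if costs[n] + lam * U[n] - h < 0]
```

In the small example, a class-1 patient (target 4, penalty 20, h = 100) would be booked at most 8 days out under the myopic rule, assuming the linear cost above.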
The difference in their mean discounted costs is not statistically significant. The DMB policy, however, provides shorter average wait times for lower priority patients and the best mean average time to the first available appointment slot. Nevertheless, it still diverts, on average, almost 4% of the highest priority patients. The performance of the AA policy is comparable to the performance of the SS policy. Their overall numbers are very similar. However, there is a big difference in terms of the number of highest priority patients that are booked late. The AA policy schedules about 7% of them late while the SS policy books less than 0.2% of them late.

There are a couple of interesting characteristics that allow the SS policy to achieve the lowest mean discounted cost among all the policies considered in this study. Unlike the other policies, the SS policy reacts to low capacity utilization levels early in the booking horizon by using available appointment slots to serve lower priority patients sooner. It also reacts to low capacity utilization levels late in the booking horizon by booking some of the highest priority patients late. Additionally, the SS policy anticipates high capacity utilization levels by diverting patients of all types preemptively, even if there is still available capacity within the corresponding wait time targets (further explanation below).

Figure 2.21 provides a comparison of the SS policy and the LP policy (the second best policy in terms of the mean discounted cost) for six different pre-decision states. Pre-decision states are labeled Case 1 to Case 6 and the differences are highlighted in grey. Most of the time the actions taken by these two policies are the same (Case 1). However, if capacity utilization is low early in the booking horizon, the SS policy will book lower priority patients on days one and two rather than on their respective wait time targets (Case 2).
Similarly, if future system utilization is currently low and there is no available capacity within the corresponding wait time target, the SS policy will book some highest priority patients late (Case 3). Note that, in this case, the LP policy will divert highest priority patients at a higher cost. In addition, if capacity utilization is high early in the booking horizon, the proposed policy will divert patients of all types preemptively (Cases 4 to 6). The SS policy does this in order to avoid larger numbers of late bookings and diversions, and therefore a higher cost, later. The number of diversions, however, will depend on the utilization level. In some cases, the SS policy will divert all the new demand; in others, only a fraction of it (Cases 5 and 6). Some days may even be partially filled (Case 5). Note that, unlike the LP policy, the SS policy preemptively diverts all types of patients, including patients of the highest priority class. This is the only counterintuitive aspect of this policy. However, by diverting patients preemptively, the SS policy prevents the need for greater numbers of diversions, and possibly late bookings, later.

Figure 2.21: A comparison of the SS policy (left) and the LP policy (right) for six pre-decision states labeled Case 1 to Case 6. Pre-decision states and actions are presented according to the notation at the top of the figure. For each case (side-by-side diagrams), the shaded cells indicate how the booking actions differ from each other.

In order to justify our choice of ᾱ and δ, we performed independent sensitivity analyses on these two parameters. Results are presented in Figure 2.22. We first ran the algorithm for values of ᾱ ranging from 1 to 20. The final value function approximations did not differ significantly in terms of the value of b2, but they did with respect to the values of the other approximation parameters. Expected discounted cost estimates, however, remained the same in a relative sense.
Even though the performance of the system for different values of ᾱ was very similar, the early booking, late booking and preemptive diversion thresholds changed, altering the behavior of the consequent booking policy. We ended up choosing ᾱ = 1 mainly because of time considerations: the lower the value of ᾱ, the fewer the number of iterations needed for the algorithm to converge. After setting ᾱ equal to one, we used the algorithm to obtain solutions for values of δ ranging from 1 to 0.01. We observed that values of δ smaller than 1 provided almost the same value function approximations and therefore almost identical patient scheduling policies. We adopted δ = 0.1 for the same reason we chose ᾱ = 1. A solution for δ = 0.01 required 42 hours, more than five times longer than a solution for δ = 0.1.

We next studied the changes in the final value function approximation and in the consequent booking policy caused by different values of the diversion cost and the demand rates, problem parameters h and m, respectively. The value of h represents the cost associated with diverting a patient to an alternative capacity source and m_i is the mean value of the Poisson distribution used to model the daily number of service requests of type i. Results are presented in Figure 2.23.

We first ran the algorithm for values of h between 80 and 120. As expected, some of the final parameter values changed considerably. The values of b0 and b1 increased slightly less than proportionally with respect to the changes in the value of h. The value of b2, however, remained practically the same. The resulting appointment scheduling policy also changed. It reacted to higher diversion cost values by booking more high priority patients late. It also kept similar capacity utilization levels, and service levels, by diverting almost the same number of patients but in a different priority mix.
The larger the value of h, the fewer highest priority patients and the more lower priority patients were diverted. We then used the algorithm to obtain solutions for demand rates 10% lower and 10% higher than in the base case. The final value function approximation changed drastically. In the case of lower demand rates, the resulting policy recognized that the system was over-capacitated and booked patients earlier in the booking horizon, eliminating the need for diversions. It handled demand peaks using late bookings rather than diversions. In the case of higher demand rates, the resulting policy reacted to the lack of capacity by using any available capacity on day one, and sometimes on day two, and by diverting any remaining demand.

Scenario     b0        b1            b3    # iterations  Time [hrs.]
ᾱ = 1        411.61    4,261.95      4.05  190           8.1
ᾱ = 2        406.29    4,457.74      4.11  390           17.0
ᾱ = 3        408.28    4,520.01      4.13  390           17.0
ᾱ = 5        427.36    4,999.01      4.16  738           32.4
ᾱ = 10       426.15    5,077.50      4.13  1,500*        64.7
ᾱ = 20       430.61    4,950.87      4.09  1,500*        64.9
δ = 1        509.22    3,779.90      4.03  24            1.0
δ = 0.1      411.60    4,261.95      4.05  190           8.1
δ = 0.01     406.81    4,333.65      4.08  500*          21.0
δ = 0.01     408.56    4,546.71      4.11  1,000*        42.7

Figure 2.22: Final values of the approximation parameters for different values of ᾱ and δ. "*" indicates the cases for which the maximum number of iterations was reached.

As a final note in this section, we observed that the benefits from using the approximate policy iteration algorithm were greater when a discount factor of 0.999 was used. In this case, the mean discounted cost associated with the SS policy was 20% lower than the mean discounted cost associated with the LP policy. The SS policy provided the lowest mean discounted cost and the lowest mean number of patient diversions, with almost no patients booked late, followed closely by the DMB policy. The algorithm required 154 iterations and approximately 7 days to determine the final values of the approximation parameters for this scenario. The following configuration was used: R = 150, K = 600, T0 = 100, T = 14,630, ᾱ = 1, δ = 0.1 and J = 500.

Scenario       b0        b1            b3    # iterations  Time [hrs.]
h = 80         326.50    3,328.13      4.06  284           12.1
h = 90         369.37    3,746.55      4.05  190           8.1
h = 100        411.60    4,261.95      4.05  190           8.1
h = 110        458.00    4,835.75      4.06  220           9.3
h = 120        505.57    5,321.90      4.06  220           9.3
m = 0.9 × m    15.41     1,143.43      5.17  544           22.9
m = 1.0 × m    411.60    4,261.95      4.05  190           8.1
m = 1.1 × m    6,445.19  270,534.12    2.04  655           30.2

Figure 2.23: Final values of the approximation parameters for different values of h and m.

2.6.2 A small clinic

Consider now a small clinic with a capacity of 10 appointment slots per day. The clinic divides demand into three priority classes with wait time targets of 7, 14 and 21 days, respectively, and chooses a 21-day booking horizon. Demand from each priority class is assumed to be Poisson with means 5, 3 and 2 patients per day. The diversion cost, the late booking penalties and the discount factor remain the same as in the previous example. The algorithm required 212 iterations and approximately 42 hours to converge to the final parameter values presented in Figure 2.24. We used the following algorithm configuration: R = 240, K = 350, T0 = 200, T = 1,312, ᾱ = 1, δ = 0.1 and J = 1,500.

Parameter   Value
b0          21.10
b1          7,684.77
b3          6.51

Figure 2.24: Final values of the approximation parameters for a small clinic (the table above reports b0, b1 and b3; the accompanying plot shows the value of b2 over booking days 1–21).

The overall shape of b2 in Figure 2.24 is similar to the one obtained for the previous example, suggesting that the resulting policy should behave similarly to the one described earlier.
The actual parameter values, however, are smaller, and they increase and decrease faster. This results, as we will see next, in fewer patients being diverted and more patients being booked late.

To evaluate the benefits from the proposed algorithm in this setting, we simulated the evolution of the system under the resulting policy and compared its performance with the performance of the other four policies introduced in the previous section. Each booking policy was simulated for 1,600 days with statistics collected for each of 1,000 simulation runs after a warm-up period of 200 days. The first 200 days were simulated using Patrick et al.'s policy (this is the policy we use in the initialization module to generate initial post-decision states). Simulation results are presented in Figure 2.25 and summarized in Table 2.4.

Table 2.4: A comparison of the performance of the booking policies for a small clinic with a capacity of 10 appointment slots per day.

Criterion                          Type  M               LP              DMB             AA              SS
Discounted cost                    –     19,507 ± 813    919 ± 70        1,063 ± 79      1,772 ± 127     889 ± 75
Avg. wait times                    1     6.95 ± 0.11     2.93 ± 0.03     2.98 ± 0.04     3.14 ± 0.05     3.14 ± 0.04
                                   2     7.49 ± 0.12     12.24 ± 0.05    10.15 ± 0.07    8.74 ± 0.12     7.68 ± 0.15
                                   3     7.74 ± 0.12     19.83 ± 0.03    18.04 ± 0.05    14.63 ± 0.16    17.4 ± 0.16
Avg. capacity utilization          –     9.97 ± 0.00     9.92 ± 0.00     9.94 ± 0.00     9.96 ± 0.00     9.96 ± 0.00
Diversions                         1     73.17 ± 4.26    123.56 ± 4.33   108.48 ± 4.36   83.87 ± 4.34    79.51 ± 4.29
                                   2     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00
                                   3     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00
Avg. time to first available slot  –     6.86 ± 0.11     2.70 ± 0.03     1.67 ± 0.02     2.58 ± 0.04     2.91 ± 0.04
Percentage late                    1     47.55 ± 1.49    0.00 ± 0.00     0.00 ± 0.00     5.64 ± 0.26     0.69 ± 0.04
                                   2     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00
                                   3     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00
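The 95% confidence intervals reported in Tables 2.3 and 2.4 can be computed from the per-run statistics. A sketch using the normal approximation (the thesis does not spell out the exact construction, so this is a standard stand-in):

```python
import statistics

def mean_ci(samples, z=1.96):
    """Mean and half-width of an approximate 95% confidence interval for
    a performance metric collected over independent simulation runs,
    using the normal approximation z * s / sqrt(n)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    half = z * statistics.stdev(samples) / n ** 0.5
    return mean, half
```

Applied to the 1,000 per-run values of, say, the discounted cost, this yields entries of the form "mean ± half-width" as in the tables.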
The simulation results do not differ considerably from the results obtained for the previous small example, so the same conclusions apply to this case. The SS policy outperforms all the other policies in terms of the mean discounted cost and behaves relatively well with respect to all the other performance metrics. Unlike in the previous case, however, the SS policy does not divert any lower priority patients and books late a larger, but still small, percentage of the highest priority patients. The difference in terms of the mean discounted cost between the SS policy and the LP policy is small but statistically significant. This was corroborated by a paired-samples t-test. The performance of the AA policy in terms of the mean number of diversions is slightly worse, relatively speaking, than in the previous setting.

Figure 2.25: Simulation results for a small clinic (panels: discounted cost, average wait times, average capacity utilization, diverted patients, average time to first available slot, and late bookings, for policies M, LP, DMB, AA and SS).

2.6.3 A larger clinic

We finally increase the capacity of the clinic to 30 appointment slots per day. The demand from each priority class is assumed to be Poisson with means 15, 10 and 5 requests per day, respectively. All the other parameters stay the same as for the small clinic in the previous example. The algorithm required 332 iterations and approximately 6 days to converge to the final parameter values shown in Figure 2.26. We adopted the following algorithm configuration: R = 240, K = 250, T0 = 450, T = 1,420, ᾱ = 1, δ = 0.1 and J = 1,500.
The values of b2 are significantly lower than in the two previous settings. Also, the overall pattern shown in Figure 2.26 is flatter and drops quickly below the level associated with day two. This suggests that, unless there is available capacity on day one, the resulting booking policy will never book low priority patients early in the booking horizon.

Parameter   Value
b0          55.04
b1          13,975.09
b3          10.17

Figure 2.26: Final values of the approximation parameters for a larger clinic (the table above reports b0, b1 and b3).

To evaluate the potential benefits from using the resulting policy, we compared its performance against the performance of the four other policies introduced earlier. Each booking policy was simulated for 1,870 days with statistics collected for each of 1,000 simulation runs after a warm-up period of 450 days. The warm-up period was simulated under Patrick et al.'s policy. Simulation results are presented in Figure 2.27 and summarized in Table 2.5.

The simulation results for this setting clearly differ from the results obtained for the two previous examples. In this case, the SS policy is the third best policy in terms of the mean discounted cost, behind the DMB and the LP policies. This result could be a direct consequence of insufficient exploration of the parameter space or a sign of an inappropriate algorithm configuration. We observe, however, that even though the DMB and the SS policies generate higher discounted costs than the LP policy, they divert on average fewer patients. Since none of these three policies performs late bookings, the only explanation for the higher discounted costs associated with lower diversion levels is the use of diversions earlier in the planning horizon. This suggests that both the DMB and the SS policy would perform better than the LP policy in steady-state conditions. Thus, a natural extension of the research in this chapter would be a simulation study of the impact of the initial state conditions on the performance of each policy.
We would also like to note that the SS policy is the only policy among the three policies mentioned above that is directly obtained from an algorithmic approach. The actual affine value function approximation suggested by the linear programming method for this setting, and for the other two examples in this section, is zero. The probability distribution over the initial state generated by the initialization module does not satisfy the conditions stated in Patrick et al. [44]. Thus, to evaluate the benefits of the policies directly derived from the linear programming method and the simulation-based algorithm, we should compare the myopic policy and the SS policy. The SS policy clearly outperforms the myopic policy in almost all the performance metrics. The myopic policy is better than the SS policy only in terms of the mean average wait time for the lowest priority patients.

Figure 2.27: Simulation results for a larger clinic (panels: discounted cost, average wait times, average capacity utilization, diverted patients, average time to first available slot, and late bookings, for policies M, LP, DMB, AA and SS).

Table 2.5: A comparison of the performance of the booking policies for a larger clinic with a capacity of 30 appointment slots per day.

Criterion                          Type  M                 LP              DMB             AA               SS
Discounted cost                    –     56,827 ± 2,598    943 ± 102       1,083 ± 112     15,479 ± 476     1,399 ± 132
Avg. wait times                    1     7.64 ± 0.14       2.73 ± 0.06     2.69 ± 0.07     3.61 ± 0.05      3.62 ± 0.05
                                   2     8.11 ± 0.14       11.99 ± 0.08    10.43 ± 0.14    9.07 ± 0.06      7.53 ± 0.16
                                   3     8.36 ± 0.14       20.17 ± 0.04    19.33 ± 0.06    18.12 ± 0.04     20.56 ± 0.04
Avg. capacity utilization          –     29.98 ± 0.00      29.92 ± 0.00    29.95 ± 0.00    29.91 ± 0.00     29.98 ± 0.00
Diverted patients                  1     84.65 ± 6.65      121.34 ± 7.07   111.29 ± 7.1    164.98 ± 7.5     108.12 ± 6.76
                                   2     0.00 ± 0.00       0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00      7.79 ± 0.61
                                   3     0.00 ± 0.00       0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00      0.00 ± 0.00
Avg. time to first available slot  –     7.46 ± 0.14       2.45 ± 0.05     1.34 ± 0.02     1.30 ± 0.01      2.13 ± 0.02
Percentage late                    1     56.63 ± 2.02      0.00 ± 0.00     0.00 ± 0.00     36.49 ± 0.68     0.00 ± 0.00
                                   2     0.00 ± 0.00       0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00      0.00 ± 0.00
                                   3     0.00 ± 0.00       0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00      0.00 ± 0.00

2.7 Conclusion

In this chapter, we described a dynamic multi-priority patient scheduling problem and developed a simulation-based approach to identifying good advance patient scheduling policies for this problem. The proposed method is based on the classical dynamic programming policy iteration algorithm and uses a generalized logistic function to approximate the expected discounted cost-to-go function. To the best of our knowledge, this is the first time that an explicit non-linear approximation architecture in the approximation parameters, other than a neural network, has been used to optimize resource allocation policies in a dynamic setting⁶. This is also the first time that policies derived from a linear programming and a simulation-based approach to ADP have been compared for the same problem. It could be argued that the use of non-linear approximation architectures to solve dynamic programming models is very common given the popularity of neural networks. However, applying this technology to specific problems can be a non-trivial challenge.
Also, unlike the generalized logistic function used in our algorithm, neural network approximations do not have an easy and intuitive interpretation; they act more like a black box given a set of input variables.

6 Note that classical logistic regression is equivalent to a one-layer, single-output neural network (a single-neuron net) with a logistic activation function. To the best of our knowledge, there is no equivalent neural network formulation for the generalized logistic model used in our approach.

The results in this chapter demonstrate that the true value function for the patient scheduling problem can be better represented by a generalized logistic function than by an affine function. They also show that policies obtained through the proposed algorithm perform better, in most cases, than other policies for this problem. In particular, they provide lower discounted cost values and shorter average wait times for higher priority patients than policies directly obtained from the linear programming approach using an affine value function approximation.

The potential benefits from using non-linear value function approximations are clear; however, non-linear approximation architectures introduce numerous additional challenges in the context of a simulation-based algorithm:

• The problem of finding an action at a given decision epoch involves solving a non-linear integer programming model. In some cases this can be done easily; in others, an additional approximation is needed, which introduces a new source of error.

• A policy improvement step entails solving a non-linear least squares problem. Unlike the linear case, this problem does not necessarily have a closed-form or simple recursive solution. Also, there is no guarantee that a commercial non-linear solver will converge to the optimal solution from an arbitrary starting point. For this reason, a good initial solution, reasonable bounds and appropriate scaling factors are needed to improve the likelihood of convergence.
• The algorithm’s convergence rate depends on the quality of the initial value function approximation. Thus, having either a very good understanding of the system dynamics or a good existing policy is key.

• Non-linear approximation architectures may provide a better fit to the true value function associated with a specific problem; however, the approximation parameters will not always have nice interpretations.

The main limitation of the proposed algorithm is the large amount of time required to perform a policy evaluation step. This is a direct consequence of the slow speed of simulation and the complexity of the optimization problem solved at each decision epoch. In order to solve more practical instances of the problem, a more efficient policy evaluation code should be written. Specifically, any improvement in the implementation of the decision generator module would significantly decrease the time it takes to find a solution to the problem. For example, a faster implementation of the exponential function or a more efficient way of finding the minimum marginal cost in our heuristic would make the code much more efficient. If this is not possible, alternative simulation-based algorithms such as temporal-difference learning or direct search methods [38] could be adapted to this problem, though there is no guarantee of success. In this sense, even though the linear programming approach is mostly limited to affine approximation architectures, its main advantage is that it avoids the need for iterative learning (policy evaluation and policy improvement steps). As an alternative, parallel computing could be used to speed up policy evaluation steps.

For clarity, we have solved a simplified version of the multi-priority patient scheduling problem. Our algorithm, however, could be easily adapted to consider a limited diversion capacity, service requests for multiple appointments and different service times.
It could also be modified to include postponements, although this would require adding the number of postponed decisions of each priority class as a new component of the state space. The main limitation would still be the size of the problems that can be solved. For example, it would be unreasonable to consider using the algorithm (in its current state) to solve a high-dimensional patient scheduling problem like the one presented in the following chapter.

Chapter 3

Dynamic Multi-Appointment Patient Scheduling for Radiation Therapy

In this chapter, we formulate and solve a discounted infinite-horizon Markov Decision Process (MDP) for scheduling cancer treatments in radiation therapy units. The main purpose of the MDP model is to identify good policies for allocating available treatment capacity to incoming treatment requests, while reducing patient wait times in a cost-effective manner. We use an affine architecture to approximate the value function in the MDP formulation and solve an equivalent linear programming model through column generation to obtain an approximate optimal policy. The benefits from the proposed linear programming method are evaluated by simulating the performance of the approximate optimal policy for a practical example based on data provided by the British Columbia Cancer Agency (BCCA). The research presented in this chapter was part of a study of scheduling practices we undertook at the BCCA. The study was motivated by the potential impact of delays on radiation therapy cancer patients and by inefficiencies in the use of expensive resources.1

1 An abbreviated version of this chapter has been accepted for publication in the European Journal of Operational Research. Sauré, A., J. Patrick, S. Tyldesley, M. L. Puterman. Dynamic multi-appointment patient scheduling for radiation therapy, 2012.
3.1 Introduction

External beam radiation (hereafter referred to as radiation therapy) is a cancer treatment that uses high-energy rays to kill or shrink tumor cells. It is the principal therapy for some types of cancer, but it is also used in combination with other treatments and therapies (e.g. chemotherapy, hormonal therapy and surgery) for many other types of cancer. When a cure is not possible, radiation therapy can be used for palliative purposes for multiple types of cancer. In British Columbia (BC), approximately 52% of cancer patients require radiation therapy at some time during the course of their illness and 40% receive radiation therapy within five years of diagnosis (Source: BCCA registry and treatment databases).

The urgency of radiation therapy varies for different types of cancer and indications. It can be characterized as urgent or routine and also as curative or palliative. Curative therapy can be grouped further into primary curative therapy (where radiation is the main or only treatment given to cure a patient) or adjuvant therapy (where radiation is added to another primary curative therapy, usually surgery, to reduce the risk of relapse). Indications for radiation therapy arise for almost all types of cancer in all organ systems, resulting in hundreds of potential cancer sites and treatment types. However, a few cancer sites and treatment types dominate the overall demand for treatment.

Radiation therapy is delivered primarily with external beam radiation machines (linear accelerators). These machines may have different energies of radiation, or additional equipment, making them useful for particular subtypes of radiation treatments and cancer sites. Most radiation therapy centers have a number of identical machines that serve a range of patient types, allowing for redundancy in the case of breakdowns.
Some smaller centers may have two to four identical machines, perhaps catering to patients of particular types, with more complex patients sent to larger centers with specialized equipment.

Wait times for radiation therapy are a concern not only in Canada but worldwide. Long-standing evidence suggests that delays in radiation therapy are associated with tumor progression, persistence of cancer symptoms, psychological distress and decreased cancer control and survival rates [11, 13, 23, 35, 42, 63]. For this reason, many cancer institutions around the world have adopted wait time benchmarks for the time from when the patient is ready to begin treatment to the start of treatment. In Canada, the maximum acceptable wait suggested by the Canadian Association of Radiation Oncologists (CARO) for all non-emergency and non-urgent cases is 14 days [41]. Unfortunately, fewer than 75% of the radiation therapy treatments in BC are initiated within this time frame. Table 3.1 shows the service levels for patients who received radiation therapy in BC between 2004 and 2008.

Table 3.1: Service levels for patients who received radiation therapy in British Columbia between 2004 and 2008 (Source: BCCA registry and treatment databases).

Year   Patients   % initiated within
                  14 days   28 days   56 days
2004    9,834     76.4%     97.5%     100.0%
2005   10,144     71.3%     96.2%      99.9%
2006   10,168     73.8%     97.5%     100.0%
2007   10,487     77.0%     96.3%      99.9%
2008   10,318     70.0%     96.2%      99.9%

Delays in radiation therapy are a direct consequence not only of an imbalance between capacity and demand but also of inefficient patient scheduling. Demand for, and the complexity of, radiation therapy have increased dramatically in the past decade as a result of higher cancer incidence rates, the development of new diagnostic techniques and changes in the complexity of treatment delivery. Treatment capacity has failed to keep up with demand and no scheduling policies have been developed to increase the efficiency of radiation therapy systems.
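The aggregate figure behind the "fewer than 75%" statement can be checked directly from Table 3.1 by weighting each year's 14-day service level by its patient volume:

```python
# (patients, % initiated within 14 days) per year, from Table 3.1
table_3_1 = {
    2004: (9834, 76.4),
    2005: (10144, 71.3),
    2006: (10168, 73.8),
    2007: (10487, 77.0),
    2008: (10318, 70.0),
}

total_patients = sum(n for n, _ in table_3_1.values())
started_within_14 = sum(n * pct / 100.0 for n, pct in table_3_1.values())
aggregate_pct = 100.0 * started_within_14 / total_patients
# aggregate_pct is roughly 73.7, i.e. about a quarter of treatments
# over 2004-2008 missed the 14-day benchmark
```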
Three relevant aspects make scheduling radiation therapy treatments especially challenging. First, radiation therapy treatments can be classified into multiple types; the classification is usually made on the basis of cancer site, treatment intent and urgency level. Second, radiation therapy treatments are spread out over time. For most types of cancer, radiation therapy is delivered in daily consecutive sessions over a period that may vary between one day and eight weeks, with breaks on weekends. Third, radiation therapy sessions do not necessarily have the same duration; each session is scheduled for a time period ranging from 12 to 60 minutes. The combination of these three aspects means that a simple first-come-first-served policy will inevitably perform very poorly, either because later-arriving, more urgent demand would be forced to wait longer or because the uneven session lengths would create unusable gaps in the schedule, resulting in wasted capacity (not unlike in the game Tetris). Thus, more sophisticated mathematical models than have hitherto been used for this problem are necessary.

To that end, in this chapter, we formulate the radiation therapy appointment scheduling problem as a discounted infinite-horizon MDP. To deal with an intractable number of variables and constraints, we first approximate the value function in the equivalent linear programming formulation of the MDP using an affine architecture and then solve the dual of the resulting approximate linear programming model through column generation. From the solution we derive an approximate optimal booking policy which we test via simulation. Figure 3.1 illustrates this methodological approach.

We assume that the machines used for radiation therapy do not differ significantly and thus that treatments can be delivered by any treatment unit. The total treatment capacity is determined by aggregating the individual capacities of multiple machines.
This assumption may not be realistic for all cancer centers, but it is realistic for some small centers equipped with three to four identical machines and focusing on a particular subtype of cancer, and for subcomponents of larger cancer centers with three to four redundant machines that cater to four to five different cancer sites with 18 to 20 treatment types. For simplicity, we assume treatment requests are observed when patients are ready to begin treatment and not earlier. In this way, we implicitly take into account patient availability and the time needed for any preparatory activities such as imaging and treatment planning. Our models could be easily adapted to include a delay between the date of request and the first potential day of service. This delay could be stochastic and differ from one treatment type to another.

From a methodological point of view, the research presented in this chapter represents a significant extension of the dynamic multi-priority patient scheduling model and solution approach developed by Patrick et al. [44]. In addition to multiple priority types, we consider patients who receive treatment across multiple days and for irregular lengths of time. Furthermore, we allow the possibility of using overtime on different days of the planning horizon, and not necessarily for entire treatments. These additional complications are essential for any realistic attempt to model the scheduling of radiation therapy treatments. The new dimensions to the problem and the possibility of enlarging the system capacity through the use of overtime, together with the non-convex nature of the overtime cost, make this problem much more difficult to model and solve.

Figure 3.1: Methodological framework.

To the best of our knowledge, this is the first research initiative to incorporate multi-appointment requirements when optimizing advance multi-priority patient scheduling policies in a dynamic setting.
This research project also constitutes a novel application of approximate dynamic programming to a problem that has received very limited attention in the operations research literature.

The remainder of this chapter is organized as follows. Section 3.2 provides a detailed description of the radiation therapy appointment scheduling problem. Section 3.3 describes the proposed MDP model and Section 3.4 our solution approach. In Section 3.5 we provide some managerial insights based on applying our methodology to a small sample problem. Additionally, we evaluate the benefits of the proposed method by simulating its performance on a more practical example based on BCCA patient data. Finally, Section 3.6 states our main conclusions and suggests possible extensions.

3.2 Problem description

The radiation therapy appointment scheduling process is the process by which available treatment capacity is assigned to incoming demand for treatment. It starts with the arrival of one or more treatment requests and relies on the expertise of one or two booking agents. Treatment requests specify the characteristics of the treatments to be booked, including cancer site, treatment technique, intent, urgency level, capacity requirements, dosage and earliest start date. The intent can be radical, palliative or adjuvant. In terms of capacity requirements, a request indicates the number of daily consecutive sessions into which a treatment is divided and the number of appointment slots required to deliver each session. Appointment slots are typically 12 or 15 minutes long, and quite often the initial session is one appointment slot longer than the rest of the sessions in order to provide additional time for patient and system setup. The earliest start date corresponds to the date from which the patient is ready to begin treatment. Table 3.2 provides two examples of treatment specifications.

Table 3.2: Examples of treatment specifications.
Attribute                  Treatment 1         Treatment 2
Cancer site                Breast              Lung
Treatment technique        Right isocenter     Parallel opposed pair
Treatment intent           Adjuvant            Palliative
Urgency level              Standard            Urgent
Number of sessions         16                  5
Session durations [min.]   1×24 + 15×12        1×24 + 4×12
Dose [cGy]                 4,250               2,000
Earliest start date        5 days from today   Today

Appointment scheduling decisions take place in batches, once or several times a day, after the arrival of one or more requests. In order to define starting dates, the booking agent must set priorities and scan the current schedule of suitable treatment units for available capacity to fit complete treatments. Under current practice, booking decisions are made without explicitly considering that some treatments can be initiated further into the future, allowing the appointments for more urgent treatments to be scheduled earlier. The presence of highly variable demand, limited treatment capacity, multiple urgency levels and multiple appointment requirements makes it impossible for the booking agent to adequately assess the real impact of today's decisions on the future performance of the system. This lack of foresight generates several inefficiencies that usually translate into unnecessary delays, an unsystematic prioritization of patients, a larger number of isolated appointment slots that cannot be used and higher overtime utilization. The MDP model presented in Section 3.3 seeks to provide the booking agent with a means of adequately assessing the future impact of today's decisions in order to more intelligently allocate capacity.

The performance of the radiation therapy appointment scheduling process is usually measured through the patients' median wait time, the percentage of treatments initiated within the clinical benchmark (service levels) and the average capacity utilization. We focus our analysis on the service levels, but also consider other performance measures.
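The treatment specifications in Table 3.2 map naturally onto a small record type. The following sketch uses illustrative field names (not the BCCA's actual booking schema) and encodes session durations as a vector of per-session slot counts, with each slot being 12 minutes in this example:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TreatmentRequest:
    """One booking request; field names are illustrative, not BCCA's schema."""
    cancer_site: str
    technique: str
    intent: str                      # "radical", "palliative" or "adjuvant"
    urgency: str                     # "urgent" or "standard"
    session_slots: Tuple[int, ...]   # 12-minute slots needed per daily session
    dose_cgy: int
    earliest_start: int              # days from today

    @property
    def num_sessions(self) -> int:
        return len(self.session_slots)

# Treatment 1 from Table 3.2: a 24-minute first session (two 12-minute slots)
# followed by fifteen 12-minute sessions.
breast = TreatmentRequest("Breast", "Right isocenter", "adjuvant", "standard",
                          (2,) + (1,) * 15, 4250, 5)
```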
3.3 MDP formulation

Each day the booking agent receives requests for appointments for treatments classified into I different pre-specified types. Treatments are classified on the basis of cancer site, treatment intent, urgency level and capacity requirements, that is, the number of daily consecutive sessions into which the requested treatment is divided and the number of appointment slots required to deliver each session. Types do not change over time. Daily demand for each treatment type i follows a discrete probability distribution with mean m_i requests per day. The demand distribution is independent for each type and does not change over time.

For modeling convenience, we assume scheduling decisions are made once a day and the booking agent may schedule treatments at most N days in advance, where N is the number of days in the booking horizon. Assuming that decisions are made once a day is not too far from reality; in practice, booking decisions usually take place in batches a couple of times a day. In this manner, the booking agent, typically a radiation therapist, is able to assess the future workload of the system, take into account clinical considerations and prioritize urgent cases. Although our approach can accommodate different appointment durations, session durations are measured in the same discrete time units, called appointment slots. The number of appointment slots booked on any given day cannot exceed (C_r + C_o) units. The parameters C_r and C_o are fixed and correspond to the daily regular-hour and overtime capacities, respectively.

Since the booking agent may schedule treatments at most N days in advance, we define a planning horizon to keep track of the number of appointment slots booked within and beyond the booking horizon. The number of days in the planning horizon, M, is defined large enough to allow the completion of any treatment initiated on day N of the booking horizon.
That is, M = N + max_i{l_i} − 1, where l_i represents the number of sessions of a treatment of type i. Each treatment type i is associated with a vector r_i = {r_ij}, j = 1, . . . , l_i, which describes the duration, in appointment slots, of each of its sessions. For example, r = (2, 1, 1, 1, 1) represents a treatment consisting of an initial session of two appointment slots followed by four additional sessions of one appointment slot each (see Figure 3.2a).

Figure 3.2: Graphical representation of some possible treatment patterns. Pattern (a), for example, represents a treatment consisting of an initial session of two appointment slots followed by four additional sessions of one appointment slot each.

Once the booking requests for a given day are known, the booking agent observes the number of appointment slots scheduled from today to the end of the planning horizon and determines how to allocate the available treatment capacity to the waiting demand. As an alternative action, he/she may decide to postpone some of the booking decisions to the next day. Figure 3.3 depicts the timeline associated with the radiation therapy appointment scheduling decisions.

Figure 3.3: Timeline associated with the treatment scheduling decisions.

We next formulate the radiation therapy appointment scheduling problem as a discounted infinite-horizon MDP model. We expand the dynamic multi-priority patient scheduling model developed by Patrick et al. [44] by introducing multiple appointment requests and multiple session durations, and by allowing parts of the appointments to be delivered using overtime.

3.3.1 Decision epochs

Scheduling decisions are made at the end of each day.

3.3.2 State space

At the end of each day, the booking agent has access to the current schedule from today to the end of the planning horizon as well as the number of treatments of each type waiting to be booked.
The current schedule provides the booking agent with the number of regular-hour and overtime slots available on each day of the planning horizon. Thus, a state of the system, denoted by s ∈ S, can be written as in (3.1), where u_m represents the number of regular-hour appointment slots already booked on day m, v_m the number of overtime slots already booked on day m, and w_i the number of treatments of type i waiting to be booked.

s = (u, v, w) = (u_1, . . . , u_M, v_1, . . . , v_M, w_1, . . . , w_I)   (3.1)

The definition of w considers not only the new demand for treatment but also all treatments not scheduled previously. Note that, as in Chapter 2, the planning horizon is not static but rolling, and thus u_M = v_M = 0 at the beginning of each decision epoch. In order to use the mathematical programming models described later, we require a finite state space. For this reason, we implicitly assume upper bounds on the number of treatments of each type waiting to be booked. The bounds are set sufficiently high that they are of little practical significance.

3.3.3 Action sets

At the end of each day, the booking agent must decide on which day to start each of the treatments waiting to be scheduled. In some cases, this implies delivering parts of some treatments using overtime. Alternatively, the agent may postpone to the next day the scheduling decisions for some treatments. Overtime and postponements are intended to relieve the stress on the system. Any action available to the booking agent can be represented by (3.2), where x_in represents the number of treatments of type i booked today to start on day n and y_m is the number of overtime slots booked today on day m.

a = (x, y) = (x_11, . . . , x_1N, x_21, . . . , x_2N, . . . , x_I1, . . . , x_IN, y_1, . . . , y_M)   (3.2)

Three important observations follow.
First, we have chosen to model the capacity utilization component of the state space using two state variables instead of one because y_m is a non-convex function of the total number of appointment slots already booked on day m and the capacity allocation decisions. In this way, we do not need to define any additional variables and constraints to deal with the non-convexity of the overtime cost. Second, the number of treatments of type i that remain unscheduled at the end of the day can be written as w_i − Σ_{n=1}^{N} x_in for all i. Third, note that we are not assigning patients to specific appointment slots; we are just allocating capacity to each treatment request.

The set of feasible actions compatible with state (u, v, w) ∈ S, denoted by A_(u,v,w), must satisfy constraints (3.3) to (3.5) below.

Σ_{n=1}^{N} x_in ≤ w_i   ∀i   (3.3)

u_m + Σ_{i=1}^{I} Σ_{k=max{m−l_i+1,1}}^{min{m,N}} r_{i(m−k+1)} x_ik ≤ C_r + y_m   ∀m   (3.4)

v_m + y_m ≤ C_o   ∀m   (3.5)

x_in ∈ Z+ ∀i, n;   y_m ∈ Z+ ∀m

Constraint (3.3) limits the number of bookings for each treatment type to be less than or equal to the number of treatments waiting to be booked. Constraint (3.4) restricts the total number of appointment slots booked today for day m to be less than or equal to the available treatment capacity on that day. This is equivalent to ensuring that the number y_m of overtime slots booked today for day m is sufficient to cover the new bookings made for that day. Constraint (3.5) limits the total overtime utilization on day m to at most the overtime capacity. All action variables are non-negative integers.

3.3.4 Transition probabilities

Once all scheduling actions are taken, the only source of uncertainty in the transition to the next state of the system is the number of new requests for each type of treatment.
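A direct transcription of the feasibility conditions (3.3)–(3.5), using 0-indexed days and illustrative variable names, might look as follows:

```python
def is_feasible(u, v, w, x, y, r, C_r, C_o):
    """Check action (x, y) against state (u, v, w) under (3.3)-(3.5).
    u[m], v[m]: regular/overtime slots already booked on day m (0-indexed);
    w[i]: type-i requests waiting; x[i][n]: type-i treatments started on day n;
    y[m]: overtime slots booked today on day m; r[i]: session-length vector;
    C_r, C_o: daily regular-hour and overtime capacities."""
    I, N, M = len(w), len(x[0]), len(u)
    # (3.3): cannot book more treatments of a type than are waiting
    if any(sum(x[i]) > w[i] for i in range(I)):
        return False
    for m in range(M):
        # (3.4): every session of a treatment started on day k that
        # overlaps day m adds r[i][m - k] slots to that day's load
        load = sum(r[i][m - k] * x[i][k]
                   for i in range(I)
                   for k in range(max(m - len(r[i]) + 1, 0), min(m + 1, N)))
        if u[m] + load > C_r + y[m]:
            return False
        # (3.5): overtime used on day m cannot exceed the overtime capacity
        if v[m] + y[m] > C_o:
            return False
    return True

# One treatment type with pattern r = (2, 1): booking it to start today
# overloads day 0 (one slot already taken, regular capacity 2) unless
# one overtime slot is booked for that day.
r, C_r, C_o = [[2, 1]], 2, 1
u, v, w = [1, 0, 0], [0, 0, 0], [1]
```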
Thus, as a result of choosing booking action a = (x, y) in state s = (u, v, w), a ∈ A_s and s ∈ S, and having q_i new requests of type i, the state of the system the next day, denoted by s' = (u', v', w'), will be determined by the transition probability in (3.6).

p(s'|s, a) = ∏_{i=1}^{I} Pr(q_i), if s' satisfies equations (3.7) to (3.10); 0, otherwise.   (3.6)

u'_m = u_{m+1} + Σ_{i=1}^{I} Σ_{k=max{(m+1)−l_i+1,1}}^{min{(m+1),N}} r_{i[(m+1)−k+1]} x_ik − y_{m+1},   m < M   (3.7)

v'_m = v_{m+1} + y_{m+1},   m < M   (3.8)

w'_i = w_i − Σ_{n=1}^{N} x_in + q_i   ∀i   (3.9)

u'_M = v'_M = 0   (3.10)

Equations (3.7), (3.8) and (3.10) define the new numbers of regular-hour and overtime appointment slots booked on day m as a function of the number of slots previously booked on day m + 1 plus all new bookings that affected that day. Equation (3.9) determines the new number of treatments waiting to be booked as the number of treatment requests that have not yet been booked plus new demand. The term Pr(q_i) corresponds to the probability of having q_i new requests for treatments of type i.

3.3.5 Immediate cost

The immediate cost associated with choosing booking action a = (x, y) in state s = (u, v, w), a ∈ A_s and s ∈ S, comes from three sources: the penalties associated with the resulting patient wait times, the cost associated with the use of overtime, and the penalties associated with postponing some of the booking decisions. We represent the total immediate cost as follows:

c(s, a) = Σ_{i=1}^{I} Σ_{n=1}^{N} c_in x_in + Σ_{m=1}^{M} h_m y_m + Σ_{i=1}^{I} g_i (w_i − Σ_{n=1}^{N} x_in)   (3.11)

where

c_in = Σ_{k=1}^{n} λ^{k−1} f_ik   ∀i, n        h_m = λ^{m−1} h   ∀m

In Equation (3.11), c_in represents the penalty (if any) for starting a treatment of type i on day n, h_m is the discounted overtime cost associated with an overtime booking on day m and g_i corresponds to the penalty for postponing to the next day the booking of a treatment of type i. The values of c_in are obtained by discounting the penalties f_ik associated with each additional day of wait before the start of a treatment.
The relative orders of the wait time penalties are determined by expert opinion and investigated through sensitivity analysis. They are defined in relation to existing guidelines for acceptable waits, taking into consideration estimates of the importance of radiation therapy for different disease sites. The overtime cost is denoted by h and the discount factor by λ < 1.

To avoid keeping track of the wait times associated with postponed requests, our model assumes that the portion of the demand that is not booked today is handled the same as the new demand tomorrow. The postponement penalties, however, are set orders of magnitude higher than the wait time penalties and the overtime cost. Postponements thus constitute a last resort to relieve stress on the system and are intended to ensure problem feasibility.

3.3.6 Optimality equations

The value function in our formulation, denoted by v(s), corresponds to the total expected discounted cost over the infinite horizon. Of course, we are not so much interested in determining the value function for a given policy as in finding the optimal stationary policy. To identify such a policy we need to solve the following optimality equations:

v(s) = min_{a ∈ A_s} { c(s, a) + λ Σ_{s' ∈ S} p(s'|s, a) v(s') }   ∀s ∈ S   (3.12)

The challenge with this formulation is the same as for the multi-priority patient scheduling model described in Chapter 2. Even for very small instances of the problem, the size of the state space and the size of the corresponding action sets make a direct solution to (3.12) impossible. The state variable s = (u, v, w) and the action variable a = (x, y) have (M + I + M) and (I × N + M) dimensions, respectively. Assuming that w_i can take D_i possible values, this means that we might have up to (C_r + 1)^M × ∏_{i=1}^{I} D_i × (C_o + 1)^M different states and ∏_{i=1}^{I} D_i^N × (C_o + 1)^M different (not necessarily feasible) actions.
As an example, the formulation for the practical example presented in Section 3.5 involves more than 10^300 possible states and 10^500 potential actions.

3.4 Solution approach

In order to deal with an intractable number of states and actions, we adopt the linear programming method illustrated in Figure 3.1. We first transform our MDP model into its equivalent linear programming form. The linear programming approach to discounted infinite-horizon MDPs, initially presented by d'Epenoux [19], is based on writing the optimality equations in (3.12) as follows:

max Σ_{s ∈ S} α(s) v(s)   (3.13)

subject to

c(s, a) + λ Σ_{s' ∈ S} p(s'|s, a) v(s') ≥ v(s)   ∀s ∈ S, a ∈ A_s

The value of α(s) represents the weight of state s ∈ S in the objective function. The solution to the equivalent linear programming model in (3.13) is the same as the solution to the optimality equations in (3.12) when α is strictly positive [20, 30]. We normalize α so that Σ_{s ∈ S} α(s) = 1 and consider it as an exogenous probability distribution over the initial state of the system. The equivalent linear programming model, however, does not avoid the curse of dimensionality. The model in (3.13) has one variable for every state s ∈ S and one constraint for every feasible state-action pair (s, a), s ∈ S and a ∈ A_s, making its solution impossible.

To deal with the curse of dimensionality and solve the radiation therapy appointment scheduling problem, we chose the following affine approximation to v(u, v, w):

v(u, v, w) = W_0 + Σ_{m=1}^{M} U_m u_m + Σ_{m=1}^{M} V_m v_m + Σ_{i=1}^{I} W_i w_i   (3.14)

U, V, W ≥ 0,   W_0 ∈ R

In this way the approximate mathematical programming model in (3.13) is still linear and the new variables are directly interpretable. Any other more sophisticated approximation, such as a linear combination of a set of more general basis functions, would make the subproblem in our column generation algorithm a non-linear integer programming problem.
The values of {U_m}_{m=1}^{M}, {V_m}_{m=1}^{M} and {W_i}_{i=1}^{I} represent, respectively, the marginal cost of having an additional regular-hour appointment slot occupied on day m, the marginal cost of having an additional overtime slot used on day m, and the marginal cost of having one more treatment of type i waiting to be booked. If we substitute (3.14) into (3.13) and restrict α to be a probability distribution, we obtain:

max_{U, V, W, W_0}  W_0 + Σ_{m=1}^{M} E_α[u_m] U_m + Σ_{m=1}^{M} E_α[v_m] V_m + Σ_{i=1}^{I} E_α[w_i] W_i   (3.15)

subject to

(1 − λ) W_0 + Σ_{m=1}^{M} µ_m(s, a) U_m + Σ_{m=1}^{M} ν_m(s, a) V_m + Σ_{i=1}^{I} ω_i(s, a) W_i ≤ c(s, a)   ∀s ∈ S, a ∈ A_s

U, V, W ≥ 0,   W_0 ∈ R

where

E_α[u_m] = Σ_{s ∈ S} α(s) u_m(s)   ∀m        µ_m(s, a) = u_m(s) − λ u'_m(s, a)   ∀m
E_α[v_m] = Σ_{s ∈ S} α(s) v_m(s)   ∀m        ν_m(s, a) = v_m(s) − λ v'_m(s, a)   ∀m
E_α[w_i] = Σ_{s ∈ S} α(s) w_i(s)   ∀i        ω_i(s, a) = w_i(s) − λ (w_i(s) − Σ_{n=1}^{N} x_in + m_i)   ∀i

The approximate equivalent linear programming model in (3.15) has a tractable number of variables, (2M + I + 1), but still an intractable number of constraints. For this reason, we solve its dual (3.16) using column generation.

min_X  Σ_{s ∈ S, a ∈ A_s} c(s, a) X(s, a)   (3.16)

subject to

(1 − λ) Σ_{s ∈ S, a ∈ A_s} X(s, a) = 1
Σ_{s ∈ S, a ∈ A_s} µ_m(s, a) X(s, a) ≥ E_α[u_m]   ∀m
Σ_{s ∈ S, a ∈ A_s} ν_m(s, a) X(s, a) ≥ E_α[v_m]   ∀m
Σ_{s ∈ S, a ∈ A_s} ω_i(s, a) X(s, a) ≥ E_α[w_i]   ∀i
X ≥ 0

Column generation finds the optimal solution to (3.16), the master problem, starting with a small set of feasible state-action pairs and iteratively adding the state-action pair associated with the most violated primal constraint. Unfortunately, there is no easy way to identify an initial set of feasible state-action pairs for (3.16). For this reason, we tested several approaches to perform this task, among them pre-defined solutions, rounding heuristics and column sampling. We ended up implementing the following two additional mathematical programming models.
The model in (3.17) finds an initial set of state-action pairs by combining the action-set constraints and the definition of µm in (3.15). It is solved for each treatment type and focuses on dual feasibility rather than optimality.

  (s, a)_i ∈ argmax_{s∈S, a∈As} { ηi : ηi ≤ µm(s, a) ∀m,  vm = ym = 0 ∀m,  wj = 0 ∀j ≠ i }   ∀i    (3.17)

The model in (3.18) corresponds to a Phase I method. It starts from the initial set of state-action pairs provided by (3.17) and incorporates new state-action pairs into its formulation until the value of the slack variable s is equal to zero.

  min s    (3.18)

  subject to

  (1 − λ) ∑_{s∈S, a∈As} X(s, a) = 1
  ∑_{s∈S, a∈As} µm(s, a)X(s, a) ≥ Eα[um] − s   ∀m
  ∑_{s∈S, a∈As} νm(s, a)X(s, a) ≥ Eα[vm] − s   ∀m
  ∑_{s∈S, a∈As} ωi(s, a)X(s, a) ≥ Eα[wi] − s   ∀i
  X, s ≥ 0

The model used to identify the state-action pair associated with the most violated primal constraint, the pricing problem, is itself an optimization problem. Given the dual values associated with the current solution to (3.16), {Um}_{m=1}^{M}, {Vm}_{m=1}^{M} and {Wi}_{i=1}^{I}, the next state-action pair to enter the basis is given by (3.19).

  argmin_{s∈S, a∈As}  c(s, a) − (1 − λ)W0 − ∑_{m=1}^{M} µm(s, a)Um − ∑_{m=1}^{M} νm(s, a)Vm − ∑_{i=1}^{I} ωi(s, a)Wi    (3.19)

Three additional constraints are added to (3.19) to guarantee that only valid state-action pairs are generated. Constraint (3.20) limits the number of overtime bookings on day m to be less than or equal to the total number of new bookings for that day. Constraints (3.21) and (3.22) ensure that the M-th day in the planning horizon of any valid state-action pair has no appointments in it.

  ∑_{i=1}^{I} ∑_{k=max{m−li+1, 1}}^{min{m, N}} r_{i(m−k+1)} x_{ik} ≥ ym   ∀m    (3.20)

  uM = 0    (3.21)

  vM = 0    (3.22)

The column generation algorithm iterates until no primal constraint is violated, or until we are close enough to quit (stopping criterion of −0.0001), giving us {U*m}_{m=1}^{M}, {V*m}_{m=1}^{M} and {W*i}_{i=1}^{I}. These values are then used to identify an approximate optimal policy.
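The overall loop — solve a restricted master problem, price all candidates, add the most violated one, and stop at a small tolerance — can be sketched in the equivalent cutting-plane view as follows. This toy uses a random pool of constraint rows in place of the actual scheduling pricing problem (which the chapter solves as an integer program), so all data here are illustrative assumptions:

```python
# A minimal sketch of the column generation / constraint generation loop:
# solve the master over a working set of rows, find the most violated
# candidate, add it, and repeat until nothing is violated beyond a
# tolerance. The random pool below is a toy, not the scheduling model.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_vars, n_rows = 5, 200
A = rng.uniform(0.0, 1.0, (n_rows, n_vars))   # candidate constraint rows
b = rng.uniform(1.0, 2.0, n_rows)             # their right-hand sides
obj = -np.ones(n_vars)                        # maximize sum(x) => min -sum(x)

working = [0]                                  # start from one feasible row
for _ in range(n_rows):
    res = linprog(obj, A_ub=A[working], b_ub=b[working],
                  bounds=[(0, 10)] * n_vars)
    x = res.x
    violation = A @ x - b                      # "pricing" step over the pool
    j = int(np.argmax(violation))
    if violation[j] <= 1e-4:                   # stopping criterion
        break
    working.append(j)

print("rows generated:", len(working), "of", n_rows)
```

Typically only a small fraction of the pool is ever generated, which mirrors why column generation makes the otherwise intractable dual (3.16) solvable.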
In practice, rather than computing and storing the approximate optimal actions for each state s ∈ S, a resource-intensive task, we only compute them as needed by solving (3.23):

  d*(s) ∈ argmin_{(x,y)∈As}  ∑_{i=1}^{I} ∑_{n=1}^{N} Cin xin + ∑_{m=1}^{M} Hm ym    (3.23)

where

  Cin = cin + λ ∑_{k=(n−1)}^{(n−1)+li−1} r_{i(k+1−n+1)} U*k − (gi + λW*i)   ∀i, n

  Hm = h1,                          m = 1;
  Hm = hm + λV*_{m−1} − λU*_{m−1},  m > 1.

The integer programming model in (3.23) is obtained by inserting the approximate value function defined by {U*m}_{m=1}^{M}, {V*m}_{m=1}^{M} and {W*i}_{i=1}^{I} into the right-hand side of the optimality equations in (3.12) and ignoring the constant terms. The coefficients accompanying the booking actions in (3.23) have direct interpretations, and their values provide a good description of the approximate optimal policy. Cin balances the penalty associated with a wait time of n days and the cost due to the loss of available treatment capacity in the future, cin + λ ∑_{k=(n−1)}^{(n−1)+li−1} r_{i(k+1−n+1)} U*k, against the benefit due to the fact that the booking decision is not postponed and the treatment request does not re-appear in tomorrow's demand, gi + λW*i. Hm balances the cost associated with a one-unit increase in the system's total capacity on day m and the cost due to the loss of available overtime capacity on day m − 1 tomorrow, hm + λV*_{m−1}, against the benefit from freeing regular-hour capacity on day m − 1 tomorrow, λU*_{m−1}. It is important to note that when the expected daily demand for appointment slots does not exceed the regular-hour capacity (i.e., when capacity is not a binding constraint), {U*m}_{m=1}^{M}, {V*m}_{m=1}^{M} and {W*i}_{i=1}^{I} are all equal to zero. In this case, the coefficients of the booking actions do not need to be adjusted to account for the impact of today's decisions on the future, and the approximate optimal policy becomes a myopic policy (Cin and Hm are simply given by cin − gi and hm, respectively).
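As a hedged illustration of how the booking coefficients might be assembled and used, the sketch below builds Cin for a single toy treatment type from made-up values of U*, W*, the wait-cost schedule and the discount factor (all assumptions, not fitted outputs of the column generation run), and then picks the feasible start day with the smallest coefficient, in the spirit of (3.23):

```python
# A hedged sketch: assemble booking coefficients C_in for one toy
# treatment type and greedily pick the cheapest feasible start day.
# All numeric values (U, W, g, wait costs, capacities) are invented.
import numpy as np

lam = 0.99
M = 10                      # booking horizon (days)
li = 3                      # consecutive daily sessions of the toy type
# toy wait cost: zero within a 3-day target, then $50 per extra day
c_wait = np.array([0.0 if n <= 3 else 50.0 * (n - 3) for n in range(1, M + 1)])
g = 20.0                    # benefit of not postponing the request
U = np.linspace(5.0, 0.0, M)   # toy marginal capacity costs U*_m
W = 30.0                       # toy marginal waiting cost W*_i

# C_in: wait penalty plus discounted capacity consumed on days n..n+li-1,
# minus the benefit of removing the request from tomorrow's demand.
C = np.array([c_wait[n] + lam * U[n:n + li].sum() - (g + lam * W)
              for n in range(M - li + 1)])

capacity_left = np.array([0, 2, 1, 3, 3, 3, 3, 3, 3, 3])  # toy availability
feasible = [n for n in range(len(C)) if capacity_left[n:n + li].min() >= 1]
best_day = min(feasible, key=lambda n: C[n])
print("book starting on day", best_day + 1)
```

Note how a negative coefficient signals that booking is worthwhile even at some capacity cost, which is the mechanism behind the policy properties discussed in Section 3.5.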
One of the difficulties with column generation is the large number of iterations needed to find the optimal solution to the master problem (often referred to as the tailing-off effect of column generation; Lübbecke and Desrosiers [34]). In order to reduce execution times, we experimented with alternative ways of finding an initial set of feasible columns, different stopping criteria and several methods for solving the master problem in each iteration. We found that the best execution times were achieved when the master problem was solved using barrier methods starting from the immediately previous solution. We also implemented the in-out approach proposed by Ben-Ameur and Neto [6] to further reduce computation time. The execution time for the practical example presented in Section 3.5.2 was about 20% shorter using this approach. The implementation of the column generation algorithm and the integer programming model used to identify the approximate optimal policy was carried out in GAMS 23.5, with CPLEX 12.2 as the solver.

3.5 Results

This section provides some insights regarding the properties of the approximate optimal policy and illustrates the potential benefits that can be obtained from its use. We first consider a small example of the problem and analyze the impact of different treatment specifications on the values of Cin and Hm. Then, we evaluate the approximate optimal policy by simulating its performance for a more practical example based on BCCA operations.

3.5.1 Policy insights

In this section, we analyze six scenarios to illustrate some relevant properties of the coefficients defining the approximate optimal policy. Each scenario assumes that the values of cin are determined by discounting a constant wait time penalty fi for each day of wait beyond a recommended wait time target (wait time penalties are incurred for each appointment slot).
The regular-hour capacity is set equal to the average demand, request arrivals are assumed Poisson and postponements are not allowed. In theory, assuming that postponements are not allowed can result in infeasibility. In practice, however, this is not an issue, as there are sufficient regular-hour and overtime appointment slots to guarantee feasibility. Demand distributions are truncated at three times their mean values. Note that even for the small settings presented in this section, the size of the state space and the size of the corresponding action sets make a direct solution to (3.12) impossible. For example, the formulation for the base case scenario below involves more than 10^75 possible states.

(a) Base case – we consider a hypothetical radiation therapy center with a regular-hour capacity of 50 appointment slots per day and an overtime capacity of 6 appointment slots per day. The facility initially delivers only one type of treatment, consisting of five consecutive one-appointment-slot sessions. The average demand for this type of treatment is 10 requests per day. The wait time target is 10 days and the wait time penalty is $50 per day per appointment slot. The overtime cost is $100, the length of the booking horizon is 25 days and the discount factor is 0.99. We have chosen a discount factor of 0.99 because it reflects the medium-term planning horizon that is often applicable in health care settings. In this way, costs are relatively similar over the short term and less valued far in the future.

(b) Different number of sessions – we consider five treatment types consisting of three, four, five, six and seven one-appointment-slot sessions, respectively. The demand rate for each type is set at 2 requests per day. All other parameters stay the same.

(c) Different session durations – we have five treatment types, all with only one treatment session, but the duration of the session depends on the type, requiring between three and seven appointment slots.
All other parameters remain the same.

(d) Different wait time penalties – we study five treatment types, all of them consisting of five one-appointment-slot sessions. What varies is the wait time penalty, which takes values of $10, $25, $50, $75 and $100 per day per appointment slot, respectively. All other parameters stay the same.

(e) Different wait time targets – we still consider five treatment types consisting of five one-appointment-slot sessions each. The wait time penalties are set back at $50 per day per appointment slot, but the wait time targets are 5, 8, 10, 12 and 15 days, respectively.

(f) Different demand rates – we have five treatment types, all of them consisting of two one-appointment-slot sessions. The demand rate varies depending on the type, taking values from three to seven requests per day, respectively. All other parameters remain the same as in the base case.

Figure 3.4 provides a graphical representation of Cin, the adjusted cost associated with starting a treatment of type i on day n, for each scenario. We focus our analysis on this coefficient since it is the most relevant when identifying the approximate optimal booking actions. The values of Hm are, with the exception of the value for day 1, very close to zero, if not zero.
This, together with the fact that the values of Cin on day 1 are negative and lower than the overtime cost, allows us to conclude that the approximate optimal policy will allocate available treatment capacity according to the values of Cin and use overtime whenever demand is present and regular-hour capacity is not available.

Figure 3.4: Graphical display of Cin, the adjusted cost associated with starting a treatment of type i on day n, for the six scenarios defined in Section 3.5.1. Scenarios are labeled as: (a) base case, (b) different number of sessions, (c) different session durations, (d) different wait time penalties, (e) different wait time targets, and (f) different demand rates.

The analysis of the results in Figure 3.4 suggests the following properties for the approximate optimal policy: (1) to book as much demand as possible on workday one and then within the wait time targets (scenario a); (2) the larger the number of sessions, the earlier the booking (scenario b); (3) the shorter the duration of each session, the later the booking (scenario c); and (4) the more urgent the treatment or the shorter the wait time target, the earlier the booking (scenarios d and e).
The first property follows from the negative values of Cin, with the values associated with workday one being the most negative. The other three properties are due to the fact that longer, more intense and more urgent treatments have lower (more negative) Cin values. Additionally, scenario f suggests that the approximate optimal policy does not depend on the demand rate associated with each treatment type. The drops in the values of Cin in scenario c are due to an abrupt decrease in the value of U*m in the last days of the booking horizon. U*m becomes less relevant late in the booking horizon since most treatments will be booked within the wait time target, leaving the system empty further into the future.

The values of Hm, the adjusted cost associated with a one-unit increase in the system total capacity on day m, are almost identical across all scenarios (see Figure 3.5 for an example). The value of Hm on workday one is equal to the overtime cost. Then, as a consequence of the opportunity cost due to the loss of available overtime capacity and the benefit from freeing regular-hour capacity, Hm quickly decreases until it reaches zero. Late in the planning horizon, the opportunity costs vanish and Hm becomes equal to the discounted overtime cost hm.

3.5.2 A practical example

This section describes the potential benefits from using the proposed solution approach by evaluating its performance for a practical-sized example based on an approximation to BCCA operations using historical data. The BCCA is a publicly funded organization with a mandate to provide a cancer control program for the people of British Columbia, Canada. This program includes prevention, early detection, treatment and various forms of care. Services

Figure 3.5: Graphical display of Hm, the adjusted cost associated with a one-unit increase in the system total capacity on day m.
This example corresponds to the scenario where treatment types have different demand rates.

provided by the BCCA are delivered through five regional cancer centers. The instance of the problem analyzed in this section is based on all breast, head and neck, lung and prostate treatments registered and completed at the Vancouver Cancer Center (VCC) between April 1, 2009 and March 31, 2010 (2,061 cases in total, equivalent to 60% of the total number of treatments delivered within these dates). An analysis of the data identified 461 different capacity requirements among the 2,061 treatments delivered. For this reason, we chose to classify treatments based on urgency, cancer site and treatment intent, and then to represent each class using the most frequent capacity requirements (treatment patterns) within that group. Table 3.3 describes the 18 treatment types obtained through this exercise (types are grouped according to common urgency as defined in Table 3.4). They account for 62% of the treatments in our data. Treatments of type 1, for example, represent urgent lung, prostate and breast palliative cases and consist of an initial session of two appointment slots plus four additional sessions of one appointment slot each. Appointment slots are 12 minutes long. In practice, the values defining

Table 3.3: Characteristics of the treatment types used to evaluate the performance of the proposed methodology.
Type  Capacity requirements     Sessions/slots  Arrival rate [reqs./day]  Most frequent case (site and intent)
1     1×2 + 4×1                 5/6             0.19                      Lung/Prostate/Breast Palliative (Urgent)
2     1×2                       1/2             0.11
3     1×2 + 3×1                 4/5             0.11
4     1×2 + 15×1                16/17           1.43                      Breast Adjuvant
5     1×2 + 15×1 + 1×2 + 3×1    20/22           0.59
6     1×3 + 15×2                16/33           0.45
7     1×2                       1/2             1.42                      Lung/Breast/Prostate Palliative (Non-urgent)
8     1×2 + 4×1                 5/6             1.36
9     1×2 + 9×1                 10/11           0.57
10    1×2 + 3×1                 4/5             0.38
11    1×2 + 14×1                15/16           0.18
12    1×1                       1/1             0.18
13    1×2 + 19×1                20/21           0.29                      Head and Neck Radical
14    1×3 + 34×2                35/71           0.21
15    1×2 + 32×1                33/34           0.30                      Prostate Radical
16    1×2 + 36×1                37/38           0.29
17    1×2 + 21×1 + 1×2 + 14×1   37/39           0.15
18    1×2 + 32×1                33/34           0.04                      Prostate Adjuvant

the optimal value function approximation can be used to estimate the impact of today's decisions on the future performance of the system, not only for the treatment types considered in their computation but also for any other possible treatment type, since they represent the marginal cost of resource consumption irrespective of the treatment type. Our approach thus provides a systematic way of identifying effective booking guidelines for a much larger set of treatment types.

Between April 2009 and March 2010, the booking agents at the VCC faced an average demand of 8.25 treatment requests per day for the above cancer sites. A Poisson distribution with rate 8.25 provided the best fit for the total number of requests per day (p-value = 0.724). We assumed the demand rate for each treatment type was proportional to the number of cases of each type and set a maximum number of arrivals, obtained from historical data, in order to maintain a finite state space. The demand process is thus described by the arrival rates given in Table 3.3.

Table 3.4: Wait time penalties.
         Daily penalty within workday interval
Types    [0,1]  (1,5]  (5,10]  (10,20]  (20,30]  (30,40]  (40,100]
1–3      0      100    150     150      150      150      150
4–6      0      0      0       50       100      100      150
7–12     0      0      65      100      100      100      150
13–14    0      0      80      150      150      150      150
15–17    0      0      0       40       80       100      150
18       0      0      0       50       90       100      150

Regular-hour capacity is set at 120 appointment slots, which is equivalent to three identical treatment units operating eight hours a day. This number is almost six appointment slots lower than the average daily demand. The overtime capacity is 15 appointment slots, or one extra hour per treatment unit. The overtime cost is set at 100 per appointment slot, the discount factor remains at 0.99 and no postponements are allowed. The booking and the planning horizon are set at 100 and 136 workdays, respectively. In practice, the approximate optimal policy has proved to be independent of the length of the booking horizon provided that it exceeds the wait time target of the least urgent treatment type.

The wait time penalties are defined in Table 3.4. They were specified by Dr. Scott Tyldesley, an experienced radiation oncologist at the BCCA, on the basis of existing Joint Collegiate Council for Oncology guidelines for acceptable waits, which provide good practice and maximal acceptable wait times for urgent, radical, palliative and adjuvant cases [4], and estimates of the importance of radiation therapy in relation to disease sites.

Figure 3.6 shows the values of Cin for treatment types 1, 4, 7, 14, 17 and 18 (one type per wait time penalty scheme) for the first 25 workdays of the booking horizon. These six treatment types were chosen because they provide a good description of the overall booking preferences associated with each wait time penalty scheme. As in the previous section, we focus our attention on the coefficient Cin since it is the most relevant when identifying the approximate optimal booking actions. The value of Hm in this case is equal to zero from workday two to far into the future.
This, together with the fact that the values of Cin on workday one are negative and orders of magnitude lower than the overtime cost, makes evident that the approximate optimal policy for the evaluation instance will use overtime whenever demand is present and regular-hour capacity is not available.

Figure 3.6: Partial display of Cin, the adjusted cost associated with starting a treatment of type i on workday n, for types 1, 4, 7, 14, 17 and 18.

Figure 3.6 shows how the values of Cin are strictly increasing for treatment type 1 and non-monotonic for all other types. This is a direct consequence of the differences in capacity requirements and wait time penalties between types. Taking each type independently, the values of Cin identify the booking day preferences associated with each treatment class (see Figure 3.7). Treatments should be initiated according to these preferences and the available treatment capacity. Figure 3.7 suggests that treatments of type 1 should be initiated as soon as possible. Treatments of type 7, however, should be scheduled starting with workday one, then workday five, four, three and two. If a treatment of type 7 has not been scheduled within the first five workdays, then it should be booked on the first workday with available capacity after workday five. The preferences for treatment types 17 and 18 make evident the lack of capacity in the system. It is necessary to defer these two types of treatments in order to deliver more urgent types earlier.

Figure 3.7: Partial display of the booking day preferences for treatment types 1, 4, 7, 14, 17 and 18. For each type, a circle indicates the most preferable booking day and the arrows a recommended booking order.
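Booking day preferences of the kind displayed in Figure 3.7 amount to ranking workdays by increasing Cin and placing a request on the most preferred day that still has capacity. A minimal sketch, with illustrative Cin values rather than the fitted ones:

```python
# A small sketch of reading booking day preferences off the coefficients:
# rank workdays by increasing C_in, then book on the most preferred day
# with remaining capacity. The C_in values below are invented for
# illustration, not outputs of the fitted approximation.
import numpy as np

C_in = np.array([-500.0, 120.0, 60.0, 30.0, -10.0, 80.0, 90.0, 100.0])
preference_order = np.argsort(C_in)        # most preferred day first
capacity_left = np.array([0, 1, 1, 0, 0, 1, 1, 1])

day = next(d for d in preference_order if capacity_left[d] > 0)
print("preference order (1-based):", (preference_order + 1).tolist())
print("booked on workday", day + 1)        # -> workday 3 for these toy data
```

With these toy numbers the most preferred days (1, then 5, then 4) are full, so the request falls through to workday 3, mirroring the fall-back behavior described for treatment type 7 above.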
In order to demonstrate the impact of today's decisions on the future performance of the system, which is one of the main contributions of the research in this dissertation, Figure 3.8 shows the difference between the values of Cin associated with the approximate optimal policy and a myopic policy for the six treatment types shown in Figure 3.6. The differences reflect the importance of the opportunity cost that is not considered by the myopic policy. Under the approximate optimal policy, less urgent treatment types such as 17 and 18 have higher Cin values early in the booking horizon and are therefore less likely to be booked early. This observation supports the conclusions obtained from Figure 3.6.

Figure 3.8: Difference between the capacity allocation coefficients defining the approximate optimal policy and the myopic policy for treatment types 1, 4, 7, 14, 17 and 18.

The benefits from the proposed method are evaluated by simulating the performance of the scheduling system under the approximate optimal policy. Results are compared to the myopic policy. The simulation length was set at 1,500 days and statistics were collected for each of 10 runs after a warm-up period of 750 days. Table 3.5 demonstrates that the approximate optimal policy outperforms the myopic policy with respect to the total number of cases initiated within 1, 5 and 10 workdays (the bold font indicates the policy that provides the highest service level for each treatment type and wait time target). The same improvement is observed for most treatment types individually. The average percentage of treatments initiated within 1, 5 and 10 workdays increases from 5% to 26%, 29% to 53% and 73% to 96%, respectively. This is achieved at the expense of a negligible but statistically significant increase in the average overtime utilization of 3 minutes a day.
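The statistics-collection scheme described above (10 independent runs, a 1,500-day horizon and a 750-day warm-up) can be sketched as follows; the per-day service-level data here are synthetic placeholders, and the reported half-width uses a normal approximation rather than the exact procedure of the thesis:

```python
# A minimal sketch of simulation statistics collection: discard a warm-up
# period, compute one statistic per run, then report a mean and a 95%
# normal-approximation half-width across runs. Data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n_runs, horizon, warmup = 10, 1500, 750
# synthetic daily service levels (% of cases within target), one row per run
runs = rng.normal(loc=96.0, scale=3.0, size=(n_runs, horizon))

per_run = runs[:, warmup:].mean(axis=1)        # one statistic per run
mean = per_run.mean()
half_width = 1.96 * per_run.std(ddof=1) / np.sqrt(n_runs)
print(f"{mean:.1f} ± {half_width:.1f}")
```

Treating each run as a single observation, as here, is what produces the "value ± half-width" entries reported in Table 3.5.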
The approximate optimal policy achieves better service levels by scheduling less time-sensitive treatments later in the planning horizon, allowing other more important treatments to be scheduled earlier. Treatments of types 15 to 18 are booked starting on workday 10, allowing most of the treatments of types 7 to 14 to be scheduled within the first five workdays. The main drawback of the proposed policy is the increase in the wait times for treatments of types 1 to 3. However, the fact that urgent treatments wait slightly longer demonstrates a willingness to trade off a small increase in wait time for urgent treatments for a larger gain for less urgent treatments.

Table 3.5: Simulation results. Only the results for which a significance test shows the mean service level is better are highlighted (α = 0.05).

Myopic policy — % of the cases initiated within:

Type    1 workday   5 workdays   10 workdays   15 workdays   20 workdays
1       70 ± 10     94 ± 4       100 ± 0       100 ± 0       100 ± 0
2       82 ± 8      95 ± 5       100 ± 0       100 ± 0       100 ± 0
3       72 ± 12     95 ± 5       100 ± 0       100 ± 0       100 ± 0
4       1 ± 2       4 ± 3        57 ± 9        98 ± 2        100 ± 0
5       1 ± 1       4 ± 3        55 ± 10       98 ± 2        100 ± 0
6       1 ± 1       4 ± 3        53 ± 10       97 ± 3        100 ± 0
7       2 ± 2       75 ± 4       100 ± 0       100 ± 0       100 ± 0
8       2 ± 2       25 ± 9       76 ± 5        100 ± 1       100 ± 0
9       2 ± 2       16 ± 10      62 ± 9        99 ± 2        100 ± 0
10      2 ± 2       37 ± 8       89 ± 2        100 ± 0       100 ± 0
11      3 ± 2       15 ± 10      61 ± 10       99 ± 1        100 ± 0
12      4 ± 2       92 ± 2       100 ± 0       100 ± 0       100 ± 0
13      1 ± 1       19 ± 12      90 ± 3        100 ± 0       100 ± 0
14      1 ± 1       16 ± 10      61 ± 9        99 ± 2        100 ± 0
15      1 ± 1       2 ± 2        54 ± 9        97 ± 3        100 ± 0
16      0 ± 0       2 ± 1        52 ± 10       97 ± 3        100 ± 0
17      1 ± 1       2 ± 2        52 ± 8        97 ± 3        100 ± 0
18      1 ± 1       3 ± 3        55 ± 11       98 ± 2        100 ± 0
total   5 ± 2       29 ± 4       73 ± 6        99 ± 1        100 ± 0

Approximate optimal policy — % of the cases initiated within:

Type    1 workday   5 workdays   10 workdays   15 workdays   20 workdays
1       66 ± 7      81 ± 8       97 ± 2        100 ± 0       100 ± 0
2       75 ± 9      84 ± 8       97 ± 2        100 ± 0       100 ± 0
3       66 ± 11     80 ± 10      97 ± 3        100 ± 0       100 ± 0
4       17 ± 6      17 ± 6       95 ± 4        100 ± 0       100 ± 0
5       11 ± 5      11 ± 5       94 ± 5        100 ± 0       100 ± 0
6       12 ± 5      12 ± 5       94 ± 5        100 ± 0       100 ± 0
7       43 ± 9      82 ± 8       97 ± 3        100 ± 0       100 ± 0
8       31 ± 9      79 ± 9       96 ± 3        100 ± 0       100 ± 0
9       27 ± 8      78 ± 9       96 ± 3        100 ± 0       100 ± 0
10      38 ± 8      81 ± 8       96 ± 3        100 ± 0       100 ± 0
11      25 ± 9      77 ± 9       96 ± 3        100 ± 0       100 ± 0
12      36 ± 7      97 ± 2       100 ± 0       100 ± 0       100 ± 0
13      23 ± 9      80 ± 8       97 ± 3        100 ± 0       100 ± 0
14      11 ± 4      77 ± 9       95 ± 3        100 ± 0       100 ± 0
15      0 ± 0       0 ± 0        94 ± 5        100 ± 0       100 ± 0
16      0 ± 0       0 ± 0        94 ± 4        100 ± 0       100 ± 0
17      0 ± 0       0 ± 0        94 ± 5        100 ± 0       100 ± 0
18      0 ± 0       0 ± 0        93 ± 6        100 ± 0       100 ± 0
total   26 ± 7      53 ± 6       96 ± 3        100 ± 0       100 ± 0

The results show no statistically significant difference between the two policies in terms of regular-hour capacity utilization. Both policies use, on average, about 99.5% of the available regular-hour capacity. However, the approximate optimal policy outperforms the myopic policy with respect to the average discounted cost (p-value = 0.000). The average discounted cost associated with the approximate optimal policy is $121,973.57, while the average discounted cost associated with the myopic policy is $185,843.06.

The total time required to identify the approximate optimal policy was 1 hour and 45 minutes, and the total time needed to simulate each policy was 3 hours and 20 minutes. The computer used to run our algorithm was a 3.00GHz Quad Core PC with 16GB of RAM. The simulation model was also implemented using GAMS 23.5 and CPLEX 12.2.

As a final remark, we note that we have been able to identify an analytical solution for the values of the approximation parameters defining the optimal value function approximation, at least for the problems analyzed in this chapter. Its partial description is given by equations (3.24) to (3.26), where l and T are the length and the wait time target associated with the most time-sensitive treatment type:

  U*m = U*_{m+l},       m = 1, ..., (T − 1)
  U*m = λ^(m−1) h,      m = T, ..., (M − 1)        (3.24)
  U*m = 0,              m = M

  V*m = 0   ∀m                                     (3.25)

  W*i = ∑_{k=Ti}^{Ti+li−1} r_{i(k+1−Ti)} U*k   ∀i   (3.26)

The optimal value of W0 is not listed since it is not part of the definition of the coefficients characterizing the approximate optimal policy. An immediate extension to the methodology presented in this chapter would be to prove the analytical structure of the coefficients defining the optimal value function approximation.

3.6 Conclusion

This chapter describes the use of the linear programming approach to ADP as a means of solving a radiation therapy appointment scheduling problem, a problem that is computationally intractable using standard methods. Starting from the linear programming approach to discounted infinite-horizon MDPs and employing an affine structure to represent the value function in the equivalent linear programming model, our models provide a systematic way of identifying effective booking guidelines for radiation therapy. These guidelines could be used in practice to significantly reduce wait times for radiation therapy and thus to decrease the impact of delays on cancer patients' health.

The results presented in Section 3.5.2 show how the approximate optimal policy outperforms the myopic policy, which is an approximation of the current practice. The percentage of treatments initiated within the clinical benchmark (10 workdays) increases, on average, from 73% to 96% under the proposed policy. This improvement brings with it a negligible but statistically significant increase in the average overtime utilization of 3 minutes a day.

Although primarily adapted to radiation therapy treatment scheduling, the linear programming approach presented in this chapter is applicable to any capacity allocation problem involving multiple priority classes and multiple appointment requirements (appointments do not need to be on consecutive days). An alternative application of our models could be in the definition of dispatching rules (production plans) in manufacturing systems.
They could also be useful in supporting inpatient admissions and post-operative bed allocation decisions in hospitals.

It is important to note that the setting used to evaluate the benefits from the proposed method does not reflect an actual stand-alone cancer center. It best represents a small center with identical machines catering to specific cancer sites, or a subsection of a larger cancer center with identical machines to which specific cancer sites are preferentially booked. The model also avoids the additional complexity of explicit planning time considerations.

Possible extensions involve elements of the original problem that were excluded from our formulation. The inclusion of multiple treatment units and unit compatibility restrictions, in addition to wait lists, would significantly increase the complexity of the problem. We are also interested in proving the analytical solution for the coefficients defining the optimal value function approximation, computing better bounds on the optimality gap and applying other approximate dynamic programming techniques to the radiation therapy appointment scheduling problem. Our models could also be extended to consider cancellations and no-shows.

Chapter 4

The Appointment Scheduling Game

This chapter describes the Appointment Scheduling Game (ASG), a teaching tool we developed to illustrate advance patient scheduling concepts and practices in health care systems in which patients are characterized by their urgency level. The game introduces the problem faced by a booking agent in an appointment scheduling office and offers students the opportunity to learn by doing. While playing the game, students are engaged in a simulated and readily understandable experience of a real scheduling problem. They must assess the impact of their decisions in order to efficiently allocate available service capacity to different patient categories and thus provide good service levels.
Played with a pen, paper, a printed calendar, multi-colored poker chips and a die, the ASG illustrates the main advance patient scheduling challenges in systems with multiple patient categories, random patient arrivals and limited service capacity. It also provides instructors with a platform to discuss some basic concepts of Markov decision processes and decision analysis. The game has been very useful for teaching not only health care professionals and business students but also health care operations research students. To date, it has been used in several courses at the University of British Columbia (UBC), including "Managing Health Care System Operations" (MBA Program), "Managing Patient Flow" (Executive MBA in Health Care Program) and "Logistics and Operations Management" (Commerce Undergraduate Program). It has also been used by colleagues at the University of Ottawa, McGill University and the University of Michigan. The game provides an excellent platform for linking several management science and operations research concepts with a real-world problem.

4.1 Introduction

Health care systems are growing in complexity, size and funding needs. As a consequence, health care organizations are now demanding more from their managers – especially around business fundamentals – to efficiently coordinate scarce resources and administer tight budgets. Clinical background and extensive experience are no longer sufficient to handle the complexity of operations. Increasingly, health care managers are seeking business education to gain new leadership skills and to learn new decision analysis tools.

One of the most challenging problems facing health care organizations nowadays is the efficient allocation of medical resources to patient needs. This is due to the complexity introduced by a wide range of services, different categories of urgency, high patient volumes, and resource heterogeneity and interactions.
Poor capacity allocation decisions are usually associated with increased patient wait times, wasted resources and, potentially, a deterioration in patients' health. Operations Research and Management Science (ORMS) concepts and methodologies address such resource allocation issues. They have been used effectively, for example, in the manufacturing, hospitality, airline and retail industries. However, despite their importance, their dissemination to the health care community has not been completely successful [27]. Managerial concepts such as efficiency, optimal resource allocation, opportunity cost and decision rules are usually difficult to explain in abstract terms, especially to health care professionals and students without prior business or analytical background.

In this chapter, we recognize the need to transfer research and case studies of successful implementations, together with educational tools, to health care professionals and business students. We describe the Appointment Scheduling Game (ASG), a teaching exercise we developed to illustrate advance patient scheduling practices in health care systems in which patients are characterized by their urgency level. The game introduces students to the impact of uncertainty on patient scheduling decisions and reinforces the importance of managing variability. It provides instructors and students with a vehicle for discussing appointment scheduling rules, levers for managing capacity, surge capacity, performance measurement, challenges in measuring wait times, and simulation. Through the game, students assume the role of a booking agent and learn about advance patient scheduling decisions and their challenges.

The remainder of this chapter is organized as follows. In Section 4.2, we present a literature overview of the use of classroom games to teach ORMS concepts. Then, in Section 4.3, we provide a detailed description of the game.
The description includes the tools used to play it and the game’s target audiences and corresponding learning objectives. In this section, we also characterize the two sessions into which the game is organized and discuss some possible extensions. In Section 4.4, we comment on our past experiences using the game. Finally, in Section 4.5, we conclude Chapter 4 with a brief discussion of the challenges and lessons we learned developing and teaching the game and some of the steps we are undertaking to address them.

4.2 Literature review

Classroom games are used in many different fields to facilitate the learning of abstract topics. They are very common in disciplines such as Applied Mathematics, Operations Research, Management Science, Computer Science and Artificial Intelligence. They can be particularly useful for illustrating complex ideas not easily taught using traditional teaching methods. Games have been used to introduce concepts such as decision making under uncertainty and dynamic programming (“the counterfeit coin problem”), recursive algorithms in computer science (“the towers of Hanoi”) and constraint programming (“the N-queens problem”), to mention just a few examples [58]. In the operations management field, perhaps the most famous example is the “Beer Game” [3] – a role-playing game developed at MIT in the 1960s to clarify the advantages of taking an integrated approach to managing a supply chain. Close in popularity, but with a completely different objective, are the numerous versions of the television game show “Who Wants to Be a Millionaire” [12]. This game, however, is commonly used as an effective way to capture the interest of students rather than for teaching specific concepts. Additionally, examples based on the television game show “But Who’s Counting” are used in Puterman [49] to illustrate basic concepts of Markov decision processes.
Even though games, puzzles and paradoxes have long been recognized as important tools for developing educationally rich material, the literature in ORMS in general – and introductory textbooks in particular – scarcely makes use of them. This is mainly because realistic problems tend to be too complex, and therefore not particularly suitable for teaching material, or because they require complicated solution algorithms. We refer the reader to Griffin [26] for a discussion of the use of games in teaching ORMS concepts and related references. Often, the most educationally effective games are the simplest. For example, a large number of production issues can be easily discussed by carrying out a task as simple as setting up an assembly line where students fold paper to simulate airplane manufacturing [26]. The ASG is a very simple exercise that challenges students to schedule appointments in a simplified health care setting. To the best of our knowledge, this is the first game designed to analyze advance patient scheduling decisions and their challenges. A somewhat related game is the “Operating Room Manager Game” [28]. This game, however, is intended to provide managerial insights into administering a large hospital’s operating room department, at various levels of control, rather than focusing on operational decisions and their challenges.

4.3 Game description

The ASG considers a hypothetical medical center with a fixed capacity of three appointment slots per day. Patients requesting appointments are classified into three urgency categories, each with a different wait time target. Students are divided into groups. Each group is asked to assume the role of a single booking agent who must assess the impact of his/her decisions in order to efficiently allocate available service capacity to incoming appointment requests. The challenge of allocating activities to group members is implicit.
Since no performance metrics are given in advance, students are forced to develop their own metrics, and the appropriate metrics to use become a topic of discussion during the game. The game is played with a pen, paper, a printed calendar, multi-colored poker chips and a die. Pen and paper are used to keep a record of performance metrics. The calendar is employed to keep track of unused future capacity. The number of days on it must allow students to simulate the booking agent’s task for at least twenty days. Also, the size of each day’s cell must be larger than a poker chip (see Figure 4.1). Multi-colored poker chips represent patient requests, with different colors indicating different patient wait time targets. Each chip symbolizes a different appointment request. The die is used to generate random patient arrivals.

Figure 4.1: Sample ASG calendar.

Three different chip colors (red, blue and white) are used to indicate three different urgency categories according to Table 4.1. Patients of category 1 are the most urgent and are associated with a wait time target of 2 days. Patients of category 2 are the second most urgent and are associated with a wait time target of 4 days. Patients of category 3 are the least urgent and are associated with a wait time target of 6 days. The game assumes that each patient requesting service needs only one appointment slot.

Table 4.1: The ASG patient representation.

Chip color   Urgency category   Wait time target
Red          1                  2 days
Blue         2                  4 days
White        3                  6 days

The game is played in two sessions of about thirty minutes each. In each session, groups are asked to simulate the booking agent’s task for a period of ten days. For each simulated day, groups must complete the following steps: 1. Roll the die. 2.
Reach into a sack or envelope containing the multi-colored poker chips and pick out as many chips as the number that appeared on the die. 3. Place the chips on the calendar without exceeding three chips – the daily capacity – on any day. To simplify things, students must assume that appointments can be scheduled on any calendar day, including weekends. They are not allowed to re-schedule patients (move chips from one day to another) or postpone booking decisions to the next simulated day. 4. Move to the next day and go to Step 1.

Making good appointment scheduling decisions in Step 3 requires significant foresight. This is because each day’s decisions will clearly impact the number of appointment slots available for future demand. For example, if category 3 patients are booked too soon, then there may later be insufficient capacity in the first two days for category 1 demand. Conversely, if category 3 patients are booked too far into the future, there is the potential for idle capacity. We advise instructors to adopt a passive role during the game, so that students engage in a more independent decision-making process. In this way, they will rely on their own experience playing the game to make better decisions. The diagram in Figure 4.2 represents the game “board” after a few simulation periods.

Figure 4.2: The ASG “board”. The calendar helps keep track of the number of appointment slots occupied on each day, the multi-colored poker chips represent patient requests and the die is used to generate random patient arrivals. The numbers on the calendar days indicate the number of chips placed on each day. The arrow indicates the current booking day.

4.3.1 Target audiences and learning objectives

The game was originally designed for health care professionals and business students. However, it has also been used to teach some basic concepts of advance patient scheduling and Markov decision processes to health care operations research students.
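For readers who prefer code to chips, the daily routine described in Steps 1 to 4 can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: the uniform chip mix, the greedy earliest-free-day placement rule and all identifiers are hypothetical; in class the game is, of course, played by hand.

```python
import random

CAPACITY = 3                   # appointment slots per day
CHIP_MIX = [1, 2, 3]           # the sack: red (1), blue (2), white (3) chips

def simulate_day(schedule, today, rng):
    """Simulate Steps 1-4 for one booking day.

    schedule maps day -> number of chips already placed on that day.
    Chips are booked greedily into the earliest day with free capacity
    (students are, of course, free to choose smarter placements).
    """
    arrivals = [rng.choice(CHIP_MIX) for _ in range(rng.randint(1, 6))]  # Steps 1-2
    for category in sorted(arrivals):                                    # Step 3
        day = today
        while schedule.get(day, 0) >= CAPACITY:
            day += 1
        schedule[day] = schedule.get(day, 0) + 1
    return today + 1                                                     # Step 4

rng = random.Random(7)
schedule = {}
day = 1
for _ in range(10):            # one ten-day session
    day = simulate_day(schedule, day, rng)
```

Running the loop for ten simulated days fills the early calendar days to capacity, which is exactly the congested starting point Session 2 builds on.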
We next describe the learning objectives for each specific audience.

Health care professionals and business students

This target audience corresponds to health care professionals and business students (graduate and undergraduate) interested in working in management or analysis in the health care sector, or interested in exploring some useful concepts for service operations management. The main objectives of the game in this case are twofold. First, to introduce students to the challenges of capacity allocation problems in general – and advance patient scheduling problems in particular – including the challenges in measuring wait times and assigning priorities to different patient types. Second, to provide students with an idea of good appointment scheduling rules and performance metrics for this type of system. To achieve these goals, the game enables students to explore, by playing under realistic conditions, the most relevant challenges faced when dealing with advance appointment scheduling decisions. Since no performance metrics are given in advance, they are forced to develop their own metrics and rules in order to make better decisions. The game also provides instructors with a platform to generate discussion about the characteristics of good scheduling rules, appropriate performance metrics and other relevant concepts associated with this type of problem. Although its focus is primarily on advance patient scheduling and capacity allocation decisions, the game can also be used to introduce some basic concepts of process mapping (flowcharts and timelines), decision analysis (modeling and probability assessment), statistical analysis (descriptive statistics, variability and statistical significance) and simulation (random numbers and sample estimates).
Operations research students

This target audience corresponds to Masters and Doctoral students in health care operations research or related disciplines interested in understanding some basic concepts of advance patient scheduling and/or Markov decision processes. In this case, the game objectives are also twofold. First, to train students in dynamic programming modeling techniques. Second, to provide students with a framework for discussion regarding the complexity of advance patient scheduling problems, dynamic programming models and solution algorithms. To this end, the game offers students a better way of capturing the dynamic nature of the patient scheduling problem than, for example, case studies or problem sets. Many students may find it difficult to develop mathematical models from a given “wordy” description of a problem. Ideally, graduate students would have some experience either working in the health care sector or dealing with scheduling problems. If this is not the case, we suggest giving them an introductory reading on appointment scheduling in health care. We recommend Gupta and Denton [27]. We refer instructors to Patrick et al. [44] for an MDP formulation of the advance patient scheduling problem.

4.3.2 Rounds of the game

The game is organized in two sessions of about thirty minutes each, regardless of the target audience. A 15-20 minute discussion is recommended at the end of each session. Recommended discussion questions are separated by target audience.

Session 1: Warm-up

Starting from a pre-specified day, students must repeat the live simulation process described earlier in this section until each group has scheduled all the patients whose appointment requests arrive on the 10th day after the initial date. The main objectives of this session are twofold. First, to give students time to become familiar with the game and to fully understand how to play it (i.e., to fully understand the booking agent’s task).
Second, to initialize the system by partially filling the current appointment schedule. Some suggested discussion questions for instructors to use with health care professionals and business students are:

• What is realistic and what is unrealistic about the game setting?
• In what situations is the type of scheduling considered by the game relevant?
• What levers for managing capacity are available to regulate the scheduling process?
• What are relevant performance metrics?

In addition to quickly answering the four questions above, operations research students should be asked to formulate the advance patient scheduling problem as an infinite-horizon MDP. To do this, they should be told to assume the following:

• The booking agent receives appointment requests from I different patient categories. The daily number of requests from patients of category i follows a discrete probability distribution. The wait time target (maximum recommended wait) of category i patients is Ti days.
• The number of appointment slots booked on any given day cannot exceed a daily capacity of C units, and the booking agent may schedule appointments at most N days in advance.

As part of their formulations, operations research students should provide a detailed timeline of the booking agent’s task and a cost structure that would allow them to consider different patient categories and maximize the number of patients booked within the corresponding target times. The MDP models should be clearly stated in terms of decision epochs, state variables, action sets, transition probabilities and optimality equations. Some performance metrics that should be part of the discussion at the end of this session are: average wait times, service levels (percentage of patients booked within the corresponding wait time targets), average capacity utilization and average time to first or second available appointment.
This last performance metric was suggested by one of our students in one of the most recent versions of the game.

Session 2: Play

Students must continue the simulation process for ten more days, until they have scheduled all patients whose appointment requests arrive on the 20th day after the initial date. On this occasion, students must use pen and paper to keep track of the information they consider relevant to measuring the performance of the system, including demand. The main objectives of this session are also twofold. First, to allow students to apply what they have learned from the previous session. Second, to generate a discussion regarding the characteristics of good scheduling rules and their performance under different scenarios. Some recommended discussion questions to use with health care professionals and business students are:

• What information should a booking agent keep track of?
• What scheduling rules would you use to book patient appointments?
• How do these scheduling rules perform?
• How should we expect the system to perform in the long run? Why?

By the end of this session, students will have noticed that the system is under-capacitated. Note, however, that this may not always be the case, since some students may lack a good background in probability. The average number of daily requests is 3.5, 0.5 units higher than the system’s daily capacity (the current version of the game does not consider any source of surge capacity). For this reason, students should not expect any scheduling rule to perform well in the long run. Despite this, we encourage instructors to discuss the advantages and disadvantages of the following four patient scheduling rules:

• Policy 1. Book patients as soon as possible, in increasing urgency order. This rule corresponds to a myopic policy. It ignores the impact of today’s decisions on the future performance of the system.

• Policy 2.
Book as many category 1 patients as possible into days one and two, starting with day one. For each successive patient category, book incoming demand into any available appointment slot within the corresponding wait time target, starting with day one, then the target day, and working down to day two. Book any remaining demand late (as soon as possible). This policy corresponds to the booking guidelines described in Section 2.3.

• Policy 3. Book patients as soon as possible, in increasing urgency order, on the day associated with the minimum number of bookings within the corresponding wait time targets. Book any remaining demand late. This rule corresponds to the DMB policy defined in Section 2.6.

• Policy 4. Book patients as soon as possible, in increasing urgency order. Leave one unit of capacity free for later-arriving category 1 demand on each day but day one. This rule is characterized by a protection level of one capacity unit for category 1 patients.

In addition to discussing some of the questions intended for health care professionals and business students, operations research students should be asked to describe a methodological approach they would use to solve their models. Discussion should focus on the following questions:

• What are the main computational challenges they should expect to face?
• How could they deal with these challenges?
• Would it be possible to find the optimal decisions for every possible state of the system?
• What is the possible form of the optimal patient scheduling policy?
• How should we expect this policy to be affected by different parameter values?

Additionally, operations research students could be asked to discuss how they would modify their models to incorporate the following elements: the possibility of diverting patients at a given cost, the possibility of delaying scheduling decisions for a few days, patients who receive service across multiple days and for irregular lengths of time, and cancellations and no-shows.
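To make the four rules concrete, each of them can be expressed as a function that chooses a booking day for a single arriving request. The Python sketch below is a hedged illustration, not code from the game or the thesis models: the data structures, function names and tie-breaking details are our own assumptions.

```python
CAPACITY = 3
TARGETS = {1: 2, 2: 4, 3: 6}   # urgency category -> wait time target (days)

def free(schedule, day, cap=CAPACITY):
    return schedule.get(day, 0) < cap

def book(schedule, day):
    schedule[day] = schedule.get(day, 0) + 1
    return day

def policy1(schedule, today, category):
    """Policy 1 (myopic): earliest day with a free slot."""
    day = today
    while not free(schedule, day):
        day += 1
    return book(schedule, day)

def policy2(schedule, today, category):
    """Policy 2: day one first, then the target day working down to
    day two; any remaining demand is booked late (as soon as possible)."""
    for day in [today] + [today + d - 1 for d in range(TARGETS[category], 1, -1)]:
        if free(schedule, day):
            return book(schedule, day)
    return policy1(schedule, today, category)      # book late

def policy3(schedule, today, category):
    """Policy 3 (DMB-like): least-loaded day within the wait time target."""
    day = min(range(today, today + TARGETS[category]),
              key=lambda d: schedule.get(d, 0))
    if free(schedule, day):
        return book(schedule, day)
    return policy1(schedule, today, category)      # book late

def policy4(schedule, today, category):
    """Policy 4: myopic, but each day after day one keeps one slot
    protected for later-arriving category 1 demand."""
    day = today
    while True:
        cap = CAPACITY if (category == 1 or day == today) else CAPACITY - 1
        if free(schedule, day, cap):
            return book(schedule, day)
        day += 1
```

For example, under Policy 2 a fourth category 2 request arriving when day one is already full jumps to its target day (day four) rather than to day two, and under Policy 4 a category 3 request is turned away from a day holding the protected slot while a category 1 request is not.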
4.3.3 Student evaluation

Since the ASG is intended to motivate managerial concepts and to illustrate some basic theoretical concepts, as opposed to demonstrating mastery of specific theory, we recommend not grading any of the activities that are part of it. If a grade is necessary, students in either audience could be evaluated through a written essay on patient scheduling strategies and their performance in practice. Operations research students could also be asked to provide a revised version of their MDP models and a brief discussion of the challenges of solving advance patient scheduling problems.

4.3.4 Supporting materials

Over the last couple of years, we have generated various supporting materials for instructors interested in using the game. These include a couple of Microsoft PowerPoint presentations, a game animation, simulation models and several result sets, all available upon request. The simulation models were developed to study the performance of the system under different patient scheduling policies. Their implementation was initially carried out in GAMS, a high-level modeling system for mathematical programming and optimization. The game animation was developed in Microsoft Excel using Microsoft Visual Basic. We have created four result sets to discuss and compare the performance of policies 1 to 4 described earlier in this section in terms of mean average wait times, mean service levels (percentage of patients booked within the corresponding wait time targets) and mean average daily capacity utilization. The first corresponds to the game setting. The second corresponds to the game setting but assumes a daily capacity of 4 appointment slots. The third corresponds to the game setting but this time assumes that demand is generated from a Poisson distribution with a mean of 3 appointment requests per day. The last corresponds to the small outpatient clinic described in Section 2.6.
Result sets were generated by simulating the scheduling system under each policy for 100 runs of 750 days each, with statistics collected after a warm-up period of 250 days. The overall conclusions regarding policies 1 to 4 clearly depend on the system setting. Simulation results are presented in figures 4.3 to 4.6 and summarized in tables 4.2 to 4.5. In the original game setting, the system is under-capacitated. Thus, all policies perform equally badly (see Figure 4.3 and Table 4.2). The mean average daily capacity utilization is 100%, the mean service levels are 0% and the mean average wait times are longer than 100 days (and increasing). Policy 4, the protection-level policy, provides a slightly shorter mean average wait for category 1 patients, but its value is still over 90 days. When the daily capacity is four appointment slots, the system is over-capacitated. In this case, all policies perform well (see Figure 4.4 and Table 4.3). The mean average daily capacity utilization is about 3.5 appointment slots (equal to the expected number of requests). All policies provide high mean service levels and reasonable mean average wait times. Policy 1, the myopic policy, is the best in terms of mean average wait times; however, it generates a slightly lower mean service level for category 1 patients. The main drawback of policies 2 and 3 in this case is the occasionally unnecessarily long wait times for lower urgency categories, as already discussed in Chapter 2. Since a detailed discussion of the simulation results is outside the scope of this chapter, we leave to instructors the task of analyzing the other two result sets. We refer instructors interested in knowing more about advance patient scheduling problems to Gupta and Denton [27], Patrick and Puterman [43], Patrick et al. [44], Erdelyi and Topaloglu [22] and Saur´e et al. [52]. We recommend Gupta and Denton [27] as an introductory reading on patient scheduling.
This paper does an excellent job of providing background on appointment scheduling problems.

Figure 4.3: Performance of policies 1 to 4 for the original game setting.

Figure 4.4: Performance of policies 1 to 4 for the game setting but assuming a daily capacity of 4 appointment slots.

Figure 4.5: Performance of policies 1 to 4 for the game setting but assuming that demand is generated from a Poisson distribution with mean 3 appointment requests per day.

Figure 4.6: Performance of policies 1 to 4 for the small outpatient clinic described in Section 2.6.

Table 4.2: A comparison of policies 1 to 4 for the original game setting.

Criterion                  Category  Policy 1        Policy 2        Policy 3        Policy 4
Avg. capacity utilization  –         3.00 ± 0.00     3.00 ± 0.00     3.00 ± 0.00     3.00 ± 0.00
Avg. wait times            1         107.29 ± 25.59  107.97 ± 25.43  107.47 ± 25.56  90.50 ± 30.48
                           2         107.97 ± 26.06  108.65 ± 25.91  108.15 ± 26.04  116.38 ± 28.03
                           3         108.12 ± 25.52  108.81 ± 25.35  108.30 ± 25.50  116.70 ± 27.61
Service levels             1         0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00
                           2         0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00
                           3         0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00     0.00 ± 0.00

Table 4.3: A comparison of policies 1 to 4 for the game setting but assuming a daily capacity of 4 appointment slots.

Criterion                  Category  Policy 1       Policy 2       Policy 3       Policy 4
Avg. capacity utilization  –         3.51 ± 0.12    3.51 ± 0.12    3.51 ± 0.12    3.51 ± 0.12
Avg. wait times            1         1.38 ± 0.25    1.31 ± 0.30    1.16 ± 0.20    1.24 ± 0.18
                           2         1.59 ± 0.28    2.19 ± 0.38    1.68 ± 0.31    1.62 ± 0.31
                           3         1.86 ± 0.27    4.15 ± 0.43    2.98 ± 0.47    1.96 ± 0.32
Service levels             1         92.76 ± 8.63   93.35 ± 8.40   96.92 ± 5.61   95.99 ± 5.55
                           2         99.25 ± 2.52   97.95 ± 4.51   99.07 ± 2.89   98.71 ± 3.43
                           3         99.97 ± 0.59   99.74 ± 1.31   99.92 ± 0.82   99.89 ± 0.85

Table 4.4: A comparison of policies 1 to 4 for the game setting but assuming that demand is generated from a Poisson distribution with mean 3 appointment requests per day.

Criterion                  Category  Policy 1       Policy 2       Policy 3       Policy 4
Avg. capacity utilization  –         2.98 ± 0.07    2.98 ± 0.07    2.98 ± 0.07    2.97 ± 0.07
Avg. wait times            1         12.77 ± 16.12  13.68 ± 16.31  12.99 ± 16.26  7.84 ± 10.12
                           2         13.11 ± 16.15  13.68 ± 16.31  13.44 ± 16.12  16.31 ± 19.66
                           3         13.51 ± 16.38  14.84 ± 15.90  14.09 ± 15.97  17.03 ± 19.93
Service levels             1         10.75 ± 26.85  10.46 ± 26.45  13.23 ± 32.42  25.2 ± 37.67
                           2         19.75 ± 43.26  15.00 ± 35.49  18.58 ± 41.87  15.70 ± 34.81
                           3         28.13 ± 54.19  22.44 ± 47.41  26.20 ± 51.95  20.35 ± 45.63

Table 4.5: A comparison of policies 1 to 4 for the small outpatient clinic described in Section 2.6.

Criterion                  Category  Policy 1       Policy 2       Policy 3       Policy 4
Avg. capacity utilization  –         9.96 ± 0.11    9.95 ± 0.12    9.95 ± 0.11    9.96 ± 0.11
Avg. wait times            1         7.92 ± 8.35    9.00 ± 11.94   6.47 ± 10.68   6.73 ± 7.95
                           2         8.27 ± 8.39    14.26 ± 7.57   10.73 ± 8.79   8.79 ± 8.61
                           3         8.60 ± 8.38    20.53 ± 3.46   18.76 ± 4.00   9.26 ± 8.57
Service levels             1         54.91 ± 66.62  54.84 ± 63.15  67.59 ± 61.33  63.80 ± 63.52
                           2         87.04 ± 44.92  69.52 ± 60.01  80.06 ± 51.75  84.52 ± 48.42
                           3         97.10 ± 19.26  88.91 ± 39.55  93.82 ± 30.85  96.27 ± 23.52

4.3.5 Game variations

There are at least two natural variations of the game. The first is a competitive version in which all groups are given the same initial conditions (patient schedule) and patient arrivals. Students compete in terms of service levels or wait time penalties. The group with the highest service levels or the lowest wait time penalty by the end of the game wins. This version of the game seeks to use competition as a way to engage students in a more thoughtful decision-making process. The second variation considers the existence of an alternative capacity source. Groups are given the possibility of diverting patients to this new capacity source, but the total number of diversions is limited. Thus, students must additionally decide when to divert patients and which type of patient to divert.
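For instructors who prefer to regenerate result sets themselves rather than request ours, the replication scheme of Section 4.3.4 (independent runs with statistics collected after a warm-up period) can be sketched as follows. The original models were written in GAMS; this Python fragment is only an illustrative stand-in, and the uniform chip mix and the simple book-as-soon-as-possible rule are our own assumptions.

```python
import random

CAPACITY, TARGETS = 3, {1: 2, 2: 4, 3: 6}

def run_once(days, warmup, rng):
    """One simulated run: random arrivals booked as soon as possible."""
    schedule, waits = {}, {1: [], 2: [], 3: []}
    for today in range(1, days + 1):
        for _ in range(rng.randint(1, 6)):        # die roll: 1-6 requests
            cat = rng.choice([1, 2, 3])           # chip draw (assumed uniform)
            day = today
            while schedule.get(day, 0) >= CAPACITY:
                day += 1                          # book as soon as possible
            schedule[day] = schedule.get(day, 0) + 1
            if today > warmup:                    # collect statistics after warm-up
                waits[cat].append(day - today)
    # service level: share of post-warm-up patients booked within their target
    return {c: sum(w <= TARGETS[c] for w in ws) / max(len(ws), 1)
            for c, ws in waits.items()}

def replicate(runs, days, warmup, seed=0):
    """Average per-category service levels over replicated runs."""
    rng = random.Random(seed)
    results = [run_once(days, warmup, rng) for _ in range(runs)]
    return {c: sum(r[c] for r in results) / runs for c in TARGETS}

# Scaled-down version of the 100-run, 750-day, 250-day warm-up scheme
levels = replicate(runs=10, days=150, warmup=50)
```

Because mean demand (3.5 requests per day) exceeds the daily capacity of 3, the backlog grows throughout each run, and even this modest scheme shows the post-warm-up service levels collapsing toward zero, consistent with Table 4.2.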
4.4 Our experience using the game

To date, the ASG has been used at UBC in the following courses: “Managing Health Care System Operations” (MBA Program), “Managing Patient Flow” (Executive MBA in Health Care Program) and “Logistics and Operations Management” (Commerce Undergraduate Program). It has also been used by instructors at the University of Ottawa, McGill University and the University of Michigan. In 2011, the game was used in the Healthcare Operations and Information Management Summer School, an NSERC CREATE initiative that brought together more than thirty Doctoral students and Post-doctoral fellows from North America and Europe for advanced training in health care operations and information management. Based on our experience using the game, informal feedback and course evaluations, we believe the ASG is a very useful tool for illustrating and addressing advance patient scheduling and capacity allocation challenges that are difficult to communicate using traditional teaching methods. The game, in this sense, has proven to be a very effective enhancement to traditional lecture-based teaching. In general, student reactions to the game have been favorable. Most students enjoy playing the game and the challenges it brings. We think one of the main reasons for its success is the fact that it is very easy to implement and play. Additionally, students do not necessarily need any health care experience or any background in patient scheduling in order to play. The realism of the game creates motivation and excitement, encouraging students to participate. It also generates rich discussion. The game is clearly better suited to capturing the dynamics of scheduling problems than static case studies or problem sets.
4.5 Conclusion

Supported by our experience using the game and feedback provided by students and instructors, we believe the ASG has become a very effective method for engaging health care professionals and business students with patient scheduling and capacity allocation concepts. The game also has the potential to become a very useful tool for introducing operations research students to some basic concepts of advance patient scheduling and Markov decision processes. Overall, the game provides instructors with a clear and easily understandable platform to link abstract concepts with real-world problems, and provides students with an experience otherwise unattainable in a classroom. We anticipate the game will continue to be used in the courses listed in Section 4.4 and expect it to start being used in courses such as “Decision Analysis” and “Markov Decision Processes” (Master of Management in Operations Research) at UBC and in other universities in North America. Despite the game’s success, there are still a couple of challenges that need to be addressed. The first has to do with enhancing the game’s effectiveness and accessibility as a teaching tool. To address this challenge, we will soon start the implementation of an improved web-based version of the game. We believe that with appropriate instructional, graphical and interface designs, and the power and connectivity provided by computers in general and web platforms in particular, the game will offer instructors powerful educational tools to motivate and communicate the intended concepts. Through a web-based implementation, a wider selection of scenarios could be offered, each allowing different conditions. These additional settings could be used to illustrate a larger variety of situations, each with its own challenges. In addition, an improved version of the game using web technologies would make the game easily accessible at no cost to instructors at UBC and other universities.
A web-based implementation will allow instructors to achieve the full potential of the game and enable students to have an enhanced interactive experience. The second challenge to be addressed is also related to the educational effectiveness of the game. To date, we have not implemented any formal methods to evaluate the game’s impact in relation to each learning objective. To address this challenge, questionnaires to measure students’ understanding of the intended concepts and to provide feedback on all aspects of the game will be developed, administered and evaluated. Instructors’ feedback will be sought as well. Based on constructive feedback, objectives, instructions and activities will be revised and re-evaluated periodically. The educational effectiveness of the game will also be indirectly evaluated through the number of courses per year in which the game is used, the number of students per year who use the game and the number of use requests per year received from other universities.

Chapter 5

Conclusions, Extensions and Further Applications

The research in this dissertation focused on two alternative techniques for solving dynamic multi-priority patient scheduling problems – problems that are computationally intractable using standard methods. In Chapter 2, we considered a simple multi-priority patient scheduling problem and described a simulation-based solution algorithm based on a parametric non-linear value function approximation, whereas, in Chapter 3, we considered a more general multi-priority patient scheduling problem involving multiple appointment requirements and different service durations and described a linear programming solution method based on an affine value function approximation. While the simulation-based method was developed in terms of a post-decision state variable, the linear programming method was formulated in terms of a pre-decision state variable.
A post-decision state formulation allows the simulation-based method to consider a much smaller state space and to avoid computing the expectation in the right-hand side of the optimality equations, simplifying the challenge of creating good value function approximations. To the best of our knowledge, the research presented in chapters 2 and 3 makes a significant contribution to the literature on patient scheduling in five respects:

• It describes an alternative approach for solving a dynamic multi-priority patient scheduling problem.
• It considers the use of a parametric non-linear functional form to approximate the value function in the MDP formulation of this problem.
• It compares the performance of policies derived using two alternative ADP approaches, the simulation-based approach and a linear programming approach, for the same problem.
• It extends the MDP model and solution approach developed by Patrick et al. [44] for the multi-priority patient scheduling problem by considering patients who receive service across multiple days and for irregular lengths of time, and by allowing the possibility of using overtime on different days of the booking horizon.
• It introduces a novel application of the linear programming approach to ADP to a problem, the radiation therapy patient scheduling problem, that has received limited attention in the literature.

The results in Chapter 2 suggest that the value function for a simple multi-priority patient scheduling problem can be better represented by a generalized logistic function than by an affine function. They also show that booking policies obtained through a simulation-based method using this type of approximation architecture perform better, in most cases, than other booking policies for this problem.
In particular, they provide lower discounted cost values and shorter average wait times for higher priority patients than policies derived directly from a linear programming approach using an affine value function approximation. In general, patient scheduling policies derived from a logistic function approximation achieve better performance levels than other policies for this problem by booking lower priority patients earlier, the highest priority patients later and by preemptively diverting patients of all types, depending on the level of congestion of the system.

The results in Chapter 3 show that a linear programming method using an affine value function approximation in the state components provides an effective way of identifying patient scheduling guidelines for radiation therapy. Patient scheduling guidelines derived from this method achieve better service levels, measured as the percentage of patients of each type booked within different wait time targets, than the current practice for this problem by scheduling treatments for less urgent patients later in the booking horizon, allowing the treatments for more time-sensitive patients to be scheduled first. The results suggest that the proposed method could potentially be used in practice to reduce patient wait times and thus to decrease the potential impact of waits on radiation therapy patients' health.

By adequately assessing the future impact of today's decisions, booking policies derived from the ADP methods in chapters 2 and 3, although possibly suboptimal, perform significantly better than the corresponding myopic policies (usually a representation of the current practice). They provide clear guidelines as to when to book patients of each priority class and when to resort to overtime or to any other alternative source of capacity.
These policies, as initially hypothesized, usually book lower priority patients further into the future, allowing the appointments for more urgent patients to be scheduled earlier. In this way, they achieve reasonable average wait times at a much lower cost. In practice, patient scheduling guidelines obtained through the proposed methods could potentially reduce patient wait times and, in some cases, help decrease the impact of delays on patients.

The models and methods in chapters 2 and 3 have been developed as a proof of concept. Their application to particular medical facilities would require further research, but it is certainly plausible. Our work to date has focused on the methodological framework for addressing dynamic multi-priority patient scheduling problems rather than on the actual implementation of the policies derived from our research. We intend to work first on the customization of the dynamic patient scheduling model and solution approach presented in Chapter 3 for a local cancer center. Note that, given a final value function approximation, the consequent patient scheduling policy can be easily implemented either by using optimization software to solve the right-hand side of the optimality equations (a simple integer programming model) or by writing a short program reflecting the main scheduling guidelines suggested by this policy. In any case, we expect the main implementation challenges to be related to database connectivity, user interfaces and software functionality, in addition to some possible cultural reluctance, rather than to the actual scheduling program.

In the early stages of development, we considered applying the simulation-based method described in Chapter 2 to the radiation therapy patient scheduling problem introduced in Chapter 3. However, after implementing the basic modules of the algorithm for a simple setting, we realized this idea was impractical.
Although the simulation-based approach described in Chapter 2 provides very effective patient scheduling policies for a simpler version of the problem, the use of a parametric non-linear value function approximation introduces numerous challenges in the context of this type of algorithm. The slow speed of simulation and the complexity of the optimization problem solved at each decision epoch (a non-linear integer program) would have quickly made solution times intractable for a large-scale problem like the radiation therapy patient scheduling problem.

Even though dynamic capacity allocation policies obtained from a linear programming method using an affine value function approximation may not be as good as policies obtained from a simulation-based method using a parametric non-linear value function approximation, the additional computational effort required by the simulation-based method makes the linear programming method a better choice for large-scale problems. It is important to note, however, that a linear programming method based on an affine value function approximation will not always provide a good value function approximation. In our experience with this approach, there are many scenarios in which the optimal value function approximation is identically zero and the resulting policy is simply the myopic policy. It is also worth noting that a simulation-based algorithm based on a simpler approximation architecture, such as an affine function or a linear combination of basis functions, may still perform very well and at the same time be much faster than an algorithm based on a parametric non-linear approximation. For example, the AA policy analyzed in Chapter 2 performs better than the myopic policy in terms of discounted cost, and its solution time is a third of the time required to identify an SS policy.
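Whatever the approximation architecture, the resulting policy is applied the same way at each decision epoch: assign waiting patients to the day that minimizes immediate booking cost plus approximate discounted cost-to-go. The following is a minimal, hypothetical sketch of such a rule for a single-appointment setting, with a greedy assignment standing in for the integer program; all names and numbers are illustrative, not taken from the thesis.

```python
# Sketch: a one-step lookahead booking rule driven by a value function
# approximation. The slot_value coefficients play the role of the affine
# approximation's marginal slot costs; everything here is illustrative.

def booking_cost(day, target, cost_per_day_late):
    """Immediate cost of booking 'day' days ahead against a wait time target."""
    return cost_per_day_late * max(0, day - target)

def book_patients(demand, free_slots, targets, cost_late, slot_value, discount=0.99):
    """Greedily book patients in priority order (lowest index = most urgent).

    demand:     number of waiting patients per priority class
    free_slots: remaining slots per day of the booking horizon
    slot_value: approximate marginal cost of occupying a slot on each day
    Returns schedule[i][n] = patients of class i booked on day index n.
    """
    horizon = len(free_slots)
    schedule = [[0] * horizon for _ in demand]
    for i in range(len(demand)):               # most urgent classes first
        for _ in range(demand[i]):
            # Score each open day: immediate cost plus approximate cost-to-go
            best = min(
                (n for n in range(horizon) if free_slots[n] > 0),
                key=lambda n: booking_cost(n + 1, targets[i], cost_late[i])
                              + discount * slot_value[n],
                default=None,
            )
            if best is None:
                break                          # no capacity left: would divert
            free_slots[best] -= 1
            schedule[i][best] += 1
    return schedule
```

With a congested first day, the rule gives the scarce early slot to the urgent class and pushes the patient with the looser target further out, mirroring the qualitative behaviour of the policies discussed above.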
The two ADP methods in this dissertation can be adapted to any dynamic problem that involves allocating a fixed amount of daily capacity among entities of different types, for example, to order acceptance problems and to more general capacity allocation problems involving multiple demand origins and multiple service providers. However, the linear programming method requires the immediate cost and the expected cost-to-go function to be represented as a linear combination of individual state components (e.g. the number of entities at each demand origin, the available capacity at each service provider, the processing capacity on each calendar day, the location of a given resource, etc.). The simulation-based method, given its problem-size limitations, could be used to solve small examples in order to gain a better understanding of the main characteristics of more complex policies for a given problem. Any additional insights could later be tested through simulation. An in-depth analysis of cost-to-go samples from an existing policy for the same problem or a similar one could help identify, as in Chapter 2 of this thesis, better approximation architectures for a problem. The flexibility of the generalized logistic function, in this sense, could be used to fit different types of data.

There are numerous potential extensions to our models; however, the two main ones involve elements of the original problems that were excluded from our formulations. First, the MDP models described in chapters 2 and 3 either assume no postponements or that the portion of the demand that is not booked today (i.e. postponed demand) is handled the same way as new demand tomorrow. This is done in order to avoid the additional complexity of keeping track of patient wait times. Second, the dynamic multi-appointment patient scheduling model described in Chapter 3 assumes the system capacity is determined by aggregating individual capacities from multiple resources.
This assumption may be unrealistic, for example, for some cancer centers in which treatment units differ in terms of their technical capabilities and, consequently, in which different types of treatments can be delivered only by specific groups of machines. The inclusion of patient wait lists and multiple resources, together with patient-resource compatibility constraints, would clearly make a solution to the models more challenging but, at the same time, it would increase the range of systems to which our solution approaches can be applied. We have formulated a model that explicitly considers wait lists by keeping track of the number of patients of each type who have waited a given number of days without receiving an appointment, and developed a linear programming method based on an affine value function approximation to solve this model. Preliminary results, in the absence of an immediate postponement cost, indicate that approximate optimal policies for this problem will make patients wait the maximum allowed number of days before booking them, in decreasing priority order, in the first available appointment slot. This is equivalent to defining demand as the number of patients for which booking decisions cannot be postponed any longer and using a first-available-slot policy. The next step in this research initiative will be to analyze the case in which there is an immediate postponement cost. The model and solution approach are described in Appendix A.

The linear programming approach presented in Chapter 3 can also be extended to a more general dynamic capacity allocation problem involving several wait lists, patient choice and alternative capacity sources. In this problem, demand arrives from multiple wait lists. Each list is associated with a demand origin and a homogeneous group of clients. There are multiple servers, with each client listing a subset of servers where he/she is willing to be served as well as a preferred server.
Though clients can be assigned to one of a subset of servers, there is a penalty associated with serving a client at a non-preferred server. Demand origins have their own performance targets regarding length of wait and/or queue length. Servers have a fixed capacity. An MDP formulation of this problem in the context of a long-term (bed) capacity allocation setting with the presence of patient choice is presented in Appendix B. We expect the approximate solution to this model to provide reasonable capacity allocation policies for long-term care placements for both hospital and community demand.

An additional extension of this research would be a simulation study to determine appropriate wait time targets, when possible, for patients. In some cases, wait time targets are set shorter than needed (open access being the most extreme example); in others, they are set so far out that, as discussed before, there is no benefit in terms of efficiency. As wait time targets are pushed out, the ability to smooth out demand over a period of time increases. However, the benefit decreases as wait time targets are pushed further out. The problem is that appropriate wait time targets are undoubtedly dependent on the scheduling policy, especially in a setting with multiple priority classes. We could potentially simulate some of the policies studied in Chapter 2 under different scenarios in order to study this relationship. The next step in this project will be to identify a specific set of scenarios to analyze.

Finally, we believe we have found in the Appointment Scheduling Game described in Chapter 4 an effective mechanism to transfer what we have learned from our research on advance multi-priority patient scheduling problems, mainly to health care professionals and business students.
Even though we are in the process of developing formal methods to evaluate the game's impact in relation to each of its learning objectives, we believe we will see positive results once these methods are administered and evaluated. We expect an improved web-based version of the game will enhance the game's effectiveness and accessibility as a teaching tool.

The models in appendices A and B were originally drafted by Dr. Jonathan Patrick. As part of my doctoral work, I revised the original formulations and implemented them, including the respective solution approaches. From the solutions to different problem instances, we expect in the near future to be able to identify reasonable capacity allocation policies for each specific problem.

Bibliography

[1] D. Adelman and D. Klabjan. Computing near-optimal policies in generalized joint replenishment. INFORMS J. Comput., 2011. In press. → pages 14
[2] D. Adelman and A. Mersereau. Relaxations of weakly coupled stochastic dynamic programs. Oper. Res., 56(3):712–727, 2008. → pages 14
[3] S. Ammar and R. Wright. Experiential learning activities in operations management. Int. Trans. Oper. Res., 6(2):183–197, 1999. → pages 102
[4] D. Ash, A. Barrett, A. Hinks, and C. Squire. Re-audit of radiotherapy waiting times 2003. Clin. Oncol., 16(6):387–394, 2004. → pages 92
[5] M. A. Begen and M. Queyranne. Appointment scheduling with discrete random durations. Math. Oper. Res., 36(2):240–257, 2011. → pages 2
[6] W. Ben-Ameur and J. Neto. Acceleration of cutting-plane and column generation algorithms: Applications to network design. Networks, 49(1):3–17, 2007. → pages 85
[7] D. Bertsekas. Dynamic Programming and Optimal Control, volume II, chapter 6, pages 320–527. Unpublished, 3rd edition, 2010. → pages 15, 20
[8] D. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, Mass., 1996. → pages 13, 38
[9] B. Cardoen, E. Demeulemeester, and J. Beliën.
Operating room planning and scheduling: A literature review. Eur. J. Oper. Res., 201(3):921–932, 2010. → pages 2
[10] T. Cayirli and E. Veral. Outpatient scheduling in health care: A review of literature. Production Oper. Management, 12(4):519–549, 2003. → pages 2
[11] Z. Chen, W. King, R. Pearcey, M. Kerba, and W. Mackillop. The relationship between waiting time for radiotherapy and clinical outcomes: A systematic review of the literature. Radiother. Oncol., 87(1):3–16, 2008. → pages 1, 69
[12] J. Cochran. Who wants to be a millionaire: The classroom edition. INFORMS Trans. Ed., 1(3):112–116, 2001. → pages 102
[13] C. Coles, L. Burgess, and L. Tan. An audit of delays before and during radical radiotherapy for cervical cancer - effect on tumour cure probability. Clin. Oncol., 15(2):47–54, 2003. → pages 69
[14] D. Conforti, F. Guerriero, and R. Guido. Non-block scheduling with priority for radiotherapy treatments. Eur. J. Oper. Res., 201(1):289–296, 2010. → pages 6
[15] D. Conforti, F. Guerriero, R. Guido, and M. Veltri. An optimal decision-making approach for the management of radiotherapy patients. OR Spectrum, 33(1):123–148, 2011. → pages 6
[16] T. K. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck. Solving semi-Markov decision problems using average reward reinforcement learning. Management Sci., 45(4):560–574, 1999. → pages 14
[17] D. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Oper. Res., 51(6):850–865, 2003. → pages 14
[18] D. de Farias and B. Van Roy. On constraint sampling in the linear programming approach to approximate dynamic programming. Math. Oper. Res., 29(3):462–478, 2004. → pages 14
[19] F. d'Epenoux. A probabilistic production and inventory problem. Management Sci., 10(1):98–108, 1963. → pages 80
[20] C. Derman. Finite State Markovian Decision Processes. Academic Press, Inc., Orlando, FL, USA, 1970. → pages 81
[21] V. Desai, V. Farias, and C. Moallemi. A smoothed approximate linear program. Adv.
Neur. In., 22:459–467, 2009. → pages 14
[22] A. Erdelyi and H. Topaloglu. Computing protection level policies for dynamic capacity allocation problems by using stochastic approximation methods. IIE Trans., 41:498–510, 2009. → pages 5, 113
[23] A. Fortin, I. Bairati, M. Albert, L. Moore, J. Allard, and C. Couture. Effect of treatment delay on outcome of patients with early-stage head-and-neck carcinoma receiving radical radiotherapy. Int. J. Radiat. Oncol. Biol. Phys., 52(4):929–936, 2002. → pages 69
[24] Y. Gerchak, D. Gupta, and M. Henig. Reservation planning for elective surgery under uncertain demand for emergency surgery. Management Sci., 42(3):321–334, 1996. → pages 5
[25] A. Gosavi, N. Bandla, and T. Das. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking. IIE Trans., 34(9):729–742, 2002. → pages 14
[26] P. Griffin. The use of classroom games in management science and operations research. INFORMS Trans. Ed., 8(1):1–2, 2007. → pages 103
[27] D. Gupta and B. Denton. Appointment scheduling in health care: Challenges and opportunities. IIE Trans., 40(9):800–819, 2008. → pages 2, 101, 108, 113
[28] E. W. Hans and T. Nielberg. Operating room manager game. INFORMS Trans. Ed., 8(1):25–36, 2007. → pages 103
[29] M. M. Hing, A. V. Harten, and P. C. Schuur. Reinforcement learning versus heuristics for order acceptance on a single resource. J. Heuristics, 13(2):167–187, 2007. → pages 14
[30] L. Kallenberg. Linear Programming and Finite Markovian Control Problems. Mathematisch Centrum, Amsterdam, 1983. → pages 81
[31] M. Kim, A. Ghate, and M. H. Phillips. A stochastic control formalism for dynamic biologically conformal radiation therapy. Eur. J. Oper. Res., 219(3):541–556, 2012. → pages 6
[32] M. Lamiri, X. Xie, A. Dolgui, and F. Grimaud. A stochastic model for operating room planning with elective and emergency demand for surgery. Eur. J. Oper. Res., 185(3):1026–1037, 2008.
→ pages 5
[33] G. J. Lim and W. Cao. A two-phase method for selecting IMRT treatment beam angles: Branch-and-prune and local neighborhood search. Eur. J. Oper. Res., 217(3):609–618, 2012. → pages 6
[34] M. Lübbecke and J. Desrosiers. Selected topics in column generation. Oper. Res., 53(6):1007–1023, 2005. → pages 85
[35] W. Mackillop. Killing time: The consequences of delays in radiotherapy. Radiother. Oncol., 84(1):1–4, 2007. → pages 1, 69
[36] J. Magerlein and J. Martin. Surgical demand scheduling: A review. Health Services Res., 13(4):418–433, 1978. → pages 2
[37] P. Marbach, O. Mihatsch, and J. Tsitsiklis. Call admission control and routing in integrated services networks using neuro-dynamic programming. IEEE J. Sel. Area. Comm., 18(2):197–208, 2000. → pages 14
[38] M. Maxwell, S. G. Henderson, and H. Topaloglu. Tuning approximate dynamic programming policies for ambulance redeployment via direct search. Unpublished, 2010. → pages 14, 67
[39] V. Mišić, D. Aleman, and M. Sharpe. Neighborhood search approaches to non-coplanar beam orientation optimization for total marrow irradiation using IMRT. Eur. J. Oper. Res., 205(3):522–527, 2010. → pages 6
[40] S. Mondschein and G. Weintraub. Appointment policies in service operations: A critical analysis of the economic framework. Production Oper. Management, 12(2):266–286, 2003. → pages 2
[41] S. Norris. The wait time issue and the patient wait times guarantee. Technical report, Canada: Library of Parliament, 2009. → pages 70
[42] N. O'Rourke and R. Edwards. Lung cancer treatment waiting times and tumour growth. Clin. Oncol., 12(3):141–144, 2000. → pages 69
[43] J. Patrick and M. Puterman. Reducing wait times through operations research: Optimizing the use of surge capacity. Healthc. Policy, 3(3):75–88, 2008. → pages 113
[44] J. Patrick, M. Puterman, and M. Queyranne. Dynamic multipriority patient scheduling for a diagnostic resource. Oper. Res., 56(6):1507–1525, 2008.
→ pages ii, ix, x, 3, 5, 6, 7, 9, 11, 12, 14, 15, 16, 17, 18, 24, 29, 30, 31, 33, 38, 46, 48, 50, 51, 60, 62, 63, 71, 75, 108, 113, 122
[45] S. Petrovic and E. Castro. A genetic algorithm for radiotherapy pre-treatment scheduling. In Di Chio et al., editor, Applications of Evolutionary Computation, volume 6625 of Lecture Notes in Computer Science, pages 454–463. Springer Berlin / Heidelberg, 2011. → pages 6
[46] S. Petrovic and P. Leite-Rocha. Constructive and GRASP approaches to radiotherapy treatment scheduling. In Proceedings of the Advances in Electrical and Electronics Engineering - IAENG Special Edition of the World Congress on Engineering and Computer Science 2008, pages 192–200, 2008. → pages 6
[47] W. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley-Interscience, Hoboken, N.J., 2011. → pages 13, 22, 45
[48] W. B. Powell, J. A. Shapiro, and H. P. Simão. An adaptive dynamic programming algorithm for the heterogeneous resource allocation problem. Transport. Sci., 36(2):231–249, 2002. → pages 14
[49] M. Puterman. Markov decision processes: discrete stochastic dynamic programming. Wiley-Interscience, New York, 1994. → pages 19, 103
[50] E. Rising, R. Baron, and B. Averill. A systems analysis of a university-health-service outpatient clinic. Oper. Res., 21(5):1030–1047, 1973. → pages 5
[51] C. Sanmartin, F. Gendron, J.-M. Berthelot, and K. Murphy. Access to health care services in Canada, 2003. Technical report, Statistics Canada, Catalogue 82-575-XIE., 2004. → pages 1
[52] A. Sauré, J. Patrick, S. Tyldesley, and M. L. Puterman. Dynamic multi-appointment patient scheduling for radiation therapy. To appear in Eur. J. Oper. Res., 2012. → pages 113
[53] H. Schütz and R. Kolisch. Approximate dynamic programming for capacity allocation in the service industry. Eur. J. Oper. Res., 218(1):239–250, 2012. → pages 5
[54] P. Schweitzer and A. Seidmann. Generalized polynomial approximations in Markovian decision processes. J. Math.
Anal. Appl., 110(2):568–582, 1985. → pages 14
[55] M. Schwind and O. Wendt. Dynamic pricing of information products based on reinforcement learning: A yield-management approach. In M. Jarke, G. Lakemeyer, and J. Koehler, editors, KI 2002: Advances in Artificial Intelligence, volume 2479 of Lecture Notes in Computer Science, pages 355–388. Springer Berlin / Heidelberg, 2002. → pages 14
[56] H. P. Simão, J. Day, A. P. George, T. Gifford, J. Nienow, and W. B. Powell. An approximate dynamic programming algorithm for large-scale fleet management: A case application. Transport. Sci., 43(2):178–197, 2009. → pages 14
[57] S. S. Singh, V. B. Tadić, and A. Doucet. A policy gradient method for semi-Markov decision processes with application to call admission control. Eur. J. Oper. Res., 178(3):808–818, 2007. → pages 14
[58] M. Sniedovich. OR/MS games: 1. A neglected educational resource. INFORMS Trans. Ed., 2(3):86–95, 2002. → pages 102
[59] R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, Mass., 1998. → pages 13, 21
[60] G. Tesauro. Practical issues in temporal difference learning. Mach. Learn., 8:257–277, 1992. → pages 14
[61] B. Van Roy, D. Bertsekas, Y. Lee, and J. Tsitsiklis. A neuro-dynamic programming approach to retailer inventory management. In Proceedings of the 36th IEEE Conference on Decision and Control, volume 4, pages 4052–4057, 1997. → pages 22
[62] I. Vermeulen, S. Bohte, S. Elkhuizen, H. Lameris, P. Bakker, and H. L. Poutré. Adaptive resource allocation for efficient patient scheduling. Artif. Intell. Med., 46(1):67–80, 2009. → pages 5
[63] A. Waaijer, C. Terhaard, H. Dehnad, G. Hordijk, M. V. Leeuwen, C. Raaymakers, and J. Lagendijk. Waiting times for radiotherapy: Consequences of volume increase for the TCP in oropharyngeal carcinoma. Radiother. Oncol., 66(3):271–276, 2003. → pages 69
[64] G. Werker, A. Sauré, J. French, and S. Shechter.
The use of discrete-event simulation modelling to improve radiation therapy planning processes. Radiother. Oncol., 92(1):76–82, 2009. → pages 6
[65] W. Zhang and T. G. Dietterich. High-performance job-shop scheduling with a time-delay TD(λ) network. In Adv. Neur. In., pages 1024–1030, 1995. → pages 14

Appendix A

Dynamic Multi-priority Patient Scheduling with Postponements

This appendix extends the dynamic multi-priority patient scheduling model by considering multiple appointment lengths and allowing the booking agent to delay scheduling an appointment provided the patient's total wait does not exceed the wait time target. To be more realistic, we also consider a limit on the number of patients who can be diverted per day and impose a lower bound on the advance warning time (notification time) for each patient class. (The formulation of this problem was part of my Comprehensive Examination. The model was originally drafted by Dr. Jonathan Patrick. I revised the model and implemented the current version of it, including the solution approach.)

A.1 Decision epochs

Booking decisions are made at the end of each day over an infinite horizon.

A.2 State space

At the end of each day, the booking agent observes the number of appointment slots booked on each future day over an $N$-day booking horizon, the number of patients of each priority class waiting to be scheduled and how long they have waited. A typical state of the system, denoted by $s \in S$, is represented by a vector $s = (u, w) = (u_1, \ldots, u_N, w_{10}, \ldots, w_{IU_I})$, where $u_n$ is the number of appointment slots already booked on day $n$ and $w_{ij}$ is the number of type $i$ patients who have already waited $j$ days without receiving an appointment. The parameter $U_i$ represents an upper bound on the number of days between when a request is placed and when a patient's date of service is actually scheduled. A patient type is now defined by a priority level and the length of the required appointment.
A.3 Action sets

At the end of each day, the agent must decide which available appointment slots to assign to each unit of waiting demand. We assume the agent has the ability to divert patients to an alternative capacity source at an additional cost. A vector of possible actions can be written as $a = (x, y) = (x_{101}, \ldots, x_{IU_IN}, y_{10}, \ldots, y_{IU_I})$, where $x_{ijn}$ is the number of type $i$ patients who have waited $j$ days who are booked on day $n$ and $y_{ij}$ is the number of diverted type $i$ patients who have already waited $j$ days. The set of feasible actions compatible with state $(u, w) \in S$, denoted by $A_{(u,w)}$, must satisfy the following constraints:

$$u_n + \sum_{i=1}^{I} \sum_{j=0}^{U_i} l_i x_{ijn} \le C^R \quad \forall n \tag{A.1}$$

$$\sum_{i=1}^{I} \sum_{j=0}^{U_i} l_i y_{ij} \le C^O \tag{A.2}$$

$$\sum_{n=1}^{N} x_{ijn} + y_{ij} \le w_{ij} \quad \forall i, j \tag{A.3}$$

$$\sum_{n=1}^{N} x_{iU_in} + y_{iU_i} = w_{iU_i} \quad \forall i \tag{A.4}$$

$$x_{ijn} + y_{ij} \le M \times I_{ijn} \quad \forall i, j \le U_i, n \tag{A.5}$$

$$\sum_{k=j+1}^{U_i} \left( w_{ik} - \sum_{m=1}^{n} x_{ikm} - y_{ik} \right) \le M \times (1 - I_{ijn}) \quad \forall i, j \le U_i, n \tag{A.6}$$

$$x_{ijn} = 0 \quad \forall i, j, n < L_i \tag{A.7}$$

Constraint (A.1) limits the total number of appointment slots booked today for day $n$ to be less than or equal to the available service capacity that day. The parameters $C^R$ and $l_i$ represent the system's daily capacity (in appointment slots) and the number of appointment slots required by a patient of type $i$, respectively. Constraint (A.2) imposes a limit $C^O$ on the number of appointment slots that can be diverted per day. Constraint (A.3) restricts the number of bookings and diversions to be less than or equal to the number of patients waiting. Constraint (A.4) ensures that demand that has already waited $U_i$ days is either booked or diverted. Constraints (A.5) and (A.6) guarantee that demand is served on a first-come-first-served basis within each patient class. The scalar $M$ is a sufficiently large number and $I_{ijn}$ is an auxiliary binary variable. In some settings, it may be unreasonable to think that one can call patients the day before to have them come in for an appointment.
For this reason, Constraint (A.7) imposes a lower bound $L_i$ on the advance warning time for each patient class. All actions are non-negative and integer.

A.4 Transition probabilities

Once a decision is made, the only source of uncertainty in the transition to the next state of the system is the new demand for service. As a result of choosing booking action $a = (x, y)$ in state $s = (u, w)$, $a \in A_s$ and $s \in S$, and having $q_i$ new service requests from patients of type $i$, the state of the system the next day, denoted by $s' = (u', w')$, is determined by the following probability distribution:

$$p(s' \mid s, a) = \begin{cases} \prod_{i=1}^{I} \Pr(q_i), & \text{if } s' \text{ satisfies (A.9) and (A.10);} \\ 0, & \text{otherwise.} \end{cases} \tag{A.8}$$

$$u'_n = \begin{cases} u_{n+1} + \sum_{i=1}^{I} \sum_{j=0}^{U_i} l_i x_{ij(n+1)}, & n < N; \\ 0, & n = N. \end{cases} \tag{A.9}$$

$$w'_{ij} = \begin{cases} q_i, & j = 0; \\ w_{i(j-1)} - \sum_{n=1}^{N} x_{i(j-1)n} - y_{i(j-1)}, & j > 0, \end{cases} \quad \forall i \tag{A.10}$$

Equation (A.9) defines the new number of appointment slots booked on day $n$ as a function of the number of slots previously booked on day $n + 1$ plus all new bookings that affected that day. Equation (A.10) defines the new number of type $i$ patients who have already waited $j$ days. Demand that was neither booked nor diverted re-appears in the next day's demand but with an additional day of wait associated with it. The term $\Pr(q_i)$ in (A.8) represents the probability of having $q_i$ new service requests from patients of type $i$. We assume demand for each patient class is independent and that each day's demand is independent as well. Since demand arises from multiple independent sources, independence between classes seems a reasonable assumption.

A.5 Immediate cost

The immediate cost associated with choosing action $a = (x, y)$ in state $s = (u, w)$, $a \in A_s$ and $s \in S$, comes from two sources: the cost associated with booking a patient beyond the type-specific wait time target $T_i$ and the cost associated with patient diversions.
We formulate this cost as in (A.11), where $c_{ijn}$ is the cost of booking a type $i$ patient who has waited $j$ days on day $n$ and $d_{ij}$ is the penalty for diverting a type $i$ patient who has already waited $j$ days.

$$c(s, a) = \sum_{i=1}^{I} \sum_{j=0}^{U_i} \sum_{n=1}^{N} c_{ijn} x_{ijn} + \sum_{i=1}^{I} \sum_{j=0}^{U_i} d_{ij} y_{ij} \quad \forall s \in S, a \in A_s \tag{A.11}$$

Although arbitrary, it is reasonable to assume that $c_{ijn}$ will be decreasing in $i$ and zero if $n \le T_i$. Furthermore, it would seem advisable to ensure that the cost of delaying a patient's booking $j$ days and then booking him/her $n$ days later equals the cost of immediately booking the patient $n + j$ days in advance. Thus, a natural form for the booking cost would be (A.12), where $g_i$, a decreasing function of $i$, represents the cost of booking a patient of type $i$ one day past the wait time target without delay. The parameter $\lambda$ is the discount factor.

$$c_{ijn} = \begin{cases} \dfrac{g_i (n + j - T_i)}{\lambda^j}, & \text{for all } n + j > T_i; \\ 0, & \text{otherwise,} \end{cases} \quad \forall i, j, n \tag{A.12}$$

Similarly, it would make sense to assign at least the same cost (if not slightly less) to diverting patients immediately versus waiting $j$ days and then diverting. Thus, a natural form for the diversion cost would be (A.13), where $d_i$ is the cost of diversion today.

$$d_{ij} = \frac{d_i}{\lambda^j} \quad \forall i, j \le U_i \tag{A.13}$$

The immediate cost function explicitly balances the cost to the patient in wait time and the cost to the system of having to resort to an alternative capacity source. The role of the booking agent is to maintain reasonable wait times in a cost-effective manner. Determining the specific values of $g_i$ and $d_i$ would be the task of a panel of medical experts and health care managers.
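The invariance property motivating (A.12) and (A.13) can be checked numerically. The sketch below uses illustrative values of $g_i$, $T_i$, $d_i$ and $\lambda$; none are taken from the thesis.

```python
# Sketch of the booking cost (A.12) and diversion cost (A.13). The values of
# g_i, T_i, d_i and the discount factor below are illustrative only.

LAM = 0.99  # discount factor lambda

def booking_cost(g_i, T_i, j, n):
    """c_ijn: linear lateness cost, inflated by lambda^-j so that waiting j
    days and then booking n days ahead is cost-equivalent, in discounted
    terms, to booking n + j days ahead immediately."""
    days_late = n + j - T_i
    return g_i * days_late / LAM ** j if days_late > 0 else 0.0

def diversion_cost(d_i, j):
    """d_ij: today's diversion cost d_i, inflated by lambda^-j."""
    return d_i / LAM ** j
```

Discounting back by $\lambda^j$ recovers the cost of the equivalent immediate booking, which is exactly the property the appendix argues a natural cost structure should have.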
A.6 Optimality equations

The value function in this formulation, denoted by $v(\cdot)$, specifies the minimum discounted cost over the infinite horizon for each state $s \in S$ and satisfies the following optimality equations:

$$v(s) = \min_{a \in A_s} \left\{ c(s, a) + \lambda \sum_{s' \in S} p(s' \mid s, a) v(s') \right\} \quad \forall s \in S \tag{A.14}$$

It is here, as mentioned before, that the curse of dimensionality becomes apparent. Assuming that $w_{ij}$ can take up to $Q_{ij}$ possible values, we might have up to $(C^R + 1)^N \prod_{i=1}^{I} \prod_{j=0}^{U_i} Q_{ij}$ different states and many more actions. Reasonable values of $C^R$, $N$, $I$ and $Q_{ij}$ make a direct solution to (A.14) impossible.

A.7 Solution approach

To deal with an intractable number of states and actions, we first transform the optimality equations in (A.14) into their equivalent linear programming form:

$$\max \sum_{s \in S} \alpha(s) v(s) \tag{A.15}$$
subject to
$$c(s, a) + \lambda \sum_{s' \in S} p(s' \mid s, a) v(s') \ge v(s) \quad \forall s \in S, a \in A_s$$

Without loss of generality, we assume that $\alpha$ is a probability distribution over the initial state of the system. The equivalent linear programming model, however, does not avoid the curse of dimensionality. The model in (A.15) has one variable for every state $s \in S$ and one constraint for every feasible state-action pair $(s, a)$, $s \in S$ and $a \in A_s$, making its solution impossible. To solve the dynamic multi-priority patient scheduling problem with postponements, we chose the following affine approximation to $v(u, w)$:

$$v(u, w) = W_0 + \sum_{n=1}^{N} U_n u_n + \sum_{i=1}^{I} \sum_{j=0}^{U_i} W_{ij} w_{ij}, \qquad U, W \ge 0,\; W_0 \in \mathbb{R} \tag{A.16}$$

The advantage of this simple approximation is that the approximation parameters are easily interpreted. The value of $U_n$ represents the marginal infinite-horizon discounted cost of having an additional occupied slot on day $n$, and the value $W_{ij}$ represents the marginal infinite-horizon discounted cost of having one more patient of type $i$ who has waited $j$ days without receiving an appointment. We impose the further restriction that all $U_n$ and $W_{ij}$ are non-negative while $W_0$ is unconstrained.
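For concreteness, evaluating an affine approximation of the form (A.16) at a given state is a single weighted sum; the sketch below uses illustrative coefficient values, not fitted ones.

```python
# Sketch: evaluating the affine value function approximation (A.16) at a
# state (u, w). W0, U_coef and W_coef stand for the parameters W_0, U_n and
# W_ij; the numbers used in any call are illustrative.

def affine_value(u, w, W0, U_coef, W_coef):
    """v(u, w) = W0 + sum_n U_n u_n + sum_{i,j} W_ij w_ij."""
    slot_cost = sum(U_coef[n] * u[n] for n in range(len(u)))
    wait_cost = sum(W_coef[i][j] * w[i][j]
                    for i in range(len(w)) for j in range(len(w[i])))
    return W0 + slot_cost + wait_cost
```

Because the approximation is linear in the state, each coefficient can be read directly as the marginal discounted cost of one more occupied slot or one more waiting patient, as described above.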
Reformulating (A.15) in terms of this approximate value function yields the following approximate linear program:

    \max_{U, W, W_0} W_0 + \sum_{n=1}^{N} E_\alpha[u_n] U_n + \sum_{i=1}^{I} \sum_{j=0}^{U_i} E_\alpha[w_{ij}] W_{ij}    (A.17)
    subject to
    (1 - \lambda) W_0 + \sum_{n=1}^{N} \mu_n(s,a) U_n + \sum_{i=1}^{I} \sum_{j=0}^{U_i} \omega_{ij}(s,a) W_{ij} \le c(s,a)    \forall s \in S, a \in A_s
    U, W \ge 0, W_0 \in \mathbb{R}

where

    E_\alpha[u_n] = \sum_{s \in S} \alpha(s) u_n(s)    \forall n,        \mu_n(s,a) = u_n(s) - \lambda u_n(s,a)    \forall n,
    E_\alpha[w_{ij}] = \sum_{s \in S} \alpha(s) w_{ij}(s)    \forall i, j \le U_i,        \omega_{ij}(s,a) = w_{ij}(s) - \lambda E_q[w_{ij}(s,a)]    \forall i, j \le U_i.

The approximate linear programming model in (A.17) has a tractable number of variables, N + \sum_{i=1}^{I}(U_i + 1) + 1, but still an intractable number of constraints. For this reason, we solve its dual (A.18) using column generation.

    \min_{X \ge 0} \sum_{s \in S, a \in A_s} c(s,a) X(s,a)    (A.18)
    subject to
    (1 - \lambda) \sum_{s \in S, a \in A_s} X(s,a) = 1
    \sum_{s \in S, a \in A_s} \mu_m(s,a) X(s,a) \ge E_\alpha[u_m]    \forall m
    \sum_{s \in S, a \in A_s} \omega_{ij}(s,a) X(s,a) \ge E_\alpha[w_{ij}]    \forall i, j \le U_i

Given the dual values associated with the current solution to (A.18), the next state-action pair to enter the basis is obtained by solving (A.19):

    \arg\min_{s \in S, a \in A_s} c(s,a) - (1 - \lambda) W_0 - \sum_{n=1}^{N} \mu_n(s,a) U_n - \sum_{i=1}^{I} \sum_{j=0}^{U_i} \omega_{ij}(s,a) W_{ij}    (A.19)

Rearranging terms yields:

    \arg\min_{s \in S, a \in A_s} \sum_{i=1}^{I} \sum_{j=0}^{U_i} \sum_{n=1}^{N} (c_{ijn} + \lambda l_i U_{n-1} - \lambda W_{i(j+1)}) x_{ijn} + \sum_{i=1}^{I} \sum_{j=0}^{U_i} (d_{ij} - \lambda W_{i(j+1)}) y_{ij}
        + \sum_{n=1}^{N} (\lambda U_{n-1} - U_n) u_n + \sum_{i=1}^{I} \sum_{j=0}^{U_i} (\lambda W_{i(j+1)} - W_{ij}) w_{ij} + \sum_{i=1}^{I} \lambda W_{i0} E[q_i] - (1 - \lambda) W_0    (A.20)

The coefficients of x_{ijn} and y_{ij} in (A.20) have an intuitive interpretation in terms of balancing the costs versus the benefits of each action. For each action x_{ijn}, there is a cost, c_{ijn} + \lambda l_i U_{n-1}, due to a (possibly) late booking and the loss of available capacity tomorrow, as well as a benefit, \lambda W_{i(j+1)}, due to the fact that the booking decision does not reappear in tomorrow's demand. For each action y_{ij}, there is a cost, d_{ij}, due to diverting the patient, which is likewise weighed against the benefit of not having the patient appear in tomorrow's demand, \lambda W_{i(j+1)}.
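This cost-benefit balance can be written directly as the coefficients in (A.20). A sketch, with dictionary-based parameters and function names of our own choosing:

```python
def booking_coeff(i, j, n, c, l, U, W, lam):
    """Coefficient of x_ijn in (A.20): booking cost c_ijn, plus the l_i
    slots consumed on day n (worth U_{n-1} tomorrow, discounted), minus
    the benefit lam * W_{i(j+1)} of the request leaving tomorrow's queue."""
    return c[(i, j, n)] + lam * l[i] * U[n - 1] - lam * W[(i, j + 1)]

def diversion_coeff(i, j, d, W, lam):
    """Coefficient of y_ij in (A.20): diversion cost d_ij minus the same
    queue-relief benefit lam * W_{i(j+1)}."""
    return d[(i, j)] - lam * W[(i, j + 1)]
```

A pricing routine would evaluate these coefficients for every (i, j, n) and pick the actions with the most negative reduced cost.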
The column generation algorithm and the integer programming model used to identify the approximate optimal policy were implemented in GAMS 23.5 with CPLEX 12.2 as the solver.

Appendix B: Dynamic Capacity Allocation for Long-term Care

Those charged with placing clients in community services as they become available face the difficult task of balancing competing priorities. On the one hand, there is a clear need to keep the number of Alternative Level of Care (ALC) patients from impinging on hospital functioning. On the other hand, Long-term Care (LTC) demand also arises directly from the community. Long wait times in the community impose societal costs, both from servicing more complex clients in the non-ideal setting of the home and from the strain such a scenario places on the families involved.

LTC capacity is differentiated both by facility (any given region will have multiple LTC facilities) and by bed type (private, semi-private and ward). Both private and semi-private beds require an additional payment on the part of the client, while ward beds are completely subsidized. Each client stipulates a bed type and up to three facilities as acceptable placement options. We define a client destination as a bed type and facility pairing. Clients who are placed in a non-preferred destination remain on the wait list for their preferred destination. We refer to the set of destinations where a client is willing and/or able to be placed as a preference class. Thus, demand is differentiated by preference class and demand origin. Demand origin refers to whether the client is an urgent community client, a regular community client or a hospital client. Urgent clients receive priority access.

This appendix presents a discounted infinite-horizon MDP model that intelligently allocates available capacity to the various demand classes. In principle, this problem is solvable as an MDP model.
However, as the complexities of multiple priority classes, client preferences and a non-homogeneous bed supply are added, the model quickly becomes intractable due to the curse of dimensionality.¹

B.1 Decision epochs and state space

Decisions are assumed to be made once a day. Demand for placement comes in two forms: external demand (i.e. clients waiting for placement) and internal demand (i.e. clients who have already been placed, but not in their preferred destination). External demand comes from three origins: urgent and non-urgent community demand as well as hospital demand. We let I represent the number of external demand classes (preference class and demand origin pairings) and J represent the number of destinations. For ease of notation, we let [A] = {1, ..., A}. In addition, we let J(i) represent the set of possible destinations for a client in demand class i ∈ [I].

We let a state of the system be represented by a vector s = (u, v), where u_{in} is the number of clients of demand class i ∈ [I] who have been waiting n ∈ [N] days for placement and v_{ijl} is the number of clients of demand class i ∈ [I] who have been in destination j ∈ [J] for l ∈ [L] days. The state variables u_{iN} and v_{ijL} represent the number of clients of demand class i who have waited at least N days and the number of clients of demand class i who have been in destination j for at least L days, respectively.

B.2 Action sets

The central action is to assign available capacity to waiting demand. This can be accomplished either through external placements (from the wait lists) or internal transfers (from one destination to another). We represent an action by a vector a = (x, y), where x_{inj} represents the number of clients from demand class i ∈ [I] who have been waiting n ∈ [N] days to place in destination j ∈ [J], and y_{ijlk} corresponds to the number of clients from demand class i ∈ [I] who have been at destination j ∈ J(i) for l ∈ [L] days to transfer to destination k ∈ J(i), k ≠ j.
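With the counts u_{in} and v_{ijl} held in dictionaries, the capacity remaining at a destination (the right-hand side of the bed-capacity constraint introduced below) can be computed directly. A sketch; the representation is ours:

```python
def free_beds(j, C, v):
    """Beds still available at destination j: total capacity C_j minus
    beds occupied by clients of any demand class and any length of stay."""
    occupied = sum(cnt for (i, jj, l), cnt in v.items() if jj == j)
    return C[j] - occupied
```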
We assume that, within a demand class, clients are served on a first-in-first-out basis.

¹ The models in this appendix were originally drafted by Dr. Jonathan Patrick. I reviewed, wrote and implemented the current version of them.

Actions are constrained by a number of factors. First, the number of clients assigned to a given destination must not exceed the available capacity at that destination:

    \sum_{i \in I(j)} \sum_{n \in [N]} x_{inj} + \sum_{i \in I(j)} \sum_{k \in J(i)} \sum_{l \in [L]} y_{iklj} \le C_j - \sum_{i \in I(j)} \sum_{l \in [L]} v_{ijl}    \forall j \in [J]    (B.1)

where C_j is the total capacity (number of beds) at destination j and I(j) represents the set of possible demand classes at destination j. Second, clients can only be assigned to destinations that meet their preferences:

    x_{inj} = 0    \forall (i,n,j) \in [I] \times [N] \times [J] \text{ with } j \notin J(i)    (B.2)
    y_{ijlk} = 0    \forall (i,j,l,k) \in [I] \times [J] \times [L] \times [J] \text{ with } j \notin J(i), k \notin J(i) \text{ or } j = k    (B.3)

where J(i) is the set of admissible destinations for clients from demand class i. Third, actions cannot exceed waiting demand:

    \sum_{j \in J(i)} x_{inj} \le u_{in}    \forall (i,n) \in [I] \times [N]    (B.4)
    \sum_{k \in J(i)} y_{ijlk} \le v_{ijl}    \forall (i,j,l) \in [I] \times J(i) \times [L]    (B.5)

Fourth, to ensure that clients in the same demand class are treated in a first-come-first-served fashion, we impose the following conditions:

    \sum_{j \in J(i)} x_{inj} \le M \delta_{in}    \forall (i,n) \in [I] \times [N-1]    (B.6)
    \sum_{m=n+1}^{N} \left( u_{im} - \sum_{j \in J(i)} x_{imj} \right) \le M (1 - \delta_{in})    \forall (i,n) \in [I] \times [N-1]    (B.7)
    \sum_{k \in J(i)} y_{ijlk} \le M \delta_{ijl}    \forall (i,j,l) \in [I] \times J(i) \times [L-1]    (B.8)
    \sum_{m=l+1}^{L} \left( v_{ijm} - \sum_{k \in J(i)} y_{ijmk} \right) \le M (1 - \delta_{ijl})    \forall (i,j,l) \in [I] \times J(i) \times [L-1]    (B.9)

where \delta_{in} and \delta_{ijl} are binary variables and M is a sufficiently large number. The first two equations force \sum_{j \in J(i)} x_{inj} to be zero unless u_{im} - \sum_{j \in J(i)} x_{imj} = 0 for all m > n. The last two equations force \sum_{k \in J(i)} y_{ijlk} to be zero unless v_{ijm} - \sum_{k \in J(i)} y_{ijmk} = 0 for all m > l. To ensure that the state space remains finite even if capacity is insufficient, we introduce a further action z_i that removes demand entirely from the system.
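The effect of the big-M conditions (B.6)-(B.7) can be checked on a candidate placement plan: clients waiting n days may be placed only when every client of the same class who has waited longer is placed. A sketch, with a dictionary-based representation of ours:

```python
def fcfs_ok(i, u, x, N, J):
    """True if the placements x for demand class i respect the
    first-come-first-served conditions (B.6)-(B.7): whenever anyone
    waiting n days is placed, no one waiting m > n days is left behind."""
    for n in range(1, N):
        placed_n = sum(x.get((i, n, j), 0) for j in J[i])
        if placed_n > 0:
            left_behind = sum(
                u.get((i, m), 0) - sum(x.get((i, m, j), 0) for j in J[i])
                for m in range(n + 1, N + 1))
            if left_behind != 0:
                return False
    return True
```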
This could be a last-resort measure that pays for clients to be placed in private retirement homes. This action satisfies the following two conditions:

    z_i \ge \sum_{n=N-1}^{N} \left( u_{in} - \sum_{j \in J(i)} x_{inj} \right) - W_i    \forall i \in [I]    (B.10)
    z_i \le \sum_{n=N-1}^{N} \left( u_{in} - \sum_{j \in J(i)} x_{inj} \right)    \forall i \in [I]    (B.11)

Constraint (B.10) guarantees that the number of clients of demand class i who have been waiting for at least N days never exceeds W_i. We place a high cost on any positive value of z_i to make this a last-resort action. Finally, all actions must be non-negative and integer:

    x_{inj} \in \mathbb{Z}_+    \forall (i,n,j) \in [I] \times [N] \times J(i)    (B.12)
    y_{ijlk} \in \mathbb{Z}_+    \forall (i,j,l,k) \in [I] \times J(i) \times [L] \times J(i) \setminus \{j\}    (B.13)
    z_i \in \mathbb{Z}_+    \forall i \in [I]    (B.14)

We define A_s as the set of actions satisfying constraints (B.1) to (B.14) for a given state s.

B.3 Transition probabilities

New demand can arrive to any of the wait lists (demand classes) while exits from the system can occur from any wait list or destination (since clients can die while on a wait list). Some movement between destinations is also possible as clients may be transferred. Transitions are written as:

    s' = \Bigg( \Big( d_i,\; u_{i1} - \sum_{j \in J(i)} x_{i1j} - e_{i2},\; \ldots,\; u_{i(N-2)} - \sum_{j \in J(i)} x_{i(N-2)j} - e_{i(N-1)},\; \sum_{n=N-1}^{N} \Big( u_{in} - \sum_{j \in J(i)} x_{inj} \Big) - z_i - e_{iN} \Big)_{i \in [I]},
        \Big( \sum_{n \in [N]} x_{inj} - e_{ij1},\; v_{ij1} + \sum_{k \in J(i)} y_{ik1j} - \sum_{k \in J(i)} y_{ij1k} - e_{ij2},\; \ldots,\; v_{ij(L-2)} + \sum_{k \in J(i)} y_{ik(L-2)j} - \sum_{k \in J(i)} y_{ij(L-2)k} - e_{ij(L-1)},\; \sum_{l=L-1}^{L} \Big( v_{ijl} + \sum_{k \in J(i)} y_{iklj} - \sum_{k \in J(i)} y_{ijlk} \Big) - e_{ijL} \Big)_{i \in [I], j \in J(i)} \Bigg)

where d_i corresponds to new demand, e_{in} represents exits from the wait lists after n − 1 days, and e_{ijl} represents departures from the destinations after l − 1 days.

B.4 Immediate cost

There are costs associated with clients waiting too long, with the size of the wait list at each facility, with clients residing in non-preferred destinations and, of course, with the last-resort action of removing demand from the wait lists.
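Once the realized demand d_i and exits e_{in} are known, the wait-list components of this transition are deterministic. A sketch for a single demand class (the representation is ours, not from the thesis):

```python
def next_waitlist(i, u, x, z, d, e, N, J):
    """Advance the wait-list counts of class i one day, following the
    u-components of the transition in Section B.3: new demand d_i enters
    at n = 1, unplaced clients age by one day, and component N pools
    days N-1 and N net of placements x, removals z_i and exits e."""
    placed = lambda n: sum(x.get((i, n, j), 0) for j in J[i])
    nxt = {1: d[i]}
    for n in range(2, N):
        nxt[n] = u[(i, n - 1)] - placed(n - 1) - e[(i, n)]
    nxt[N] = (sum(u[(i, n)] - placed(n) for n in (N - 1, N))
              - z[i] - e[(i, N)])
    return nxt
```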
We represent the total immediate cost as follows:

    c(s,a) = \sum_{f} c^{W}_{f} \left( \sum_{i \in I(f)} \sum_{n \in [N]} u_{in} - T_f \right)^{+} + \sum_{i \in [I]} \sum_{n \in [N]} c^{W}_{in} u_{in} + \sum_{i \in [I]} \sum_{j \in J(i)} \sum_{l \in [L]} c^{P}_{ijl} v_{ijl} + \sum_{i \in [I]} c^{D}_{i} z_i    \forall s \in S, a \in A_s    (B.15)

where I(f) represents the set of demand classes associated with demand origin f and T_f is the maximum recommended size of the corresponding wait list.

B.5 Optimality equations

The value function v(·) in this formulation specifies the minimum discounted cost over the infinite horizon for each state s ∈ S and satisfies the following optimality equations:

    v(s) = \min_{a \in A_s} \left\{ c(s,a) + \gamma \sum_{s' \in S} p(s'|s,a) v(s') \right\}    \forall s \in S    (B.16)

B.6 Solution approach

Converting the optimality equations into their equivalent linear programming form yields:

    \max \sum_{s \in S} \alpha(s) v(s)    (B.17)
    subject to
    c(s,a) + \gamma \sum_{s' \in S} p(s'|s,a) v(s') \ge v(s)    \forall s \in S, a \in A_s

Neither the optimality equations given in (B.16) nor the linear program given in (B.17) is tractable due to the large size of the state space and the action sets. We thus seek to solve (B.17) using the linear programming approach to Approximate Dynamic Programming (ADP) and the following affine approximation:

    v(s) = U_0 + \sum_{i \in [I]} \sum_{n \in [N]} U_{in} u_{in} + \sum_{i \in [I]} \sum_{j \in J(i)} \sum_{l \in [L]} V_{ijl} v_{ijl},    U, V \ge 0, U_0 \in \mathbb{R}    (B.18)

We assume U_{in} and V_{ijl} are non-negative. This makes intuitive sense, as one would expect a positive cost for each additional client waiting to be placed and each additional bed currently occupied.
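The four cost components of (B.15) can be sketched directly from the state and action dictionaries; parameter names below are ours:

```python
def immediate_cost(u, v, z, c_origin, T, origin_classes, c_wait, c_place, c_remove):
    """Immediate cost (B.15): a penalty per unit of wait-list size above
    the target T_f for each demand origin f, plus per-client waiting,
    non-preferred-placement and last-resort removal (z) costs."""
    cost = 0.0
    for f, classes in origin_classes.items():
        size = sum(cnt for (i, n), cnt in u.items() if i in classes)
        cost += c_origin[f] * max(size - T[f], 0)       # (sum u - T_f)^+
    cost += sum(c_wait[key] * cnt for key, cnt in u.items())
    cost += sum(c_place[key] * cnt for key, cnt in v.items())
    cost += sum(c_remove[i] * zi for i, zi in z.items())
    return cost
```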
Substituting (B.18) into (B.17) yields the following approximate linear programming model:

    \max_{U, V, U_0} U_0 + \sum_{i \in [I]} \sum_{n \in [N]} E_\alpha[u_{in}] U_{in} + \sum_{i \in [I]} \sum_{j \in J(i)} \sum_{l \in [L]} E_\alpha[v_{ijl}] V_{ijl}    (B.19)
    subject to
    (1 - \gamma) U_0 + \sum_{i \in [I]} (u_{i1} - \gamma \lambda_i) U_{i1}
        + \sum_{i \in [I]} \sum_{n=2}^{N-1} \left[ u_{in} - \gamma (1 - \rho_{in}) \left( u_{i(n-1)} - \sum_{j \in J(i)} x_{i(n-1)j} \right) \right] U_{in}
        + \sum_{i \in [I]} \left[ u_{iN} - \gamma (1 - \rho_{iN}) \left( \sum_{n=N-1}^{N} \left( u_{in} - \sum_{j \in J(i)} x_{inj} \right) - z_i \right) \right] U_{iN}
        + \sum_{i \in [I]} \sum_{j \in J(i)} \left[ v_{ij1} - \gamma (1 - \rho_{ij1}) \sum_{n \in [N]} x_{inj} \right] V_{ij1}
        + \sum_{i \in [I]} \sum_{j \in J(i)} \sum_{l=2}^{L-1} \left[ v_{ijl} - \gamma (1 - \rho_{ijl}) \left( v_{ij(l-1)} + \sum_{k \in J(i)} y_{ik(l-1)j} - \sum_{k \in J(i)} y_{ij(l-1)k} \right) \right] V_{ijl}
        + \sum_{i \in [I]} \sum_{j \in J(i)} \left[ v_{ijL} - \gamma (1 - \rho_{ijL}) \sum_{l=L-1}^{L} \left( v_{ijl} + \sum_{k \in J(i)} y_{iklj} - \sum_{k \in J(i)} y_{ijlk} \right) \right] V_{ijL} \le c(s,a)    \forall s \in S, a \in A_s
    U, V \ge 0, U_0 \in \mathbb{R}

where \lambda_i represents the mean arrival rate of demand class i and the \rho terms are the probabilities of individuals departing the system. The approximate linear programming model in (B.19) has a reasonable number of variables but an intractable number of constraints. For this reason, we solve its dual (B.20) through column generation.

    \min_{X \ge 0} \sum_{(s,a) \in S \times A_s} c(s,a) X(s,a)    (B.20)
    subject to
    (1 - \gamma) \sum_{(s,a)} X(s,a) = 1
    \sum_{(s,a)} (u_{i1} - \gamma \lambda_i) X(s,a) \ge E_\alpha[u_{i1}]    \forall i \in [I]
    \sum_{(s,a)} \left[ u_{in} - \gamma (1 - \rho_{in}) \left( u_{i(n-1)} - \sum_{j \in J(i)} x_{i(n-1)j} \right) \right] X(s,a) \ge E_\alpha[u_{in}]    \forall (i,n) \in [I] \times \{2, \ldots, N-1\}
    \sum_{(s,a)} \left[ u_{iN} - \gamma (1 - \rho_{iN}) \left( \sum_{n=N-1}^{N} \left( u_{in} - \sum_{j \in J(i)} x_{inj} \right) - z_i \right) \right] X(s,a) \ge E_\alpha[u_{iN}]    \forall i \in [I]
    \sum_{(s,a)} \left[ v_{ij1} - \gamma (1 - \rho_{ij1}) \sum_{n \in [N]} x_{inj} \right] X(s,a) \ge E_\alpha[v_{ij1}]    \forall (i,j) \in [I] \times J(i)
    \sum_{(s,a)} \left[ v_{ijl} - \gamma (1 - \rho_{ijl}) \left( v_{ij(l-1)} + \sum_{k \in J(i)} y_{ik(l-1)j} - \sum_{k \in J(i)} y_{ij(l-1)k} \right) \right] X(s,a) \ge E_\alpha[v_{ijl}]    \forall (i,j,l) \in [I] \times J(i) \times \{2, \ldots, L-1\}
    \sum_{(s,a)} \left[ v_{ijL} - \gamma (1 - \rho_{ijL}) \sum_{l=L-1}^{L} \left( v_{ijl} + \sum_{k \in J(i)} y_{iklj} - \sum_{k \in J(i)} y_{ijlk} \right) \right] X(s,a) \ge E_\alpha[v_{ijL}]    \forall (i,j) \in [I] \times J(i)

Column generation solves this problem by starting with a small set of feasible state-action pairs and then (using the dual prices as estimates for U and V) finding one or more violated constraints in the primal.
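The loop just described is independent of the particular restricted LP and pricing model. A skeleton of the scheme (`solve_restricted` and `price_out` are placeholders standing in for the GAMS/CPLEX models used in the thesis):

```python
def column_generation(solve_restricted, price_out, tol=1e-6, max_iter=1000):
    """Generic column-generation loop: solve a restricted version of the
    dual over the columns (state-action pairs) found so far, then price
    out the most violated primal constraint; stop once the violation
    drops to `tol` or the iteration budget is exhausted."""
    columns = []
    for _ in range(max_iter):
        duals = solve_restricted(columns)     # dual prices = estimates of U, V
        column, violation = price_out(duals)  # most violated primal constraint
        if violation <= tol:
            break
        columns.append(column)
    return columns
```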
It adds the state-action pair associated with the most violated constraint until either no primal constraint is violated or one is close enough to optimality to quit. The integer program required to find the most violated constraint is:

    \min_{(s,a) \in S \times A_s} \sum_{f} c^{W}_{f} \left( \sum_{i \in I(f)} \sum_{n \in [N]} u_{in} - T_f \right)^{+}
        + \sum_{i \in [I]} \sum_{n=1}^{N-1} \left[ c^{W}_{in} + \gamma (1 - \rho_{i(n+1)}) U_{i(n+1)} - U_{in} \right] u_{in}
        + \sum_{i \in [I]} \left[ c^{W}_{iN} + \gamma (1 - \rho_{iN}) U_{iN} - U_{iN} \right] u_{iN}
        + \sum_{i \in [I]} \sum_{j \in J(i)} \sum_{l=1}^{L-1} \left[ c^{P}_{ijl} + \gamma (1 - \rho_{ij(l+1)}) V_{ij(l+1)} - V_{ijl} \right] v_{ijl}
        + \sum_{i \in [I]} \sum_{j \in J(i)} \left[ c^{P}_{ijL} + \gamma (1 - \rho_{ijL}) V_{ijL} - V_{ijL} \right] v_{ijL}
        + \sum_{i \in [I]} \sum_{n=1}^{N-1} \sum_{j \in J(i)} \gamma \left[ (1 - \rho_{ij1}) V_{ij1} - (1 - \rho_{i(n+1)}) U_{i(n+1)} \right] x_{inj}
        + \sum_{i \in [I]} \sum_{j \in J(i)} \gamma \left[ (1 - \rho_{ij1}) V_{ij1} - (1 - \rho_{iN}) U_{iN} \right] x_{iNj}
        + \sum_{i \in [I]} \sum_{j \in J(i)} \sum_{l=1}^{L-1} \sum_{k \in J(i)} \gamma \left[ (1 - \rho_{ik(l+1)}) V_{ik(l+1)} - (1 - \rho_{ij(l+1)}) V_{ij(l+1)} \right] y_{ijlk}
        + \sum_{i \in [I]} \sum_{j \in J(i)} \sum_{k \in J(i)} \gamma \left[ (1 - \rho_{ikL}) V_{ikL} - (1 - \rho_{ijL}) V_{ijL} \right] y_{ijLk}
        + \sum_{i \in [I]} \left[ c^{D}_{i} - \gamma (1 - \rho_{iN}) U_{iN} \right] z_i
        + \sum_{i \in [I]} \gamma \lambda_i U_{i1} - (1 - \gamma) U_0

The column generation algorithm and the integer programming model used to identify the approximate optimal policy were implemented in GAMS 23.5 with CPLEX 12.2 as the solver.
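The bracketed coefficients in this pricing objective all follow one pattern: immediate cost, plus the discounted value of the client surviving into tomorrow's state, minus today's dual value. For example, the coefficient of u_{in} can be sketched as follows (function and parameter names are ours):

```python
def waitlist_coeff(i, n, c_wait, rho, U, gamma, N):
    """Reduced-cost coefficient of u_in in the pricing problem: waiting
    cost c^W_in, plus the discounted dual value of the client surviving
    to tomorrow's list (day n+1, capped at N since day N pools all
    longer waits), minus the dual value U_in of today's constraint."""
    m = min(n + 1, N)
    return c_wait[(i, n)] + gamma * (1 - rho[(i, m)]) * U[(i, m)] - U[(i, n)]
```

Capping m at N reproduces both bracketed u-terms above, including the n = N case where the client stays in the pooled component U_{iN}.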
Page Metadata
Item Metadata
Title | Approximate dynamic programming methods for advance patient scheduling |
Creator |
Sauré, Antoine |
Publisher | University of British Columbia |
Date Issued | 2012 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2012-10-17 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0073300 |
URI | http://hdl.handle.net/2429/43448 |
Degree | Doctor of Philosophy - PhD |
Program | Business Administration |
Affiliation | Business, Sauder School of |
Degree Grantor | University of British Columbia |
Graduation Date | 2012-11 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |