@prefix vivo: .
@prefix edm: .
@prefix ns0: .
@prefix dcterms: .
@prefix skos: .
vivo:departmentOrSchool "Business, Sauder School of"@en, "Operations and Logistics (OPLOG), Division of"@en ;
edm:dataProvider "DSpace"@en ;
ns0:degreeCampus "UBCV"@en ;
dcterms:creator "Imanpoor Yourdshahy, Mona"@en ;
dcterms:issued "2020-08-24T19:10:55Z"@en, "2020"@en ;
vivo:relatedDegree "Doctor of Philosophy - PhD"@en ;
ns0:degreeGrantor "University of British Columbia"@en ;
dcterms:description """This thesis comprises three studies. Within each study, we analyze a stochastic multi-period decision making problem in the area of service management. Throughout the dissertation, we incorporate dynamic programming techniques to develop the mathematical models and analyze the solution approaches for each of the discussed problems.
In the first two studies, we consider a queueing system in which customers require some conditions to be met prior to receiving service. We investigate whether an individual arriving to this system should join the queue at that time, or wait to join at some future time. Chapters 2 and 3 discuss two variations of this problem. We formulate the problem as a Markov decision process and show how the structure of the optimal policy depends on various parameters of the model.
Furthermore, in Chapter 3, we use the construct of “level-k” thinking from the behavioral and experimental economics literature to model the bounded rationality of customers, and also to characterize their equilibrium strategy. We present the structural results of the customers’ joining policies and show the threshold structure of customers’ decisions with different levels of rationality. We also analyze the socially optimal solution and compare it to the equilibrium policy. The optimal policies we derive allow for a better management of customers’ waiting time and also give information on their joining behavior to service providers.
In the third study, we analyze another dynamic decision making problem in a different context than the previous two studies. Through tracking drivers’ behavior, Usage Based Insurance (UBI) allows insurance companies to connect insurers’ premiums more closely to their actual driving performance. Chapter 4 provides a theoretical model to capture the effects of UBI on the auto insurance market. We formulate the underlying problem as a dynamic principal-agent model with hidden information and hidden action. Developing a dynamic programming algorithm, we characterize the full history-dependent optimal contract. Our model results shed light on how to design the contract to manage a UBI program, the extent to which a UBI policy can outperform a traditional policy, and how the potential gains depend on the demographics of the target market."""@en ;
edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/75644?expand=metadata"@en ;
skos:note "Dynamic Decision Making in Service OperationsManagementbyMona Imanpoor YourdshahyB.Sc., Sharif University of Technology, 2010M.Sc., Sharif University of Technology, 2012M.Sc., University of Florida, 2014A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFDoctor of PhilosophyinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Business Administration)The University of British Columbia(Vancouver)August 2020c©Mona Imanpoor Yourdshahy, 2020The following individuals certify that they have read, and recommend to the Fac-ulty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:Dynamic Decision Making in Service Operations Managementsubmitted by Mona Imanpoor Yourdshahy in partial fulfillment of the require-ments for the degree of Doctor of Philosophy in Business Administration.Examining Committee:Prof. Steven Shechter, Operations and Logistics Division, Sauder School of Busi-nessSupervisorProf. Tim Huh, Operations and Logistics Division, Sauder School of BusinessCo-SupervisorProf. Mahesh Nagarajan, Operations and Logistics Division, Sauder School ofBusinessSupervisory Committee MemberProf. Hao Zhang, Operations and Logistics Division, Sauder School of BusinessSupervisory Committee MemberProf. Hasan Cavusoglu, Accounting & Information Systems Division, SauderSchool of BusinessSupervisory Committee MemberProf. Yi Qian, Marketing & Behavioral Science Division, Sauder School of Busi-nessUniversity ExaminerProf. Taraneh Sowlati, Department of Wood Science, Faculty of ForestryUniversity ExamineriiAbstractThis thesis comprises three studies. Within each study, we analyze a stochasticmulti-period decision making problem in the area of service management. Through-out the dissertation, we incorporate dynamic programming techniques to developthe mathematical models and analyze the solution approaches for each of the dis-cussed problems.In the first two studies, we consider a queueing system in which customers re-quire some conditions to be met prior to receiving service. We investigate whetheran individual arriving to this system should join the queue at that time, or wait tojoin at some future time. Chapters 2 and 3 discuss two variations of this prob-lem. We formulate the problem as a Markov decision process and show how thestructure of the optimal policy depends on various parameters of the model.Furthermore, in Chapter 3, we use the construct of “level-k” thinking from thebehavioral and experimental economics literature to model the bounded rationalityof customers, and also to characterize their equilibrium strategy. We present thestructural results of the customers’ joining policies and show the threshold structureof customers’ decisions with different levels of rationality. We also analyze thesocially optimal solution and compare it to the equilibrium policy. The optimalpolicies we derive allow for a better management of customers’ waiting time andalso give information on their joining behavior to service providers.In the third study, we analyze another dynamic decision making problem in adifferent context than the previous two studies. Through tracking drivers’ behavior,Usage Based Insurance (UBI) allows insurance companies to connect insurers’premiums more closely to their actual driving performance. Chapter 4 provides atheoretical model to capture the effects of UBI on the auto insurance market. Weiiiformulate the underlying problem as a dynamic principal-agent model with hiddeninformation and hidden action. Developing a dynamic programming algorithm,we characterize the full history-dependent optimal contract. Our model resultsshed light on how to design the contract to manage a UBI program, the extent towhich a UBI policy can outperform a traditional policy, and how the potential gainsdepend on the demographics of the target market.ivLay SummaryThis thesis considers the applications of multi-period decision making models inthe field of service operations management. The first two studies discuss the strate-gic consumers’ queue joining behavior when some requirements should be metprior to receiving service. We show how both customers and service providers canbenefit from the fact that strategic customers may decide to join the queue beforehaving the conditions, thereby parallel processing their time in queue and timeneeded to complete the requirements.In the third study, we investigate the impacts of Usage-Based Insurance (UBI),an emerging technology that links the premium rates of customers to their actualdriving performance. Developing a theoretical model and performing numericalanalysis based on data from a major insurance company, we discuss the benefitsof offering UBI policies including its effect on accident rate’s reduction. We alsoprovide interesting policy recommendations regarding the design of an optimalUBI program.vPrefaceI was the primary author of the work presented in this Ph.D. thesis.The first two studies in this dissertation (chapters 2 and 3) are co-authored withProf. Steven Shechter and Prof. Tim Huh. They were involved in the stages ofproblem formulation and analysis, provided feedback during the course of bothstudies, and contributed to manuscript edits. I was responsible for writing the ma-jority of these studies, developing and implementing all the models, and preparingthe numerical results.In the third study of this thesis (chapter 4), I was primarily responsible for iden-tifying the research question, preparing the literature, developing and solving themodels, numerical analysis, as well as preparing the manuscript. My co-authors,Prof. Mahesh Nagarajan and Prof. Hao Zhang, contributed to identifying and po-sitioning the research question, model development, editing the manuscript, andproviding feedback in various stages of the work.viTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xivDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 When to Queue If You Do Not Want Service Too Soon . . . . . . . . 82.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82.1.1 Literature Review . . . . . . . . . . . . . . . . . . . . . . 102.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . 112.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.2.1 Optimal Policy . . . . . . . . . . . . . . . . . . . . . . . 132.2.2 Preliminary Results . . . . . . . . . . . . . . . . . . . . . 142.2.3 Case a: c≥ 1−ρ . . . . . . . . . . . . . . . . . . . . . . 16vii2.2.4 Case b: c < 1−ρ . . . . . . . . . . . . . . . . . . . . . 172.2.5 Optimal Policy When There Is No Prerequisite Event . . . 182.3 Extended Model: The Individual May Leave Without ReceivingService . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.3.1 Optimal Policy . . . . . . . . . . . . . . . . . . . . . . . 202.4 Conclusions and Extensions . . . . . . . . . . . . . . . . . . . . 213 Equilibrium Joining Strategies in the Presence of a Prerequisite Con-dition for Receiving Service . . . . . . . . . . . . . . . . . . . . . . . 233.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.1.1 Literature review . . . . . . . . . . . . . . . . . . . . . . 263.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . 283.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283.2.1 Optimal Policy . . . . . . . . . . . . . . . . . . . . . . . 303.3 Equilibrium Analysis . . . . . . . . . . . . . . . . . . . . . . . . 333.3.1 Optimal Joining Strategy for a Level-2 Player . . . . . . . 343.3.2 Optimal Joining Strategy for a Player of Level-3 or Higher 383.3.3 Comparative Statics . . . . . . . . . . . . . . . . . . . . 413.4 Social Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . 443.4.1 Socially Optimal Policy . . . . . . . . . . . . . . . . . . 453.4.2 Comparing the Equilibrium and Socially Optimal Policies 473.5 Conclusions and Extensions . . . . . . . . . . . . . . . . . . . . 494 Effects of Usage-Based Auto Insurance: A Dynamic Mechanism-Design Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524.1 Introduction and Related Literature . . . . . . . . . . . . . . . . . 524.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594.2.1 Mathematical Formulation . . . . . . . . . . . . . . . . . 604.2.2 Recursive Formulation . . . . . . . . . . . . . . . . . . . 664.2.3 Terminal Problem . . . . . . . . . . . . . . . . . . . . . . 704.3 Principal’s Problem Considering Continuation Social Welfare Fron-tiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71viii4.3.1 Continuation Social Welfare Frontiers for the Terminal Prob-lem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734.3.2 Continuation Social Welfare Frontiers for the UBI Problem 744.3.3 Extended Model with Limited Liability . . . . . . . . . . 794.4 Numerical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 804.4.1 Estimation of the Parameters of the Model . . . . . . . . . 814.4.2 A Short UBI Policy vs. a Long UBI Policy . . . . . . . . 874.4.3 Benefits of Monitoring in Terms of the Accident Rate . . . 884.5 Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . 905 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99A Appendix of Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . 106A.1 Discussion on Jn of Equation (2.2) . . . . . . . . . . . . . . . . . 106A.2 Proof of Lemma 2.1. . . . . . . . . . . . . . . . . . . . . . . . . 107A.3 Proof of Theorem 2.1 . . . . . . . . . . . . . . . . . . . . . . . . 113A.4 Proof of Proposition 2.1 . . . . . . . . . . . . . . . . . . . . . . . 119A.5 Proof of Theorem 2.2. . . . . . . . . . . . . . . . . . . . . . . . . 121A.5.1 Discussion on u∗ of Theorem 2.2. . . . . . . . . . . . . . 122A.6 Proof of Theorem 2.3. . . . . . . . . . . . . . . . . . . . . . . . . 124A.7 Proof of Theorem 2.4. . . . . . . . . . . . . . . . . . . . . . . . . 126B Appendix of Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . 131B.1 Proof of Proposition 3.1 . . . . . . . . . . . . . . . . . . . . . . . 131B.2 Proof of Theorem 3.1 . . . . . . . . . . . . . . . . . . . . . . . . 132B.3 Sufficient Condition on Model Parameters that Guarantees Assump-tion 3.1 Holds . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136B.4 Equilibrium Analysis (Optimal Joining Strategy for a Player ofLevel-3 or Higher) . . . . . . . . . . . . . . . . . . . . . . . . . 137B.5 Social Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . 138B.5.1 Proof of Proposition 3.2 . . . . . . . . . . . . . . . . . . 138B.5.2 Comparing the Equilibrium and Socially Optimal Policies 141ixC Appendix of Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . 142C.1 Proof of Proposition 4.1 . . . . . . . . . . . . . . . . . . . . . . . 142C.2 Proof of Proposition 4.2 . . . . . . . . . . . . . . . . . . . . . . . 146C.3 Proof of Proposition 4.3 . . . . . . . . . . . . . . . . . . . . . . . 148C.4 Proof of Proposition 4.4 . . . . . . . . . . . . . . . . . . . . . . . 158C.5 Proof of Theorem 4.1 . . . . . . . . . . . . . . . . . . . . . . . . 160C.6 Proof of Theorem 4.2 . . . . . . . . . . . . . . . . . . . . . . . . 160C.7 Corollaries Regarding the Extended Model with Limited Liabilityof Section 4.3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 166xList of TablesTable 4.1 Data Summary of the Insurance Score, Daily Number of HardBrakes, and UBI Score . . . . . . . . . . . . . . . . . . . . . . 83Table 4.2 Results of Clustering Customers to the two Groups of Low andHigh Risk Drivers . . . . . . . . . . . . . . . . . . . . . . . . 83Table 4.3 Data Summary for the UBI Score in the First Month of Monitoring 84Table 4.4 Values of the Parameters for the Numerical Example . . . . . . 86xiList of FiguresFigure 2.1 Optimal policy structure . . . . . . . . . . . . . . . . . . . . . 18Figure 2.2 Optimal policy structure when penalty cost δ = 0 . . . . . . . . . 19Figure 3.1 Optimal policy structure before having the requirements (A = 0) . . 32Figure 3.2 Optimal joining strategy for a level-2 player before having the re-quirements (λ = 3, µ = 4, prerequisite event rate α = 0.5, in-queue andout-of-queue waiting costs= 1, penalty cost δ = 10, ω∗ = 20. ) . . . . . 38Figure 3.3 Optimal joining strategy for a level-k player before having the re-quirements (λ = 3, µ = 4, prerequisite event rate α = 0.5, in-queue andout-of-queue waiting costs= 1, penalty cost δ = 10, k=rationality level) . . 40Figure 3.4 Equilibrium joining strategies before having the requirements fordifferent values of penalty δ (λ = 3, µ = 4, prerequisite event rate α =0.5, in-queue and out-of-queue waiting costs= 1) . . . . . . . . . . . . 42Figure 3.5 The behavior of the equilibrium joining thresholds before having therequirements with respect to prerequisite event rate α (λ = 3, µ = 4,in-queue and out-of-queue waiting costs= 1, penalty cost δ = 10.) . . . . 43Figure 3.6 Socially optimal policy structure before having the requirements . . 47Figure 3.7 Expected cost of system per unit of time for different settings (λ = 3,µ = 4, in-queue and out-of-queue waiting costs= 1, number of customersconsidered for simulation= 5000) (95% confidence interval of the reportedcosts is not more than ±0.1) . . . . . . . . . . . . . . . . . . . . . 48xiiFigure 3.8 Average total wait time per customer for different settings (λ = 3,µ = 4, in-queue and out-of-queue waiting costs= 1, number of customersconsidered for simulation= 5000) (95% confidence interval of the reportedcosts is not more than ±0.07) . . . . . . . . . . . . . . . . . . . . 49Figure 4.1 Sequence of Events in Period t . . . . . . . . . . . . . . . . . . 63Figure 4.2 Continuation Social Welfare Frontiers for the Terminal Problem . . 74Figure 4.3 Agent’s Optimal Effort Structure . . . . . . . . . . . . . . . . . 78Figure 4.4 Optimal Effort Structure for the Numerical Example . . . . . . . . 87Figure 4.5 Social Welfare Increase w.r.t. Monitoring Length . . . . . . . . . 87Figure 4.6 Optimal Monitoring Time w.r.t. the Percentage of High Risk Driversand Monitoring Cost . . . . . . . . . . . . . . . . . . . . . . . 88Figure 4.7 Accident Rate Reduction w.r.t. the Percentage of High Risk Driversand Monitoring Costs . . . . . . . . . . . . . . . . . . . . . . . 89Figure B.1 Average wait time in queue per customer for different settings (λ = 3,µ = 4, in-queue and out-of-queue waiting costs= 1, number of customersconsidered for simulation= 5000) (95% confidence interval of the reportedcosts is not more than ±0.02) . . . . . . . . . . . . . . . . . . . . 141Figure B.2 Average wait time out of queue per customer for different settings(λ = 3, µ = 4, in-queue and out-of-queue waiting costs= 1, number of cus-tomers considered for simulation= 5000) (95% confidence interval of thereported costs is not more than ±0.06) . . . . . . . . . . . . . . . . 141Figure C.1 Decreasing and increasing directions for uatt+1(.,H) and uatt+1(.,L) ofthe terminal problem . . . . . . . . . . . . . . . . . . . . . . . 144Figure C.2 Decreasing and increasing directions for uOTT+1(.,H) and uOTT+1(.,L)of subproblem (0,0) . . . . . . . . . . . . . . . . . . . . . . . . 151Figure C.3 Decreasing and increasing directions for uOTT+1(.,H) and uOTT+1(.,L)of subproblem (1,1) . . . . . . . . . . . . . . . . . . . . . . . . 155xiiiAcknowledgmentsI take this opportunity to express my enduring gratitude to my advisors, Prof.Steven Shechter and Prof. Tim Huh, for their continuous support throughout myPh.D. study and research, for their patience, motivation, enthusiasm, and immenseknowledge. Steven and Tim were extremely caring mentors who guided me everystep of the way. I am also truly grateful to Prof. Mahesh Nagarajan and Prof. HaoZhang as my committee members and the co-authors of my third study for theirtremendous help, brilliant suggestions, and valuable contributions. Their enduringsupport have made a huge difference in my knowledge and experience. I sincerelyacknowledge the other member of my committee, Prof. Hasan Cavusoglu for hiskind support as well. Without all my advisors’ and supervisory committee mem-bers’ help and guidance, this thesis would not have been possible.I would like to extend my gratitude to everyone in the Operations and Logisticsdivision at the Sauder School of Business. All the faculties have been very gener-ous in providing support and guidance whenever I needed them. I am also delightedthat I had such an incredible team of Ph.D. students, or better to say friends, withwhom I shared this experience. Besides, a warm thank you to dear Elaine Cho anddear Rita Quill for their encouragement and extraordinary approachability duringmy Ph.D. journey.I want to express my deepest appreciation to my family and friends withoutwhom any achievement is meaningless. Words cannot describe my gratitude tomy wonderful parents, Shahnaz and Hossein, and my beloved brother, Alireza, fortheir unconditional love and support in my whole life. They made me believe inmy abilities and helped me grow as a strong woman. Last but not least, I wouldlike to thank my dear husband, Miremad, who has been by my side throughoutxivmy Ph.D., living every single moment of it with me. His endless love and supportmake everything possible for me.xvDedicationTo my dear parents, Shahnaz and Hossein, and to my beloved husband, Miremad.xviChapter 1IntroductionThe unifying aim of this dissertation is to develop models and solution techniquesfor operational problems in the area of service management. Effectively manag-ing service operations is crucial to controlling the costs for both of the serviceproviders and customers, which can lead to an improvement in customers’ satis-faction as well. Most of the decision making problems in the field of service man-agement involve situations where decisions are made in multiple stages. Thus, inthis thesis, our focus is mainly on providing solution approaches for multi-periodmodels. In all three chapters, we consider complex stochastic dynamic decisionmaking problems in which our objective is to provide some kind of service withspecific characteristics at the minimum cost.In multi-period decision making problems, the outcome of each decision maynot be fully observable or predictable, but it can be anticipated to some extentbefore the next decision is made. The objective is usually to minimize a certaincost over a given number of stages, which is the case in all of the models we study.Throughout the dissertation, we incorporate dynamic programming techniques toanalyze the developed multi-period models provided in each chapter. Dynamicprogramming is an optimization approach that transforms a complex problem intoa sequence of simpler problems, and its essential characteristic is the multistagenature of the optimization procedure [3].There is a broad variety of practical problems in operations management thatcan be treated by dynamic programming. In the first two chapters of this thesis,1“When to Queue If You Do Not Want Service Too Soon” and “Equilibrium JoiningStrategies in the Presence of a Prerequisite Condition for Receiving Service”, weincorporate dynamic programming techniques to examine the strategic consumers’behavior of joining service systems with crowded queues. The last chapter, en-titled “Effects of Usage-Based Auto Insurance: A Dynamic Mechanism DesignApproach”, also analyzes a problem in which decisions are taken sequentially overtime. In this study, we investigate the impacts of an emerging technology that gath-ers real-time driving data from customers on the insurance market. In what follows,a brief introduction of the above-mentioned studies is provided.In many service settings, customers need to meet some side conditions beforebeing served. When queueing is involved, an strategic individual might decide tojoin the queue before the prerequisite condition is met, thereby parallel processingher time in queue and time needed to complete the requirements. Depending on therandom time until reaching service and the time until the preconditions are met, theformer may occur before the latter. This can result in some type of penalty (e.g.,removal from the queue or a monetary penalty). Thus, individuals should take intoaccount the risk of incurring the penalty cost in their joining decisions as well. Weconsider a few variations of this problem, each of which may be applicable depend-ing on the circumstance. To the best of our knowledge, our work is the first in theliterature to analyze queue joining decisions when there is a required condition forreceiving service. This opens up new applications of strategic customers’ behaviorin the rational queueing literature.First, in Chapter 2 entitled “When to Queue If You Do Not Want Service TooSoon”, we investigate the case that individuals need to join the queue to be ableto initiate the prerequisite process. As an example of such settings, one can thinkof cases for which an application number is needed to prepare the requirements.Chapter 2 explores a strategic customer’s joining behavior arriving to this type ofsystems. We formulate the problem as a Markov decision process and show howthe structure of the optimal policy depends on various parameters of the model.In Chapter 3 entitled “Equilibrium Joining Strategies in the Presence of a Pre-requisite Condition for Receiving Service”, we analyze the case in which the pre-requisite event may be in process before joining the queue. This applies to moregeneral settings in practice. As an example, consider an elderly person who joins2the wait list for a nursing home prior to needing that level of care. This is a com-mon practice, as nowadays, the waiting list for such facilities is very long. First,we take the perspective that only the individual approaching the queue is strategicabout whether to join it or not and analyze the individual’s optimal policy, and thenwe discuss a game theoretic setting where all individuals are strategic.To obtain the equilibrium policy, we incorporate the construct of “level-k”thinking from the behavioral and experimental economics literature (first intro-duced in Stahl and Wilson [59]). Analytically characterizing the equilibrium forcustomers considering “wait versus join” decisions in queueing systems includingour setting is challenging because of the competition among customers and due tothe state-dependent transition probabilities. This is discussed in a recent compre-hensive review of the rational queueing literature (Section 4.1.3 of [28]). Thus,incorporating the standard approaches, the equilibrium strategy of our models can-not be obtained, but applying the level-k framework, we can obtain a sufficientcondition for an equilibrium policy. We believe our work is the first to considerlevel-k thinking in the context of a queueing game.Using level-k modeling approach also allows us to discuss the behavior of cus-tomers with bounded levels of rationality. In reality, individuals may not be fullyrational to follow the equilibrium strategy, as they may have different levels of ra-tionality (Simon [56]). Therefore, in addition to analyzing the Nash equilibrium,we discuss the structure of the optimal policy that individuals with different levelsof rationality follow. Furthermore, we characterize the optimal policy of a socialplanner (i.e., first best solution) and compare the results to the equilibrium policy.The main emphasis of Chapters 2 and 3 are on developing theoretical frame-works and solution techniques to analyze the above mentioned queueing settings inthe presence of a required side condition for receiving service. However, our modelresults also provide insights for the service providers regarding how strategic cus-tomers make queue joining decisions under different model parameters. This canbe helpful when the system managers decide how to set penalties for customerswho reach service without completing their requirements, which we leave to thefuture studies. Finally, the optimal policies we derive would be useful in thinkingof extending these results to the settings in which the queue or the number of peo-ple who are waiting out of queue are unobservable, which is the case in most of the3settings where there is a virtual wait-list to join.In Chapter 4 entitled “Effects of Usage-Based Auto Insurance: A DynamicMechanism Design Approach”, we study another dynamic decision making prob-lem in a different context than the previous two chapters. This chapter discussesthe benefits of personalizing the insurance companies’ offerings to the customers.In recent years, due to the vast increase in availability of customer data, the auto in-surance industry has been moving towards personalizing their offerings, similar tomany other industries. Usage Based Insurance (UBI) is one of the most recent in-novations by auto insurance companies that helps them in this regard as it links thepremium rates of customers to their actual driving performance. The use of UBIprograms is becoming increasingly popular [27]. In 2015, close to 12 million con-sumers globally subscribed to this type of insurance, and this value is expected togrow over 142 million subscribers by 2023 [35]. Thus, usage-based insurance hassubstantial effects on how the insurance industry operates. According to a recentresearch report by Global Market Insights Inc., the global UBI market’s currentvalue is about USD 34 billion, which is anticipated to reach to USD 107 billion by2024 [22].In UBI programs, the drivers’ behavior is monitored directly while they driveusing some mobile-based data collection applications or in-vehicle telecommuni-cation devices. Then, based on the assessed data, the insurance company offerssome discounts to the customers. In the insurance market, the firm’s lack of in-formation leads to two well-known incentive issues: (1) adverse selection, wherethe customers’ types or preferences are unknown, and (2) moral hazard, where thecustomers’ efforts or choices are unknown. Monitoring technologies can alleviateboth problems; by using this technology, insurance companies can get more ac-curate estimates of the customers’ risk factors, and at the same time, by offeringan appropriate contract, companies can incentivize customers to drive more care-fully and improve their driving behavior, which may result in reduced accidentsand lowered congestion and emissions.Chapter 4 provides a theoretical model to capture the effects of UBI on the autoinsurance market. We formulate the problem as a finite horizon dynamic principal-agent problem. A UBI program is offered to each customer for a fixed duration(e.g., one year); when this program ends, a regular insurance contract follows,4which may be contingent on the drivers’ performance during the UBI program. Inour model, the UBI program is divided into multiple periods (e.g., 12 months or 52weeks). An agent (customer) privately knows his type that summarizes his abilityas a driver, and can make a hidden effort in each period to drive carefully, whichaffects his subsequent type. The principal (insurer) offers a long-term contractto the agent (which covers both the UBI phase and the regular insurance phase)despite the fact that she observes neither the type of the agent nor the actions takenby him. The driving behavior of the agent captured through the telematics devicesis a noisy signal of his type and effort in each period; this signal is observableto both the agent and the principal. We characterize the full history-dependentoptimal contract for this dynamic adverse-selection and moral-hazard problem sothat both the premium for the agent and his effort level in each period depend onhis driving record.Dynamic principle-agent models with both adverse selection and moral haz-ard are highly complex and cannot be analyzed accurately using the standard ap-proaches discussed in the literature. In order to compute the optimal history-dependent contract, we develop a recursive formulation. The underlying systemis a Markov decision process, where the evolution of the state of the system (typeof the customer) is endogenous, as it depends on the hidden action (effort) that theagent takes to improve his driving behavior in the previous period. The optimalcontract can be derived by backward induction based on the set of continuationpayoffs for the principal and the agent in each period. The feasible set of pairs ofcontinuation payoffs for the two parties in each period is vast. We tackle this diffi-culty by focusing on the efficient frontiers of the feasible sets (there is one such setfor each period), which express the principal’s maximum continuation payoff as afunction of the agent’s continuation payoff (vector) at the beginning of each period.Furthermore, we overcome the complexity that arises from the interaction of hid-den state and hidden action by decomposing the problem into a set of subproblemsin each period.To the best of our knowledge, the literature on dynamic principal-agent con-tracts are fairly limited, and articles with both adverse selection and moral hazardare even fewer. Due to the interplay between hidden type and hidden action, ana-lyzing the dynamic contracting problems with both issues is complex, and conse-5quently, most of the papers in this area resort to numerical analyses (see e.g., [19]).In contrast, we develop a dynamic programming algorithm to examine the modelanalytically and explore structural results about the optimal contract.Besides nontrivial contributions to the theory of dynamic principal-agent games,the results of this study also lead to important and interesting managerial insightsfor firms considering UBI programs. We show that as it gets closer to the end (ofthe UBI program), the gap between the optimal continuation social welfare corre-sponding to different agent types enlarges, which suggests better screening of theagent as more information about him is collected. This verifies one of the benefitsof the UBI program to the insurers. However, the marginal benefit of better screen-ing declines as the number of periods grows, so it is not always beneficial to offera long monitoring period. Our model results provide an estimation of the expectedmarginal benefit to the firms offering a UBI program for each period of screening.This proves useful when the insurance companies want to decide for how long theyshould monitor the driving behavior of customers. Thus, depending on the modelparameters, we make suggestions on whether it is better to offer a short or long UBIprogram. Furthermore, as discussed earlier, designing an appropriate UBI contractcan incentivize customers to become better drivers which may results in accidentrate reduction as well. Solving the proposed model, we can estimate the potentialgain of offering UBI programs in terms of reducing the accident rate.To illustrate the underlying insights of the proposed model, we perform a nu-merical analysis. For this purpose, using a novel dataset from one of the majorinsurance companies in the US, we estimate the model parameters. We have ac-cess to the UBI data of 40,525 drivers that submitted a quote request to purchasean insurance policy from March 2012 to November 2014, who also adopted thisprogram. Analyzing the results of the model given the estimated parameters, weobtain the optimal monitoring time and provide comparative statics results on theoptimal length of monitoring with respect to various model parameters. We showthat under the optimal contract, we can have about an 18% reduction in the annualaccident rate.Recently published studies in the marketing and operations management liter-ature, Soleymanian et al. [58] and Choudhary et al. [8], explored the benefits ofUBI programs for the drivers empirically. However, gains from this screening pro-6gram for the auto insurance companies have not been analyzed in the literature,which is one of the main focuses of this paper as explained above. According tothe business press, many newly established insurance companies are struggling todetermine whether they should initiate and offer UBI. Our study sheds light onhow to design the contract to manage a UBI program, the extent to which a UBIpolicy can outperform a traditional policy, and how the potential gains depend onthe demographics of the target market.7Chapter 2When to Queue If You Do NotWant Service Too Soon2.1 IntroductionMany service settings require individuals to satisfy some side conditions beforebeing served. When queueing is involved, an individual may decide to join thequeue before the prerequisite condition is met. This allows for parallel processingof waiting for the service as well as waiting for the completion of the prerequi-site condition. However, this also means that the individual may reach servicebefore completing the prerequisites, which can result in some type of penalty, e.g.,removal from the queue, extra processing time while in service, or monetary penal-ties. As an example, the mobile “Nowait” application allows customers to join await-list at their desired restaurant before arriving at the location. Depending onthe random time until reaching service and the time until all members of the partyarrives at the restaurant, the former may occur before the latter.An important research stream in the queueing literature concerns customers’behavior in joining queues. In this chapter, we investigate whether and when anindividual arriving to a queue should join the queue when there is a prerequisiteevent for receiving service. We analyze situations where individuals need to jointhe queue to be able to initiate the prerequisite process. As an example of such set-tings, one can think of cases for which an application number is needed to prepare8the requirements. For instance, immigrants seeking permanent residency may firstneed to obtain an application number (i.e., by joining the queue) before they canprepare some of the documents needed by the time they are scheduled for a con-sulate meeting (i.e., reach front of the queue). In the next chapter, we extend thismodel to the case in which the prerequisite event may be in process before joiningthe queue.In our model, an individual arrives to an M/M/1 queue (physically or virtually)and observes (for a physical queue) or is told (for a virtual queue) the number ofpeople in the system. First, we assume that the individual must eventually join thequeue (outside options are not available), but need not join right away. Then, weanalyze a setting where the individual may also leave the system (i.e., forgo serviceor seek it elsewhere). Whenever the individual does join the queue, the process tocomplete the prerequisite condition begins, with a duration that is exponentiallydistributed. We consider different costs for waiting in versus out of queue, and apenalty for reaching service without completing the prerequisite condition.We present the structure of the optimal policy depending on the queue length,and show how the structure of the optimal policy depends on model parameterssuch as in-queue and out-of-queue waiting costs, the arrival and service rates, aswell as the time until the prerequisite condition is satisfied. The main results are asfollows (without loss of generality, we let the cost of waiting in queue equal 1 perunit of time):• Consider the setting in which the individual has the only two options ofjoining the queue immediately or waiting to join at a later time. If the out-of-queue waiting cost is less than 1-ρ (where ρ is the steady state utilizationof the server), then a three-region policy of wait-join-wait is optimal. Thatis, for small and large queues, it is optimal for an arriving individual to waitout of queue until it reaches an intermediate size, and then join the queue.If the out-of-queue waiting cost is greater than 1-ρ , the optimal policy is atwo-region policy of wait-join. That is, one should join the queue if and onlyif its size is above a certain threshold.• In the presence of the option of leaving the system without receiving service,the optimal policy is a five-region policy of leave-wait-join-wait-leave. In9other words, one might decide to leave the system for very small or verylarge queues; otherwise, she applies a wait-join-wait policy for queue sizesin between.The remainder of the chapter is organized as follows. First, we review therelated literature and discuss our contributions. In Section 2.2, we formulate theproblem and characterize the optimal policy. Section 2.3 presents an extension tothe model of Section 2.2, in which the individual may leave the system withoutreceiving service. Finally, we provide concluding remarks in Section 2.4. Theproof of all theorems, propositions, and supporting lemmas can be found in theAppendix A.2.1.1 Literature ReviewOur work concerns strategic customers’ decisions regarding whether or not to joina queue. This was first explored in the pioneering work of Naor [47], in which acustomer arriving to an M/M/1 queue decides whether to join the queue or leave thesystem (balk). If the customer joins the queue, she cannot renege. There is a linearcost for time spent in the queue and a fixed reward for receiving service. Naorshows that the optimal strategy is a join-leave policy; that is, an individual shouldjoin the queue if and only if the number of people in the system is below somethreshold value (otherwise, she should leave). Yechiali [66] extends these resultsto the case of a G/M/1 queue and shows how the join-leave threshold depends onthe cost parameters. Knudsen [39] extends the results to an M/M/s/n setting andalso allows for nonlinear waiting cost. Yechiali [67] extends the results of Knudsento the case of G/M/s queue, but with linear cost structure. He also considers thecase of a queue with a bounded buffer.Unlike the above examples, individuals in our setting may wait out of queueand join at a later time. Mandelbaum and Yechiali [43] also consider the settingof waiting out of queue with the retrial option. They consider an M/G/1 queuein which an arriving individual chooses among three alternatives: join the queue,leave the system, or wait out of the queue. If the individual chooses to wait outof queue, in the next state change, she can choose among the three options again,while if she joins or leaves the queue, the problem ends. They prove the optimality10of a three-region policy (join-wait-leave); if the queue size is small enough, it isoptimal for the individual to enter the queue; if the queue is large enough, sheleaves the system, and for the queue sizes in between, she waits out of the queue.They also present more specific results for M/M/1 queues.Each of the above examples including our study in this chapter assumes onlythe arriving individual is strategic about whether to join the queue; all other cus-tomers join unconditionally. More recently, Cui et al. [14] generalize the model ofMandelbaum and Yechiali [43] to a game theoretic setting in which all customersconsider the same three options (join, leave or wait). If an individual chooses towait and retry later, she returns in the next period and faces the same decision ofjoining, leaving or waiting. Cui et al. [14] characterize the consumer equilibriumfor this setting. In Chapter 3, we also discuss queue joining decision of customersin equilibrium for a system with a prerequisite event for receiving service, butwithin a different framework. As mentioned earlier in Section 2.1, the timing ofthe initiation of the prerequisite event in Chapter 3 is different than that of themodel proposed here (upon arriving to the system vs. upon joining the queue).Equilibrium behavior has been studied in other papers as well, but without theretrial option. For instance, Hassin and Haviv [29] study an M/M/1 queue in whichall customers face the decision of either joining the queue or balking. However,they also allow individuals to leave the queue after joining. Under a linear waitingcost and a fixed reward for receiving service within a specified time frame, theyprove the existence of a Nash equilibrium. Hassin and Haviv [30] and Hassin [28]provide a comprehensive survey of such studies and also the other research streamsdiscussed in the literature regarding the strategic behaviors in queueing systems.2.1.2 ContributionsTo our knowledge, our study is the first to analyze queue joining decisions whenthere is a side condition required for receiving service. We answer the questionof when is the optimal time for an arriving individual to join such a queue. If theindividual joins when the queue size is small, she risks reaching service before theprerequisite is met, leading to a penalty cost. On the other hand, if she waits toolong before joining, she incurs more waiting costs. Thus, the individual should11balance the waiting costs against the risk of incurring penalty.We formulate and discuss different versions of this problem. In particular,we characterize the structure of the optimal policy. We present some comparativestatics of the optimal policies with respect to the model parameters. Studying thesequeueing settings opens up new applications for the rational queueing literature.The structure of the optimal policies presented for the mentioned models en-ables the service provider to get a sense of a strategic customer’s behavior in joiningqueues. This can prove useful in making decisions regarding some system factorssuch as the penalty cost that individuals incur if they reach the head of the queuewithout having the requirements.2.2 The ModelWe formalize the problem as follows. There is an M/M/1 queue with arrival rate λand service rate µ (λ < µ). An individual arrives to the queue and finds that thereare n people in the system (including the person being served). The individual doesnot need to join the queue immediately upon arrival, but must do so eventually (i.e.,service is needed and there are no outside options). Upon joining the queue, sheremains there until reaching service, i.e., there is no reneging or abandonment.Without loss of generality, we assume the cost of waiting in the queue is 1 per unittime, and the cost of waiting outside of the queue prior to joining is c per unit time.While the individual can join the queue at any time, there is a prerequisite con-dition that should be met before reaching service; otherwise, she incurs a penalty(there is no cost incurred for service time itself). The individual initiates the pre-requisite process upon joining the queue. Let A denote an indicator variable thatchanges from 0 to 1 when the prerequisite event completes. We assume this occursan exponentially distributed amount of time (with rate α) after the individual inquestion joins the queue.Once the individual joins the queue, she remains there (either physically or vir-tually). When she reaches the head of the queue and the server becomes available,if the prerequisite condition is met (i.e., A = 1), she receives service; otherwise,she incurs a penalty δ and leaves the system. We take the perspective that onlythe individual approaching the queue is strategic about whether to join or not. We12discuss the game theoretic approach in the next chapter. Thus, here we assumeall other individuals arriving to the system immediately join the queue and remainthere until reaching service. If they also require a prerequisite condition for service,they still occupy the server’s time with the same exponential rate µ of service.Since we consider exponential distributions, it follows from the memorylessproperty that we can formulate the individual’s problem as a Markov decision pro-cess. Provided the individual has not yet joined the queue, the state of the systemis the number of people in the queue plus service, which is denoted by n (exclud-ing the individual under consideration). The decision epochs take place whenevern changes (i.e., whenever the first of a service completion or new arrival takesplace). The individual’s objective is to minimize her expected total cost. Through-out the chapter, we assume the penalty cost, δ , is positive except when consideringa special case of δ = 0.2.2.1 Optimal PolicyWe define V ∗n as the optimal expected remaining cost incurred by the individualwhen she arrives to the system and finds n people there. Then, V ∗n solves thefollowing Bellman’s equations:Vn = min{Jn,Wn}, (2.1)where Jn is the expected cost that the individual incurs if she joins the queue whenthere are n customers already in the system, and Wn is the expected cost of waitingout of the queue in state n and then following the optimal policy from the next statechange. Since the costs associated with each action are non-negative, state spaceis countable, and action space is finite, a stationary policy satisfying Bellman’sequations (2.1) is an optimal policy [52].We can calculate Jn asJn =nµ+δ(1+αµ)−n. (2.2)The first term in (2.2) is associated with the waiting cost since there needs to be13n service completions, each with the expected duration of1µ. In the second term,(1+αµ)−ncorresponds to the probability that the individual reaches the head ofthe line before the prerequisite completion, in which case she incurs penalty δ ; thedetails of this argument can be found in Appendix A.The value for Wn is obtained as follows:Wn =cλ+V1 n = 0cλ +µ+λλ +µVn+1+µλ +µVn−1 n≥ 1.(2.3)If the individual does not join the queue in state n, she waits out of queueuntil the first of a service completion or new arrivals takes place, at which time shedecides again whether to join or not. Thus, the second line of (2.3) follows from thefact that the time until the first of an arrival or service completion is exponentiallydistributed with rate λ +µ , and the former occurs with probability λ/(λ +µ). Asthe out-of-queue waiting cost is c per unit time, the immediate expected cost ofwaiting in state n is c/(λ +µ) for n≥ 1, and c/λ when n is equal to 0.2.2.2 Preliminary ResultsBefore stating our main results, we define some terms and lemmas. First, we definethe following function f ( j) for j≥ 0, and j ∈ Z. We will use this function to obtainthe optimal policy.f ( j) =(( j∑i=0µ iλ i+1)c+1µ)((α+µ) j+1αµ j). (2.4)Next, we consider a modified problem, which will help us identify the optimalpolicy for the original problem. In the modified problem, the individual has twooptions: join the queue upon arrival (same option as before), or wait for now butjoin as soon as the next state change occurs (as opposed to reconsidering the wait-versus-join decision at that time). We define W ′n as the value of this latter choice14when observing n people upon arrival. That is,W ′n =cλ+ J1 n = 0cλ +µ+λλ +µJn+1+µλ +µJn−1 n≥ 1.(2.5)The Bellman’s equations for this problem are given by V ′n = min{Jn,W ′n}. Thefollowing lemma establishes the optimal policy for the modified problem. Recallthat ρ = λ/µ (i.e., the steady-state utilization of the server).Lemma 2.1a. If n = 0, for all values of c,{J0 ≤W ′0 if δ ≤ f (0)J0 >W ′0 if δ > f (0).b. For n≥ 1, if c≥ 1−ρ , then Jn ≤W ′n.c. For n≥ 1, if c < 1−ρ , then{Jn ≤W ′n if n≤ n∗Jn >W ′n if n > n∗,where,n∗ =⌊ ln( δµ[λ( αα+µ )−α]cµ+λ−µ)ln(1+ αµ) ⌋. (2.6)This lemma states that when the system is empty (i.e., n = 0) and the penaltycost δ is low enough, it is better to join the queue now than joining in the nextperiod (part a). Recall that in our setting, the periods in which decision epochs takeplace are defined as the times that the state of the system, n, changes. When thesystem contains at least one person, for sufficiently high out-of-queue waiting costs(c≥ 1−ρ), it is optimal to join (part b). Interestingly, this does not depend on the15penalty δ . Even for δ = 0, the optimal policy is to join when c ≥ 1−ρ (this caseis discussed separately in Section 2.2.5). Now suppose there is at least one personin the system and the out-of-queue waiting cost is relatively small (c < 1− ρ).Then, if queue is sufficiently large, waiting to join at the next state change wouldbe preferable to joining immediately; otherwise, it is better to join immediately(part c). The threshold of this optimal control limit policy depends on all of themodel parameters, given by equation (2.6).Next, we characterize the optimal policy of the original model according towhether the out-of-queue waiting cost, c, is greater than or less than the percentageof time the server is idle, 1− ρ . We make a use of the results of Lemma 2.1 toprove the following optimal policies.2.2.3 Case a: c≥ 1−ρFor the next result, we define j∗ as the following:j∗ = min{ j ≥ 0 : f ( j)≥ δ , j ∈ Z}. (2.7)Theorem 2.1 Suppose c ≥ 1− ρ . It is optimal to wait out of queue whenevern < j∗, and it is optimal to join the queue whenever n≥ j∗.The above theorem provides the optimal control wait-join threshold j∗ suchthat the optimal policy is to join for the queue sizes above that. Note that thetheorem is constructive; we show in the proof how to solve for j∗ without needingto solve the MDP.We present a few comparative statics results for threshold j∗ with respect to themodel parameters. First, j∗ monotonically increases in penalty cost δ . Intuitively,for higher δ , one wants to join the queue when there are more people waiting, toreduce the risk of incurring the higher penalty of reaching service without complet-ing the prerequisite event. Second, j∗ monotonically decreases in the out-of-queuewaiting cost c. This is intuitive since when it costs more to wait out of queue,joining the queue earlier is better. Third, j∗ monotonically increases in the arrivalrate λ . When the arrival rate increases, the probability that the next state changeis the result of a new arrival increases, and the expected duration to the next state16is shorter. The proofs of these comparative statics results are easily obtained basedon the definition of j∗Finally, we discuss the behavior of threshold j∗ with respect to the prerequisiteevent occurrence rate α . As α increases, an individual who joins the queue is morelikely to reach service with the prerequisite event completed. One may intuit thatthis should mean j∗ decreases with α . When α is relatively large, this intuitionholds. However, for sufficiently small α , i.e., when the prerequisite event is slowto occur, it turns out that the result is in the opposite direction: j∗ increases in α .We prove this through the following proposition.Proposition 2.1 There exists a threshold γ such that if α > γ , j∗(α) decreases inα , and for α < γ , j∗(α) increases in α .When the prerequisite event is not likely to occur (i.e., α is very small), due tothe out-of-queue waiting cost, it might not be worth waiting outside for the queuesize to become too large. In this case, as any increase in the event rate shortens theamount of the time needed for the prerequisite event, the individual might becomemore likely to wait (i.e., joining threshold increases) because there is a greater hopeto satisfy requirements. With increased event rate, the individual becomes morecautious about reaching the server without the prerequisite condition satisfied.2.2.4 Case b: c < 1−ρNow we consider the case where the out-of-queue waiting cost is sufficiently small;i.e., c satisfies c < 1−ρ . In this case, we show the optimality of a three-region pol-icy of wait-join-wait. This result is formally stated in Theorem 2.2. The statementof this theorem uses the definitions of n∗ and j∗ given in (2.6) and (2.7), respec-tively.Theorem 2.2 Suppose c < 1−ρ . There exists an integer u∗, satisfying j∗ ≤ u∗ ≤n∗, such that the optimal policy can be characterized as follows:a. If n < j∗, it is optimal to wait out of queue.b. If j∗ ≤ n≤ u∗, it is optimal to join the queue.c. If n > u∗, it is optimal to wait out of queue.17In other words, if the queue size is relatively small (n< j∗), it is optimal to waitout of the queue and avoid the risk of reaching the front of the queue before theprerequisite event completes. On the other hand, if the queue is too large (n > u∗),then the low cost of waiting out of queue justifies waiting until the queue reducesto an intermediate size before one should join. The expression of u∗ discussed inTheorem 2.2 is presented in Appendix A.Figure 2.1 summarizes the results of Theorems 2.1 and 2.2. The horizontalaxis in this figure represents the number of people who are already in the systemdenoted by n. Note that threshold j∗ for the wait region corresponding to smallerqueue sizes is identical for both cases involving c (i.e., it is optimal to wait whenn < j∗). Thus, all the comparative statics results mentioned in Section 2.2.3 for j∗apply to this case where c < 1−ρ , as well.Figure 2.1: Optimal policy structure2.2.5 Optimal Policy When There Is No Prerequisite EventHere we consider the join-versus-wait decision for the individual for the specialcase when there is no prerequisite event required to obtain service. This is equiv-alent to assuming δ = 0. Even in this simpler setting, the optimal decision is notobvious.Theorem 2.3 Suppose δ = 0.a. If c≥ 1−ρ , then it is always optimal to join the queue.b. If c < 1−ρ , then the optimal policy is to join the queue for state n = 0, andto wait out of the queue for any state n > 0.18Recall that the cost of waiting in queue is 1 per unit time. Therefore, sinceδ = 0, one will join the queue immediately when the out-of-queue waiting cost,c, is bigger than cost of waiting in queue. We show that even if the out-of-queuewaiting cost is below 1 but sufficiently large (i.e., c≥ 1−ρ), the optimal policy isto still join the queue right away (part a). When the out-of-queue waiting cost issmall (i.e., c < 1−ρ), one should wait until the number of people in the system, n,becomes 0, and then join the queue (part b). Figure 2.2 summarizes the results ofTheorem 2.3.Figure 2.2: Optimal policy structure when penalty cost δ = 0To prove Theorem 2.3, similar to Section 2.2.2, we define a modified problemin which the individual has two options: join the queue upon arrival (same optionas before), or wait for now but join as soon as the number of people in the system,n, changes (as opposed to reconsidering the wait-versus-join decision at that time).The optimal policy of this modified problem proves useful in identifying the opti-mal policy of Theorem 2.3. It turns out that the optimal policy of the original modelpresented in Theorem 2.3 is exactly the same as that of the modified problem.As a side note, we draw a connection between our Theorem 2.3 and a resultfrom Mandelbaum and Yechiali [43]. They presented a model in which an arrivingindividual has the option of joining the queue, waiting out of queue, or leaving thesystem altogether. They showed that it is never optimal to wait out of queue ifand only if the out-of-queue waiting cost c≥ 1. Our Theorem 2.3 shows that theirmodel without the leave option reduces the if and only if threshold to c≥ 1−ρ .192.3 Extended Model: The Individual May Leave WithoutReceiving ServiceIn this section, we study an extension to the model of Section 2.2. Under thatmodel, the individual had two options upon observing the queue: (i) join the queueand stay until reaching service, and (ii) wait out of the queue and make anotherdecision at the next state change. The model in this section endows the individualwith another option: (iii) leaving the system altogether at a cost of l. Once anindividual leaves the system, she cannot return to the system. Furthermore, onceshe joins the queue, she cannot leave the queue or leave the system. The rest of thesetup and parameters for this model are the same as the model of Section 2.2.2.3.1 Optimal PolicyIn this setting, if the individual chooses to join the queue or to leave the system, thedecision problem ends and the cost is Jn (from equation (2.2)) and l, respectively.Instead, if the individual chooses to wait out of the queue and monitor the system,she faces the same decision at the next state change. Thus, the definition of Wn isthe same as (2.3) given in Section 2.2.The Bellman’s equation for our model needs to account for the option of leav-ing the system, and is given by:Vn = min{Jn,Wn,Ln}, (2.8)where Ln is the expected cost of leaving the system in state n, which is equal to lfor all values of n≥ 0.Recall that the optimal policy for the base model in Section 2.2 was a three-region wait-join-wait policy (the wait-join policy for the case c ≥ 1− ρ can bethought of as a special case of the three-region policy). The structure of the opti-mal policy for this section is more complicated, but maintains a similar structure,characterized by a five-region policy: leave-wait-join-wait-leave.Theorem 2.4 There exist θLW , θWJ , θJW and θWL, satisfying θLW ≤ θWJ ≤ θJW ≤20θWL, such that an optimal policy is given as follows:leave the system if n≤ θLWwait out of the queue if θLW < n≤ θWJjoin the queue if θWJ < n≤ θJWwait out of the queue if θJW < n≤ θWLleave the system otherwise.Following the logic of Section 2.2 (optimality of a three region policy of wait-join-wait in the absence of leave option), if the line is very short, waiting until theline becomes long enough (i.e., reaching the join region) may be too expensive,so one may want to leave the system if this option is also available. Similarly,when the queue size is large, it may take too long to reach the join region, so theindividual may decide to leave the system in the presence of this option.Note that based on the values of the parameters, some of the five regions men-tioned in Theorem 2.4 may not exist. We indicate some of these conditions in theproof of this theorem in Appendix A. Also, as mentioned in Theorems 2.1 and2.2, optimal policy of the base model is a two-region or three-region policy (wait-join, or wait-join-wait) according to whether c ≥ 1−ρ or c < 1−ρ , respectively.Considering leave option, it turns out that c is not an indicator of having specificdistinct regions in the optimal policy anymore, and a non-trivial five-region policymay exist for values of c≥ 1−ρ or c < 1−ρ .The model in this section extends the M/M/1 result of Mandelbaum and Yechiali[43], who studied a similar setting without considering the prerequisite event andestablished the optimality of a three-region policy: joint-wait-leave. Theorem 2.4above shows that, in the presence of the prerequisite event consideration, the opti-mal policy includes two additional regions (leave-wait) to the left of the join-wait-leave region.2.4 Conclusions and ExtensionsIn this chapter, we studied a queueing system in which individuals would like tojoin the queue or waiting list of a service provider, but they need to have somerequired conditions by the time they reach service. We analyzed different variations21of the problem in Sections 2.2 and 2.3, which varied in the individual’s options ateach decision point (whether leaving is an option or not). For each version, weformulated the problem as a Markov decision process and discussed the structureof the optimal policy.When outside options for getting service are not available (i.e., leaving is notan option), we showed that in general, the optimal policy is a three-region policy ofwait-join-wait. However, we demonstrated that some unexpected shifts in policycould happen as a result of the combination of out-of-queue vs. in-queue waitingcosts. For example, when the out-of-queue waiting cost is relatively large, weproved the optimality of the special case of a two-region policy of wait-join.Furthermore, we showed how the optimal policy changes to a more compli-cated five-region policy of leave-wait-join-wait-leave in the presence of the thirdoption of leaving the system. The optimal policies we derived would be also usefulin thinking of a game theoretic setting where all customers are strategic, in whichcase one has to consider the potentially complex interactions among the customers.We discuss this setting in the next chapter.22Chapter 3Equilibrium Joining Strategies inthe Presence of a PrerequisiteCondition for Receiving Service3.1 IntroductionIn many queueing settings, customers need to meet some requirements to receiveservice, but they might decide to join the queue before the prerequisite condition ismet, thereby parallel processing their time in queue and time needed to completethe requirements. However, this comes with the risk of reaching service before theyare ready, resulting in a penalty (e.g., removal from the queue, extra processingtime while in service, fines, or shame). In Chapter 2, we studied the settings inwhich individuals need to join the queue to be able to initiate the prerequisite event.More general settings where the prerequisite event may be in process before joiningare analyzed in this chapter.Examples of such settings include the following. A couple intending to con-ceive may join a long day care waiting list prior to having a baby. An elderly personmay join the wait list for a nursing home prior to needing that level of care. Thefirst person of a group arriving to a restaurant may put her name on the waiting list,before the rest of the group has arrived. A traveler who arrives to an international23airport may join the line of the passport check before filling the declaration form.In a supply chain setting, if lead times for products are long, a downstream firmmight place an order before having the funds ready for payment.As mentioned in the previous chapter, an important stream in the queueingliterature concerns customers’ behavior in joining queues. In this chapter, we in-vestigate when an individual should join a queue when there is a prerequisite eventfor receiving service. In our setting, an individual arrives to an M/M/1 queue(physically or virtually) and observes (for a physical queue) or is told (for a virtualqueue) the number of people in the system. The individual must eventually join thequeue (outside options are not available), but need not join right away. We assumethe process to satisfy the prerequisite event is already ongoing at the time an indi-vidual observes the queue. Similar to Chapter 2, for this setting also we considerdifferent costs for waiting in versus out of queue, and a penalty for reaching servicewithout completing the prerequisite event.We first consider the optimal policy for a single strategic individual who de-cides whether to wait or join the queue when there are n customers in the system.Thus, we initially assume all other customers automatically join the queue uponarrival (i.e., they are not strategic), but relax this below. The main analytical resultsfor this setting are as follows (without loss of generality, we let the cost of waitingin queue equal 1 per unit of time):• If the out-of-queue waiting cost is less than 1-ρ (where ρ is the steady stateutilization of the server), then the optimal policy is to wait out of queueuntil the prerequisite event is satisfied and then join the queue only when thesystem becomes empty.• If the out-of-queue waiting cost is greater than 1-ρ , then the optimal policyis to join the queue if and only if its size is above a certain threshold.We then analyze settings in which all individuals may choose to join or waitat each system state. For this case of multiple strategic customers, we adopt theconstruct of “level-k” thinking from the behavioral and experimental economicsliterature (first introduced in Stahl and Wilson [59]). This approach considers thatindividuals are boundedly rational and often do not behave according to a Nash24equilibrium. A classic demonstration of this is the “guess 2/3 of the average” game,in which multiple contestants pick a number between 0 and 100, and whoever se-lects the number closest to 2/3 of the average wins. The unique Nash equilibrum ofthis game is for everyone to guess 0, which may be derived by iterative reasoning1.However, one experimental study found that individuals playing this game gave aninitial average response of around 37, with no one choosing the Nash equilibirumof 0 (Nagel [46]).In “level-k” models, there is a portion of non-strategic players who do notconsider other players and behave randomly or according to some naive strategy;these players are called level-0 players. In our setting, we refer to level-0 customersas those who join the queue immediately upon arrival. Then, there is a secondset of players who play a best response to level-0 ones; these are called level-1players, and in our model, are the customers who optimize their join vs. waitdecision assuming all other individuals are level-0 thinkers. Similarly, we refer tolevel-k customers as those who optimize their join vs. wait decision assuming allother customers are level-(k− 1) players. The optimal policy we described aboveregarding a single strategic individual deciding when to join a queue describes thepolicy of a level-1 customer.In moving to level-2 and higher customers, we assume that the state of thesystem is (n,m), where n is the number in the queue plus any in service (i.e., whatthe level-1 customer considers), and m is the number waiting out of the queue(which we assume is knowable via direct observation or information provided bythe system). In this setting, we (numerically) derive the following policy insightsfor a level-k customer in a system where everyone else is a level-(k−1) customer(assuming the same in-queue and out-of-queue waiting costs):• For each number of people waiting out of the queue (m), there is a controllimit n∗k(m) such that the level-k customer joins the queue if and only ifn≥ n∗k(m).1For example, one line of reasoning is as follows. The maximum number anyone can choose is100, and if everyone chooses that, then the average is (2/3)*100. So no one should select a numbergreater than (2/3)*100, as the average is certainly less. Having ruled out anyone selecting a numberabove (2/3)*100, by the same reasoning, the average will be no more than (2/3)2 ∗100. Taking thiselimination process to the limit results in the Nash equilibrium of everyone selecting 0.25• Furthermore, the control limit n∗k(m) decreases in m and k.• If the optimal policy for a level-k thinker matches the optimal policy for alevel-(k−1) thinker, then this optimal policy is an equilibrium.Hence, in addition to characterizing the behavior of customers with boundedlevels of rationality, incorporating the described level-k model also proves usefulin obtaining the equilibrium for the full rationality case. Due to the presence ofthe “wait” option in the action space and also because of the need of customers tohave some side conditions for receiving service, analytically deriving equilibriumresults using the standard approaches discussed in the literature is very challenging.However, applying a level-k framework makes it possible to analyze how customersbehave in equilibrium as well.Furthermore, we provide the structure of the socially optimal policy and com-pare it to the equilibrium solution numerically. In equilibrium, self-interested cus-tomers ignore the negative externalities of their behavior to others, which resultsin a higher expected cost for the system compared to the socially optimal solution;we discuss how this gap varies with respect to some model parameters.The remainder of the chapter is organized as follows. First, we review therelated literature and discuss our contributions. In Section 3.2, we formulate theproblem and characterize the optimal solution for a level-1 customer. Section 3.3presents the modeling details and numerical results for level-2 and higher cus-tomers. We also discuss how level-k analysis can be used to identify an equilib-rium policy. In Section 3.4, we derive the structure of a socially optimal policyand compare it to different level-k policies as well as the equilibrium policy. Fi-nally, we provide concluding remarks in Section 3.5. The proof of all theorems,propositions, and supporting lemmas can be found in Appendix B.3.1.1 Literature reviewSimilar to Chapter 2, our work in this chapter also concerns strategic customers’decisions regarding whether or not to join a queue. This chapter is an extensionof Chapter 2 to more generalized settings with multiple strategic customers inwhich the prerequisite event may be in process before joining the queue. As itis discussed in Section 2.1.1, the queue joining problem was first explored in the26pioneering work of Naor [47], in which a customer arriving to an M/M/1 queue de-cides whether to join the queue or leave the system. Naor showed that the optimalstrategy is a join-leave policy; that is, an individual should join the queue if andonly if the number of people in the system is below some threshold value. Later on,several articles extended the results of Naor to more complicated queueing settingswith both single and multiple strategic customers; these studies are discussed indetail in Section 2.1.1.Most of the game theoretic settings in the rational queueing literature assumethat customers are fully rational. However, the seminal work of Simon [56] onbounded rationality suggested individuals have limited levels of reasoning. Sev-eral experimental studies have demonstrated this by showing that individuals maynot behave according to a Nash equilibrium (see, for example, the work of Nagel[46], described above, as well as McKelvey and Palfrey [44] and Ho et al. [32]).While the concept of bounded rationality has been well studied in the behavioraleconomics and marketing literature, there has been limited discussion of it in theOperations Management (OM) literature.The work done by [34] was the first study to incorporate bounded rationalityin customers’ queue joining behavior. They discuss the case in which customersmay not have the capability of perfectly estimating their expected waiting time andutility when they want to decide between joining the queue or balking. Later on,Li et al. [40] analyzed a customer-intensive service system with two competingservers, in which customers decide to join one of the queues or balk. They as-sume that boundedly rational customers choose their service provider accordingto a logit model. Li et al. [41] extended the model of Li et al. [40] to a multipleserver case. Apart from queueing settings, Su [61] studied bounded rationality innewsvendor models, and [15] studied the limited strategic-reasoning capabilitiesof customers in a supply chain model. More recently, [51] presented a review ofpossible approaches including level-k framework to modeling bounded rationalityin OM settings. For example, they also described a level-k modeling approach,which we use for both modeling the behavior of boundedly rational customers aswell as a tool for establishing equilibrium strategy in our setting. We discuss thismore in Section 3.3.273.1.2 ContributionsAs it is mentioned in Chapter 2, to the best of our knowledge, we are the first toanalyze strategic customers’ queue joining behavior in the presence of a requiredside condition for receiving service. If the individual joins a short queue, she risksreaching service before the prerequisite is met, leading to a penalty cost. On theother hand, if she waits too long before joining, she incurs more waiting costs.Thus, the strategic individual should balance the waiting costs against the risk ofincurring a penalty.Rather than focus on equilibrium behavior, we assume boundedly rational cus-tomers and model them as level-k thinkers. We believe our work is the first toconsider level-k thinking in the context of a queueing game. Through a combina-tion of analytical and numerical results, we provide the structure of optimal policiesand provide comparative statics results. As a by-product of the level-k approach,we obtain a sufficient condition for an equilibrium policy as well. Finally, we char-acterize the optimal policy of a social planner (i.e., first best solution) and estimatethe loss of systems operating under different levels of k.3.2 The ModelIn Chapter 2, we assumed that the clock for the prerequisite event begins onlywhen the individual joins the queue. In this chapter, the clock for the prerequisiteevent begins immediately when the individual arrives to observe the queue. Inother words, the processing of the requirements starts at time 0, and the individualcan undergo the prerequisite event while waiting out of queue in parallel. The restof the setup in this model is identical to that of Chapter 2. Therefore, there is anM/M/1 queue with arrival rate λ and service rate µ (λ < µ). An individual arrivesto the queue and finds that there are n people in the system (including the personbeing served). The individual does not need to join the queue immediately uponarrival, but must do so eventually. Upon joining the queue, she remains there untilreaching service. Without loss of generality, we assume the cost of waiting in thequeue is 1 per unit time, and the cost of waiting outside of the queue prior to joiningis c per unit time.While the individual can join the queue at any time, there is a prerequisite con-28dition that should be met before reaching service; otherwise, she incurs a penalty(there is no cost incurred for service time itself). Let A denote an indicator variablethat changes from 0 to 1 when the prerequisite event completes. We assume thisoccurs an exponentially distributed amount of time (with rate α) after the individ-ual under consideration arrives to the queue (regardless of whether or not she joinsthe queue at that time as opposed to Chapter 2).Once the individual joins the queue, she remains there (either physically or vir-tually). When she reaches the head of the queue and the server becomes available,if the prerequisite condition is met (i.e., A= 1), she receives service; otherwise, sheincurs a penalty δ and leaves the system. In this section, we take the perspectivethat only the individual approaching the queue is strategic about whether to joinor not. We discuss a game theoretic setting in Section 3.3. Thus, here we assumeall other individuals arriving to the system immediately join the queue and remainthere until reaching service. If they also require a prerequisite condition for service,they still occupy the server’s time with the same exponential rate µ of service. Onecan think of this as the time required by the server to find out that the prerequisitecondition is not satisfied.Similar to Chapter 2, since we consider exponential distributions, it followsfrom the memoryless property that we can formulate the individual’s problem asa Markov decision process. Provided the individual has not yet joined the queueand the indicator variable A is still 0, the state of the system is the number ofpeople in the queue plus service, which is denoted by n (excluding the individualunder consideration). It is also possible that the prerequisite event completes whilewaiting out of queue (i.e., A changes to 1); we consider states n˜ to represent thiscase when there are n people in the system. If this occurs, the rest of the problemis equivalent to the case δ = 0 for for the model of Chapter 2 studied in Section2.2.5, and therefore Theorem 2.3 applies.The individual’s objective is to minimize her expected total cost, and her de-cision epochs take place whenever n changes (i.e., whenever the first of a servicecompletion or new arrival takes place), or when A changes from 0 to 1 (i.e., when-ever the prerequisite event is completed). Throughout the chapter, we assume thepenalty cost, δ , is positive.293.2.1 Optimal PolicyWe define V ∗n as the optimal expected remaining cost incurred by the individualwhen she arrives to the system and finds n people there. Then, V ∗n solves thefollowing Bellman’s equations:Vn = min{Jn,Wn}, (3.1)where Jn is the expected cost that the individual incurs if she joins the queue whenthere are n customers already in the system, and Wn is the expected cost of waitingout of the queue in state n and then following the optimal policy from the next statechange. Since the costs associated with each action are non-negative, state spaceis countable, and action space is finite, a stationary policy satisfying Bellman’sequations (3.1) is an optimal policy [52].The expression of Jn is identical to equation (2.2):Jn =nµ+δ(1+αµ)−n,and the value for Wn is obtained as follows:Wn =cα+λ+αα+λV0˜+λα+λV1 if n = 0cα+λ +µ+αα+λ +µVn˜+λα+λ +µVn+1 if n≥ 1.+µα+λ +µVn−1(3.2)If the individual does not join the queue in state n, she waits out of queue untilthe first of a prerequisite event completion, new arrival or service completion takesplace, at which time she decides again whether to join or not. Thus, the secondline of (3.2) follows from the fact that the time until the first of these events isexponentially distributed with rate α + λ + µ , and each occurs with probabilityα/(α+λ +µ), λ/(α+λ +µ), and µ/(α+λ +µ) respectively. Thus, as the out-of-queue waiting cost is c per unit time, the immediate expected cost of waitingin state n is c/(α + λ + µ) for n ≥ 1. When n is equal to 0, the cost of waitingout of queue is c/(α +λ ), and the next state change is the first of a prerequisite30completion and new arrival.If the prerequisite event completes while the individual is waiting out of queue(i.e., A changes to 1 and state n˜ is reached), the rest of individual’s problem isequivalent to the model of Chapter 2 for which the penalty cost δ is 0. As men-tioned at the beginning of Section 3.2, from that time forward, it is optimal for theindividual to follow the policy of Theorem 2.3. We make a use of this theorem tocalculate Vn˜.Recall that state n˜ represents the case where the prerequisite event completeswhile waiting out of queue and there are n people in the system. Since it is optimalto follow the policy of Theorem 2.3 when state n˜ is reached, this state can be treatedas an absorbing state with lump sum expected cost-to-go denoted by Vn˜. We showthat Vn˜ is equal to the following.Proposition 3.1a. If c≥ 1−ρ , then Vn˜ = n/µ .b. If c < 1−ρ , then Vn˜ = nc/(µ−λ ).Substituting the expressions of Vn˜ presented in Proposition 3.1 into equation3.2 simplifies the definition of Bellman’s equations.Before stating the main theorem regarding the optimal policy, we make thefollowing assumption for the remainder of the chapter. It implies that when thesystem is empty and the prerequisite condition has not yet completed, it is optimalto wait until there is at least one person in the system, rather than join and incurthe penalty right away. In other words, the penalty cost, δ , is large enough to makecustomers want to avoid incurring it. If this were not the case, the “join versus wait”consideration would not be particularly interesting. In Appendix B, we provide asufficient condition on model parameters that guarantees Assumption 1 holds.Assumption 3.1 J0 ≥W0.As discussed above, when the prerequisite event is satisfied (A = 1), the in-dividual should follow the optimal policy of Theorem 2.3. Now, we discuss thecase when A = 0, and we again analyze the optimal policy separately for both high(c≥ 1−ρ) and low (c < 1−ρ) out-of-queue waiting costs.31Theorem 3.1 Suppose A = 0.a. If c ≥ 1− ρ , there exists a threshold ω∗ such that it is optimal to join thequeue for n≥ ω∗ and wait out of queue for n < ω∗.b. If c < 1−ρ , it is optimal to wait out of queue.Figure 3.1 depicts the results of the above theorem. The horizontal axis in thisfigure represents the number of people who are already in the system denoted byn. When c ≥ 1− ρ , if the queue size is relatively small (n < ω∗), it is optimalto wait out of the queue and avoid the risk of reaching service without satisfyingthe prerequisite condition. In practice, a high out-of-queue waiting cost may arisewhen there is a physical queue and individuals need to be present in the waitingarea if they decide to join later.Figure 3.1: Optimal policy structure before having the requirements (A = 0)When the out-of-queue waiting cost, c, is sufficiently small (c < 1−ρ), part(b) of Theorem 3.1 proves the optimality of waiting out of queue until having therequired conditions and an empty system. A low out-of-queue waiting cost appliesin situations where there is a virtual waiting list, so individuals can perform otheractivities while waiting out of queue. Interestingly, the optimal policy of Theorem3.1(b) is also the policy taken by an individual who wishes to avoid the possibleembarrassment of reaching service without having the required side condition met.These individuals will only consider joining the queue after the prerequisite condi-tion for service is completed.Theorem 3.1 can be proved using the same approach of defining a modifiedproblem similar to the one discussed in Section 2.2.2 and also in the proof of The-orem 2.3 of Chapter 2. The definition of the modified problem is slightly differentthan the one defined in Chapter 2 to account for the fact that in Theorem 3.1, re-quired conditions have not been met yet. Thus, we define a modified problem in32which the individual has two options: join the queue upon arrival (same option asbefore), or wait for now but join as soon as the number of people in the system, n,changes if the indicator variable A is still equal to 0 (as opposed to reconsideringthe wait-versus-join decision at that time). If the indicator variable changes to 1while the individual is waiting out of the queue, the system moves to the absorbingstates n˜ and the individual follows the policy mentioned in Theorem 2.3. The opti-mal policy of the defined modified problem proves useful in identifying the optimalpolicy of Theorem 3.1, which is discussed in Appendix B.3.3 Equilibrium AnalysisSection 3.2 presented an individual’s optimal policy when she was the only cus-tomer being strategic (i.e., all others joined the queue upon arrival). In this section,we now assume that all arriving individuals are strategic, and we analyze the Nashequilibrium of the system with homogeneous customers.One of the key assumptions in studying the equilibrium is that players haveunlimited reasoning abilities. Each player also believes that others are fully rationaland this belief is consistent across all players. However, it has been shown throughseveral experimental studies that players usually employ boundedly rational levelsof reasoning, and do not behave in the way expected from the equilibrium solution(e.g., see McKelvey and Palfrey [44] and Ho et al. [32]).To explain non-equilibrium behaviors, the economics literature introduced thecognitive hierarchy (CH) (Camerer et al. [6]) and level-k (LK) models (Stahl andWilson [59, 60] and Nagel [46]). These modeling frameworks account for the factthat players may have different and limited abilities of reasoning. There is a portionof non-strategic players who do not consider other players and behave randomlyor according to some naive strategy; these players are called level-0 players. Then,there is a second set of players who assume that others are level-0 ones; these arecalled level-1 players. Similarly, level-k players assume that others have lowerlevels of rationality (level-(k−1) or lower). In CH models, a level-k player plays abest response to all lower level ones (e.g, see Camerer et al. [6]). However, in LKmodels, level-k player plays a best response to just the level-(k−1) ones (e.g., seeHo et al. [32] and Ho and Su [31]). If for some k′, level-k′ and level-(k′+1) players33have the same optimal policy, then this forms a Nash equilibrium as no player hasincentive to deviate given the actions of others (Stahl and Wilson [59] and Camereret al. [6]). We use this property to establish the equilibrium strategy of our setting.In the following, we apply level-k modeling to obtain the equilibrium in thefull rationality case. At the same time, we characterize the structure of the op-timal joining policy for customers with lower levels of rationality. We assume alevel-k player chooses a best response to level-(k−1) ones, which is similar to theapproach taken by Ho et al. [32] and Ho and Su [31].In Section 3.2, we assumed all arrivals to the queue, except the individual underconsideration, were non-strategic and joined the queue unconditionally upon theirarrival (i.e., they were level-0 players). We then characterized the optimal joiningpolicy for a level-1 player who assumes everyone else are level-0 ones. In thissection, we start by assuming that all other arrivals are level-1 players, and weanalyze the optimal behavior of a level-2 player. We repeat this process iterativelyand continue doing the same level-k analysis assuming that others are level-(k−1)players. We do this until we obtain the fixed point of the sequence, which is actuallya Nash equilibrium of the system. Numerical results show convergence to the Nashequilibrium in a finite number of iterations.The details of Bellman’s equations for level-2 and level-3 players are discussednext. We highlight these because the solution of each requires consideration of adifferent type of policy for the level-(k− 1) players involved. For example, thelevel-1 players considered by a level-2 player make joining decisions based onlyon the number of people in the queue. Then, as we demonstrate next, the level-2players considered by a level-3 player consider both the number of people waitingin queue as well as those waiting out of queue. This is also the same considerationfor all level-k players, where k ≥ 3.3.3.1 Optimal Joining Strategy for a Level-2 PlayerFor ease of exposition, we assume in-queue and out-of-queue waiting costs arethe same throughout the rest of the paper, i.e., c = 1. This way, we focus on themain insights of the model. A level-2 player supposes that all other individuals arelevel-1 players who also think that others have a level-0 of rationality. Thus, by34part (a) of Theorem 3.1, there exists a threshold ω∗ such that all level-1 individualsconsidered by a level-2 player join the queue for any state n≥ ω∗, and wait out ofqueue for any n < ω∗ if the indicator variable A is still 0 (i.e., they do not have therequirements yet). Whenever A for any of them changes to 1 (i.e., prerequisite eventis satisfied), part (a) of Theorem 2.3 shows they join the queue immediately. At anytime when there are multiple people wishing to join the queue, we assume that oneof them joins the queue, and this individual is chosen with equal probability. Notethat because individuals might join the queue before having the requirements forreceiving service, there is a chance that they reach the head of the line before theirprerequisites are met; recall from Section 3.2 that such people still occupy theserver’s time with the same exponential rate µ of service and after incurring thepenalty cost δ , they leave the system.To find the optimal policy of the level-2 player under consideration, we definethe state of the system as the number of people in the queue plus service denotedby n, and the number of people waiting out of queue denoted by m, which areboth observable. Let V ∗n,m be the expected cost-to-go of a level-2 player when shearrives to the system and finds there are n people in queue and m out of queue, whoare all level-1 players. V ∗n,m can be obtained by solving the following Bellman’sequations. For brevity, we just represent the Bellman’s equations for the case inwhich the joining threshold of level-1 players, ω∗, satisfies ω∗ ≥ 2.When the queue is empty (i.e., n = 0), we have:V0,m = min{J0,W0,m}, (3.3)whereW0,m =cλ +α+λλ +αV0,1+αλ +α(0) if m = 0cλ +mα+α+λλ +mα+αV0,m+1+mαλ +mα+αV1,m−1 if m≥ 1.+αλ +mα+α(0)Above, Jn represents the expected cost of joining the queue when there are npeople in the queue plus service, which is defined in equation (2.2). If the individ-ual does not join the queue in state (n,m) = (0,0), she waits until the first of a new35arrival or her own prerequisite condition completes, whichever takes place first.This explains the three terms for the case m = 0 (note that the last term equals 0because the queue is empty at that time, the individual’s requirements are satisfied,and therefore she can be served immediately with no penalty). For m ≥ 1, if theindividual decides to wait out of queue, in addition to the arrival and prerequisitecompletion event for the individual under consideration, the prerequisite event maycomplete for one of the other people waiting out of queue. The time until the firstof these other people satisfies her condition is exponentially distributed with ratemα .Similarly, for 1≤ n≤ ω∗−1, we have:Vn,m = min{Jn,Wn,m} , (3.4)whereWn,m =cλ +µ+α+λλ +µ+αVn,1+µλ +µ+αVn−1,0 if m = 0+αλ +µ+α(cnµ)cλ +µ+mα+α+λλ +µ+mα+αVn,m+1 if m≥ 1.+µλ +µ+mα+αVn−1,m+mαλ +µ+mα+αVn+1,m−1+αλ +µ+mα+α(cnµ)The last element in the definition of Wn,m follows from the fact that in-queueand out-of-queue waiting costs are the same, so after preparing the requirements,the individual will join the queue and incur the expected cost of(cnµ).When n ≥ ω∗, other individuals waiting out of queue would try to join. Thus,for the case of m = 0, as there is no one waiting out of queue, the expression forVn,0 is the same as in (3.4).However, for n≥ ω∗ with m≥ 1, we have:Vn,m = min{(1m+1)Jn+(1− 1m+1)Vn+1,m−1,Vn+1,m−1}. (3.5)36The first term above corresponds to the expected cost incurred by the individualif she attempts to join and the second term represents the expected cost-to-go ifshe waits out of queue in state (n,m) and follows the optimal policy from the nextstate change (i.e., Wn,m). Equation (3.5) differs from (3.4) to account for the factthat other individuals follow the policy of joining the queue for n ≥ ω∗. Thus,if the individual under consideration decides to join the queue, with probability(1m+1), she would be able to do so as all the m people waiting out of queue andthe individual herself are competing to jump in line; otherwise, one of the otherindividuals waiting out of queue will join, resulting in the state transitioning tostate (n+1,m−1).If the individual under consideration decides to wait, one of the other peoplein the waiting area will join immediately since the queue size is greater than orequal to ω∗, and thus the state changes to (n+ 1,m− 1); this state change takeszero time, which implies there is no waiting cost c being incurred and Wn,m =Vn+1,m−1. To make the dynamics more clear, note that the probability that theindividual observes state (n,m) with n≥ ω∗ and m≥ 1 upon her first arrival to thesystem is equal to 0 as others follow the policy of joining the queue for n ≥ ω∗.However, the mentioned states can be the result of transitions from other stateswhile the individual is waiting out of queue and she may observe them in laterdecision points. This follows from the fact that if queue size reaches ω∗ while thereare some people waiting out of queue; these people will start joining the queue oneby one and the individual under consideration may involve in this competition tojoin as well. Numerical analysis shows the following structure for the optimalpolicy of a level-2 individual.Observation 3.1 Suppose that there exists a state (n∗,m∗) for which it is optimalto join before having the requirements. Then,a. for any state (n,m∗) such that n≥ n∗, the optimal policy is to join.b. For any state (n∗,m) with m≥ m∗, it is optimal to join as well.Furthermore, if the prerequisite event is satisfied while waiting out of queue, theoptimal policy is to join right away.37Figure 3.2 illustrates the structure of the optimal policy discussed in Observa-tion 3.1. Horizontal axis of this figure represents the number of people in queue(including the one in service) denoted by n, and the vertical axis depicts the num-ber of people waiting out of queue (excluding the individual under consideration)denoted by m. This structure for the optimal policy preserves in the following anal-ysis for a player of level-3 or higher, though we observe a non-increasing behaviorfor the joining thresholds in each direction (holding the other dimension fixed) withrespect to the level of rationality, k.Figure 3.2: Optimal joining strategy for a level-2 player before having therequirements (λ = 3, µ = 4, prerequisite event rate α = 0.5, in-queue andout-of-queue waiting costs= 1, penalty cost δ = 10, ω∗ = 20. )3.3.2 Optimal Joining Strategy for a Player of Level-3 or HigherThe Bellman’s equations for this setting are fundamentally different than above,because a level-k (k≥ 3) player must choose a best response to level-(k−1) players,who follow the two-dimensional policy of Observation 3.1 (whereas level-2 playersjust needed to consider level-1 players, who considered only the one dimension ofnumber of people in the queue). The details of Bellman’s equations for the level-338player can be found in Appendix B. Solving these equations, we get the optimalpolicy with the same structure presented in Observation 3.1, though the joiningthresholds in both directions become smaller. Afterwards, we assume that all otherarrivals follow this new policy (i.e., they are level-3 players), and we find the bestresponse of the individual under consideration (level-4 player). We repeat thisprocess until reaching the fixed point. Numerical analysis show that the processends in a finite number of iterations, and that Observation 3.1 holds for each level-k player. Also, we obtain the structure presented in Observation 3.2 numerically.In this observation, m∗n(k) for any rationality level k denotes the joining thresholdfor the number of people waiting out of queue corresponding to each value of n,which implies for any state (n,m) with m≥ m∗n(k), the optimal policy is to join.Observation 3.2 There exists a finite threshold k¯ such that for k ≥ k¯, the optimaljoining strategies are identical. However, for k < k¯, joining thresholds of a level-kplayer (m∗n(k);∀n), decrease with respect to k.Observations 3.1 and 3.2 are both obtained through numerical analyses. Asmentioned at the beginning of this section, for ease of exposition, the out-of-queuewaiting cost is assumed to be equal to c= 1 per unit of time. Thus, different combi-nations for the other parameters including µ , λ < µ , α , and δ should be explored.For the extreme values of each parameter, the optimal policy can be intuitive. Forinstance, for small enough values of the penalty cost δ , the optimal policy is toalways join before having the requirements and for the other extreme when thepenalty cost is sufficiently large, it is optimal to wait until having the prerequisitecondition and then join the queue. This limits the parameter space needed to beconsidered, and makes it possible to divide the range for each parameter to the fi-nite number of points. Therefore, a broad range for model parameters is exploredand it turns out that for all the instances, the results of both Observations 3.1 and3.2 hold.Figure 3.3 presents the results of Observation 3.2. Before having the require-ments, the individual joins in the region to the right of the plots (including thestrips of colors) and it is optimal to wait otherwise. However, if the individuals’prerequisite event satisfies while waiting out of queue, they join right away. Recallthat for k = 1, Theorem 3.1 indicates that there is an optimal wait-join policy that39only depends on the number of people waiting in the queue; this threshold for thesame example presented in Figure 3.3 is 20. The figure also visualizes the decreas-ing behavior of joining thresholds with respect to the rationality level k discussedin Observation 3.2. This is the result of competition that forces the individuals tojoin the queue before having the requirements, even when the queue is relativelysmall. The red plot in this figure is the fixed point strategy of the level-k analysisas k increases, i.e, it is the equilibrium policy of this example. Having a decreasingbehavior with respect to k for the joining thresholds guarantees the convergence toa Nash equilibrium strategy as for each given n, m∗n(k) is a decreasing sequence ink bounded below by 0.Figure 3.3: Optimal joining strategy for a level-k player before having therequirements (λ = 3, µ = 4, prerequisite event rate α = 0.5, in-queue andout-of-queue waiting costs= 1, penalty cost δ = 10, k=rationality level)Due to the presence of a retrial option in the action space, analytically derivingequilibrium results is challenging. In general, it is difficult to characterize theequilibrium for customers considering “wait versus join” decisions in queueingsettings and the amount of research in this domain is very limited. This is discussed40in a recent comprehensive review of the rational queueing literature (Section 4.1.3of Hassin [28]). We are aware of only one paper that analytically characterizes theequilibrium in a setting with retrials (Cui et al. [14]). The authors considered asetting in which customers at each period can join the queue, leave the system orwait and retry in the next period. Our work is similar to this study in the sense thatthe individuals of our setting also have a retrial option. However, there are somekey differences. First, Cui et al. [14] considered balking, which effectively limitsthe number of states they need to consider in their analysis. Second, they assumedecision periods are long enough apart that customers returning observe a systemstate that is independent of the queue size at the time they left the system. Finally,individuals in our setting need to have some side conditions for receiving service.Considering the prerequisite event results in non-monotonicity of the customers’joining payoff and also because of the randomness of the time needed for this eventto be completed, the customers who retry may have or not have the requirementsby the time of retrial; this makes the characterization of the equilibrium for thecustomers with retrial option even more complex compared to Cui et al. [14]’ssetting and the approach used by them cannot be applied directly.3.3.3 Comparative StaticsWe note a few comparative statics results in this section. Our focus here is onthe equilibrium policy and how it changes with respect to the model parameters.However, the following results hold for the optimal policies of the individuals withother levels of rationality as well.Penalty cost that the individuals incur if they reach service without having therequirements is one of the parameters that service provider needs to decide about asit affects the joining behavior of customers. We start this section with analyzing theeffects of this penalty cost δ on the equilibrium joining thresholds. We let m∗n(∞)denote the joining threshold for the number of people waiting out of queue corre-sponding to each value of n before having the requirements for receiving service inequilibrium. Note that if the prerequisite event is satisfied for any of the individu-als while waiting out of queue, the optimal policy is to join right away. Numericalanalysis show that joining thresholds are increasing with δ (i.e., for each value of n,41m∗n(∞) increases in δ ). Intuitively, before having the requirements, higher penaltycosts make people more conservative in joining queue as they would try to avoidthe risk of reaching the head of the queue before being ready to get service. Thus,as we expect, for the boundary case where δ goes to zero, the equilibrium strat-egy approaches to the policy of always joining. Similarly, for the other boundarycase that δ goes to infinity, the strategy that individuals follow in equilibrium ap-proaches to the policy of waiting until having the requirements and then joining thequeue. Figure 3.4 represents this behavior of equilibrium strategies with respect tothe penalty cost δ .Figure 3.4: Equilibrium joining strategies before having the requirements fordifferent values of penalty δ (λ = 3, µ = 4, prerequisite event rate α = 0.5,in-queue and out-of-queue waiting costs= 1)Figure 3.4 also implies that the marginal effect of penalty cost δ on the joiningthresholds is decreasing in queue size. Following the results discussed in Section3.2, when we have only one strategic customer, it is optimal to wait for small queuesizes and join when the queue size is big enough. However, when all individualsare strategic, if the number of people waiting out of queue is so large, competitionmay force individuals to join even in small queue sizes as otherwise they may incur42a huge waiting costs. Since in small queue sizes, the risk of incurring penalty costsis higher for the individuals who join the queue, the joining thresholds are moresensitive to the penalty costs, i.e., the marginal effect of δ is bigger compared tothe larger queues.Next we discuss the behavior of joining thresholds in the equilibrium settingwith respect to the prerequisite event occurrence rate α . Intuitively, we expect a de-creasing behavior for the joining thresholds as by increasing the prerequisite eventoccurrence rate, the probability of reaching service with the completed require-ments is higher and thus the individuals tend to join the queue sooner. Also, byincreasing α , people waiting out of queue satisfy their requirements with a higherrate which results in a more competitive environment to join. Hence, the joiningthresholds are decreasing in α for any given n. However, for very small valuesof the prerequisite event occurrence rate α , this intuition does not hold and theresult is in the opposite direction. Numerical analysis implies that for any givenqueue size n, there exists a threshold for α called γn such that if α < γn, the joiningthreshold m∗n(∞) increases in α; otherwise, it decreases. This result is pictured inFigure 3.5.Figure 3.5: The behavior of the equilibrium joining thresholds before having therequirements with respect to prerequisite event rate α (λ = 3, µ = 4, in-queueand out-of-queue waiting costs= 1, penalty cost δ = 10.)43When the prerequisite event is not likely to occur (i.e., α is very small), due tothe out-of-queue waiting cost, it might not be worth waiting outside for the queuesize to become too large. In this case, as any increase in the event rate shortens theamount of the time needed for the prerequisite event, the individual might becomemore likely to wait (i.e., joining threshold increases) because there is a greater hopeto satisfy requirements. With increased event rate, the individual becomes morecautious about reaching the server without the prerequisite condition satisfied.Numerical results also show that γn is monotonically decreasing in n, or in otherwords, the expected decreasing behavior of thresholds with respect to α is seensooner for the larger values of n. Figure 3.5 displays this result as well. When thereare more people in queue, it is less risky for the individuals to join without havingthe required conditions as they still have time to prepare the requirements whilewaiting in queue. This explains the mentioned behavior, and thus by increasing α ,people wait less outside as expected.3.4 Social OptimalityIn the absence of the option of joining the queue without having the prerequisiteconditions, customers wait and join after satisfying the requirements for gettingservice; considering this extra option helps strategic customers save their waitingcosts by parallel processing waiting in queue as well as waiting for the comple-tion of the required conditions. However, the joining strategy of self-interestedcustomers in equilibrium can generate negative externalities to the others as theycause over congestion in the queue. On the other hand, it can also smooth theprocess for the future arrivals. In the following, we study what the overall effect is.Checking the conditions required for getting service before joining the queuecan be costly for some systems. In addition to the fact that introducing the optionof being able to join the line without having the prerequisites can save this initialchecking cost of the requirements, numerical analysis show that it also results in alower expected cost per unit of time for the system in equilibrium. As discussedin Section 3.3, in practice, customers may have different levels of rationality. Nu-merical analysis represents that even if customers have the lowest rationality level(i.e., level-1 who follows the policy obtained in Section 3.2), the system’s expected44cost decreases by introducing this extra option of being able to join the line with-out having the requirements compared to the case that individuals can only join thequeue after completing the prerequisite conditions. Numerical results are presentedin Figure 3.7. This figure depicts the differences in expected cost of the system perunit of time for different regimes which are obtained through simulation.2Having an extra option of joining the queue before completing the prerequisiteconditions, the system can be better off if individuals follow the socially optimalsolution (first best solution) instead of the equilibrium strategy. Naor [47], Hassinand Haviv [29] and [14] show that the self-interested individuals form an equi-librium that deviates from socially optimal solution as they ignore the negativeexternalities of their actions to the rest of the customers, which is also the case inour study. In the following, first we discuss the structure of the socially optimalpolicy. Then, we compare the equilibrium and socially optimal solutions. The as-sumptions of this section are similar to those of Section 3.3, so in-queue waitingcost is the same as the cost of waiting out of queue3.4.1 Socially Optimal PolicyThe next proposition and observation represent the structure of the socially optimalpolicy. This includes the possibility of moving a customer waiting out of queue tothe queue, even if she hasn’t completed her prerequisite event. We discuss thereasons why such strategy can be a socially optimal policy after Observation 3.3.Proposition 3.2 Suppose c = 1.a. If the customer’s prerequisite condition is not satisfied, for n≥ 1, it is sociallyoptimal to let a customer wait out of queue.b. If the customer’s prerequisite condition is satisfied, for n ≥ 0, it is sociallyoptimal to let a customer waiting out of queue join the queue.2For the policy of “joining after completing the prerequisites”, it can be shown that the expressionfor the expected cost per unit of time is (λcα+λ (λµ−λ )cµ). This can be proved by considering thewaiting area as an M/M/∞ queue with the arrival rate of λ and the service rate of α , such that itsdepartures are actually the arrivals to the main queue.45We remark that the above proposition does not dictate what the socially opti-mal policy is whenever the server is idle (i.e., n = 0) and the customers waitingout of queue do not have the prerequisite condition. In such a case, the sociallyoptimal policy depends on the number of people waiting out of queue, denoted bym. Numerical analysis shows the following structure for this case.Observation 3.3 If n = 0, the socially optimal policy is a three region policy ofwait-join-wait with respect to m. That is, when the number of people waiting outof queue reaches an intermediate size, it is optimal to move one of them into thequeue (and consequently into the service).Figure 3.6 represents the structure of the socially optimal policy. Note thatfollowing part (b) of Proposition 3.2, if the individuals satisfy the requirements,they join immediately. Thus, the individuals waiting out of queue do not have therequired conditions, which means if we move them to the queue when n = 0, theyimmediately enter service and incur the penalty cost. While this may seem coun-terintuitive, doing so may create positive externalities that outweigh the penaltycost. Specifically, if an individual is preemptively moved from waiting to serviceand she finishes service before other individuals join the queue, then total popula-tion wait times may be reduced. What our three-region policy of Observation 3.3(and Figure 3.6) suggests is that this is the optimal strategy for an “intermediate”number of customers waiting out of queue. On the other hand, it makes sense thatfor a larger number of customers waiting out of queue, there is a higher chanceof other customers joining while the removed individual is still in service (recallthat as mentioned in Section 3.2, the customers with unsatisfied requirements stilloccupy the server’s time with an exponential rate µ of service). In this case, theremoved customer may impose negative externalities by making those customerswait longer. For a small number of customers waiting out of queue, there are notenough potential customers who may join the queue behind the removed customer,and thus the potential savings of reducing congestion do not outweigh the penaltycost.46Figure 3.6: Socially optimal policy structure before having the requirements3.4.2 Comparing the Equilibrium and Socially Optimal PoliciesIn this section, we analyze the differences of equilibrium and socially optimal poli-cies with respect to some model parameters numerically. Figure 3.7 shows howmuch the system can benefit in terms of expected cost per unit of time from apply-ing the socially optimal policy for different values of penalty cost δ and prerequi-site event rate α .Figure 3.7 also represents the marginal effect of δ on the gap between theexpected costs per unit of time for the equilibrium and socially optimal policies.As this figure shows, the maximum deviation of equilibrium from socially optimalsolution is for the intermediate values of penalty costs. The corresponding valueof δ for which this difference is maximum depends on how fast the individualscan prepare the required conditions for getting service. As the prerequisite eventrate α grows (i.e., it takes less to complete the requirements), the expected cost ofequilibrium converges to that of socially optimum with a faster rate (with respectto δ ).The expected cost of the system includes the two parts of penalty cost and alsothe waiting cost incurred by the individuals. Figure 3.8 depicts how the average47Figure 3.7: Expected cost of system per unit of time for different settings (λ = 3,µ = 4, in-queue and out-of-queue waiting costs= 1, number of customers consideredfor simulation= 5000)(95% confidence interval of the reported costs is not more than ±0.1)total waiting times of various settings differ from each other.Despite the fact that in the socially optimal policy, for most of the states itis optimal to wait, this policy mostly has a lower average waiting time comparedto others. This is the result of the join region for n = 0 that imposes positiveexternalities by smoothing the process, and consequently individuals on averagewait less in the socially optimal solution.The waiting time of customers consists of the two parts of waiting in queueand waiting out of queue. The graphs associated with the average in-queue andout-of-queue waiting times are provided in Appendix B.48Figure 3.8: Average total wait time per customer for different settings (λ = 3,µ = 4, in-queue and out-of-queue waiting costs= 1, number of customers consideredfor simulation= 5000)(95% confidence interval of the reported costs is not more than ±0.07)3.5 Conclusions and ExtensionsIn this chapter, we studied a queueing system in which individuals would like tojoin the queue or waiting list of a service provider, but they need to have somerequired conditions by the time they reach service. In many of these settings, theindividuals are allowed to join the queue without having the completed necessaryrequirements. Thus, considering the length of the queue, time needed to completethe required side conditions, and other factors such as in versus out-of-queue wait-ing costs, one may be strategic in deciding when to join.49In Section 3.2, we analyzed the case of an individual, who is the only strategiccustomer in the system, deciding whether to join the queue or wait out of queue. Weformulated the problem as a Markov decision process and discussed the structureof the optimal policy. We proved that in general, the optimal policy is a two-regionpolicy of wait-join; that is, one should join the queue if and only if its size is abovea certain threshold.After discussing the optimal policy of a single strategic individual, we analyzedcustomers’ equilibrium behavior in Section 3.3. The optimal strategy in this casedepends on both the queue size and also the number of people waiting out of queue.We applied level-k models to obtain the equilibrium policy and also to analyzethe behavior of individuals with bounded levels of rationality. We observed thatwhen in-queue and out-of-queue waiting costs are the same, for each given queuesize, there exists a threshold for the number of people waiting out of queue suchthat it is optimal to join the line above that threshold in equilibrium even beforehaving the required conditions for getting service. Furthermore, this threshold isdecreasing in the queue size. Note that if the prerequisite event is satisfied whilewaiting out of queue, the optimal policy is to join right away. We also analyzedthe socially optimal solution. For the case that the server is idle and the numberof people waiting out-of-queue reaches an intermediate level, it is optimal to moveindividuals to the queue in social optimality; otherwise, it is optimal not to moveanybody to the queue until they complete their requirements.Our models can be applied to service systems with long queues or wait lists, inwhich an individual needs to meet some prerequisite condition, or complete sometask, before reaching the front of the queue. The model results allow individualsto better manage their waiting time in these settings. For service providers, weprovide insights regarding how strategic customers make queue joining decisions,under different model settings. In particular, system managers can decide how toset penalties for customers who reach service without completing their require-ments.One limitation of our study is that we have assumed individuals perfectly ob-serve the state of the system (i.e., number of customers waiting in queue and outof queue). Strategic customers’ behavior when individuals may not have full stateinformation has been studied recently in the literature (e.g., see Guo and Zipkin50[25, 26], Burnetas and Economou [5], Economou and Kanta [20], Guo and Hassin[23], Guo and Zhang [24], and Hassin [28]). In some queueing systems, customersmay not physically see a queue nor be given information regarding their rank. Ourresults should be useful for considering this extension. We leave this direction forfuture studies. Another avenue of future work could be to relax the assumptionthat individuals do not leave after joining the queue. For example, it is possiblethat individuals approaching the front of the queue without the requirements yetcompleted may leave their position in the queue and rejoin at the end of it. Fur-thermore, this chapter explores FCFS queueing systems, but some settings such asnursing home wait lists may not operate in a FCFS manner as sometimes patients’health condition can prioritize them for receiving service. The model results wederived prove useful in extending our setting to the case where people waiting inqueue may be served with some type of priority order.51Chapter 4Effects of Usage-Based AutoInsurance: A DynamicMechanism-Design Approach4.1 Introduction and Related LiteratureDue to the vast increase in availability of customer data, the auto insurance industryis moving towards personalizing their offerings, similar to many other industries.The Internet of Things (IoT) enables a new usage-based model in which a driver’srisk is evaluated based on the data collected from his real driving behavior. TheInternet of Things has been reshaping many industries over the past few years.The number of businesses that employ IoT technologies has increased from 13%in 2014 to about 25% in 2019. The worldwide number of IoT-connected devicesis also anticipated to increase to 43 billion by 2023 [16]. Incorporating these newtechnologies has led to significant transformation in traditional business models,and automobile insurance is no exception.Usage-Based Insurance (UBI) is one of the most recent IoT-based innovationsby auto insurance companies that links the premium rates of customers to theiractual driving performance. Under this program, drivers’ behaviors are monitoreddirectly while they drive using some mobile-based data collection applications or52in-vehicle telecommunication devices (telematics) that are usually self-installedinto a special vehicle port. These devices or applications measure a number offeatures such as miles driven, rapid acceleration, hard braking, and hard corner-ing. Based on the assessed data, the insurance company offers discounts to thecustomers on their monthly insurance premiums.Usage-Based Insurance has been touted to benefit insurers, customers and so-ciety. In addition to increased accuracy in pricing the premiums, offering this pro-gram may incentivize customers to adopt driving habits that may result in reducedaccidents and lower congestion and emissions. The use of these programs is be-coming increasingly popular [27]. In 2015, close to 12 million consumers glob-ally subscribed to this type of insurance, and this value is expected to grow over142 million subscribers by 2023 [35]. Thus, UBI and insurance telematics havesubstantial effects on how the insurance industry operates. According to a recentresearch report by Global Market Insights Inc., the global UBI market’s currentvalue is about USD 34 billion, which is anticipated to reach to USD 107 billion by2024 [22].Despite the rapid advancement of the UBI industry, the number of articles inthe academic literature that analyze the effects of this new monitoring technologyon the insurance market is very limited. Verbelen et al. [63] show that having telem-atics data increases the predictive power of insurance pricing methods. Two otherrecent studies in the Marketing and Operations Management literature, Soleyma-nian et al. [58] and Choudhary et al. [8], explore the benefits of UBI programsfor the drivers empirically. However, analyzing the problem from the insurers’perspective to discuss the characteristics of optimal UBI contracts and also study-ing the gains from this screening program for auto insurance companies remainsunaddressed in the literature, and thus is one of the main focuses of this chapter.In the insurance market, the presence of asymmetric information between theinsurance company and customers leads to two well-known incentive issues: (1)adverse selection, where the customers’ types or preferences are unknown, and (2)moral hazard, where the customers’ efforts or choices are unobservable. This hasbeen shown empirically through several articles in Economics literature (e.g., Puelzand Snow [50], Chiappori and Salanie´ [7] and Jeziorski et al. [37]). Usage-BasedInsurance technologies and collecting customers’ sensor data can alleviate both53problems. By using monitoring technology, insurance companies can obtain moreaccurate estimates of the customers’ risk factors. At the same time, by offeringan appropriate contract, they can incentivize customers to drive more carefully andimprove their driving behavior. This paper provides a theoretical model to capturethe effects of UBI monitoring technology on the auto insurance market, and morespecifically, to characterize its effects on both adverse selection and moral hazardissues. To the best of our knowledge, it is the first theoretical study in this area.A UBI program is offered to each customer for a fixed duration (e.g., one year).When this program ends, a regular insurance contract follows, which may be con-tingent on the driver’s performance during the UBI program. Throughout this pro-gram, the driving behavior of customers is monitored using telematics devices dis-cussed above and their performance is summarized in a score provided to themby the insurance company. Considering the customers’ history of scores, the in-surance company offers discounts on their following periodic premiums. In ourmodel, the UBI program is divided into multiple periods (e.g., 12 months or 52weeks). The offered discounts may affect the customers’ willingness to continuethe UBI program and encourage them to improve their future performance.We formulate the UBI problem as a finite horizon dynamic principal-agentmodel. The principal-agent framework is a standard approach to model problemsin which two parties, here insurance company and customer, are involved. Dionne[18] presents problems that can be modeled using the principal-agent framework inthe insurance context. The applications of this modeling framework is not limitedto the insurance industry, and several other problems in which two parties withnon-aligned incentives are collaborating can make use of this modeling approach.Some examples of such settings in the Operations Management literature are listedin Zhang and Zenios [70].In a principal-agent model, the collaborator with the higher bargaining powerto design the contract terms is called the principal (the insurer in this chapter), andthe other party is the agent (the customer in this chapter). In our setting, an agent(referred as “he” hereinafter) privately knows his type that summarizes his abilityas a driver. Soleymanian et al. [58] show that, after UBI adoption, drivers improvetheir driving behavior and more specifically, they mention that the daily hard-brakefrequency decreases by 21% on average after six months. Hard braking is highly54correlated with unsafe driving [65], so by designing a proper UBI contract, theinsurer can encourage customers to become safer drivers over time. Thus, the agentin our setting can make a hidden effort in each period to drive more carefully, whichaffects his subsequent type.The scores gained by the agent during the UBI program are actually noisy sig-nals of his type and effort in each period. The scores of the agent are observableto both the agent and the principal over time and can affect the premium offeringsto the agent. The higher the agent’s scores are, the higher the discounts that can begained by him. Therefore, the principal of our setting (referred as “she” hereinafter)decides the discounts and premiums that she wants to offer to the agent in everyperiod based on his driving performance. The principal actually offers a long-termcontract to the agent (which covers both the UBI phase and the regular insurancephase) despite the fact that she observes neither the type of the agent nor the actionstaken by him. Thus, the problem of our interest is an adverse selection and moralhazard problem, because it has both hidden information and hidden action (effortchoice). In this chapter, we are characterizing the full history-dependent optimalcontract for this dynamic adverse selection and moral hazard principal-agent prob-lem, such that the decisions of both the principal and the agent (i.e., the premiumfor the agent and his effort level) in each period depend on the agent’s full drivingrecord, including the history of the scores he earned.Single-period principal-agent models for the problems with either one or bothof the adverse selection and moral hazard issues are well studied in the Economicsliterature. For example, we can refer to the classic papers by Rothschild and Stiglitz[53] and Holmstrom et al. [33]. Bolton et al. [4] and Salanie´ [54] are also twotextbooks that discuss these types of models in detail. This topic is of great interestto the researchers in the field of Operations Management as well. As a samplestudy, Corbett [12] analyzes an inventory management problem with asymmetricinformation on the setup and backorder costs. More recently, Kao et al. [38] alsodiscuss the optimal contract for the setting with both adverse selection and moralhazard in which the insurer offers business interruption insurance to the firm thatfaces disruption risk.Compared to the single-period principal-agent models, the literature on dy-namic cases of such contracts are fairly limited, and articles with both adverse55selection and moral hazard are even fewer. There is a stream that discusses thedynamic principal-agent games in the presence of only the moral hazard issue. Werefer to the studies done by Plambeck and Zenios [48, 49] in this regard. The othercategory in the literature discusses the multi-period principal-agent models withjust adverse selection. For instance, Zhang and Zenios [70] study long-term con-tracts for a finite horizon dynamic principal-agent problem, in which the state ofthe system can be only observed by the agent. Later on, Zhang et al. [71] pro-vide the optimal short-term contract solution to the model presented by Zhang andZenios [70]. Afterwards, Zhang [69] extends the result of Zhang and Zenios [70]to the infinite horizon case.One of the key features of the above articles is that the private informationfollows a Markov decision process, which makes the problem applicable to moregeneral settings in practice, but at the same time, enhances the problem’s techni-cal complexity. Some other examples of multi-period mechanism design problemswith this Markovian property for the private information structure include the pa-pers by Fernandes and Phelan [21] and Battaglini [2]. The underlying system ofour problem is also a Markov decision process, where the evolution of the state ofthe system (the type of customer) is endogenous, as it is controlled by the hiddenaction (effort) that the agent takes to improve his driving behavior. However, themain difference of our study is that the actions of the agent that control his statetransition probabilities are not observable by the principal, which is not the case inthe mentioned other papers. From this perspective, the closest setting to ours is theone proposed by Doepke and Townsend [19], as they discuss an infinite horizondynamic mechanism-design problem where the evolution of the private state of theagent (his income level) depends on the prior actions taken by him (investment orstorage).Due to the interplay between hidden type and hidden action, analyzing thedynamic contracting problems with both the issues of adverse selection and moralhazard is challenging, and consequently, most of the papers in this area, such as thework by Doepke and Townsend [19] resort to numerical analyses. In contrast, wedevelop a dynamic programming algorithm to examine the model analytically andexplore structural results about the optimal contract.The complexity of dynamic principal-agent models with both adverse selection56and moral hazard makes it impossible to analyze these problems accurately usingthe standard approaches discussed in the literature. In order to compute the optimalhistory-dependent contract described above, we develop a recursive formulation.Then, by incorporating this formulation, the optimal mechanism can be derivedby backward induction based on the set of continuation payoffs for the principaland the agent in each period. The idea of focusing on continuation payoffs inanalyzing repeated games with asymmetric information arises from a seminal workby Abreu et al. [1]. Later, Cole and Kocherlakota [10] extended their setting toinclude hidden states as well. Some other papers in the literature, such as Fernandesand Phelan [21], Zhang and Zenios [70] and [69] also make use of similar approach.However, these papers study only the case of adverse selection, which makes themodel less complex to analyze compared to our model.We overcome the complexity that arises from the interaction of hidden stateand hidden action by decomposing the problem into a set of subproblems in eachperiod. After analyzing the continuation payoffs for each subproblem, we combinethe results to move to the next period in the backward induction process. Also,note that the feasible set for the pairs of continuation payoffs of the two parties(insurer and customer) in each period is vast. We tackle this difficulty by focusingon the efficient frontiers of the feasible sets (there is one such set for each period),which express the principal’s maximum continuation payoff as a function of theagent’s continuation payoff (vector) at the beginning of each period. Followingthis approach, we obtain some structural results about the efficient frontiers, whichis used to simplify the problem to characterize the specifications of the optimalcontract.Besides contributions to the theory of dynamic principal-agent games, the re-sults of this study also lead to important and interesting managerial insights forfirms considering UBI programs. Solving this problem, we propose a necessaryand sufficient condition for the optimality of exerting high effort by the agent ineach period and prove that, for each agent type, there is a threshold time beyondwhich he makes a high effort until the end of the UBI program. Interestingly,this result is aligned with the result of the empirical work of Choudhary et al. [8]about the customers’ behavior close to the UBI pricing thresholds. We also showthat, as it gets closer to the end of the UBI program, the gap between the optimal57continuation social welfare corresponding to different agent types widens, whichsuggests better screening of the agent, as more information about him is collected.This verifies one of the benefits of the UBI program to the insurers. However, themarginal benefit of screening declines as the number of periods grows, so we makesuggestions on whether it is better to offer a short or long UBI program dependingon different model parameters.Furthermore, we perform a numerical analysis to illustrate the underlying in-sights of the proposed model. For this purpose, first using a novel dataset from oneof the major insurance companies in the US, we estimate the model parameters.We have access to the UBI data of 40,525 drivers that submitted a quote requestto purchase an insurance policy from March 2012 to November 2014, who alsoadopted this program. Analyzing the results of the model given the estimated pa-rameters, we obtain the optimal monitoring time and provide comparative staticsresults on the optimal length of monitoring with respect to various model parame-ters. We show that under the optimal contract, we can have about an 18% reductionin the annual accident rate.As discussed at the earlier part of introduction, to our knowledge, only tworecent studies in the literature by Soleymanian et al. [58] and Choudhary et al. [8]explore the effects of UBI programs on the drivers empirically. However, gainsfrom this screening program for auto insurance companies have not been analyzedin the literature, which we do in this chapter. As documented in the business press,many newly established insurance companies are struggling to answer the questionas to whether they should initiate and offer UBI or not. Our study sheds lighton how to design the contract to manage a UBI program, to what extent a UBIpolicy can outperform a traditional policy, and how the potential gains dependon the demographics of the target market. In addition, the analysis of this papercan be applied to many other applications. For instance, based on a recent newsarticle [9], life insurers are considering employing health screening devices suchas smart watches or Fitbits to link the assessed data to the insurance policies theyare offering. This problem can be modeled in the same way as the UBI problempresented in this study.The remainder of the chapter is organized as follows. First, in Section 4.2, weprovide the model setup, discuss the sequence of events in each period, and also58present the mathematical formulation of the problem. In this section, we developthe model for both the UBI period and the regular insurance phase after the UBIprogram ends and the insurance goes back to its traditional form. In Section 4.3,we discuss the continuation payoff frontiers and derive structural results on theoptimal contract during the UBI program. Section 4.3.3 presents an extension tothe model of Section 4.2, in which we consider limited liability. Furthermore, inSection 4.4, a numerical analysis is provided. Finally, we present our concludingremarks in Section 4.5. The proof of all theorems, propositions, and lemmas canbe found in Appendix C.4.2 The ModelIn this section, we first present the details of what a UBI program looks like in mostof the insurance companies in practice. Afterwards, we discuss the informationstructure and develop the mathematical formulation of the problem. In most of thecompanies, the customer who adopts a UBI program receives an initial discount.Then, the insurer gives a telematics device to the customer to install on his vehicle.Using this device, the customer’s driving behavior is tracked, and based on therecorded data, he receives a score as a signal of his performance every day. Theinsurer has access to the customer’s scores as well, and this can be an indication ofhow risky the customer is.After adopting the UBI program, the customer should have installed the telem-atics device into his car for an initial certain amount of time (referred as the “initialinterval” henceforth). The length of this period is different for various companies,but usually it is about two months. During the initial interval, the customer can-not quit the program. At the end of this period, the insurer offers the customer anadditional discount based on his performance. Then, the customer can keep thetelematics device until the end of the UBI program, which is about 1 to 2 yearsfor most companies. If he does so, he receives an updated discount every monthconsidering the history of his scores. After the initial period, the customer can quitanytime, but by designing an appropriate contract, the insurer incentivizes him tostay until the end. At the end of the UBI program, the customer receives a perma-nent discount according to his whole history of scores and the assessed data on his59driving habits.As discussed in Section 4.1, the data show that using the UBI program resultsin a substantial improvement in the driving behaviors of customers [58]. Thus,a proper contract can motivate drivers to invest effort into improving their per-formance, which is beneficial for both the insurance companies and the drivers.This effort can affect the risk factors of customers and consequently change theirtypes over time. The insurer can even do better if she does another screening onthe type of the drivers in each period by offering different premiums for each type,which is a standard approach for modeling auto insurance problems in the literature(e.g., see Dionne [18] for a comprehensive review of articles in the auto insurancecontext). Hence, it is interesting to see how monitoring drivers’ behaviors usingthe telematics devices in addition to this traditional screening approach affects dy-namic contracts in the auto insurance market, which we do in the following section.4.2.1 Mathematical FormulationWe can formulate the insurance problem during the UBI program as a finite horizondynamic principal-agent model. We suppose there are T periods in total (i.e., t =1,2, · · · ,T ). Note that the initial interval mentioned above can be considered asperiod t = 1. Then, each month after the initial interval is one period, such that Tdenotes the end of the UBI program.There are different types of customers in the market whose types are unob-served by the insurance company. Thus, the agent has private information about histype at the beginning of each period t, which results in the presence of the adverseselection issue. The type of the agent is denoted by θt . We assume θt ∈Θ = {L,H},such that L represents low risk drivers and H refers to high risk ones. At each pe-riod, a customer with type θt may have an accident denoted by at ∈ {0,1}. We letPatθt with at ∈ {0,1} represent the probability of having/not having an accident forthe agent with type θt at period t. We assume, for any agent with type θt ∈Θ , thisprobability is equal to (Paθt if at = 1; 1−Paθt if at = 0), such that PaH > PaL , implyingthe probability of an accident for a high risk driver is larger than that for a low riskone. We also consider Caθt to indicate the cost of an accident for type θt .Considering the discussion in Section 4.1, there is also an effort action for the60customer (agent). The underlying effort can be interpreted as how hard a customertries to improve his driving behavior. The effort at period t is denoted by et . Here,we suppose et ∈ ξ = {0,1}, such that 0 means the customer does not make an effortat period t, and 1 represents the case where he chooses to put effort into improvinghis driving habits. Based on the effort level et that the customer of type θt chooses,he incurs the effort cost of Ceθt (et) at period t. We assume Ceθt (0) = 0.As exerting effort may result in improving the driving behavior, the effort ateach period can affect the type of the agent at the next period. Thus, the evolution ofeach customer’s type is endogenous and controlled by his effort level. We assumethe underlying system of our model is a Markov process in the sense that the effortaction only affects the probability of the next period’s type. The following matrixshows the transition probabilities for the agent’s type if he chooses to invest theeffort of et during period t.P(et) =( L HL pLL(et) pLH(et)H pHL(et) pHH(et));et ∈ ξ .In other words, pi j(e¯) = Pr{θt+1 = j|θt = i,et = e¯}. We also define the row vectorspθt (et) = (pθt ,L(et), pθt ,H(et)) for θt ∈ {L,H}.Effort can be considered a multi dimensional action in which some dimensionssuch as “how careful customers drive” are unobservable by the insurer (principal).This leads to having a moral hazard (hidden action) issue in our problem. In theliterature, there are two types of moral hazard: ex-ante and ex-post. The formerrefers to the effort that customers exert to improve their performance, and accord-ingly, reduce their probability of having an accident, and the latter refers to thecases in which the accident occurred and then the customer decides whether to re-port it to the insurance company or not. Jeziorski et al. [37] show that the ex-postmoral hazard is unlikely to play a significant role in the insurance market. Thus,in this study, we focus on the ex-ante moral hazard and assume that the customers’accidents are fully observable to the insurer.Using the telematics monitoring devices discussed in Section 4.1, some otherdimensions of effort such as the “number of hard brakes” or “how fast a customerdrives” can be observed by the principal. Therefore, there is an observable out-61come, which is the score of the insured customer at each period, that reflects allthese observable parameters. The score at the end of period t is denoted by St ∈S ,such that the higher score the customer gets, the better. This score is actually a sig-nal of the customer’s performance during period t, which depends on the type ofcustomer at that period. The following vectors represent the probability of havinga particular score at the end of period t for any agent with type θt . We name thesevectors as the observation vectors. We assume that there are two types of scores:the low score denoted by l and the high one represented by h. Thus,S = {l,h}.ρθt =(ρ lθtρhθt);θt ∈Θ .In other words, for any period t, ρ iθ¯ = Pr{St = i|θt = θ¯}. We assume that investingthe effort action affects the customers’ score and also their probability of having anaccident only through the customers’ type. Hence, investing effort at each periodhas no direct impact on the score and also the occurrence of accidents in that period.Given the dynamic evolution of the type, the score can be viewed as a delayedsignal of the effort in the last period. This assumption simplifies the exposition andanalysis of the model, but does not alter the nature of the problem.The history of the scores obtained by the agent and also his record of accidentsare public information available to both the agent and the principal. We definevector St to denote the history of scores from the beginning up to and includingperiod t, such that St ∈ S t . It is assumed that S0 = ∅. We also define At torepresent the history of the accidents up to and including time t, where A0 =∅.Therefore, the sequence of events during period t is as follows. First, the agentprivately knows his type, θt , at the beginning of the period, but he may misreporthis type as θˆt . We define vector θˆt ∈Θ t to refer to the history of the reported types.Then, considering the history of the customer’s scores, accidents and the reportedtypes up to period t, the principal offers a premium of Rt to the agent, which is alsoa function of the agent’s announced type at the beginning of period t. We assumethat Rt is collected at the beginning of that period. Note that the discounts that thedrivers can receive using the UBI program can be transformed to the payments.Thus, in order to simplify the formulation, we just have the premiums, which in-62clude the discounts as well. During period t, the customer takes a hidden action ofet based on the history of public and private information. The customer may alsohave an accident with the probability of Paθt , which has the cost of Caθt . Finally, theprincipal and agent observe the agent’s score, St , at the end of period t that reflectsthe agent’s past performance (through the current state). Figure 4.1 depicts thissequence of events. Without loss of generality, for the ease of exposition of themodel and also for focusing on the main insights of offering the UBI program, weanalyze the case of full insurance throughout the paper. However, the model can beeasily extended to the case with different collision deductible coverage levels forthe agent to choose from.Figure 4.1: Sequence of Events in Period tLong-term ContractThe insurance company is trying to design an appropriate contract to induce theagent to take the desired actions. We define a long-term contract, denoted by σ , asan agreement between the principal and the agent that provides their decisions forall periods (i.e., the premium that the principal offers to the agent and also the effortlevel that the agent chooses to take), which are contingent on all the public infor-mation including the reported types of the agent and the history of his scores andaccidents available up to their decision points. The history of public information upto the end of period t is denoted by Ht = (θˆ t ,St ,At);Ht ∈H t = {Θ t ,S t ,{0,1}t}.We consider the case of full commitment, so the agent and principal will not rene-gotiate the contract terms during the program.63Therefore, we are looking for the following long-term contract:σ ={R1(θˆ1|H0),e1(θˆ1|H0), · · · ,RT (θˆT |HT−1),eT (θˆT |HT−1)}; θˆt ∈Θ .Recursively, the above contract can be written as:σt(Ht−1) ={Rt(θˆt |Ht−1),et(θˆt |Ht−1),σt+1(Ht)}; θˆt ∈Θ .We assume the insurer and the insured customer are both risk neutral. Later, inSection 4.3.3, we extend the model to consider a limited liability contract, whichcan be helpful in discussing the results for a risk-averse agent. We also considerthe discount factor of δ for both the agent and the principal. The possible set ofcontracts that can be considered for our problem is vast; however, using a similarapproach to the one presented by Zhang and Zenios [70], we can prove the dynamicrevelation principle. This implies that without loss of generality, we can restrict ourattention to the revelation mechanisms, in which designing an appropriate contract,the principal induces the agent to report his true type and also choose the desiredeffort level.The discounted future payoffs for the principal and the agent at time t can beobtained as follows. Given the effort level of et for the agent at period t, if hereports his state truthfully, the principal’s expected payoff from period t onward isequal to:pit(θt |Ht−1) = Rt(θt |Ht−1)+ ∑at∈{0,1}Patθt[−Caθt ·at+δ∑StρStθt ∑θt+1pθtθt+1(et(θt |Ht−1)) ·pit+1(θt+1|θt ,St ,at ,Ht−1)]. (4.1)Note that for the first period, H0 is ∅.Given that the agent reports his type truthfully and follows the effort level64et(θt |Ht−1), his expected payoff from period t onward is:ut(θt |Ht−1) =−Rt(θt |Ht−1)−Ceθt (et(θt |Ht−1))+δ ∑at∈{0,1}Patθt ∑StρStθt ∑θt+1pθtθt+1(et(θt |Ht−1)) ·ut+1(θt+1|θt ,St ,at ,Ht−1). (4.2)However, if the agent misreports his type of θt as θˆt , his expected future utilityis as follows. We assume the agent is rational, so at each period, he chooses theeffort level that maximizes his expected payoff.uˆt(θˆt ,θt |Ht−1) = maxe′t∈ξ={0,1}{−Rt(θˆt |Ht−1)−Ceθt (e′t)+δ ∑at∈{0,1}Patθt ∑StρStθt ∑θt+1pθtθt+1(e′t) ·ut+1(θt+1|θˆt ,St ,at ,Ht−1)}. (4.3)Thus, considering the above equations, the insurer’s problem is as follows:max{Rt(θt |Ht−1),et(θt |Ht−1)}t=1,2,··· ,Tθt∈Θ ,Ht−1∈H t−1∑θ1∈Θαθ1pi1(θ1|H0)s.t. ut(θt |Ht−1)≥ uˆt(θˆt ,θt |Ht−1);∀ t=1,2,··· ,Tθt∈Θ ,θˆt 6=θt∈ΘHt−1∈H t−1(4.4)ut(θt |Ht−1)≥ u¯θtt ; ∀ t=1,2,··· ,Tθt∈Θ ,Ht−1∈H t−1 (4.5)et(θt |Ht−1) = arg maxe′t∈ξ={0,1}{−Ceθt (e′t)+δ ∑at∈{0,1}Patθt ∑StρStθt∑θt+1pθtθt+1(e′t) ·ut+1(θt+1|θt ,St ,at ,Ht−1)}; ∀ t=1,2,··· ,Tθt∈Θ ,Ht−1∈H t−1 (4.6)where αθ1 represents the initial belief on the probability of having a type θ1 cus-tomer at the beginning of period 1. The definitions of ut(θt |Ht−1) and uˆt(θˆt ,θt |Ht−1)mentioned in the formulation also follow from equations (4.2) and (4.3), respec-65tively.Constraints (4.4) stated above are incentive compatibility constraints that in-centivize all types of agents to always state their types truthfully. Constraints (4.5)are individual rationality constraints that induce all types of agents not to quit dur-ing the entire UBI program. In these constraints, u¯θtt denotes the reservation valuefor type θt at period t, which is typically negative in our setting. Several factorscan affect the reservation utility mentioned in the right hand side of individual ra-tionality constraints. As one of the significant factors, we can refer to the privacycosts. Customers today always have concerns about sharing their information withcompanies [17]. A recent study by Lin [42] discusses to what extent customersvalue privacy. The auto insurance market is not an exception and statistics showthat one of the important reasons why customers avoid accepting a UBI programor quit before the end of the program is the privacy factor [45]. A portion of cus-tomers prefer not to be monitored (i.e., they do not accept the UBI program or quitin between). Thus, the designed contract needs to be attractive enough to persuadecustomers to ignore their privacy concerns and continue using UBI.Furthermore, constraints (4.6) are another type of incentive compatibility con-straints, which ensure that the suggested effort levels maximize the agents’ ex-pected future utility. We also assume that uT+1(θT+1|HT ) and piT+1(θT+1|HT )represent the terminal values for all θT+1 ∈Θ and HT ∈H T .Due to the “curse of dimensionality” caused by the interaction of hidden statesand hidden actions, the insurer’s problem cannot be solved directly using the stan-dard approaches discussed in the literature. In the following section, we show thatthe developed model can be reduced to a recursive form, which enables us to dealwith the Markov state links between the periods that are controlled through the hid-den effort taken by the agent. Formulating the problem in a recursive form makesit possible to incorporate dynamic programming techniques and solve the problemusing backward induction.4.2.2 Recursive FormulationTo develop the recursive formulation of the insurer’s problem, we consider theagent’s utility vectors as the state variables. Based on the notations discussed in66Section 4.2.1, the continuation payoff from period t onward for the agent with typeθt is denoted by ut(θt |Ht−1), which is contingent on the given history of publicinformation up to that period. In the following, this history of information beforeperiod t (past accidents, reported types, and observed scores denoted by Ht−1) issuppressed in the notations. We also define column vector ut to refer to the utilitiesover all types θt of the agent, that is ut = (ut(L),ut(H))T . Before presenting therecursive formulation, we simplify some of the other notations as follows.Simplification of the NotationsAs discussed in Section 4.2.1, at each period, there are the two outcomes of anaccident at , and score St , for the agent that are observable by both the agent andthe principal. Here, we define Ot to represent the outcome pair of (at ,St). Thus, Otcan take four possible values of O = {(0,h),(1,h),(0, l),(1, l)}, which occur withthe following probabilities, respectively:q0,hθt = (1−Paθt )ρhθt ,q1,hθt = Paθtρhθt ,q0,lθt = (1−Paθt )ρ lθt ,q1,lθt = Paθtρlθt .Let column vector qOt define the probabilities of having outcome Ot ∈ O overthe types θt = {H,L} in period t. Suppose slope(qOt ) = qOtH /qOtL . Thus, the fol-lowing properties hold:slope(q0,h) =q0,hHq0,hL=(1−PaH)ρhH(1−PaL )ρhL< 1,slope(q0,h)< slope(q1,h) =q1,hHq1,hL=PaHρhHPaLρhL< slope(q1,l),slope(q0,h)< slope(q0,l) =q0,lHq0,lL=(1−PaH)ρ lH(1−PaL )ρ lL< slope(q1,l),slope(q1,l) =q1,lHq1,lL=PaHρ lHPaLρ lL> 1.67We know that if an accident occurs (i.e., either one of the outcome pairs of(1,h) or (1, l) happens), the principal needs to pay the cost of the accident, Caθt .Thus, we also define COtθt , which is equal to Caθt if Ot = (1,h),(1, l) and 0 otherwise.Furthermore, we consider Φt to represent the continuation payoff for the sys-tem (social welfare) from period t onward: thus, Φt(θt ;ut) = pit(θt ;ut)+ut(θt). Inorder to make the model more tractable, we maximize the continuation payoff forthe system Φt , which is equivalent to maximizing the continuation payoff for theprincipal pit , for a given ut . We define column vector Φt(·) to refer to the continu-ation social welfare over all types of agent, that is Φt(·) = (Φt(L; ·),Φt(H; ·))T .Dynamic Programming FormulationAt the beginning of period t, given any feasible ut , for any θt , the maximized con-tinuation social welfare Φ∗t (θt ;ut), and also decision variables Rt(θt) and uOtt+1(θt)(contingent on outcome Ot) can be recursively obtained by the following problem:Φ∗t (θt ;ut) = max{Rt(θt),uOtt+1(θt)}Ot∈O{−Ceθt (et(θt))+∑OtqOtθt[−COtθt +δpθt (et(θt))·Φ∗t+1(uOtt+1(θt))]}(4.7)s.t. ut(θt) =−Rt(θt)−Ceθt (et(θt))+δ∑OtqOtθt[pθt (et(θt)) ·uOtt+1(θt)](4.8)et(θt) = arg maxe′t(θt)∈ξ={0,1}{−Ceθt (e′t(θt))+δ∑OtqOtθt[pθt (e′t(θt)) ·uOtt+1(θt)]}(4.9)68ut(θ˜t)≥−Rt(θt)+ maxe′t(θ˜t)∈ξ={0,1}{−Ceθ˜t (e′t(θ˜t))+δ∑OtqOtθ˜t[pθ˜t (e′t(θ˜t))·uOtt+1(θt)]}; ∀θ˜t 6=θt (4.10)ut(θt)≥ u¯θtt (4.11)Constraint (4.8) is the promise-keeping constraint, which guarantees that theinsurer delivers the promised continuation utility ut(θt) at the beginning of periodt to the customer. Constraints (4.9) and (4.10) are the incentive compatibility con-straints. By equation (4.9), the customer who reports his type truthfully choosesthe action that maximizes his expected payoff from period t onward. Constraint(4.10) also prevents the customer from misreporting his type. By this constraint,the contract for the customer with type θt should be designed in such a way that thecustomer with the actual type of θ˜t cannot gain more by misreporting his state as θt .Thus, this constraint implies that the agent cannot benefit from misreporting in anyperiod, given that he will report his state truthfully from the next period onward.In other words, this constraint prevents the agent from one-period deviation.Using backward induction, it can be shown that constraint (4.10) is sufficientto prevent the agent from multi-period deviations as well. Assume that the agentdeviates for a finite number of periods. Clearly, he cannot gain in his last periodof deviation as moving backward, this can be considered as only a one-period de-viation that is not beneficial according to constraint (4.10). Going back one periodto the second last period of the agent’s deviation, he has already worsened hisexpected utility-to-go by misreporting his type in the following period, and sincethe agent cannot gain by one-period deviation, misreporting in the current periodwill not help him improve his utility. Therefore, moving back this way, by in-duction, we can show that the agent cannot gain through multi-period deviations.Furthermore, similar to equation (4.5) of the model in Section 4.2.1, the individualrationality constraint (4.11) ensures that the continuation payoffs (ut(θt);θt ∈Θ )are considered in such a way that induce all types of customers not to quit the UBIprogram.69As discussed earlier, reformulating the problem in a recursive fashion expressedabove makes it possible to solve the problem using backward induction. To do so,the first step is to characterize the terminal value for the continuation social welfareof each state Φ∗T+1(θT+1; ·) as follows.4.2.3 Terminal ProblemAfter T periods, the UBI program ends and the agent gives the telematics deviceback to the insurance company. Thus, there will be no monitoring afterwards andthe insurance contract goes back to the traditional form. Therefore, we model theterminal problem as an infinite horizon dynamic adverse selection model, wherethe customer’s premiums depend only on his record of accidents. This is actuallya special case of the model developed for the UBI phase, but with no effort action.This structure is also similar to that considered by Cooper and Hayes [11], whichis a classical paper for dynamic models in the auto insurance literature. However,the model studied in this paper is a finite horizon model, but the terminal model ofour consideration is an infinite horizon one. Thus, according to the recursive modelprovided in Section 4.2.2, the terminal problem can be formulated as follows.Φ∗t (θt ;ut) = max{Rt(θt),uatt+1(θt)}at∈{0,1}{∑atPatθt[−Catθt +δpθt (0) ·Φ∗t+1(uatt+1(θt))]}(4.12)s.t. ut(θt) =−Rt(θt)+δ∑atPatθt[pθt (0) ·uatt+1(θt)](4.13)ut(θ˜t)≥−Rt(θt)+δ∑atPatθ˜t[pθ˜t (0) ·uatt+1(θt)]; ∀θ˜t 6=θt (4.14)ut(θt)≥ u¯θtt (4.15)Optimal continuation social welfares from the first period onward for the ter-minal problem actually represent the terminal values of Φ∗T+1(θT+1; ·) for the UBIphase. The following section describes the methodology that we use to solve theprincipal’s main UBI problem, which also proves useful in analyzing the terminal70model proposed above.4.3 Principal’s Problem Considering Continuation SocialWelfare FrontiersStarting with any type θt , for a given feasible ut , an incentive compatible contractσt satisfies constraints 4.8 to 4.11, which results in the continuation social welfarevalue of Φt(θt ;ut). Thus, the optimal long-term contract can be obtained throughbackward induction on pairs (ut ,Φt(ut)). The feasible set for these pairs of contin-uation payoff vectors in each period t is vast, but as the optimal contract maximizesthe continuation social welfare for any given ut , it is sufficient to focus on time-tcontinuation social welfare frontiers defined by column vectors Φ∗t (ut), ut ∈ Ut ,where Ut refers to the set of all feasible ut .Therefore, the first step to acquire the optimal mechanism is obtaining the fea-sible region for the agent’s continuation utility vectors from every period t onward(i.e., Ut). Afterwards, we analyze the social welfare frontiers over these feasibleregions. To characterizeUt , consider an operator called B such thatUt = B(Ut+1).This operator maps nonempty subsets of IR|Θ | into nonempty subsets of IR|Θ |,which is defined as follows.Definition 4.1 A utility vector ut ∈ B(Ut+1) if there exist premium values Rt(θt) ∈IR and future utilities uOtt+1(θt) ∈Ut+1 for all possible outcomes Ot , such that con-straints 4.8 to 4.11 of the problem of period t for the associated type θt hold.Furthermore, to characterize the social welfare frontiers over the feasible re-gions Ut for each period t, we define an operator Γ ∗similar to Abreu et al. [1] andZhang [69].Definition 4.2 Γ ∗ is a functional operator that maps Φ∗t+1 :Ut+1→ IR|Θ | to Φ∗t :Ut → IR|Θ |. Therefore, Φ∗t (·) = Γ ∗Φ∗t+1(·).Operator Γ ∗ defined above actually represents each iteration of backward in-duction to obtain the continuation social welfare frontierΦ∗t (·) fromΦ∗t+1(·). Thus,the principal’s problem at the beginning of period 1 is equivalent to the following.Recall that αθ1 is the initial belief on the probability of having type θ1 customer71at the beginning of the first period, so we let row vector α = (αL,αH) denote theinitial state probabilities.maxu1∈U1α(Φ∗1(u1)−u1). (4.16)As mentioned in Section 4.2.2, to start our backward induction analysis to solvethe principal’s problem, we need to first solve the terminal problem of Section4.2.3 to compute the terminal values Φ∗T+1(uT+1), uT+1 ∈UT+1. To do so, usingthe operators defined through Definitions 4.1 and 4.2, we incorporate the dynamicapproach discussed above as the proposed terminal model is also a special case ofthe main UBI model, but with an infinite number of periods. The only difference isthat, when we talk about the terminal problem, the definition of operator B (Defini-tion 4.1) is slightly different. For the terminal problem, in which t = T +1, ...,∞, autility vector ut ∈B(Ut+1) if there exist premium values Rt(θt)∈ IR and future util-ities uatt+1(θt) ∈Ut+1;at ∈ {0,1}, such that constraints 4.13 to 4.15 of the terminalmodel of period t for the associated type θt hold.Thus, the general steps of solving the principal’s problem can be summarizedthrough the following algorithm.(i) Solve the terminal problem:• Characterize the feasible set of the agent’s continuation utility vectorsto get UT+1.• Analyze the continuation social welfare frontiers to obtainΦ∗T+1(uT+1),uT+1 ∈UT+1.(ii) Starting from t = T to t = 1 of the UBI model, for each t:• Apply operator B of Definition 4.1 to get Ut from Ut+1.• Apply operator Γ ∗ of Definition 4.2 to get Φ∗t (·) from Φ∗t+1(·).(iii) Solve the principal’s ex-ante optimization problem (4.16).The last step of the above algorithm is a one-shot optimization problem thatneeds to be solved at the beginning of period 1 considering the initial belief aboutthe distribution of different types of the agent. However, the first and second steps72are the recursion and iteration steps, which are the challenging parts of the algo-rithm. Hence, in later sections, we discuss the details of step (i) and step (ii).4.3.1 Continuation Social Welfare Frontiers for the TerminalProblemIn this section, we analyze step (i) of the algorithm proposed above for solvingthe principal’s problem. The developed terminal problem is an infinite horizondynamic adverse selection principal-agent model. Over the infinite horizon, thecontinuation social welfare frontier Φ∗t (ut), ut ∈ Ut is identical to the frontierΦ∗t+1(ut+1), ut+1 ∈ Ut+1, and also the feasible sets for the agent’s continuationutility vectors Ut and Ut+1 are the same. In other words, the continuation socialwelfare frontiers and also the feasible sets of the agent’s continuation utilities areidentical for all periods. Thus, index t can be dropped from the formulation of theterminal problem of Section 4.2.3, and to simplify the notations, Ut and Φ∗t (ut)can be denoted by U and Φ∗(u) respectively.Therefore, U of the terminal problem is actually a fixed point of operator B(introduced in Definition 4.1) and Φ∗(·) is a functional fixed point of operator Γ ∗(defined through Definition 4.2). The following proposition provides the character-izations ofU . Note that index t can be dropped from the notation of the reservationutilities as well, since we have an infinite horizon problem.Proposition 4.1 The feasible continuation utility set of the agent for the terminalproblem, denoted by U , is the entire R2 space cut by the reservation utilities u¯Land u¯H .Considering the objective function of the terminal model, equation (4.12), thecontinuation social welfare frontierΦ∗(·), can be obtained with the following spec-ifications.Proposition 4.2 Φ∗(L, ·) and Φ∗(H, ·) are constant; implying that the continua-tion social welfare frontier for each state of the terminal problem is a flat plane73over U . The expressions of Φ∗(L, ·) and Φ∗(H, ·) are as follows:Φ∗(L, ·) = − 1(1−δ · pLL(0))PaL ·CaL− δ · pLH(0)(1−δ)(1−δ · pLL(0))PaH ·CaHΦ∗(H, ·) = − 1(1−δ )PaH ·CaH .The results of Propositions 4.1 and 4.2 are visualized in Figure 4.2.Figure 4.2: Continuation Social Welfare Frontiers for the Terminal ProblemIn the following section, we move to step (ii) of the algorithm provided inSection 4.3 for solving the principal’s problem, which is the main and the mostcomplicated step.4.3.2 Continuation Social Welfare Frontiers for the UBI ProblemConsidering the results of the terminal problem, we can start our backward analy-sis for solving the UBI model proposed through equations (4.7) to (4.11). Startingfrom U and Φ∗(·) discussed in Propositions 4.1 and 4.2 for the terminal problem(which are actually equivalent to UT+1 and Φ∗T+1(·) of the UBI model), we char-acterize the feasible continuation utility set of the agent and continuation socialwelfare frontiers for the last period (i.e., t = T ) of the UBI problem.As mentioned earlier in Section 4.1, the interaction of hidden state and hid-den action in dynamic adverse selection and moral hazard principal-agent modelsmakes the problem highly complex. We overcome the complexity that arises from74this interaction by decomposing the problem into a set of subproblems in each pe-riod. After analyzing the continuation payoffs for each subproblem, we combinethe results to move to the next period in the backward induction process.Thus, to obtain UT , first we separate the problem into different subproblems,one for each possible realization of {eT (θT );θT ∈Θ}. Considering two types of L(low risk) and H (high risk) for the agent, θT ∈Θ = {L,H}, and two effort levels,eT ∈ ξ = {0,1}, we need to analyze the four subproblems of (0,0), (1,1), (0,1)and (1,0), as there are four possible effort pairs for the two types. For instance, insubproblem (0,1), we have eT (L) = 0 and eT (H) = 1, which implies that a low riskdriver does not take any effort, but a high risk one exerts some effort to improvehis driving behavior.For the remainder of this section, we also consider the following assumption:Assumption 4.1 pHL(0) = 0 and pLL(1) = 1.Based on the above assumption, a high risk driver cannot improve his typewithout putting forth any effort. Similarly, by exerting effort, a low risk driverremains a good driver during the corresponding period.The next proposition provides the structure of UT for each of the subproblemsdiscussed above.Proposition 4.3 Under Assumption 4.1, the agent’s feasible continuation utilitysets UT for all subproblems (0,0), (1,1), (0,1) and (1,0) are the entire R2 spacecut by the reservation utilities u¯LT and u¯HT .The structure of the agent’s feasible continuation utility sets mentioned aboveis similar to that of feasible utility sets for the terminal problem depicted in Figure4.2, though the values of reservation utilities might be different. By Proposition4.3, moving backward from the final period, for all periods t, we get the samestructure as the one proposed in this proposition for the feasible set of the agent’scontinuation utility vectors. The following corollary states this result.Corollary 4.1 For all periods t of the UBI problem, the agent’s feasible continu-ation utility sets Ut for all subproblems of that period, are the entire R2 space cutby the reservation utilities u¯Lt and u¯Ht .75Note that the UBI model of our consideration is a finite horizon problem with Tperiods followed by an infinite horizon terminal problem. As discussed in Section4.3.1, the reservation utilities for all periods of the terminal problem are consideredequal to u¯L and u¯H for the agents with types L and H, respectively. Thus, sincethe UBI problem is followed by an infinite horizon model, it is natural to assumethat, for all periods t of the UBI model also, the reservation utilities are the sameand equal to u¯L and u¯H . In this case, according to Propositions 4.1 and 4.3, UT issimilar toUT+1; implying that moving backward and following the same reasoningprovided for the proof of Proposition 4.3 in Appendix C, for all the mentioned 4subproblems, Ut has the same characteristics presented in Proposition 4.3 at eachperiod t.The statement of Corollary 4.1 is still true even if we do not consider similarreservation utilities for all periods of the UBI problem, and instead assume that thereservation utilities are increasing as we move forward and get closer to the endof the UBI program (i.e, t increases). In this case, UT+1 ⊂ UT , which makes theproof of Proposition 4.3 still applicable in characterizing UT−1. This implies thatfor t = T−1, the agent’s feasible continuation utility setsUT−1 for all subproblems(0,0), (1,1), (0,1) and (1,0) are also the entire R2 space cut by the reservationutilities u¯LT−1 and u¯HT−1. Therefore, we haveUT ⊂UT−1 and consequently, movingbackward, a similar reasoning applies to all previous periods and Proposition 4.3holds for all t.Furthermore, by Proposition 4.2, the continuation social welfare frontier foreach state of the terminal problem is a flat plane over U . In other words, Φ∗(L, ·)andΦ∗(H, ·) are constant. These values are actually the terminal values,Φ∗T+1(L, ·)and Φ∗T+1(H, ·), for the continuation social welfares of our UBI problem. Thus,according to the objective function of the UBI model shown in equation (4.7),moving backward from the last period, we have the following.Corollary 4.2 For all periods t, the optimal continuation social welfare from eachstate, Φ∗t (θt ;ut), is a constant.The above constant values over the two types of the agent are denoted by col-umn vector Φ∗t (ut) = (CtL,CtH)T ;ut ∈ Ut . The statement of this corollary is true,as by Proposition 4.3 and Corollary 4.1, at each period t, the feasible continuation76utility sets of the agent, Ut , are the same for all subproblems of that period, whichmakes it possible to combine the defined subproblems and obtain the optimal ob-jective value given in equation (4.7) by comparing the continuation social welfaresof the corresponding type for both of the possible effort levels. This results inProposition 4.4 regarding the optimal value of the effort levels of the agent duringeach period.Agent’s Optimal Effort StructureThe following proposition demonstrates a necessary and sufficient condition forthe optimality of exerting effort by the agent in each period.Proposition 4.4 It is optimal for the principal to induce the agent with type θt totake effort in period t if and only if:(pθt L(1)− pθt L(0))·Ct+1L +(pθt H(1)− pθt H(0))·Ct+1H ≥Ceθt (1)δ.For the reminder of Section 4.3, to discuss the main insights of offering a UBIprogram, we focus on the case in which the cost of exerting effort for all customersare the same (i.e., Ceθt (1) = Ce(1); ∀θt ∈Θ ). We also consider the following as-sumption:Assumption 4.2a. pHL(1)≤ pHH(1) and pLL(0)≥ pLH(0).b. pHL(1)≥ pLH(0).Based on part a of the above assumption, during one period, the probabilitythat a high risk driver can improve his type to a low risk one by taking the effortaction does not exceed the probability that he will stay as a high risk driver forthe next period. Since the length of periods are short, it is natural to consider thisassumption. The similar reasoning applies to a low risk driver that does not takeeffort at some period and his type might change to a high risk one. As discussedin Section 4.1, Soleymanian et al. [58] show that designing an appropriate UBIcontract can encourage customers to try harder to become better drivers. When77the probability of having an improvement in the driving performance by taking aneffort action is sufficiently high, such that pHL(1) ≥ pLH(0) (part b of the aboveassumption), it can be shown that the optimal effort levels for the agent of eachtype has a threshold type of structure described in the next theorem.Theorem 4.1 Under Assumption 4.2, for any type θt ∈Θ , either it is optimal notto invest effort at all (i.e., e∗t (θt) = 0;∀t), or there exists a threshold t∗θt such thate∗t (θt) = 1 for any t ≥ t∗θt and e∗t (θt) = 0 for t < t∗θt .Figure 4.3 depicts the results of Theorem 4.1. By this theorem, for each type θtof the agent, there is a threshold time beyond which he should make a high effortuntil the end of the UBI program if his type remains the same. If the type of theagent changes, he should follow the optimal policy of his new type. This resultis aligned with the analysis of an empirical work by Choudhary et al. [8], whichstudies customers’ behavior in response to this monitoring technology includingtheir performance close to UBI pricing thresholds.Figure 4.3: Agent’s Optimal Effort StructureTo prove Theorem 4.1, we make use of the next theorem, which is interestingon its own as well. This result verifies one of the benefits of the UBI program tothe insurers.Theorem 4.2 Under Assumption 4.2, CtL−CtH is nondecreasing in t.Recall that, for any t, Ctθ represents the optimal continuation social welfarefrom state θt . That is Φ∗t (ut) = (CtL,CtH)T ;ut ∈Ut . Theorem 4.2 implies that get-ting closer to the end of the UBI program, the gap between the optimal continuation78social welfare corresponding to different agent types enlarges, which is the resultof screening that can differentiate various types of the agent in a better way overtime. However, the marginal benefit of screening declines as the number of peri-ods grows. Considering this behavior, in Section 4.4.2, we make suggestions onthe optimal length of the UBI program, denoted by T ∗ in our terminology. Wealso provide comparative statics results on how T ∗ changes with respect to variousmodel parameters.4.3.3 Extended Model with Limited LiabilityIn this section, we extend the model of Section 4.2 to consider a limited liabilitycontract in which there is an upper bound on the monthly premiums that a customercan be charged. Therefore, the agent’s loss will not exceed a certain amount at eachperiod. In practice, this bound is usually determined by regulations of the area inwhich the insurance company is located. In the following, let Ru denote the upperbound of the agent’s premium at each period. Thus, a new constraint (4.17) will beadded to constraints (4.8) through (4.11) of the recursive formulation of the UBIproblem, and also to constraints (4.13) through (4.15) of the terminal model ofSection 4.2.3.Rt(θt)≤ Ru. (4.17)Adding the above constraint can affect different steps of the algorithm of Section4.3 proposed for solving the principal’s problem. However, it can be shown thatunder the following assumption, the results developed in Section 4.3.1 regardingstep (i) of the algorithm still hold. Consequently, under Assumption 4.3, for theterminal problem, the feasible continuation utility set of the agent and continuationsocial welfare frontiers have a similar structure to those discussed in Section 4.3.1.See Corollary C.1 of Appendix C for more details.Assumption 4.3−Ru1−δ ≤ u¯H .Recall that δ represents the discount factor of the dynamic problem, and u¯θtdenotes the reservation utility of the agent with type θt , which is typically negative79in our setting. As a high type customer will not be in favor of paying the maximumpremium all the time, it is reasonable to consider the lower limit of Assumption4.3 for the reservation utility of the high risk drivers. Note that, as u¯H ≤ u¯L, As-sumption 4.3 implies the reservation utility of the low type drivers also satisfiesthis assumption.Furthermore, it turns out that, if the effort cost is big enough such that theequation of Assumption 4.4 holds, the results proposed in Section 4.3.2 on step(ii) of the algorithm including the structure of the social welfare frontiers and alsothe agent’s feasible continuation utility sets for each subproblem of the UBI modelapplies to the extended model of this section as well. This is discussed in CorollaryC.2 of Appendix C in more detail.Assumption 4.4 −Ru− u¯H +δ[pHL(1) · u¯L+ pHH(1) · u¯H]≤CeH(1).Thus, under Assumption 4.3 and 4.4, Theorems 4.1 and 4.2 can be also statedfor this extended model with limited liability.4.4 Numerical AnalysisIn this section, we perform a numerical analysis to illustrate the underlying insightsof the proposed UBI model. We also provide comparative statics results on how theobtained outcomes vary with respect to some model parameters. More specifically,we analyze the optimal monitoring time that insurance companies should considerdepending on the demographics of the target market, to what extent a UBI policycan outperform a traditional policy, and how the potential gains depend on theinitial belief about the distribution of different types of drivers. Our model insightscan be beneficial to the newly established insurance companies that are consideringoffering a UBI program, and as documented in the business press, there are manysuch firms in the insurance market.To run the numerical analysis, first we estimate the parameters of our model toobtain a realistic value for each of them. We have access to the UBI data of 40,525drivers who adopted this program from a major insurance company in the US. Inthe following, the details of the available data fields are discussed. Combining theresults of this novel dataset with the outcomes of other articles in the auto insurance80literature in which similar parameters are estimated using other relevant datasets,we obtain our model parameters’ estimation.4.4.1 Estimation of the Parameters of the ModelAverage Cost of Accident. First, we specify the average cost of a claim filed due toan accident. The information of the accidents and claim costs are not included inour dataset, so we use other articles in the related literature to estimate the averageclaim costs. Findings from the Insurance Research Council’s (IRC’s) auto injuryinsurance claims study show that the average cost per claim in the US in 2013 was$8,017 [13]. Also, according to another study by ISO [64], in 2017, the averageauto liability claim for property damage in the US was $3,638, and the average autoliability claim for bodily injury was $15,270. The same study reports that 1.1% ofpeople with liability insurance had a bodily injury liability claim, while 4.0% ofthose with liability insurance had a property damage liability claim. Hence, theaverage cost per claim (combining bodily injury and property damage) was around$6,500. Note that the ISO analysis does not consider the collision damage costs: itjust denotes the liability costs, and consequently, it is lower than the cost reportedby the IRC group. Here, we assume that the claim costs in our model are alsojust associated with third part liability costs, so considering the second referencementioned above, we set the cost of an accident per claim to be about $6,500 inour numerical analysis (CaH =CaL = 6500).Probability of Having an Accident. As explained in the previous sections, theprobability of having an accident is an important factor in solving the model.Our model specification considers two different probabilities of accident for lowand high risk drivers. Therefore, based on the published industry reports in theauto insurance context, we try to set the two probabilities of PaL and PaH for lowand high risk drivers, respectively. Forbes reported in an article that, accord-ing to car insurance industry estimates, a driver will file a claim for a collisionabout once every 17.9 years on average [62]. This implies that the average prob-ability of having an accident (filing a claim) in the US is around 5.5% per year(probability = 1/17.9 = 0.055). The Highway Loss Data Institute also reportedthe average claim frequency as 7.3 according to data from 2016 to 2018 [64]. The81claim frequency is expressed as a rate per 100 insured vehicle years. Therefore, theaverage probability of filing a claim in this report is assumed to be around 7% peryear.The mentioned references estimate the overall average probability at which adriver might have an accident. However, according to the model of Section 4.2,we need to set the probabilities of having an accident for low and high risk driversseparately. Yip and Yau [68] estimate the distribution of claim frequency consider-ing the zero-inflated Poisson regression model. In this paper, the yearly probabilityof having an accident is estimated to be around 5.6%, which is similar to the esti-mations of the above references. Considering the results of this paper including aPoisson regression for the number of accidents, and also taking into account thatwe want to cluster customers to the two groups of low and high risk, the probabil-ities of PaL and PaH can be calculated. Assuming the drivers whose rate of accidentare higher than the estimated average as high risk drivers and the rest as low riskones, we obtain PaL = 0.037/12 and PaH = 0.075/12 as monthly probabilities ofhaving an accident for each type of customers.Probability of Getting High/Low UBI Scores. In this subsection, we estimate theprobabilities of getting a high (low) UBI score for the low and high risk driversseparately based on their actual driving behavior. In our terminology, these prob-abilities are denoted by ρhL and ρhH , respectively (ρlL = 1−ρhL and ρ lH = 1−ρhH).For this purpose, we use a unique dataset from a major US automobile insurancecompany. We have information for 40,525 new customers that submitted a quoterequest to purchase an insurance policy from March 2012 to November 2014, whoalso adopted the UBI program. We observe the customers’ insurance score as theirinitial risk measure. The insurance score is a factor used by the insurance compa-nies traditionally as a proxy of customers’ past driving record (past claim frequencyand claim costs). For all customers who adopted the UBI policy, we also have ac-cess to their daily actual driving performance including the daily number of hardbrakes and the daily UBI scores during the monitoring period. Table 4.1 representsthe summary statistics for these variables in our dataset.To estimate the desired parameters, we first cluster customers into two groupsof low and high risk drivers. For clustering, we consider two attributes, which82Variable Minimum Mean Median MaximumInsurance Score 21.7 50.16 51.2 92.9Daily Hard Brakes 0 4.09 3 26UBI Score 31 66.83 66.5 99Table 4.1: Data Summary of the Insurance Score, Daily Number of Hard Brakes,and UBI Scoreare directly associated with how risky a customer’s driving behavior is. The firstattribute is the customers’ initial insurance score at the time of insurance purchase,and the second factor is the average number of daily hard brakes in the first monthof monitoring in the UBI program. Therefore, for customer i (i = 1,2, ...,40525),we have “insurance scorei” and “hard brakesi” as two attributes for clustering. Werun the cluster analysis by using the kmeans() function in R software and settingK = 2, as we want to classify customers into the two groups of low and high riskdrivers. Table 4.2 shows the results of our cluster analysis. The results indicatethat the customers in Cluster 1 have a significantly higher average daily numberof hard brakes than those in Cluster 2, which means more risky driving behavior.In addition, the drivers in Cluster 1 on average have a significantly lower initialinsurance score, which means again this group of customers is riskier than thecustomers in Cluster 2. Therefore, we label the customers in Clusters 1 and 2 ashigh and low risk drivers, respectively.Cluster N Average Daily Hard Brakes Average Initial Insurance Score1 (High Risk) 14,589 6.27 57.842 (Low Risk) 25,936 4.13 62.09Table 4.2: Results of Clustering Customers to the two Groups of Low and High RiskDriversIn the next step, we estimate the probability of obtaining a high UBI score foreach cluster of drivers. We observe the daily UBI score for each driver, whichis a measure (1-100) of his actual driving behavior. Table 4.3 represents the datasummary of the observed UBI scores during the first month of monitoring in ourdataset.If a UBI score is greater than median (65), we define it as a high score, and ifit is lower than that, we denote it as a low score. Considering the observed UBI83Variable Minimum Mean Median MaximumUBI Score 32 64.27 65 97Table 4.3: Data Summary for the UBI Score in the First Month of Monitoringscores, we count the number of high scores for each customer in the two clustersand estimate the probability of getting a high score based on that. In this dataset,the probabilities of obtaining a high UBI score for the low and high risk drivers areestimated as ρhL = 0.68 and ρhH = 0.27, respectively.Transition Probabilities of the Customers’ Types Based on Their Effort Level.According to the cluster analysis of Table 4.2, the initial probabilities of havinga low or high risk driver at the beginning of the UBI program of our dataset canbe obtained as αL = 25936/40525 = 0.64 and αH = 1−αL = 0.36. In the previ-ous part, we also estimated the probabilities of getting a low and high UBI scorefor each type (ρhL = 0.68 and ρhH = 0.27 ). We assume that these probabilitiesare homogeneous over time. That is, during all periods of the UBI program, theprobabilities of obtaining a low or high UBI score will be the same for each giventype. Let us also define P(low(high) scoret) and P(low(high) riskt) to denote theprobabilities of having a low (high) score and a low (high) risk driver in period t,respectively. Therefore,P(high scoret) = P(high scoret |low riskt)×P(low riskt)+ P(high scoret |high riskt)×P(high riskt)= ρhL×P(low riskt)+ρhH ×P(high riskt)= 0.68×P(low riskt)+0.27×(1−P(low riskt))= 0.27+0.41×P(low riskt). (4.18)If we observe the probability of having a high (low) score in the each period, wecan recover P(low riskt) based on the above equation. Furthermore, we have:P(low riskt) = P(low riskt |low riskt−1)×P(low riskt−1)+P(low riskt |high riskt−1)×P(high riskt−1). (4.19)84In the above equation, P(low riskt |low riskt−1) actually represents the transitionprobability of staying as a low risk driver in period t given the fact that the customeris low risk at the beginning of period t− 1 as well. P(low riskt |high riskt−1) alsoshows the probability of a transition from being a high risk driver in period t− 1to being a low risk one in the following period t. In our model setup, we considerthese transition probabilities conditional on the effort level that customers take ateach period, and we also assume that they are homogeneous over different periods.Recall that, for any period t, these probabilities are denoted by pi j(e¯) = Pr{θt+1 =j|θt = i,et = e¯}. Thus, as we consider two possible effort levels of 1 (taking effort)and 0 (not taking effort), equation (4.19) is equivalent to the following:P(low riskt) =(pLL(1)×P(et−1 = 1|low riskt−1)+ pLL(0)×P(et−1 = 0|low riskt−1))×P(low riskt−1)+(pHL(1)×P(et−1 = 1|high riskt−1)+ pHL(0)×P(et−1 = 0|high risk1))×P(high riskt−1).According to Assumption 4.1, the above can be written as:P(low riskt) =(P(et−1 = 1|low riskt−1)+ pLL(0)×(1−P(et−1 = 1|low riskt−1)))×P(low riskt−1)+(pHL(1)×P(et−1 = 1|high riskt−1))×(1−P(low riskt−1)). (4.20)Using equation (4.20) for different periods t = 2,3,4, ... of the UBI program, wewant to estimate the transition probabilities pLL(0) and pHL(1). One of the vari-ables of the above equation is P(low riskt). According to equation (4.18), havingthe probability of observing a high score in each period, we can get P(low riskt).As discussed in the previous section, UBI scores above the median (65) are consid-ered high scores and ones lower than that are defined as low UBI scores. Therefore,using our dataset, we can estimate P(high scoret) for each period t (i.e., month t)85of the monitoring period as follows:P(high score2) = 0.537, P(high score3) = 0.545,P(high score4) = 0.552, P(high score5) = 0.559.Considering the above values, equation (4.18) implies:P(low risk2) = 0.651, P(low risk3) = 0.671,P(low risk4) = 0.689, P(low risk5) = 0.705.Thus, using equation (4.20) for periods t = 2, 3, 4, and 5, and considering the abovevalues for P(low riskt), the other four variables of equation (4.20) including thetransition probabilities can be estimated. We obtain, pHL(1) = 0.113 and pLL(0) =0.94.The estimated values of all the model parameters are summarized in Table 4.4.We incorporate these values in the following to discuss the numerical analysis. Inaddition, we use the estimated periodic cost of being monitored obtained by So-leymanian et al. [57] to set the effort cost at each period. They show that the costof being monitored (including the privacy cost, inconvenience cost of adopting anew technology, effort cost, etc.) is around $30 per month. We consider the effortcost to be about one third of this value (i.e., CeL(1) = CeH(1) = 10). Note that thestructure of the results are robust to alternative values for the effort cost, but themagnitudes may change. Solving the model with the mentioned parameters andconsidering the length of monitoring as 1 year (i.e., T = 12) and the discount fac-tor of δ = 0.95, the structure depicted in Figure 4.4 for the effort action is obtained.The results show that a high risk driver takes an effort action during all the mon-itoring periods. However, a low risk driver starts to put forth effort from period 9until the end of the UBI program.Parameter CaL =CaH PaL PaH ρhL ρhH αH pLL(0) pHL(1)Value 6500 0.037/12 0.075/12 0.68 0.27 0.36 0.94 0.113Table 4.4: Values of the Parameters for the Numerical Example86Figure 4.4: Optimal Effort Structure for the Numerical Example4.4.2 A Short UBI Policy vs. a Long UBI PolicyAccording to Theorem 4.2, we showed that, as it gets closer to the end of the UBIprogram, the gap between the optimal continuation social welfare correspondingto different agent types enlarges, which suggests better screening of the agent sincemore information about him is collected. However, the marginal benefit of screen-ing declines as the number of monitoring periods grows. Figure 4.5 presents thisbehavior for our numerical example. This graph shows the expected percentage ofincrease in the social welfare achieved by offering a UBI policy with respect to themonitoring length of the program.Figure 4.5: Social Welfare Increase w.r.t. Monitoring LengthDepending on the technologies used to gather and analyze the customers’ sen-sor data during the monitoring time, having a larger number of periods for the UBI87program can be very costly to the insurance companies. Thus, comparing the mon-itoring cost of each period and the marginal benefit of adding it to the monitoringperiods, the optimal length of a UBI policy (T ∗) can be obtained. For instance,assume that, according to the periodic monitoring costs in our numerical example,the UBI policy should be continued until the marginal percentage of increase in thesocial welfare is above 0.5%. Therefore, the graph in Figure 4.5 demonstrates that6 periods is the optimal monitoring length (i.e., T ∗ = 6).In the above numerical example, the initial percentage of the high risk driversis estimated to be 36% (i.e., αH = 0.36). As αH increases, one may intuit that theoptimal monitoring time (T ∗) should also increase. When the monitoring costs arerelatively large, this intuition holds. However, for very small monitoring costs, itturns out that the result is in the opposite direction: T ∗ decreases in αH . This resultis depicted in Figure 4.6.Figure 4.6: Optimal Monitoring Time w.r.t. the Percentage of High Risk Driversand Monitoring Cost4.4.3 Benefits of Monitoring in Terms of the Accident RateIt has been stated in several articles that UBI programs lower the incidents of autocrashes and the average claim costs dramatically. One business reports noted that,by adopting the UBI policies, the accident rate can decline by about 20% especially88among young drivers [36]. Thus in this section, we present the estimated accidentrate reduction in the numerical example discussed above. Considering the param-eters estimated from our dataset, we use simulation to analyze how the customers’type distribution changes during the UBI program. Comparing the percentage ofthe risky drivers at end of the monitoring periods with the beginning of the pro-gram, we estimate the reduction in the accident rate of the underlying population.Recall that the optimal number of monitoring periods is obtained as 6 periods (i.e.,T ∗ = 6). We find that under the optimal UBI contract, after these 6 months of mon-itoring, the annual accident reduction rate is around 17.99% with a 95% confidenceinterval of ±0.65.As discussed in Section 4.4.2, the specifications of the target market such asαH , and also the monitoring costs that the insurance company incurs by offeringa UBI program, can affect the design of the UBI policy, and consequently, theoptimal monitoring length. Therefore, these factors have also important impacts onthe benefits of offering the UBI programs in terms of the accident rate. Figure 4.7presents the accident percentage reduction for various ranges of αH and monitoringcosts.Figure 4.7: Accident Rate Reduction w.r.t. the Percentage of High Risk Drivers andMonitoring CostsAs Figure 4.7 shows, despite the non-monotonic behavior of T ∗ with respect89to αH and the monitoring costs presented in Figure 4.6, the accident percentagereduction increases monotonically when monitoring is cheaper and the percentageof risky drivers in the society is higher. In other words, depending on the technolo-gies used for monitoring during the UBI program, in a society that αH is higher,even with a lower number of monitoring periods, we can have a higher reduction inthe accident rate compared to the market with a smaller αH , but longer monitoringperiods.4.5 Conclusion and DiscussionNew technologies have reshaped business models in various industries over the lastpast years, and the Internet of Things (IoT) is one of these new phenomena that isbecoming increasingly popular. Collecting sensor data of customers can generatenew revenue models and provide potential benefits for both customers and firms.Usage-Based Insurance (UBI) is one of the most recent IoT-based innovations byauto insurance companies that enables insurers to collect individual-level drivingdata and offer individually targeted premiums accordingly. This chapter providesa theoretical model to capture the effects of this monitoring technology on the autoinsurance market.We formulate the problem as a dynamic principal-agent model in which theagent (customer) has private information about his risk type and can take a hiddeneffort action in each period to improve his driving performance (i.e., the agent’ssubsequent type may change). The principal (insurer) offers a long-term contract tothe agent considering the captured monitoring data of the agent’s driving behavior,which is a noisy signal of his type and effort. We characterize the full history-dependent optimal contract for this dynamic adverse selection and moral hazardproblem.The interplay between hidden type and hidden action elevates the complexity ofthe dynamic mechanism design problems with both issues of adverse selection andmoral hazard, and consequently, most of the papers in this area resort to numericalanalyses. In contrast, we develop a dynamic programming algorithm to examinethe model analytically and explore the specifications of the desired optimal con-tract. Through this algorithm, the optimal contract can be derived by backward90induction based on the set of continuation payoffs for the principal and the agent ineach period. The feasible set of the pairs of continuation payoffs for the two partiesis vast. We tackle this difficulty by focusing on the efficient frontiers of the feasiblesets, which express the principal’s maximum continuation payoff as a function ofthe agent’s continuation payoff (vector) at the beginning of each period. Further-more, we overcome the complexity that arises from the interaction of hidden stateand hidden action by decomposing the problem into a set of subproblems in eachperiod.In addition to nontrivial contributions to the theory of dynamic principal-agentgames, our model results lead to interesting managerial insights for firms consid-ering offering UBI programs. We employ the UBI data of 40,525 drivers from amajor insurance company in the US to further illustrate the underlying insights ofthe proposed model. It is shown that offering a long UBI program is not alwaysbeneficial, as the marginal benefits of screening declines over time. By estimatingthe model parameters using our dataset and solving the developed model, we ob-tain the optimal monitoring time. We also provide comparative statics results onthis optimal length of monitoring with respect to various model parameters such asthe demographics of the target market and also the monitoring costs of offering theprogram for the insurers. We show that having more high risk drivers in a societydoes not necessarily mean that the UBI monitoring period should be longer. De-pending on the monitoring costs, by increasing the percentage of high risk drivers,it might be even optimal to offer a shorter UBI program. Finally, we analyze theeffects of monitoring on the accident rate and estimate that, for our dataset, underthe optimal contract, we can have about an 18% reduction in the annual rate ofaccidents.Our model has some limitations that could be addressed by future research. Inthe developed framework, we do not model competition among different insurancecompanies. Thus, a future research project could extend our current baseline modelby considering more than one insurer in the market. Such an extension wouldbe quite challenging, but it could lead to more realistic results. Furthermore, theanalysis in this chapter considers a long-term contract, which means the principalcommits to future payment plans. The solution does not apply when the principalcannot make full commitment to a long-term contract. Another interesting exten-91sion of this study would be to study the case of offering a short-term contract orsituations where renegotiation is permitted.Finally, the applications of dynamic adverse selection and moral hazard prob-lems are not limited to the insurance industry. The framework of this chapter hasmany other applications of interest for the Operations Management community.Specifically, several other problems in which two parties with non-aligned incen-tives collaborate with each other can benefit from this modeling approach. Forinstance, in supply chain management, the inventory level of the downstream firmis usually unobservable for the upstream firm. Also, the downstream firm may takesome hidden actions, which may affect its subsequent inventory levels. Moreover,in various companies, the managers have limited information about the employ-ees’ skill levels compared to the employees themselves; however, by designing anappropriate reward system, they can incentivize the employees to improve theirknowledge and skills. Therefore, the model and approach developed in this chap-ter provide a useful framework for studying the dynamic contracting problems inthese types of contexts.92Chapter 5ConclusionThe main focus of this dissertation is on the applications of stochastic multi-perioddecision making models in the field of service operations management. In all threestudies, we incorporate dynamic programming techniques to develop the math-ematical models and analyze the solution approaches for each of the discussedproblems. In what follows below, we provide a review of the research questionsdiscussed in each study, analytical models developed to address the proposed prob-lems, and also the main results. Furthermore, we discuss the limitations of eachwork, and possible extensions and avenues for further research.The first study (Chapter 2) contributes to the operations management literatureby analyzing the strategic customer’s joining behavior in the service systems wherecustomers need to meet some side conditions before being served. When queue-ing is involved, considering the length of the queue, time needed to complete therequired side conditions, and other factors such as in versus out-of-queue waitingcosts, customers may be strategic in deciding when to join. Thus, in this chapter,we examine whether an individual arriving to such systems should join the queueat that time, or wait to join at some future time. We analyze different variations ofthe problem, which vary in the individual’s options at each decision point (whetherleaving is an option or not). For each version, we formulate the problem as aMarkov decision process and discuss the structure of the optimal policy.When outside options for getting service are not available (i.e., leaving is not anoption), we show that in general, the optimal policy is a three-region policy of wait-93join-wait. However, we demonstrate that some unexpected shifts in policy couldhappen as a result of the combination of out-of-queue vs. in-queue waiting costs.Furthermore, we show how the optimal policy changes to a more complicated five-region policy of leave-wait-join-wait-leave in the presence of the third option ofleaving the system.Despite substantial work on strategic behavior in queuing systems, individualdecisions in the presence of prerequisite events have not been addressed, which wedo in Chapters 2 and 3 of this thesis. In many real queueing systems, customersneed to meet some requirements to get service; therefore, the results of our stud-ies shed light on this gap and open up new applications of strategic customers’behavior in the rational queueing literature.In Chapter 2, we investigate the case that individuals need to join the queue tobe able to initiate the prerequisite process. The second study of this dissertation(Chapter 3) extends the model of Chapter 2 to the case in which the prerequisiteevent may be in process before joining the queue. In this study, we first take the per-spective that only the individual approaching the queue is strategic about whetherto join it or not and analyze the individual’s optimal policy. We formulate the prob-lem as a Markov decision process, and prove that in general, the optimal policy isa two-region policy of wait-join; that is, one should join the queue if and only if itssize is above a certain threshold. Afterwards, we discuss a game theoretic settingwhere all individuals are strategic. The optimal strategy in this case depends onboth the queue size and also the number of people waiting out of queue.To obtain the equilibrium policy, we incorporate a level-k model, an approachdeveloped in the bounded rationality literature. In reality, individuals may not befully rational to follow the equilibrium strategy, as they may have different levelsof rationality. Therefore, in addition to analyzing the Nash equilibrium, we alsodiscuss the structure of the optimal policy that individuals with different levels ofrationality follow. We observe that when in-queue and out-of-queue waiting costsare the same, for each given queue size, there exists a threshold for the numberof people waiting out of queue such that it is optimal to join the line above thatthreshold in equilibrium even before having the required conditions for gettingservice. Furthermore, this threshold is decreasing in the queue size. Note that ifthe prerequisite event is satisfied while waiting out of queue, the optimal policy is94to join right away. We also present the socially optimal solution for the model ofChapter 3 and compare the results to the equilibrium policy.The models developed in Chapters 2 and 3 can be applied to service systemswith long queues or wait lists, in which an individual needs to meet some prereq-uisite condition, or complete some task, before reaching the front of the queue.The model results allow individuals to better manage their waiting time in thesesettings. For service providers, we provide insights regarding how strategic cus-tomers make queue joining decisions, under different model settings. In particular,system managers can use this information to decide about the penalties for cus-tomers who reach service without completing their requirements. We leave this tothe future studies.The optimal policies we derived would be also useful in thinking of extendingthe provided results to unobservable queues (e.g., call centers). There is a researchstream in the queueing literature that discusses strategic customers’ behavior insituations where individuals may not be fully informed about the state of the systembefore making their joining decisions. Though the queue in our setting is fullyobservable, this stream could be an interesting extension to the models we studied.One scenario is when the individuals do not see any information about the queueincluding the person in service or the ones waiting out of queue. Another scenariocan be the case in which customers can observe the number of people waiting inqueue, but they do not see how many people are waiting out of line.Furthermore, Chapters 2 and 3 explore FCFS queueing systems, but somesettings such as nursing home wait lists may not operate in a FCFS manner assometimes patients’ health condition can prioritize them for receiving service. Themodel results we derived prove useful in extending our settings to the case wherepeople waiting in queue may be served with some type of priority order.In the third study (Chapter 4) of this thesis, we analyze another dynamic de-cision making problem in the area of service operations management. This studyprovides a theoretical model to capture the benefits of one of the most recent in-novations by auto insurance companies, called Usage-Based Insurance (UBI). Bythis technology, the drivers’ behavior is monitored directly while they drive. Then,based on the assessed data, the insurance company offers some discounts to thecustomers. Thus, using UBI, insurance companies can get more accurate estimates95of the customers’ risk factors, and at the same time, by offering an appropriatemechanism, they can encourage customers to improve their driving performance.Collecting sensor data of customers can generate new revenue models and pro-vide potential benefits for both customers and firms, which we address in this study.We formulate the problem as a dynamic principal-agent model with both adverseselection and moral hazard. The agent (customer) has private information about hisrisk type and can take a hidden effort action in each period to improve his drivingperformance (i.e., the agent’s subsequent type may change). The principal (in-surer) offers a long-term contract to the agent considering the captured monitoringdata of the agent’s driving behavior, which is a noisy signal of his type and ef-fort. We characterize the full history-dependent optimal contract for this dynamicmechanism design problem.In order to compute the optimal contract, we develop a general recursive formu-lation. The underlying system is a Markov decision process, where the evolutionof the state of the system (type of the customer) is endogenous, as it depends onthe hidden action in the previous period. The interplay between hidden type andhidden action elevates the complexity of the dynamic mechanism design problemswith both issues of adverse selection and moral hazard, and consequently, most ofthe papers in this area resort to numerical analyses. In contrast, we develop a dy-namic programming algorithm to examine the model analytically and explore thespecifications of the desired optimal contract. Through this algorithm, the optimalcontract can be derived by backward induction based on the set of continuationpayoffs for the principal and the agent in each period.In order to further illustrate the underlying managerial insights of the proposedmodel, we also perform numerical analysis in Chapter 4. To do so, we employ theUBI data of 40,525 drivers from a major insurance company in the US that submit-ted a quote request to purchase an insurance policy from March 2012 to November2014, who also adopted this program. It is shown that offering a long UBI programis not always beneficial, as the marginal benefits of screening declines over time.By estimating the model parameters using our dataset and solving the devel-oped model, we obtain the optimal monitoring time. We also provide comparativestatics results on this optimal length of monitoring with respect to various modelparameters such as the demographics of the target market and also the monitoring96costs of offering the program for the insurers. We show that having more highrisk drivers in a society does not necessarily mean that the UBI monitoring periodshould be longer. Depending on the monitoring costs, by increasing the percent-age of high risk drivers, it might be even optimal to offer a shorter UBI program.Finally, we analyze the effects of monitoring on the accident rate and estimatethat, for our dataset, under the optimal contract, we can have about an 18% reduc-tion in the annual rate of accidents. Thus, in addition to nontrivial contributionsto the theory of dynamic principal-agent games, our model results lead to interest-ing managerial insights and policy recommendations for firms considering offeringUBI programs.The model developed in this study can be applied to a wide range of applica-tions in practice. For instance, based on a recent news article 1, life insurers areconsidering employing health-screening devices such as smart watches or Fitbit tolink assessed data to the insurance policies they are offering. This can be modeledas a dynamic adverse selection and moral hazard principal-agent problem similarto the model of Chapter 4. However, the applications of dynamic adverse selectionand moral hazard problems are not limited to the insurance industry. Several otherproblems in the field of operations management in which two parties with non-aligned incentives collaborate with each other can also benefit from the modelingframework of our study. For instance, in supply chain management, the inventorylevel of the downstream firm is usually unobservable for the upstream firm. Also,the downstream firm may take some hidden actions, which may affect its subse-quent inventory levels.Our model developed in the third study has some limitations that could beaddressed by future research. In the provided framework, we do not model com-petition among different insurance companies. Thus, an avenue for future couldbe extending the model of Chapter 4 to the case with more than one insurer inthe market. Furthermore, the analysis in this study considers a long-term contract,which means the principal commits to future payment plans. The solution does notapply when the principal cannot make full commitment to a long-term contract.Another interesting extension of this study would be to analyze the case of offering1https://money.cnn.com/2018/09/19/news/companies/john-hancock-life-insurance-vitality/index.html97a short-term contract or situations where renegotiation is permitted.98Bibliography[1] D. Abreu, D. Pearce, E. Stacchetti, et al. Toward a theory of discountedrepeated games with imperfect monitoring. Econometrica, 58(5):1041–1063, 1990. → pages 57, 71[2] M. Battaglini. Long-term contracting with markovian consumers. AmericanEconomic Review, 95(3):637–658, 2005. → page 56[3] D. P. Bertsekas. Dynamic Programming and Optimal Control. AthenaScientific, 4 edition, 2012. ISBN 1. → page 1[4] P. Bolton, M. Dewatripont, et al. Contract theory. MIT press, 2005. → page55[5] A. Burnetas and A. Economou. Equilibrium customer strategies in a singleserver markovian queue with setup times. Queueing Systems, 56(3):213–228, 2007. → page 51[6] C. F. Camerer, T.-H. Ho, and J.-K. Chong. A cognitive hierarchy model ofgames. The Quarterly Journal of Economics, 119(3):861–898, 2004. →pages 33, 34[7] P.-A. Chiappori and B. Salanie´. Testing for asymmetric information ininsurance markets. Journal of political Economy, 108(1):56–78, 2000. →page 53[8] V. Choudhary, M. Shunko, and S. Netessine. Does immediate feedbackmake you try less hard? a study of automotive telematics. Working Paper.Available at SSRN: https://ssrn.com/abstract=3260891, 2019. → pages6, 53, 57, 58, 78[9] CNN-News. John Hancock wants to turn life insurance into a wellnessgame. https://money.cnn.com/2018/09/19/news/companies/john-hancock-life-insurance-vitality/index.html, 2018. → page5899[10] H. L. Cole and N. Kocherlakota. Dynamic games with hidden actions andhidden states. Journal of Economic Theory, 98(1):114–126, 2001. → page57[11] R. Cooper and B. Hayes. Multi-period insurance contracts. InternationalJournal of Industrial Organization, 5(2):211–231, 1987. → page 70[12] C. J. Corbett. Stochastic inventory systems in a supply chain withasymmetric information: Cycle stocks, safety stocks, and consignmentstock. Operations research, 49(4):487–500, 2001. → page 55[13] D. Corum. Auto injury claim severity pushes insurance claim costs higher.https://www.insurance-research.org/sites/default/files/downloads/Trends2015NRFINAL.pdf, 2015.→ page 81[14] S. Cui, X. Su, and S. K. Veeraraghavan. A model of rational retrials inqueues. Operations Research, Forthcoming, 2019. → pages 11, 41, 45[15] T. H. Cui and Y. Zhang. Cognitive hierarchy in capacity allocation games.Management Science, 64(3):1250–1270, 2018. → page 27[16] F. Dahlqvist, M. Patel, A. Rajko, and S. Jonathan. Growing opportunities inthe Internet of Things.https://www.mckinsey.com/industries/private-equity-and-principal-investors/our-insights/growing-opportunities-in-the-internet-of-things, 2019.→ page 52[17] Deloitte-Insights. To share or not to share: What consumers really thinkabout sharing their personal information.https://www2.deloitte.com/insights/us/en/industry/retail-distribution/sharing-personal-information-consumer-privacy-concerns.html,2017. → page 66[18] G. Dionne. Handbook of Insurance. Springer Science & Business Media,2013. → pages 54, 60[19] M. Doepke and R. M. Townsend. Dynamic mechanism design with hiddenincome and hidden actions. Journal of Economic Theory, 126(1):235–285,2006. → pages 6, 56[20] A. Economou and S. Kanta. Equilibrium balking strategies in the observablesingle-server queue with breakdowns and repairs. Operations ResearchLetters, 36(6):696–699, 2008. → page 51100[21] A. Fernandes and C. Phelan. A recursive formulation for repeated agencywith history dependence. Journal of Economic Theory, 91(2):223–247,2000. → pages 56, 57[22] Global-Market-Insight. Usage-based insurance market to hit $107bn by2024. https://www.globenewswire.com/news-release/2018/12/03/1660531/0/en/Usage-based-Insurance-Market-to-hit-107bn-by-2024-Global-Market-Insights-Inc.html, 2018. → pages4, 53[23] P. Guo and R. Hassin. Strategic behavior and social optimization inmarkovian vacation queues: the case of heterogeneous customers. EuropeanJournal of Operational Research, 222(2):278–286, 2012. → page 51[24] P. Guo and Z. G. Zhang. Strategic queueing behavior and its impact onsystem performance in service systems with the congestion-based staffingpolicy. Manufacturing & Service Operations Management, 15(1):118–131,2013. → page 51[25] P. Guo and P. Zipkin. Analysis and comparison of queues with differentlevels of delay information. Management Science, 53(6):962–970, 2007. →page 51[26] P. Guo and P. Zipkin. The effects of the availability of waiting-timeinformation on a balking queue. European Journal of Operational Research,198(1):199–209, 2009. → page 51[27] P. Handel, I. Skog, J. Wahlstrom, F. Bonawiede, R. Welch, J. Ohlsson, andM. Ohlsson. Insurance telematics: Opportunities and challenges with thesmartphone solution. IEEE Intelligent Transportation Systems Magazine, 4(6):57–70, 2014. → pages 4, 53[28] R. Hassin. Rational queueing. CRC press, Boca Raton, FL, 2016. → pages3, 11, 41, 51[29] R. Hassin and M. Haviv. Equilibrium strategies for queues with impatientcustomers. Operations Research Letters, 17(1):41–45, 1995. → pages 11, 45[30] R. Hassin and M. Haviv. To queue or not to queue: Equilibrium behavior inqueueing systems. Kluwer Academic Publishers, Boston, MA, 2003. →page 11[31] T.-H. Ho and X. Su. A dynamic level-k model in sequential games.Management Science, 59(2):452–469, 2013. → pages 33, 34101[32] T.-H. Ho, C. Camerer, and K. Weigelt. Iterated dominance and iterated bestresponse in experimental” p-beauty contests”. The American EconomicReview, 88(4):947–969, 1998. → pages 27, 33, 34[33] B. Holmstrom et al. Moral hazard and observability. Bell journal ofEconomics, 10(1):74–91, 1979. → page 55[34] T. Huang, G. Allon, and A. Bassamboo. Bounded rationality in servicesystems. Manufacturing & Service Operations Management, 15(2):263–279, 2013. → page 27[35] IHS-Markit. Usage-based insurance expected to grow to 142 millionsubscribers globally by 2023. https://technology.ihs.com/578102/, 2016. →pages 4, 53[36] Intelligent-Mechatronic-Systems-Inc. Driver behavior monitoring andvehicle telematics insurance.https://www.intellimec.com/insights/driver-behavior-monitoring, 2019. →page 89[37] P. Jeziorski, E. Krasnokutskaya, and O. Ceccarini. Adverse selection andmoral hazard in the dynamic model of auto insurance. Working Paper, 2017.→ pages 53, 61[38] Y.-M. Kao, N. B. Keskin, and K. Shang. Impact of information asymmetryand limited production capacity on business interruption insurance.Available at SSRN 3184530, 2018. → page 55[39] N. C. Knudsen. Individual and social optimization in a multiserver queuewith a general cost-benefit structure. Econometrica: Journal of theEconometric Society, 40(3):515–528, 1972. → page 10[40] X. Li, P. Guo, and Z. Lian. Quality-speed competition in customer-intensiveservices with boundedly rational customers. Production and OperationsManagement, 25(11):1885–1901, 2016. → page 27[41] X. Li, Q. Li, P. Guo, and Z. Lian. On the uniqueness and stability ofequilibrium in quality-speed competition with boundedly-rational customers:The case with general reward function and multiple servers. InternationalJournal of Production Economics, 193:726–736, 2017. → page 27[42] T. Lin. Valuing intrinsic and instrumental preferences for privacy. WorkingPaper. Available at SSRN: https://ssrn.com/abstract=3406412, 2019. →page 66102[43] A. Mandelbaum and U. Yechiali. Optimal entering rules for a customer withwait option at an M/G/1 queue. Management Science, 29(2):174–187, 1983.→ pages 10, 11, 19, 21[44] R. D. McKelvey and T. R. Palfrey. An experimental study of the centipedegame. Econometrica: Journal of the Econometric Society, 60(4):803–836,1992. → pages 27, 33[45] R. Mosley. Seeing UBI from consumer’s point of view and what that meansfor adoption rates. Pinnacle Actuarial Resources, Inc. Insurance TelematicsUSA Conference, 2015. → page 66[46] R. Nagel. Unraveling in guessing games: An experimental study. TheAmerican Economic Review, 85(5):1313–1326, 1995. → pages 25, 27, 33[47] P. Naor. The regulation of queue size by levying tolls. Econometrica:journal of the Econometric Society, 37(1):15–24, 1969. → pages 10, 27, 45[48] E. L. Plambeck and S. A. Zenios. Performance-based incentives in adynamic principal-agent model. Manufacturing & Service OperationsManagement, 2(3):240–263, 2000. → page 56[49] E. L. Plambeck and S. A. Zenios. Incentive efficient control of amake-to-stock production system. Operations Research, 51(3):371–386,2003. → page 56[50] R. Puelz and A. Snow. Evidence on adverse selection: Equilibriumsignalling and cross-subsidization in the insurance market. Journal ofPolitical Economy, 102(2):236–257, 1994. → page 53[51] H. Ren and T. Huang. Modeling customer bounded rationality in operationsmanagement: A review and research opportunities. Computers &Operations Research, 91:48–58, 2018. → page 27[52] S. M. Ross. Introduction to stochastic dynamic programming. Academicpress, New York, 1983. → pages 13, 30[53] M. Rothschild and J. Stiglitz. Equilibrium in competitive insurance markets:An essay on the economics of imperfect information. In Uncertainty ineconomics, pages 257–280. Elsevier, 1978. → page 55[54] B. Salanie´. The Economics of Contracts: A Primer. MIT Press, 2005. →page 55103[55] S. M. Shechter, M. D. Bailey, A. J. Schaefer, and M. S. Roberts. The optimaltime to initiate hiv therapy under ordered health states. OperationsResearch, 56(1):20–33, 2008. → page 117[56] H. A. Simon. A behavioral model of rational choice. The quarterly journalof economics, 69(1):99–118, 1955. → pages 3, 27[57] M. Soleymanian, C. B. Weinberg, and T. Zhu. Threats to privacy versussaving money: A study of consumers? adoption and usage of usage-basedinsurance. Working Paper, 2019. → page 86[58] M. Soleymanian, C. B. Weinberg, and T. Zhu. Sensor data and behavioraltracking: Does usage-based auto insurance benefit drivers? MarketingScience, 38(1):21–43, 2019. → pages 6, 53, 54, 58, 60, 77[59] D. O. Stahl and P. W. Wilson. Experimental evidence on players’ models ofother players. Journal of economic behavior & organization, 25(3):309–327, 1994. → pages 3, 24, 33, 34[60] D. O. Stahl and P. W. Wilson. On players’ models of other players: Theoryand experimental evidence. Games and Economic Behavior, 10(1):218–254,1995. → page 33[61] X. Su. Bounded rationality in newsvendor models. Manufacturing &Service Operations Management, 10(4):566–589, 2008. → page 27[62] D. Toups. How many times will you crash your car?https://www.forbes.com/sites/moneybuilder/2011/07/27/how-many-times-will-you-crash-your car/7f47eb744e62, 2011. → page81[63] R. Verbelen, K. Antonio, and G. Claeskens. Unravelling the predictive powerof telematics data in car insurance pricing. Journal of the Royal StatisticalSociety: Series C (Applied Statistics), 67(5):1275–1304, 2018. → page 53[64] Verisk-Analytics-Business. Facts and statistics: Auto insurance.https://www.iii.org/fact-statistic/facts-statistics-auto-insurance, 2017. →page 81[65] World-Health-Organization. Global status report on road safety.https://www.who.int/violence injury prevention/road safety status/2018/en/,2018. → page 55104[66] U. Yechiali. On optimal balking rules and toll charges in the GI/M/1queuing process. Operations Research, 19(2):349–370, 1971. → page 10[67] U. Yechiali. Customers’ optimal joining rules for the GI/M/s queue.Management Science, 18(7):434–443, 1972. → page 10[68] K. C. Yip and K. K. Yau. On modeling claim frequency data in generalinsurance with extra zeros. Insurance: Mathematics and Economics, 36(2):153–163, 2005. → page 82[69] H. Zhang. Solving an infinite horizon adverse selection model through finitepolicy graphs. Operations research, 60(4):850–864, 2012. → pages56, 57, 71[70] H. Zhang and S. Zenios. A dynamic principal-agent model with hiddeninformation: Sequential optimality through truthful state revelation.Operations Research, 56(3):681–696, 2008. → pages 54, 56, 57, 64[71] H. Zhang, M. Nagarajan, and G. Sosˇic´. Dynamic supplier contracts underasymmetric inventory information. Operations Research, 58(5):1380–1397,2010. → page 56105Appendix AAppendix of Chapter 2A.1 Discussion on Jn of Equation (2.2)To compute Jn, we first let random variable Θ be the waiting time in the queuewhen the individual joins with n people in the system. Also, let the random vari-able ∆ represent the penalty cost when the person reaches the head of the line (δif A = 0; 0 if A = 1). Recall that A is an indicator variable that changes from 0 to1 when the prerequisite event completes. We also assume T denotes the randomamount of time that it takes to complete the prerequisite event, which is exponen-tially distributed with rate α . Since we suppose the in-queue waiting cost is 1, theexpected cost of joining the queue when n people are in the system is equal to:Jn = EΘ ,∆(Θ +∆)= EΘ [Θ ]+EΘ(EΘ ,∆ (∆ |Θ = θ)).We have EΘ [Θ ] = n/µ as there are n people ahead in the system, and we obtain:EΘ ,∆ (∆ |Θ = θ) = δ ·Pr(A = 0|Θ = θ)= δ ·Pr(T > θ)= δ∫ ∞θαe−αtdt= δe−αθ ,106where the second last equality holds since T is exponentially distributed with rateα . Thus,Jn =nµ+δE(e−αΘ). (A.1)When there are n people in the system, Θ has an Erlang distribution with parame-ters n and µ . Using the moment generating function of the Erlang distribution, wecan show:E(e−αΘ)=(1+αµ)−n. (A.2)Combining (A.1) and (A.2), we obtain:Jn =nµ+δ(1+αµ)−n,as desired.A.2 Proof of Lemma 2.1.We first prove another supporting lemma to prove Lemma 2.1.Lemma A.1 Defineh1(n) =nµ+δ(1+αµ)−n,h2(n) =[cλ +µ+λλ +µ(n+1µ+δ(1+αµ)−n−1)+µλ +µ(n−1µ+δ(1+αµ)−n+1)],where n is a real number. Then, h1(n) and h2(n) have a unique intersection ifc < 1−ρ , δ 6= 0.Proof. First, we define function g(n) as follows:g(n) = h1(n)−h2(n).107Thus,g(n) =nµ+δ(1+αµ)−n−[cλ +µ+λλ +µ(n+1µ+δ(1+αµ)−n−1)+µλ +µ(n−1µ+δ(1+αµ)−n+1)].Taking the derivative with respect to n, we have:g′(n) =1µ+δ(1+αµ)−n(− ln(1+ αµ))−[λ(λ +µ)µ+(λλ +µ)δ(1+αµ)−n−1(− ln(1+ αµ))+µ(λ +µ)µ+(µλ +µ)δ(1+αµ)−n+1(− ln(1+ αµ))]=1µ+δ(1+αµ)−n(− ln(1+ αµ))−[1µ+δ(1+αµ)−n(− ln(1+ αµ))( λλ +µ(1+αµ)−1+µλ +µ(1+αµ))]={δ(1+αµ)−nln(1+αµ)}·[λλ +µ(1+αµ)−1+µλ +µ(1+αµ)−1].The expression in the curly brackets is greater than 0, and the expression in thesquare bracket is also strictly positive sinceλλ +µ(1+αµ)−1+µλ +µ(1+αµ)=λλ +µ(µα+µ)+µλ +µ+αλ +µ=λµ+µ2+2αµ+α2λµ+µ2+αµ+αλ> 1 .108Above, the inequality follows since µ > λ . Hence, g′(n) > 0. This implies thatvalue of n¯ satisfying g(n¯) = 0 is unique if it exists.Now we address the existence. Rearranging and simplifying the terms in thedefinition of g(n), we have:g(n) = δ(1+αµ)−n[1− λλ +µ(1+αµ)−1− µλ +µ(1+αµ)]−[cλ +µ+λµ(λ +µ)− 1λ +µ].Thus,g(n) = δ(1+αµ)−n[1− λλ +µ(1+αµ)−1− µλ +µ(1+αµ)]−[cµ+λ −µµ(λ +µ)]. (A.3)As n tends to−∞, g(n) approaches−∞. To see this, note the factor (1+α/µ)−n→∞ and the expression in the first square brackets is negative becauseλλ +µ(1+αµ)−1+µλ +µ(1+αµ)> 1 ,as proved above.When n tends to +∞, the first term approaches 0, and g(n) approaches −[cµ+λ −µ]/[µ(λ +µ)], which is positive since we have assumed c < 1−ρ . Hence, theintersection of h1(n) and h2(n) exists and this completes the proof.The intersection between h1(n) and h2(n) of Lemma A.1 can be computed asfollows. We shall use n¯ to denote this intersection, satisfyingh1(n¯)−h2(n¯) = g(n¯) = 0.We will refer to n¯ in the proof of Lemma 2.1, part (c). By equation (A.3), the above109is equivalent toδ(1+αµ)−n¯[1− λλ +µ(1+αµ)−1− µλ +µ(1+αµ)]=cµ+λ −µµ(λ +µ).Then, rearranging and simplifying this expression, we have:(1+αµ)n¯=δ[1− λλ+µ(1+ αµ)−1− µλ+µ (1+ αµ )](cµ+λ−µµ(λ+µ)) (A.4)=δµ[λ +µ−λ(1+ αµ)−1−µ (1+ αµ )]cµ+λ −µ (A.5)=δµ[λ(αµ+α)−α]cµ+λ −µ . (A.6)Taking natural logarithm from both sides of the above equality,n¯ ln(1+αµ)= lnδµ[λ(αµ+α)−α]cµ+λ −µ . (A.7)Therefore,n¯ =ln(δµ[λ(αα+µ)−α]cµ+λ−µ)ln(1+ αµ) . (A.8)Proof of Lemma 2.1. Part a. Recalling the definition of f ( j) from equation(2.4), δ ≤ f (0) is equivalent toδ ≤(( 1λ)c+1µ)(α+µα),110which holds if and only ifδ(αα+µ)≤ cλ+1µ.Then, by rearranging terms,0 ≤ cλ+1µ−δ(αα+µ)=cλ+1µ+δ(µµ+α−1),which is equivalent toδ ≤ cλ+1µ+δ(1+αµ)−1.The left side of the above expression is J0 from equation (2.2), and the right handside above is W ′0 from equation (2.5), so this inequality holds if and only if J0 ≤W ′0.Hence, {J0 ≤W ′0 if δ ≤ f (0)J0 >W ′0 if δ > f (0).Part b. In this part, we suppose that both n≥ 1 and c≥ 1−ρ hold, and we want toshow Jn ≤W ′n or equivalently W ′n− Jn ≥ 0. By the definition of W ′n from equation(2.5) and Jn from equation (2.2), W ′n− Jn ≥ 0 is equivalent tocλ +µ+λλ +µ(n+1µ+δ(1+αµ)−n−1)+µλ +µ(n−1µ+δ(1+αµ)−n+1)− nµ−δ(1+αµ)−n≥ 0.By rearranging and combining terms of the above inequality, we get:cλ +µ+λ −µµ(λ +µ)+δ(1+αµ)−n(λλ +µ(1+αµ)−1+µλ +µ(1+αµ)−1)≥ 0,111which is equivalent tocµ+λ −µµ(λ +µ)+δ(1+αµ)−n(λµ+µ2+2αµ+α2(λ +µ)(α+µ)−1)≥ 0.Since we have assumed c ≥ 1−ρ , the first term cµ+λ −µµ(λ +µ)is nonnegative, andit suffices to show that the second term is nonnegative. Since µ > λ , we haveλµ+µ2+2αµ+α2≥ λµ+µ2+2αµ ≥ λµ+µ2+αµ+αλ = (λ+µ)(α+µ).Therefore,λµ+µ2+2αµ+α2(λ +µ)(α+µ)−1≥ 0,which completes the proof.Part c. In this part, we suppose that both n ≥ 1 and c < 1− ρ hold. Since c <1− ρ , by Lemma A.1, considering real values for n, the expressions of Jn andW ′n have a unique intersection denoted by n¯ of equation (A.8) (recall we assumeδ > 0 unless otherwise specified). Note that the expressions for functions h1(n)and h2(n) of Lemma A.1 are the same as Jn and W ′n given in equations (2.2) and(2.5), respectively, for n 6= 0. Since n represents the number of people in the systemand can only take integer values, let n∗ be the greatest integer which is smaller thanthe intersection n¯ of Lemma A.1. Therefore,n∗ = bn¯c=⌊ ln( δµ[λ( αα+µ )−α]cµ+λ−µ)ln(1+ αµ) ⌋.As discussed in the proof of Lemma A.1, the expression for Jn−W ′n is an increas-ing function in n. Hence, {Jn ≤W ′n for n≤ n∗Jn >W ′n for n > n∗.This completes the proof of Lemma 2.1.112A.3 Proof of Theorem 2.1We first prove supporting lemmas to prove Theorem 2.1 .Lemma A.2 f ( j) is strictly increasing and unbounded in j.Proof. Recalling the definition of f ( j) from (2.4), we have:f ( j) =(( j∑i=0µ iλ i+1)c+1µ)((α+µ) j+1αµ j). (A.9)In the first factor of the above expression, the geometric seriesj∑i=0µ iλ i+1is strictlyincreasing in j. This geometric series with nonnegative components has the com-mon ratio µ/λ > 1; therefore, as j tends to infinity, it approaches infinity. Also,the second factor in (A.9) is equal to((α+µ) j+1αµ j)=(α+µµ) j(α+µα).This factor is strictly increasing in j, and as j tends to infinity, it approaches infinityas well. Thus, f ( j) is strictly increasing and unbounded in j.Lemma A.3 Recall the definition of j∗ given in (2.7). Then,cλ +µ+λλ +µJn+1+µλ +µ(Jn+ c(1λ+µλ 2+ · · ·+ µn−1λ n))< Jn if n < j∗≥ Jn if n = j∗> Jn if n > j∗.Remark A.1 Since f ( j) is strictly increasing in j (Lemma A.2), the value of j∗ isunique for any given value of the penalty cost δ .Proof of Lemma A.3. We first consider the case where n < j∗. We want toshow:Jn >cλ +µ+λλ +µJn+1+µλ +µ(Jn+ c(1λ+µλ 2+ ...+µn−1λ n)).113By the definition of j∗ from (2.7), the inequality δ > f ( j∗− 1) holds. Since byLemma A.2, f ( j) is a strictly increasing function, it follows from n < j∗,δ > f ( j∗−1) ≥ f (n) =(( 1λ+µλ 2+ · · ·+ µnλ n+1)c+1µ)((α+µ)n+1αµn). (A.10)By rearranging terms, the above inequality holds if and only ifδ(αµn(α+µ)n+1)>( 1λ+µλ 2+ · · ·+ µnλ n+1)c+1µ,which is equivalent toδ(µn(α+µ)n− µn+1(α+µ)n+1)>(1λ+µλ 2+ · · ·+ µnλ n+1)c+1µ.By simple algebra, the above inequality is equivalent to(δ(1+αµ)−n− 1µ−δ(1+αµ)−(n+1))>(1λ+µλ 2+ · · ·+ µnλ n+1)c.Therefore, according to the definitions of Jn and Jn+1 from equation (2.2), the aboveinequality is equivalent to each of the following:Jn− Jn+1 >(1λ+µλ 2+ · · ·+ µnλ n+1)cλλ +µJn− λλ +µ Jn+1 >λλ +µc(1λ+µλ 2+ · · ·+ µnλ n+1)Jn >λλ +µJn+1+µλ +µJn+cλ +µ+µλ +µc(1λ+µλ 2+ · · ·+ µn−1λ n)Jn >cλ +µ+λλ +µJn+1+µλ +µ(Jn+ c( 1λ+µλ 2+ · · ·+ µn−1λ n)).114This completes the proof for the case n < j∗.For the case n= j∗, by the definition of j∗ given in (2.7), the inequality f ( j∗)≥δ holds. Thus, the similar approach for the proof can be applied, but all the inequal-ities mentioned above would be non-strict and in the other direction.For any n > j∗, considering the fact that f ( j) is a strictly increasing functionby Lemma A.2 and also following the definition of j∗ from (2.7), we have:f (n)> f ( j∗)≥ δ .This implies that the proof of this case is also similar to what is discussed for thecase where n < j∗, with all inequalities in the other direction.Lemma A.4 Recall the definitions of V ∗j and Wj from Section 2.2.1. Suppose thatfor states j = 0,1, ...,n, the optimal policy is to wait out of queue, or equivalently,V ∗j =Wj. Then, for any 0≤ j ≤ n, we have:V ∗j =V∗j+1+ c(1λ+µλ 2+ ...+µ jλ j+1). (A.11)Proof. This lemma states that if it is optimal to wait out of queue for any state nand all states smaller than n, then for any 0≤ j≤ n, V ∗j is the out-of-queue waitingcost until state j becomes j+1, plus V ∗j+1. Here, the transition of the states followsa truncated random walk process, with the following transition probabilities forany 0≤ j ≤ n: p j, j+1 =λλ +µif j ≥ 1p j, j−1 =µλ +µif j ≥ 1p0,1 = 1 .Note that the expected time for each step of this random walk is 1/(λ +µ), exceptthe step from 0 to 1. The expected time it takes to go from 0 to 1 is 1/λ . Hence,the summation in the second term given in equation (A.11),1λ+µλ 2+ ...+µ jλ j+1,represents the expected time it takes to hit j + 1 starting from j, which can beeasily proved by the following induction. This expected time is multiplied by the115out-of-queue waiting cost, c.Let E j, j+1 refer to the expected time that it takes to reach state j+ 1 startingfrom state j. We first check the result for j = 0 and j = 1. For j = 0, E0,1 isobviously equal to 1/λ . For j = 1,E1,2 =1λ +µ+λλ +µ·0+ µλ +µE0,2 (A.12)=1λ +µ+µλ +µ(1λ)+µλ +µE1,2, (A.13)where the second equality holds because E0,2 = 1/λ +E1,2. By rearranging theabove expression, we obtain:E1,2 =1λ+µλ 2.Now, we proceed by induction. Supposing that the result is true for j− 1, weneed to prove that it is also correct for j.E j, j+1 =1λ +µ+λλ +µ·0+ µλ +µE j−1, j+1 (A.14)=1λ +µ+µλ +µ(E j−1, j +E j, j+1)(A.15)=1λ +µ+µλ +µ(1λ+µλ 2+ ...+µ j−1λ j)+µλ +µE j, j+1.(A.16)Above, the third equality follows from applying the induction hypothesis. Hence,by rearranging terms, we have:λλ +µE j, j+1 =1λ +µ+µλ +µ(1λ+µλ 2+ ...+µ j−1λ j),which is equivalent toE j, j+1 =1λ+µλ 2+ ...+µ jλ j+1.This completes the induction step and proves the required result.116Now, we are ready to discuss the proof of Theorem 2.1.Proof of Theorem 2.1. If j∗ = 0, then by parts (a) and (b) of Lemma 2.1, theinequality Jn ≤W ′n holds for all states n, i.e., for all states, joining the queue is atleast as good as waiting for one period and then joining at the next state change.Our problem is an optimal stopping problem, in which the individual seeks theoptimal time for joining the queue (stopping). Shechter et al. [55] showed that, iffor all states stopping is at least as good as waiting for one period and stopping inthe next period, then it would be optimal to stop in all states. Hence, for j∗ = 0,the optimal policy is to join the queue for all states n.Now, we proceed by assuming j∗ ≥ 1. The proof has two main steps. First, wewill show that for any n < j∗, the optimal action is to wait out of queue; for thispart, using the supporting lemmas, we will find the optimal policy. In the secondstep, we prove the optimality of joining the queue for any n ≥ j∗. For this part,we will start with a guess for the optimal policy and then using the supportinglemmas we have discussed earlier, we will show that this policy satisfies Bellman’sequations (2.1).Step 1: n < j∗. We use induction to show that for any n < j∗, it is optimal towait out of queue.We start with state n = 0. By the definition of j∗, we know that f ( j∗−1)< δ .By Lemma A.2, f ( j) is a strictly increasing function in j, and considering the factthat j∗ ≥ 1, we have f (0)≤ f ( j∗−1)< δ . Hence, by part (a) of Lemma 2.1,J0 > W ′0.Recall from definition that W ′0 represents the expected cost of waiting out of queuein state 0, and then joining the queue in the next state change. Since, W0 is theexpected cost of waiting out of queue in state 0 and then following the optimalpolicy from the next state, the inequality W ′0 ≥W0 holds. It follows that J0 >W ′0 ≥W0, which implies it is optimal to wait out of queue for n = 0 (i.e., V ∗0 =W0).For an induction hypothesis, suppose that for any state i = 1, ...,n− 1, it isoptimal to wait out of queue, i.e., V ∗i =Wi. Now, we want to prove that waiting is117optimal for state i = n. Since 1≤ n < j∗, by Lemma A.3,Jn >cλ +µ+λλ +µJn+1+µλ +µ(Jn+ c(1λ+µλ 2+ ...+µn−1λ n)).Clearly Jn ≥V ∗n holds for any state n, and thus,Jn >cλ +µ+λλ +µV ∗n+1+µλ +µ(V ∗n + c(1λ+µλ 2+ ...+µn−1λ n)). (A.17)By the induction assumption, for i = 0, ...,n−1, it is optimal to wait out of queue.Thus, by Lemma A.4, equality (A.11) holds for any 0 ≤ i ≤ n− 1. Consideringequality (A.11) for i = n−1, we have:V ∗n−1 =V∗n + c(1λ+µλ 2+ ...+µn−1λ n).Thus, inequality (A.17) is equivalent toJn >cλ +µ+λλ +µV ∗n+1+µλ +µV ∗n−1 = Wn, (A.18)where the above equality holds by the definition of Wn from equation (2.3). There-fore, for state i = n, it is also optimal to wait out of queue and V ∗n =Wn.Step 2: n≥ j∗. Now, we want to show that the optimal action is to join for anyn ≥ j∗. To do so, we let Vn = Jn for all n ≥ j∗, and we then show that Vn satisfiesBellman’s equations (2.1). We start with n = j∗. For n = j∗, by Lemma A.3,J j∗ ≤ cλ +µ +λλ +µJ j∗+1+µλ +µ(J j∗+ c(1λ+µλ 2+ ...+µn−1λ n)).Then, letting Vn = Jn for all n≥ j∗, we have:Vj∗ ≤ cλ +µ +λλ +µVj∗+1+µλ +µ(Vj∗+ c(1λ+µλ 2+ ...+µn−1λ n)).We have already shown that, for i = 0, ..., j∗−1, it is optimal to wait out of queue.Thus, by Lemma A.4, equality (A.11) holds for any 1 ≤ i ≤ j∗− 1. Considering118equality (A.11) for i = j∗− 1, we have Vj∗−1 = Vj∗ + c(1λ+µλ 2+ ...+µn−1λ n).Hence, the above inequality is equivalent toVj∗ ≤ cλ +µ +λλ +µVj∗+1+µλ +µVj∗−1.Note that the expression in the right side of the above inequality is Wj∗ by thedefinition given in (2.3). Thus, the above inequality implies:Vj∗ ≤Wj∗ .This shows that our choice of Vj∗ , i.e., Vj∗ = J j∗ , satisfies Bellman’s equations (2.1).Now, for any n≥ j∗+1, we conduct a similar analysis. By part (b) of Lemma2.1, for n≥ j∗+1,Jn ≤ W ′n =cλ +µ+λλ +µJn+1+µλ +µJn−1, (A.19)where the equality above holds by the definition of W ′n. Then, letting Vn = Jn forall n≥ j∗, we have the following for any n≥ j∗+1:Jn =Vn ≤ cλ +µ +λλ +µVn+1+µλ +µVn−1.Since the right side of the above expression is Wn by definition, we obtain:Jn = Vn ≤ Wn.This implies, for n≥ j∗+1, Vn also satisfies Bellman’s equations (2.1).Thus, V ∗n = Jn is an optimal value function for n≥ j∗, which is achieved by thepolicy of joining for these states. This completes the proof.A.4 Proof of Proposition 2.1First, we prove that when α is relatively small (below some threshold), j∗ increasesin α . To prove this behavior, let h∗ = min{h ≥ 0 : αh ≥ µ,h ∈ Z}. Assume α is119sufficiently small (i.e., h∗ is relatively big) such that δ < f (h∗). Using the proper-ties of f ( j) and the definition of j∗ given in (2.7), we show that by increasing α ,j∗ increases in this case. To do so, recall the definition of f ( j) from (2.4):f ( j) =(( j∑i=0µ iλ i+1)c+1µ)((α+µ) j+1αµ j).For each j ≥ 1, f ( j) is strictly convex in α , and for j = 0, it is decreasing. Thiscan be shown as follows. Let k denote the first factor mentioned in the aboveexpression:k =(( j∑i=0µ iλ i+1)c+1µ).Then, the derivative of f ( j) with respect to α is equal to∂ f ( j)∂α= k(µ− j(α j−µ)(α+µ) jα2).Since k is strictly positive, f ( j) is monotonically increasing in α when α j > µ , andfor any α j < µ , it is monotonically decreasing. Thus, the definition of h∗ followsthat for any j ≥ h∗, f ( j) increases in α , and for any j < h∗, it decreases. Nowrecall the definition of j∗ given in (2.7):j∗ = min{ j ≥ 0 : f ( j)≥ δ , j ∈ Z}.When δ < f (h∗), the above definition implies that j∗ ≤ h∗. As mentioned above,by increasing α , for any j ≥ h∗, f ( j) increases and for any j < h∗, f ( j) decreases.This fact, along with the definition of j∗, follows that by increasing α , j∗ increases.Note that based on the definition of h∗, as α increases, the value of h∗ decreases.Thus, the mentioned behavior for j∗ holds until h∗ changes (i.e., α(h∗−1) hits µor equivalently h∗ decreases by one). By decreasing h∗, Lemma A.2 follows thatf (h∗) also decreases. If δ < f (h∗) still satisfies for the new value of h∗, similarreasoning applies and j∗ increases. Therefore, by increasing α , j∗ increases untilf (h∗) becomes equal or less than δ .120For f (h∗) ≤ δ , following a similar approach as discussed above, it can beshown that by increasing α , j∗ decreases. When f (h∗) ≤ δ , the definition of j∗implies j∗ ≥ h∗. Thus, considering the fact that for any j ≥ h∗, f ( j) increases in αand for any j < h∗, it decreases, the definition of j∗ follows that by increasing α , j∗decreases in this case. This behavior holds until the value of h∗ changes. If for thenew value of h∗, f (h∗)≤ δ satisfies, the same explanation applies and j∗ decreasesin α . Following the same reasoning mentioned above, by increasing α , h∗ andf (h∗) decreases, which means the inequality δ ≥ f (h∗) remains valid. Hence, byincreasing α , j∗ keeps decreasing afterwards. With high α , the prerequisite eventduration is short. This implies when α and consequently j∗ are relatively large, anyincrease in α makes the individual join the system with a shorter queue becausethe prerequisite will not take much time to complete.A.5 Proof of Theorem 2.2.Part a. For any state n < j∗, using the same approach as Theorem 2.1, we canprove it is optimal to wait. By simple algebra, it can be shown that j∗ ≤ n∗. Then,by Lemma 2.1, for any n < j∗ ≤ n∗, the relation of Jn and W ′n is the same for bothcases c≥ 1−ρ , and c < 1−ρ . Hence, for the states n < j∗, we can apply the samemethod as we used in the proof of Theorem 2.1 to show that it is optimal to waitout of queue in these states.Parts b and c. The proof has two main steps. Recall that we have shown inpart (a) that the optimal policy is to wait out of queue for any n satisfying n < j∗.We will first prove that, for any state n > n∗, it is optimal to wait out of queue aswell. Then, we will show the optimality of a three-region policy of wait-join-wait.Step 1. For any states n > n∗, by part (c) of Lemma 2.1, we have:Jn >W ′nSince W ′n ≥Wn, the above inequality implies that Jn > W ′n ≥Wn. Therefore, it isoptimal to wait out of queue for all n > n∗.Step 2. Now, for j∗ ≤ n ≤ n∗, either it is optimal to wait for all j∗ ≤ n ≤ n∗121or there exists a largest state u∗ for which it is strictly better to join (i.e., V ∗u∗ =Ju∗ < Wu∗). The former can not be optimal because in that case, the individualshould wait out of queue for all states which results in an infinite cost. Hence, letus assume the later. Now, we want to show that for any j∗ ≤ n < u∗, the optimalaction is to join. To do so, letting Vn = Jn for all j∗ ≤ n < u∗, we will show that Vnsatisfies Bellman’s equations (2.1). For n = j∗, we can apply the same method thatwe used in Theorem 2.1. For j∗ < n < u∗, by part (c) of Lemma 2.1, we have:Jn ≤W ′n.Then, by the definition of W ′n from (2.5),Jn ≤ cλ +µ +λλ +µJn+1+µλ +µJn−1.For any j∗ < n < u∗, consider Vn = Jn as a candidate for the optimal policy. LettingVn = Jn for all j∗ < n < u∗, we see that since at each period the state can go up ordown by just one unit, Vn satisfies Bellman’s equations (2.1) for all j∗ < n < u∗.Therefore, V ∗n = Jn is an optimal value function, which is achieved by the policy ofjoining for any state j∗ ≤ n < u∗. Hence, the optimality of a three-region policy ofwait-join-wait follows. The following section derives the exact value of u∗.A.5.1 Discussion on u∗ of Theorem 2.2.We first prove the following lemma to derive the expression of u∗.Lemma A.5 Consider an M/M/1 queue with arrival rate of λ , service rate of µ ,and n people in the system. The expected time until the number of people in thesystem becomes m, where m≤ n, is equal to (n−m)/(µ−λ ).Proof. Define En,m for any m ≤ n, as the expected time that it takes for thenumber of people in the system to become m when currently there are n people inthe system. Considering the properties of M/M/1 queueing systems, we have:En,n−1 = En−1,n−2 = · · ·= E2,1 = E1,0.122Therefore, we obtain:En,m = En,n−1+En−1,n−2+ · · ·+Em+2,m+1+Em+1,m = (n−m) ·E1,0.It can be shown that the expected time it takes to reach 0 from 1 is equal 1/(µ−λ ).This implies:En,m = (n−m) ·E1,0 = n−mµ−λ .Expression for u∗ of Theorem 2.2. Note that for all states n > u∗, it is optimal towait. Also, by the properties of M/M/1 queueing systems, if the number of peoplein the system, n, is greater than u∗, it will eventually reach u∗ with probability 1.Therefore, for any state n > u∗:V ∗n = c · (Expected time the system reaches state u∗ starting from n)+V ∗u∗ ,which Lemma A.5 implies:V ∗n = c ·(n−u∗µ−λ)+V ∗u∗ . (A.20)Using this result we find the value of u∗. For n = u∗, the optimal action is tojoin. Hence, V ∗u∗ = Ju∗ , and by Bellman’s equations (2.1), we have:Ju∗ ≤ Wu∗ = cλ +µ +λλ +µV ∗u∗+1+µλ +µV ∗u∗−1, (A.21)where the above equality holds by the definition of Wu∗ . We already showed thatif it is optimal to join for n = u∗, it would be also optimal to join for the state n =u∗− 1 ≥ j∗, implying V ∗u∗−1 = Ju∗−1. Considering this equality and also equation(A.20) for V ∗u∗+1, the above is equivalent toJu∗ ≤ cλ +µ +λλ +µ(cµ−λ +Vu∗)+µλ +µJu∗−1=cλ +µ+λλ +µ(cµ−λ + Ju∗)+µλ +µJu∗−1,123where the equality follows from Vu∗ = Ju∗ . By rearranging the above,(µλ +µ)(Ju∗− Ju∗−1) ≤ cµ(λ +µ)(µ−λ ) , (A.22)or, equivalentlyJu∗− Ju∗−1 ≤ cµ−λ .Then by the definition of Ju∗ , and Ju∗−1 from (2.2), we have:1µ− δαµ(1+αµ)−u∗≤ cµ−λ .Multiplying both sides of the above inequality by µ(µ−λ ), and rearranging terms,we obtain each of the following:µ−λ − cµ ≤ δα(µ−λ )(1+ αµ)−u∗ln(µ−λ − cµδα(µ−λ ))≤ −u∗ ln(1+αµ)u∗ ≤ln(δα(µ−λ )µ−λ−cµ)ln(1+ αµ) .Therefore, u∗ is the largest integer value satisfying the above inequality which isgreater than or equal to j∗. This implies:u∗ = max⌊ln(δα(µ−λ )µ−λ−cµ)ln(1+ αµ) ⌋, j∗ .A.6 Proof of Theorem 2.3.Before proving Theorem 2.3, we define a modified problem similar to Section2.2.2, in which the individual has two options: join the queue upon arrival (sameoption as before), or wait for now but join when the number of people in the system,124n, changes (as opposed to reconsidering the wait-versus-join decision at that time).The Bellman’s equations for this modified problem are given by V ′n =min{Jn,W ′n}.The expression for Jn and W ′n are similar to equations (2.2) and (2.5) respectively.The following lemma provides some results regarding the relationship between Jnand W ′n. We make a use of this lemma to prove Theorem 2.3.Lemma A.6a. If c≥ 1−ρ , then Jn ≤W ′n for all values of n≥ 0.b. If c < 1−ρ , then J0 ≤W ′0, and Jn >W ′n for any n > 0.Proof of Lemma A.6. Part a. The results presented in parts (a) and (b) ofLemma 2.1 are also true when δ = 0. Thus, the proof of this part for n = 0 is thesame as the proof of part (a) of Lemma 2.1. For n ≥ 1, the proof is similar to theone given for part (b) of Lemma 2.1.Part b. Part (a) of Lemma 2.1 implies J0 ≤W ′0. We now consider the casen > 0. When the penalty cost δ is equal to 0, Jn and W ′n are equal toJn =nµ,W ′n =cλ +µ+λλ +µ(n+1µ)+µλ +µ(n−1µ).By the above definitions of Jn and W ′n, the inequality W′n < Jn holds if and only ifcλ +µ+λλ +µ(n+1µ)+µλ +µ(n−1µ) 0, we have by part (b) of Lemma A.6:Jn >W ′n .Since W ′n is just one policy that waits in the current state, it follows that W′n ≥Wn,which is the value of waiting in the current state and then following the optimalpolicy. Thus, we obtain Jn ≥W ′n ≥Wn, implying that it is optimal to wait out ofqueue, as desired.A.7 Proof of Theorem 2.4.Suppose Jn ≥ Ln = l for all values of n. Then, we show that it is optimal to leavethe system in all states. To show this, first note that this assumption implies leavingthe system would be always better than joining the queue in each state n. Thus, wejust need to compare the two options of leaving and waiting for each state n. Thereare three possible options for the optimal policy: (i) to wait in all states, (ii) to waitin some states and to leave in some states, and (iii) to leave in all states. Option(i) cannot be optimal, since the expected cost incurred by the individual wouldbe infinity. Option (ii) cannot be optimal since a “leave” state will be reachedeventually as all states of an M/M/1 queue are positive recurrent; so, it wouldhave been better to leave the system immediately. Thus, (iii) would be optimalwhen Jn ≥ Ln = l for all values of n.Now we proceed by assuming that Jn < Ln for some values of n. If we take Jnas a function defined for real numbers given by (2.2) instead of on integers, it canbe easily shown that it is an unbounded convex function in n, which approachesinfinity as n tends to ±∞. Thus, there would be two intersections n1 and n2, with126n1 < n2, satisfying Jn1 = Jn2 = l.Then, for any integer n satisfying n < n1 or n > n2, we have Ln < Jn; leavingis less costly than joining the queue, implying that the optimal policy for state nwould be leaving the system or waiting out of the queue to make a decision atsome later time. For any n1 ≤ n≤ n2, we have Jn ≤ Ln; joining the queue is betterthan leaving the system, and thus it would be optimal to join the queue or to waitout of the queue in these states. If the waiting cost function Wn were also convex,characterizing the optimal policy would be easier. However, since Wn may not beconvex, our analysis is more difficult and we proceed by proving the followingstatements:Statement I: If there exists an integer n′2 > n2 such that it is optimal to leavethe system in state n′2, then it would be also optimal to leave for any staten > n′2.Statement II: If there exist integers n¯1 and n¯2, n1 ≤ n¯1 ≤ n¯2 ≤ n2, such that itis optimal to join the queue when in state n¯1 and n¯2, then the optimal policywould be to join the queue for any n satisfying n¯1 < n < n¯2.Statement III: If there exists an integer n′1 satisfying n′1 < n1 such that it isoptimal to leave the system in state n′1, then it would be also optimal to leavefor any state n < n′1.These three statements all together imply the optimality of a five-region policy.(Note that if either one of the intersections n1 and n2 is negative, some of thefive regions mentioned in the statement of the theorem may not exist. For ease ofexposition, we proceed by assuming both n1 and n2 are positive.)Proof of Statement I:Recall that for state n′2, it is optimal to leave the system (i.e., V∗n′2= Ln′2 = l). Now,for any n > n′2, consider Vn = Ln = l as a candidate for the optimal policy. Sincen > n′2 > n2, we have:Vn = l ≤ Jn. (A.23)127We also have:Vn = l ≤ cλ +µ + l =cλ +µ+λλ +µl+µλ +µl. (A.24)Since we let Vn = Ln = l for all n > n′2, and the state can go up or down by just 1unit for any n > n′2, the above inequality implies:Vn = l ≤ cλ +µ +λλ +µVn+1+µλ +µVn−1 = Wn, (A.25)where the equality above holds by the definition of Wn from (2.3). Now (A.23) andthe above inequality together imply that for any n > n′2, Vn = Ln satisfies Bellman’sequations (2.8). Thus, V ∗n = Ln is an optimal value function for any n > n′2, whichresults in the optimality of leaving the system at these states. (Intuitively speaking,we expect that for the states higher than n′2, the optimal policy is leaving becauseotherwise we should wait in those states to come back to n′2 or some other state andthen eventually leave the system.)Proof of Statement II:Suppose there exist two states n¯1 and n¯2 (n1≤ n¯1≤ n¯2≤ n2), for which it is optimalto join the queue (i.e., V ∗n¯1 = Jn¯1 and V∗n¯2 = Jn¯2). We want to show that for all nbetween them, it is also optimal to join. It is possible that n¯1 = n¯2, or that n¯2 =n¯1 +1, in which case there is no n strictly between n¯1 and n¯2. These cases satisfystatement II trivially, so without loss of generality assume n¯2 > n¯1 + 1. First, weprove statement II for the case that c ≥ 1−ρ . By part (b) of Lemma 2.1, for alln¯1 < n < n¯2, we have:Jn ≤ W ′n =cλ +µ+λλ +µJn+1+µλ +µJn−1, (A.26)where the above equality holds by the definition of W ′n from equation (2.5). Now,for any n¯1 < n < n¯2, we consider Vn = Jn as a candidate for the optimal policy.Since at each state change, the state can go up or down by just one unit, for anyn¯1 < n < n¯2, the above inequality implies:Vn ≤ cλ +µ +λλ +µVn+1+µλ +µVn−1 = Wn, (A.27)128where the last equality follows from the definition of Wn from (2.3). Recall thatn1 and n2 are the intersections of Jn and Ln = l. Since Jn is convex, Jn < l for anyn1 < n < n2. Therefore, as n1 ≤ n¯1 < n < n¯2 ≤ n2, for any n¯1 < n < n¯2, we have:Vn = Jn ≤ l = Ln. (A.28)From the above two inequalities, it follows that for any n¯1 < n< n¯2, Vn = Jn satisfiesBellman’s equations (2.8). Thus, V ∗n = Jn is an optimal value function for anyn¯1 < n < n¯2, which is achieved by the policy of joining the queue for these states.Now, consider the case that c < 1−ρ . In this case, by part (c) of Lemma 2.1and recalling the definition of n∗ from (2.6), for any n > n∗ we have:Jn > W ′n .By the definition of W ′n, the inequality W′n ≥Wn holds. It follows that Jn >W ′n ≥Wn,which implies waiting out of the queue is better than joining the queue for any staten> n∗. In addition, as discussed above, for any state n1 ≤ n≤ n2, joining the queueis less costly than leaving the system. Hence, if n1 ≤ n∗ < n2, then for any state nsatisfying n∗ < n≤ n2, it is optimal to wait out of the queue.Thus, if n1 ≤ n∗ < n2, the states for which joining the queue is optimal can bejust between n1, and n∗. So, n¯1 and n¯2 are between n1 and n∗ in this case. However,if n∗ ≥ n2, the states that joining is optimal for them, can be between n1 and n2 (i.e.n1 ≤ n¯1, n¯2 ≤ n2). In both of these cases, by part (c) of Lemma 2.1, for any staten¯1 < n < n¯2, we have:Jn ≤ W ′n.Hence, using the same approach that we used above for the case with c ≥ 1−ρ ,we can prove that if it is optimal to join for n¯1 and n¯2, for any state n¯1 < n < n¯2,the optimal policy would be to join the queue as well.Note that when c < 1−ρ , if n∗ < n1, joining the queue is never optimal, whichimplies it would be always optimal to leave the system in this case.Proof of Statement III:Suppose that there exists an n′1 < n1 for which it is optimal to leave the system129(i.e., V ∗n′1 = Ln′1 = l). We want to prove that for any state n < n′1, it is also optimalto leave. To do so, for any state n < n′1, we consider Vn = Ln = l as a candidate forthe optimal policy, and we will use similar reasoning to the proof of Statement I.Since n < n′1 < n1, we have:Vn = l ≤ Jn. (A.29)Following the same approach as the one used for Statement I, for any 1≤ n < n′1:Vn = l ≤ cλ +µ +λλ +µVn+1+µλ +µVn−1 = Wn,where the above equality holds by the definition of Wn from (2.3). Letting Vn =Ln = l for all n < n′1, and considering the fact that at each state change, the statecan go up or down by just one unit, for n = 0, we also have:V0 = l ≤ cλ + l =cλ+V1 = W0.Hence, the above three inequalities imply that for any n < n′1, Vn = Ln satisfiesBellman’s equations (2.8). Thus, V ∗n = Ln is an optimal value function for anyn < n′1, which is achieved by the policy of leaving the system for these states.(Intuitively, we expect that for the states smaller than n′1, optimal policy is to leavebecause if we wait in these states, we will reach n′1 with probability 1 and thenleave, which does not make sense.)This proves that Statements I, II, and III hold, which completes the proof of thetheorem.130Appendix BAppendix of Chapter 3B.1 Proof of Proposition 3.1We make a use of Lemma A.5 in proving Proposition 3.1. The expression of thislemma is as follows.Lemma A.5 Consider an M/M/1 queue with arrival rate of λ , service rate ofµ , and n people in the system. The expected time until the number of people in thesystem becomes m, where m≤ n, is equal to (n−m)/(µ−λ ).Proof of Proposition 3.1. Note that Vn˜ is the lump sum expected cost-to-goof an absorbing state n˜. As noted in Section 3.2.1, if state n˜ is reached, then it isoptimal to follow the policy of Theorem 2.3.Part a. Following part (a) of Theorem 2.3, if c≥ 1−ρ , it is optimal to join thequeue right away, implying Vn˜ = n/µ since we have assumed the cost of waitingin the queue is equal to 1 per unit time and n/µ is the expected time for n servicecompletions.Part b. When c < 1−ρ , in the absence of the prerequisite event, part (b) ofTheorem 2.3 implies that the optimal policy is to wait out of the queue until thesystem becomes empty, and then join the queue and get the service immediately.In this case, we obtain Vn˜ = c(n/(µ−λ )), based on Lemma A.5.131B.2 Proof of Theorem 3.1Before proving Theorem 3.1, we define a modified problem in which the individualhas two options: join the queue upon arrival (same option as before), or wait fornow but join when the number of people in the system, n, changes if the indicatorvariable A is still equal to 0 (as opposed to reconsidering the wait-versus-join deci-sion at that time). Similar to the original problem of Theorem 3.1, if the indicatorvariable A changes to 1 (i.e., the prerequisite event is satisfied) while the individualis waiting out of the queue, the system moves to the absorbing states n˜ and theindividual follows the policy of Theorem 2.3. The Bellman’s equations for thismodified problem are given by V ′n = min{Jn,W ′n}. The expression for Jn is similarto equation (2.2). Also, recalling equation (3.2), we have:W ′n =cα+λ+αα+λV0˜+λα+λJ1 if n = 0cα+λ +µ+αα+λ +µVn˜+λα+λ +µJn+1 if n≥ 1.+µα+λ +µJn−1(B.1)The following lemma provides some results regarding the relationship betweenJn and W ′n. We will use this lemma to prove Theorem 3.1.Lemma B.1a. If c≥ 1−ρ , then Jn−W ′n is a decreasing function in n for n≥ 1.b. If c < 1−ρ , then Jn−W ′n is always positive for n≥ 1.Proof of Lemma. B.1. Part a. Following part (a) of Proposition 3.1, whenc≥ 1−ρ , the expression of W ′n given in (B.1) can be simplified to:W ′n =cα+λ+λα+λJ1, if n = 0cα+λ +µ+αα+λ +µ(nµ)+λα+λ +µJn+1 if n≥ 1.+µα+λ +µJn−1(B.2)132Considering the above expression for n≥ 1, we have:Jn−W ′n = Jn−cα+λ +µ− αα+λ +µ(nµ)−λα+λ +µJn+1− µα+λ +µ Jn−1.Then, by the definition of Jn−1 and Jn+1 from (2.2),Jn−W ′n =nµ+δ(1+αµ)−n− cα+λ +µ− αα+λ +µ(nµ)− λα+λ +µ(n+1µ+δ(1+αµ)−n−1)− µα+λ +µ(n−1µ+δ(1+αµ)−n+1).Taking the derivative with respect to (continuous) n, we have:∂ (Jn−W ′n)∂n=−δ(1+αµ)−nln(1+αµ)(−1+ λα+λ +µ(1+αµ)−1+µα+λ +µ(1+αµ)),which is equivalent to:∂ (Jn−W ′n)∂n= −δ(1+αµ)−nln(1+αµ)(−(α+λ +µ)(α+µ)+λµ+(α+µ)2(α+λ +µ)(α+µ))= −δ(1+αµ)−nln(1+αµ)(λα(α+λ +µ)(α+µ))< 0.Part b. By part (b) of Proposition 3.1, when c < 1−ρ , the expression of W ′n133given in (B.1) is equivalent to:W ′n =cα+λ+λα+λJ1, if n = 0cα+λ +µ+αα+λ +µ(nµ−λ)c if n≥ 1.+λα+λ +µJn+1+µα+λ +µJn−1By the definition of Jn from (2.2) and W ′n given above, for n≥ 1, we have:Jn−W ′n =nµ+δ(1+αµ)−n− cα+λ +µ− αα+λ +µ(nµ−λ )c− λα+λ +µ(n+1µ+δ(1+αµ)−n−1)− µα+λ +µ(n−1µ+δ(1+αµ)−n+1).By simple algebra,Jn−W ′n =(αα+λ +µ)(nµ)1− c( µ−λµ )+ µ−λ − cµµ(α+λ +µ)+δ(1+αµ)−n( λα(α+λ +µ)(α+µ)),which is always positive for all n as c < 1−ρ = 1−λ/µ .Now, we are ready to prove Theorem 3.1.Proof of Theorem 3.1. Part a. We have c≥ 1−ρ and A is equal to 0. In thiscase, either it is optimal to wait for all n, or there exists a smallest state ω∗, forwhich it is optimal to join. Without loss of generality, we proceed by assuming thelatter. By Assumption 3.1, ω∗ ≥ 1. Now, we want to show that for all n≥ ω∗, it isalso optimal to join. By Bellman’s equations (3.1), we know that if it is optimal tojoin in state ω∗, we have:Wω∗ ≥ Jω∗ .134By equation (3.2) and part (a) of Proposition 3.1, above inequality is equivalent to:cα+λ +µ+αα+λ +µ(nµ)+λα+λ +µV ∗ω∗+1+µα+λ +µV ∗ω∗−1 ≥ Jω∗ .Because Jn ≥V ∗n for all n, it follows thatcα+λ +µ+αα+λ +µ(nµ)+λα+λ +µJω∗+1+µα+λ +µJω∗−1 ≥ Jω∗ ,which is equivalent to W ′ω∗ ≥ Jω∗ by the definition of W ′n from equation (B.2). Thus,0≥ Jω∗−W ′ω∗ .Since Jn−W ′n is a decreasing function (by part (a) of Lemma B.1), for all n > ω∗,we have 0≥ Jn−W ′n, or equivalentlyW ′n ≥ Jn.By the definition of W ′n from (B.2),cα+λ +µ+αα+λ +µ(nµ)+λα+λ +µJn+1+µα+λ +µJn−1 ≥ Jn.Now, for any n>ω∗, consider Vn = Jn as a candidate for the optimal policy. SettingVn = Jn for all n >ω∗, we see that since at each period, the state can go up or downby just one unit, Vn satisfies Bellman’s equations (3.1) for n > ω∗. Therefore,V ∗n = Jn is an optimal value function, which is achieved by the policy of joining forn > ω∗ given that the indicator variable A is still 0.Part b. We have c < 1−ρ and A is equal to 0. For n = 0, by Assumption 3.1,the individual will not join the queue if the indicator variable A is still 0 (i.e., theprerequisite condition has not yet completed). For any n≥ 1, by part (b) of LemmaB.1,Jn−W ′n ≥ 0,135or equivalently,Jn ≥W ′n.By the definition of W ′n, the inequality W′n ≥Wn holds. It follows that Jn≥W ′n ≥Wn,which implies the optimal policy is to wait out of queue for all n while the indicatorvariable A is still 0.B.3 Sufficient Condition on Model Parameters thatGuarantees Assumption 3.1 HoldsHere, we would like to provide a sufficient condition on model parameters thatguarantees Assumption 3.1, J0 ≥W0, holds. By the definition of W0, and W ′0, wehave W ′0 ≥W0. Hence, it is enough to find a sufficient condition for J0 ≥W ′0 sinceit would ensure J0 ≥W ′0 ≥W0, implying that it is optimal to wait out of queue forn = 0.From the definition of J0 and W ′0 in (2.2) and (B.1), J0 ≥W ′0 is equivalent to:δ ≥ cα+λ+λα+λJ1.By the definition of J1 from (2.2), the above can be written asδ ≥ cα+λ+λα+λ(1µ+δ(1+αµ)−1).Then, by rearranging terms, we have the sufficient conditionδ ≥ (cµ+λ )(α+µ)αµ(α+λ +µ).136B.4 Equilibrium Analysis (Optimal Joining Strategy fora Player of Level-3 or Higher)In this section, we discuss the details of the Bellman’s equations needed to find theoptimal policy of a level-3 player of Section 3.3.2. This individual chooses the bestresponse to level-2 players who follow the optimal policy derived in Section 3.3.1.First, we define some notations. Recall that m∗n(2) denotes the joining thresholdfor the number of people waiting out of queue, m, corresponding to each value ofn such that for any state (n,m) with m ≥ m∗n(2), it is optimal to join for level-2players. In the following, for ease of presentation, (2) is suppressed in the notationof m∗n(2); that is, we present it as m∗n. We also define n˜∗ as a new threshold of thequeue size such that for any state (n,m) with n≥ n˜∗, the optimal policy is to alwaysjoin the queue even if the requirements are not satisfied yet. In other words, n˜∗ isthe first value of n for which m∗n hits zero. Also, n¯∗ ≤ n˜∗ denotes the smallest n forwhich there are some m with the optimal policy of joining for (n¯∗,m), i.e., there isno joining region for any n < n¯∗1. To find the best response of a level-3 individualto level-2 players, Bellman’s equations are as follows.For n = 0, we have V0,m = min{J0,W0,m} such that the definition of W0,m is thesame as the one given for equation (3.3). For 1≤ n≤ n¯∗−1, Vn,m =min{Jn,Wn,m},where Wn,m is defined in the same way as (3.4). For n¯∗ ≤ n≤ n˜∗−1 with m < m∗n−1, Bellman’s equation are the same as equation (3.4) as well. In all the mentionedcases so far, Figure 3.2 suggests that the other arrivals wait out of queue, whichis similar to the cases that equations (3.3) and (3.4) are written for. However, forn¯∗ ≤ n ≤ n˜∗− 1 with m ≥ m∗n− 1, following the policy presented in Figure 3.2,other individuals will join the queue. Thus, by the same reasoning mentioned forequation (3.5) in Section 3.3.1, we have:Vn,m = min{(1m+1)Jn+(1− 1m+1)Vn+1,m−1,Vn+1,m−1}.Note that when the individual under consideration is waiting out of queue,others can see her, so they consider her for making their joining decisions. The1In Figure 3.2, n¯∗ and n˜∗ are equal to 13 and 20 respectively. Also, as an example of m∗n, we canrefer to m∗15 = 12.137above explanations are for the cases that m∗n > 1 when n¯∗ ≤ n ≤ n˜∗−1, though inthe absence of this condition, they can be easily modified.Now, consider n≥ n˜∗:Vn,m =min{Jn,cλ +µ+α+λλ +µ+αVn+1,0+µλ +µ+αVn−1,0 if m = 0+αλ +µ+α(cnµ)}min{(1m+1)Jn+(1− 1m+1)Vn+1,m−1,Vn+1,m−1}if m≥ 1.The mentioned Bellman’s equations are for the cases that n¯∗ ≥ 2. The equa-tions can be simplified to other cases as well, but for brevity, we only present theequations of this case here.Since the structure of the optimal policy is the same as Observation 3.1 in alliterations, we can also use the above Bellman’s equations at later iterations foranalyzing the optimal policy of a level-k player with k > 3.B.5 Social OptimalityB.5.1 Proof of Proposition 3.2Part a. Suppose that a socially optimal policy, called A, is such that a system underthis policy possibly moves a customer without the requirement into the queue whenn≥ 1. We consider an alternative policy, which we call B, such that a system underPolicy B is derived from the system under Policy A in a manner described below.Let nB be the number of customers in the queue plus in service, under Policy B.• Case nB ≥ 1: The system under Policy B never moves any customer into thequeue.• Case nB = 0: When the system under Policy A is about to serve a customer,the system under Policy B moves the same customer into the queue and be-gins to serve the customer.In other words, all customers wait out of queue under Policy B, and customersmove from outside the queue to the server one at a time. The choice of the customer138to be served is determined by Policy A. Note that Policy B satisfies the statementin part (a) of Proposition 3.2.For a fixed sample path of customer arrivals, service times and prerequisitetimings, each customer arrives at the same time whether it is managed by PolicyA or B, and is also served at the same time. Thus, when a customer reaches theserver, this customer’s prerequisite is met under Policy A if and only if it is metunder Policy B, implying that the penalty cost incurred by a customer is the sameunder the two policies. The only difference between these two systems is where acustomer waits – either entirely outside the queue or a combination of inside andoutside of the queue. Yet, since the waiting costs in the queue and out of the queueare the same, the waiting cost incurred by each customer is invariant across the twosystems. Then, both systems incur the same cost, and by the social optimality ofPolicy A, it follows that Policy B is also socially optimal. This proves (a).Part b. For proving this part, we restrict our attention to the socially optimalpolicies satisfying the statement in part (a) of Proposition 3.2. Thus, consider asocially optimal policy, called C, which satisfies the condition in part (a). In otherwords, if nC denotes the number of customers in the queue plus in service under thispolicy, the system under Policy C does not let any customer without the prerequisitejoin the queue when nC ≥ 1. Furthermore, suppose that the system sometimes letsa customer with a satisfied prerequisite stay outside the queue. We will constructanother policy, which we call D, based on Policy C, such that Policy D satisfies thestatement in part (b) of Proposition 3.2.We now describe the construction of Policy D. We need to introduce a few def-initions. For any time t, let ntC and ntD denote the number of customers in the queueplus in service, under Policy C and Policy D, respectively, before any decision ismade in time t. Let mtC and mtD be the number of customers outside the queue underPolicies C and D. Furthermore, let ν tC denote the cumulative number of customerswho have left the server under Policy C by time t, and similarly define ν tD. Forour analysis, we fix realizations of customer arrivals, i.e., each customer arrivesat both systems at the same time. We also fix that each customer’s prerequisite issatisfied at the same time across the two systems. Furthermore, we fix a particularrealization of service times based on the sequence in which the server processescustomers instead of the customer arrival sequence. In other words, for example,139the service time for the tenth customer to be served in System C is the same asthe service time for the tenth customer to be served in System D, regardless of theorder of arrival to the system. Then, we have:ν tC +ntC +mtC = νtD+ntD+mtD (B.3)for any t. The system under Policy D operates as follows in time t:• Case ntD≥ 1: If any customer’s prerequisite is satisfied, the system moves thecustomer into the queue. The system does not move any customer withoutthe prerequisite.• Case ntD = 0: If a customer’s prerequisite is satisfied, the system moves thecustomer into the queue (and starts serving right away). In addition, if thesystem under Policy C starts serving a new customer in time t and ν tC = νtDholds, then the system under D moves a new customer into the queue (andconsequently into the service). The new customer that the system under Dselects here is either the same customer that system C starts serving at time tor one of ν tC customers that system C has already served.Clearly, the system under D satisfies the statement in part (b) as well as the state-ment in part (a).It can be shown that the above construction ensures ν tC ≤ ν tD for any t. Thus,ntC +mtC ≥ ntD+mtD holds, implying that the total waiting cost incurred by the sys-tem under D does not exceed the total waiting cost under C. Now, we compare thepenalty cost. Note that system D incurs a penalty cost associated with the incom-pleted prerequisite event when system D moves such a customer into the queuewhen ntD = 0. From the choice of this customer described above, this customer insystem C has already reached the server or reaches the server at time t. Thus, anycustomer that incurs a penalty cost under Policy D also incurs the same penaltycost under Policy C. Thus, the total penalty cost associated with system D is atmost the total penalty cost incurred in system C. The total cost associated with Dis at most the total cost associated with C. By the social optimality of Policy C, weobtain that D is a socially optimal policy.140B.5.2 Comparing the Equilibrium and Socially Optimal PoliciesThe following graphs represent how much on average, an individual waits in queueand out of queue for different settings.Figure B.1: Average wait time in queue per customer for different settings(λ = 3, µ = 4, in-queue and out-of-queue waiting costs= 1, number of customersconsidered for simulation= 5000)(95% confidence interval of the reported costs is not more than ±0.02)Figure B.2: Average wait time out of queue per customer for different settings(λ = 3, µ = 4, in-queue and out-of-queue waiting costs= 1, number of customersconsidered for simulation= 5000)(95% confidence interval of the reported costs is not more than ±0.06)141Appendix CAppendix of Chapter 4C.1 Proof of Proposition 4.1The feasible continuation utility set of the agent, U , for the terminal problem ofSection 4.2.3 is a fixed point of operator B introduced by Definition 4.1. Thus,to prove that for the terminal problem, U is the entire R2 space cut by the reser-vation utilities u¯L and u¯H , assuming Ut+1 = U , we show that Ut = B(Ut+1) isalso the entire R2 space cut by the reservation utilities u¯L and u¯H from below andleft. Recall that, for the terminal problem, a utility vector ut ∈ B(Ut+1) if thereexists premium values Rt(θt) ∈ R and future utilities uatt+1(θt) ∈Ut+1;at ∈ {0,1},such that the constraints (4.13) to (4.15) of the terminal model of period t for theassociated type θt hold.To characterize Ut , first we just consider constraints (4.13) and (4.14) of theterminal problem. Afterwards, we apply the IR constraints and cut the obtained setfor the feasible continuation utility vectors of the agent by the reservation utilitiesof each type. It can be shown that the lower bound of Ut can be obtained based onthe formulation of period t of the terminal problem for the agent with type H (i.e.,high risk driver), which is discussed next. However, the upper bound of Ut can becharacterized through the model of period t for the terminal problem of the agentwith type L (i.e., low risk driver).According to the terminal model of Section 4.2.3, the constraints for the H typeproblem of period t are as follows.142ut(H) =−Rt(H)+δ∑atPatH[pH(0) ·uatt+1(H)](C.1)ut(L)≥−Rt(H)+δ∑atPatL[pL(0) ·uatt+1(H)]. (C.2)Equations C.1 and C.2 imply:ut(L)−ut(H)≥ δ∑at[(PatL · pLL(0)−PatH · pHL(0))uatt+1(H,L)+(PatL · pLH(0)−PatH · pHH(0))uatt+1(H,H)].Considering the fact that a high risk driver cannot improve his type without puttingforth any effort (i.e., pHL(0)= 0 and consequently pHH(0)= 1), the above is equiv-alent to:ut(L)−ut(H)≥ δ∑at[PatL · pLL(0) ·uatt+1(H,L)+(PatL · pLH(0)−PatH)uatt+1(H,H)].(C.3)To compute the least possible lower bound of Ut , it is sufficient to find theminimum value of y such that ut(L)−ut(H)≥ y. Thus, according to the definitionof operator B, considering Eq. (C.3) and assuming y= ut(L)−ut(H), the minimumvalue of y can be obtained by solving the following optimization problem.min{y,uatt+1(H,L),uatt+1(H,H);at∈{0,1}}ys.t. y≥ δ∑at[PatL · pLL(0) ·uatt+1(H,L)+(PatL · pLH(0)−PatH)uatt+1(H,H)],(C.4)uatt+1(H) ∈Ut+1 ∀at ∈ {0,1}.We solve the above problem geometrically. To do so, we find the minimum pos-sible value for the RHS of constraint (C.4) such that, for all at ∈ {0,1}, uatt+1(H) ∈Ut+1. The decreasing direction of the RHS of constraint (C.4) with respect touatt+1(H,H) and uatt+1(H,L) for each at is143(PatH −PatL pLH(0)−PatL pLL(0)).For at = 1 (i.e., having an accident), which is an informative outcome for a highrisk driver, we have PatH > PatL as the probability of having an accident for a hightype agent is larger than this probability for a low risk one. This implies the abovedecreasing direction for at = 1 is in quadrant (IV) of Figure C.1 that depicts Ut+1.Increasing u(1)t+1(H,H) and u(1)t+1(H,L) in the mentioned direction in the feasibleregion of Ut+1 (shown as direction (1) in Figure C.1), along with choosing propervalues for the utilities of the other at can push the RHS of constraint (C.4) to −∞.This results in the unboundedness of Ut from below.Figure C.1: Decreasing and increasing directions for uatt+1(.,H) and uatt+1(.,L) ofthe terminal problemNow, we discuss the upper bound ofUt through the L type model of period t forthe terminal problem. Based on the terminal model of Section 4.2.3, the constraintsof the L type problem are as follows.144ut(L) =−Rt(L)+δ∑atPatL[pL(0) ·uatt+1(L)](C.5)ut(H)≥−Rt(L)+δ∑atPatH[pH(0) ·uatt+1(L)]. (C.6)By Eqs. C.5 and C.6, we have:ut(H)−ut(L)≥ δ∑at[(PatH · pHL(0)−PatL · pLL(0))uatt+1(L,L)+(PatH · pHH(0)−PatL · pLH(0))uatt+1(L,H)].Following the same approach as the one used to find the upper bound of Ut , andconsidering the point that a high risk driver cannot improve his type without puttingin any effort (i.e., pHL(0) = 0 and consequently pHH(0) = 1), the above is equiva-lent to:ut(L)−ut(H)≤ δ∑at[PatL · pLL(0) ·uatt+1(L,L)+(PatL · pLH(0)−PatH)uatt+1(L,H)].(C.7)To obtain the largest possible upper bound of UT , we can find the maximumvalue of y′ such that uT (L) ≤ uT (H) + y′. Thus, considering Eq. (C.7) and as-suming y′ = uT (L)−uT (H), the maximum value of y′ can be found by solving thefollowing optimization problem:max{y′,uatt+1(L,L),uatt+1(L,H);at∈{0,1}}y′s.t. y′ ≤ δ∑at[PatL · pLL(0) ·uatt+1(L,L)+(PatL · pLH(0)−PatH)uatt+1(L,H)], (C.8)uatt+1(L) ∈Ut+1 ∀at ∈ {0,1}.To solve the above problem geometrically, we compute the maximum possiblevalue for the RHS of constraint (C.8) such that for all at ∈ {0,1}, uatt+1(H) ∈Ut+1.The increasing direction of the RHS of this constraint with respect to uatt+1(L,H)and uatt+1(L,L) for each at is equal to:145(PatL pLH(0)−PatHPatL pLL(0)).Since PatL pLL(0) is positive for all at ∈ {0,1}, the above increasing directionis either in quadrant (I) or (II) of Figure (C.1). Thus, increasing uatt+1(L,H) anduatt+1(L,L) for any at in this direction (shown as direction (2) in Figure C.1) in thefeasible region of Ut+1, along with choosing proper values for the utilities of theother at can make the RHS of constraint (C.8) unbounded, which implies thatUt+1is unbounded from above as well.The above analysis show that, in the absence of the reservation utilities, Ut isthe entire R2 space. However, since based on IR constraints (4.15), the continuationutilities delivered to the agent should be at least equal to his reservation utilities,Ut is the entire R2 space cut by the reservation utilities u¯L and u¯H from below andleft, which implies Ut = B(Ut+1). Therefore, the proof is completed, and for theterminal problem, U is the entire R2 space cut by the reservation utilities u¯L andu¯H , as this set is a fixed point of operator B.C.2 Proof of Proposition 4.2The continuation social welfare frontier of the terminal problem of Section 4.2.3,Φ∗(u);u ∈ U , is a functional fixed point of operator Γ ∗ introduced by Definition4.2. It can be shown that Φ∗(·) = (CL,CH)T with the following constant values forCL and CH is a fixed point of operator Γ ∗.CL = − 1(1−δ · pLL(0))PaL ·CaL− δ · pLH(0)(1−δ)(1−δ · pLL(0))PaH ·CaH (C.9)CH = − 1(1−δ )PaH ·CaH . (C.10)To prove this, we show that (CL,CH)T = Γ ∗(CL,CH)T . Based on the definition ofoperator Γ ∗ and according to the objective function of the terminal problem in Eq.146(4.12), we have:Γ ∗(CL,CH)T = (Φ∗(L, ·),Φ∗(H, ·))T .Objective function (4.12) of the terminal problem of an agent with type L implies:Φ∗(L, ·) =∑atPatL[−CatL +δ(pLL(0) ·CL+ pLH(0) ·CH)].Considering P1L = PaL and consequently P0L = 1−PaL , the above is equivalent to:Φ∗(L, ·) =−Pa ·CaL+δ · pLL(0) ·CL+δ · pLH(0) ·CH . (C.11)Substituting the values of CL and CH from Eqs. (C.9) and (C.10) into Eq. (C.11),we get:Φ∗(L, ·) =−Pa ·CaL+δ · pLL(0)(− 1(1−δ · pLL(0))PaL ·CaL−δ · pLH(0)(1−δ)(1−δ · pLL(0))PaH ·CaH)+δ · pLH(0)(− 1(1−δ )PaH ·CaH),which is equivalent to:Φ∗(L, ·) =−(1−δ · pLL(0))−δ · pLL(0)(1−δ · pLL(0)) PaL ·CaL+−δ 2 · pLH(0) · pLL(0)−δ · pLH(0)(1−δ · pLL(0))(1−δ)(1−δ · pLL(0)) PaH ·CaH=−1(1−δ · pLL(0))PaL ·CaL+ −δ · pLH(0)(1−δ)(1−δ · pLL(0))PaH ·CaH= CL. (C.12)Similarly, following the objective function (4.12) of the terminal problem of anagent with type H, we obtain:147Φ∗(H, ·) =∑atPatH[−CatH +δ(pHL(0) ·CL+ pHH(0) ·CH)].Considering the fact that a high risk driver cannot improve his type without puttingany effort (i.e., pHL(0) = 0 and consequently pHH(0) = 1), the above can be sim-plified to:Φ∗(H, ·) = −Pa ·CaH +δ ·CH= −Pa ·CaH +δ(− 1(1−δ )PaH ·CaH)=−(1−δ )−δ(1−δ ) PaH ·CaH= CH , (C.13)where the second and last equalities follow from Eq. (C.10). Equations (C.12) and(C.13) imply Γ ∗(CL,CH)T = (CL,CH)T , and thus the proof is completed.C.3 Proof of Proposition 4.3We analyze each subproblem separately. Similar to the proof of Proposition 4.1,it can be shown that the lower bound of UT for each of the subproblems can beobtained based on the formulation of period T presented in Section 4.2.2 for theH type agent (i.e., high risk driver), which is discussed more below. However, theupper bound ofUT can be characterized through the problem of period T for the Ltype agent (i.e., low risk driver). We start our analysis with subproblem (0,0).• Subproblem (0,0). For this subproblem, eT (L) = 0 and eT (H) = 0. Thus,based on the recursive formulation of Section 4.2.2, the constraints for the H148type problem of period T are as follows. Note that CeH(0) =CeL(0) = 0.uT (H) =−RT (H)+δ∑OTqOTH[pH(0) ·uOTT+1(H)](C.14)δ∑OTqOTH[pH(0) ·uOTT+1(H)]≥−CeH(1)+δ∑OTqOTH[pH(1) ·uOTT+1(H)](C.15)uT (L)≥−RT (H)+δ∑OTqOTL[pL(0) ·uOTT+1(H)](C.16)uT (L)≥−RT (H)+(−CeL(1)+δ∑OTqOTL[pL(1) ·uOTT+1(H)])(C.17)Considering the fact that a high risk driver cannot improve his type with-out putting any effort (i.e., pHL(0) = 0 and consequently pHH(0) = 1), Eqs.(C.14) and (C.16) imply:uT (L)−uT (H)≥ δ∑OT[qOTL pLL(0) ·uOTT+1(H,L)+(qOTL pLH(0)−qOTH ) ·uOTT+1(H,H)]. (C.18)Similarly, by simple algebra on Eqs. (C.14) and (C.17), and consideringthe fact that by exerting effort, a low risk driver remains a good driver (i.e.,pLL(1) = 1 and consequently pLH(1) = 0), we have:uT (L)−uT (H)≥−CeL(1)+δ∑OT[qOTL ·uOTT+1(H,L)−qOTH ·uOTT+1(H,H)].(C.19)Also, constraint (C.15) can be written as:CeH(1)≥ δ∑OT[qOTH pHL(1) ·uOTT+1(H,L)−qOTH pHL(1) ·uOTT+1(H,H)].(C.20)To compute the least possible lower bound of UT , it is sufficient to find theminimum value of y such that uT (L) ≥ uT (H)+ y. Thus, considering Eqs.149(C.18) through (C.20) and assuming y= uT (L)−uT (H), the minimum valueof y can be obtained by solving the following problem:min{y,uOTT+1(H,L),uOTT+1(H,H);OT∈O}ys.t. (C.18) to (C.20),uOTT+1(H) ∈UT+1 ∀OT ∈ OWe solve the above optimization problem geometrically. To do so, we findthe minimum possible values for the RHS of constraints (C.18) and (C.19)such that for all OT ∈ O , uOTT+1(H) ∈UT+1, and also constraint (C.20) satis-fies. The maximum of these two right hand sides gives us the optimal valueof y. For the RHS of constraint (C.18), the decreasing direction with respectto uOTT+1(H,H) and uOTT+1(H,L) for each OT is(qOTH −qOTL pLH(0)−qOTL pLL(0)).Since for OT = (1,L) (i.e., having an accident and low score), which is themost informative outcome for a high risk driver, we haveqOTHqOTL> 1 basedon the analysis of Section 4.2.2, the decreasing direction is in quadrant(IV). This is shown as direction (1) in Figure C.2. Increasing u(1,L)T+1 (H,H)and u(1,L)T+1 (H,L) in this direction in the feasible region, along with choosingproper values for the utilities of other OT s based on constraint (C.20), canmake the RHS of constraint (C.18) unbounded. For the RHS of constraint(C.19), the decreasing direction with respect to uOTT+1(H,H) and uOTT+1(H,L)for each OT is (qOTH−qOTL).Similar to direction (1) of Figure C.2, the above direction also is in quadrant(IV). Thus, following the same reasoning as the one discussed for constraint(C.18), this direction can push the RHS of constraint (C.19) to −∞ as well.150This results in the unboundedness of UT from below for subproblem (0,0).Figure C.2: Decreasing and increasing directions for uOTT+1(.,H) and uOTT+1(.,L) ofsubproblem (0,0)Now, we discuss the upper bound of UT for this subproblem through the Ltype model of period T . Based on the recursive formulation of Section 4.2.2,the constraints of this problem are as follows.uT (L) =−RT (L)+δ∑OTqOTL[pL(0) ·uOTT+1(L)](C.21)δ∑OTqOTL[pL(0) ·uOTT+1(L)]≥−CeL(1)+δ∑OTqOTL[pL(1) ·uOTT+1(L)](C.22)uT (H)≥−RT (L)+δ∑OTqOTH[pH(0) ·uOTT+1(L)](C.23)uT (H)≥−RT (L)+(−CeH(1)+δ∑OTqOTH[pH(1) ·uOTT+1(L)])(C.24)Following a similar approach to the one used for finding the lower bound, by151simple algebra on Eqs. (C.21) and (C.23), we have:uT (L)−uT (H)≤ δ∑OT[qOTL pLL(0) ·uOTT+1(L,L)+(qOTL pLH(0)−qOTH) ·uOTT+1(L,H)]. (C.25)Similarly, constraints C.21 and C.24 imply:uT (L)−uT (H)≤CeH(1)+δ∑OT[(qOTL pLL(0)−qOTH pHL(1)) ·uOTT+1(L,L)+(qOTL pLH(0)−qOTH pHH(1)) ·uOTT+1(L,H)]. (C.26)Constraint (C.22) also can be written as:CeL(1)≥ δ∑OT[qOTL pLH(0) ·uOTT+1(L,L)−qOTL pLH(0) ·uOTT+1(L,H)]. (C.27)To obtain the largest possible upper bound ofUT , we can find the maximumvalue of y′ such that uT (L) ≤ uT (H) + y′. Thus, considering Eqs. (C.25)through (C.27) and assuming y′ = uT (L)−uT (H), the maximum value of y′can be found by solving the following optimization problem:max{y′,uOTT+1(L,L),uOTT+1(L,H);OT∈O}y′s.t. (C.25) to (C.27),uOTT+1(L) ∈UT+1 ∀OT ∈ OTo solve the above problem geometrically, we compute the maximum pos-sible values for the RHS of constraints (C.25) and (C.26) such that for allOT ∈ O , uOTT+1(L) ∈ UT+1, and also constraint (C.27) satisfies. Thus, theoptimal value of y′ is actually the minimum of these two right hand sides.For the RHS of constraint (C.25), the increasing direction with respect touOTT+1(L,H) and uOTT+1(L,L) for each OT is equal to:(qOTL pLH(0)−qOTHqOTL pLL(0)).152Since qOTL pLL(0) is positive, for all OT ∈O , the above increasing direction iseither in quadrant (I) or (II) of Figure (C.2). Thus, increasing uOTT+1(L,H) anduOTT+1(L,L) for any OT in this direction (shown as direction (2) on Figure C.2)in the feasible region, along with choosing proper values for the utilities ofother OT s based on constraint (C.27) can make the RHS of constraint (C.25)unbounded.For the RHS of constraint (C.26), the increasing direction with respect touOTT+1(L,H) and uOTT+1(L,L) for each OT is(qOTL pLH(0)−qOTH pHH(1)qOTL pLL(0)−qOTH pHL(1)).For OT = (0,H) (i.e., not having an accident and getting a high UBI score),which is the most informative outcome for a low risk driver, according to theanalysis of Section 4.2.2 we haveqOTHqOTL< 1. This is equivalent to q(0,H)H (pHL(1)+ pHH(1))< q(0,H)L (pLL(0)+ pLH(0)), which results in q(0,H)H pHH(1)−q(0,H)LpLH(0) ≤ q(0,H)L pLL(0)− q(0,H)H pHL(1). This inequality implies the increas-ing direction for OT = (0,H), represented as directions (3) colored in bluein Figure C.2, is above the line uT+1(L) =−uT+1(H). Therefore, followingthe same reasoning as the one discussed for constraint (C.25), direction (3)for OT = (0,H) can push the RHS of this constraint to +∞. This results inthe unboundedness of the upper limit of UT for subproblem (0,0).The above analysis shows that, in the absence of reservation utilities, UTis the entire R2 space. However, as based on the IR constraint (4.11), thecontinuation utilities delivered to the agent should be at least equal to hisreservation utilities, UT is the entire R2 space cut by the reservation utilitiesu¯LT and u¯HT from below and left.• Subproblem (1,1). For this subproblem, eT (L) = 1 and eT (H) = 1. Similarto subproblem (0,0), to compute the lower bound ofUT , the H type problemof period T of Section 4.2.2 can be used. By the same algebra as the oneapplied for finding the lower bound of UT for subproblem (0,0), constraints153for the H type problem of the last period imply:uT (L)−uT (H)≥CeH(1)+δ∑OT[(qOTL pLL(0)−qOTH pHL(1)) ·uOTT+1(H,L)+(qOTL pLH(0)−qOTH pHH(1)) ·uOTT+1(H,H)], (C.28)uT (L)−uT (H)≥(CeH(1)−CeL(1))+δ∑OT[(qOTL −qOTH pHL(1))·uOTT+1(H,L)− (qOTH pHH(1)) ·uOTT+1(H,H)], (C.29)CeH(1)≤ δ∑OT[qOTH pHL(1) ·uOTT+1(H,L)−qOTH pHL(1) ·uOTT+1(H,H)].(C.30)Thus, considering y = uT (L)−uT (H), the least possible lower bound of UTfor subproblem (1,1) can be obtained by computing the optimal value of yin the following optimization problem:min{y,uOTT+1(H,L),uOTT+1(H,H);OT∈O}ys.t. (C.28) to (C.30),uOTT+1(H) ∈UT+1 ∀OT ∈ OSimilar to what was discussed above, this problem can be solved geometri-cally. To do so, first we compute the minimum value of the RHS of constraint(C.28). The decreasing direction of the RHS of this constraint with respectto uOTT+1(H,H) and uOTT+1(H,L) for each OT is(qOTH pHH(1)−qOTL pLH(0)qOTH pHL(1)−qOTL pLL(0)).For OT = (1,L) (i.e., having an accident and getting a low UBI score), whichis the most informative outcome for a high risk driver, according to the anal-154ysis of Section 4.2.2, we haveqOTHqOTL> 1. This is equivalent to q(1,L)H (pHL(1)+pHH(1))> q(1,L)L (pLL(0)+ pLH(0)), which results in q(1,L)H pHL(1)−q(1,L)L pLL(0)> q(1,L)L pLH(0)−q(1,L)H pHH(1). This inequality implies the decreasing di-rection for OT = (1,L) is above the line uT+1(L) = −uT+1(H). Directions(1) on Figure C.3 depict different possible cases for these decreasing direc-tions, which all can make the RHS of constraint (C.28) unbounded.Figure C.3: Decreasing and increasing directions for uOTT+1(.,H) and uOTT+1(.,L) ofsubproblem (1,1)Regarding constraint (C.29), the decreasing direction of the RHS of this con-straint with respect to uOTT+1(H,H) and uOTT+1(H,L) is as follows:(qOTH pHH(1)qOTH pHL(1)−qOTL).For all OT ∈ O , qOTL pLL(0) is positive, so the above decreasing direction,shown as direction (2) on Figure C.3 is either in quadrant (I) or (IV) of thisfigure. Thus, deceasing uOTT+1(L,H) and uOTT+1(L,L) for any OT in this direc-tion in the feasible region, along with choosing proper values for the utili-ties of other OT s based on constraint (C.30) can push the RHS of constraint155(C.29) to −∞.Thus, considering the above analysis on constraints (C.28) and (C.29), theoptimal value of y would be unbounded as the maximum of the RHS ofconstraints (C.28) and (C.29) is −∞, which results in the unboundedness ofUT from below for subproblem (1,1).Now, let us discuss the upper bound of UT for this subproblem. Followingthe similar analyses to those used for finding the upper bound of UT forsubproblem (0,0), the constraints of the recursive model of Section 4.2.2 forthe L type agent at period T imply:uT (L)−uT (H)≤−CeL(1)+δ∑OT[qOTL ·uOTT+1(L,L)−qOTH ·uOTT+1(L,H)],(C.31)uT (L)−uT (H)≤(CeH(1)−CeL(1))+δ∑OT[(qOTL −qOTH pHL(1)) ·uOTT+1(L,L)− (qOTH pHH(1)) ·uOTT+1(L,H)], (C.32)CeL(1)≤ δ∑OT[qOTL pLH(0) ·uOTT+1(L,L)−qOTL pLH(0) ·uOTT+1(L,H)]. (C.33)Considering y′ = uT (L)− uT (H), the upper bound of UT for subproblem(1,1) can be obtained by computing the optimal value of y′ in the followingoptimization problem:max{y′,uOTT+1(L,L),uOTT+1(L,H);OT∈O}y′s.t. (C.31) to (C.33),uOTT+1(L) ∈UT+1 ∀OT ∈ OTo solve the above problem geometrically, as mentioned above, we computethe maximum of the right hand sides of constraints (C.31) and (C.32) andthen take the minimum of these two values. For constraint (C.31), the in-156creasing direction with respect to uOTT+1(L,H) and uOTT+1(L,L) for each OT is(−qOTHqOTL).The above direction, denoted as direction (3) of Figure C.3 is in quadrant(II) of this figure. Increasing u(0,H)T+1 (L,H) and u(0,H)T+1 (L,L) in this directionin the feasible region, along with choosing proper values for the utilities ofother OT s based on constraint (C.33) can make the RHS of constraint (C.31)unbounded.Regarding constraint (C.32), the increasing direction is as follows:(−qOTH pHH(1)qOTL −qOTH pHL(1)).For OT = (0,H) (i.e., having no accident and getting a high score), which isthe most informative outcome for a low risk driver, according to the analy-sis of Section 4.2.2, we haveqOTHqOTL< 1 implying that q(0,H)H pHL(1) < q(0,H)L .Thus, for outcome OT = (0,H), the increasing direction is in quadrant (II) ofFigure C.3 (similar to direction (3) of Figure C.3). Increasing u(0,H)T+1 (H,H)and u(0,H)T+1 (H,L) in this direction in the feasible region, along with choosingproper values for the utilities of other OT s based on constraint (C.33) canpush the RHS of constraint (C.32) to ∞ as well. This results in the unbound-edness ofUT from above for subproblem (1,1) in the absence of reservationutilities. However, similar to subproblem (0,0), as based the on IR constraint(4.11), the continuation utilities delivered to the agent should be at least equalto his reservation utilities, UT is the entire R2 space cut by the reservationutilities u¯LT and u¯HT from below and left.• Subproblem (0,1). For this subproblem, eT (L) = 0 and eT (H) = 1. Sincefor computing the lower bound of UT , we use the H type problem of periodT of Section 4.2.2, the lower bound for this subproblem is the same as thatof subproblem (1,1), in which the high risk driver also chooses to take the157effort action (i.e., eT (H) = 1). Thus, considering the analysis of subproblem(1,1), without taking into account the reservation utilities,UT of subproblem(0,1) is also unbounded from below. For computing the upper bound ofUT , we use the L type problem of period T , which results in similarity ofits upper bound to that of subproblem (0,0) as in both of them eT (L) =0. Thus, following the results of subproblem (0,0), in the absence of thereservation utilities,UT of subproblem (0,1) is also unbounded from above.That implies UT is the entire R2 space in this case. Therefore, consideringthe IR constraints and reservation utilities, UT of this subproblem is also thewhole R2 space cut by the reservation utilities u¯LT and u¯HT from below andleft.• Subproblem (1,0). For this subproblem, eT (L) = 1 and eT (H) = 0. Bysimilar reasoning discussed for subproblem (0,1), the lower bound ofUT forsubproblem (1,0) is the same as that of subproblem (0,0) as in both eT (H)=0. The upper bound of UT for subproblem (1,0) also would be identical tosubproblem (1,1)’s upper bound because eT (L) = 1 in both of them. Thus,considering the results for subproblems (0,0) and (1,1) presented above, theUT of subproblem (1,0) is the entire R2 space cut by the reservation utilitiesu¯LT and u¯HT as well.C.4 Proof of Proposition 4.4Recall that, for any t, Ctθ represents the optimal continuation social welfare fromstate θt . That isΦ∗t (ut) = (CtL,CtH)T ;ut ∈Ut . Following Eq. (4.7), the continuationsocial welfare from state θt when the agent takes the effort level of 1 (i.e., et(θt) =1) is equal to:Φt(θt ;ut) =−Ceθt (1)+∑OtqOtθt[−COtθt +δpθt (1) ·Φ∗t+1(uOtt+1(θt))],158which is equivalent to:Φt(θt ;ut) =−Ceθt (1)−∑OtqOtθt ·COtθt +δ∑OtqOtθt[pθt L(1) ·Ct+1L + pθt H(1) ·Ct+1H],and consequently,Φt(θt ;ut) =−Ceθt (1)−∑OtqOtθt ·COtθt +δ[pθt L(1) ·Ct+1L + pθt H(1) ·Ct+1H]. (C.34)Similarly, considering Ceθt (0) = 0, the continuation social welfare from state θtwhen the agent does not exert any effort is:Φt(θt ;ut) =−∑OtqOtθt ·COtθt +δ[pθt L(0) ·Ct+1L + pθt H(0) ·Ct+1H]. (C.35)Thus, since according to Proposition 4.3 and Corollary 4.1, the feasible continua-tion utility set of the agent, Ut for all subproblems defined for each realization ofeffort pairs are the same, it would be optimal to encourage the agent to take effortif and only if (C.34)≥ (C.35), which implies:−Ceθt (1)−∑OtqOtθt ·COtθt +δ[pθt L(1) ·Ct+1L + pθt H(1) ·Ct+1H]≥−∑OtqOtθt ·COtθt +δ[pθt L(0) ·Ct+1L + pθt H(0) ·Ct+1H].By rearranging terms, we obtain:((pθt L(1)− pθt L(0))·Ct+1L +(pθt H(1)− pθt H(0))·Ct+1H ≥Ceθt (1)δ.159C.5 Proof of Theorem 4.1The necessary and sufficient condition presented in Proposition 4.4 to induce anagent with type L to exert effort at period t is as follows.(pLL(1)− pLL(0))·Ct+1L +(pLH(1)− pLH(0))·Ct+1H ≥CeL(1)δ.Following Assumption 4.1, the above can be simplified to:(1− pLL(0))·Ct+1L − pLH(0) ·Ct+1H ≥CeL(1)δ,which is equivalent to:pLH(0)[Ct+1L −Ct+1H]≥ CeL(1)δ. (C.36)Similarly, it can be shown that, for the agent with type H also, the condition ofProposition 4.4 can be written aspHL(1)[Ct+1L −Ct+1H]≥ CeH(1)δ. (C.37)By Theorem 4.2, under Assumption 4.2, CtL−CtH is nondecreasing in t. Thus, Eqs.(C.36) and (C.37) along with Theorem 4.2 imply that, for both types of agents, if itis optimal to exert effort at any period t (i.e, Eqs. (C.36) or (C.37) satisfy), it wouldbe optimal to exert effort in later periods (i.e., larger t) as well.C.6 Proof of Theorem 4.2The proof can be done by induction. First, we present the expression of CtL−CtH .According to the objective function (4.7) of the recursive formulation, for any 1≤160t ≤ T ,CtL−CtH =CeH(e∗t (H))−CeL(e∗t (L))+∑OtqOtH ·COtH −∑OtqOtL ·COtL +δ[(pLL(e∗t (L))− pHL(e∗t (H)))·Ct+1L +(pLH(e∗t (L))− pHH(e∗t (H)))·Ct+1H],which is equivalent to the following. Since∑OtqOtH ·COtH −∑OtqOtL ·COtL = PaH ·CaH −PaL ·CaL is constant for all periods t, henceforth it is denoted by γ .CtL−CtH =CeH(e∗t (H))−CeL(e∗t (L))+ γ+δ[((1− pLH(e∗t (L)))−(1− pHH(e∗t (H))))·Ct+1L +(pLH(e∗t (L))− pHH(e∗t (H)))·Ct+1H].The above equation implies:CtL−CtH =CeH(e∗t (H))−CeL(e∗t (L))+ γ+δ(pHH(e∗t (H))− pLH(e∗t (L)))[Ct+1L −Ct+1H]. (C.38)Considering (C.9) and (C.10) discussed in the proof of Proposition 4.2, for t =T +1, by Eq. (4.12) of the terminal problem of Section 4.2.3, we obtain:CT+1L −CT+1H = −1(1−δ · pLL(0))PaL ·CaL+[11−δ −δ · pLH(0)(1−δ)(1−δ · pLL(0))]PaH ·CaH= − 1(1−δ · pLL(0))PaL ·CaL+[1−δ(pLL(0)+ pLH(0))(1−δ)(1−δ · pLL(0))]PaH ·CaH=1(1−δ · pLL(0))[PaH ·CaH −PaL ·CaL]. (C.39)Depending on the optimal effort levels of each type L and H during period tand considering Assumption 4.1, when the effort cost of both types are the same161(i.e., CeL(1) =CeH(1)), Eq. (C.38) can be further simplified as follows.• e∗t (L) = 0 and e∗t (H) = 0:CtL−CtH = γ+δ(1− pLH(0))[Ct+1L −Ct+1H]= γ+δ · pLL(0) ·[Ct+1L −Ct+1H]. (C.40)• e∗t (L) = 0 and e∗t (H) = 1:CtL−CtH =CeH(1)+ γ+δ(pHH(1)− pLH(0))[Ct+1L −Ct+1H]. (C.41)• e∗t (L) = 1 and e∗t (H) = 0:CtL−CtH =−CeL(1)+ γ+δ[Ct+1L −Ct+1H]. (C.42)• e∗t (L) = 1 and e∗t (H) = 1:CtL−CtH = CeH(1)−CeL(1)+ γ+δ · pHH(1) ·[Ct+1L −Ct+1H]= γ+δ · pHH(1) ·[Ct+1L −Ct+1H]. (C.43)Note that when CeL(1) =CeH(1), following the necessary and sufficient conditionsfor exerting effort presented by Eqs. (C.36) and (C.37), if pHL(1) ≥ pLH(0) (part(b) of Assumption 4.2), the third case mentioned above cannot happen. This im-plies that at any period, if it is optimal for a L type agent to exert effort, for the Htype agent, it is optimal to exert effort as well.Now, we prove that CTL −CTH ≤ CT+1L −CT+1H . Then, we continue with theinduction assumption of CiL−CiH ≤Ci+1L −Ci+1H and show that Ci−1L −Ci−1H ≤CiL−CiH .We prove CTL −CTH ≤ CT+1L −CT+1H separately for each of the three possiblecases of optimal effort pairs for the L and H type agents in the last period T ,(e∗T (L),e∗T (H)).• e∗T (L) = 0 and e∗T (H) = 0: By Eq. (C.40) and substituting back the value of162γ to this equation, we have:CTL −CTH = PaH ·CaH −PaL ·CaL+δ · pLL(0) ·[CT+1L −CT+1H]=(1−δ · pLL(0))[CT+1L −CT+1H]+δ · pLL(0) ·[CT+1L −CT+1H]= CT+1L −CT+1H ,where the second equality above follows from Eq. (C.39).• e∗T (L) = 0 and e∗T (H) = 1: Similar to the previous case, Eqs. (C.39) and(C.41) imply:CTL −CTH = CeH(1)+(1−δ · pLL(0))[CT+1L −CT+1H]+ δ(pHH(1)− pLH(0))[CT+1L −CT+1H]= CeH(1)+(1−δ(pLL(0)+ pLH(0)− pHH(1)))[CT+1L −CT+1H]= CeH(1)+(1−δ · pHL(1))[CT+1L −CT+1H]≤ δ · pHL(1) ·[CT+1L −CT+1H]+(1−δ · pHL(1))[CT+1L −CT+1H]= CT+1L −CT+1H ,where the second last inequality follows from Eq. (C.37) as e∗T (H) = 1 inthis case.• e∗T (L) = 1 and e∗T (H) = 1: From Eqs. (C.39) and (C.43), we obtain:CTL −CTH =(1−δ · pLL(0))[CT+1L −CT+1H]+δ · pHH(1) ·[CT+1L −CT+1H]=[CT+1L −CT+1H]−δ(pLL(0)− pHH(1))[CT+1L −CT+1H]≤ CT+1L −CT+1H ,where the last inequality holds since, by Assumption 4.2, pHL(1) ≥ pLH(0)(or equivalently 1− pHL(1)≤ 1− pLH(0)), implying that pHH(1)< pLL(0).Now, considering the induction assumption of CiL−CiH ≤Ci+1L −Ci+1H , we showthat Ci−1L −Ci−1H ≤CiL−CiH . We prove this for all possible cases of optimal effort163pairs for the L and H type agents during periods i−1 and i, (e∗i−1(L),e∗i−1(H)) and(e∗i (L),e∗i (H)). As discussed above, for each t, (e∗t (L),e∗t (H)) can have 3 possiblecases, and for each one, CtL −CtH can be obtained through Eqs. (C.40), (C.41)and (C.43). However, considering the necessary and sufficient condition for theoptimality of exerting effort in each period presented by Eqs. (C.36) and (C.37),the induction assumption of CiL−CiH ≤ Ci+1L −Ci+1H eliminates some of the casesneed to be considered, as by this assumption and following Eqs. (C.36) and (C.37),if e∗i−1(L) = 1, then e∗i (L) = 1. Similarly, e∗i−1(H) = 1 implies e∗i (H) = 1. Thus,we consider the following cases.• e∗i−1(L) = 0, e∗i−1(H) = 0 and e∗i (L) = 0, e∗i (H) = 0: By Eq. (C.40), we haveCi−1L −Ci−1H = γ+δ · pLL(0) ·[CiL−CiH]≤ γ+δ · pLL(0) ·[Ci+1L −Ci+1H]= CiL−CiH ,such that the second inequality holds by the induction assumption, and thelast equality follows from Eq. (C.40).• e∗i−1(L)= 0, e∗i−1(H)= 0 and e∗i (L)= 0, e∗i (H)= 1: According to Eq. (C.40),Ci−1L −Ci−1H = γ+δ · pLL(0) ·[CiL−CiH]= γ+δ(1− pLH(0))[CiL−CiH]= γ+δ(pHH(1)+ pHL(1)− pLH(0))[CiL−CiH]≤ γ+δ(pHH(1)− pLH(0))[CiL−CiH]+CeH(1)≤ γ+δ(pHH(1)− pLH(0))[Ci+1L −Ci+1H]+CeH(1)= CiL−CiH .The third last inequality mentioned above satisfies since e∗i−1(H) = 0, andthus, by (C.37), δ pHL(1)[CiL−CiH]