UBC Theses and Dissertations


Three essays in operations management Sheng, Lifei 2017

Three Essays in Operations Management

by

Lifei Sheng

B.Sc. (Mathematics), Shanghai Jiao Tong University, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies

(Business Administration)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

June 2017

© Lifei Sheng 2017

Abstract

This thesis comprises three independent essays in operations management. The first essay explores a specific issue encountered by mobile gaming companies. The remaining two essays address the contracting problem in a supply chain setting.

In the first essay, we study the phenomenon of game companies offering to pay users in “virtual” benefits to take actions in-game that earn the game company revenue from third parties. Examples of such “incentivized actions” include paying users in “gold coins” to watch video advertising and speeding in-game progression in exchange for filling out a survey. We develop a dynamic optimization model that looks at the costs and benefits of offering incentivized actions to users as they progress in their engagement with the game. We find sufficient conditions for the optimality of a threshold strategy of offering incentivized actions to low-engagement users and then removing incentivized actions to encourage real-money purchases once a player is sufficiently engaged. Our model also provides insights into what types of games can most benefit from offering incentivized actions.

In the second essay, we propose what we call a generalized price-only contract, which is a dynamic generalization of the simple wholesale price-only contract. We derive some interesting properties of this contract and relate them to well-known issues such as double marginalization, relative power in a supply chain due to Stackelberg leadership, contract structure and commitment issues.

In the third essay, we consider a supplier selling to a retailer with private inventory information over multiple periods.
We focus on dynamic short-term contracts, where contracting takes place in every period. At the beginning of each period, with inventory or backlog kept privately by the retailer, the supplier offers a one-period contract and the retailer decides his order quantity in anticipation of uncertain customer demand. We cast the problem as a dynamic adverse-selection problem with Markovian dynamics. We show that the optimal short-term contract has a threshold structure, with possibly multiple thresholds. In certain cost regimes, the optimal contract entails a base-stock policy yet induces partial participation.

Lay Summary

In the first essay, we explore whether a mobile gaming company should pay users in “virtual” benefits to watch video advertising or fill out surveys so that the company earns revenue from third parties. We help the company identify which games benefit most from this practice, and we design the best way to implement it.

Essays 2 and 3 consider a supplier selling to a downstream retailer who faces random customer demand. The supplier determines the type and terms of the contract. In essay 2, we study a dynamic generalization of the simple wholesale price-only contract. We examine the impact on decisions and profits when the two companies are allowed to trade multiple times. In essay 3, we characterize the optimal short-term contract in the case where the supplier needs to offer a new contract in every period, without knowing the retailer’s beginning inventory or backorder.

Preface

Chapter 2 is co-authored with Christopher Ryan and Mahesh Nagarajan. Chapter 3 is co-authored with Daniel Granot, Tim Huh and Mahesh Nagarajan. Chapter 4 is co-authored with Mahesh Nagarajan and Hao Zhang.

In all chapters, I was responsible for developing the models, carrying out the analysis and presenting the results. My coauthors were involved in providing supervision and feedback in problem formulation, model analysis and manuscript edits.
The three chapters will be modified and submitted for publication in academic peer-reviewed journals.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Incentivized Actions in Freemium Games
  2.1 Introduction
  2.2 Related Literature
  2.3 Model
    2.3.1 Player Model
    2.3.2 The Publisher’s Problem
  2.4 Understanding the Effects of Incented Actions
  2.5 Optimal Policies for The Publisher
  2.6 Game Design and Optimal Use of Incented Actions
  2.7 Conclusion
3 A Dynamic Price-Only Contract: Exact and Asymptotic Results
  3.1 Introduction
  3.2 Generalized Price-only Contracts with n Offers
  3.3 Conclusion
4 Dynamic Short-term Supply Contracts under Private Inventory and Backorder Information
  4.1 Introduction
  4.2 Literature Review
  4.3 Problem Formulation
  4.4 Single Period
  4.5 Two Periods
  4.6 Infinite Horizon
    4.6.1 Retailer’s Reservation Profit-to-go
    4.6.2 The Zero-rent Plan $y^R(x)$
    4.6.3 The Optimal Contract
  4.7 Conclusion
5 Conclusion
Bibliography
Appendices
  A Proofs of Results in Chapter 2
  B Proofs of Results in Chapter 3
  C Proofs of Results in Chapter 4

List of Tables

2.1 Total expected profit for Example 1.
2.2 Total expected profit for Example 4.
4.1 Supplier’s profit-to-go under the zero-rent contract $y^R(x)$ and the optimal contract. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)
4.2 Supplier’s profit-to-go under $y^R(x)$ and $\max\{y^L(x), x\}$. (Parameters: $r = 10$, $c = 6$, $b = 7$, $h = 5$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)

List of Figures

2.1 A visual representation of the Markov chain model of player behavior with two engagement levels and incented actions available at engagement level 0.
2.2 Induced absorbing Markov chains for alternate policies in the two-engagement level case.
2.3 Sensitivity of the optimal threshold to changes in $\alpha_{\text{step}}$.
2.4 Sensitivity of the optimal threshold to changes in $\alpha(0)$.
3.1 Sequence of events under n-stage generalized price-only contract
3.2 Illustration of generalized price-only contract
3.3 The ratio of the retailer’s total profit over the supplier’s total profit as the number of price offers n increases under (1) exponential demand (2) uniform demand or linear demand
4.1 The retailer’s reservation and pre-transfer profit-to-go functions in period 1.
4.2 Illustration of “bump”
4.3 Retailer’s profit-to-go in period 1 under the optimal contract
4.4 Supplier’s profit-to-go in period 1 under $y^R_1(x_1)$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $\lambda = 1$ and $\delta = 0.9$.)
4.5 Two special cases of the optimal contract in period 1.
4.6 Supplier’s profit-to-go under $y^R(x)$
4.7 Retailer’s profit-to-go under quantity plan $y(x) = \max\{0, x\}$ and his reservation profit-to-go. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $\lambda = 1$ and $\delta = 0.9$.)
4.8 Retailer’s profit-to-go under quantity plans $y^R(x)$ and $\max\{y^L(x), x\}$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)
4.9 Supplier’s profit-to-go under quantity plans $y^R(x)$ and $\max\{y^L(x), x\}$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)
4.10 Two quantity plans $y^R(x)$ and $\max\{y^L(x), x\}$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)
4.11 Sample inventory trajectory under the quantity plan $y^R(x)$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 0$, $\lambda = 1$ and $\delta = 0.9$.)
4.12 Sample inventory trajectory under the quantity plan in Conjecture 1. (Parameters: $r = 10$, $c = 6$, $b = 7$, $h = 5$, $y_0 = 0$, $\lambda = 1$ and $\delta = 0.9$.)
C.1 Illustration of function $\phi(y_1 \mid x_1, \beta)$ under different $x_1$ and $\beta$.
C.2 Function $\phi(y_1 \mid x_1, \beta)$ when choosing two different $x_1$. Parameters: $r = 10$, $c = 5$, $b = 2$, $h = 3$, $y_0 = 3$, $\beta = 0.1$, $\lambda = 1$ and $\delta = 0.9$.
C.3 Function $\phi(y_1 \mid x_1, \beta) = 0$ has 2 solutions but the optimal control is $y_1^*(x_1) = 0$.

Acknowledgements

My time pursuing a Ph.D. at UBC has been a great treasure for me. I am deeply grateful to my supervisors, Tim Huh and Mahesh Nagarajan, for their support in the past five years. They devoted tremendous time to improving my academic aptitude and helping me become a good researcher. I really appreciate their patience, encouragement and generous financial support.

I want to extend my gratitude to the two other committee members, Christopher Ryan and Hao Zhang.
It is a great honor to collaborate with them. I am indebted to them for their insightful comments and suggestions on my thesis.

My sincere thanks also go to many other faculty members, staff and graduate students at the Sauder School of Business. In particular, I thank Maurice Queyranne for offering the fundamental courses in management science, which are very helpful to my research. I thank Harish Krishnan, Steven Shechter, Yichuan Ding and Greg Werker for their invaluable help and advice on my research and teaching. I thank Elaine Cho and Rita Quill for caring for me and helping me deal with all kinds of administrative issues.

My final gratitude goes to my loving family for their support and encouragement throughout these years.

Dedication

To my parents, Fusen Sheng and Liping Shi

Chapter 1

Introduction

The area of operations management has found many applications and connections to other disciplines, such as economics, marketing and information systems. This thesis presents three essays in the domain of operations management. Although the topics are diverse, they can be sorted into two major streams. One is the application of operations management to study issues in the digital economy (Chapter 2). The other is the application of mechanism design in operations management (Chapters 3 and 4).

Operations Management in the Digital Economy

The digital economy, especially the emergence of smartphones, social media and cloud data services, is radically changing the ways in which people work, learn, entertain themselves and socialise. By 2015, the worth of the mobile content market alone reached 27.5 billion U.S. dollars worldwide, and the number of social network users exceeded 2 billion. The emergence of the digital economy also impacts the way people do business. In particular, as technical innovation spurs the era of big data, decision makers are able to collect personalized data and apply data-driven analytics to understand customers better and improve decision-making.
Moreover, a number of new business models have emerged. One example is freemium games, which are free to download and generate revenue only after use, either through in-app transactions or from third parties. The first essay (Chapter 2) explores a particular practice, called incentivized actions, that is commonly implemented in mobile games.

Mobile games are the fastest growing segment of the entertainment industry globally, which itself is dominated by freemium games. A recent innovation is to offer “incentives” for players by paying them with “virtual” benefits for clicking on banner ads, watching videos, or filling out surveys. These are collectively called incentivized actions, or incented actions for short. Such a new business model raises many interesting questions.

In Chapter 2, we take the perspective of a game publisher and explore the use of incented actions in mobile games. Specifically, we study the following questions: Should game publishers offer incented actions? If so, how should they optimally design a policy for offering incented actions? If a gaming company offers several products, which of its games can most benefit from offering incented actions?

We model the publisher’s problem as a Markov decision process where the underlying state is the player’s engagement level and the publisher’s decision is whether or not to offer incented actions. We provide sufficient conditions for the optimality of a threshold strategy of offering incented actions to low-engaged players and then removing them to encourage real-money purchases once a player is sufficiently engaged. We also explore the settings where the optimality of the threshold policy breaks down. Moreover, we provide managerial insights and assist game publishers in targeting which types of games can take most advantage of delivering incented actions.
For instance, we show analytically that social games that include player interactions as part of the design should offer incented actions more broadly. We also discuss different effects of the design of incented actions for attracting and engaging players, including their “strength”, i.e. the power of their associated virtual benefits.

Mechanism Design in Operations Management

Mechanism design is a field pioneered by economists but has recently found important applications in operations management, especially the areas of supply chain management and healthcare management. Numerous studies have analyzed how contracts should be designed to mediate interactions among self-interested firms. The major part of the literature has focused on the static setting with complete information. In the real world, however, multi-period contracting is also (if not more) prevalent, with contracting parties having private information and decisions being made dynamically. Dynamic contracting is known to be a challenging problem due to a host of technical and expositional difficulties. Several researchers have exerted a significant effort to characterize the optimal mechanism in certain specific settings. For instance, Battaglini [4] characterizes the optimal long-term contract between a monopolist and a buyer whose private preferences evolve as a two-state Markov process. He finds that the optimal contract is contingent on the buyer’s complete purchase history and once the buyer reveals himself to be the high-type, the supply will become efficient in all future periods. This is considered a significant finding in dynamic contracting, yet limited by the two-state assumption.

Chapters 3 and 4 focus on a two-echelon supply chain in which a retailer (“he”) buys inventory from an upstream supplier (“she”) dynamically in anticipation of uncertain customer demand. The supplier needs to determine the terms of the contract.
We are interested in finding how dynamic interactions affect bilateral business relationships and whether they lead to significantly different contracts than under one-shot interactions.

In Chapter 3, we look at a two-stage supply chain with symmetric information. We propose a generalized price-only contract that is a dynamic generalization of the simple wholesale price-only contract. The supplier first informs the retailer that n wholesale prices will be offered sequentially and dynamically. For each wholesale price proposed, the retailer chooses an order quantity at that price. At the end of the last offer, the retailer uses the total quantity cumulatively purchased to satisfy market demand. We examine how the firms’ decisions are affected if, instead of having one opportunity to trade, they are allowed to engage multiple times, still using a simple linear price-only contract.

It is well known that the classical wholesale price-only contract causes supply chain inefficiency due to the double marginalization effect. We show that the generalized price-only contract benefits both players. Moreover, as the number of price offers n approaches infinity, the supply chain profit approaches the first best profit. We also demonstrate that for a given contract with a specific n, the wholesale prices monotonically decrease. However, somewhat surprisingly, for a fixed n, the order quantities within the n periods may not be monotone. We provide necessary and sufficient conditions for the stationarity of the supplier’s per period profit. Finally, we derive closed-form solutions for three settings in which the demand is exponential, uniform or constant.

In Chapter 4, we consider a supplier selling to a retailer with private inventory information over multiple periods. A few pioneering studies have explored the contracting problem in this setting. Zhang et al.
[57] focus on dynamic short-term contracts in the lost-sales case and show that the optimal contract is a batch-order contract under certain assumptions. Ilan and Xiao [27] study the optimal long-term contract and prove that it takes a simple form in both lost-sales and backlogging cases.

Filling a gap in the literature, our work focuses on dynamic short-term contracts, where contracting takes place in every period, with inventory or backlog kept privately by the retailer. We cast the problem as a dynamic adverse-selection problem with Markovian dynamics. Markovian adverse-selection models, in which the state and action in a period affect the state in the subsequent period, are theoretically challenging and much less understood. Our work contributes to a better understanding of such models, especially under short-term contracting.

We show that the optimal short-term contract has a threshold structure, with possibly multiple thresholds, under exponentially distributed demand. In a high cost regime, the optimal short-term contract may entail a base-stock order policy and an exclusion region. If the retailer’s inventory (or backlog) falls in the exclusion region, the supplier terminates the relationship with the retailer. If not, the retailer participates and orders up to a constant base-stock level. This is drastically different from the lost-sales setting.

Moreover, in the backlogging case, the supplier finds more sales opportunities in the retailer’s backlog situation, which increases the retailer’s bargaining power. As an interesting result, the information rent (profit yielded to the retailer) under the optimal contract may be non-monotone in the retailer’s inventory (or backlog) level. The supplier would sometimes prefer to deal with retailers with high inventory, which is different from the lost-sales case where the supplier always wants to trade with retailers who have low inventory.

The rest of the thesis is organized as follows.
Each essay is self-contained and is presented in one chapter, with a more exhaustive discussion of literature review, research question and main contributions. All proofs are relegated to the appendices.

Chapter 2

Incentivized Actions in Freemium Games

2.1 Introduction

Games represent the fastest growing sector of the entertainment industry globally, which includes music, movies and print publishing [39]. Moreover, the online/mobile space is the fastest growing segment within games, which itself is dominated by games employing a “freemium” business model. Freemium games are free to download and play and earn revenue through advertising or selling game enhancements to dedicated players. When accessed on 23 April 2015, Apple Inc.’s App Store showed 190 out of the 200 top revenue generating games (and all of the top 20) were free to download.¹ On Google Play, the other major mobile games platform, 297 out of the 300 top revenue generating games were freemium.² Moreover, games are the dominant revenue generators in the global app market. Revenues from mobile games account for 79% of total app revenue on Apple’s App Store and 92% of revenue on Google Play [48].

The concept behind freemium is to attract large pools of players, many of whom might never “monetize”; that is, pay for an in-app purchase. The process by which a player begins to pay out-of-pocket for a freemium game is called monetization. In general, successful games have a monetization rate of between 2 and 10 per cent, with the average much closer to 2 per cent [36]. As for unsuccessful games, the monetization rate can be virtually zero.

When game publishers cannot earn directly from the pockets of consumers they turn to other sources of revenue. This is largely through earning revenue from third parties willing to pay publishers for delivering advertising content, having players download other apps, fill out surveys, or apply for services, such as credit cards.
This stream of revenue is less lucrative per conversion than in-app purchases. For instance, delivering a video ad typically earns a fraction of a cent while an in-app purchase typically earns the publisher fifty cents or more.

Like most modern consumers, however, players can become irritated by advertising, especially when it interrupts the flow or breaks the fiction of a game. A recent innovation is to offer “incentives” for players to click on a banner ad, watch a video, or fill out a survey. These are collectively called incentivized actions, or as it is commonly shortened, incented actions. To get a clearer sense of the structure of an incented action and the value of the “incentive” to a player, details of the mechanics and goals of a game are needed to provide context. We feel this is best achieved through the description of the following two concrete examples.

¹ http://appshopper.com/bestsellers/games/gros/?device=iphone
² https://play.google.com/store/apps/collection/topgrossing?hl=en

Crossy Road

Crossy Road is a freemium game developed by Hipster Whale that has recently (since 2014) seen great success with incented video advertising, earning over 10 million USD in the first three months after its launch [19]. In Crossy Road, the player controls a character who attempts to cross busy streets full of fatal obstacles. The main progression of the game is to collect additional characters to play, including animals, avatars of famous people, and many others. The characters must be unlocked through earning “coins”. “Coins” are earned organically by playing the game at a slow rate. Periodically the player has an option to watch a video ad to earn a large bundle of coins all at once. Once a player collects one hundred coins she can use them to randomly draw a character. If the player is unlucky she may draw a character she previously unlocked. If the player wishes to purchase a specific character (of which there are now dozens) it will cost at least 0.99 USD.
The incented action (watching an ad) accelerates the progression of the player by rewarding large bundles of coins, but the value of the incentive weakens as random draws are increasingly unlikely to unlock a new character as the player progresses. Moreover, there can be long stretches of time where video ads are not offered, forcing the player to either make progress organically or purchase characters with real money.

Candy Crush Saga

A second illustrative example is Candy Crush Saga, published by King. King was recently acquired by Activision-Blizzard for 5.9 billion USD based on the enduring popularity of Candy Crush Saga and its portfolio of successful games [37]. In Candy Crush Saga, a player attempts to solve a progression of increasingly challenging puzzles. At the higher levels it is typical for players to get stuck for extended periods of time on a single puzzle. Player progression is further hindered by a “lives” mechanic where each failed attempt at a puzzle consumes one of at most five total lives. Lives are regenerated either through waiting long periods of real time or by purchasing additional lives with real money. In addition to lives, players can also pay for items that enhance their chances of completing a puzzle.

Early versions of Candy Crush Saga had incented actions, including advertising. A player could take an incented action to earn lives or items without using real money. However, in June of 2013, six months after Candy Crush Saga launched on Apple iOS, King decided to drop all forms of in-game advertising in the game [18].

King’s choice was surprising to many observers. What was the logic for removing a potential revenue stream? How did this move affect the monetization rate? The ramifications of such decisions vary depending on the game and can potentially have significant financial consequences. To get a sense of this, note that an in-app purchase can cost between a dollar and around $5, and Supercell earned approximately 2.3 billion in revenue in 2015 purely through monetization of its three games [51]. Our two examples of games that have experimented with the use of incented actions also raise several related important questions about the impact of incented actions. For example, when is it best to offer incented actions? If offered, is it optimal to offer them to certain players at certain times, but not others? Also, if a gaming company offers several products, which of its games are better suited to offering incented actions? Our paper develops a framework for answering some of these important questions.

Our contributions

In this paper we present an analytical model to explore the use of incented actions. In particular, we are interested in a game publisher’s decision of when to offer incented actions to players, and when to remove this option. Our model emphasizes the connection of incented actions to two other useful concepts often discussed in the game industry: engagement and retention. The engagement of a player measures their commitment. Highly engaged players are more likely to make in-app purchases and less likely to quit. Retention refers to a game’s effectiveness at keeping players from quitting. Intuitively, the longer a player is retained in the game, the more likely they are to become engaged and monetize. Clearly, these two concepts are interrelated. Analytically, player engagement levels are modeled as states in a Markov chain and retention is captured as the time a player stays in the system before being absorbed into a “quit” state.

The main insights from our model deal with the relationship between engagement, retention and incented actions. We identify, and provide analytical characterizations for, three main effects of incented actions.
These effects are described with greater precision below, but we mention them here at a conceptual level.

First is the revenue effect: by offering incented actions game publishers open up another channel of revenue. However, the net revenue of offering incented actions may nonetheless be negative if one accounts for the opportunity costs of players not making in-app purchases. That is, this captures the possibility that a player would have made an in-app purchase if an incented action was not available. For instance, in Crossy Road a player may collect characters entirely through watching video ads, but if this option were removed a player may begin to purchase characters with real money.

The retention effect measures how effective an incented action is at keeping players from quitting. Again, in the example of Crossy Road, at some point the organic accumulation of “coins” may feel prohibitively slow to a player. If the option of watching video ads were removed, a player may prefer to quit rather than start to use real money to purchase characters. In other words, incented actions can delay a player’s decision to quit the game.

Finally, the progression effect refers to the effectiveness of an incented action in deepening the engagement level of the player. It refers to an incented action’s ability to increase the player’s attachment to the game. In Crossy Road video ads allow players to collect characters, potentially deepening their engagement. These three effects are intuitively understood by game developers and are the topic of much discussion and debate in the gaming industry.³ Gaming companies grapple with the issue of understanding how these effects interact with each other in the context of specific games. As we shall see in concrete examples below, all three effects can act to either improve or erode the overall revenue available to the publisher. The effects are clearly connected and often move in similar directions as players progress.
Part of our analysis is to describe situations where the effects move in different, sometimes counter-intuitive, directions.

We are able to analytically characterize each effect, allowing us to gain insights into how to optimally design a policy for offering incented actions. To understand the interactions between these effects and to capture the dynamics in a game, we use Markov chains to model player engagement and how they transition from one level of engagement to another. Then, using a Markov Decision Process (MDP) model we study the effect of specific decisions or policies of the game publisher. For example, we provide sufficient conditions for when a threshold policy is optimal. In a threshold policy incented actions are offered until a player reaches a target engagement level, after which incented actions are removed. The intuition of these policies is clear. By offering incented actions, the retention effect and progression effect keep the player in the game for longer by providing a non-monetizing option for progression. However, once a player is sufficiently engaged, the revenue effect becomes less beneficial and the retention effect less significant because highly engaged players are more likely to buy in-app purchases and keep playing the game. This suggests that it is optimal to remove incented actions and attempt to extract revenue directly from the player through monetization. Our sufficient conditions provide justification for this logic, but we also explore settings where this basic intuition breaks down. For instance, it is possible that the retention effect remains a dominant concern even at higher engagement levels. Indeed, a highly engaged player may be quite likely to monetize and so there is a strong desire on the part of the publisher to keep the player in the system for longer by offering incented actions to bolster retention.

MDPs are used to study dynamics in systems such as ours and are popular in the economics, operations management and marketing literatures.
There are several advantages to using MDPs to model and study settings such as ours. First, they are an effective tool for theoretical analyses such as the one we are interested in, because MDP theory is rich and allows one to prove formal results on the interactions between different variables of interest. Second, with the availability of player-level data, as is the case with games, it is relatively easy to validate these models and perform “what if” analyses using simulations to test different scenarios of interest. We believe ours is the first formal model and study using these ideas in a gaming setting, and we anticipate that the results and modeling approach will be useful to researchers in this area as well as practitioners.

Clearly, the relative strengths of these three effects depend on the characteristics of the game, including all the parameters in our MDP model. We examine this dependence by tracking how the threshold in an optimal threshold policy changes with the parameters. This analysis provides insights into the nature of optimal incented action policies.

³ Discussion of these issues is a regular occurrence on gaming industry forums, such as gaminginsiders.com.

For instance, we show analytically that the more able players are at attracting their friends into playing the game, the greater should be the threshold for offering incented actions. This suggests that social games that include player interaction as part of their design should offer incented actions more broadly, particularly when the retention effect is strongly positive, since keeping players in the game for longer gives them more opportunities to invite friends. Indeed, a common incented action is to contact friends in your social network or to build a social network to earn in-game rewards.
This managerial insight can assist game publishers in identifying which types of games in a portfolio of game projects can take most advantage of delivering incented actions.

We also discuss the different effects of the design of incented actions, in particular their “strength” at attracting and engaging players. “Strength” here refers to how powerful the reward of the incented action is in the game, for instance the number of “coins” given to the player when an incented action is taken. If this reward is powerful in comparison to in-app purchases, then it can help players progress, strengthening the progression effect. On the other hand, a stronger incented action may dissuade players further from monetizing, strengthening cannibalization. Through numerical examples we illustrate a variety of possible effects that trade off the behavioral responses of players to the nature of the incented action reward, and show that whether or not to offer incented actions to highly engaged players depends in a nonmonotonic way on the parameters of our model that indicate the strength of incented actions.

The rest of the paper is organized as follows. In Section 2.2 we review related work, paying close attention to contributions from the information systems and marketing literatures. Section 2.3 presents our model, first developing a stochastic model of player behavior and then formulating the game publisher’s decision problem as an MDP. In Section 2.4 we formally define the three effects mentioned above and characterize them analytically. These effects are leveraged to provide sufficient conditions for an optimal threshold policy in Section 2.5. Section 2.6 draws out policy implications and managerial insights that arise from studying optimal threshold policies. Section 2.7 concludes. Proofs of all results are in the appendix.

2.2 Related Literature

As freemium business models have grown in prominence, so has interest in studying various aspects of freemium in the management literature.
While papers in the marketing literature on freemium business models have been largely empirical (see for instance Gupta et al. [24] and Lee et al. [33]), our work connects most directly to a stream of analytical studies in the information systems literature that explore how various approaches to the concept of “free” have been used in the software industry. Two important papers for our context are Niculescu and Wu [43] and Cheng et al. [13], which together establish a taxonomy of different freemium strategies and examine in what situations a given strategy is most advantageous. Seeding is a strategy where a number of products are given away entirely for free, to build a user base that attracts new users through word-of-mouth and network effects. Previous studies explored the seeding strategy by adapting the Bass model [3] to the software setting (see for instance Jiang and Sarkar [28]). Another strategy is time-limited freemium, where all users are given access to a complete product for a limited time, after which access is restricted (see Cheng and Liu [12] for more details). Our setting is best captured by the feature-limited freemium category, where a functional base product can always be accessed by users, with additional features available for purchase by users. In freemium mobile games, a base game is available freely for download with additional items and features for sale through accumulated virtual currency or real-money purchases.

Our work departs from this established literature in at least two dimensions. First, we focus on how to tactically implement a freemium strategy, in particular, when and how to offer incented actions to drive player retention and monetization. By contrast, the existing literature has largely focused on comparing different freemium strategies and their advantage over conventional software sales. This previous work is, of course, essential in understanding the business case for freemium.
Our work contributes to a layer of tactical questions of interest to firms that are committed to a freemium strategy and need further insights into how it should be deployed.

Second, games present a specific context that may be at odds with some common conceptualizations of a freemium software product. For a productivity-focused product, such as a PDF editor, a typical implementation of freemium is to put certain advanced features behind a paywall, such as the ability to make handwritten edits on files using a stylus. Once purchased, features are typically unlocked either in perpetuity or for a fixed duration by the paying user. By contrast, in games what is often purchased are virtual items or currency that may enhance the in-game experience, speed progression, or provide some competitive advantage. These purchases are often consumables, meaning that they are depleted through use. This is true, for instance, of all purchases in Candy Crush Saga. Our model allows for a player to make repeated purchases and for the intensity of monetization to evolve over the course of play.

Other researchers have examined the specific context offered by games, as opposed to general software products, and have adapted specialized theory to this context. Guo et al. [23] examine how the sale of virtual currencies in digital games can create a win-win scenario for players and publishers from a social-welfare perspective. They make a strong case for the value created by games offering virtual currency systems. Our work adds an additional layer by examining how virtual currencies can be used to incentivize players to take actions that are profitable to the firm but do not involve a real-money exchange. A third party, such as an advertiser, can create a mutually beneficial situation where the player earns additional virtual currency, the publisher earns revenue from the advertiser, and the advertiser promotes their product. Also, Guo et al.
[23] develop a static model where players decide on how to allocate a budget between play and purchasing virtual currency. We model how a player’s willingness to take incented actions or monetize changes as their engagement with the game evolves, necessitating the use of a dynamic model. This allows us to explore how a freemium design can respond to the actions of players over time. This idea of progression in games has been explored empirically in Albuquerque and Nevskaya [1], and we adapt similar notions to derive analytical insights in our setting.

The dynamic nature of our model also shares similarities with threads of the vast customer relationship management (CRM) literature in marketing. In this literature, researchers are interested in how firms balance acquisition, retention and monetization of players through the pricing and design of their product or service over time. For example, Libai et al. [35] adapt Bass’s model to the diffusion of services, where player retention is an essential ingredient in the spread of the popularity of a platform. Fruchter and Sigué [21] provide insight into how a service can be priced to maximize revenue over its lifespan. Both studies employ continuous-time and continuous-state models that are well-suited to examining the overall flow of the player population. Our focus of analysis is at the player level and asks how to design the game (i.e., the service) to balance retention and monetization through offering incented actions for a given acquired player. Indeed, game designs on mobile platforms can, in principle, be specialized down to a specific player. With the increasing availability of individual player-level data, examination of how to tailor design with more granularity is worthy of exploration.
By contrast, existing continuous models treat a single player’s choice as having measure-zero significance.

Finally, our modeling approach of using a discrete-time Markov decision process model in search of threshold policies is a standard form of analysis in the operations management literature. We have mentioned the advantages of this approach earlier. Threshold policies, which we work to establish, have the benefit of being easily implementable and thus draw favor in studies of tactical decision-making, which are common in multiple areas including the economics and operations management literatures. The intuition for their ease of use is straightforward. The simplest type of threshold policy requires the system designer to keep track of nothing but the threshold (target) level, monitor the state of the system, and take the appropriate action to reap the benefits of optimality. This is in contrast to situations where the optimal policy can be complex and has nontrivial state and parameter dependencies. Examples of threshold policies being effectively used in dynamic settings include inventory and capacity management and control [58], revenue management [52] and adaptive learning and pricing [50].

2.3 Model

We take the perspective of a game publisher who is deciding how to optimally deploy incented actions in its game. Incented actions can be offered (or not) at different times during a player’s experience with the game. For example, a novice player may be able to watch video ads for rewards during the first few hours of game play, only later to have this option removed.

Our model has two agents: the game publisher and a single player. This assumes that the game publisher has the ability to offer a customized policy to each of its players, or at least customized policies to different classes of players. In other words, the “player” in our model can be seen as the representative of a class of players who behave similarly.
The publisher may need to decide on several different policies for different classes of players for an overall optimal design.

We assume that the player in our two-agent model behaves stochastically according to the options presented to her by the game publisher. The player model is a Markov chain with engagement level as the state variable. The game publisher’s decision problem is a Markov decision process (MDP) where the stochasticity is a function of the underlying player model and the publisher’s decision whether or not to offer incented actions. The player model is described in detail in the next subsection. The publisher’s problem is detailed in Section 2.3.2.

2.3.1 Player Model

The player can take three actions while playing the game. The first is to monetize (denoted $M$) by making an in-app purchase with real money. The second is to quit (denoted $Q$). Once a player takes the quit action she never returns to playing the game. Third, the player can take an incented action (denoted $I$). The set of available actions is determined by whether the publisher offers an incented action or not. We let $A_1 = \{M, I, Q\}$ denote the set of available actions when an incented action is offered and $A_0 = \{M, Q\}$ otherwise.

The probability that the player takes a particular action depends on her engagement level. Engagement level is a general concept that can be understood in different ways depending on the specifics of the game. For example, in Crossy Road the engagement level may be a function of the number of characters that have been collected, while in Candy Crush Saga it may be the level of the puzzle the player is currently on. Let $E = \{0, 1, \ldots, N\}$ be the set of possible engagement levels of the player.

The probability that the player takes an action also depends on what actions are available to her.
We use the letter “p” to denote probabilities when an incented action is available and write $p_a(e)$ to denote the probability of taking action $a \in A_1$ at engagement level $e \in E$. For example, $p_M(2)$ is the probability of monetizing at engagement level 2 while $p_I(0)$ is the probability of taking an incented action at engagement level 0. We use the letter “q” to denote action probabilities when the incented action is unavailable and write $q_a(e)$ for the probability of taking action $a \in A_0$ at engagement level $e \in E$. By definition $p_M(e) + p_I(e) + p_Q(e) = 1$ and $q_M(e) + q_Q(e) = 1$ for all $e \in E$.

There is a relationship between $p_a(e)$ and $q_a(e)$. When an incented action is not available, the probability $p_I(e)$ is allocated to the remaining two actions $M$ and $Q$. We assume this probability is allocated as follows.

Assumption 2.3.1. For each $e \in E$ there exists a parameter $\alpha(e) \in [0, 1]$ such that:
\[ q_M(e) = p_M(e) + \alpha(e)\,p_I(e) \tag{2.1} \]
\[ q_Q(e) = p_Q(e) + (1 - \alpha(e))\,p_I(e). \tag{2.2} \]
Note that $\alpha(e)$ must be such that $p_M(e) + p_I(e) + p_Q(e) = 1$ and $q_M(e) + q_Q(e) = 1$ for all $e \in E$.

We call $\alpha(e)$ the cannibalization parameter at engagement level $e$, since $\alpha(e)$ measures the impact of removing an incented action on the probability of monetizing and thus captures the degree to which incented actions cannibalize demand for in-app purchases. A large $\alpha(e)$ (close to 1) implies strong cannibalization whereas a small $\alpha(e)$ (close to 0) signifies weak cannibalization.

It remains to consider how a player transitions from one engagement level to another. We must first describe the time epochs where actions and transitions take place. The decision epochs where actions are undertaken occur when the player is assessing whether or not she wants to continue playing the game. For example, in Crossy Road a player must choose to monetize, watch a video ad, or quit once she can no longer tolerate the organic rate of progression.
The real elapsed time between decision epochs is not constant, since it depends on the behavior of the player between sessions of play. Some players may play frequently, others only for a few minutes per day. A player might be highly engaged but nonetheless have little time to play due to other life obligations. This reality underscores that the elapsed time between decision epochs should not be a critical factor in our model. We denote the engagement level at decision epoch $t$ by $e_t$ and the action at decision epoch $t$ by $a_t$.

Returning to the question of transitioning from engagement level to engagement level, in principle we would need to determine individually each transition probability $P(e_{t+1} = e' \mid e_t = e \text{ and } a_t = a)$ (or more simply, $P(e' \mid e, a)$, since we will assume that transition probabilities are stationary over time). However, we make the following simplifying assumptions about state transitions: (i) engagement increases by at most one level at every decision epoch and never goes down, (ii) the transition probability is independent of the current engagement level and depends only on the action taken by the player, and (iii) the impact on transitioning to a higher engagement level when taking the monetize action $M$ is independent of whether an incented action was offered or not. This implies the following structure.

Assumption 2.3.2. The engagement level transition probabilities satisfy the following conditions:
\[ P(e' \mid e, a) = \begin{cases} \tau_a & \text{if } e' = e + 1 \text{ and } e < N \\ 1 - \tau_a & \text{if } e' = e < N \\ 1 & \text{if } e = e' = N \\ 0 & \text{otherwise} \end{cases} \tag{2.3} \]
for $a \in \{M, I\}$. For $a = Q$ the player transitions with probability one to a quit state denoted
$-1$.

[Figure 2.1: A visual representation of the Markov chain model of player behavior with two engagement levels and incented actions available at engagement level 0.]

The overall state transition probabilities are:
\[ P_1(e' \mid e) = \begin{cases} p_M(e)\tau_M + p_I(e)\tau_I & \text{if } e' = e + 1 \text{ and } e < N \\ p_M(e)(1 - \tau_M) + p_I(e)(1 - \tau_I) & \text{if } e' = e < N \\ p_M(e) + p_I(e) & \text{if } e = e' = N \\ p_Q(e) & \text{if } e' = -1 \\ 0 & \text{otherwise} \end{cases} \tag{2.4} \]
when an incented action is available to the player and
\[ P_0(e' \mid e) = \begin{cases} q_M(e)\tau_M & \text{if } e' = e + 1 \text{ and } e < N \\ q_M(e)(1 - \tau_M) & \text{if } e' = e < N \\ q_M(e) & \text{if } e = e' = N \\ q_Q(e) & \text{if } e' = -1 \\ 0 & \text{otherwise} \end{cases} \tag{2.5} \]
when incented actions are not offered. Figure 2.1 provides a visual representation of the Markov chain describing player behavior when there are two engagement levels, with an incented action offered only at engagement level 0. Our assumptions make the structure of state transitions relatively simple, but nonetheless capture the complexity of first having a probabilistic realization of an action followed by a random transition depending on the action taken. Indeed, despite the simplicity of these assumptions, they turn out to be insufficient to ensure some seemingly intuitive properties of incented actions (see further discussion below, in particular the need for several additional assumptions to drive the analysis).

We close this subsection by providing some additional basic assumptions on the Markov chain data. These assumptions ensure that the model is consistent with what we mean by player engagement and our understanding of what holds in practice.

Assumption 2.3.3. We make the following assumptions:
(A3.1) $p_M(e)$ and $q_M(e)$ increase in $e$,
(A3.2) $p_Q(e)$ and $q_Q(e)$ decrease in $e$,
(A3.3) $p_Q(e), q_Q(e) > 0$ for all $e \in E$,
(A3.4) $p_I(e)$ decreases in $e$,
(A3.5) $\tau_M > \tau_I$, and
(A3.6) $\alpha(e)$ is increasing in $e$.

Assumptions (A3.1) and (A3.2) ensure that more engaged players are more likely to make in-app purchases and less likely to quit.
This is precisely how we understand the concept of engagement: the more invested a player is in a game, the more likely they are to spend and the less likely they are to quit. Assumption (A3.3) ensures that there is always a nonzero probability of quitting, no matter the level of engagement. This acknowledges the fact that games are entertainment activities, and there are numerous reasons for a player to quit due to factors in their daily lives, even when engrossed in the game. Moreover, this turns out to be an important technical assumption that allows us to consider a total reward criterion for the publisher’s decision problem that avoids mathematical complexities (see Section 2.3.2 below).

Assumption (A3.4) ensures that players are less likely to take an incented action as their engagement level increases. One interpretation of this is that the rewards associated with an incented action are less valuable as a player progresses, decreasing the probability of taking such an action. Observe that (A3.1)–(A3.4) put implicit assumptions on the cannibalization parameter $\alpha(e)$ via (2.1) and (2.2).

Assumption (A3.5) implies that a player is more likely to increase their engagement when monetizing than when taking an incented action. This assumption is well-justified for two reasons. The first is that players may view making an in-app purchase as a kind of investment and become more committed to playing to ensure their investment pays off. Second, the rewards for incented actions are typically less powerful than what can be purchased for real money. The example of Crossy Road is illustrative: specific characters can be directly bought with real money, but watching video ads only contributes to random draws for characters.

Finally, (A3.6) implies that, as engagement increases, a greater share of the probability of taking an incented action when offered is allocated to monetization when the incented action is removed (see (2.1)).
This assumption is intuitive and again consistent with our concept of engagement: as a player becomes more engaged, the monetization option becomes relatively more attractive than quitting when the incented action is removed.

2.3.2 The Publisher’s Problem

We model the publisher’s problem as an infinite horizon Markov decision process under a total reward criterion (for details see Puterman [46]). A Markov decision process is specified by a set of states, controls in each state, transition probabilities under pairs of states and controls, and rewards for each transition.

Specifically, in our setting, based on the description of the dynamics laid out thus far, the set of states is $\{-1\} \cup E$ and the set of controls $U = \{0, 1\}$ is independent of the state, where 1 represents offering an incented action and 0 not offering an incented action. The transition probabilities are given by (2.4) when $u = 1$ and (2.5) when $u = 0$. The reward depends on the action of the player. When the player quits, the publisher earns no revenue, denoted by $\mu_Q = 0$. When the player takes an incented action the publisher earns $\mu_I$, while a monetization action earns $\mu_M$.

Assumption 2.3.4. We assume $\mu_I < \mu_M$.

This assumption is in concert with practice, as discussed in the introduction.

The expected reward in state $e$ under control $u$ is:
\[ r(e, u) = \begin{cases} p_M(e)\mu_M + p_I(e)\mu_I & \text{if } e \in E \text{ and } u = 1 \\ q_M(e)\mu_M & \text{if } e \in E \text{ and } u = 0 \\ 0 & \text{if } e = -1. \end{cases} \tag{2.6} \]
Note that expected rewards do not depend on whether the player transitions to a higher engagement level, and so the probabilities $\tau_M$ and $\tau_I$ do not appear in (2.6).

A policy $y$ for the publisher is a mapping from $E$ to $U$. Figure 2.1 illustrates the policy $y(0) = 1$ and $y(1) = 0$. Each policy $y$ induces a stochastic process over rewards, allowing us to write its value as:
\[ W^y(e) := \mathbb{E}^y_e \left\{ \sum_{t=1}^{\infty} r(e_t, y(e_t)) \right\} \tag{2.7} \]
where $e$ is the player’s initial engagement level and the expectation is taken with respect to the induced stochastic process.
In many examples of Markov decision processes, the sum in (2.7) does not converge, but under our assumptions (in particular, (A3.3)) the expected total reward does converge for every policy $y$. In fact, our problem has a special structure that we can exploit to derive a convenient analytical form for (2.7) as follows:
\[ W^y(e) = \sum_{e' \geq e} n^y_{e,e'}\, r(e', y(e')) \tag{2.8} \]
where $n^y_{e,e'}$ is the expected number of visits to engagement level $e'$ starting in engagement level $e$. We derive closed-form expressions for $n^y_{e,e'}$ that facilitate analysis. For details see Appendix A. For concrete expressions in a special case see Example 1 below.

The game publisher’s decision is to choose a policy to solve the optimization problem: given a starting player’s engagement level $e$, solve:
\[ \max_{y \in \{0,1\}^{E}} W^y(e). \tag{2.9} \]
In the discussion in later sections, we will often refer to a simple setting with two engagement levels. Here we can use (2.8) to derive clean expressions for the total expected reward of all four possible policies.

[Figure 2.2: Induced absorbing Markov chains for alternate policies in the two-engagement level case. Panels: (a) $y^1 = (0, 0)$, (b) $y^2 = (0, 1)$, (c) $y^3 = (1, 0)$, (d) $y^4 = (1, 1)$.]

Table 2.1: Total expected profit for Example 1.
\[
\begin{array}{lll}
\text{Policy} & W^y(0) & W^y(1) \\ \hline
y^1 = (0, 0) & \dfrac{q_M(0)\mu_M}{1 - q_M(0)(1 - \tau_M)} + \dfrac{q_M(0)\tau_M}{(1 - q_M(0)(1 - \tau_M))\, q_Q(1)}\, q_M(1)\mu_M & \dfrac{q_M(1)\mu_M}{q_Q(1)} \\[2ex]
y^2 = (0, 1) & \dfrac{q_M(0)\mu_M}{1 - q_M(0)(1 - \tau_M)} + \dfrac{q_M(0)\tau_M}{(1 - q_M(0)(1 - \tau_M))\, p_Q(1)}\, (p_M(1)\mu_M + p_I(1)\mu_I) & \dfrac{p_M(1)\mu_M + p_I(1)\mu_I}{p_Q(1)} \\[2ex]
y^3 = (1, 0) & \dfrac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1 - \tau_M) - p_I(0)(1 - \tau_I)} + \dfrac{p_M(0)\tau_M + p_I(0)\tau_I}{(1 - p_M(0)(1 - \tau_M) - p_I(0)(1 - \tau_I))\, q_Q(1)}\, q_M(1)\mu_M & \dfrac{q_M(1)\mu_M}{q_Q(1)} \\[2ex]
y^4 = (1, 1) & \dfrac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1 - \tau_M) - p_I(0)(1 - \tau_I)} + \dfrac{p_M(0)\tau_M + p_I(0)\tau_I}{(1 - p_M(0)(1 - \tau_M) - p_I(0)(1 - \tau_I))\, p_Q(1)}\, (p_M(1)\mu_M + p_I(1)\mu_I) & \dfrac{p_M(1)\mu_M + p_I(1)\mu_I}{p_Q(1)}
\end{array}
\]

Example 1.
Consider the case where $N = 1$ and there are two engagement levels $E = \{0, 1\}$. There are four possible policies: $y^1 = (0, 0)$, $y^2 = (0, 1)$, $y^3 = (1, 0)$ and $y^4 = (1, 1)$. Figure 2.2 gives a visual representation of these four policies.

Table 2.1 shows the total expected reward functions for all four policies. Details of their derivation are in Appendix A. Which of the four policies $y^1, \ldots, y^4$ is optimal depends on the values of the parameters in the model. Most of the numerical examples in the paper refer to the formulas in Table 2.1.

2.4 Understanding the Effects of Incented Actions

In this section we show how our analytical model helps us sharpen our insight into the costs and benefits of offering incented actions in games. In particular, we give precise analytical definitions of the revenue, retention and progression effects of offering incented actions to a player.

Let $y^1_{\bar e}$ be a given policy with $y^1_{\bar e}(\bar e) = 0$ for some engagement level $\bar e$. Consider a local change to a new policy $y^2_{\bar e}$ where $y^2_{\bar e}(\bar e) = 1$ but $y^2_{\bar e}(e) = y^1_{\bar e}(e)$ for $e \neq \bar e$. We call $y^1_{\bar e}$ and $y^2_{\bar e}$ paired policies with a local change at $\bar e$. Analyzing this local change at the target engagement level $\bar e$ gives insight into the effect of starting to offer an incented action at a given engagement level. Moreover, this flavor of analysis suffices to determine an optimal threshold policy, as discussed in Section 2.5 below. For ease of notation, let $W^1(e) = W^{y^1_{\bar e}}(e)$ and $W^2(e) = W^{y^2_{\bar e}}(e)$.

Our goal is to understand the change in expected revenue moving from policy $y^1_{\bar e}$ to policy $y^2_{\bar e}$ where the player starts at (or has reached) engagement level $\bar e$. Indeed, because engagement does not decrease (before the player quits), if the player has reached engagement level $\bar e$ the result is the same as if the player had just started at engagement level $\bar e$, by the Markovian property of the player model.
Understanding when, and for what reasons, this change has a positive impact on revenue provides insights into the value of incented actions.

The change in total expected revenue from the policy change from $y^1_{\bar e}$ to $y^2_{\bar e}$ at engagement level $\bar e$ is:
\[ W^2(\bar e) - W^1(\bar e) = \underbrace{n^2_{\bar e,\bar e}\, r(\bar e, 1) - n^1_{\bar e,\bar e}\, r(\bar e, 0)}_{(i)} + \underbrace{\sum_{e > \bar e} (n^2_{\bar e,e} - n^1_{\bar e,e})\, r(e, y(e))}_{(ii)} = C(\bar e) + F(\bar e). \tag{2.10} \]
Term (i), denoted $C(\bar e)$, is the change in revenue accrued from visits to the current engagement level $\bar e$. We may think of $C(\bar e)$ as denoting the current benefits of offering an incented action in state $\bar e$, where “current” means the current level of engagement. Term (ii), denoted $F(\bar e)$, captures the change due to visits to all other engagement levels. We may think of $F(\bar e)$ as denoting the future benefits of visiting higher (“future”) states of engagement. We can give explicit formulas for $C(\bar e)$ and $F(\bar e)$ for $\bar e < N$ (after some work detailed in Appendix A) as follows:
\[ C(\bar e) = \frac{p_M(\bar e)\mu_M + p_I(\bar e)\mu_I}{1 - p_M(\bar e)(1 - \tau_M) - p_I(\bar e)(1 - \tau_I)} - \frac{q_M(\bar e)\mu_M}{1 - q_M(\bar e)(1 - \tau_M)} \tag{2.11} \]
and
\[ F(\bar e) = \left\{ \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1 - p_M(\bar e)(1 - \tau_M) - p_I(\bar e)(1 - \tau_I)} - \frac{q_M(\bar e)\tau_M}{1 - q_M(\bar e)(1 - \tau_M)} \right\} \left\{ \sum_{e' > \bar e} n^{y^1}_{\bar e + 1, e'}\, r(e', y(e')) \right\}. \tag{2.12} \]
One interpretation of the formula for $C(\bar e)$ is that the two terms in (2.11) are conditional expected revenues associated with progressing to engagement level $\bar e + 1$, conditioned on the event that the player does not stay in engagement level $\bar e$ (by either quitting or advancing). Thus, $C(\bar e)$ is the change in conditional expected revenue from offering incented actions. There is a similar interpretation of the expression
\[ \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1 - p_M(\bar e)(1 - \tau_M) - p_I(\bar e)(1 - \tau_I)} - \frac{q_M(\bar e)\tau_M}{1 - q_M(\bar e)(1 - \tau_M)} \tag{2.13} \]
in the definition of $F(\bar e)$. Both terms in (2.13) are conditional probabilities of progressing from engagement level $\bar e$ to engagement level $\bar e + 1$, conditioned on the event that the player does not stay in engagement level $\bar e$ (by either quitting or advancing).
Thus, $F(\bar e)$ can be seen as the product of a term representing the change in the conditional probability of progressing from engagement level $\bar e$ to engagement level $\bar e + 1$ and the sum of revenues from expected visits from state $\bar e + 1$ to the higher engagement levels.

These expressions turn out to be quite useful in later development and numerical examples. For now, we want to provide some intuition behind what drives the benefits of offering incented actions, both current and future, that is not easily gleaned from these detailed formulas. In particular, we provide precise identification of the three effects of incented actions that were discussed informally in the introduction. To this end, we introduce the notation:
\[ \Delta r(e \mid \bar e) := r(e, y^2_{\bar e}(e)) - r(e, y^1_{\bar e}(e)), \tag{2.14} \]
which expresses the change in the expected revenue per visit to engagement level $e$, and
\[ \Delta n(e \mid \bar e) := n^2_{\bar e,e} - n^1_{\bar e,e}, \tag{2.15} \]
which expresses the change in the number of expected visits to engagement level $e$ (starting at engagement level $\bar e$) before quitting.

Note that $\Delta r(e \mid \bar e) = 0$ for $e \neq \bar e$ since we are only considering a local change in policy at engagement level $\bar e$. On the other hand,
\[ \Delta r(\bar e \mid \bar e) = -(q_M(\bar e) - p_M(\bar e))\mu_M + p_I(\bar e)\mu_I. \tag{2.16} \]
The latter value is called the revenue effect, as it expresses the change in the revenue per visit to the starting engagement level $\bar e$. The retention effect is the value $\Delta n(\bar e \mid \bar e)$ and expresses the change in the number of visits to the starting engagement level $\bar e$. Lastly, we refer to the value $\Delta n(e \mid \bar e)$ for $e > \bar e$ as the progression effect at engagement level $e$. At first blush it may seem possible for the progression effect to differ in sign at different engagement levels, but the following result shows that the progression effect is, in fact, uniform in sign.

Proposition 2.4.1. Under Assumptions 2.3.1–2.3.4, the progression effect is uniform in sign; that is, either $\Delta n(e \mid \bar e) \geq 0$ for all $e \neq \bar e$ or $\Delta n(e \mid \bar e) \leq 0$ for all $e \neq \bar e$.
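The three effects can also be evaluated numerically. The following Python sketch is an illustration rather than part of the original analysis: it computes expected visit counts by exploiting the fact that engagement never decreases (each level is visited a geometric number of times once reached), and then forms $\Delta r$ and $\Delta n$ for a pair of policies that differ only at $\bar e$. The function names and the three-level parameter values are hypothetical, chosen only so that Assumptions 2.3.3 and 2.3.4 hold.

```python
def level_quantities(p_m, p_i, alpha, offer, is_top, tau_m, tau_i, mu_m, mu_i):
    """Return (reward per visit, P(stay), P(advance)) at one engagement level."""
    if offer:
        mon, inc = p_m, p_i
    else:
        # Assumption 2.3.1: q_M = p_M + alpha * p_I, and no incented action.
        mon, inc = p_m + alpha * p_i, 0.0
    reward = mon * mu_m + inc * mu_i            # equation (2.6)
    if is_top:
        return reward, mon + inc, 0.0           # at level N the player stays or quits
    stay = mon * (1 - tau_m) + inc * (1 - tau_i)
    advance = mon * tau_m + inc * tau_i
    return reward, stay, advance


def visits_and_rewards(levels, policy, start, tau_m, tau_i, mu_m, mu_i):
    """Expected visits n[e] and per-visit rewards r[e] for all e >= start."""
    top = len(levels) - 1
    visits, rewards = {}, {}
    reach = 1.0  # probability of ever reaching the current level from `start`
    for e in range(start, top + 1):
        p_m, p_i, alpha = levels[e]
        r, stay, advance = level_quantities(
            p_m, p_i, alpha, policy[e], e == top, tau_m, tau_i, mu_m, mu_i)
        visits[e] = reach / (1 - stay)          # geometric number of visits once reached
        rewards[e] = r
        reach *= advance / (1 - stay)           # leave level e upward rather than by quitting
    return visits, rewards


def effects(levels, base_policy, e_bar, tau_m, tau_i, mu_m, mu_i):
    """Revenue, retention and progression effects of a local change at e_bar."""
    y1 = list(base_policy)
    y1[e_bar] = 0
    y2 = list(base_policy)
    y2[e_bar] = 1
    n1, r1 = visits_and_rewards(levels, y1, e_bar, tau_m, tau_i, mu_m, mu_i)
    n2, r2 = visits_and_rewards(levels, y2, e_bar, tau_m, tau_i, mu_m, mu_i)
    revenue = r2[e_bar] - r1[e_bar]             # Delta r(e_bar | e_bar), cf. (2.16)
    retention = n2[e_bar] - n1[e_bar]           # Delta n(e_bar | e_bar)
    progression = {e: n2[e] - n1[e] for e in n1 if e > e_bar}
    return revenue, retention, progression


# Hypothetical data: (p_M, p_I, alpha) per engagement level, levels 0..2.
LEVELS = [(0.05, 0.60, 0.4), (0.15, 0.55, 0.6), (0.25, 0.50, 0.8)]
PARAMS = dict(tau_m=0.7, tau_i=0.2, mu_m=1.0, mu_i=0.2)

revenue, retention, progression = effects(LEVELS, (0, 0, 0), 0, **PARAMS)
print("revenue effect:", revenue)
print("retention effect:", retention)
print("progression effects:", progression)
```

Because the policy only changes at $\bar e$, every progression effect in this sketch is the same difference in "reach" probabilities multiplied by positive factors, which is one way to see why the sign is uniform across higher levels.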
The intuition for the above result is simple. There is only a policy change at the starting engagement level $\bar e$. Thus, the probability of advancing from engagement level $e$ to engagement level $e + 1$ is the same for policies $y^1_{\bar e}$ and $y^2_{\bar e}$ for $e > \bar e$. Hence, if $\Delta n(\bar e + 1 \mid \bar e)$ is positive then $\Delta n(e \mid \bar e)$ is positive for $e > \bar e + 1$, since there will be more visits to engagement level $\bar e + 1$ and thus more visits to higher engagement levels, given that the transition probabilities at higher engagement levels are unchanged. In light of this proposition we may refer to the progression effect generally (without reference to a particular engagement level).

If both the revenue effect and the retention effect are positive, $C(\bar e)$ in (2.10) is positive and there is a net increase in revenue due to visits to engagement level $\bar e$. Similarly, if both effects are negative then $C(\bar e)$ is negative. When one effect is positive and the other is negative, the sign of $C(\bar e)$ is unclear. The sign of $F(\bar e)$ is completely determined by the direction of the progression effect.

One practical motivation for incented actions is that relatively few players monetize in practice, so by opening up another channel of revenue the publisher is able to earn more from its players. Indeed, if $q_M(\bar e)$ and $p_M(\bar e)$ are small (say, on the order of 2%) then the first term in the revenue effect (2.16) is insignificant when compared to the second term $p_I(\bar e)\mu_I$, and so the revenue effect is most likely to be positive at low engagement levels. Moreover, having an incented action as an alternative to monetizing also suggests that it will keep players from quitting and build their commitment to playing the game. This motivation suggests that the retention and progression effects are also likely to be positive, particularly at early engagement levels when players are most likely to quit and least likely to invest money into playing a game.

However, our current assumptions do not fully capture the above logic.
It is straightforward to construct specific scenarios that satisfy Assumptions 2.3.1–2.3.4 where the revenue and progression effects are negative even at low engagement levels (see Example 2 below). Further refinements are needed (see Section 2.5 for further assumptions). This complexity is somewhat unexpected, given the parsimony of the model and the structure already placed on the problem. Indeed, the assumptions do reveal a certain structure, as demonstrated in the following result.

Proposition 2.4.2. Under Assumptions 2.3.1–2.3.4, the retention effect is always nonnegative; that is, $\Delta n(\bar e|\bar e) \geq 0$.

There are two separate reasons why offering incented actions at engagement level $\bar e$ changes the number of visits to $\bar e$. The first comes from the fact that the quitting probability at engagement level $\bar e$ goes down from $q_Q(\bar e)$ to $p_Q(\bar e)$. The second is that the probability of progressing to a higher engagement level also changes, from $q_M(\bar e)\tau_M$ to $p_M(\bar e)\tau_M + p_I(\bar e)\tau_I$, when offering an ad. Intuitively, the overall effect is somewhat unclear. However, the proposition reveals that the net effect is always nonnegative as a consequence of our assumptions. Observe that the probability of staying in engagement level $\bar e$ always increases when an incented action is offered:
\[ p_M(\bar e)(1 - \tau_M) + p_I(\bar e)(1 - \tau_I) - q_M(\bar e)(1 - \tau_M) = p_I(\bar e)\bigl(-\alpha(\bar e)(1 - \tau_M) + (1 - \tau_I)\bigr) > 0. \]
However, it is not necessarily desirable for there to be more visits to engagement level $\bar e$ if this comes primarily at the expense of visits to more lucrative engagement levels. We must, in addition, consider the future benefits of the change in policy. The following examples illustrate how these current and future benefits can be in opposing directions, making it qualitatively more difficult to decide whether to offer an incented action.

Example 2. Consider the following example with two engagement levels. Assume $\mu_M = 1$, $\mu_I = 0.25$, $\tau_M = 0.8$, $\tau_I = 0.1$.
At level 0, $p_M(0) = 0.05$, $p_I(0) = 0.65$, $\alpha(0) = 0.5$ and thereby $q_M(0) = 0.375$. At level 1, $p_M(1) = 0.2$, $p_I(1) = 0.6$, $\alpha(1) = 0.75$ and thereby $q_M(1) = 0.65$. At level 0, the revenue effect is
\[ \Delta r(0|0) = p_M(0)\mu_M + p_I(0)\mu_I - q_M(0)\mu_M = 0.05(1) + 0.65(0.25) - 0.375(1) = -0.1625, \]
while the retention effect is
\[ \Delta n(0|0) = \frac{1}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} - \frac{1}{1 - q_M(0)(1-\tau_M)} = \frac{1}{0.405} - \frac{1}{0.925} = 1.388. \]
Therefore,
\[ C(0) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} - \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} = \frac{0.2125}{0.405} - \frac{0.375}{0.925} = 0.1193 > 0. \tag{2.17} \]
Suppose $y^1(1) = y^2(1) = 0$. The progression effect is
\[ \Delta n(1|0) = \frac{p_M(0)\tau_M + p_I(0)\tau_I}{\bigl(1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)\bigr)\, q_Q(1)} - \frac{q_M(0)\tau_M}{\bigl(1 - q_M(0)(1-\tau_M)\bigr)\, q_Q(1)} \tag{2.18} \]
\[ = \frac{0.105}{0.405(0.35)} - \frac{0.3}{0.925(0.35)} = -0.1859. \tag{2.19} \]
As a result,
\[ F(0) = \Delta n(1|0)\, q_M(1)\mu_M = -0.1859(0.65) = -0.1207 < 0. \tag{2.20} \]
Notice that $\tau_I/\tau_M = 0.125$ but $\mu_I/\mu_M = 0.25$, and therefore $\tau_I$ is much smaller than $\tau_M$. This implies that the progression effect is negative and so $F(0)$ is negative. But $C(0)$ is positive since the retention effect is dominant. In other words, although the “current” benefits of offering an incented action at engagement level 0 are positive, these gains are outweighed by losses in “future” benefits.

Example 3. Consider the following example with two engagement levels. Assume $\mu_M = 1$, $\mu_I = 0.05$, $\tau_M = 0.8$, $\tau_I = 0.3$. At level 0, $p_M(0) = 0.05$, $p_I(0) = 0.65$, $\alpha(0) = 0.5$ and thereby $q_M(0) = 0.375$. At level 1, $p_M(1) = 0.2$, $p_I(1) = 0.6$, $\alpha(1) = 0.75$ and thereby $q_M(1) = 0.65$. At level 0, the revenue effect is
\[ \Delta r(0|0) = p_M(0)\mu_M + p_I(0)\mu_I - q_M(0)\mu_M = 0.05(1) + 0.65(0.05) - 0.375(1) = -0.2925, \]
while the retention effect is
\[ \Delta n(0|0) = \frac{1}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} - \frac{1}{1 - q_M(0)(1-\tau_M)} = \frac{1}{0.535} - \frac{1}{0.925} = 0.788. \]
Therefore,
\[ C(0) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} - \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} = \frac{0.0825}{0.535} - \frac{0.375}{0.925} = -0.2512 < 0. \tag{2.21} \]
Suppose $y^1(1) = y^2(1) = 0$. The progression effect is
\[ \Delta n(1|0) = \frac{p_M(0)\tau_M + p_I(0)\tau_I}{\bigl(1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)\bigr)\, q_Q(1)} - \frac{q_M(0)\tau_M}{\bigl(1 - q_M(0)(1-\tau_M)\bigr)\, q_Q(1)} \tag{2.22} \]
\[ = \frac{0.235}{0.535(0.35)} - \frac{0.3}{0.925(0.35)} = 0.3284, \tag{2.23} \]
hence
\[ F(0) = \Delta n(1|0)\, q_M(1)\mu_M = 0.3284(0.65) = 0.2134 > 0. \tag{2.24} \]
In contrast to the previous example, there are “current” losses and “future” gains to be had by offering incented actions at engagement level 0 but, similar to that example, the overall verdict is that it is better not to offer incented actions.

These examples underscore that it is a nontrivial task to assess the optimality of an incented action policy. Whether and how to offer incented actions depends on the specifics of the game and must weigh how the current and future benefits of incented actions, described in terms of the three effects, change as the engagement level evolves. This is the task of the next section.

2.5 Optimal Policies for The Publisher

Recall the publisher’s problem described in (2.9). This is a dynamic optimization problem where the publisher must decide whether to deploy incented actions at each engagement level, with the knowledge that a change in policy at one engagement level can affect the behavior of the player at subsequent engagement levels. This “forward-looking” nature adds a great deal of complexity to the problem. A much simpler task would be to examine each engagement level in isolation, implying that the publisher need only consider term (i) of (2.10) at engagement level $e$ to decide if $y(e) = 1$ or $y(e) = 0$ provides more revenue. A policy built in this way is called myopically optimal. More precisely, policy $y$ is myopically optimal if $y(e) = 1$ when $C(e) > 0$ and $y(e) = 0$ when $C(e) < 0$.

A myopically optimal policy need not be optimal because it fails to consider future impacts, which can be significant (see Example 2). However, the next result gives a sufficient condition for a myopically optimal policy to be optimal.

Proposition 2.5.1.
Under Assumptions 2.3.1–2.3.4, if $\frac{\mu_I}{\mu_M} = \frac{\tau_I}{\tau_M}$ then a myopically optimal policy is optimal.

This result is best understood by looking at the two terms in the change in revenue formula (2.10) discussed in the previous section. It is straightforward to see from (2.11) and (2.12) that when $\tau_I = \mu_I$ and $\tau_M = \mu_M$ the signs of $C(\bar e)$ and $F(\bar e)$ are identical. That is, if the current benefit of offering the incented action has the same sign as the future benefit of offering the action, then it suffices to consider only the first term $C(\bar e)$ when determining an optimal policy. Given our interpretation of $C(\bar e)$ and $F(\bar e)$, the conditions of Proposition 2.5.1 imply that the conditional expected revenue from progressing one engagement level precisely equals the conditional probability of progressing one engagement level. This is a rather restrictive condition.

Since we know of only the above strict condition under which an optimal policy is myopic, in general we are in search of forward-looking optimal policies. Since the game publisher’s problem is a Markov decision process, an optimal forward-looking policy $y$ must satisfy the optimality equations for $e = 0, \ldots, N - 1$,
\[ W^y(e) = \begin{cases} r(e, 1) + P_1(e|e)\,W^y(e) + P_1(e+1|e)\,W^y(e+1) & \text{if } y(e) = 1, \\ r(e, 0) + P_0(e|e)\,W^y(e) + P_0(e+1|e)\,W^y(e+1) & \text{if } y(e) = 0, \end{cases} \tag{2.25} \]
and for $e = N$,
\[ W^y(N) = \begin{cases} r(N, 1) + P_1(N|N)\,W^y(N) & \text{if } y(N) = 1, \\ r(N, 0) + P_0(N|N)\,W^y(N) & \text{if } y(N) = 0, \end{cases} \tag{2.26} \]
where $P_1$ and $P_0$ are as defined in (2.4) and (2.5) respectively. The above structure shows that an optimal policy can be constructed by backward induction (for details see Chapter 4 of Puterman [46]): first determine an optimal choice of $y(N)$ and then successively find optimal choices for $y(N-1), \ldots, y(1)$ and finally $y(0)$.
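The backward induction just described can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the class `Level` and the functions `level_terms` and `backward_induction` are hypothetical names, and the transition structure (stay, advance, quit) is an assumption read off the worked examples in this chapter, since (2.4) and (2.5) are not restated here. Ties are resolved toward not offering the incented action, which is an implementation choice in this sketch. The data are those of Example 4.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Level:
    """Player parameters at one engagement level (illustrative container)."""
    p_M: float    # probability of monetizing when the incented action is offered
    p_I: float    # probability of taking the incented action when it is offered
    alpha: float  # share of p_I that shifts to monetizing when the action is removed


def level_terms(lv: Level, y: int, mu_M: float, mu_I: float,
                tau_M: float, tau_I: float, top: bool) -> Tuple[float, float, float]:
    """Return (revenue per visit, P(stay), P(advance)) at one level under choice y.

    Assumed structure: an action advances the player with probability tau,
    otherwise the player stays; at the top level every action keeps the player there.
    """
    if y == 1:
        rev = lv.p_M * mu_M + lv.p_I * mu_I
        adv = lv.p_M * tau_M + lv.p_I * tau_I
        act = lv.p_M + lv.p_I
    else:
        q_M = lv.p_M + lv.alpha * lv.p_I
        rev, adv, act = q_M * mu_M, q_M * tau_M, q_M
    if top:
        return rev, act, 0.0
    return rev, act - adv, adv


def backward_induction(levels: List[Level], mu_M: float, mu_I: float,
                       tau_M: float, tau_I: float) -> Tuple[List[int], List[float]]:
    """Solve W(e) = max_y (r + P(advance) * W(e+1)) / (1 - P(stay)) from e = N down to 0."""
    n = len(levels)
    W = [0.0] * (n + 1)          # W[n] is a zero sentinel above the top level
    policy = [0] * n
    for e in range(n - 1, -1, -1):
        best = None
        for y in (0, 1):         # y = 0 is tried first, so ties go to not offering
            rev, stay, adv = level_terms(levels[e], y, mu_M, mu_I, tau_M, tau_I,
                                         top=(e == n - 1))
            value = (rev + adv * W[e + 1]) / (1.0 - stay)
            if best is None or value > best:
                best, policy[e] = value, y
        W[e] = best
    return policy, W[:n]


# Data of Example 4 (two engagement levels).
example4 = [Level(0.05, 0.65, 0.50), Level(0.20, 0.60, 0.55)]
policy, W = backward_induction(example4, mu_M=1.0, mu_I=0.05, tau_M=0.5, tau_I=0.4)
print(policy, [round(w, 3) for w in W])
```

Solving each level's equation for its own value, rather than iterating to a fixed point, works here because a player can only stay, advance by one level, or quit.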
We use the notation $W(e)$ to denote the optimal revenue possible with a player starting at engagement level $e$, called the optimal value function. In addition we use the notation $W(e, y = 1)$ to denote the optimal expected total revenue possible when an incented action is offered at starting engagement level $e$. Similarly, we let $W(e, y = 0)$ denote the optimal expected revenue possible when an incented action is not offered at starting engagement level $e$. Then $W(e)$ must satisfy Bellman’s equation for $e = 0, \ldots, N - 1$:
\[ W(e) = \max\{W(e, y = 1),\, W(e, y = 0)\} = \max\bigl\{ r(e, 1) + P_1(e|e)W(e) + P_1(e+1|e)W(e+1),\; r(e, 0) + P_0(e|e)W(e) + P_0(e+1|e)W(e+1) \bigr\}. \tag{2.27} \]
A key fact that we leverage throughout our development is the following.

Theorem 2.5.2. Under Assumptions 2.3.1–2.3.4, $W(e)$ is a nondecreasing function of $e$.

This result underscores the value of having players progress in engagement with the game. The higher the engagement of a player, the more revenue can be extracted from them. This result has a natural intuition. Indeed, if the theorem were not true it would even suggest that our use of the word “engagement” to describe the states would be ill-placed. The result thus serves as an important reality check and goes towards establishing the validity of our modeling approach.

The focus of our discussion is on optimal forward threshold policies that start by offering incented actions. Such a threshold policy $y$ is determined by a single engagement level $\bar e$ where $y(e') = 1$ for $e' \leq \bar e$ and $y(e') = 0$ for $e' > \bar e$. According to (2.27) this happens when $W(\bar e + 1, y = 1) \leq W(\bar e + 1, y = 0)$ implies $W(e', y = 1) \geq W(e', y = 0)$ for all $e' \leq \bar e$ and $W(e', y = 1) < W(e', y = 0)$ for all $e' > \bar e + 1$. In the general nomenclature of Markov decision processes other policies would also be classified as threshold policies. This includes policies that start with not offering the incented action until some point and thereafter offering the incented action.
We call these policies backward threshold.

Our interest in forward threshold policies comes from the following appealing practical logic, already hinted at in the introduction. When players start out playing a game their engagement level is low and they are likely to quit. Indeed, Theorem 2.5.2 says we get more value out of players at higher levels of engagement. Hence, retaining players at early stages and progressing them to higher levels of engagement is important for overall revenue. In Proposition 2.4.2, we see the retention effect of offering incented actions is always nonnegative, and intuitively, the revenue and progression effects are largest at low levels of engagement because players are unlikely to monetize early on and the benefits derived from increasing player engagement are likely to be at their greatest. This suggests it is optimal to offer incented actions at low levels of engagement. However, once players are sufficiently engaged it might make sense to remove incented actions to focus their attention on the monetization option. If the player is sufficiently engaged and $\alpha(e)$ is sufficiently large, most of the probability of taking the incented action shifts to monetizing, which drives greater revenue.

Despite this appealing logic, the following example shows that our current set of assumptions (Assumptions 2.3.1–2.3.4) is insufficient to guarantee the existence of an optimal forward threshold policy.

Example 4. Consider the following example with two engagement levels. Assume $\mu_M = 1$, $\mu_I = 0.05$, $\tau_M = 0.5$, $\tau_I = 0.4$. At level 0, $p_M(0) = 0.05$, $p_I(0) = 0.65$, $\alpha(0) = 0.5$ and thereby $q_M(0) = 0.375$. At level 1, $p_M(1) = 0.2$, $p_I(1) = 0.6$, $\alpha(1) = 0.55$ and thereby $q_M(1) = 0.53$. We solve for the optimal policy by backward induction. At level 1,
\[ W(1, y = 1) = \frac{p_M(1)\mu_M + p_I(1)\mu_I}{1 - p_M(1) - p_I(1)} = \frac{0.23}{0.2} = 1.15 \]
while
\[ W(1, y = 0) = \frac{q_M(1)\mu_M}{1 - q_M(1)} = \frac{0.53}{0.47} \approx 1.13. \]
Therefore, $y^*(1) = 1$ and $W(1) = \max\{W(1, y = 1), W(1, y = 0)\} = 1.15$. At level 0,
\[ W(0, y = 1) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} + \frac{p_M(0)\tau_M + p_I(0)\tau_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)}\, W(1) \tag{2.28} \]
\[ = \frac{0.0825}{0.585} + \frac{0.285}{0.585}(1.15) = 0.141 + 0.56 = 0.701, \tag{2.29} \]
\[ W(0, y = 0) = \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} + \frac{q_M(0)\tau_M}{1 - q_M(0)(1-\tau_M)}\, W(1) \tag{2.30} \]
\[ = \frac{0.375}{0.8125} + \frac{0.1875}{0.8125}(1.15) = 0.462 + 0.265 = 0.727, \tag{2.31} \]
hence $y^*(0) = 0$ and $W(0) = \max\{W(0, y = 1), W(0, y = 0)\} = 0.727$.

Next we show that $y^* = (0, 1)$ is the only optimal policy. In fact, we compute $W^y(0)$ and $W^y(1)$ under all possible policies in the following table.

Policy           W^y(0)   W^y(1)
y = (0, 0)       0.723    1.13
y = (1, 0)       0.691    1.13
y = (1, 1)       0.701    1.15
y* = (0, 1)      0.727    1.15

Table 2.2: Total expected profit for Example 4.

We observe that none of $(0, 0)$, $(1, 0)$ and $(1, 1)$ is optimal. This implies $y^*$ is the only optimal policy. Since $y^*$ is not a forward threshold policy, there is no optimal forward threshold policy. Thus we see it is optimal to offer incented actions at the higher engagement level because of the dramatic reduction in the quitting probability: 0.2 when offering incented actions compared to 0.47 when not offering them. Although the expected revenue per period the player stays at the highest engagement level is lower when incented actions are offered (0.23 as compared to 0.53), the player will stay longer and thus earn additional revenue. However, at the lowest engagement level the immediate reward of not offering incented actions (0.462 versus 0.141) outweighs the losses due to a lower chance of advancing to the higher engagement level.

The goal for the remainder of this section is to devise additional assumptions that are relevant to the settings of interest to our paper and that guarantee the existence of an optimal forward threshold policy.
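The enumeration behind Table 2.2 can be reproduced with a short brute-force sketch. The function `evaluate` and the constants below are hypothetical names introduced for illustration, and the stay/advance/quit structure is the one assumed from the worked formulas of Example 4. Exact arithmetic agrees with the table for the policies (1, 1) and (0, 1); for (0, 0) and (1, 0) the third decimal may differ slightly, presumably because the table carries a rounded intermediate value.

```python
from itertools import product

# Example 4 data; (p_M, p_I, alpha) for each engagement level.
MU_M, MU_I, TAU_M, TAU_I = 1.0, 0.05, 0.5, 0.4
LEVELS = [(0.05, 0.65, 0.50), (0.20, 0.60, 0.55)]


def evaluate(policy):
    """Return [W^y(0), ..., W^y(N)] for a fixed policy y (no optimization)."""
    n = len(LEVELS)
    values = [0.0] * (n + 1)     # zero sentinel above the top level
    for e in range(n - 1, -1, -1):
        p_M, p_I, alpha = LEVELS[e]
        if policy[e] == 1:       # incented action offered
            rev = p_M * MU_M + p_I * MU_I
            adv = p_M * TAU_M + p_I * TAU_I
            act = p_M + p_I
        else:                    # incented action removed
            q_M = p_M + alpha * p_I
            rev, adv, act = q_M * MU_M, q_M * TAU_M, q_M
        if e == n - 1:           # top level: every action keeps the player there
            stay, adv = act, 0.0
        else:
            stay = act - adv
        values[e] = (rev + adv * values[e + 1]) / (1.0 - stay)
    return values[:n]


table = {y: evaluate(y) for y in product((0, 1), repeat=len(LEVELS))}
best = max(table, key=lambda y: table[y][0])   # best policy for a player starting at level 0

for y, vals in sorted(table.items()):
    print(y, [round(v, 3) for v in vals])
print("best:", best)
```

In this instance the policy that is best from level 0 also attains the largest value at level 1, so ranking by the level 0 value is enough to identify the optimum.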
Example 4 shows how $\alpha$ plays a key role in determining whether a threshold policy is optimal or not. When incented actions are removed, the probability $p_I(e)$ is distributed to the monetization and quitting actions according to $\alpha(e)$. The associated increase in the probability of monetizing from $p_M(e)$ to $q_M(e)$ makes removing incented actions attractive, since the player is more likely to pay. However, the quitting probability increases from $p_Q(e)$ to $q_Q(e)$, a downside of removing incented actions. Intuitively speaking, if $\alpha(e)$ grows sufficiently quickly, the benefits of removing incented actions will outweigh the costs. From Assumption (A3.6) we know that $\alpha(e)$ increases, but this alone is insufficient. Just how quickly we require $\alpha(e)$ to grow to ensure a threshold policy requires careful analysis. This analysis results in lower bounds on the growth of $\alpha(e)$ that culminate in Theorem 2.5.8 below.

Our first assumption on $\alpha(e)$ is a basic one:

Assumption 2.5.3. $\alpha(N) = 1$; that is, $q_Q(N) = p_Q(N)$ and $q_M(N) = p_M(N) + p_I(N)$.

It is straightforward to see that under this assumption it is never optimal to offer incented actions at the highest engagement level. This assumption also serves as an interpretation of what it means to be in the highest engagement level: players who are maximally engaged are no more likely to quit when the incented action is removed. Under this assumption, and by Bellman’s equation (2.27), every optimal policy $y^*$ has $y^*(N) = 0$. Note that this excludes the scenario in Example 4 and also implies that backward threshold policies are not optimal (except possibly the policy with $y(e) = 0$ for all $e \in E$, which is both a backward and a forward threshold policy). Given this, we restrict attention to forward threshold policies and drop the modifier “forward” in the rest of our development.

The next step is to establish further sufficient conditions on the data that ensure that once the revenue, retention and progression effects are negative, they stay negative. As in Section 2.4, we consider paired policies $y^1_{\bar e}$ and $y^2_{\bar e}$ with a local change at $\bar e$. Recall the notation $\Delta r(e|\bar e)$ and $\Delta n(e|\bar e)$ defined in (2.14) and (2.15), respectively. We are concerned with how $\Delta r(e|\bar e)$ and $\Delta n(e|\bar e)$ change with the starting engagement level $\bar e$. It turns out that the revenue effect $\Delta r(\bar e|\bar e)$ always behaves in a way that is consistent with a threshold policy, without any additional assumptions.

Proposition 2.5.4. Suppose Assumptions 2.3.1–2.3.4 hold. For every engagement level $\bar e$ let $y^1_{\bar e}$ and $y^2_{\bar e}$ be paired policies with a local change at $\bar e$. Then the revenue effect $\Delta r(\bar e|\bar e)$ is nonincreasing in $\bar e$ when $\Delta r(\bar e|\bar e) \geq 0$. Moreover, if $\Delta r(\bar e|\bar e) < 0$ for some $\bar e$ then $\Delta r(e'|e') < 0$ for all $e' \geq \bar e$.

This proposition says that the net revenue gain per visit to engagement level $\bar e$ is likely to be positive (if it is ever positive) only at lower engagement levels, confirming our basic intuition that incented actions can drive revenue from low engagement levels, but less so from highly engaged players. To show a similar result for the progression effect we make the following assumption.

Assumption 2.5.5. $\alpha(e + 1) - \alpha(e) > q_M(e + 1) - q_M(e)$ for all $e \in E$.

This provides our first general lower bound on the growth of $\alpha(e)$. It says that $\alpha(e)$ must grow faster than the probability $q_M(e)$ of monetizing when the incented action is not offered.

Proposition 2.5.6. Suppose Assumptions 2.3.1–2.5.5 hold.
For every engagement level $\bar e$ let $y^1_{\bar e}$ and $y^2_{\bar e}$ be paired policies with a local change at $\bar e$ such that $y^1_{\bar e}$ and $y^2_{\bar e}$ are identical to some fixed policy $y$ (fixed in the sense that $y$ is not a function of $\bar e$) except at engagement level $\bar e$. Then

(a) If $\Delta n(e|\bar e) < 0$ for some $\bar e$ then $\Delta n(e|e') < 0$ for all $e' \geq \bar e$.

(b) If $C(\bar e) < 0$ for some $\bar e$ then $C(e') < 0$ for all $e' \geq \bar e$, where $C$ is as defined in (2.10).

This result implies that once the current and future benefits of offering an incented action are negative, they stay negative for higher engagement levels. Indeed, Proposition 2.5.6(a) ensures that the future benefits $F$ in (2.10) stay negative once negative, while (b) ensures the current benefits $C$ stay negative once negative. In other words, once the game publisher stops offering incented actions it is never optimal for them to return. Note that Proposition 2.5.4 does not immediately imply Proposition 2.5.6(b); Assumption 2.5.5 is needed to ensure the retention effect has similar properties, as guaranteed by Proposition 2.5.6(a) for $e' = \bar e$.

As mentioned above, the conditions established in Proposition 2.5.6 are necessary for the existence of an optimal threshold policy, but do not imply that a threshold policy exists. This is due to the fact that $C$ and $F$ in (2.10) may not switch sign from positive to negative at the same engagement level. This is illustrated in the following example.

Example 5. Consider the following example with three engagement levels. Assume $\mu_M = 1$, $\mu_I = 0.2$, $\tau_M = 0.91$, $\tau_I = 0.47$. At level 0, $p_M(0) = 0.03$, $p_I(0) = 0.51$, $\alpha(0) = 0.59$ and thereby $q_M(0) = 0.3309$. At level 1, $p_M(1) = 0.05$, $p_I(1) = 0.5$, $\alpha(1) = 0.62$ and thereby $q_M(1) = 0.36$. At level 2, $p_M(2) = 0.34$, $p_I(2) = 0.45$, $\alpha(2) = 1$ and thereby $q_M(2) = 0.79$.

The optimal policy is $y^* = (0, 1, 0)$. We use backward induction. At the highest level 2, we have $y^*(2) = 0$ and $W(2) = 0.79/0.21 = 3.7619$.
At level 1,
\[ W(1, y = 1) = \frac{p_M(1)\mu_M + p_I(1)\mu_I}{1 - p_M(1)(1-\tau_M) - p_I(1)(1-\tau_I)} + \frac{p_M(1)\tau_M + p_I(1)\tau_I}{1 - p_M(1)(1-\tau_M) - p_I(1)(1-\tau_I)} \cdot \frac{q_M(2)\mu_M}{q_Q(2)} \]
\[ = \frac{0.15}{0.7305} + \frac{0.2805}{0.7305}(3.7619) = 0.2053 + 0.3840(3.7619) = 1.6498, \]
\[ W(1, y = 0) = \frac{q_M(1)\mu_M}{1 - q_M(1)(1-\tau_M)} + \frac{q_M(1)\tau_M}{1 - q_M(1)(1-\tau_M)} \cdot \frac{q_M(2)\mu_M}{q_Q(2)} \]
\[ = \frac{0.36}{0.9676} + \frac{0.3276}{0.9676}(3.7619) = 0.3721 + 0.3386(3.7619) = 1.6459, \]
therefore $y^*(1) = 1$ and $W(1) = W(1, y = 1) = 1.6498$. Moreover, $C(1) = 0.2053 - 0.3721 = -0.1668$ and $F(1) = (0.3840 - 0.3386)(3.7619) = 0.0454(3.7619) = 0.1708$. Finally, we look at level 0:
\[ W(0, y = 1) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} + \frac{p_M(0)\tau_M + p_I(0)\tau_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)}\, W(1) \]
\[ = \frac{0.1320}{0.7270} + \frac{0.2670}{0.7270}(1.6498) = 0.1816 + 0.3673(1.6498) = 0.7876, \]
\[ W(0, y = 0) = \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} + \frac{q_M(0)\tau_M}{1 - q_M(0)(1-\tau_M)}\, W(1) \]
\[ = \frac{0.3309}{0.9702} + \frac{0.3011}{0.9702}(1.6498) = 0.3411 + 0.3104(1.6498) = 0.8532. \]
As we can see, $y^*(0) = 0$ and $W(0) = W(0, y = 0) = 0.8532$. Besides, $C(0) = 0.1816 - 0.3411 = -0.1595$ and $F(0) = (0.3673 - 0.3104)(1.6498) = 0.0569(1.6498) = 0.0939$. The optimal policy is not a threshold policy.

In fact, Assumption 2.5.5 is satisfied because $\alpha(1) - \alpha(0) = 0.62 - 0.59 = 0.03$ while $q_M(1) - q_M(0) = 0.3600 - 0.3309 = 0.0291$.

We thus require one additional assumption:

Assumption 2.5.7. $1 - \alpha(e+1) \leq (1 - \alpha(e))\, \dfrac{p_M(e+1)\tau_M + p_I(e+1)\tau_I}{p_Q(e+1) + p_M(e+1)\tau_M + p_I(e+1)\tau_I}$ for $e = 1, 2, \ldots, N - 1$.

Note that the fractional term in the assumption is the probability of advancing from engagement level $e + 1$ to $e + 2$ conditioned on leaving engagement level $e + 1$ and is thus less than one. Hence, this is yet another lower bound on the rate of growth in $\alpha(e)$, complementing Assumptions 2.5.3 and 2.5.5. Which of the bounds in Assumptions 2.5.5 or 2.5.7 is tighter depends on the data specifications that arise from specific game settings.
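Both growth bounds can be evaluated numerically for a given pair of adjacent levels. The sketch below computes the two sides of each inequality at a single index $e$, using the data of Example 5 at $e = 0$, the pair of levels examined in the text. The function names are hypothetical, and $p_Q(e) = 1 - p_M(e) - p_I(e)$ is assumed from the worked examples.

```python
def assumption_255_sides(p_M, p_I, alpha, e):
    """Return (alpha(e+1) - alpha(e), q_M(e+1) - q_M(e)); Assumption 2.5.5 needs left > right."""
    q_M = [pm + a * pi for pm, pi, a in zip(p_M, p_I, alpha)]
    return alpha[e + 1] - alpha[e], q_M[e + 1] - q_M[e]


def assumption_257_sides(p_M, p_I, alpha, tau_M, tau_I, e):
    """Return (1 - alpha(e+1), bound); Assumption 2.5.7 needs left <= bound."""
    adv = p_M[e + 1] * tau_M + p_I[e + 1] * tau_I   # P(advance) at level e+1 when offered
    p_Q = 1.0 - p_M[e + 1] - p_I[e + 1]             # P(quit) at level e+1 when offered (assumed form)
    return 1.0 - alpha[e + 1], (1.0 - alpha[e]) * adv / (p_Q + adv)


# Example 5 data, levels 0, 1, 2.
p_M = [0.03, 0.05, 0.34]
p_I = [0.51, 0.50, 0.45]
alpha = [0.59, 0.62, 1.00]
tau_M, tau_I = 0.91, 0.47

lhs5, rhs5 = assumption_255_sides(p_M, p_I, alpha, e=0)
lhs7, rhs7 = assumption_257_sides(p_M, p_I, alpha, tau_M, tau_I, e=0)
print("2.5.5 at e=0:", round(lhs5, 4), ">", round(rhs5, 4), "->", lhs5 > rhs5)
print("2.5.7 at e=0:", round(lhs7, 4), "<=", round(rhs7, 4), "->", lhs7 <= rhs7)
```

Returning both sides rather than a single Boolean makes it easy to see how much slack each bound has for a particular game specification.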
In Example 5 we noted that Assumption 2.5.5 holds but observe that Assumption 2.5.7 fails, since $1 - \alpha(1) = 1 - 0.62 = 0.38$ and
\[ (1 - \alpha(0))\, \frac{p_M(1)\tau_M + p_I(1)\tau_I}{1 - p_M(1)(1-\tau_M) - p_I(1)(1-\tau_I)} = (1 - 0.59)\,\frac{0.2805}{0.7305} = 0.1574. \]

Theorem 2.5.8. Suppose Assumptions 2.3.1–2.5.7 hold. Then there exists an optimal threshold policy with threshold engagement level $e^*$. That is, there exists an optimal policy $y^*$ with $y^*(e) = 1$ for any $e \leq e^*$ and $y^*(e) = 0$ for any $e > e^*$.

The existence of an optimal threshold is the cornerstone analytical result of this paper. From our development above, it should be clear that obtaining a sensible threshold policy is far from a trivial task. Indeed, in many MDP models great effort is put into establishing the existence of threshold policies. We believe our assumptions are reasonable based on our understanding of the games, given the difficult standard of guaranteeing the existence of a threshold policy. Of course, such policies will be welcomed in practice, precisely because of their simplicity and (relatively) intuitive justification. We also remark that none of these assumptions is superfluous. In Appendix A we show that if Assumption 2.5.5 is dropped then a threshold policy may no longer be optimal. Example 5 shows that the same is true if Assumption 2.5.7 is dropped. As we see in some examples in the next section, our assumptions are sufficient but not necessary conditions for an optimal threshold policy to exist.

To simplify matters further, we also take the convention that when there is a tie in Bellman’s equation (2.27) on whether to offer an incented action or not, the publisher always chooses not to offer. This is consistent with the fact that there is a cost to offering incented actions. Although we do not model costs formally, we use this reasoning to break ties. Under this tie-breaking rule there is, in fact, a unique optimal threshold policy guaranteed by Theorem 2.5.8.
This unique threshold policy is our object of study in the remainder of the paper.

2.6 Game Design and Optimal Use of Incented Actions

So far we have provided a detailed analytical description of the possible benefits of offering incented actions (in Section 2.4) and the optimality of certain classes of policies (in Section 2.5). There remains the question of what types of games most benefit from offering incented actions and how different types of games may qualitatively differ in their optimal policies. We focus on optimal threshold policies and concern ourselves with how changes in the parameters of the model affect the optimal threshold $e^*$ of an optimal threshold policy $y^*$ that is guaranteed to exist under Assumptions 2.3.1–2.5.7 by Theorem 2.5.8. Of course, these are only sufficient conditions and so we do not restrict ourselves to that setting when conducting numerical experiments in this section.

We first consider how differences in the revenue parameters $\mu_I$ and $\mu_M$ affect $e^*$. Observe that only the revenue effect in (2.16) is impacted by changes in $\mu_I$ and $\mu_M$; the retention and progression effects are unaffected. This suggests the following result:

Proposition 2.6.1. The optimal threshold $e^*$ is a nondecreasing function of the ratio $\frac{\mu_I}{\mu_M}$.

Note that the revenue effect is nondecreasing in the ratio $\frac{\mu_I}{\mu_M}$. Since the other effects are unchanged, this implies that the benefit of offering incented actions at each engagement level is nondecreasing in $\frac{\mu_I}{\mu_M}$, thus establishing the monotonicity of $e^*$ in $\mu_I/\mu_M$.

To interpret this result we consider what types of games have a large or small ratio $\frac{\mu_I}{\mu_M}$. From the introduction in Section 2.1 we know that incented actions typically deliver far less revenue to the publisher than in-app purchases. This suggests that the ratio is small, favoring a lower threshold. However, this conclusion ignores how players in the game may influence each other.
Although our model is a single-player model, one way we can include the interactions among players is through the revenue terms $\mu_I$ and $\mu_M$. In many cases, a core value of a player to the game publisher is the word-of-mouth a player spreads to their contacts. Indeed, this is the value of non-paying players that other researchers have mostly focused on (see, for instance, Lee et al. [33], Jiang and Sarkar [28], and Gupta et al. [24]). In cases where this “social effect” is significant it is plausible that the ratio of revenue terms is not so small. For instance, if $\delta$ is the revenue attributable to the word-of-mouth or network effects of a player, regardless of whether the player takes an incented action or monetizes, then the ratio of interest is $\frac{\mu_I + \delta}{\mu_M + \delta}$. The larger $\delta$ is, the larger this ratio is and, according to Proposition 2.6.1, the larger the optimal threshold.

This analysis suggests that games with a significant social component should offer incented actions more broadly. For instance, if a game includes cooperative or competitive multi-player features, then spreading the player base is of particular value to the company. Thought of another way, in a social game it is important to have a large player base to create positive externalities for new players to join, and so having players quit is of greater concern in more social games. Hence, it is best to offer incented actions until higher levels of engagement are reached. All of this intuition is confirmed by Proposition 2.6.1.

Besides the social nature of the game, other factors can greatly impact the optimal threshold. Genre, intended audience, and structure of the game affect the other parameters of our model, particularly $\tau_I$, $\tau_M$, and $\alpha(e)$. We first examine the progression probabilities $\tau_I$ and $\tau_M$. As we did in the case of the revenue parameters, we focus on the ratio $\frac{\tau_I}{\tau_M}$.
This ratio measures the relative probability of advancing through incented actions versus monetization. By Assumption (A3.5), $\frac{\tau_I}{\tau_M} \leq 1$, but its precise value can depend on several factors. One is the relative importance of the reward granted to the player when taking an incented action. Recall our discussion of Crossy Road in the introduction: one measure of engagement could be the number of unique characters accumulated by the player. Because characters can be purchased directly with real money, this makes $\tau_M = 1$. However, the reward for watching a video ad is only coins that can be used in a random draw for new characters. Depending on the odds of that draw, $\tau_I$ can be large or small.

Taking $\tau_M$ fixed (possibly at 1 as in the example of Crossy Road) we note that increasing $\tau_I$ decreases the “current” benefit of offering incented actions, as seen by examining the term $C(\bar e)$ in (2.10). Indeed, the revenue effect is unchanged by $\tau_I$, but the retention effect is weakened. The impact on future benefits is less obvious. However, we know players are more likely to advance to a higher level of engagement with a larger $\tau_I$. From Theorem 2.5.2 we also know higher engagement states are more valuable, and so we expect the future benefits of offering incented actions to be positive with a higher $\tau_I$ and even to outweigh the loss in current benefits. This reasoning is confirmed in the next result.

Proposition 2.6.2. The optimal threshold $e^*$ is a nondecreasing function of the ratio $\frac{\tau_I}{\tau_M}$.

One interpretation of this result is that the more effective an incented action is at increasing engagement of the player, the longer the incented action should be offered. This is indeed reasonable under the assumption that $p_I(e)$ and $p_M(e)$ are unaffected by changes in $\tau_I$.
However, if increasing $\tau_I$ necessarily increases $p_I(e)$ (for instance, if the reward of the incented action becomes more powerful and so drives the player to take the incented action with greater probability) the effect on the optimal threshold is less clear.

Example 6. In this example we show that when the incented action is more effective it can lead to a decrease in the optimal threshold if $p_I(e)$ and $p_M(e)$ are affected by changes in $\tau_I$. Consider the following example with two engagement levels. In the base case let $\mu_M = 1$, $\mu_I = 0.05$, $\tau_M = 0.8$, $\tau_I = 0.2$. At level 0, $p_M(0) = 0.3$, $p_I(0) = 0.5$, $\alpha(0) = 0.7$ and thereby $q_M(0) = 0.65$. At level 1, $p_M(1) = 0.5$, $p_I(1) = 0.4$, $\alpha(1) = 1$ and thereby $q_M(1) = 0.9$. One can show that the unique optimal policy is $y^* = (0, 1)$ (for details see Appendix A).

Now change the parameters as follows: increase $\tau_I$ to 0.25, which affects the decision-making of the player so that $p_M(0) = 0.1$, $p_I(0) = 0.7$, $p_M(1) = 0.3$ and $p_I(1) = 0.6$. In other words, the incented action became so attractive that it reduces the probability of monetizing while increasing the probability of taking the incented action. One can show that the unique optimal policy in this setting is $y^* = (0, 0)$. Hence the optimal threshold has decreased. In conclusion, a change in the effectiveness of the incented action in driving engagement can lead to an increase or decrease in the optimal threshold, depending on the player’s behavioral response.

This leads to an important investigation of the changes in the degree of cannibalization between incented actions and monetization. Recall that $\alpha(e)$ is the vector of parameters that indicates the degree of cannibalization at each engagement level. For the sake of analysis, we assume that $\alpha(e)$ is an affine function of $e$ with
\[ \alpha(e) = \alpha(0) + \alpha_{\mathrm{step}}\, e, \tag{2.32} \]
where $\alpha(0)$ and $\alpha_{\mathrm{step}}$ are nonnegative real numbers.
A very high $\alpha(0)$ indicates a design where the reward of the incented action and the in-app purchase have a similar degree of attractiveness to the player, so that when the incented action is removed the player is likely to monetize. This suggests that the cost-to-reward ratio of the incented action is similar to that of the in-app purchase. If one is willing, for instance, to endure the inconvenience of watching a video ad in order to get some virtual currency, they should be similarly willing to pay real money for a proportionate amount of virtual currency. A very low $\alpha(0)$ is associated with a very attractive cost-to-reward ratio for the incented action that makes monetization seem expensive in comparison.

[Figure 2.3: Sensitivity of the optimal threshold to changes in $\alpha_{\mathrm{step}}$. Panel (a): initial value $\alpha(0) = 0$. Panel (b): initial value $\alpha(0) = 0.16$.]

The rate of change $\alpha_{\mathrm{step}}$ represents the strength of increase in cannibalization as the player advances in engagement. A fast rate of increase is associated with a design where the value of the reward of the incented action quickly diminishes. For instance, in Crossy Road the reward of earning coins to watch a video ad for a chance of randomly drawing a character naturally diminishes in value as the player accumulates more characters. Despite the reward weakening, this option still attracts a lot of attention from players, especially if they have formed a habit of advancing via this type of reward. If, however, the videos are removed, the value proposition of monetizing seems attractive in comparison to the diminished value of the reward for watching a video. Seen in this light, the rate at which the value of the reward diminishes is controlled by the parameter $\alpha_{\mathrm{step}}$.

Analysis of how different values for $\alpha(0)$ and $\alpha_{\mathrm{step}}$ impact the optimal threshold is not straightforward. This is illustrated in the following two examples. The first considers sensitivity of the optimal threshold to $\alpha_{\mathrm{step}}$.

Example 7.
Consider the following example with nine engagement levels and the following data: µm = 1, µI = 1, 0.05 τM = 0.8, $\tau_I = 0.4$, $p_M(e) = 0.0001 + 0.00005e$ and $p_I(e) = 0.7 - 0.00001e$ for $e = 0, 1, \ldots, 8$. We have not yet specified $\alpha(e)$. We examine two scenarios: (a) where $\alpha(0) = 0$ and we vary the value of $\alpha_{\mathrm{step}}$ (see Figure 2.3(a)) and (b) where $\alpha(0) = 0.16$ and we vary the value of $\alpha_{\mathrm{step}}$ (see Figure 2.3(b)). The vertical axis of these figures is the optimal threshold of the unique optimal threshold policy for that scenario. What is striking is that the threshold $e^*$ is nonincreasing in $\alpha_{\mathrm{step}}$ when $\alpha(0) = 0$ but nondecreasing in $\alpha_{\mathrm{step}}$ when $\alpha(0) = 0.16$.

One explanation for the difference in patterns between Figures 2.3(a) and 2.3(b) concerns whether it is optimal to include incented actions initially or not. In Figure 2.3(a) the initial degree of cannibalization is zero, making it costless to offer incented actions initially. When $\alpha_{\mathrm{step}}$ is very small cannibalization is never an issue and incented actions are offered throughout. However, as $\alpha_{\mathrm{step}}$ increases, the degree of cannibalization eventually makes it optimal to stop offering incented actions to encourage monetization. This explains the nonincreasing pattern in Figure 2.3(a).

[Figure 2.4: Sensitivity of the optimal threshold to changes in $\alpha(0)$. Horizontal axis: initial value $\alpha(0)$; vertical axis: optimal threshold $e^*$.]

By contrast, in Figure 2.3(b) the initial degree of cannibalization is already quite high, making it optimal to start by offering for low values of $\alpha_{\mathrm{step}}$. However, when $\alpha_{\mathrm{step}}$ is sufficiently large, there are benefits to encouraging the player to advance. Recall that $\alpha(e)$ affects both the probability of monetization and the probability of quitting. In the case where $\alpha_{\mathrm{step}}$ is sufficiently high there are greater benefits to the player progressing, making quitting early more costly.
Hence it can be optimal to offer incented actions initially to discourage quitting and encourage progression. This explains the nondecreasing pattern in Figure 2.3(b).

As the following example illustrates, adjusting for changes in $\alpha(0)$ reveals a different type of complexity.

Example 8. Consider the following example with two engagement levels. Assume $\mu_M = 1$, $\mu_I = 0.0001$, $\tau_M = 0.01$, $\tau_I = 0.009$. At level 0, $p_M(0) = 0.05$, $p_I(0) = 0.68$. At level 1, $p_M(1) = 0.3$, $p_I(1) = 0.65$. We set the step size of $\alpha$ to 0.6, i.e. $\alpha(1) = \alpha(0) + 0.6$. Figure 2.4 captures how changes in $\alpha(0)$ lead to different optimal thresholds. For complete details on the derivation of the figure see Appendix A.

The striking feature of the figure is that the optimal threshold decreases, and then increases, as $\alpha(0)$ becomes larger. This "U"-shaped pattern reveals competing effects associated with changes in $\alpha(0)$. As $\alpha(0)$ increases, the benefit of increasing retention (at the cost of harming revenue) weakens. This contributes to downward pressure on the optimal threshold. On the other hand, increasing $\alpha(0)$ also increases $\alpha(1)$. This increases the attractiveness of reaching a higher engagement level and dropping the incented action. Indeed, referring to Table 2.1, $W^y(1)$ is increasing in $\alpha(1)$ when $y(0) = 1$. This puts upward pressure on the optimal threshold. This latter "future" benefit is weak for lower levels of $\alpha(0)$, where it may be optimal to offer an incented action in the last period. This provides justification for the "U"-shaped pattern.

The scenarios in the above two examples provide a clear illustration of the complexity of our model. At different engagement levels, and with different prospects for the value of future benefits, the optimal strategy can be influenced in nonintuitive ways. This is particularly true for changes in $\alpha(e)$, as it impacts all three effects: revenue, retention, and progression. In some sense, cannibalization is the core issue in offering incented actions.
This is evident in our examples and a careful examination of the definitions in Section 2.4: the parameter $\alpha(e)$ is ubiquitous.

2.7 Conclusion

In this paper we investigated the use of incented actions in mobile games, a popular strategy for extracting additional revenue from players in freemium games where the vast majority of players are unlikely to monetize. We discussed the reasons for offering incented actions, and built an analytical model to assess the associated tradeoffs. This understanding led us to define sufficient conditions for the optimality of threshold policies, which we later analyzed to provide managerial insights into what types of game designs are best suited to offering incented actions. Our approach of using an MDP has some direct benefits to practitioners. With player data and relevant game parameters that companies have access to in the age of big data, validating our model and using it to derive insights on the impact of certain policies is plausible.

Our analytical approach was to devise a parsimonious stylized model that abstracts a fair deal from reality yet retains the salient features needed to assess the impact and effects of offering incented actions. For instance, we assume the publisher has complete knowledge about the player's transition probabilities and awareness of the engagement state. In the setting where transition probabilities are unknown, some statistical learning algorithm and classification of players into types would be required. Moreover, in the situation where engagement is difficult to define or measure, a partially observed Markov decision process (POMDP) model would be required, where only certain signals of the player's underlying engagement can be observed. There is also the question of microfounding the player model that we explore, asking what random utility model could give rise to the transition probabilities that we take as given in our model.
All these questions are outside of our current scope but could nonetheless add realism to our approach. Of course, the challenge of establishing the existence of threshold policies in these extensions is likely to be prohibitive. Indeed, discovering analytical properties of optimal policies of any form in a POMDP is challenging [30]. It is likely that these extensions would produce studies that are more algorithmic and numerical in nature, whereas in the current study we were interested in deriving insights.

Finally, the current study ignores an important actor in the case of games hosted on mobile platforms: the platform holder. In the case of the iOS App Store, Apple has made several interventions that either limited or more closely monitored the practice of incented actions (see, for instance, Connelly [15]). In fact, the platform holder and game publisher have misaligned incentives when it comes to incented actions. Typically, the revenue derived from incented actions is not processed through the platform, whereas in-app purchases are. We feel that investigation of the incentive misalignment problem between platform and publisher, possibly as a dynamic contracting problem, is a promising area of future research. The model developed here is a building block for such a study.

Chapter 3

A Dynamic Price-Only Contract: Exact and Asymptotic Results

3.1 Introduction

Consider a simple supply chain with two firms with symmetric information, wherein a seller trades with a downstream buyer who faces customer demand for the product. This system is perhaps the most well understood decentralized model and has been analyzed in the extant literature in industrial organization and operations management among others. In this paper, we revisit this simple system under what we call a generalized price-only contract.
We demonstrate several interesting properties of this system under this easy to understand contract, show interesting connections to established results in the literature and explore the implications of our findings for future research.

A price-only contract, otherwise known as a simple linear wholesale price contract, specifies a per unit price $w$ at which the seller offers her product to the buyer, who then buys some quantity $q$. The buyer uses this quantity (and potentially other levers such as selling price when relevant) to generate revenue from customer demand. Therefore, in the simplest setting where players are strategic, these decisions are arrived at as an equilibrium of a corresponding game. The common paradigm is one where the seller moves first, setting a wholesale price $w$, which is followed by the buyer's decision of purchasing a quantity $q$. This is referred to as a Stackelberg game (with the seller as the leader in this case). It is well known that the resulting vertical competition between the players in this model leads to inefficiencies referred to as double marginalization. This loss of efficiency (failing to get to the first best) can be addressed by numerous contracting recipes when there are no information asymmetries. Our interest is not in addressing this inefficiency, although one consequence of our analysis is related to achieving first best. Rather, we explore what would be the effect on the equilibrium decisions if, instead of providing the two firms one opportunity to trade, they are allowed to engage dynamically, multiple times, still using a simple linear price-only contract. To be precise, for some positive integer $n$, the seller first informs the buyer that $n$ wholesale prices will be offered sequentially and dynamically. Then she proposes the first price, $w_1$, and the retailer decides upon an order quantity, $q_1$, at this price. Thereafter, the supplier offers a new price, $w_2$, and the retailer orders a new quantity $q_2$, etc.
At the end of the last offer, indexed by $n$, the buyer has cumulatively purchased $Q^n = q_1 + \cdots + q_n$, which is used to satisfy market demand. Figure 3.1 illustrates the sequence of events. Thus, transactions between the firms occur dynamically and incrementally, in anticipation of downstream revenues that will accrue by meeting demand after the $n$ transactions are concluded. We refer to this contract as a generalized price-only contract, and we study some properties of this contract. Figure 3.2 gives two examples when $n = 1$ and $n = 2$.

[Figure 3.1: Sequence of events under n-stage generalized price-only contract.]

[Figure 3.2: Illustration of generalized price-only contract. Panels: (a) $n = 1$; (b) $n = 2$.]

Special instances of the generalized price-only contract were introduced and studied by Erhun et al. [20] and by Martínez de Albéniz and Simchi-Levi [38]. Specifically, Erhun et al. [20] study a dynamic model of procurement between a supplier and a buyer, when demand is deterministic and linear in price. They prove that the supplier, the buyer and the end customers benefit from multiple trading opportunities versus a one-shot procurement agreement. Martínez de Albéniz and Simchi-Levi [38] extend the results to the newsvendor setting. They provide sufficient conditions for the existence and uniqueness of a well-behaved subgame perfect equilibrium and they show that as the number of rounds increases, the profits of the supply chain increase towards the supply chain optimum.

In this paper, we consider a similar model, but we allow for fairly general demand settings. Similar to [20] and [38], we show that in our more general setting, both parties benefit from multiple trading opportunities. Indeed, the benefit increases with the value of $n$, the horizon specified in the contract. Moreover, as $n$ approaches infinity, the sum of the profits of the two players approaches the first best profit.
This, of course, implies that the total order quantity is nondecreasing and converges to the corresponding first best order quantity as $n$ increases. The results hold without specifying the demand function. We only require that the buyer's revenue function satisfies some mild conditions.

We also show that for a given generalized price-only contract with a specified $n$, the wholesale prices monotonically decrease. However, somewhat surprisingly, for a fixed $n$, the order quantities within the $n$ periods may not be monotonic; that is, they can decrease or increase in $t = 1, \ldots, n$. Finally, we provide necessary and sufficient conditions for the supplier's revenue to increase, decrease or remain constant from one period to another, and we derive closed form solutions for three settings in which demand is exponential, uniform or constant.

Our paper is also related to three streams in the literature. The first deals with documenting and quantifying the loss to the supply chain of using a simple linear wholesale price contract. Lariviere and Porteus [31] study this problem in the newsvendor setting and derive conditions on the demand distribution for the existence of an interior (equilibrium) solution. They also empirically calculate the potential efficiency losses for various demand distributions. Perakis and Roels [45] obtain a theoretical bound on the worst-case loss in this system, and extend the analysis to other related systems. One of our results demonstrates that as the number of transactions, $n$, grows, the efficiency loss decreases and approaches zero as $n$ goes to infinity. In fact, this seems to be true for more general systems where the buyer may not be a newsvendor.

The second stream in the literature is from industrial organization and is related to the work of Coase [14] on durable goods. Coase's result (a conjecture to be precise) was that a duropolist (a monopolist selling a durable good) does not have monopoly power due to her inability to commit.
The simple intuition behind the Coase conjecture is that if the duropolist charges a high price for the durable good, then consumers anticipate a future price reduction (as they expect the duropolist to later target consumers with lower valuations), and therefore they prefer to wait. The duropolist, anticipating such consumer behavior, will then drop prices down to the competitive level. Thus, a duropolist faces intense competition, not from other players but from future incarnations of herself. This intuition does not quite translate when customers are atomic and have an impact on the outcome of the game. In this case, the subgame perfect equilibrium, known as the Pacman equilibrium, delivers a nonzero profit to the duropolist and decreases the effect of the commitment problem. An excellent discussion of this topic and bounds on the duropolist's profits are given in Berbeglia et al. [6]. The results in our note are related, as we look at a setting with two players, a buyer and a seller, trading with each other, and the buyer selling to the market. Our results indicate that in such a setting, the commitment problem is weakened and the first best solution emerges that distributes the profit between the players. We also note that our results cannot be recovered from this stream in the literature.

The third stream in the literature that is related to our paper is the one on negotiation power and contracting in supply chains. If one thinks of contracts as a way to achieve efficiency in a competitive supply chain, allocation of the first best solution is usually delegated to some sort of negotiation process. The outcome of this process depends on various factors, an important one being the negotiation power of the players. The results in our paper provide an organic division of the first best profits as $n$ approaches infinity.
The seller enjoys some power due to her role as a Stackelberg leader, but is unable to extract the entire surplus, both in the regular setting as well as in our dynamic and incremental setting. The exact division of the pie, in the dynamic setting, depends on several factors including the elasticity of demand, the shape of the demand distribution etc., which will be discussed later. Thus, our results provide an alternative way of looking at the notion of bargaining power in supply chains (see also Bernstein and Nagarajan [7] for a more extensive discussion of this topic). A related but somewhat different paper on this topic is the one by Anand et al. [2]. They study a simple dynamic problem where a buyer and a seller transact using wholesale prices and the buyer faces a demand which is price sensitive and deterministic. Unlike our model, the buyer faces customer demand in each period. Anand et al. [2] show that the buyer carries over inventory from one period to another purely for strategic reasons: demand is certain, and there are no fixed costs or other economies of scale. The main reason for carrying inventory is that by doing so, the buyer forces the seller to offer lower wholesale prices in future periods, since the buyer is able to start future periods with a positive inventory. The monotonic nature of the wholesale prices in their model is similar to the monotonicity of wholesale prices that we derive in our model. Moreover, they similarly show that the dynamic interaction (and the presence of strategic interactions) is beneficial to both players despite the seller's relative channel power being eroded by the dynamic interaction.

3.2 Generalized Price-only Contracts with n Offers

We consider a seller (henceforth referred to as a supplier) who transacts with a buyer (referred to as a retailer) using a generalized price-only contract. For some exogenous $n$, the supplier informs the retailer that $n$ wholesale prices will be offered sequentially.
For each wholesale price proposed, the retailer decides upon the quantity he will purchase at that price. Thus, indexing the offers forward, the supplier first offers a price $w_1$, and the retailer commits to buying $q_1$; then the supplier offers $w_2$, and the retailer commits to $q_2$, and so forth. The issue of commitment is not significant here. For example, we can simply assume that each transaction is completed and money and goods are exchanged before the next price is announced by the supplier. All decisions are made in anticipation of the demand. We will assume that both players are risk neutral. At the end of the $n$-th offer, trade occurs between the two players and the total amount of units that the retailer has purchased is denoted by $Q^n = q_1 + \cdots + q_n$. The retailer then uses $Q^n$ to satisfy market demand.

We first assume that the demand faced by the retailer can be quite general. Let $R(Q)$ be the retailer's (expected) sales revenue given the total inventory level $Q$. For example, under the newsvendor setting, $R(Q) = p \int_0^Q \bar{F}(z)\,dz$, where $p$ is the exogenous retail price and $F$ is the CDF of the demand distribution ($\bar{F} = 1 - F$). If demand is deterministic but price sensitive with inverse demand function $p = P(Q)$, we have $R(Q) = P(Q)Q$. We assume that $R(Q)$ is sufficiently smooth, as will be clear from the analysis. A stronger assumption is that $R(Q)$ is analytic, which requires that the function is infinitely differentiable and that its power series converges to it. Further, we assume $R(Q)$ to be strictly concave. This holds for most economic settings we are interested in. For simplicity, we assume the supplier's marginal production cost is $c = 0$. As long as costs are linear, this assumption is without loss of generality.
We define $\pi_R^n$, $\pi_S^n$ and $\pi_T^n$ to be the retailer's, the supplier's and the supply chain's total profit, respectively, which are given by

\[
\begin{aligned}
\pi_R^n &= R(q_1 + \cdots + q_n) - w_1 q_1 - w_2 q_2 - \cdots - w_n q_n, \\
\pi_S^n &= w_1 q_1 + w_2 q_2 + \cdots + w_n q_n, \\
\pi_T^n &= \pi_R^n + \pi_S^n = R(q_1 + \cdots + q_n).
\end{aligned}
\]

Denote by $Q^{FB}$ the first-best order quantity, which maximizes the total supply chain profit $\pi_T = R(Q)$. Since the problem is concave, $Q^{FB}$ can be characterized by the first-order condition $R'(Q^{FB}) = 0$.

When the supplier and the retailer are independent self-interested rational players, we model the problem as a dynamic game with perfect information where the supplier and the retailer take actions sequentially with the supplier being the first mover. We are interested in the subgame perfect Nash equilibrium (SPNE). Because commitment is not an issue and because, in any epoch $t$, the only relevant information that both players use in computing their strategies is the length of the horizon and the total order quantity traded up until $t$, relatively mild conditions are needed to guarantee the existence of an SPNE. Denote by $w_t^*$ and $q_t^*$ the equilibrium solution, $t = 1, \ldots, n$. We denote by $x_t$ the pre-order inventory level at period $t$, so $x_1 = 0$ and $x_t = q_1 + \cdots + q_{t-1}$, $t = 2, \ldots, n$. We let $x_t^* = q_1^* + \cdots + q_{t-1}^*$ be the corresponding equilibrium pre-order inventory level at period $t$, and we let $Q^{n,*} = q_1^* + \cdots + q_n^*$ be the corresponding equilibrium total quantity purchased.

We use backward induction to find an SPNE. At the last offer, we first determine the retailer's optimal order quantity $q_n^*$ for each possible wholesale price $w_n$ and history of the previous offers. Then, given that the retailer will follow his strategy, we compute what price $w_n^*$ the supplier will offer for each possible history. Note that the pre-order inventory level $x_n$ summarizes all useful information about the history of previous offers needed to determine the equilibrium solution. Hence, the equilibrium strategy can be denoted by $q_n^*(x_n, w_n)$ and $w_n^*(x_n)$.
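The backward-induction procedure just described can be sketched numerically for $n = 2$. The sketch below is only an illustration under stated assumptions: it takes the deterministic linear inverse demand $p = a - bQ$ (so $R(Q) = (a - bQ)Q$), uses hypothetical parameter values $a = 10$, $b = 1$, solves the last offer in closed form from the two first-order conditions for this linear case, and solves the first offer by a nested ternary search. All function names are hypothetical.

```python
# Hypothetical parameters for the linear inverse demand p = A - B*Q (illustration only).
A, B = 10.0, 1.0
Q_FB = A / (2 * B)  # first-best quantity: R'(Q) = A - 2*B*Q = 0


def revenue(Q):
    """Retailer revenue R(Q) = P(Q) * Q for P(Q) = A - B*Q."""
    return (A - B * Q) * Q


def argmax_unimodal(f, lo, hi, iters=100):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def last_offer(x):
    """Final offer given pre-order inventory x; returns (q_n, w_n)."""
    if x >= Q_FB:
        return 0.0, 0.0
    q = (Q_FB - x) / 2           # supplier FOC: R'(x+q) + R''(x+q)*q = 0
    w = A - 2 * B * (x + q)      # retailer FOC: w_n = R'(x + q_n)
    return q, w


def retailer_best_q1(w1):
    """Retailer's first order, anticipating the outcome of the last offer."""
    def payoff(q1):
        q2, w2 = last_offer(q1)
        return revenue(q1 + q2) - w1 * q1 - w2 * q2
    return argmax_unimodal(payoff, 0.0, Q_FB)


def supplier_profit(w1):
    """Supplier's total profit over both offers when she opens with w1."""
    q1 = retailer_best_q1(w1)
    q2, w2 = last_offer(q1)
    return w1 * q1 + w2 * q2


w1 = argmax_unimodal(supplier_profit, 0.0, A)
q1 = retailer_best_q1(w1)
q2, w2 = last_offer(q1)
print(f"w1={w1:.3f} q1={q1:.3f} w2={w2:.3f} q2={q2:.3f} total={q1 + q2:.3f}")
```

Under these assumptions the search should return approximately $q_1 = 1.25$, $w_1 = 5.625$, $q_2 = 1.875$ and $w_2 = 3.75$: the price falls from the first offer to the second, and the total quantity $3.125$ exceeds the one-shot quantity $a/(4b) = 2.5$ while staying below $Q^{FB} = 5$.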
Now, one can show that there exists a mapping between $w_n$ and $q_n^*$ for any given $x_n$, so we equivalently denote the equilibrium strategy as $w_n^*(x_n, q_n)$ and $q_n^*(x_n)$. Therefore, solving the $n$-th offer problem allows us to naturally revert back to the $(n-1)$-th offer and compute the equilibrium strategy $w_{n-1}^*(x_{n-1}, q_{n-1})$ and $q_{n-1}^*(x_{n-1})$. We repeat this process until we solve the first offer problem $w_1^*(x_1, q_1)$ and $q_1^*(x_1)$. The following proposition provides a characterization of the SPNE for this problem.

Proposition 3.2.1. Suppose the subgame perfect Nash equilibrium exists. Then the equilibrium solution at $t = n - k$ ($k \ge 0$) satisfies the following conditions:

\[
w_{n-k}^*(x_{n-k}, q_{n-k}) = (k+1) R'(Q^n) \prod_{j=1}^{k} \left(1 + \frac{\partial q_{n-k+j}^*}{\partial x_{n-k+j}}\right) - \sum_{j=1}^{k} w_{n-k+j}^* \frac{\partial q_{n-k+j}^*}{\partial x_{n-k+j}} \prod_{m=1}^{j-1} \left(1 + \frac{\partial q_{n-k+m}^*}{\partial x_{n-k+m}}\right) \tag{3.1}
\]

\[
q_{n-k}^*(x_{n-k}) =
\begin{cases}
\text{the solution of } R'(Q^n) \displaystyle\prod_{j=1}^{k} \left(1 + \frac{\partial q_{n-k+j}^*}{\partial x_{n-k+j}}\right) + \frac{\partial w_{n-k}^*}{\partial q_{n-k}}\, q_{n-k} = 0 & \text{if } x_{n-k} < Q^{FB}, \\[2ex]
0 & \text{if } x_{n-k} \ge Q^{FB},
\end{cases} \tag{3.2}
\]

where $Q^n = x_{n-k} + q_{n-k} + q_{n-k+1}^* + \cdots + q_n^*$.

Proposition 3.2.1 characterizes the equilibrium solution if it exists. Naturally, for general demand functions, condition (3.2) does not lead to closed form algebraic expressions. However, our interest is not to derive closed form expressions, but rather, primarily to analyze the properties of the equilibrium solution and the effect of multiple price offers on the supplier, the retailer and the whole supply chain.

Our first result is intuitive and shows that multiple price offers mitigate the double marginalization effect.

Theorem 3.2.2. (a) The equilibrium total inventory $Q^{n,*}$ strictly increases in the number of offers $n$ but will not exceed the first best inventory level $Q^{FB}$.
(b) When the total number of price offers $n$ increases, both the supplier's total profit and the retailer's total profit increase; moreover the supply chain's total profit strictly increases.

Intuitively, the opportunity to order multiple times will weaken the supplier's power and introduce competition among the different offers.
As a result, the supplier is forced to lower her wholesale prices and consequently, double marginalization is reduced. Furthermore, since the supply chain profit $\pi_T^{n,*} = R(Q^{n,*})$ is increasing in $Q^{n,*}$, Theorem 3.2.2(a) implies that the total size of the pie, i.e., total supply chain profit, increases with the number of price offers. One reason for this is that multiple wholesale prices, in the case of stochastic demand, better match the marginal return on risk for units of inventory that are purchased in anticipation of the demand. Furthermore, Theorem 3.2.2 claims that not only does the whole supply chain profit increase in $n$, but also neither the supplier nor the retailer will be worse off with more offers. As expected, the retailer takes advantage of the multiple chances for ordering and induces successively lower wholesale prices. On the other hand, by offering multiple wholesale prices, the supplier price-discriminates across the retailer's orders. Therefore, the supplier can extract a higher profit and benefits herself. It should be noted that it is possible that either the supplier's profit or the retailer's profit only "weakly increases", but the channel profit must strictly increase. We next show that full efficiency will be achieved in the limit. In other words, the total inventory level will get closer to the first-best when the number of price offers $n$ increases. Thus, with the generalized price-only contract, the price of anarchy asymptotically vanishes.

Theorem 3.2.3. The equilibrium total inventory level $Q^{n,*}$ approaches the first best order quantity $Q^{FB}$ when $n$ goes to infinity.

So far, we have argued that under the generalized price-only contract, the supplier's monopoly power will diminish and the markup she can charge will be reduced due to the retailer's previous orders. Nevertheless, the supplier is strategic and can anticipate this, so she would prefer to discourage the retailer from ordering large amounts in the early offers.
As a result, we expect higher wholesale prices in earlier offers. This result is consistent with similar findings in other settings under dynamic contracting in supply chains, such as the one considered by Anand et al. [2].

Proposition 3.2.4. Given a fixed $n$, on the equilibrium path, the wholesale prices strictly decrease, i.e. $w_1^* > w_2^* > \cdots > w_n^*$.

As the equilibrium wholesale prices decrease in $t$, one may similarly expect the order quantities to correspondingly increase in $t$. Interestingly, we find that the order quantities may not increase in $t$. In fact, the order quantities may exhibit fairly arbitrary patterns. Next, let us consider the supplier's per-period profit $w_{n-k}^* q_{n-k}^*$. Define $\alpha_{n-k}$ as follows:

\[
\alpha_{n-k} = 1 + \sum_{m=1}^{k} \prod_{j=m}^{k} \left(1 + \frac{\partial q_{n-k+j}^*}{\partial x_{n-k+j}}\right).
\]

For instance, $\alpha_n = 1$ and $\alpha_{n-1} = 1 + \left(1 + \frac{\partial q_n^*}{\partial x_n}\right) = 2 + \frac{\partial q_n^*}{\partial x_n}$. One can directly verify from Proposition 3.2.1 that $w_{n-k}^* = R'(Q^n)\,\alpha_{n-k}$. The next result demonstrates that $\alpha_{n-k}$ plays an important role in comparing successive profits by the supplier.

Proposition 3.2.5. $w_{n-k+1}^* q_{n-k+1}^* \;(>, =, <)\; w_{n-k}^* q_{n-k}^*$ if and only if $\dfrac{\partial(\alpha_k/\alpha_{k+1})}{\partial x_k} \;(<, =, >)\; 0$.

The proposition provides a necessary and sufficient condition for determining the pattern of the sequence of $w_{n-k}^* q_{n-k}^*$. For example, in the last two periods, $\partial(\alpha_{n-1}/\alpha_n)/\partial x_{n-1} = \partial^2 q_n^*/\partial x_n^2$. Therefore, whether $w_{n-1}^* q_{n-1}^*$ is greater than $w_n^* q_n^*$ depends on the convexity of $q_n^*$ with respect to $x_n$. Let us consider the inverse demand function $p = a - bQ^{\gamma}$. One can show that if $\gamma = 1$, $\partial^2 q_n^*/\partial x_n^2 = 0$, so $w_{n-1}^* q_{n-1}^* = w_n^* q_n^*$; if $\gamma > 1$, $\partial^2 q_n^*/\partial x_n^2 > 0$, so $w_{n-1}^* q_{n-1}^* > w_n^* q_n^*$; if $\gamma < 1$, $\partial^2 q_n^*/\partial x_n^2 < 0$, so $w_{n-1}^* q_{n-1}^* < w_n^* q_n^*$. Thus, one can provide examples such that the supplier's per-period profit decreases, increases or even stays the same over time for a fixed $n$.

Theorem 3.2.2 has shown that the size of the pie increases with the number of price offers. In addition, both the supplier's and the retailer's profits increase.
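Returning to the inverse demand family $p = a - bQ^{\gamma}$ used above to illustrate Proposition 3.2.5, the sign of $\partial^2 q_n^*/\partial x_n^2$ can be checked numerically. The following is a minimal sketch, assuming $R(Q) = aQ - bQ^{\gamma+1}$, hypothetical parameter values $a = 3$, $b = 1$, and a pre-order inventory $x_n = 0.5$ that lies below $Q^{FB}$ for each $\gamma$ tried. It solves the supplier's last-offer first-order condition by bisection and estimates the curvature with a central second difference; the function names are hypothetical.

```python
def last_offer_quantity(x, a=3.0, b=1.0, gamma=2.0):
    """Final-offer quantity q_n*(x) for inverse demand p = a - b*Q**gamma.

    The retailer's response gives w_n = R'(x + q), so the supplier maximizes
    R'(x + q) * q; her first-order condition R'(x+q) + R''(x+q)*q = 0 is
    solved by bisection (valid for 0 < x < Q_FB).
    """
    def foc(q):
        Q = x + q
        marginal = a - b * (gamma + 1) * Q ** gamma              # R'(Q)
        slope = -b * (gamma + 1) * gamma * Q ** (gamma - 1)      # R''(Q)
        return marginal + slope * q

    q_fb = (a / (b * (gamma + 1))) ** (1 / gamma)   # solves R'(Q_FB) = 0
    lo, hi = 1e-12, q_fb - x                        # foc(lo) > 0 > foc(hi)
    for _ in range(200):
        mid = (lo + hi) / 2
        if foc(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def curvature(gamma, x=0.5, h=0.05):
    """Central second difference approximating d^2 q_n*/dx_n^2 at x."""
    def q(z):
        return last_offer_quantity(z, gamma=gamma)
    return (q(x + h) - 2 * q(x) + q(x - h)) / h ** 2


for g in (0.5, 1.0, 2.0):
    print(f"gamma={g}: curvature of q_n* is {curvature(g):+.4f}")
```

With these assumed values the estimated curvature should come out negative for $\gamma = 0.5$, essentially zero for $\gamma = 1$ and positive for $\gamma = 2$, matching the three cases listed in the text.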
A question of interest is whether the relative power of the two players changes with multiple offers. To illustrate this, note that when $n = 1$, we have a simple static Stackelberg game. Under such a game structure, the first mover (in our case the supplier) generally extracts a larger share of the overall profit. In fact, in newsvendor type settings, when demand is deterministic, linear and price sensitive, the supplier extracts two-thirds of the total profits. In what follows, we try to understand how the profit will actually be allocated between the two parties as $n$ increases and whether the shares of the supplier and the retailer become more balanced. In particular, we investigate how multiple price offers will influence the ratio of the retailer's total profit over the supplier's total profit. One may suspect that this ratio will increase with the total number of price offers $n$. In other words, multiple price offers help to balance the channel power. In order to check this intuition, we will study three specific demand cases: exponential demand, uniform demand and linear (price-sensitive) demand.

Theorem 3.2.6. In an $n$-stage generalized price-only contract:

(1) Suppose demand is exponentially distributed with a parameter $\lambda$ and the market price $p$ is exogenous. Then the equilibrium strategies are

\[
q_j^* = \frac{1}{(n-j+1)\lambda}, \qquad w_j^* = (n-j+1)\, p\, e^{-\sum_{l=1}^{n} 1/l}, \qquad j = 1, \ldots, n.
\]

(2) Suppose demand is uniformly distributed on $[0, M]$ and the market price $p$ is exogenous. Then the equilibrium strategies are

\[
q_1^* = \frac{1}{2n} M \quad \text{and} \quad w_1^* = \beta_1 p,
\]
\[
q_j^* = \prod_{l=1}^{j-1} \frac{2(n-l)+1}{2(n-l)}\; q_1^*, \qquad w_j^* = \prod_{l=1}^{j-1} \frac{2(n-l)}{2(n-l)+1}\; w_1^*, \qquad j = 2, \ldots, n,
\]

where $\beta_n = \frac{1}{2}$ and $\beta_j = \frac{(2(n-j)+1)^2}{4(n-j)(n-j+1)}\, \beta_{j+1}$, $j = 1, \ldots, n-1$.

(3) Suppose demand is deterministic with inverse demand function $p = a - bQ$. Then the equilibrium strategies are

\[
q_1^* = \frac{a}{4bn} \quad \text{and} \quad w_1^* = \beta_1 a,
\]
\[
q_j^* = \prod_{l=1}^{j-1} \frac{2(n-l)+1}{2(n-l)}\; q_1^*, \qquad w_j^* = \prod_{l=1}^{j-1} \frac{2(n-l)}{2(n-l)+1}\; w_1^*, \qquad j = 1, \ldots, n,
\]

where $\beta_n = \frac{1}{2}$ and $\beta_j = \frac{(2(n-j)+1)^2}{4(n-j)(n-j+1)}\, \beta_{j+1}$, $j = 1, \ldots, n-1$.

Thus, under exponential demand, uniform demand or linear (price-sensitive) demand, the problem is tractable and the analytical form of the equilibrium solution is clean and elegant. We find, remarkably, that in these three cases, for a fixed $n$, the supplier's per-period profit is identical, $w_t^* q_t^* = w_{t+1}^* q_{t+1}^*$, $t = 1, \ldots, n-1$. This result follows because the ratio $q_t^*/q_{t+1}^*$ is always a constant, which leads to a constant $\alpha_t$, independent of $q_t$, implying that $\partial(\alpha_t/\alpha_{t+1})/\partial q_t = 0$. Thus, by Proposition 3.2.5, it follows that the supplier's profits per period are identical. Note also that in these scenarios, since the equilibrium wholesale price decreases in $t$, the equilibrium order quantity increases in $t$.

Finally, let us investigate how the ratio of the retailer's total profit and the supplier's total profit changes with $n$ for the three scenarios considered in Theorem 3.2.6. As is illustrated in Figure 3.3, the ratio $\pi_R^{n,*}/\pi_S^{n,*}$ increases in $n$, but does not converge to 1. This can be verified by exploiting the expressions in Theorem 3.2.6. Therefore, based on these three examples, it appears that under the generalized price-only contract, the supplier retains a first-mover advantage, although a diminishing one.

[Figure 3.3: The ratio of the retailer's total profit over the supplier's total profit as the number of price offers $n$ increases under (1) exponential demand (2) uniform demand or linear demand. Horizontal axis: total number of offers $n$ (0 to 250); vertical axis: retailer total profit / supplier total profit (0.5 to 0.85); series: exponential, uniform/linear.]

3.3 Conclusion

In this paper, we study a generalization of the well-known wholesale price contract: the generalized price-only contract. We extend known results in the literature about this contract to more general demand settings and we derive some new interesting properties of the corresponding subgame perfect equilibrium.
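As a numerical companion to Theorem 3.2.6, the closed-form strategies can be evaluated directly to examine the per-period profits and the retailer-to-supplier profit ratio. The sketch below is illustrative only: the parameter values ($a = 10$, $b = 1$, $p = 1$, $\lambda = 1$) and function names are hypothetical, production cost is zero as in the text, and in the exponential case the wholesale prices are evaluated as $(n-j+1)R'(Q)$ with $Q = q_1^* + \cdots + q_n^*$, which is an assumption about how the exponent in part (1) is to be read.

```python
import math


def linear_equilibrium(n, a=10.0, b=1.0):
    """Closed forms of Theorem 3.2.6(3) for inverse demand p = a - b*Q."""
    beta = 0.5                                    # beta_n = 1/2
    for j in range(n - 1, 0, -1):                 # recursion down to beta_1
        beta *= (2 * (n - j) + 1) ** 2 / (4 * (n - j) * (n - j + 1))
    q, w = [a / (4 * b * n)], [beta * a]          # q_1* and w_1*
    for l in range(1, n):
        step = (2 * (n - l) + 1) / (2 * (n - l))
        q.append(q[-1] * step)                    # quantities grow by this factor
        w.append(w[-1] / step)                    # prices shrink by the same factor
    return q, w


def profit_ratio_linear(n, a=10.0, b=1.0):
    """Retailer total profit divided by supplier total profit, linear demand."""
    q, w = linear_equilibrium(n, a, b)
    supplier = sum(wi * qi for wi, qi in zip(w, q))
    total = (a - b * sum(q)) * sum(q)             # R(Q) = P(Q) * Q
    return (total - supplier) / supplier


def profit_ratio_exponential(n, p=1.0, lam=1.0):
    """Same ratio under Theorem 3.2.6(1), exponential demand with rate lam."""
    q = [1 / ((n - j + 1) * lam) for j in range(1, n + 1)]
    Q = sum(q)
    marginal = p * math.exp(-lam * Q)             # R'(Q) = p * (1 - F(Q))
    supplier = sum((n - j + 1) * marginal * q[j - 1] for j in range(1, n + 1))
    total = p * (1 - math.exp(-lam * Q)) / lam    # R(Q) = p * int_0^Q exp(-lam*z) dz
    return (total - supplier) / supplier


for n in (1, 2, 3, 4):
    q, w = linear_equilibrium(n)
    per_period = [round(wi * qi, 6) for wi, qi in zip(w, q)]
    print(n, per_period, round(profit_ratio_linear(n), 4),
          round(profit_ratio_exponential(n), 4))
```

For $n = 1$ the linear ratio is $0.5$, which corresponds to the supplier extracting two-thirds of the total profit, and the exponential ratio is $e - 2$. For the small values of $n$ printed here, the per-period supplier profits coincide and both ratios rise with $n$ while staying below 1.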
We demonstrate that the introduction of additional trading opportunities benefits both players. Moreover, as the number of price offers $n$ in the generalized price-only contract approaches infinity, the supply chain profit approaches the first best profit. We also show that for a given contract with a specified $n$, the wholesale prices monotonically decrease. We also reveal some curious properties of the generalized price-only contract, such as the stationarity of the supplier's per-period profit in the three specific demand cases: exponential demand, uniform demand and linear (price-sensitive) demand, and we provide necessary and sufficient conditions for this to hold (see Proposition 3.2.5). Future research, for example, can attempt to derive theoretical characterizations of the balance of power, as the number of interactions increases, and bounds on the performance of this contract for a fixed $n$ larger than one.

Chapter 4

Dynamic Short-term Supply Contracts under Private Inventory and Backorder Information

4.1 Introduction

We consider a two-echelon supply chain where a single supplier sells to a retailer in multiple periods. The sequence of events is as follows: In each period $t$, the supplier offers a one-period contract to the retailer. If the retailer rejects the contract, the relationship between the two parties is terminated and the game is over. If the retailer accepts the contract, he makes his order decision in anticipation of the random demand. The demand is then realized and the retailer collects his revenue (given an exogenous selling price $r$). Unmet demand is backordered, subject to a unit stock-out penalty $b$, and left-over inventory is carried over to the next period, subject to a unit holding cost $h$.
In the next period $t + 1$, the supplier designs a new contract, and the above events are repeated.

We make three fundamental assumptions in this paper: (1) Unmet demand in each period is backordered; (2) The retailer's inventory or backorder level at the beginning of each period is unobservable by the supplier; (3) A one-period contract is offered and executed in each period.

The first assumption differentiates our underlying inventory model from the lost-sales model. Backorder and lost-sales are the two standard dynamic inventory models. The lost-sales counterpart of our problem has been investigated by Zhang et al. [57], but the backorder setting has been left unattended until now. It will be shown in this chapter that the optimal solutions in these two settings are drastically different. We will come back to this assumption shortly. Note that the beginning inventory or backorder in period $t$ is denoted by $x_t$, which can be positive (representing inventory) or negative (representing backorder). For convenience, we may refer to both inventory and backorder as (possibly negative) inventory when there is no confusion. In contrast, $x_t$ is always non-negative in the lost-sales setting.

The second assumption characterizes our model as a dynamic adverse-selection model, or equivalently, a dynamic principal-agent model with hidden information. Asymmetric information is prevalent in operations management. As a core subject of operations management, inventory management has traditionally overlooked the transparency or accessibility of inventory information. In reality, retailers are reluctant to share sales and inventory information with their suppliers. For instance, according to a study of the Canadian logistics industry by Statistics Canada (2003) [11], only about 10% of Canadian retailers share inventory data over established platforms. There are a variety of reasons. One reason comes from strategic considerations by the retailer.
Retailers may take advantage of this private information and underreport past sales to elicit discounts from suppliers. Other reasons include the lack of trust between supply chain members and confidentiality restrictions of different parties.

It is well known that in the “selling to the newsvendor” model, simple contracts, such as the wholesale price contract or the buyback contract, are sub-optimal when the retailer has private information. Burnetas et al. [9] showed that in the static setting, the optimal contract entails a concave quantity discount with the marginal unit payment decreasing in the order quantity. In general, the optimal contract can be very complex in a multi-period setting. Dynamic contracting is documented to be a challenging problem due to a host of technical and expositional difficulties. There is a large economics literature on dynamic contracting (e.g., [4] and [5]). The related literature in operations management is scant, with few published papers.

The third assumption delineates the type of contract under investigation. In the multi-period setting, there are at least two contracting modes that the supplier can choose from: short-term contracting and long-term contracting. Under long-term contracting, the principal (supplier) needs to commit to a contingency plan covering the entire decision horizon, while under short-term contracting, the principal only commits to a short-term (usually one-period) contract and a new contract is put in place once the previous one ends. The latter contracting mode is appropriate if the supplier lacks the credibility to carry out a long-term contract or if she prefers the relative simplicity of managing one-period contracts. This paper will focus on short-term contracting with a new contract in each period.
Indeed, there are a large number of real-world examples in which suppliers do not provide retailers with long-term price guarantees and replenishment schedules.

A key concept in the analysis of a short-term contracting problem is the belief about the private information. In our problem, knowing the demand distribution and the quantity ordered by the retailer in period $t-1$, the supplier can only tell the distribution of the retailer’s inventory/backorder level at the beginning of period $t$ (denoted by $x_t$), but not the exact value. The supplier’s belief can be described by the CDF $G_t(x_t)$ or the PDF $g_t(x_t)$. One major source of difficulty in dynamic short-term contracting is the complexity of the supplier’s belief, which is updated in every period according to the contract and the retailer’s response, following Bayes’ rule.

To summarize, the three fundamental assumptions define our problem as a dynamic short-term adverse-selection problem on top of a backorder inventory system. It belongs to the most challenging type of contract design problems and fills a significant gap in the existing literature. The papers most closely related to our work are Zhang et al. [57] and Ilan and Xiao [27]. Zhang et al. [57] introduced the first dynamic contracting problem with private inventory information, which formalizes an important problem in supply chain management. They focus on dynamic short-term contracts in the lost-sales setting and show that the optimal contract takes the form of a batch-order contract under certain model assumptions. Ilan and Xiao [27] studied optimal long-term contracts in both lost-sales and backorder settings. They prove that in the backorder case, the optimal long-term contract consists of a menu of wholesale prices and upfront fees, whereas in the lost-sales case, the optimal long-term contract takes the same form with an additional option to lower the wholesale price (at a future time, after paying a fee). Our work fills the gap in this literature.
The contracting takes place in every period, with inventory or backorder kept private by the retailer. Our goal is to characterize the optimal short-term contract. We want to know if the optimal contract will still have a simple structure and be easy to implement. We are also interested in how short-term contracting plus the backorder assumption may lead to different managerial insights.

In this chapter, we will investigate three scenarios of the modeling horizon: (1) a single period; (2) two periods; and (3) an infinite horizon. For tractability of the model, we assume that the demand $D_t$ is i.i.d. and follows an exponential distribution with rate $\lambda$. This assumption is supported by the extant operations management literature (see Iglehart [26], Lau and Lau [32] and Nagarajan and Rajagopalan [41]). A summary of our main findings is as follows.

(1) In the single-period setting, the optimal contract induces trade only when the retailer reports a negative inventory level, and the retailer obtains exactly his reservation profit.

(2) In the two-period setting, the supplier’s optimal contract in the first period involves at most two thresholds. The retailer obtains his reservation profit when the beginning inventory/backorder is below the lower threshold, whereas he receives a positive information rent when it is above the upper threshold. In the middle range, the partnership will be terminated and the retailer will be excluded from future business with the supplier.

(3) In the infinite-horizon setting, the optimal short-term contract is complicated and non-stationary in general. An interesting contract emerges. It induces a generalized base-stock policy, where the order-up-to level increases with and converges to the beginning inventory. This contract leaves zero information rent to the retailer (zero profit increment beyond his reservation profit) at any beginning inventory level.
Although this contract is sub-optimal in general, it is intuitive, easy to implement, and provides a good heuristic for the optimal contract. We also conjecture that when the backorder cost is relatively low, the optimal contract induces a base-stock policy with an exclusion region for the beginning inventory.

The insights we obtain in the backorder case are substantially different from those in the lost-sales case. In our setting, the optimal short-term contract has a threshold structure, with possible exclusion in a middle range. The information rent under the optimal contract may be non-monotone in the retailer’s beginning inventory. The supplier would sometimes prefer to deal with a retailer with high inventory, in contrast to the lost-sales case where the supplier always prefers a retailer with low inventory. In addition, the optimal contract may terminate the partnership when the retailer’s inventory falls in a certain interval. The above observations accord with the troublesome phenomenon of “countervailing incentives” examined in the contract design literature (see e.g., Lewis and Sappington [34] and Jullien [29]). From this perspective, the backorder setting results in a more complex information structure than the lost-sales setting. Consequently, contrary to the common wisdom from inventory management, the dynamic short-term contracting problem is more involved under backorder than under lost sales.

The emergence of countervailing incentives in our problem is related to our assumptions about the backorder arrangement. To be consistent with the backorder inventory model, we assume that when ordering from the supplier, the retailer is required to take backorder from customers as needed. To avoid the arbitrage opportunity for the retailer to take backorder intentionally, we assume $b > (1-\delta)r$, i.e., the penalty for backorder is greater than the possible interest that can be earned.
As a result, if the retailer abandons the relationship, he should not take backorder anymore, as it is unprofitable. The pressure of accommodating backorder devalues the relationship with the supplier and in turn increases the retailer’s bargaining power. When the beginning inventory falls into a certain region, the supplier suffers even without leaving any information rent to the retailer and would rather let the retailer go.

The rest of the paper is organized as follows. Section 4.2 provides a brief literature review. In Section 4.3, we describe the general problem and establish the mathematical model. Then we analyze the single-period case in Section 4.4, the two-period case in Section 4.5 and the infinite-horizon case in Section 4.6. Finally, we conclude in Section 4.7. All proofs are deferred to the appendix.

4.2 Literature Review

Our work is related to the literature on dynamic adverse-selection problems. This topic was pioneered by economists and has recently found important applications in operations management. Myerson [40] introduced the famous revelation principle. He proves that any outcome that is implementable in equilibrium in an arbitrary mechanism can also be implemented in equilibrium via a direct revelation mechanism. The revelation principle serves as the starting point in analyzing adverse-selection problems, as well as other mechanism design problems. The dynamic adverse-selection problem is known to be challenging and is much less understood than its static counterpart due to a host of technical and expositional difficulties. Economists have made great efforts to characterize the optimal mechanism in certain specific settings. For instance, the majority of the literature focuses on simplified cases where the hidden state is either constant or sampled from time-independent distributions.
See illustrative examples in Salanie [47] and Bolton and Dewatripont [8].

However, these models are too restrictive to capture the setting we are interested in. In this paper, we assume that the supplier cannot observe the retailer’s inventory before ordering. The hidden state is the retailer’s pre-order inventory level, which is affected by the state and action in the previous period. Thus, our problem belongs to a particular type of dynamic adverse-selection problems with an underlying Markov decision process. Battaglini [4] studied the optimal long-term contract between a monopolist and a buyer whose private preferences evolve as a two-state Markov process. Long-term contracting means that the manufacturer offers a contingency plan over the entire horizon and she has to commit to that plan at the beginning of the time horizon. This result is considered a significant finding in Markovian adverse-selection problems, although the two-state assumption is a significant limitation. Zhang and Zenios [56] consider a general framework with more than two states, and develop a dynamic programming algorithm to derive optimal long-term contracts. To the best of our knowledge, dynamic short-term adverse-selection problems are even less studied in the literature, at least partly due to the very complex belief process mentioned in the introduction. The only noticeable work with an underlying Markov process is Zhang et al. [57]. Thus, the current work makes a theoretical contribution in deepening our understanding of this class of problems.

There is a growing number of papers that explore the contracting problem among self-interested firms with private information in a supply chain. For instance, Corbett and Groote [16], Ha [25] and Corbett et al. [17] examine the situation in which the supplier does not know the cost structure of the buyer.
Other examples include Nazerzadeh and Perakis [42] on capacity information asymmetry and Cachon and Lariviere [10], Shin and Tunca [49] and Taylor and Xiao [54] on demand information asymmetry. All of these papers focus on static models. The paper by Burnetas et al. [9] studied a setting similar to ours, except that they only considered the single-period model. They show that when a manufacturer sells to a newsvendor retailer with private demand information, the optimal contract takes the form of a concave quantity discount.

Recently, several pioneering studies have explored multi-period contracting problems where private information arises over time and operational decisions need to be made dynamically based on available information. Ilan and Xiao [27] studied the optimal long-term contract when a manufacturer sells to a retailer over multiple periods with asymmetric demand and inventory information. They showed that in the backorder case, the optimal long-term contract consists of a menu of wholesale prices and associated upfront fees, whereas in the lost-sales case, the optimal long-term contract takes the same form but has an additional option at a later time to lower the wholesale price after paying an option fee. In contrast, this paper will focus on short-term contracting, where the supplier only offers a one-period contract to the retailer in every period. The paper by Zhang et al. [57] is the closest to this paper. They examine the optimal short-term contract in the lost-sales case. They show that the optimal contracts take the form of batch-order contracts under certain cost regimes.

4.3 Problem Formulation

In this section, we formulate our problem and discuss some basic facts about the model. We will derive the optimal (short-term) contract in the single-period case, two-period case, and infinite-horizon case in subsequent sections. The sequence of events is as follows. In period $t$, the supplier offers a one-period contract to the retailer.
If the retailer rejects the contract, the partnership is broken and the game is over. If the retailer accepts the contract, he makes his order decision in anticipation of the random demand. We assume zero lead time. Units are transferred to the retailer immediately and payments are received by the supplier. The demand is then realized and the retailer collects his sales revenue. After that, the retailer carries over any excess inventory or unmet backorder to the next period $t+1$. In period $t+1$, the supplier offers a new contract to the retailer, and the game is played dynamically. The total number of periods is $T$.

We introduce some notation. We write $r$ for the unit selling price, $c$ for the unit production cost, $h$ for the unit holding cost, $b$ for the backorder cost and $\delta \in (0, 1)$ for the discount factor. Throughout this paper, we make the following assumption:

Assumption 4.3.1. The demand $D_t$ in every period is i.i.d. and follows an exponential distribution with rate $\lambda$.

We assume that the supplier cannot observe the sales, i.e. the realized demand, at the retailer in any period. As a result, knowing the demand distribution and the quantity ordered by the retailer in period $t-1$, the supplier can only infer the distribution of the retailer’s pre-order inventory level $x_t$, not its exact value. We describe the supplier’s belief about the pre-order inventory level by the CDF $G_t(x_t)$ and the PDF $g_t(x_t)$.

If the retailer orders $q_t$ in period $t$, his post-order inventory level will be $y_t = x_t + q_t$. We define $v_t(y_t)$ as the retailer’s one-period profit in period $t$ before transferring any payment to the supplier. More specifically, $v_t(y_t)$ is equal to the retailer’s expected revenue, minus the holding cost and backorder cost (holding cost and backorder cost are absent in the last period). We let $\Pi_{t+1}(y_t)$ be the supplier’s expected profit-to-go from period $t+1$ onwards given the true $y_t$. Because $x_t$ is only known by the retailer, the supplier may perceive the post-order inventory to be $\hat{y}_t$.
The supplier’s perception will affect her belief about $x_{t+1}$ and thereby the optimal contract in period $t+1$. Therefore, we write $U_{t+1}(y_t|\hat{y}_t)$ as the retailer’s expected profit-to-go from period $t+1$ onwards given his true post-order inventory $y_t$ and the supplier’s perception $\hat{y}_t$; and $\Psi_{t+1}(y_t) = \Pi_{t+1}(y_t) + U_{t+1}(y_t|y_t)$ as the channel’s expected profit-to-go from period $t+1$ onwards if the supplier’s perception is correct. Finally, we define $\underline{u}_t(x_t)$ as the retailer’s reservation profit-to-go from period $t$ onwards if the two parties have terminated their relationship at the beginning of period $t$.

As we study short-term contracting, a proper solution concept is the “Perfect Bayesian Equilibrium” (see [22] and [44]). In period $t$, the contract maximizes the supplier’s expected profit-to-go given her belief about $x_t$; assuming the optimal contracts are offered in all future periods, the retailer’s response maximizes his expected profit-to-go given the contract; and the supplier’s belief about $x_{t+1}$ is derived according to Bayes’ rule.

According to the Revelation Principle (Myerson [40]), we can focus on direct revelation contracts. The supplier designs a menu contract $\{q_t(x_t), s_t(x_t)\}_{x_t \in (-\infty, \bar{x}_t]}$. We assume that the pre-order inventory $x_t$ in period $t$ is upper bounded by $\bar{x}_t$. For instance, $x_t$ is the result of the previous period’s sales, i.e. $x_t = \bar{x}_t - D_{t-1}$. For each possible $x_t$, the contract specifies a quantity plan $q_t(x_t)$ and a payment plan $s_t(x_t)$. The supplier’s goal is to maximize her expected profit-to-go, with respect to her belief $G_t$ about the beginning inventory or backorder level $x_t$. We formulate the contracting problem using the principal-agent framework. The supplier solves the following problem:

$$\max_{s_t, q_t} \int_{-\infty}^{\bar{x}_t} \left\{ s_t(x_t) - c\,q_t(x_t) + \delta \Pi_{t+1}(x_t + q_t(x_t)) \right\} dG_t(x_t) \tag{4.1}$$

subject to

$$v_t(x_t + q_t(x_t)) - s_t(x_t) + \delta U_{t+1}(x_t + q_t(x_t) \mid x_t + q_t(x_t)) = \max_{\hat{x}_t} \left\{ v_t(x_t + q_t(\hat{x}_t)) - s_t(\hat{x}_t) + \delta U_{t+1}(x_t + q_t(\hat{x}_t) \mid \hat{x}_t + q_t(\hat{x}_t)) \right\}, \quad x_t, \hat{x}_t \in (-\infty, \bar{x}_t] \tag{4.2}$$

$$v_t(x_t + q_t(x_t)) - s_t(x_t) + \delta U_{t+1}(x_t + q_t(x_t) \mid x_t + q_t(x_t)) \ge \underline{u}_t(x_t), \quad x_t \in (-\infty, \bar{x}_t] \tag{4.3}$$

Constraint (4.2) is the “incentive compatibility” (IC) constraint. It prevents the retailer from lying: reporting a different state $\hat{x}_t$ does not bring any benefit. Constraint (4.3) is the “individual rationality” (IR) constraint. It guarantees the retailer’s participation: the profit from choosing $(q_t(x_t), s_t(x_t))$ is at least as good as his reservation profit-to-go. In each period $t = 1, \ldots, T$, the supplier needs to solve the above problem (4.1)-(4.3). In the following sections, we will investigate three situations: (1) the single-period case, $T = 1$; (2) the two-period case, $T = 2$; and (3) the infinite-horizon case, $T = \infty$.

4.4 Single Period

First, we consider the single-period case, which also includes the last period of a finite-horizon model. In this case, there is no backorder or inventory holding cost at the end. Instead, we assume that the retailer must refund any backorder or throw away any leftover units at the end of the period. Then, the retailer’s revenue, given the post-order inventory $y_1$, is:

$$v_1(y_1) = \begin{cases} r E[\min(y_1, D)] = \frac{r}{\lambda}\left(1 - e^{-\lambda y_1}\right), & y_1 \ge 0 \\ r y_1, & y_1 < 0. \end{cases} \tag{4.4}$$

Recall that the post-order inventory $y_1 = x_1 + q_1$ is the sum of the pre-order inventory $x_1$ and the order quantity $q_1$. When $y_1 \ge 0$, the retailer collects all possible sales revenue and $v_1(y_1)$ coincides with the revenue function in the lost-sales case. When $y_1 < 0$, the retailer reimburses his customers. As a result, the backorder assumption does not play a role here. The retailer’s reservation profit, by quitting the business relationship, is:

$$\underline{u}_1(x_1) = v_1(x_1), \quad x_1 \in \mathbb{R}. \tag{4.5}$$

We further assume that the initial inventory is the result of the previous period’s sales, i.e., $x_1 = y_0 - D_0$, where $y_0$ is the period 0 post-order inventory level and $D_0$ is the period 0 demand. Without knowing the realized demand, the supplier is only able to form a belief about $x_1$ through the CDF $G_1(x_1) = e^{-\lambda(y_0 - x_1)}$ and the PDF $g_1(x_1) = \lambda e^{-\lambda(y_0 - x_1)}$.

As mentioned before, without loss of generality, we focus our attention on direct revelation contracts. The supplier offers a menu contract $\{q_1(x_1), s_1(x_1)\}_{x_1 \in (-\infty, y_0]}$. The retailer is asked to report his inventory or backorder $x_1$, and then the corresponding order $q_1(x_1)$ and payment $s_1(x_1)$ are executed. We define $u_1(x_1)$ as the net profit for the retailer, after transferring the payment to the supplier. Given the pre-order inventory $x_1$, $u_1(x_1) = v_1(x_1 + q_1(x_1)) - s_1(x_1)$. The optimal contract solves the following problem:

$$\max_{s_1, q_1} \int_{-\infty}^{y_0} \left\{ s_1(x_1) - c\,q_1(x_1) \right\} dG_1(x_1) \tag{4.6}$$

$$\text{s.t.} \quad u_1(x_1) = \max_{\hat{x}_1 \in (-\infty, y_0]} \left\{ v_1(x_1 + q_1(\hat{x}_1)) - s_1(\hat{x}_1) \right\}, \quad x_1 \in (-\infty, y_0] \tag{4.7}$$

$$u_1(x_1) \ge \underline{u}_1(x_1), \quad x_1 \in (-\infty, y_0] \tag{4.8}$$

Given her belief $G_1(x_1)$ (or equivalently $g_1(x_1)$) about the retailer’s pre-order inventory $x_1$, the supplier maximizes her expected profit subject to two constraints. As described, Constraint (4.7) is called the “incentive compatibility” (IC) constraint and (4.8) is called the “individual rationality” (IR) constraint. The IC constraint encourages the retailer to report his true state $x_1$. The IR constraint ensures that the retailer is willing to participate and accept the contract.

By the envelope theorem, we obtain the local IC constraint from the global IC constraint (4.7):

$$u_1'(x_1) = \frac{\partial}{\partial x_1} \left\{ v_1(x_1 + q_1(\hat{x}_1)) - s_1(\hat{x}_1) \right\} \Big|_{\hat{x}_1 = x_1} = v_1'(x_1 + q_1(x_1)) \tag{4.9}$$

One feature of the single-period problem is that the retailer does not need to pay any holding or backorder cost. In other words, there is no difference between the lost-sales case and the backorder case when there is only one period. As a result, we are able to show results similar to those in [57].
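The belief and the single-period revenue function can be checked numerically. The following is a minimal sketch, assuming illustrative parameter values ($r = 10$, $\lambda = 0.5$, $y_0 = 3$) that do not come from the text; the function names are hypothetical. It compares the closed form in (4.4) with a direct quadrature of $E[\min(y_1, D)]$ and confirms that the ratio $G_1(x_1)/g_1(x_1)$ is the constant $1/\lambda$ under exponential demand.

```python
import math

def v1(y, r, lam):
    """Single-period revenue (4.4): r*E[min(y, D)] for y >= 0, r*y for y < 0."""
    return r / lam * (1.0 - math.exp(-lam * y)) if y >= 0 else r * y

def belief_cdf(x, y0, lam):
    """Belief G_1(x) = P(y0 - D0 <= x) for D0 ~ Exp(lam), valid for x <= y0."""
    return math.exp(-lam * (y0 - x))

def belief_pdf(x, y0, lam):
    """Belief density g_1(x), the derivative of G_1, valid for x <= y0."""
    return lam * math.exp(-lam * (y0 - x))

def expected_min(y, lam, n=20_000):
    """Midpoint-rule value of E[min(y, D)] = integral over [0, y] of P(D > d)."""
    step = y / n
    return sum(math.exp(-lam * (i + 0.5) * step) for i in range(n)) * step

if __name__ == "__main__":
    r, lam, y0 = 10.0, 0.5, 3.0  # illustrative values only
    print(v1(2.0, r, lam), r * expected_min(2.0, lam))
    print(belief_cdf(1.0, y0, lam) / belief_pdf(1.0, y0, lam), 1.0 / lam)
```

The constant ratio $G_1/g_1 = 1/\lambda$ is what makes the belief drop out of the single-period analysis.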
We leave the proof and technical details to Appendix C.

Lemma 4.4.1. (1) A contract $\{q_1(x_1), s_1(x_1)\}$ satisfies the (global) IC constraint if and only if it satisfies the local IC constraint and $q_1(x_1)$ is weakly decreasing in $x_1$;
(2) Under the optimal contract, the IR constraint must be binding at $y_0$ and redundant at $x_1 < y_0$, where $x_1 = y_0 - D_0$.

Lemma 4.4.1 suggests that we can replace the IC constraint (4.7) by the local IC constraint (4.9) and the monotonicity of $q_1(x_1)$. It is true as long as the problem satisfies the so-called single-crossing property $\frac{\partial^2 v_1(x_1 + q_1)}{\partial x_1 \partial q_1} \le 0$ (Topkis [55]). Indeed, the function $v_1(y_1)$ is concave and we have

$$v_1''(y_1) = \begin{cases} -\lambda r e^{-\lambda y_1}, & y_1 > 0 \\ 0, & y_1 < 0. \end{cases}$$

Moreover, we look at the information rent $u_1(x_1) - \underline{u}_1(x_1)$, which is interpreted as the extra profit to the retailer arising from his informational advantage. Its derivative is equal to $u_1'(x_1) - \underline{u}_1'(x_1) = v_1'(x_1 + q_1(x_1)) - v_1'(x_1)$. As the function $v_1(y_1)$ is concave, it has decreasing differences and consequently the information rent decreases in $x_1$. In fact, for any $x_1 < \hat{x}_1$, we have

$$\begin{aligned}
u_1(x_1) - \underline{u}_1(x_1) &= v_1(x_1 + q_1(x_1)) - s_1(x_1) - v_1(x_1) \\
&\ge v_1(x_1 + q_1(\hat{x}_1)) - s_1(\hat{x}_1) - v_1(x_1) \\
&= v_1(\hat{x}_1 + q_1(\hat{x}_1)) - s_1(\hat{x}_1) - v_1(\hat{x}_1) \\
&\quad + \left[ v_1(x_1 + q_1(\hat{x}_1)) - v_1(\hat{x}_1 + q_1(\hat{x}_1)) + v_1(\hat{x}_1) - v_1(x_1) \right] \\
&\ge v_1(\hat{x}_1 + q_1(\hat{x}_1)) - s_1(\hat{x}_1) - v_1(\hat{x}_1) \\
&= u_1(\hat{x}_1) - \underline{u}_1(\hat{x}_1).
\end{aligned}$$

Therefore, the retailer with a higher inventory is a “worse” type, as he gets less information rent. In order to ensure that the retailer will not inflate his inventory level to elicit discounts, the supplier has to provide more incentives (i.e. larger rent) to those who have low inventory.

Next, we compute the so-called virtual surplus in the literature. We replace $s_1(x_1)$ with $v_1(x_1 + q_1(x_1)) - u_1(x_1)$. Since the IR constraint is binding at $y_0$, we rewrite $u_1(x_1) = u_1(y_0) - \int_{x_1}^{y_0} u_1'(\xi)\,d\xi = u_1(y_0) - \int_{x_1}^{y_0} v_1'(\xi + q_1(\xi))\,d\xi$. Thus, $s_1(x_1) = v_1(x_1 + q_1(x_1)) - u_1(y_0) + \int_{x_1}^{y_0} v_1'(\xi + q_1(\xi))\,d\xi$.
Finally, the objective function (4.6) can be reformulated as follows:

$$\begin{aligned}
&\int_{-\infty}^{y_0} \left\{ s_1(x_1) - c\,q_1(x_1) \right\} dG_1(x_1) \\
&= \int_{-\infty}^{y_0} \left\{ v_1(x_1 + q_1(x_1)) - u_1(y_0) + \int_{x_1}^{y_0} v_1'(\xi + q_1(\xi))\,d\xi - c\,q_1(x_1) \right\} g_1(x_1)\,dx_1 \\
&= \int_{-\infty}^{y_0} \left\{ v_1(x_1 + q_1(x_1)) - c\,q_1(x_1) \right\} g_1(x_1)\,dx_1 + \int_{-\infty}^{y_0} \int_{x_1}^{y_0} v_1'(\xi + q_1(\xi))\,d\xi\, g_1(x_1)\,dx_1 - u_1(y_0) \\
&= \int_{-\infty}^{y_0} \left\{ v_1(x_1 + q_1(x_1)) - c\,q_1(x_1) \right\} g_1(x_1)\,dx_1 + \int_{-\infty}^{y_0} v_1'(\xi + q_1(\xi))\,G_1(\xi)\,d\xi - u_1(y_0) \\
&= \int_{-\infty}^{y_0} \left\{ v_1(x_1 + q_1(x_1)) - c\,q_1(x_1) + v_1'(x_1 + q_1(x_1)) \frac{G_1(x_1)}{g_1(x_1)} \right\} g_1(x_1)\,dx_1 - u_1(y_0) \\
&= \int_{-\infty}^{y_0} J_1(q_1(x_1) \mid x_1)\, g_1(x_1)\,dx_1 - u_1(y_0)
\end{aligned} \tag{4.10}$$

We call $J_1(q_1 \mid x_1) = v_1(x_1 + q_1) - c\,q_1 + v_1'(x_1 + q_1) \frac{G_1(x_1)}{g_1(x_1)}$ the “virtual surplus”, which represents the redistributed profit for the supplier in relation to the order $q_1$ at $x_1$. The optimal quantity plan $q_1^*(x_1)$ maximizes the virtual surplus. Given $x_1$, the first-order derivative of $J_1(q_1 \mid x_1)$ is

$$\begin{aligned}
\frac{\partial J_1(q_1 \mid x_1)}{\partial q_1} &= v_1'(x_1 + q_1) - c + v_1''(x_1 + q_1) \frac{G_1(x_1)}{g_1(x_1)} \\
&= \begin{cases} r e^{-\lambda(x_1 + q_1)} - c - \lambda r e^{-\lambda(x_1 + q_1)} \frac{G_1(x_1)}{g_1(x_1)}, & x_1 + q_1 > 0 \\ r - c, & x_1 + q_1 < 0 \end{cases} \\
&= \begin{cases} -c, & x_1 + q_1 > 0 \\ r - c, & x_1 + q_1 < 0 \end{cases}
\end{aligned} \tag{4.11}$$

where the last equality holds because $\frac{G_1(x_1)}{g_1(x_1)} = 1/\lambda$. As we can see, $J_1(q_1 \mid x_1)$ increases in $q_1$ when $x_1 + q_1 < 0$ but decreases in $q_1$ when $x_1 + q_1 > 0$. Therefore, the optimal quantity plan is $q_1^*(x_1) = \max\{0, -x_1\}$, i.e. the optimal order-up-to level is $y_1^*(x_1) = x_1 + q_1^*(x_1) = \max\{0, x_1\}$. Theorem 4.4.2 provides a full characterization of the optimal contract. The optimal contract entails a base-stock policy with base-stock level 0. Moreover, the payment is a linear function of the order quantity $q_1$ with marginal price $r$.

Theorem 4.4.2. In the single-period case, the optimal contract is:

$$q_1^*(x_1) = \begin{cases} 0, & x_1 \ge 0 \\ -x_1, & x_1 < 0 \end{cases} \quad \text{and} \quad s_1^*(x_1) = \begin{cases} 0, & x_1 \ge 0 \\ -r x_1, & x_1 < 0. \end{cases} \tag{4.12}$$

Theorem 4.4.2 says that under the optimal contract, trade happens only when the retailer reports negative inventory. Moreover, the retailer gets exactly his reservation profit $u_1(x_1) = v_1(x_1) = \underline{u}_1(x_1)$. Under an exponential demand distribution, the distortion is so severe that it is optimal not to pay any information rent to the retailer.
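The maximization of the virtual surplus can be illustrated with a small grid search. This is a minimal sketch under stated assumptions: the parameter values ($r = 10$, $c = 4$, $\lambda = 0.5$) are illustrative and not from the text, the function names are hypothetical, and the order quantity is restricted to $q_1 \ge 0$ on a finite grid. It uses $G_1(x_1)/g_1(x_1) = 1/\lambda$ and recovers the order quantity $\max\{0, -x_1\}$ of Theorem 4.4.2.

```python
import math

def v1(y, r, lam):
    """Single-period revenue (4.4)."""
    return r / lam * (1.0 - math.exp(-lam * y)) if y >= 0 else r * y

def v1_prime(y, r, lam):
    """Derivative of v1; both one-sided derivatives equal r at y = 0."""
    return r * math.exp(-lam * y) if y >= 0 else r

def virtual_surplus(q, x, r, c, lam):
    """J_1(q | x) = v1(x + q) - c*q + v1'(x + q) * G_1/g_1, with G_1/g_1 = 1/lam."""
    y = x + q
    return v1(y, r, lam) - c * q + v1_prime(y, r, lam) / lam

def best_order(x, r, c, lam, q_max=10.0, n=10_000):
    """Grid search over q in [0, q_max] (assumes q >= 0) for the maximizer of J_1."""
    grid = [q_max * i / n for i in range(n + 1)]
    return max(grid, key=lambda q: virtual_surplus(q, x, r, c, lam))

if __name__ == "__main__":
    for x in (-3.0, -1.5, 0.0, 2.0):
        print(x, best_order(x, r=10.0, c=4.0, lam=0.5))
```

Because $J_1$ is piecewise linear in $q_1$ here (slope $r - c$ below the kink at $x_1 + q_1 = 0$ and slope $-c$ above it), the grid search only needs to land on the kink, which does not depend on the belief $G_1$.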
The supplier is only willing to clear up the backorder and bring the inventory level up to 0. More interestingly, the optimal contract is independent of the supplier’s belief $G_1$. In other words, no matter what belief the supplier has about the pre-order inventory $x_1$, she will always offer the contract proposed in Theorem 4.4.2. This property is uncommon in dynamic adverse-selection problems, and gives us hope that there might exist a simple optimal contract in more general cases.

4.5 Two Periods

We now look at the two-period case. We solve the problem by backward induction. Since we have already characterized the optimal contract in the last period, $t = 2$, in Theorem 4.4.2, we focus our attention on the first period, $t = 1$. As before, we assume that there exists a known $y_0$ and the pre-order inventory in period 1 is $x_1 = y_0 - D_0$. The supplier solves the following problem to find the optimal contract $\{q_1(x_1), s_1(x_1)\}_{x_1 \in (-\infty, y_0]}$:

$$\max_{s_1, q_1} \int_{-\infty}^{y_0} \left\{ s_1(x_1) - c\,q_1(x_1) + \delta \Pi_2(x_1 + q_1(x_1)) \right\} dG_1(x_1) \tag{4.13}$$

$$\text{s.t.} \quad u_1(x_1) = v_1(x_1 + q_1(x_1)) - s_1(x_1) + \delta U_2(x_1 + q_1(x_1)) = \max_{\hat{x}_1} \left\{ v_1(x_1 + q_1(\hat{x}_1)) - s_1(\hat{x}_1) + \delta U_2(x_1 + q_1(\hat{x}_1)) \right\}, \quad x_1 \in (-\infty, y_0] \tag{4.14}$$

$$u_1(x_1) \ge \underline{u}_1(x_1), \quad x_1 \in (-\infty, y_0] \tag{4.15}$$

The retailer’s expected profit in period 1 (before payment $s_1$) is given by the expected sales revenue minus the holding and backorder costs:

$$v_1(y_1) = r E[D] - b E[D - y_1]^+ - h E[y_1 - D]^+ = \begin{cases} \frac{r}{\lambda} - \frac{b}{\lambda} e^{-\lambda y_1} - h y_1 + \frac{h}{\lambda}\left(1 - e^{-\lambda y_1}\right), & y_1 \ge 0 \\ b y_1 + (r - b)/\lambda, & y_1 < 0. \end{cases}$$

In addition, $\Pi_2(y_1)$ is the supplier’s expected profit-to-go in period 2 given the post-order inventory $y_1$ in period 1, and $U_2(y_1)$ is the retailer’s expected profit-to-go in period 2 given $y_1$. Notice that the optimal contract in period 2 is independent of the supplier’s belief $G_2(x_2)$. Therefore, the supplier’s perceived post-order inventory $\hat{y}_1$ is irrelevant and we can simplify the notation $U_2(y_1 \mid \hat{y}_1)$ to $U_2(y_1)$.
According to Theorem 4.4.2, for a given pre-order inventory $x_2$ in period 2, the retailer’s profit in period 2 is

$$u_2(x_2) = \underline{u}_2(x_2) = \begin{cases} \frac{r}{\lambda}\left(1 - e^{-\lambda x_2}\right), & x_2 \ge 0 \\ r x_2, & x_2 < 0, \end{cases}$$

and the supplier’s profit in period 2 is

$$\pi_2(x_2) = s_2^*(x_2) - c\,q_2^*(x_2) = \begin{cases} 0, & x_2 \ge 0 \\ -(r - c) x_2, & x_2 < 0. \end{cases}$$

As a result, given the post-order inventory $y_1$ in period 1, the retailer’s expected profit-to-go in period 2 is:

$$U_2(y_1) = E[u_2(x_2)] = E[u_2(y_1 - D_1)] = \begin{cases} \frac{r}{\lambda} - \frac{2r}{\lambda} e^{-\lambda y_1} - r y_1 e^{-\lambda y_1}, & y_1 \ge 0 \\ r y_1 - \frac{r}{\lambda}, & y_1 < 0, \end{cases}$$

the supplier’s expected profit-to-go in period 2 is

$$\Pi_2(y_1) = E[\pi_2(x_2)] = E[\pi_2(y_1 - D_1)] = \begin{cases} \frac{r - c}{\lambda} e^{-\lambda y_1}, & y_1 \ge 0 \\ (r - c)\left(\frac{1}{\lambda} - y_1\right), & y_1 < 0, \end{cases}$$

and the expected profit-to-go for the whole channel is the sum of the two:

$$\Psi_2(y_1) = U_2(y_1) + \Pi_2(y_1) = \begin{cases} \frac{r}{\lambda} - \frac{r}{\lambda} e^{-\lambda y_1} - \frac{c}{\lambda} e^{-\lambda y_1} - r y_1 e^{-\lambda y_1}, & y_1 \ge 0 \\ c y_1 - \frac{c}{\lambda}, & y_1 < 0. \end{cases}$$

We need to be careful when defining the retailer’s reservation profit-to-go in the backorder case. We assume that once the retailer decides to abandon the relationship, he will no longer take backorders. That is to say, if the retailer starts with a backorder, $x_1 < 0$, he will return the payment to customers and stop the business right away. If the retailer has positive initial inventory, $x_1 \ge 0$, he will keep satisfying demand until no inventory is left, but he will not take backorder at the end of period 1. This is consistent with the following assumption:

Assumption 4.5.1. $b > (1 - \delta) r$.

This assumption prevents the retailer from taking backorder intentionally. If $b \le (1 - \delta) r$, the retailer has an arbitrage opportunity by carrying backorder all the time and refunding the customers at the end of the horizon.
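The closed forms for the continuation values $U_2$ and $\Pi_2$ can be checked against direct numerical expectations of the period 2 profits. This is a minimal sketch with illustrative parameters ($r = 10$, $c = 4$, $\lambda = 0.5$) that are assumptions, not values from the text; the helper `expect` is a hypothetical midpoint-rule quadrature of $E[f(y_1 - D_1)]$ that truncates the exponential demand at $60/\lambda$.

```python
import math

def u2(x, r, lam):
    """Retailer's period 2 profit under Theorem 4.4.2 (his reservation profit)."""
    return r / lam * (1.0 - math.exp(-lam * x)) if x >= 0 else r * x

def pi2(x, r, c):
    """Supplier's period 2 profit: she only clears a backorder, at margin r - c."""
    return 0.0 if x >= 0 else -(r - c) * x

def U2(y, r, lam):
    """Closed form of E[u2(y - D)] for D ~ Exp(lam)."""
    if y >= 0:
        e = math.exp(-lam * y)
        return r / lam - 2.0 * r / lam * e - r * y * e
    return r * y - r / lam

def Pi2(y, r, c, lam):
    """Closed form of E[pi2(y - D)] for D ~ Exp(lam)."""
    if y >= 0:
        return (r - c) / lam * math.exp(-lam * y)
    return (r - c) * (1.0 / lam - y)

def expect(f, y, lam, n=200_000):
    """Midpoint-rule approximation of E[f(y - D)], demand truncated at 60/lam."""
    step = 60.0 / lam / n
    total = 0.0
    for i in range(n):
        d = (i + 0.5) * step
        total += f(y - d) * lam * math.exp(-lam * d)
    return total * step

if __name__ == "__main__":
    r, c, lam = 10.0, 4.0, 0.5  # illustrative values only
    for y in (-1.0, 1.5):
        print(y, U2(y, r, lam), expect(lambda x: u2(x, r, lam), y, lam))
        print(y, Pi2(y, r, c, lam), expect(lambda x: pi2(x, r, c), y, lam))
```

Summing `U2` and `Pi2` gives the channel value $\Psi_2$; for $y_1 < 0$ the sum collapses to $c y_1 - c/\lambda$, as in the text.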
Therefore, it is more plausible to assume $b > (1 - \delta) r$. Then, the retailer’s reservation profit-to-go at the beginning of period 1 is equal to:

$$\underline{u}_1(x_1) = \begin{cases} r E[\min(x_1, D)] - h E[x_1 - D]^+ + \delta \underline{U}_2(x_1), & x_1 \ge 0 \\ r x_1, & x_1 < 0 \end{cases} = \begin{cases} \frac{r}{\lambda}\left(1 - e^{-\lambda x_1}\right) - h x_1 + \frac{h}{\lambda}\left(1 - e^{-\lambda x_1}\right) + \delta\left[ \frac{r}{\lambda} - \frac{r}{\lambda} e^{-\lambda x_1} - r x_1 e^{-\lambda x_1} \right], & x_1 \ge 0 \\ r x_1, & x_1 < 0, \end{cases}$$

where $\underline{U}_2(x_1)$ is the retailer’s reservation profit-to-go in period 2 given the beginning inventory $x_1$ in period 1. When $x_1 < 0$, we know $\underline{U}_2(x_1) = 0$. When $x_1 \ge 0$, we have

$$\underline{U}_2(x_1) = E[u_2(x_1 - D_1)^+] = E[u_2(x_2)^+] = \int_0^{x_1} \frac{r}{\lambda}\left(1 - e^{-\lambda x_2}\right) \lambda e^{-\lambda(x_1 - x_2)}\,dx_2 = \frac{r}{\lambda} - \frac{r}{\lambda} e^{-\lambda x_1} - r x_1 e^{-\lambda x_1}.$$

For convenience, we define $\mu_1(y_1) = v_1(y_1) + \delta U_2(y_1)$, more specifically,

$$\mu_1(y_1) = v_1(y_1) + \delta U_2(y_1) = \begin{cases} \frac{r + h}{\lambda} - \frac{b + h}{\lambda} e^{-\lambda y_1} - h y_1 + \delta\left[ \frac{r}{\lambda} - \frac{2r}{\lambda} e^{-\lambda y_1} - r y_1 e^{-\lambda y_1} \right], & y_1 \ge 0 \\ \frac{r - b}{\lambda} + b y_1 + \delta\left( r y_1 - \frac{r}{\lambda} \right), & y_1 < 0. \end{cases}$$

We interpret $\mu_1(y_1)$ as the pre-transfer profit-to-go function for the retailer, if he orders up to $y_1$, before transferring any payment to the supplier. Note that the retailer is assumed to take backorder at the end of period 1, if needed, as a requirement for doing business with the supplier. The IC constraint (4.14) becomes

$$u_1(x_1) = \mu_1(x_1 + q_1(x_1)) - s_1(x_1) = \max_{\hat{x}_1} \left\{ \mu_1(x_1 + q_1(\hat{x}_1)) - s_1(\hat{x}_1) \right\}, \quad x_1 \in (-\infty, y_0] \tag{4.16}$$

We make a few observations about the pre-transfer profit-to-go function $\mu_1(y_1)$. First of all, we have

$$\mu_1''(y_1) = \begin{cases} -\lambda\left(b + h + \delta r \lambda y_1\right) e^{-\lambda y_1}, & y_1 > 0 \\ 0, & y_1 < 0, \end{cases}$$

and hence the period 1 problem still satisfies the single-crossing property. By a proof similar to that of Lemma 4.4.1, we can show that the optimal quantity plan in period 1, $q_1^*(x_1)$, will satisfy the following local IC constraint

$$u_1'(x_1) = \mu_1'(x_1 + q_1(x_1)) \tag{4.17}$$

and weakly decrease in $x_1 \in (-\infty, y_0]$.

Next, we compare $\mu_1(x_1)$ and $\underline{u}_1(x_1)$ to obtain insights for the optimal contract.

$$\mu_1(x_1) - \underline{u}_1(x_1) = \begin{cases} -\frac{1}{\lambda} e^{-\lambda x_1} \left[ b - (1 - \delta) r \right], & x_1 \ge 0 \\ -\left( \frac{1}{\lambda} - x_1 \right) \left[ b - (1 - \delta) r \right], & x_1 < 0. \end{cases}$$

Figure 4.1: The retailer’s reservation and pre-transfer profit-to-go functions in period 1.

Clearly, whether $\mu_1(x_1)$ is greater or smaller than $\underline{u}_1(x_1)$ depends on the term $b - (1 - \delta) r$. By Assumption 4.5.1, we have $\mu_1(x_1) - \underline{u}_1(x_1) < 0$ for all $x_1$. The two functions are illustrated in Figure 4.1. It implies that the supplier has to provide incentives in order to keep the retailer in the relationship. More importantly, the base-stock policy with base-stock level 0 proposed in Theorem 4.4.2 (i.e. $y_1(x_1) = \max\{0, x_1\}$, for all $x_1 \in (-\infty, y_0]$) violates the IR constraint and is no longer a feasible quantity plan in period 1. In fact, under the quantity plan $y_1(x_1) = \max\{0, x_1\}$, we have

$$u_1'(x_1) - \underline{u}_1'(x_1) = \begin{cases} \mu_1'(x_1) - \underline{u}_1'(x_1), & x_1 \ge 0 \\ \mu_1'(0) - \underline{u}_1'(x_1), & x_1 < 0 \end{cases} = \begin{cases} (b + \delta r - r) e^{-\lambda x_1}, & x_1 \ge 0 \\ b + \delta r - r, & x_1 < 0. \end{cases}$$

We end up with $u_1'(x_1) > \underline{u}_1'(x_1)$ for all $x_1 \in (-\infty, y_0]$. As a result, we claim that as long as there exists a point $x_1^*$ at which $u_1(x_1^*) = \underline{u}_1(x_1^*)$, the IR constraint must be violated for all $x_1 < x_1^*$. So the quantity plan $y_1(x_1) = \max\{0, x_1\}$ ($x_1 \in (-\infty, y_0]$) is infeasible. We should expect the optimal contract in period 1 to be quite different from that in period 2.

In the single-period case, we have shown that the information rent decreases in the pre-order inventory level. However, when there are two periods to go, we fail to have a similar result. The information rent $u_1(x_1) - \underline{u}_1(x_1)$, the difference between the retailer’s profit-to-go under the contract and his reservation profit-to-go, may not be monotone. In fact, the first-order derivative of the information rent is

$$u_1'(x_1) - \underline{u}_1'(x_1) = \mu_1'(y_1(x_1)) - \underline{u}_1'(x_1) = \begin{cases} (b + h + \delta r) e^{-\lambda y_1} + \delta r \lambda y_1 e^{-\lambda y_1} - (r + h) e^{-\lambda x_1} - \delta r \lambda x_1 e^{-\lambda x_1}, & y_1 \ge x_1 \ge 0 \\ (b + h + \delta r) e^{-\lambda y_1} + \delta r \lambda y_1 e^{-\lambda y_1} - h - r, & y_1 \ge 0 > x_1 \\ b - (1 - \delta) r, & 0 > y_1 \ge x_1. \end{cases}$$

Whether the information rent is monotone will depend on the choice of $y_1$. That is to say, it is unclear at which point the IR constraint will be binding.

Figure 4.2: Illustration of “bump”
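The comparison between the pre-transfer profit-to-go $\mu_1$ and the reservation profit-to-go can be reproduced numerically. This is a minimal sketch; the parameter values ($r = 10$, $b = 3$, $h = 1$, $\delta = 0.9$, $\lambda = 0.5$) are assumptions chosen only so that Assumption 4.5.1 holds, and the function names are hypothetical. It evaluates both closed forms and checks that their difference matches the displayed gap and is negative.

```python
import math

def mu1(y, r, b, h, delta, lam):
    """Pre-transfer profit-to-go mu_1(y) = v_1(y) + delta * U_2(y)."""
    if y >= 0:
        e = math.exp(-lam * y)
        return ((r + h) / lam - (b + h) / lam * e - h * y
                + delta * (r / lam - 2.0 * r / lam * e - r * y * e))
    return (r - b) / lam + b * y + delta * (r * y - r / lam)

def reservation1(x, r, h, delta, lam):
    """Reservation profit-to-go at the start of period 1 (no backorders taken)."""
    if x >= 0:
        e = math.exp(-lam * x)
        return (r / lam * (1.0 - e) - h * x + h / lam * (1.0 - e)
                + delta * (r / lam - r / lam * e - r * x * e))
    return r * x

def gap(x, r, b, delta, lam):
    """Closed-form difference mu_1(x) minus the reservation profit-to-go."""
    k = b - (1.0 - delta) * r
    return -math.exp(-lam * x) / lam * k if x >= 0 else -(1.0 / lam - x) * k

if __name__ == "__main__":
    r, b, h, delta, lam = 10.0, 3.0, 1.0, 0.9, 0.5  # illustrative values only
    for x in (-2.0, 0.0, 0.7, 3.0):
        diff = mu1(x, r, b, h, delta, lam) - reservation1(x, r, h, delta, lam)
        print(x, diff, gap(x, r, b, delta, lam))
```

With these values $b - (1 - \delta) r = 2 > 0$, so every printed gap is negative: ordering nothing and staying in the relationship is worth less to the retailer than walking away.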
This is the main difference from the lost-sales case, where the IR constraint must be binding at the highest inventory level and redundant at other points. In the backorder case, the IR constraint might be binding at a single point or at multiple points. The good news is that we can exclude the possibility of a “bump” in the retailer’s profit-to-go function. Here a “bump” means that the IR constraint is binding at two points but redundant at all points in between. If a “bump” occurred, the optimal contract would be too complicated to analyze. See Figure 4.2 for an illustration of a “bump”.

Proposition 4.5.2. Under the optimal contract, there does not exist a “bump” in the retailer’s profit-to-go function, i.e. there do not exist two points $x_1^+$ and $x_1^-$ such that the IR constraint is binding at $x_1^+$ and $x_1^-$ but redundant at $x_1 \in (x_1^-, x_1^+)$.

Proposition 4.5.2 guarantees that the optimal contract will not lead to a “bump.” Furthermore, there must exist at least one point $x_1^*$ where the IR constraint is binding. If not, the supplier would be able to increase the payment uniformly across all possible $x_1$ to increase her profit. As a result, we conclude that at $x_1 < x_1^*$, the IR constraint is either always binding or never binding. A similar argument holds at $x_1 > x_1^*$.

Next, we derive the virtual surplus function $J_1(y_1|x_1)$. Following a similar approach to the single-period case, we replace $s_1(x_1)$ with $\mu_1(x_1 + q_1(x_1)) - u_1(x_1) = \mu_1(y_1(x_1)) - u_1(x_1)$. At $x_1 < x_1^*$, the IR constraint is anchored at the right (or top). So we rewrite
\[
u_1(x_1) = u_1(x_1^*) - \int_{x_1}^{x_1^*} u_1'(\xi)\,d\xi = u_1(x_1^*) - \int_{x_1}^{x_1^*} \mu_1'(y_1(\xi))\,d\xi.
\]
At $x_1 > x_1^*$, the IR constraint is anchored at the left (or bottom). So we rewrite
\[
u_1(x_1) = u_1(x_1^*) + \int_{x_1^*}^{x_1} u_1'(\xi)\,d\xi = u_1(x_1^*) + \int_{x_1^*}^{x_1} \mu_1'(y_1(\xi))\,d\xi.
\]
Finally, we reformulate the objective function (4.13) as
\[
\int_{-\infty}^{x_1^*} \{s_1(x_1) - c y_1(x_1) + c x_1 + \delta \Pi_2(y_1(x_1))\} g_1(x_1)\,dx_1
+ \int_{x_1^*}^{y_0} \{s_1(x_1) - c y_1(x_1) + c x_1 + \delta \Pi_2(y_1(x_1))\} g_1(x_1)\,dx_1,
\]
where the first part is equal to
\[
\begin{aligned}
&\int_{-\infty}^{x_1^*} \{s_1(x_1) - c y_1(x_1) + c x_1 + \delta \Pi_2(y_1(x_1))\} g_1(x_1)\,dx_1 \\
&= \int_{-\infty}^{x_1^*} \Big\{\mu_1(y_1(x_1)) - u_1(x_1^*) + \int_{x_1}^{x_1^*} \mu_1'(y_1(\xi))\,d\xi - c y_1(x_1) + c x_1 + \delta \Pi_2(y_1(x_1))\Big\} g_1(x_1)\,dx_1 \\
&= \int_{-\infty}^{x_1^*} \Big\{\mu_1(y_1(x_1)) - c y_1(x_1) + c x_1 + \delta \Pi_2(y_1(x_1)) + \mu_1'(y_1(x_1)) \frac{G_1(x_1)}{g_1(x_1)}\Big\} g_1(x_1)\,dx_1
- \int_{-\infty}^{x_1^*} u_1(x_1^*) g_1(x_1)\,dx_1,
\end{aligned}
\tag{4.18}
\]
and the second part is equal to
\[
\begin{aligned}
&\int_{x_1^*}^{y_0} \{s_1(x_1) - c y_1(x_1) + c x_1 + \delta \Pi_2(y_1(x_1))\} g_1(x_1)\,dx_1 \\
&= \int_{x_1^*}^{y_0} \Big\{\mu_1(y_1(x_1)) - u_1(x_1^*) - \int_{x_1^*}^{x_1} \mu_1'(y_1(\xi))\,d\xi - c y_1(x_1) + c x_1 + \delta \Pi_2(y_1(x_1))\Big\} g_1(x_1)\,dx_1 \\
&= \int_{x_1^*}^{y_0} \Big\{\mu_1(y_1(x_1)) - c y_1(x_1) + c x_1 + \delta \Pi_2(y_1(x_1)) - \mu_1'(y_1(x_1)) \frac{1 - G_1(x_1)}{g_1(x_1)}\Big\} g_1(x_1)\,dx_1
- \int_{x_1^*}^{y_0} u_1(x_1^*) g_1(x_1)\,dx_1.
\end{aligned}
\tag{4.19}
\]
Equations (4.18) and (4.19) lead to the following expressions for the virtual surplus anchored at the right end and the left end, respectively:
\[
J_1(y_1|x_1) =
\begin{cases}
c x_1 - c y_1 + \mu_1(y_1) + \delta \Pi_2(y_1) + \mu_1'(y_1) \dfrac{G_1(x_1)}{g_1(x_1)}, & x_1 < x_1^*, \\[2mm]
c x_1 - c y_1 + \mu_1(y_1) + \delta \Pi_2(y_1) - \mu_1'(y_1) \dfrac{1 - G_1(x_1)}{g_1(x_1)}, & x_1 > x_1^*.
\end{cases}
\tag{4.20}
\]
The virtual surplus takes different forms for $x_1 > x_1^*$ and $x_1 < x_1^*$. It will be used later in finding the optimal contract. The optimal quantity plan $y_1^*(x_1)$ maximizes the virtual surplus subject to the IR constraint. In order to characterize the optimal contract in period 1, we first introduce two special quantity plans.

Definition 1. Define $y_1^R(x_1)$ as the solution of $\underline{u}_1'(x_1) = \mu_1'(y_1)$, i.e., $y_1^R(x_1)$ solves
\[
(b + h + \delta r + \delta r \lambda y_1^R)e^{-\lambda y_1^R} = (r + h + \delta r \lambda x_1)e^{-\lambda x_1}
\tag{4.21}
\]
for $x_1 \in [0, y_0]$, and $y_1^R(x_1) = y_1^R(0)$ for $x_1 \in (-\infty, 0)$.

Definition 2. Define $y_1^L(x_1)$ as the solution of the first-order condition $\frac{d}{dy_1} J_1(y_1|x_1) = 0$, given the “virtual surplus” anchored at the left (or bottom)
\[
J_1(y_1|x_1) = c x_1 - c y_1 + \mu_1(y_1) + \delta \Pi_2(y_1) - \mu_1'(y_1) \frac{1 - G_1(x_1)}{g_1(x_1)},
\tag{4.22}
\]
i.e., $y_1^L(x_1)$ solves
\[
\delta c e^{-\lambda y_1^L} + e^{\lambda(y_0 - x_1)}(b + h + \delta r \lambda y_1^L)e^{-\lambda y_1^L} = c + h,
\tag{4.23}
\]
or $y_1^L(x_1) = x_1$ if the above solution is less than $x_1$.

If the supplier implements the quantity plan $y_1^R(x_1)$, the retailer will receive exactly his reservation profit-to-go, because $y_1^R(x_1)$ is such that $u_1'(x_1) = \mu_1'(y_1^R(x_1)) = \underline{u}_1'(x_1)$. The first equality follows from the local IC constraint. The quantity plan $y_1^L(x_1)$ maximizes the “virtual surplus” anchored at the left. By taking the derivative, we can easily check that $y_1^R(x_1)$ increases in $x_1$ while $y_1^L(x_1)$ decreases in $x_1$ whenever $y_1^L(x_1) > x_1$.

The full characterization of the optimal contract in period 1 is given by the following theorem.

Theorem 4.5.3. The optimal contract in period 1 has at most two thresholds $x_1^{**} \le x_1^* \le y_0$:
(a) at $x_1 \in (-\infty, x_1^{**}]$, the retailer orders up to $y_1^R(x_1)$ and gets his reservation profit;
(b) at $x_1 \in (x_1^{**}, x_1^*]$, the retailer is excluded;
(c) at $x_1 \in (x_1^*, y_0]$, the retailer orders up to $y_1^L(x_1)$ and receives a positive information rent.

Figure 4.3: Retailer’s profit-to-go in period 1 under the optimal contract

Theorem 4.5.3 indicates that the optimal contract in period 1 consists of at most three regions, as illustrated in Figure 4.3. In the first region $x_1 \in (-\infty, x_1^{**}]$, the retailer participates and gets exactly his reservation profit-to-go. To see this, we examine the virtual surplus anchored at the right,
\[
J_1(y_1|x_1) = c x_1 - c y_1 + \mu_1(y_1) + \delta \Pi_2(y_1) + \mu_1'(y_1) \frac{G_1(x_1)}{g_1(x_1)}.
\]
Recall that $x_1 = y_0 - D_0$; therefore $G_1(x_1) = e^{-\lambda(y_0 - x_1)}$ and $g_1(x_1) = \lambda e^{-\lambda(y_0 - x_1)}$. The first-order derivative of $J_1(y_1|x_1)$ is given by
\[
\frac{dJ_1(y_1|x_1)}{dy_1} = -c + \mu_1'(y_1) + \delta \Pi_2'(y_1) + \mu_1''(y_1) \frac{G_1(x_1)}{g_1(x_1)} =
\begin{cases}
-h - c + \delta c e^{-\lambda y_1} < 0, & y_1 > 0, \\
b + \delta c - c > 0, & y_1 < 0.
\end{cases}
\]
The supplier wants to maximize the virtual surplus subject to the IR constraints. However, the function $J_1(y_1|x_1)$ increases in $y_1$ for $y_1 < 0$ whereas it decreases in $y_1$ for $y_1 > 0$. As a result, the base-stock policy with base-stock level 0 (i.e.
$y_1(x_1) = \max\{0, x_1\}$) maximizes the virtual surplus. However, we have discussed that the quantity plan $y_1(x_1) = \max\{0, x_1\}$ does not satisfy the IR constraint at any $x_1 \le x_1^*$. In fact, if the retailer quits, he does not need to incur the higher backorder cost, which gives him more bargaining power. The information rent corresponding to the base-stock policy is not large enough (in fact it yields negative profit) to ensure the retailer’s participation. In other words, the supplier faces a trade-off between maximizing her profit and keeping the retailer in the relationship. Since $J_1(y_1|x_1)$ is decreasing in $y_1$ for $y_1 > 0$, the optimal quantity plan will be $y_1^*(x_1) = y_1^R(x_1)$, which makes the IR constraint binding.

The second region $(x_1^{**}, x_1^*]$ suggests that the supplier may be able to improve her profit by excluding some types of retailers. Figure 4.4 shows the supplier’s profit $\pi_1(x_1)$ for each possible pre-order inventory $x_1$ if she follows the quantity plan $y_1^R(x_1)$, i.e.
\[
\pi_1(x_1) = v_1(y_1^R(x_1)) + \delta U_2(y_1^R(x_1)) - \underline{u}_1(x_1) - c y_1^R(x_1) + c x_1 + \delta \Pi_2(y_1^R(x_1)).
\tag{4.24}
\]
We observe that when $x_1$ is large, the supplier starts getting negative profit, $\pi_1(x_1) < 0$. In this case, it is better to exclude the retailer, as even zero information rent would lead to negative profit for the supplier. In the backorder case, as the retailer has more bargaining power by threatening to quit, the optimal contract induces partial participation. The threshold $x_1^{**}$ is determined by setting (4.24) equal to 0. We can prove that the threshold $x_1^{**}$ is uniquely determined if the costs are not too high.

Lemma 4.5.4. Suppose $r + h + \delta r > e(h + c)$. There exists a unique $x_1^{**}$ that satisfies
\[
v_1(y_1^R(x_1)) + \delta U_2(y_1^R(x_1)) - \underline{u}_1(x_1) - c y_1^R(x_1) + c x_1 + \delta \Pi_2(y_1^R(x_1)) = 0.
\]

In fact, even when $r + h + \delta r < e(h + c)$, we observe numerically that the threshold $x_1^{**}$ is unique.
So we postulate that under all circumstances, there always exists a unique $x_1^{**}$ that solves $v_1(y_1^R(x_1)) + \delta U_2(y_1^R(x_1)) - \underline{u}_1(x_1) - c y_1^R(x_1) + c x_1 + \delta \Pi_2(y_1^R(x_1)) = 0$.

Figure 4.4: Supplier’s profit-to-go $\pi_1(x_1)$ in period 1 under $y_1^R(x_1)$, plotted against $x_1$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $\lambda = 1$ and $\delta = 0.9$.)

Finally, in the third region $x_1 \in (x_1^*, y_0]$, we investigate the virtual surplus anchored at the left, given in (4.22):
\[
J_1(y_1|x_1) = c x_1 - c y_1 + \mu_1(y_1) + \delta \Pi_2(y_1) - \mu_1'(y_1) \frac{1 - G_1(x_1)}{g_1(x_1)}.
\]
Its derivative is given by
\[
\frac{dJ_1(y_1|x_1)}{dy_1} = -c + \mu_1'(y_1) + \delta \Pi_2'(y_1) - \mu_1''(y_1) \frac{1 - G_1(x_1)}{g_1(x_1)} =
\begin{cases}
-c - h + \delta c e^{-\lambda y_1} + e^{\lambda(y_0 - x_1)}(b + h + \delta r \lambda y_1)e^{-\lambda y_1}, & y_1 > 0, \\
b + \delta c - c, & y_1 < 0.
\end{cases}
\]
By definition, the quantity plan $y_1^L(x_1)$ maximizes the virtual surplus when $x_1 \in (x_1^*, y_0]$. In order to satisfy the IR constraint, the threshold $x_1^*$ has a lower bound.

Lemma 4.5.5. The threshold $x_1^*$ satisfies $x_1^* \ge x_1^L$, where $x_1^L$ is the unique solution of $y_1^R(x_1) = y_1^L(x_1)$. In addition,
\[
y_1^R(x_1) - y_1^L(x_1)
\begin{cases}
< 0, & x_1 < x_1^L, \\
> 0, & x_1 > x_1^L.
\end{cases}
\]

The IR constraint requires
\[
u_1(x_1) = u_1(x_1^*) + \int_{x_1^*}^{x_1} \mu_1'(y_1^L(\xi))\,d\xi \;\ge\; \underline{u}_1(x_1) = \underline{u}_1(x_1^*) + \int_{x_1^*}^{x_1} \mu_1'(y_1^R(\xi))\,d\xi.
\]
Note that the IR constraint is binding at $x_1^*$. By rearranging the terms, $x_1$ should satisfy $\int_{x_1^*}^{x_1} [\mu_1'(y_1^L(\xi)) - \mu_1'(y_1^R(\xi))]\,d\xi \ge 0$. Letting $x_1$ approach $x_1^*$, the IR constraint becomes $\mu_1'(y_1^L(x_1^*)) - \mu_1'(y_1^R(x_1^*)) \ge 0$, which implies $y_1^L(x_1^*) \le y_1^R(x_1^*)$ and hence $x_1^* \ge x_1^L$, due to the concavity of $\mu_1$. Finally, the IR constraint will be automatically satisfied at $x_1 \in (x_1^*, y_0]$ because $\mu_1'(y_1^L(x_1)) - \mu_1'(y_1^R(x_1)) \ge 0$ for all $x_1 > x_1^*$. As a result, $y_1^L(x_1)$ is the optimal quantity plan in this case.

Moreover, the retailer starts getting positive information rent $u_1(x_1) - \underline{u}_1(x_1) = \int_{x_1^*}^{x_1} [\mu_1'(y_1^L(\xi)) - \mu_1'(y_1^R(\xi))]\,d\xi$. Surprisingly, the information rent is increasing in $x_1$, since $u_1'(x_1) - \underline{u}_1'(x_1) = \mu_1'(y_1^L(x_1)) - \mu_1'(y_1^R(x_1)) > 0$. The retailer with a higher inventory becomes a “better” type in the two-period case. This is opposite to the single-period case.
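The two quantity plans of Definitions 1 and 2 have no closed form, but equations (4.21) and (4.23) are easy to solve numerically. The following is a minimal sketch, not the procedure used in the thesis: it solves both equations by bisection and locates the crossing point $x_1^L$ of Lemma 4.5.5. The cost parameters are those of Figure 4.4; the value `Y0 = 3` and all function names are assumptions made only for illustration, and the bisection for (4.23) assumes the first-order condition is positive at $y_1 = 0$, which holds for these parameters.

```python
import math

# Cost parameters from the caption of Figure 4.4. Y0 = 3 is an assumption
# made only for this sketch (the caption does not report y0).
R, C, B, H, LAM, DELTA, Y0 = 10.0, 5.0, 3.0, 3.0, 1.0, 0.9, 3.0


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) > 0 > f(hi) and one sign change."""
    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def y_right(x1):
    """Zero-rent plan y^R_1(x1): solves (4.21); constant y^R_1(0) for x1 < 0."""
    x = max(x1, 0.0)
    rhs = (R + H + DELTA * R * LAM * x) * math.exp(-LAM * x)

    def f(y):
        return (B + H + DELTA * R + DELTA * R * LAM * y) * math.exp(-LAM * y) - rhs

    # f(x) = (b + delta*r - r) * exp(-lam*x) > 0 and f decreases in y.
    return bisect(f, x, x + 50.0)


def y_left_unconstrained(x1):
    """Positive root of the first-order condition (4.23), before truncation."""
    e = math.exp(LAM * (Y0 - x1))

    def f(y):
        return (DELTA * C * math.exp(-LAM * y)
                + e * (B + H + DELTA * R * LAM * y) * math.exp(-LAM * y)
                - (C + H))

    return bisect(f, 0.0, 50.0)


def y_left(x1):
    """Plan y^L_1(x1) of Definition 2: root of (4.23), or x1 if the root is below x1."""
    return max(y_left_unconstrained(x1), x1)


# Crossing point x^L_1 of Lemma 4.5.5: y^R_1 - y^L_1 switches from negative to positive.
x_cross = bisect(lambda x: y_left_unconstrained(x) - y_right(x), 0.0, Y0)
```

Under these illustrative values, `y_right` is increasing and `y_left` is decreasing wherever it exceeds `x1`, in line with the monotonicity properties stated after Definition 2, so `x_cross` is the single point at which the two plans coincide.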
As we see, the backorder case leads to totally different insights from the lost-sales case.

However, there may not be a closed-form expression for the threshold $x_1^*$. It will be found numerically by maximizing the supplier’s expected profit in the region $(x_1^*, y_0]$.

Figure 4.5: Two special cases of the optimal contract in period 1. (a) Special Case 1: no exclusion. (b) Special Case 2: no tail.

Depending on the parameters, there might exist two special cases, as shown in Figure 4.5. The first case (Figure 4.5(a)) is $x_1^{**} = x_1^*$, where we do not have an exclusion region. The second one (Figure 4.5(b)) is $x_1^* = y_0$, where the positive information rent region (or the “tail”) disappears.

4.6 Infinite Horizon

In Section 4.5, we derived the optimal contract in the two-period case. We have seen that the optimal contract can be fairly complex and involve three regions. This illustrates the complexity of the optimal contract in the general $T$-period case. There seems little hope of deriving closed-form expressions for the optimal contracts. To eliminate the end-of-horizon effects, we consider the infinite-horizon case in this section. We would like to investigate whether there exists a stationary optimal contract with a simple structure.

Our analysis proceeds as follows. We first propose a simple stationary contract. Then we compute the supplier’s profit-to-go functions under such a contract. Finally we show that when the model parameters lie in a certain region, the proposed contract is indeed the optimal short-term contract.

4.6.1 Retailer’s Reservation Profit-to-go

We start with the retailer’s reservation profit-to-go. For the case in which the retailer decides to quit at the beginning of period $t$, we make the following assumptions. If the retailer has backorder on hand, he will return the payments to customers and leave the market right away.
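On the other hand, if the retailer has positive inventory, he will keep selling the goods but no longer take backorders (because it is unprofitable due to $b > (1-\delta)r$). Once the retailer sells out all the inventory, he will leave the market. Therefore, the retailer’s reservation profit in period $t$ from inventory $x_t$ (without ordering) is
\[
\underline{v}_t(x_t) =
\begin{cases}
r E[\min(D, x_t)] - h E[x_t - D]^+ = \dfrac{r + h}{\lambda}\,(1 - e^{-\lambda x_t}) - h x_t, & x_t \ge 0, \\[2mm]
r x_t, & x_t < 0.
\end{cases}
\]
We further define $\underline{U}_{t+1}(x_t)$ as the retailer’s expected reservation profit-to-go (with no future orders or backorders) given the previous period’s inventory $x_t$. Clearly, when $x_t \le 0$, the retailer will reimburse his customers and leave the market in period $t$. Hence, we have $\underline{U}_{t+1}(x_t) = 0$. When $x_t > 0$, $\underline{U}_{t+1}(x_t)$ can be determined recursively as
\[
\begin{aligned}
\underline{U}_{t+1}(x_t) &= \int_0^{x_t} \{\underline{v}_{t+1}(x_{t+1}) + \delta \underline{U}_{t+2}(x_{t+1})\}\,dG_{t+1}(x_{t+1}) \\
&= \int_0^{x_t} \{\underline{v}_{t+1}(x_{t+1}) + \delta \underline{U}_{t+2}(x_{t+1})\}\,\lambda e^{-\lambda(x_t - x_{t+1})}\,dx_{t+1}.
\end{aligned}
\tag{4.25}
\]
Over the infinite horizon, both $\underline{U}_{t+1}$ and $\underline{U}_{t+2}$ can be replaced by a stationary function $\underline{U}$ (the time index is unnecessary). Thus, Equation (4.25) becomes
\[
\underline{U}(x) = \int_0^x \{\underline{v}(z) + \delta \underline{U}(z)\}\,\lambda e^{-\lambda(x - z)}\,dz,
\tag{4.26}
\]
where $x$ denotes the beginning inventory of the “previous” period and $z$ denotes the beginning inventory of the “current” period.

We multiply both sides of (4.26) by $e^{\lambda x}$, which yields $e^{\lambda x}\underline{U}(x) = \int_0^x \{\underline{v}(z) + \delta \underline{U}(z)\}\,\lambda e^{\lambda z}\,dz$. With the transformation $\tilde{U}(x) = e^{\lambda x}\underline{U}(x)$, we get $\tilde{U}(x) = \int_0^x \{\underline{v}(z)\lambda e^{\lambda z} + \delta\lambda \tilde{U}(z)\}\,dz$. Finally, we take the derivative on both sides and obtain the following ODE:
\[
\tilde{U}'(x) = \lambda e^{\lambda x}\underline{v}(x) + \delta\lambda \tilde{U}(x).
\tag{4.27}
\]
Through straightforward algebra, the solution to (4.27) can be found as
\[
\tilde{U}(x) = \frac{r + h}{\delta\lambda} + \frac{r(1-\delta) + h[2 - \lambda x - \delta(1 - \lambda x)]}{\lambda(1-\delta)^2}\,e^{\lambda x} + e^{\delta\lambda x} M_U,
\]
in which $M_U$ is a constant to be determined from the boundary condition $\tilde{U}(0) = e^{\lambda \cdot 0}\,\underline{U}(0) = 0$. So we have $M_U = -\dfrac{h + r(1-\delta)}{\delta\lambda(1-\delta)^2}$.

In conclusion,
\[
\begin{aligned}
\underline{U}(x) &=
\begin{cases}
\dfrac{r + h}{\delta\lambda}\,e^{-\lambda x} + \dfrac{r(1-\delta) + h[2 - \lambda x - \delta(1 - \lambda x)]}{\lambda(1-\delta)^2} - \dfrac{h + r(1-\delta)}{\delta\lambda(1-\delta)^2}\,e^{-\lambda x(1-\delta)}, & x > 0, \\[2mm]
0, & x \le 0
\end{cases} \\
&=
\begin{cases}
\dfrac{h(2-\delta) + r(1-\delta)}{\lambda(1-\delta)^2} - \dfrac{h x}{1-\delta} + \dfrac{r + h}{\delta\lambda}\,e^{-\lambda x} - \dfrac{h + r(1-\delta)}{\delta\lambda(1-\delta)^2}\,e^{-\lambda x(1-\delta)}, & x > 0, \\[2mm]
0, & x \le 0,
\end{cases}
\end{aligned}
\tag{4.28}
\]
and
\[
\begin{aligned}
\underline{u}(x) = \underline{v}(x) + \delta \underline{U}(x) &=
\begin{cases}
\dfrac{r + h}{\lambda} - h x + \delta\,\dfrac{r(1-\delta) + h[2 - \lambda x - \delta(1 - \lambda x)]}{\lambda(1-\delta)^2} - \dfrac{h + r(1-\delta)}{\lambda(1-\delta)^2}\,e^{-\lambda x(1-\delta)}, & x > 0, \\[2mm]
r x, & x \le 0
\end{cases} \\
&=
\begin{cases}
\dfrac{h + r(1-\delta)}{\lambda(1-\delta)^2} - \dfrac{h x}{1-\delta} - \dfrac{h + r(1-\delta)}{\lambda(1-\delta)^2}\,e^{-\lambda x(1-\delta)}, & x > 0, \\[2mm]
r x, & x \le 0.
\end{cases}
\end{aligned}
\tag{4.29}
\]

4.6.2 The Zero-rent Plan $y^R(x)$

In the two-period case, we observe that when the pre-order inventory is relatively small (below $x_1^{**}$), the retailer should receive exactly his reservation profit-to-go under the optimal contract. The corresponding quantity plan does not depend on the supplier’s belief and is easy to characterize. Motivated by the two-period problem, we compute the quantity plan $y^R(x)$ such that the retailer always gets his reservation profit-to-go. In other words, we propose a feasible contract with quantity plan $y^R(x)$. Following such a contract, the retailer receives his reservation profit-to-go $\underline{u}(x)$ in each period. One of our goals is to find conditions under which this contract is indeed the optimal short-term contract.

First, given his participation, the retailer’s expected pre-transfer profit in period $t$ from the post-order inventory $y_t$ is equal to
\[
v_t(y_t) =
\begin{cases}
r E[D] - b E[D - y_t]^+ - h E[y_t - D]^+ = \dfrac{r}{\lambda} - \dfrac{b}{\lambda}\,e^{-\lambda y_t} - h y_t + \dfrac{h}{\lambda}\,(1 - e^{-\lambda y_t}), & y_t \ge 0, \\[2mm]
b y_t + (r - b)/\lambda, & y_t < 0,
\end{cases}
\]
with first-order derivative
\[
v_t'(y_t) =
\begin{cases}
(b + h)e^{-\lambda y_t} - h, & y_t \ge 0, \\
b, & y_t < 0,
\end{cases}
\]
and second-order derivative
\[
v_t''(y_t) =
\begin{cases}
-\lambda(b + h)e^{-\lambda y_t}, & y_t > 0, \\
0, & y_t < 0.
\end{cases}
\]
We define $U(y)$ as the retailer’s expected profit-to-go given the post-order inventory $y$ of the “previous” period. By following the proposed quantity plan $y^R(x)$, the retailer always gets his reservation profit in the “current” period. This implies $U(y) = \int_{-\infty}^{y} \underline{u}(z)\,\lambda e^{-\lambda(y - z)}\,dz$. If $y \le 0$, we obtain $U(y) = \int_{-\infty}^{y} r z\,\lambda e^{-\lambda(y - z)}\,dz = r y - \dfrac{r}{\lambda}$.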
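As a numerical sanity check on the reservation quantities of Section 4.6.1, the closed form (4.28) can be compared with the integral equation (4.26) it is meant to solve, and (4.29) with its definition $\underline{v} + \delta\underline{U}$. The sketch below is only illustrative: the parameter values, the function names and the use of a composite Simpson rule are assumptions of this example, not part of the thesis.

```python
import math

# Illustrative parameter values (an assumption of this sketch).
R, H, LAM, DELTA = 10.0, 3.0, 1.0, 0.9


def v_res(x):
    """One-period reservation profit defined just before (4.25)."""
    if x < 0.0:
        return R * x
    return (R + H) / LAM * (1.0 - math.exp(-LAM * x)) - H * x


def U_res(x):
    """Reservation profit-to-go, closed form (4.28)."""
    if x <= 0.0:
        return 0.0
    k = H + R * (1.0 - DELTA)
    return ((H * (2.0 - DELTA) + R * (1.0 - DELTA)) / (LAM * (1.0 - DELTA) ** 2)
            - H * x / (1.0 - DELTA)
            + (R + H) / (DELTA * LAM) * math.exp(-LAM * x)
            - k / (DELTA * LAM * (1.0 - DELTA) ** 2) * math.exp(-LAM * x * (1.0 - DELTA)))


def u_res(x):
    """Reservation profit-to-go including the current period, closed form (4.29)."""
    if x <= 0.0:
        return R * x
    k = (H + R * (1.0 - DELTA)) / (LAM * (1.0 - DELTA) ** 2)
    return k - H * x / (1.0 - DELTA) - k * math.exp(-LAM * x * (1.0 - DELTA))


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0


def rhs_426(x):
    """Right-hand side of (4.26), evaluated by quadrature with U_res plugged in."""
    def integrand(z):
        return (v_res(z) + DELTA * U_res(z)) * LAM * math.exp(-LAM * (x - z))
    return simpson(integrand, 0.0, x)


grid = (0.5, 1.0, 2.0, 4.0)
residuals_426 = [abs(U_res(x) - rhs_426(x)) for x in grid]
gaps_429 = [abs(u_res(x) - (v_res(x) + DELTA * U_res(x))) for x in grid]
```

Both lists should be zero up to quadrature and rounding error if the closed forms are transcribed correctly; a large residual would point to a sign or exponent slip in (4.28) or (4.29).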
If $y > 0$, we have
\[
\begin{aligned}
U(y) &= \int_{-\infty}^{y} \underline{u}(z)\,\lambda e^{-\lambda(y - z)}\,dz \\
&= \int_{-\infty}^{0} \underline{u}(z)\,\lambda e^{-\lambda(y - z)}\,dz + \int_{0}^{y} \underline{u}(z)\,\lambda e^{-\lambda(y - z)}\,dz \\
&= \int_{-\infty}^{0} r z\,\lambda e^{-\lambda(y - z)}\,dz + \underline{U}(y) \\
&= -\frac{r}{\lambda}\,e^{-\lambda y} + \underline{U}(y) \\
&= \frac{h(2-\delta) + r(1-\delta)}{\lambda(1-\delta)^2} - \frac{h y}{1-\delta} + \frac{h + r(1-\delta)}{\delta\lambda}\,e^{-\lambda y} - \frac{h + r(1-\delta)}{\delta\lambda(1-\delta)^2}\,e^{-\lambda y(1-\delta)}.
\end{aligned}
\]
We compare the “no-order” profit-to-go $v(x) + \delta U(x)$ for the retailer with his reservation profit-to-go:
\[
v(x) + \delta U(x) - \underline{u}(x) =
\begin{cases}
-(b + \delta r - r)e^{-\lambda x}/\lambda < 0, & x > 0, \\
(b + \delta r - r)(x - 1/\lambda) < 0, & x \le 0.
\end{cases}
\]
As $b > (1-\delta)r$, the “no-order” profit-to-go $v(x) + \delta U(x)$ is smaller than $\underline{u}(x)$. As in the two-period case, the retailer is forced to accommodate backorders if he is doing business with the supplier. However, as $b > (1-\delta)r$, the retailer gains bargaining power by threatening to quit and avoid a possible backorder penalty. As a result, the supplier needs to provide sufficient incentive to keep him in the relationship.

We further define $u(x)$ as the retailer’s profit-to-go under the contract given the pre-order inventory $x$ of the “current” period, i.e. $u(x) = v(y(x)) + \delta U(y(x)) - s(x)$, in which $y(x)$ is the order-up-to level and $s(x)$ is the payment to the supplier. The local IC constraint indicates
\[
u'(x) = v'(y(x)) + \delta U'(y(x)) =
\begin{cases}
[b - r(1-\delta)]e^{-\lambda y} + \dfrac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda y(1-\delta)} - \dfrac{h}{1-\delta}, & y > 0, \\[2mm]
b + \delta r, & y \le 0.
\end{cases}
\]
By definition, the quantity plan $y^R(x)$ is such that the retailer receives his reservation profit-to-go. Thus it should satisfy $u'(x) = v'(y^R(x)) + \delta U'(y^R(x)) = \underline{u}'(x)$, i.e. $y^R(x)$ solves
\[
[b - r(1-\delta)]e^{-\lambda y} + \frac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda y(1-\delta)} - \frac{h}{1-\delta} =
\begin{cases}
\dfrac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda x(1-\delta)} - \dfrac{h}{1-\delta}, & x > 0, \\[2mm]
r, & x \le 0.
\end{cases}
\tag{4.30}
\]
Clearly, the solution $y^R(x)$ is positive for all $x$. Moreover, when $x \le 0$, $y^R(x)$ is a constant, $y^R(0)$. When $x > 0$, $y^R(x) > x$ is uniquely determined. In fact, $y^R(x)$ has the following properties:

Lemma 4.6.1. When $x > 0$,
(1) the quantity plan $y^R(x)$ is strictly increasing and convex in $x$;
(2) $\lim_{x \to \infty} [y^R(x) - x] = 0$.

In fact, when $x > 0$, $y^R(x)$ satisfies
\[
[b - r(1-\delta)]e^{-\lambda y^R(x)} + \frac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda y^R(x)(1-\delta)} = \frac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda x(1-\delta)}.
\]
Taking the derivative with respect to $x$ on both sides of the equation gives Lemma 4.6.1(1). To show Lemma 4.6.1(2), we rewrite the equation as
\[
[b - r(1-\delta)]e^{-\lambda[y^R(x) - x] - \delta\lambda x} + \frac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda[y^R(x) - x](1-\delta)} = \frac{h + r(1-\delta)}{1-\delta}.
\]
Clearly, as $x \to \infty$, we must have $y^R(x) - x \to 0$. Otherwise, the limit of the left-hand side of the equation would not equal the constant $\frac{h + r(1-\delta)}{1-\delta}$.

Figure 4.6: Supplier’s profit-to-go under $y^R(x)$, plotted against $x$. (a) Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $\lambda = 1$ and $\delta = 0.9$. (b) Parameters: $r = 10$, $c = 6$, $b = 7$, $h = 5$, $\lambda = 1$ and $\delta = 0.9$.

Finally we compute the supplier’s profit-to-go under the quantity plan $y^R(x)$. We define $\pi_t(x)$ as the supplier’s profit-to-go from period $t$ onwards given pre-order inventory $x$ in period $t$, and $\Pi_{t+1}(y)$ as the supplier’s profit-to-go from period $t+1$ onwards given post-order inventory $y$ in period $t$. The functions $\pi_t(x)$ and $\Pi_{t+1}(y)$ are related as follows:
\[
\begin{aligned}
\pi_t(x) &= v_t(y(x)) + \delta U_{t+1}(y(x)) - u_t(x) - c y(x) + c x + \delta \Pi_{t+1}(y(x)), \\
\Pi_{t+1}(y) &= \int_{-\infty}^{y} \pi_{t+1}(z)\,dG(z) = \int_{-\infty}^{y} \pi_{t+1}(z)\,\lambda e^{-\lambda(y - z)}\,dz.
\end{aligned}
\]
By assumption, the supplier implements the quantity plan $y^R(x)$ in each period. In this case, we apply the contraction mapping theorem to obtain the convergence of $\pi_t$ and $\Pi_t$ as the total number of periods $T$ goes to infinity.

Lemma 4.6.2. Suppose the supplier implements the quantity plan $y^R(x)$ in each period.
As the total number of periods $T \to \infty$, the supplier’s profit-to-go function $\pi = \lim_{T \to \infty} \pi_t$ exists and is unique.

In the infinite-horizon case, the supplier’s profit-to-go functions are stationary and satisfy:
\[
\pi(x) = v(y^R(x)) + \delta U(y^R(x)) - \underline{u}(x) - c y^R(x) + c x + \delta \Pi(y^R(x)),
\tag{4.31}
\]
\[
\Pi(y) = \int_{-\infty}^{y} \pi(z)\,dG(z) = \int_{-\infty}^{y} \pi(z)\,\lambda e^{-\lambda(y - z)}\,dz.
\tag{4.32}
\]
Figure 4.6 shows the supplier’s profit-to-go $\pi(x)$ under the quantity plan $y^R(x)$. As we can see, $\pi(x)$ has a different structure under different cost parameters. In Figure 4.6(a), $\pi(x) \ge 0$ for all $x$. The supplier always gets nonnegative profit by implementing $y^R(x)$. However, in Figure 4.6(b), $\pi(x)$ is first positive but later becomes negative. In this case, the supplier gets negative profit by implementing $y^R(x)$ for some types of retailers. The supplier is better off excluding such types of retailers. Therefore, we conjecture that the optimal short-term contract in the backorder case may involve an exclusion region in certain parameter regimes.

4.6.3 The Optimal Contract

We explore whether the optimal short-term contract is such that the retailer gets his reservation profit in each period. In order to show the optimality of the zero-rent contract, we use an inductive argument. Supposing it holds from the “next” period onwards, we want to show that the proposed contract with $y^R(x)$ is also optimal in the “current” period. Following a similar approach to the two-period case, we obtain the virtual surplus anchored at the right end or left end:
\[
J^R(y|x) = c x - c y + v(y) + \delta U(y) + \delta \Pi(y) + [v'(y) + \delta U'(y)]\,\frac{G(x)}{g(x)},
\tag{4.33}
\]
\[
J^L(y|x) = c x - c y + v(y) + \delta U(y) + \delta \Pi(y) - [v'(y) + \delta U'(y)]\,\frac{1 - G(x)}{g(x)}.
\tag{4.34}
\]
We first look at the virtual surplus anchored at the right, $J^R(y|x)$. Its first-order derivative is
\[
\frac{dJ^R(y|x)}{dy} = -c + v'(y) + \delta U'(y) + \delta \Pi'(y) + [v''(y) + \delta U''(y)]\,\frac{G(x)}{g(x)}.
\]
We assume $x = y_0 - D$ for some known $y_0$.
Therefore, $G(x) = e^{-\lambda(y_0 - x)}$ and $g(x) = \lambda e^{-\lambda(y_0 - x)}$. By straightforward algebra, we have that when $y < 0$, $\frac{dJ^R(y|x)}{dy} = -c + b + \delta r + \delta \Pi'(y)$, and when $y > 0$,
\[
\begin{aligned}
\frac{dJ^R(y|x)}{dy} &= -c + [b - r(1-\delta)]e^{-\lambda y} + \frac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda y(1-\delta)} - \frac{h}{1-\delta} + \delta \Pi'(y) \\
&\quad - \lambda\{[b - r(1-\delta)]e^{-\lambda y} + [h + r(1-\delta)]e^{-\lambda y(1-\delta)}\}\,\frac{1}{\lambda} \\
&= -c - \frac{h}{1-\delta} + \delta\,\frac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda y(1-\delta)} + \delta \Pi'(y).
\end{aligned}
\]
The following proposition gives several properties of $J^R(y|x)$ which will play a role in characterizing the optimal contract.

Proposition 4.6.3. Suppose that the supplier offers the zero-rent contract $y^R(x)$ from the “next” period onwards. For any given $x$, the virtual surplus anchored at the right, $J^R(y|x)$, increases when $y < 0$ but decreases when $y > 0$. Moreover, $J^R(y|x)$ is concave at $y > 0$. In other words, it satisfies
\[
\frac{dJ^R(y|x)}{dy}
\begin{cases}
< 0, & y > 0, \\
> 0, & y < 0,
\end{cases}
\qquad \text{and} \qquad \frac{d^2 J^R(y|x)}{dy^2} < 0 \text{ when } y > 0.
\]

Figure 4.7: Retailer’s profit-to-go under quantity plan $y(x) = \max\{0, x\}$ and his reservation profit-to-go. (a) If IR is binding at $x^* = 3$. (b) If IR is binding at $x^* = -0.1$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $\lambda = 1$ and $\delta = 0.9$.)

By the inductive assumption, from the “next” period onwards, the supplier offers the contract with quantity plan $y^R(x)$. So $\pi(z) = v(y^R(z)) + \delta U(y^R(z)) - \underline{u}(z) - c y^R(z) + c z + \delta \Pi(y^R(z))$. When $y \le 0$, $y^R(z) = y^R(0)$ is a constant for all $z \le y$. Therefore, $\pi(z) = v(y^R(0)) + \delta U(y^R(0)) - \underline{u}(z) - c y^R(0) + c z + \delta \Pi(y^R(0)) = \pi(0) - (r - c)z$. We can show that $\Pi(y) = \int_{-\infty}^{y} \pi(z)\,\lambda e^{-\lambda(y - z)}\,dz$ is also a linear function of $y$ with slope $-(r - c)$. As a result, when $y < 0$, $\frac{dJ^R(y|x)}{dy} = -c + b + \delta r + \delta \Pi'(y) = -c + b + \delta r - \delta(r - c) = b + \delta c - c > 0$. When $y > 0$, we need to prove the result by induction.
Please refer to Appendix C for more details.

Proposition 4.6.3 indicates that the virtual surplus $J^R(y|x)$ is maximized at the point 0 for any given $x$. Suppose the IR constraint is binding at some point $x^*$. For $x < x^*$, the quantity plan $y(x) = \max\{0, x\}$ maximizes $J^R(y|x)$. However, Figure 4.7 shows that $y(x) = \max\{0, x\}$ is infeasible, as the IR constraint is violated at $x < x^*$. We will soon prove that $y^R(x)$ is actually the optimal quantity plan when $x < x^*$.

Next we consider the case $x > x^*$. We look at the virtual surplus anchored at the left, $J^L(y|x)$. Its derivative is
\[
\begin{aligned}
\frac{dJ^L(y|x)}{dy} &= -c + v'(y) + \delta U'(y) + \delta \Pi'(y) + [v''(y) + \delta U''(y)]\,\frac{G(x) - 1}{g(x)} \\
&=
\begin{cases}
-c - \dfrac{h}{1-\delta} + \delta\,\dfrac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda y(1-\delta)} + \delta \Pi'(y) \\
\quad + \{[b - r(1-\delta)]e^{-\lambda y} + [h + r(1-\delta)]e^{-\lambda y(1-\delta)}\}\,e^{\lambda(y_0 - x)}, & y > 0, \\[2mm]
-c + b + \delta r + \delta \Pi'(y), & y < 0
\end{cases} \\
&=
\begin{cases}
-c - \dfrac{h}{1-\delta} + \delta\,\dfrac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda y(1-\delta)} + \delta \Pi'(y) \\
\quad + \{[b - r(1-\delta)]e^{-\lambda y} + [h + r(1-\delta)]e^{-\lambda y(1-\delta)}\}\,e^{\lambda(y_0 - x)}, & y > 0, \\[2mm]
-c + b + \delta c, & y < 0.
\end{cases}
\end{aligned}
\]
The $y$ that maximizes $J^L(y|x)$ is either the boundary point 0 or a solution of the first-order condition $\frac{dJ^L(y|x)}{dy} = 0$. Proposition 4.6.4 below guarantees that there exists at most one point $y^L(x) > 0$ such that $\frac{dJ^L(y|x)}{dy} = 0$, so we just need to compare 0 and $y^L(x)$ and see which one leads to a larger $J^L(y|x)$.

Proposition 4.6.4. Suppose the supplier offers the zero-rent contract $y^R(x)$ from the “next” period onwards. For any given $x$, the first-order condition $\frac{dJ^L(y|x)}{dy} = 0$ has at most one positive solution, denoted as $y^L(x)$.
In addition, $y^L(x)$ is decreasing in $x$.

In fact, the first-order condition can be written as
\[
\frac{c + \dfrac{h}{1-\delta} - \delta\,\dfrac{h + r(1-\delta)}{1-\delta}\,e^{-\lambda y(1-\delta)} - \delta \Pi'(y)}{[b - r(1-\delta)]e^{-\lambda y} + [h + r(1-\delta)]e^{-\lambda y(1-\delta)}} = e^{\lambda(y_0 - x)}.
\]
The right-hand side is independent of $y$ and decreasing in $x$, whereas the left-hand side is increasing in $y$, because the numerator is increasing in $y$ by Proposition 4.6.3 while the denominator is decreasing in $y$. Therefore, the solution $y^L(x)$ is unique (if it exists) and is decreasing in $x$.

We have discussed the quantity plans that maximize $J^R(y|x)$ and $J^L(y|x)$ respectively. However, this discussion presumes that there is no “bump” in the retailer’s profit-to-go function. If there existed a “bump”, where the IR constraint is binding at two points but redundant in between, our previous analysis would not hold. In addition to maximizing the virtual surplus, we would have to take into account the boundary conditions at the two end points when characterizing the optimal quantity plan. As such, the existence of a “bump” would make the analysis more complicated. However, we are able to show that if the supplier offers the proposed contract $y^R(x)$ from the “next” period onwards, the optimal contract in the “current” period will not lead to any “bump” in the retailer’s profit-to-go function.

Theorem 4.6.5. Suppose that the supplier implements the zero-rent plan $y^R(x)$ from the “next” period onwards. In the “current” period, the optimal contract will not create a “bump” in the retailer’s profit-to-go function, i.e. there cannot exist two points $x^+$ and $x^-$ such that the IR constraint is binding at $x^+$ and $x^-$ but redundant at $x \in (x^-, x^+)$.

Figure 4.8: Retailer’s profit-to-go under quantity plans $y^R(x)$ and $\max\{y^L(x), x\}$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)

Theorem 4.6.5 simplifies our analysis in finding the optimal short-term contract in the following sense.
Under the optimal contract, there must exist a point $x^*$ where the IR constraint is binding. Thanks to Theorem 4.6.5, we expect the optimal contract to have the following property: at $x < x^*$ (and at $x > x^*$), the IR constraint is either always binding (with the possibility of exclusion) or never binding. As a result, it suffices to study only the virtual surplus anchored at the right, $J^R(y|x)$, and at the left, $J^L(y|x)$.

In fact, when $x < x^*$, we look at $J^R(y|x)$, and we already know that $y = 0$ maximizes the virtual surplus. But the IR constraint cannot be satisfied under the quantity plan $y(x) = \max\{0, x\}$. This implies that the IR constraint will always be binding at $x < x^*$. In other words, it is optimal to implement $y^R(x)$ for $x < x^*$. However, when $x > x^*$, this may not be the case. We found examples where, instead of $y^R(x)$, it is better for the supplier to implement $\max\{y^L(x), x\}$. Interestingly, by doing so, both the supplier and the retailer are better off. See Figures 4.8 and 4.9.

We need some conditions on the anchor point $x^*$ in order to satisfy the IR constraint when $x > x^*$. By an analysis similar to the two-period case, we require $v'(y^L(x^*)) + \delta U'(y^L(x^*)) > \underline{u}'(x^*) = v'(y^R(x^*)) + \delta U'(y^R(x^*))$. The following lemma provides a lower bound for the threshold $x^*$, which is $x^L$.

Lemma 4.6.6. The threshold $x^*$ satisfies $x^* \ge x^L$, where $x^L$ is the unique solution of $y^R(x) = y^L(x)$. In addition,
\[
y^R(x) - y^L(x)
\begin{cases}
< 0, & x < x^L, \\
> 0, & x > x^L.
\end{cases}
\]

Figure 4.9: Supplier’s profit-to-go under quantity plans $y^R(x)$ and $\max\{y^L(x), x\}$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)

Because $y^R(x)$ strictly increases in $x$ and $y^L(x)$ strictly decreases in $x$ when $y^L(x) > x$, the solution of $y^R(x) = y^L(x)$ must be unique, as shown in Figure 4.10.
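The monotonicity of $y^R(x)$ invoked in this uniqueness argument can be checked numerically from (4.30), which involves no unknown function. The sketch below is illustrative only: it solves (4.30) by bisection for the parameter values of Figure 4.10 and inspects the properties stated in Lemma 4.6.1. The function names and the grid of test points are assumptions of this example, and the bisection relies on $b > (1-\delta)r$ so that the left-hand side of (4.30) is decreasing in $y$. The plan $y^L(x)$ is not computed here because it depends on $\Pi'(y)$, which has no closed form.

```python
import math

# Parameter values from the caption of Figure 4.10 (c and y0 do not enter (4.30)).
R, B, H, LAM, DELTA = 10.0, 3.0, 3.0, 1.0, 0.9


def marginal_value(y):
    """Left-hand side of (4.30): v'(y) + delta * U'(y) for y > 0."""
    k = H + R * (1.0 - DELTA)
    return ((B - R * (1.0 - DELTA)) * math.exp(-LAM * y)
            + k / (1.0 - DELTA) * math.exp(-LAM * y * (1.0 - DELTA))
            - H / (1.0 - DELTA))


def reservation_slope(x):
    """Right-hand side of (4.30): derivative of the reservation profit-to-go."""
    if x <= 0.0:
        return R
    k = H + R * (1.0 - DELTA)
    return k / (1.0 - DELTA) * math.exp(-LAM * x * (1.0 - DELTA)) - H / (1.0 - DELTA)


def zero_rent_plan(x, tol=1e-12):
    """Solve (4.30) for y^R(x) by bisection on [max(x, 0), hi]."""
    target = reservation_slope(x)
    lo = max(x, 0.0)
    hi = lo + 1.0
    while marginal_value(hi) > target:   # expand until the root is bracketed
        hi += 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_value(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


xs = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]
plan = [zero_rent_plan(x) for x in xs]
gaps = [y - x for x, y in zip(xs, plan)]   # y^R(x) - x, expected to shrink toward 0
```

For these values the entries of `plan` increase with `x` while the entries of `gaps` stay positive and shrink, which is what Lemma 4.6.1 predicts; for $x \le 0$ the function returns the constant $y^R(0)$.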
As we can see, the optimal short-term contract can be much more complex than the proposed zero-rent contract $y^R(x)$. In the “current” period, the optimal contract is such that when $x < x^*$, the retailer either orders up to $y^R(x)$ or is excluded; when $x > x^*$, the retailer orders up to $\max\{y^L(x), x\}$. The quantity plan $y^L(x)$ depends on the supplier’s belief $G(x)$. Moreover, the optimal contract may result in a “pooling” region in which the order quantity is 0 and the supplier is unable to differentiate the retailer’s true inventory level $x$. In this sense, we do not expect a stationary optimal (short-term) contract, as the supplier’s belief evolves over time in a very complex pattern.

However, the proposed zero-rent contract $y^R(x)$, though not optimal, serves as a good heuristic. We show numerically that this contract is inferior to the optimal one by only a small percentage, while being much simpler to understand and implement. Table 4.1 presents the gap between the zero-rent contract $y^R(x)$ and the optimal contract in the supplier’s profit-to-go, which can be less than 2%. Figure 4.11 illustrates a sample trajectory of pre-order and post-order inventory levels under the zero-rent contract $y^R(x)$. We start with $y_0 = 0$. Demand is realized and the retailer has backorders $x_1 < 0$ at the beginning of period 1. According to the contract, the retailer orders up to $y^R(0)$. As long as the retailer holds backorders at the beginning of a period, his order-up-to level remains $y^R(0)$. Once the retailer has positive pre-order inventory $x_t > 0$ in period 4, he orders up to a higher level $y^R(x_4) > y^R(0)$. In addition, when the pre-order inventory $x$ is positive, the order-up-to level $y^R(x)$ is a strictly increasing function of $x$. As we can see, the zero-rent contract $y^R(x)$ leads to a generalized base-stock policy.

Figure 4.10: Two quantity plans $y^R(x)$ and $\max\{y^L(x), x\}$.
(Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)

Figure 4.11: Sample inventory trajectory (pre-order inventory $x$ and post-order inventory $y$ against period $t$) under the quantity plan $y^R(x)$. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 0$, $\lambda = 1$ and $\delta = 0.9$.)

x                                  0       0.5     1       1.5      2        2.2     2.4     2.6     2.8     3
$\pi^R(x)$                         31.51   29.44   27.73   26.07    24.63    24.15   23.64   23.11   22.68   22.24
$\pi^*(x)$                         31.51   29.44   27.73   26.07    24.63    24.48   23.9    23.35   22.81   22.29
$(\pi^*(x) - \pi^R(x))/\pi^*(x)$   0       0       0       0        0        1.38%   1.11%   0.99%   0.56%   0.25%

Table 4.1: Supplier’s profit-to-go under the zero-rent contract $y^R(x)$ and the optimal contract. (Parameters: $r = 10$, $c = 5$, $b = 3$, $h = 3$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)

x            0       0.5     1        1.5       2         2.2     2.4     2.6     2.8     3
$\pi^R(x)$   -3.99   -5.36   -6.215   -6.6775   -6.8764   -6.91   -6.89   -6.88   -6.83   -6.79
$\pi^L(x)$   -3.99   -5.36   -6.215   -6.6775   -6.8764   -5.93   -6.11   -6.26   -6.38   -6.47

Table 4.2: Supplier’s profit-to-go under $y^R(x)$ and $\max\{y^L(x), x\}$. (Parameters: $r = 10$, $c = 6$, $b = 7$, $h = 5$, $y_0 = 3$, $\lambda = 1$ and $\delta = 0.9$.)

However, we find that when the cost parameters are large enough, the supplier gets negative profit-to-go whether she implements $y^R(x)$ or $\max\{y^L(x), x\}$ (see Table 4.2). The supplier is able to improve her profit-to-go by terminating the relationship with such a retailer. In this case, our numerical results indicate that the optimal short-term contract is stationary and takes a simple form. The optimal contract consists of a threshold $x^{**}$ (possibly different from the $x^*$ described above) and a base-stock policy with a positive order-up-to level $y^R(0)$. When $x \le x^{**}$, the retailer orders up to $y^R(0)$. Yet when $x > x^{**}$, the supplier terminates the relationship with the retailer. In other words, the optimal short-term contract in the backorder case induces partial participation.
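To make the mechanics of this threshold policy concrete, the following sketch simulates one inventory trajectory under exponential demand. It is a hypothetical illustration rather than the code behind any figure in the thesis: the function name, the random seed and the horizon are arbitrary, the threshold $-0.009$ is the value reported for the parameters $r = 10$, $c = 6$, $b = 7$, $h = 5$, $\lambda = 1$, $\delta = 0.9$, and `y_base = 0.58` is only an approximate stand-in for $y^R(0)$ under those parameters, obtained by solving (4.30) at $x = 0$ numerically.

```python
import random


def simulate_partial_participation(y_base, x_threshold, lam, periods, seed=0, y_start=0.0):
    """Simulate pre-order and post-order inventory under the threshold policy.

    In each period the pre-order inventory is x_t = y_{t-1} - D_t with
    D_t ~ Exponential(lam). If x_t <= x_threshold the retailer orders up to
    y_base; otherwise he is excluded and the relationship ends.
    Returns a list of (period, pre_order, post_order), with post_order set
    to None in the period of exclusion.
    """
    rng = random.Random(seed)
    y_prev = y_start
    path = []
    for t in range(1, periods + 1):
        x_t = y_prev - rng.expovariate(lam)
        if x_t > x_threshold:
            path.append((t, x_t, None))   # excluded: the game ends here
            break
        path.append((t, x_t, y_base))
        y_prev = y_base
    return path


# y_base = 0.58 is an approximation of y^R(0) (an assumption of this sketch);
# x_threshold = -0.009 is the threshold value reported in the text.
trajectory = simulate_partial_participation(
    y_base=0.58, x_threshold=-0.009, lam=1.0, periods=12)
```

Each tuple in `trajectory` corresponds to one period; the path stops in the first period in which the pre-order inventory exceeds the threshold, mirroring the exclusion rule described above.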
When the beginning inventory is high, it would be too expensive for the supplier to encourage the retailer’s participation, and the supplier would rather exclude the retailer. So far this result has only been shown numerically. We state it as a conjecture.

Conjecture 1. Suppose the cost parameters ($c$, $b$, $h$ and $r$) are such that (i) $v(y^R(0)) + \delta U(y^R(0)) - c y^R(0) + \delta\,\frac{r - c}{\lambda}\,e^{-\lambda y^R(0)} < 0$ and (ii) $y^R(0) < x^L$. The optimal short-term contract is stationary and takes the following form: there exists a threshold $x^{**} \le 0$; at $x \le x^{**}$, the retailer participates and orders up to $y^R(0)$; at $x > x^{**}$, the retailer is excluded. The threshold $x^{**}$ solves
\[
(r - c)x^{**} = v(y^R(0)) + \delta U(y^R(0)) - c y^R(0) + \delta\,\frac{r - c}{\lambda}\,e^{-\lambda(y^R(0) - x^{**})}.
\]

Figure 4.12 shows a sample trajectory of pre-order and post-order inventory levels under the contract in Conjecture 1. The corresponding threshold is $x^{**} = -0.009$. We again start with $y_0 = 0$. Demand is realized and the retailer has pre-order inventory $x_1$, which is below the threshold. Then the retailer orders up to $y^R(0)$. As long as the retailer’s pre-order inventory level is smaller than the threshold $x^{**} = -0.009$, the order-up-to level is always $y^R(0)$. Yet at the beginning of period 5, the retailer holds positive inventory $x_5 > 0$, so the two parties terminate their relationship and the game ends.

Figure 4.12: Sample inventory trajectory (pre-order inventory $x$ and post-order inventory $y$ against period $t$; the retailer is excluded above $x^{**} = -0.009$) under the quantity plan in Conjecture 1. (Parameters: $r = 10$, $c = 6$, $b = 7$, $h = 5$, $y_0 = 0$, $\lambda = 1$ and $\delta = 0.9$.)

As we can see, the optimal short-term contract in the backorder case can be drastically different from that in the lost-sales case. Our results yield valuable insights for practitioners. In order to improve customer satisfaction, the supplier may require the retailer to take backorders instead of simply losing excess demand. This has a huge impact on the contract design.
If the cost parameters lie in a certain region, the supplier’s optimal short-term contract has a simple structure. It entails a base-stock policy and an exclusion region. However, in other cases, the optimal short-term contract may be complex and hard to implement. As a result, the supplier may search for a simple contract that has a good, though not optimal, performance, such as the zero-rent contract. Alternatively, the supplier may consider switching to long-term contracting.

4.7 Conclusion

We study a dynamic adverse-selection model in which a supplier sells to a retailer with private inventory and backorder information. Our work fills a significant gap and tackles an open problem in the dynamic contracting literature and the supply chain management literature.

First, we show that in the single-period case, the supplier’s optimal contract consists of a base-stock policy with base-stock level 0. With backorder (negative inventory) on hand, the retailer should order up to 0, whereas with positive inventory, the retailer should order nothing. He will receive zero information rent in any case. Similar results do not hold in the multi-period setting. In the two-period case, we demonstrate that the optimal contract in the first period can be fairly complex. It has a threshold structure with possibly two thresholds. More interestingly, the retailer starts getting positive information rent when his inventory is high enough. This is drastically different from the lost-sales case, in which a higher inventory level makes the retailer a “worse” type. In the backorder case, it may be a good thing to have high inventory. Moreover, the optimal contract may entail an exclusion region. When the retailer’s beginning inventory falls into that region, the supplier will terminate the relationship with him, because it is too costly for the supplier to ensure the retailer’s participation.

Finally, we analyze the optimal short-term contract in the infinite-horizon case.
We show that a stationary optimal contract may not exist in general. However, in certain cost regimes, the optimal short-term contract may have a simple threshold structure. If the retailer's beginning inventory is below the threshold, he orders up to a positive base-stock level and obtains his reservation profit. In this case, the contract looks similar to the one in the single-period case. However, if the retailer's beginning inventory is above the threshold, he will be excluded, i.e., the supplier terminates the business relationship with the retailer.

In the paper, we make a few assumptions to improve the tractability of the analysis. For instance, we only consider exponential demand. In the future, we will further explore the problem under more general demand types, such as the Erlang distribution. Notice that the Erlang distribution with shape parameter 1 is the exponential distribution, so we may extend the results to cases with shape parameters other than 1. Although there may not be a closed-form solution under general demand, we will pursue interesting structural properties. For instance, under more general demand assumptions, can we still have a threshold structure? If so, how will the threshold change when the number of no-ordering periods is larger?

Chapter 5

Conclusion

The first essay explores the use of incentivized actions in mobile games. To the best of our knowledge, our work provides the first analytical model to study incented actions. We provide sufficient conditions for the optimality of a threshold strategy of offering incented actions to low-engaged players and then removing them to encourage real-money purchases once a player is sufficiently engaged. We also explore the settings where the optimality of the threshold policy breaks down. Moreover, we provide managerial insights and assist game publishers in identifying which types of games can take most advantage of delivering incented actions.
The results and modeling approach will be useful to researchers as well as practitioners.

In the future, we plan to investigate the setting where transition probabilities are unknown and therefore some statistical learning algorithm would be required. We are also interested in the situation where engagement is difficult to define or measure and a partially observed Markov decision process (POMDP) model would be required. Also, in the age of big data, with the increasing availability of player-level data, we would like to develop data-driven approaches to establish appropriate player behavior models, estimate game parameters, and derive insights on the impact of certain policies. Furthermore, for games hosted on mobile platforms, the platform holder is able to make interventions into the practice of incented actions. In fact, the platform holder and the game publisher have misaligned incentives. Typically, the revenue derived from incented actions is not processed through the platform whereas in-app purchases are. We would like to investigate the incentive misalignment problem between the platform and game publisher, possibly as a dynamic contracting problem.

The second essay studies a simple but new dynamic contract that generalizes the well-known wholesale price-only contract and is related to well-known ideas such as double marginalization, contract structure and commitment issues. We show that the generalized price-only contract benefits both players. Moreover, the inefficiency approaches 0 as the number of price offers $n$ approaches infinity. We also demonstrate that for a given contract with a specific $n$, the wholesale prices monotonically decrease. However, somewhat surprisingly, for a fixed $n$, the order quantities within the $n$ periods may not be monotone. We provide necessary and sufficient conditions for the stationarity of the supplier's per-period profit.
As one of the future research directions, we are interested in characterizing the bound on the performance of the generalized price-only contract for a fixed $n$. Another interesting direction is to investigate the performance of other contracts when subjected to the same dynamics as in this paper.

The third essay contributes to the dynamic contracting literature. We analyze a dynamic adverse-selection problem where a supplier sells to a retailer with private inventory or backlog information. Our work fills the gap in the literature and focuses on dynamic short-term contracts. We demonstrate that the information rent (profit yielded to the retailer) under the optimal contract may be non-monotone in the retailer's inventory (or backlog) level. In the lost-sales case, Zhang et al. [57] show that the retailer with a higher inventory is a "worse" type because he gets less information rent. However, in the backlogging case, the information rent will sometimes increase in the retailer's inventory (or backlog) level. Hence, the retailer with a higher inventory can be a "better" type. More interestingly, we find that the supplier may be better off by excluding some types of retailers. Specifically, under exponentially distributed demand, if the cost parameters fall into a certain regime, the optimal short-term contract entails a base-stock order policy and an exclusion region. It is drastically different from the lost-sales setting and yields new insights for academics and practitioners. In the future, we will further explore the problem under more general demand types. Although there may not be a closed-form solution under general demand, we will pursue interesting structural properties. For instance, under more general demand assumptions, can we still have a threshold structure? If so, how will the threshold change when the number of no-ordering periods is larger?

Bibliography

[1] P. Albuquerque and Y. Nevskaya. The impact of innovation on product usage: A dynamic model with progression in content consumption. Working Paper, 2012.
[2] K. Anand, R. Anupindi, and Y. Bassok. Strategic inventories in vertical contracts. Management Science, 54:1792–1804, 2008.
[3] F. M. Bass. A new product growth model for consumer durables. Management Science, 15(1):215–227, 1969.
[4] M. Battaglini. Long-term contracting with Markovian consumers. American Economic Review, 95:637–658, 2005.
[5] M. Battaglini and R. Lamba. Optimal dynamic contracting: the first-order approach and beyond. Working Paper, 2015.
[6] G. Berbeglia, P. Sloan, and A. Vetta. The effect of a finite time horizon in the durable good monopoly problem with atomic consumers. Working Paper, University of Melbourne, 2016.
[7] F. Bernstein and M. Nagarajan. Competition and cooperative bargaining models in supply chains. Foundations and Trends in Technology, Information and Operations Management, 5:87–145, 2012.
[8] P. Bolton and M. Dewatripont. Contract Theory. MIT Press, 2005.
[9] A. Burnetas, S. Gilbert, and C. Smith. Quantity discounts in single-period supply contracts with asymmetric demand information. IIE Transactions, 29(5):465–479, 2007.
[10] G. Cachon and M. Lariviere. Contracting to assure supply: How to share demand forecasts in a supply chain. Management Science, 47(5):629–646, 2001.
[11] Statistics Canada. Retail trade survey. Retrieved September 15, 2016, from "http://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=2406", 2003.
[12] H.K. Cheng and Y. Liu. Optimal software free trial strategy: The impact of network externalities and consumer uncertainty. Information Systems Research, 23(2):488–504, 2012.
[13] H.K. Cheng, S. Li, and Y. Liu. Optimal software free trial strategy: Limited version, time-locked, or hybrid? Production and Operations Management, 24(3):504–517, 2015.
[14] R. H. Coase. Durability and monopoly. Journal of Law and Economics, 15:143–149, 1972.
[15] J. Connelly. Apple's crackdown on incentivizing app installs means marketers need new tricks. Retrieved April 29, 2016, from "http://venturebeat.com/2014/06/21/apples-crackdown-on-incentivizing-app-installs-means-marketers-need-new-tricks/", 2014.
[16] C. J. Corbett and X. de Groote. A supplier's optimal quantity discount policy under asymmetric information. Management Science, 46(3):444–450, 2000.
[17] C. J. Corbett, D. Zhou, and S. C. Tang. Designing supply contracts: Contract type and information asymmetry. Management Science, 50(4):550–559, 2004.
[18] K-M. Cutler. King quits advertising since it earns so much on Candy Crush purchases. Retrieved April 29, 2016, from "http://techcrunch.com/2013/06/12/king-quits-advertising-since-it-earns-so-much-on-candy-crush-purchases/", 2013.
[19] S. Dredge. Crossy Road earned $10 million in three months. Retrieved May 8, 2016, from "http://www.businessinsider.com/crossy-road-earned-10-million-in-three-months-2015-3", 2015.
[20] F. Erhun, P. Keskinocak, and S. Tayur. Dynamic procurement, quantity discounts, and supply chain efficiency. Production and Operations Management, 17(5):543–550, 2008.
[21] G.E. Fruchter and S.P. Sigué. Dynamic pricing for subscription services. Journal of Economic Dynamics and Control, 37(11):2180–2194, 2013.
[22] D. Fudenberg and J. Tirole. Perfect Bayesian equilibrium and sequential equilibrium. Journal of Economic Theory, 53(2):236–260, 1991.
[23] H. Guo, L. Hao, T. Mukhopadhyay, and D. Sun. Selling virtual currency in digital games: Implications on gameplay and social welfare. Working Paper, 2016.
[24] S. Gupta, C.F. Mela, and J.M. Vidal-Sanz. The value of a "free" customer. Working Paper, 2009.
[25] A. Ha. Supplier-buyer contracting: Asymmetric cost information and cutoff level policy for buyer participation. Naval Research Logistics, 48(1):41–64, 2001.
[26] D. L. Iglehart. The dynamic inventory problem with unknown demand distribution. Management Science, 10(3):429–440, 1964.
[27] L. Ilan and W. Xiao. Optimal long-term supply contracts with asymmetric demand information. Forthcoming in Operations Research, 2017.
[28] Z. Jiang and S. Sarkar. Speed matters: The role of free software offer in software diffusion. Journal of Management Information Systems, 26(3):207–240, 2009.
[29] B. Jullien. Participation constraints in adverse selection models. Journal of Economic Theory, 93:1–47, 2000.
[30] V. Krishnamurthy and D.V. Djonin. Structured threshold policies for dynamic sensor scheduling: A partially observed Markov decision process approach. IEEE Transactions on Signal Processing, 55(10):4938–4957, 2007.
[31] M. A. Lariviere and E. L. Porteus. Selling to the newsvendor: An analysis of price-only contracts. Manufacturing & Service Operations Management, 3:293–305, 2001.
[32] A. H. Lau and H. Lau. The newsboy problem with price-dependent demand distribution. IIE Transactions, 20(2):168–175, 1988.
[33] C. Lee, V. Kumar, and S. Gupta. Designing freemium: a model of consumer usage, upgrade, and referral dynamics. Working Paper, 2014.
[34] T. Lewis and D. Sappington. Countervailing incentives in agency problems. Journal of Economic Theory, 49(2):294–313, 1989.
[35] B. Libai, E. Muller, and R. Peres. The diffusion of services. Journal of Marketing Research, 46(2):163–175, 2009.
[36] N. Lovell. Conversion rate. Retrieved March 3, 2016, from "http://www.gamesbrief.com/2011/11/conversion-rate/", 2011.
[37] I. Lunden. Activision Blizzard closes its $5.9b acquisition of King, makers of Candy Crush. Retrieved August 20, 2016, from "http://techcrunch.com/2016/02/23/activision-blizzard-closes-its-5-9b-acquisition-of-king-makers-of-candy-crush", 2016.
[38] V. Martínez de Albéniz and D. Simchi-Levi. Supplier-buyer negotiation games: Equilibrium conditions and supply chain efficiency. Production and Operations Management, 22(2):397–409, 2013.
[39] McKinsey. Global media report. Retrieved November 14, 2015, from "http://www.mckinsey.com/client_service/media_and_entertainment/latest_thinking/global_media_report_2013", 2013.
[40] R. B. Myerson. Incentive compatibility and the bargaining problem. Econometrica, 47(1):61–73, 1979.
[41] M. Nagarajan and S. Rajagopalan. Contracting under vendor managed inventory systems using holding cost subsidies. Production and Operations Management, 17(2):200–210, 2008.
[42] H. Nazerzadeh and G. Perakis. Menu pricing competition with private capacity constraints. Proceedings of the ACM Conference on Electronic Commerce, 2011.
[43] M.F. Niculescu and D.J. Wu. Economics of free under perpetual licensing: Implications for the software industry. Information Systems Research, 25(1):173–199, 2014.
[44] J. M. Osborne and A. Rubinstein. A Course in Game Theory. MIT Press, 1994.
[45] G. Perakis and G. Roels. The price of anarchy in supply chains: Quantifying the efficiency of price-only contracts. Management Science, 53:1249–1268, 2007.
[46] M.L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
[47] B. Salanie. The Economics of Contracts. MIT Press, 1997.
[48] C. Schoger. 2013 Year in Review (Distimo). 2013. [Accessed 20-January-2015].
[49] H. Shin and T. Tunca. Do firms invest in forecasting efficiently? The effect of competition on demand forecast investments and supply chain coordination. Operations Research, 58(6):1592–1610, 2010.
[50] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[51] D. Takahashi. With just 3 games, Supercell made $924m in profits on $2.3b in revenue in 2015. Retrieved May 22, 2016, from "http://venturebeat.com/2016/03/09/with-just-3-games-supercell-made-924m-in-profits-on-2-3b-in-revenue-in-2015/", 2016.
[52] K.T. Talluri and G.J. Van Ryzin. The Theory and Practice of Revenue Management, volume 68. Springer, 2006.
[53] H.M. Taylor and S. Karlin. An Introduction to Stochastic Modeling. Academic Press, 2014.
[54] T. Taylor and W. Xiao. Does a manufacturer benefit from selling to a better-forecasting retailer? Management Science, 56(9):1584–1598, 2010.
[55] D. M. Topkis. Supermodularity and Complementarity. Princeton University Press, 1998.
[56] H. Zhang and S. Zenios. A dynamic principal-agent model with hidden information: Sequential optimality through truthful state revelation. Operations Research, 56(3):681–696, 2008.
[57] H. Zhang, M. Nagarajan, and G. Sosic. Dynamic supplier contracts under asymmetric inventory information. Operations Research, 58(5):1380–1397, 2010.
[58] P.H. Zipkin. Foundations of Inventory Management, volume 20. McGraw-Hill New York, 2000.

Appendix A

Proofs of Results in Chapter 2

Derivation of expected total value in (2.8)

Given a policy $y$, the induced stochastic process underlying our problem is an absorbing Markov chain (for a discussion of absorbing Markov chains, see Chapter III.4 of Taylor and Karlin [53]). An absorbing Markov chain is one where every state can reach (with nonzero probability) an absorbing state. In our setting the absorbing state is the quit state $-1$, and (A3.3) assures that the quit state is reachable from every engagement level.

The absorbing Markov chain structure allows for clean formulas for the total expected reward. Policy $y$ induces a Markov chain transition matrix
\[
P^y := \begin{bmatrix} S^y & s^y \\ 0 & 1 \end{bmatrix} \tag{A.1}
\]
where $S^y$ is an $(n+1) \times (n+1)$ matrix with entries corresponding to the transition probabilities between engagement levels, given the policy $y$ (see Example 9 below for an illustration). The vector $s^y$ has entries corresponding to the quitting probabilities of the engagement levels, and the bottom-right entry "1" indicates the absorbing nature of the quitting state $-1$.

Associated with policy $y$ and the transition matrix $P^y$ is a fundamental matrix
\[
M^y := \sum_{k=0}^{\infty} (S^y)^k = (I_{n+1} - S^y)^{-1} \tag{A.2}
\]
where $I_{n+1}$ is the $(n+1) \times (n+1)$ identity matrix. The fundamental matrix is a key ingredient for analyzing absorbing Markov chains.
Its entries have the following useful interpretation: the $(e, e')$th entry $n^y_{e,e'}$ of $M^y$ is the expected number of visits to engagement level $e'$ starting in engagement level $e$ before being absorbed in the quit state. Using the entries of the fundamental matrix we can write a closed-form formula for the total expected revenue of policy $y$:
\[
W^y(e) = \sum_{e' \in E} n^y_{e,e'}\, r(e', y(e')). \tag{A.3}
\]
An advantage of (A.3) over (2.7) is that the former is a finite sum over the number of engagement levels and does not explicitly involve the time index $t$. However, this formula can be simplified further. Observe that $n^y_{e,e'} = 0$ for $e' < e$ since engagement can only increase over time (provided the player does not quit). Hence we can write:
\[
W^y(e) = \sum_{e' \ge e} n^y_{e,e'}\, r(e', y(e')).
\]

Example 9. In this example we derive the formulas given in Example 1 using the fundamental matrix. For policy $y^1$ the matrix $S^1$ introduced in (A.1) is
\[
S^1 = \begin{bmatrix} q_M(0)(1-\tau_M) & q_M(0)\tau_M \\ 0 & q_M(1) \end{bmatrix}
\]
where the entries come from the transition probabilities in (2.5). The fundamental matrix is
\[
M^1 = \begin{bmatrix} \dfrac{1}{1-q_M(0)(1-\tau_M)} & \dfrac{q_M(0)\tau_M}{(1-q_M(0)(1-\tau_M))(1-q_M(1))} \\[2ex] 0 & \dfrac{1}{1-q_M(1)} \end{bmatrix}
\]
and the total expected rewards (2.8) for each of the two starting engagement levels are
\[
W^1(0) = \frac{1}{1-q_M(0)(1-\tau_M)}\, q_M(0)\mu_M + \frac{q_M(0)\tau_M}{(1-q_M(0)(1-\tau_M))(1-q_M(1))}\, q_M(1)\mu_M
\]
and
\[
W^1(1) = \frac{q_M(1)\mu_M}{q_Q(1)},
\]
respectively.

Proof of Proposition 2.4.1

The following lemma is important for understanding the nature of the fundamental matrix in our setting.

Lemma A.0.1. Suppose the matrix $S$ is upper bidiagonal with components denoted by $k_{i,j}$,
\[
S = \begin{bmatrix}
k_{1,1} & k_{1,2} & 0 & \cdots & 0 & 0 \\
0 & k_{2,2} & k_{2,3} & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & 0 & k_{N-1,N-1} & k_{N-1,N} \\
0 & 0 & 0 & 0 & \cdots & k_{N,N}
\end{bmatrix}.
\]
Then the corresponding fundamental matrix $(I - S)^{-1}$ is upper triangular and its $(i,j)$-th entry is $\frac{1}{1-k_{i,i}}$ if $i = j$ and $\frac{\prod_{v=i}^{j-1} k_{v,v+1}}{\prod_{v=i}^{j} (1-k_{v,v})}$ if $i < j$, i.e.
\[
(I - S)^{-1} = \begin{bmatrix}
\frac{1}{1-k_{1,1}} & \frac{k_{1,2}}{(1-k_{1,1})(1-k_{2,2})} & \cdots & \frac{\prod_{j=1}^{N-2} k_{j,j+1}}{\prod_{j=1}^{N-1}(1-k_{j,j})} & \frac{\prod_{j=1}^{N-1} k_{j,j+1}}{\prod_{j=1}^{N}(1-k_{j,j})} \\
0 & \frac{1}{1-k_{2,2}} & \frac{k_{2,3}}{(1-k_{2,2})(1-k_{3,3})} & \cdots & \frac{\prod_{j=2}^{N-1} k_{j,j+1}}{\prod_{j=2}^{N}(1-k_{j,j})} \\
\vdots & & \ddots & & \vdots \\
0 & \cdots & 0 & \frac{1}{1-k_{N-1,N-1}} & \frac{k_{N-1,N}}{(1-k_{N-1,N-1})(1-k_{N,N})} \\
0 & 0 & 0 & \cdots & \frac{1}{1-k_{N,N}}
\end{bmatrix}.
\]

Proof. We prove the lemma by showing $(I - S) \times (I - S)^{-1} = I$, where $(I - S)^{-1}$ is the matrix proposed above. Denote by $R$ the product of $(I - S)$ and $(I - S)^{-1}$. The $(i,j)$-th entry of $R$ results from the multiplication of the $i$-th row of $(I - S)$ and the $j$-th column of $(I - S)^{-1}$.

The $i$-th row of $(I - S)$ is $(0, \ldots, 0, \underbrace{1-k_{i,i}}_{i}, \underbrace{-k_{i,i+1}}_{i+1}, 0, \ldots, 0)$. For the $j$-th column of $(I - S)^{-1}$, we consider three possible cases:

1) If $j < i$, the $j$-th column of $(I - S)^{-1}$ is $\Big(\frac{\prod_{v=1}^{j-1} k_{v,v+1}}{\prod_{v=1}^{j}(1-k_{v,v})}, \ldots, \underbrace{0}_{i}, \underbrace{0}_{i+1}, 0, \ldots, 0\Big)^T$. Clearly the $(i,j)$-th entry of $R$ is 0.

2) If $j = i$, the $j$-th column of $(I - S)^{-1}$ is $\Big(\frac{\prod_{v=1}^{j-1} k_{v,v+1}}{\prod_{v=1}^{j}(1-k_{v,v})}, \ldots, \underbrace{\tfrac{1}{1-k_{i,i}}}_{i}, \underbrace{0}_{i+1}, 0, \ldots, 0\Big)^T$. So the $(i,i)$-th entry of $R$ is 1.

3) If $j > i$, the $j$-th column of $(I - S)^{-1}$ is $\Big(\frac{\prod_{v=1}^{j-1} k_{v,v+1}}{\prod_{v=1}^{j}(1-k_{v,v})}, \ldots, \underbrace{\tfrac{\prod_{v=i}^{j-1} k_{v,v+1}}{\prod_{v=i}^{j}(1-k_{v,v})}}_{i}, \underbrace{\tfrac{\prod_{v=i+1}^{j-1} k_{v,v+1}}{\prod_{v=i+1}^{j}(1-k_{v,v})}}_{i+1}, \ldots\Big)^T$. By simple algebra, the $(i,j)$-th entry of $R$ is
\[
(1-k_{i,i})\,\frac{\prod_{v=i}^{j-1} k_{v,v+1}}{\prod_{v=i}^{j}(1-k_{v,v})} - k_{i,i+1}\,\frac{\prod_{v=i+1}^{j-1} k_{v,v+1}}{\prod_{v=i+1}^{j}(1-k_{v,v})} = 0,
\]
since both terms equal $\prod_{v=i}^{j-1} k_{v,v+1} \big/ \prod_{v=i+1}^{j}(1-k_{v,v})$.

In conclusion, the $(i,j)$-th entry of $R$ is 1 if $i = j$ and is 0 otherwise. This implies that $R$ is an identity matrix.

In our model, $k_{i,j}$ indicates the transition probability from state $i$ to state $j$. If the policy is $y^1$, we have $k_{\bar e,\bar e} = q_M(\bar e)(1-\tau_M)$ and $k_{\bar e,\bar e+1} = q_M(\bar e)\tau_M$. If the policy is $y^2$, we have $k_{\bar e,\bar e} = p_M(\bar e)(1-\tau_M) + p_I(\bar e)(1-\tau_I)$ and $k_{\bar e,\bar e+1} = p_M(\bar e)\tau_M + p_I(\bar e)\tau_I$. By definition, the expected number of visits $n^y_{\bar e,e}$ is the $(\bar e, e)$-th entry of the fundamental matrix $M^y$. According to Lemma A.0.1, we have
\[
n^1_{\bar e,e} = \frac{q_M(\bar e)\tau_M}{1-q_M(\bar e)(1-\tau_M)} \times \frac{\prod_{j=\bar e+1}^{e-1} k_{j,j+1}}{\prod_{j=\bar e+1}^{e}(1-k_{j,j})}
\quad\text{and}\quad
n^2_{\bar e,e} = \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)} \times \frac{\prod_{j=\bar e+1}^{e-1} k_{j,j+1}}{\prod_{j=\bar e+1}^{e}(1-k_{j,j})}.
\]
(We use the convention that the empty product $\prod_{j=\bar e+1}^{\bar e} k_{j,j+1}$ equals 1.)
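The closed form in Lemma A.0.1 can be sanity-checked numerically. The following Python sketch builds the proposed matrix entry by entry and verifies that $(I - S)$ times it gives the identity, mirroring the argument in the proof. The function names and the transition probabilities are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
from math import prod


def bidiagonal_fundamental(diag, superdiag):
    """Closed-form (I - S)^{-1} from Lemma A.0.1 for an upper bidiagonal S.

    diag[i] plays the role of k_{i,i}; superdiag[i] plays the role of k_{i,i+1}
    (indices are 0-based here, 1-based in the lemma).
    """
    n = len(diag)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            num = prod(superdiag[i:j])                   # empty product is 1 when i == j
            den = prod(1.0 - k for k in diag[i:j + 1])   # (1 - k_{v,v}) for v = i..j
            M[i][j] = num / den
    return M


def matmul(A, B):
    """Plain-Python matrix product of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical stay / advance probabilities (each row sums to less than 1).
diag = [0.30, 0.25, 0.40, 0.50]
superdiag = [0.45, 0.35, 0.20]
n = len(diag)

I_minus_S = [[(1.0 - diag[i]) if i == j
              else (-superdiag[i] if j == i + 1 else 0.0)
              for j in range(n)] for i in range(n)]

M = bidiagonal_fundamental(diag, superdiag)
R = matmul(I_minus_S, M)

# Largest deviation of R from the identity matrix.
max_abs_error = max(abs(R[i][j] - (1.0 if i == j else 0.0))
                    for i in range(n) for j in range(n))
print(max_abs_error)
```

Entry `M[i][j]` is then the expected number of visits to level `j` starting from level `i`, which is the quantity $n^y_{e,e'}$ used in (A.3).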
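Because we only make a local change of the policy at engagement level $\bar e$, $n^1_{\bar e,e}$ and $n^2_{\bar e,e}$ share the same term $\frac{\prod_{j=\bar e+1}^{e-1} k_{j,j+1}}{\prod_{j=\bar e+1}^{e}(1-k_{j,j})}$, where $k_{j,j}$ and $k_{j,j+1}$ depend on the policy $y^1(j)$ for $j > \bar e$. In fact,
\[
k_{j,j} = \begin{cases}
q_M(j)(1-\tau_M) & \text{if } y^*(j) = 0 \text{ and } j < N,\\
p_M(j)(1-\tau_M) + p_I(j)(1-\tau_I) & \text{if } y^*(j) = 1 \text{ and } j < N,\\
q_M(j) & \text{if } y^*(j) = 0 \text{ and } j = N,\\
p_M(j) + p_I(j) & \text{if } y^*(j) = 1 \text{ and } j = N,
\end{cases}
\qquad
k_{j,j+1} = \begin{cases}
q_M(j)\tau_M & \text{if } y^*(j) = 0 \text{ and } j < N,\\
p_M(j)\tau_M + p_I(j)\tau_I & \text{if } y^*(j) = 1 \text{ and } j < N.
\end{cases}
\]
Moreover, we find that $n^{y^1(\bar e+1)}_{\bar e+1,e} = \frac{\prod_{j=\bar e+1}^{e-1} k_{j,j+1}}{\prod_{j=\bar e+1}^{e}(1-k_{j,j})}$. Hence, we rewrite $n^1_{\bar e,e} = \frac{q_M(\bar e)\tau_M}{1-q_M(\bar e)(1-\tau_M)}\, n^{y^1(\bar e+1)}_{\bar e+1,e}$ and $n^2_{\bar e,e} = \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)}\, n^{y^1(\bar e+1)}_{\bar e+1,e}$ for all $e > \bar e$. Finally, the progression effect is equivalent to the following:
\begin{align*}
\Delta n(e|\bar e) &= n^2_{\bar e,e} - n^1_{\bar e,e} \\
&= \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)}\, n^{y^1(\bar e+1)}_{\bar e+1,e} - \frac{q_M(\bar e)\tau_M}{1-q_M(\bar e)(1-\tau_M)}\, n^{y^1(\bar e+1)}_{\bar e+1,e} \\
&= \left[\frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)} - \frac{q_M(\bar e)\tau_M}{1-q_M(\bar e)(1-\tau_M)}\right] n^{y^1(\bar e+1)}_{\bar e+1,e} \\
&= \frac{p_I(\bar e)\{\tau_I - \alpha(\bar e)\tau_M + q_M(\bar e)[(1-\tau_I)\tau_M - (1-\tau_M)\tau_I]\}}{[1-q_M(\bar e)(1-\tau_M)][1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]}\, n^{y^1(\bar e+1)}_{\bar e+1,e} \tag{A.4}
\end{align*}
where the last equality uses $q_M(\bar e) = p_M(\bar e) + \alpha(\bar e)p_I(\bar e)$. Since the denominator of (A.4) is positive and $n^{y^1(\bar e+1)}_{\bar e+1,e}$ is positive, the sign of $\Delta n(e|\bar e)$ is completely determined by the term $\tau_I - \alpha(\bar e)\tau_M + q_M(\bar e)[(1-\tau_I)\tau_M - (1-\tau_M)\tau_I]$ for all $e > \bar e$. This term depends only on $\bar e$ and not on $e$, which means that the progression effect is uniform in sign with respect to $e$.

Proof of Proposition 2.4.2

By definition, $n^y_{\bar e,\bar e}$ is the $(\bar e,\bar e)$-th entry of the fundamental matrix $M^y$. According to Lemma A.0.1, if the policy is $y^1$, $k^1_{\bar e,\bar e} = q_M(\bar e)(1-\tau_M)$ and thereby $n^1_{\bar e,\bar e} = \frac{1}{1-q_M(\bar e)(1-\tau_M)}$.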
If the policy is $y^2$, $k^2_{\bar e,\bar e} = p_M(\bar e)(1-\tau_M) + p_I(\bar e)(1-\tau_I)$ and consequently $n^2_{\bar e,\bar e} = \frac{1}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)}$. Therefore, the retention effect is equal to
\begin{align*}
\Delta n(\bar e|\bar e) &= n^2_{\bar e,\bar e} - n^1_{\bar e,\bar e} \\
&= \frac{1}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)} - \frac{1}{1-q_M(\bar e)(1-\tau_M)} \\
&= \frac{[1-q_M(\bar e)(1-\tau_M)] - [1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]}{[1-q_M(\bar e)(1-\tau_M)][1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]} \\
&= \frac{p_I(\bar e)[(1-\tau_I) - \alpha(\bar e)(1-\tau_M)]}{[1-q_M(\bar e)(1-\tau_M)][1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]}
\end{align*}
where the last equality comes from the fact that $q_M(\bar e) = p_M(\bar e) + \alpha(\bar e)p_I(\bar e)$. The sign of $\Delta n(\bar e|\bar e)$ depends entirely on $(1-\tau_I) - \alpha(\bar e)(1-\tau_M)$. Under Assumptions 2.3.1–2.3.4, we have $(1-\tau_I) \ge (1-\tau_M) \ge \alpha(\bar e)(1-\tau_M)$. Hence the retention effect is always nonnegative, i.e., $\Delta n(\bar e|\bar e) \ge 0$ for all $\bar e$.

Proof of Theorem 2.5.2

In order to prove the theorem, we first introduce the following lemma.

Lemma A.0.2. For any $e = 1, \ldots, N$, $W(e, y=0) \ge \frac{q_M(e)\mu_M}{1-q_M(e)}$ and $W(e, y=1) \ge \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)-p_I(e)}$.

Proof of Lemma A.0.2: The proof is by induction. Clearly, at the highest engagement level $e = N$, we have $W(N, y=0) = \frac{q_M(N)\mu_M}{1-q_M(N)}$ and $W(N, y=1) = \frac{p_M(N)\mu_M + p_I(N)\mu_I}{1-p_M(N)-p_I(N)}$. Now suppose the result holds for every level $j \ge e+1$; we would like to show that it still holds for level $e$.
\begin{align*}
W(e, y=0) - \frac{q_M(e)\mu_M}{1-q_M(e)}
&= \frac{q_M(e)\mu_M}{1-q_M(e)(1-\tau_M)} + \frac{q_M(e)\tau_M}{1-q_M(e)(1-\tau_M)}\, W(e+1) - \frac{q_M(e)\mu_M}{1-q_M(e)} \\
&= \frac{[q_M(e)\mu_M + q_M(e)\tau_M W(e+1)][1-q_M(e)] - q_M(e)\mu_M[1-q_M(e)(1-\tau_M)]}{[1-q_M(e)(1-\tau_M)][1-q_M(e)]} \\
&= \frac{q_M(e)\tau_M[W(e+1)(1-q_M(e)) - q_M(e)\mu_M]}{[1-q_M(e)(1-\tau_M)][1-q_M(e)]}
\end{align*}
By the inductive assumption, $W(e+1) \ge W(e+1, y=0) \ge \frac{q_M(e+1)\mu_M}{1-q_M(e+1)} \ge \frac{q_M(e)\mu_M}{1-q_M(e)}$, so we finally obtain $W(e, y=0) - \frac{q_M(e)\mu_M}{1-q_M(e)} \ge 0$.
Similarly,
\begin{align*}
W(e, y=1) - \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)-p_I(e)}
&= \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)(1-\tau_M)-p_I(e)(1-\tau_I)} + \frac{[p_M(e)\tau_M + p_I(e)\tau_I]\,W(e+1)}{1-p_M(e)(1-\tau_M)-p_I(e)(1-\tau_I)} - \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)-p_I(e)} \\
&= \frac{(p_M(e)\tau_M + p_I(e)\tau_I)[W(e+1)(1-p_M(e)-p_I(e)) - p_M(e)\mu_M - p_I(e)\mu_I]}{[1-p_M(e)(1-\tau_M)-p_I(e)(1-\tau_I)][1-p_M(e)-p_I(e)]}
\end{align*}
By the inductive assumption, we have $W(e+1) \ge W(e+1, y=1) \ge \frac{p_M(e+1)\mu_M + p_I(e+1)\mu_I}{1-p_M(e+1)-p_I(e+1)} \ge \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)-p_I(e)}$. Therefore, $W(e, y=1) - \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)-p_I(e)} \ge 0$. This completes the proof of Lemma A.0.2.

We return to the proof of Theorem 2.5.2, which is to show that the optimal value function $W(e)$ is increasing in $e$. It suffices to show $W(e, y=1) \le W(e+1)$ and $W(e, y=0) \le W(e+1)$ for any $e < N$, since then $W(e) = \max\{W(e, y=1), W(e, y=0)\} \le W(e+1)$. First, we compare $W(e, y=0)$ and $W(e+1)$.
\begin{align*}
W(e, y=0) - W(e+1)
&= \frac{q_M(e)\mu_M}{1-q_M(e)(1-\tau_M)} + \frac{q_M(e)\tau_M}{1-q_M(e)(1-\tau_M)}\, W(e+1) - W(e+1) \\
&= \frac{q_M(e)\mu_M}{1-q_M(e)(1-\tau_M)} - \frac{1-q_M(e)}{1-q_M(e)(1-\tau_M)}\, W(e+1) \\
&= \frac{1-q_M(e)}{1-q_M(e)(1-\tau_M)} \left[\frac{q_M(e)\mu_M}{1-q_M(e)} - W(e+1)\right] \le 0 \tag{A.5}
\end{align*}
The inequality (A.5) holds because $\frac{q_M(e)\mu_M}{1-q_M(e)} \le \frac{q_M(e+1)\mu_M}{1-q_M(e+1)} \le W(e+1, y=0) \le W(e+1)$. Similarly, we compare $W(e, y=1)$ and $W(e+1)$.
\begin{align*}
W(e, y=1) - W(e+1)
&= \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)(1-\tau_M)-p_I(e)(1-\tau_I)} + \frac{p_M(e)\tau_M + p_I(e)\tau_I}{1-p_M(e)(1-\tau_M)-p_I(e)(1-\tau_I)}\, W(e+1) - W(e+1) \\
&= \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)(1-\tau_M)-p_I(e)(1-\tau_I)} - \frac{1-p_M(e)-p_I(e)}{1-p_M(e)(1-\tau_M)-p_I(e)(1-\tau_I)}\, W(e+1) \\
&= \frac{1-p_M(e)-p_I(e)}{1-p_M(e)(1-\tau_M)-p_I(e)(1-\tau_I)} \left[\frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)-p_I(e)} - W(e+1)\right] \le 0 \tag{A.6}
\end{align*}
where the inequality (A.6) holds since $\frac{p_M(e)\mu_M + p_I(e)\mu_I}{1-p_M(e)-p_I(e)} \le \frac{p_M(e+1)\mu_M + p_I(e+1)\mu_I}{1-p_M(e+1)-p_I(e+1)} \le W(e+1, y=1) \le W(e+1)$.

Finally, since $W(e) = \max\{W(e, y=1), W(e, y=0)\}$ and we have shown that both $W(e, y=1)$ and $W(e, y=0)$ are no greater than $W(e+1)$, we conclude that $W(e) \le W(e+1)$.

Proof of Proposition 2.5.1

We denote $W^2(\bar e) - W^1(\bar e) = C(\bar e) + F(\bar e)$, where $C(\bar e)$ represents the "current" benefits of offering incented actions and $F(\bar e)$ represents the "future" benefits. In order to prove the optimality of the myopic policy, we first take a close look at $C(\bar e)$ and $F(\bar e)$. By definition,
\begin{align*}
C(\bar e) &= \frac{p_M(\bar e)\mu_M + p_I(\bar e)\mu_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)} - \frac{q_M(\bar e)\mu_M}{1-q_M(\bar e)(1-\tau_M)} \\
&= \frac{[p_M(\bar e)\mu_M + p_I(\bar e)\mu_I][1-q_M(\bar e)(1-\tau_M)] - q_M(\bar e)\mu_M[1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]}{[1-q_M(\bar e)(1-\tau_M)][1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]} \\
&= \frac{p_I(\bar e)\{\mu_I - \alpha(\bar e)\mu_M + q_M(\bar e)[(1-\tau_I)\mu_M - (1-\tau_M)\mu_I]\}}{[1-q_M(\bar e)(1-\tau_M)][1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]}
\end{align*}
\begin{align*}
F(\bar e) &= \left\{\frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)} - \frac{q_M(\bar e)\tau_M}{1-q_M(\bar e)(1-\tau_M)}\right\} \left\{\sum_{e' > \bar e} n^{y^1(\bar e+1)}_{\bar e+1,e'}\, r(e', y(e'))\right\} \\
&= \frac{[p_M(\bar e)\tau_M + p_I(\bar e)\tau_I][1-q_M(\bar e)(1-\tau_M)] - q_M(\bar e)\tau_M[1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]}{[1-q_M(\bar e)(1-\tau_M)][1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]} \left\{\sum_{e' > \bar e} n^{y^1(\bar e+1)}_{\bar e+1,e'}\, r(e', y(e'))\right\} \\
&= \frac{p_I(\bar e)\left\{\sum_{e' > \bar e} n^{y^1(\bar e+1)}_{\bar e+1,e'}\, r(e', y(e'))\right\}}{[1-q_M(\bar e)(1-\tau_M)][1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]} \{\tau_I - \alpha(\bar e)\tau_M + q_M(\bar e)[(1-\tau_I)\tau_M - (1-\tau_M)\tau_I]\}
\end{align*}
We further define
\begin{align*}
\delta_1(\bar e) &= \mu_I - \alpha(\bar e)\mu_M + q_M(\bar e)[(1-\tau_I)\mu_M - (1-\tau_M)\mu_I], \\
\delta_2(\bar e) &= \tau_I - \alpha(\bar e)\tau_M + q_M(\bar e)[(1-\tau_I)\tau_M - (1-\tau_M)\tau_I].
\end{align*}
Clearly, the sign of $C(\bar e)$ is determined by $\delta_1(\bar e)$ and the sign of $F(\bar e)$ is determined by $\delta_2(\bar e)$. Now suppose $\tau_I/\tau_M = \mu_I/\mu_M$ and write $\rho$ for this common ratio. Then $\delta_1(\bar e)/\mu_M = \rho - \alpha(\bar e) + q_M(\bar e)[(1-\tau_I) - (1-\tau_M)\rho] = \delta_2(\bar e)/\tau_M$. Therefore, $\delta_1(\bar e)$ and $\delta_2(\bar e)$ must have the same sign. It implies that whenever $\delta_1(\bar e)$ is positive, $\delta_2(\bar e)$ must be positive and thereby $W^2(\bar e) - W^1(\bar e)$ is positive. Similarly, whenever $\delta_1(\bar e)$ is negative, $\delta_2(\bar e)$ must be negative and thereby $W^2(\bar e) - W^1(\bar e)$ is negative.

Notice that the previous analysis does not rely on the policy for higher engagement levels $e > \bar e$. Even if we fix $y^1(e)$ to be the optimal action, which is solved by backward induction for $e > \bar e$, we still have that the "current" benefit $C(\bar e)$ and the "future" benefit $F(\bar e)$ share the same sign.
Therefore, the optimal action at engagement level $\bar e$ can be determined by whether $C(\bar e)$ is positive or negative. Equivalently, the myopically optimal policy is the optimal policy.

Proof of Proposition 2.5.4

By definition, the revenue effect is
\[
\Delta r(\bar e) = p_M(\bar e)\mu_M + p_I(\bar e)\mu_I - q_M(\bar e)\mu_M = p_I(\bar e)[\mu_I - \alpha(\bar e)\mu_M]
\]
where the last equality comes from the fact that $q_M(\bar e) = p_M(\bar e) + \alpha(\bar e)p_I(\bar e)$.

First of all, as $p_I(\bar e)$ is non-negative, the sign of $\Delta r(\bar e)$ is determined by $\mu_I - \alpha(\bar e)\mu_M$, which is decreasing in $\bar e$. Thus if $\Delta r(\bar e) < 0$ for some $\bar e$, we will also have $\Delta r(e') < 0$ for all $e' \ge \bar e$.

Moreover, both $p_I(\bar e)$ and $\mu_I - \alpha(\bar e)\mu_M$ are nonincreasing in $\bar e$, so we conclude that $\Delta r(\bar e)$ is nonincreasing in $\bar e$ whenever $\mu_I - \alpha(\bar e)\mu_M > 0$, i.e., whenever $\Delta r(\bar e) > 0$.

Proof of Proposition 2.5.6

(a) Recall the expression for $\Delta n(e|\bar e)$:
\[
\Delta n(e|\bar e) = \frac{p_I(\bar e)\, n^{y^1(\bar e+1)}_{\bar e+1,e}}{[1-q_M(\bar e)(1-\tau_M)][1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]} \{\tau_I - \alpha(\bar e)\tau_M + q_M(\bar e)[(1-\tau_I)\tau_M - (1-\tau_M)\tau_I]\}
\]
The sign of $\Delta n(e|\bar e)$ is completely decided by $\delta_2(\bar e) = \tau_I - \alpha(\bar e)\tau_M + q_M(\bar e)[(1-\tau_I)\tau_M - (1-\tau_M)\tau_I] = \tau_I - \alpha(\bar e)\tau_M + q_M(\bar e)[\tau_M - \tau_I]$.

Assumption 2.5.5 indicates that $-\alpha(\bar e)\tau_M + q_M(\bar e)\tau_M$ decreases in $\bar e$. Besides, $-q_M(\bar e)\tau_I$ also decreases in $\bar e$. Therefore, $\delta_2(\bar e)$ is a decreasing function of $\bar e$. It implies that $\Delta n(e|\bar e)$ satisfies the following property: if $\Delta n(e|\bar e) > 0$ for some $\bar e$, then $\Delta n(e|e') > 0$ for all $e' \le \bar e$; and if $\Delta n(e|\bar e) < 0$ for some $\bar e$, then $\Delta n(e|e') < 0$ for all $e' \ge \bar e$.

(b) From the previous analysis, we have already seen that the sign of $C(\bar e)$ only depends on the term $\delta_1(\bar e) = \mu_I - \alpha(\bar e)\mu_M + q_M(\bar e)[(1-\tau_I)\mu_M - (1-\tau_M)\mu_I]$ and the sign of $F(\bar e)$ only depends on the term $\delta_2(\bar e) = \tau_I - \alpha(\bar e)\tau_M + q_M(\bar e)[\tau_M - \tau_I]$. From (a), we have already shown that $\delta_2(\bar e)$ decreases in $\bar e$.
Therefore, if $F(\bar e) > 0$ for some $\bar e$, then $F(e') > 0$ for all $e' \le \bar e$; if $F(\bar e) < 0$ for some $\bar e$, then $F(e') < 0$ for all $e' \ge \bar e$.

Similarly, Assumption 2.5.5 ensures that $-\alpha(\bar e)\mu_M + q_M(\bar e)(1-\tau_I)\mu_M$ also decreases in $\bar e$, in that
\begin{align*}
\alpha(\bar e+1)\mu_M - q_M(\bar e+1)(1-\tau_I)\mu_M - \alpha(\bar e)\mu_M + q_M(\bar e)(1-\tau_I)\mu_M
&= [\alpha(\bar e+1) - \alpha(\bar e)]\mu_M - [q_M(\bar e+1) - q_M(\bar e)]\mu_M(1-\tau_I) \\
&\ge \mu_M[(\alpha(\bar e+1) - \alpha(\bar e)) - (q_M(\bar e+1) - q_M(\bar e))] \ge 0.
\end{align*}
In addition, $-q_M(\bar e)(1-\tau_M)\mu_I$ decreases in $\bar e$ as well. As a result, $\delta_1(\bar e)$ is also a decreasing function of $\bar e$. Hence $C(\bar e)$ satisfies the following property: if $C(\bar e) > 0$ for some $\bar e$, then $C(e') > 0$ for all $e' \le \bar e$; if $C(\bar e) < 0$ for some $\bar e$, then $C(e') < 0$ for all $e' \ge \bar e$.

Proof of Theorem 2.5.8

In order to prove that the optimal policy is a threshold policy, it suffices to show that if there exists some $\bar e$ such that $W(\bar e, y=1) - W(\bar e, y=0) > 0$, then we must have $W(e, y=1) - W(e, y=0) > 0$ for all $e < \bar e$.

First of all, Assumption 2.5.3 guarantees that it is optimal not to offer incented actions at the highest engagement level, because
\[
W(N, y=1) = \frac{p_M(N)\mu_M + p_I(N)\mu_I}{p_Q(N)} < \frac{p_M(N)\mu_M + p_I(N)\mu_M}{p_Q(N)} = \frac{q_M(N)\mu_M}{q_Q(N)} = W(N, y=0).
\]
Suppose that $W(\bar e, y=1) - W(\bar e, y=0) > 0$ for some $\bar e < N$, where
\[
W(\bar e, y=1) - W(\bar e, y=0) = \frac{p_I(\bar e)}{[1-q_M(\bar e)(1-\tau_M)][1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)]} \left\{\delta_1(\bar e) + \delta_2(\bar e) \sum_{e > \bar e} n^{y^*(\bar e+1)}_{\bar e+1,e}\, r(e, y(e))\right\},
\]
so it is equivalent to assume $\delta_1(\bar e) + \delta_2(\bar e) \sum_{e > \bar e} n^{y^*(\bar e+1)}_{\bar e+1,e}\, r(e, y(e)) > 0$. It implies that at least one of $\delta_1(\bar e)$ and $\delta_2(\bar e)$ has to be positive.

We would like to show $W(\bar e-1, y=1) - W(\bar e-1, y=0) > 0$. It suffices to show $\delta_1(\bar e-1) + \delta_2(\bar e-1) \sum_{e > \bar e-1} n^{y^*(\bar e)}_{\bar e,e}\, r(e, y(e)) > 0$. Now we consider three possible scenarios:

(1.1) Suppose $\delta_1(\bar e) > 0$ and $\delta_2(\bar e) > 0$. We have already proven that $\delta_1(\bar e)$ and $\delta_2(\bar e)$ are decreasing functions of $\bar e$ under Assumption 2.5.5. Therefore, $\delta_1(\bar e-1) > \delta_1(\bar e) > 0$ and $\delta_2(\bar e-1) > \delta_2(\bar e) > 0$.
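Clearly, we have $W(\bar e-1, y=1) - W(\bar e-1, y=0) > 0$ in this case.

(1.2) Suppose $\delta_1(\bar e) < 0$ but $\delta_2(\bar e) > 0$ (which may happen only if $\mu_I/\mu_M < \tau_I/\tau_M$). We first prove that, under Assumption 2.5.7, $\delta_2(\bar e-1) \sum_{e > \bar e-1} n^{y^*(\bar e)}_{\bar e,e}\, r(e, y(e)) > \delta_2(\bar e) \sum_{e > \bar e} n^{y^*(\bar e+1)}_{\bar e+1,e}\, r(e, y(e))$. Notice that
\begin{align*}
\delta_2(\bar e-1) \sum_{e > \bar e-1} n^{y^*(\bar e)}_{\bar e,e}\, r(e, y(e)) &= \delta_2(\bar e-1)\left\{n^{y^*(\bar e)}_{\bar e,\bar e}\, r(\bar e, y^*(\bar e)) + n^{y^*(\bar e)}_{\bar e,\bar e+1}\, r(\bar e+1, y^*(\bar e+1)) + \cdots + n^{y^*(\bar e)}_{\bar e,N}\, r(N, y^*(N))\right\}, \\
\delta_2(\bar e) \sum_{e > \bar e} n^{y^*(\bar e+1)}_{\bar e+1,e}\, r(e, y(e)) &= \delta_2(\bar e)\left\{n^{y^*(\bar e+1)}_{\bar e+1,\bar e+1}\, r(\bar e+1, y^*(\bar e+1)) + \cdots + n^{y^*(\bar e+1)}_{\bar e+1,N}\, r(N, y^*(N))\right\}.
\end{align*}
For $e > \bar e$, we have the relationship $n^{y^*(\bar e)}_{\bar e,e} = \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)}\, n^{y^*(\bar e+1)}_{\bar e+1,e}$ since $y^*(\bar e) = 1$. As a result,
\[
\delta_2(\bar e-1) \sum_{e > \bar e-1} n^{y^*(\bar e)}_{\bar e,e}\, r(e, y(e)) = \delta_2(\bar e-1)\, n^{y^*(\bar e)}_{\bar e,\bar e}\, r(\bar e, y^*(\bar e)) + \delta_2(\bar e-1)\, \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)} \sum_{e > \bar e} n^{y^*(\bar e+1)}_{\bar e+1,e}\, r(e, y(e)).
\]
Next we are going to show $\delta_2(\bar e-1)\, \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)} > \delta_2(\bar e)$ under Assumption 2.5.7. Since $\delta_2(\bar e-1) > \delta_2(\bar e) > 0$, we can compare $\frac{\delta_2(\bar e)}{\delta_2(\bar e-1)}$ with $\frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)}$. For the ratio $\frac{\delta_2(\bar e)}{\delta_2(\bar e-1)}$, we have
\begin{align*}
\frac{\delta_2(\bar e)}{\delta_2(\bar e-1)} &= \frac{\tau_I - \alpha(\bar e)\tau_M + q_M(\bar e)[\tau_M - \tau_I]}{\tau_I - \alpha(\bar e-1)\tau_M + q_M(\bar e-1)[\tau_M - \tau_I]} = \frac{\tau_I/\tau_M - \alpha(\bar e) + q_M(\bar e)[1 - \tau_I/\tau_M]}{\tau_I/\tau_M - \alpha(\bar e-1) + q_M(\bar e-1)[1 - \tau_I/\tau_M]} \\
&= \frac{1-q_M(\bar e)}{1-q_M(\bar e-1)} + \frac{[\alpha(\bar e-1) - q_M(\bar e-1)]\frac{1-q_M(\bar e)}{1-q_M(\bar e-1)} - [\alpha(\bar e) - q_M(\bar e)]}{\tau_I/\tau_M - \alpha(\bar e-1) + q_M(\bar e-1)[1 - \tau_I/\tau_M]} \le \frac{1-\alpha(\bar e)}{1-\alpha(\bar e-1)}. \tag{A.7}
\end{align*}
Because $[\alpha(\bar e-1) - q_M(\bar e-1)]\frac{1-q_M(\bar e)}{1-q_M(\bar e-1)} - [\alpha(\bar e) - q_M(\bar e)] < 0$ and $1 - q_M(\bar e-1) > 0$, the ratio $\frac{\delta_2(\bar e)}{\delta_2(\bar e-1)}$ increases in $\tau_I/\tau_M$ and reaches its maximum when $\tau_I/\tau_M = 1$, where it equals $\frac{1-\alpha(\bar e)}{1-\alpha(\bar e-1)}$. This gives (A.7).

Finally, Assumption 2.5.7 claims that $\frac{1-\alpha(\bar e)}{1-\alpha(\bar e-1)} \le \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)}$. Hence, we end up with $\delta_2(\bar e-1)\, \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)} > \delta_2(\bar e)$. We further have
\begin{align*}
\delta_2(\bar e-1) \sum_{e > \bar e-1} n^{y^*(\bar e)}_{\bar e,e}\, r(e, y(e)) &= \delta_2(\bar e-1)\, n^2_{\bar e,\bar e}\, r(\bar e, y^*(\bar e)) + \delta_2(\bar e-1)\, \frac{p_M(\bar e)\tau_M + p_I(\bar e)\tau_I}{1-p_M(\bar e)(1-\tau_M)-p_I(\bar e)(1-\tau_I)} \sum_{e > \bar e} n^{y^*(\bar e+1)}_{\bar e+1,e}\, r(e, y(e)) \\
&> \delta_2(\bar e) \sum_{e > \bar e} n^{y^*(\bar e+1)}_{\bar e+1,e}\, r(e, y(e)).
\end{align*}
Finally we conclude
\[
\delta_1(\bar e-1) + \delta_2(\bar e-1) \sum_{e > \bar e-1} n^{y^*(\bar e)}_{\bar e,e}\, r(e, y(e)) > \delta_1(\bar e) + \delta_2(\bar e) \sum_{e > \bar e} n^{y^*(\bar e+1)}_{\bar e+1,e}\, r(e, y(e)) > 0;
\]
equivalently, we have $W(\bar e-1, y=1) - W(\bar e-1, y=0) > 0$ in this case.

(1.3) Suppose $\delta_1(\bar e) > 0$ but $\delta_2(\bar e) < 0$ (which may happen only if $\mu_I/\mu_M > \tau_I/\tau_M$). If $\delta_2(\bar e-1) \ge 0$, we easily get $\delta_1(\bar e-1) + \delta_2(\bar e-1)W(\bar e) > \delta_1(\bar e) + \delta_2(\bar e-1)W(\bar e) > 0$. Otherwise, if $0 > \delta_2(\bar e-1) \ge \delta_2(\bar e)$, Theorem 2.5.2 indicates that $0 \le W(\bar e) \le W(\bar e+1)$, and therefore $\delta_2(\bar e-1)W(\bar e) > \delta_2(\bar e)W(\bar e) > \delta_2(\bar e)W(\bar e+1)$. Finally,
\[
\delta_1(\bar e-1) + \delta_2(\bar e-1) \sum_{e > \bar e-1} n^{y^*(\bar e)}_{\bar e,e}\, r(e, y(e)) = \delta_1(\bar e-1) + \delta_2(\bar e-1)W(\bar e) > \delta_1(\bar e) + \delta_2(\bar e)W(\bar e+1) = \delta_1(\bar e) + \delta_2(\bar e) \sum_{e > \bar e} n^{y^*(\bar e+1)}_{\bar e+1,e}\, r(e, y(e)) > 0.
\]
Therefore, $W(\bar e-1, y=1) - W(\bar e-1, y=0) > 0$.

In conclusion, we have shown that once $W(\bar e, y=1) - W(\bar e, y=0) > 0$ for some $\bar e$, we must also have $W(\bar e-1, y=1) - W(\bar e-1, y=0) > 0$. As a result, the optimal policy should be a forward threshold policy.

Example of a non-threshold policy when Assumption 2.5.5 is violated

Consider the following example with three engagement levels (0, 1 and 2). Assume $\mu_M = 1$, $\mu_I = 0.27$, $\tau_M = 0.99$, $\tau_I = 0.25$. At level 0, $p_M(0) = 0.23$, $p_I(0) = 0.54$, $\alpha(0) = 0.75$ and thereby $q_M(0) = 0.635$. At level 1, $p_M(1) = 0.34$, $p_I(1) = 0.52$, $\alpha(1) = 0.81$ and thereby $q_M(1) = 0.7612$. At level 2, $p_M(2) = 0.42$, $p_I(2) = 0.45$, $\alpha(2) = 1$ and thereby $q_M(2) = 0.87$.

The optimal policy is $y^* = (0, 1, 0)$. We use backward induction. At the highest level 2, we have $y^*(2) = 0$ and $W(2) = 0.87/0.13 = 6.692$.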
At level 1,
\[
W(1, y=1) = \frac{p_M(1)\mu_M + p_I(1)\mu_I}{1 - p_M(1)(1-\tau_M) - p_I(1)(1-\tau_I)} + \frac{p_M(1)\tau_M + p_I(1)\tau_I}{1 - p_M(1)(1-\tau_M) - p_I(1)(1-\tau_I)} \cdot \frac{q_M(2)\mu_M}{q_Q(2)}
= \frac{0.4804}{0.6066} + \frac{0.4666}{0.6066}(6.692) = 0.7920 + 0.7692(6.692) = 5.9395,
\]
\[
W(1, y=0) = \frac{q_M(1)\mu_M}{1 - q_M(1)(1-\tau_M)} + \frac{q_M(1)\tau_M}{1 - q_M(1)(1-\tau_M)} \cdot \frac{q_M(2)\mu_M}{q_Q(2)}
= \frac{0.7612}{0.9924} + \frac{0.7536}{0.9924}(6.692) = 0.7670 + 0.7594(6.692) = 5.849,
\]
therefore $y^*(1) = 1$ and $W(1) = W(1, y=1) = 5.9395$. Moreover, $C(1) = 0.7920 - 0.7670 = 0.025$ and $F(1) = (0.7692 - 0.7594)(6.692) = 0.0098(6.692) = 0.066$. Finally, we look at level 0, where the continuation value is $W(1)$:
\[
W(0, y=1) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} + \frac{p_M(0)\tau_M + p_I(0)\tau_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} \, W(1)
= \frac{0.3758}{0.5927} + \frac{0.3627}{0.5927}(5.9395) = 0.6340 + 0.6119(5.9395) = 4.2684,
\]
\[
W(0, y=0) = \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} + \frac{q_M(0)\tau_M}{1 - q_M(0)(1-\tau_M)} \, W(1)
= \frac{0.635}{0.9937} + \frac{0.6287}{0.9937}(5.9395) = 0.6391 + 0.6327(5.9395) = 4.3969.
\]
As we can see, $y^*(0) = 0$ and $W(0) = W(0, y=0) = 4.3969$. Besides, $C(0) = 0.6340 - 0.6391 = -0.0051$ and $F(0) = (0.6119 - 0.6327)(5.9395) = -0.0208(5.9395) = -0.1232$. The optimal policy is not a threshold policy.

In fact, Assumption 2.5.5 is violated because $\alpha(1) - \alpha(0) = 0.81 - 0.75 = 0.06$ while $q_M(1) - q_M(0) = 0.7612 - 0.635 = 0.1262$, so that $\alpha(1) - \alpha(0) < q_M(1) - q_M(0)$. Assumption 2.5.7 is satisfied since $1 - \alpha(1) = 1 - 0.81 = 0.19$ and
\[
(1 - \alpha(0)) \, \frac{p_M(1)\tau_M + p_I(1)\tau_I}{1 - p_M(1)(1-\tau_M) - p_I(1)(1-\tau_I)} = (1 - 0.75)\frac{0.4666}{0.6066} = 0.1923 .
\]

Example of a non-threshold policy when Assumption 2.5.7 is violated

Consider the following three-engagement-level example (levels 0, 1 and 2). Assume $\mu_M = 1$, $\mu_I = 0.2$, $\tau_M = 0.91$, $\tau_I = 0.47$. At level 0, $p_M(0) = 0.03$, $p_I(0) = 0.51$, $\alpha(0) = 0.59$ and thereby $q_M(0) = 0.3309$. At level 1, $p_M(1) = 0.05$, $p_I(1) = 0.5$, $\alpha(1) = 0.62$ and thereby $q_M(1) = 0.36$. At level 2, $p_M(2) = 0.34$, $p_I(2) = 0.45$, $\alpha(2) = 1$ and thereby $q_M(2) = 0.79$.

The optimal policy is $y^* = (0, 1, 0)$. We use backward induction. At the highest level 2, we have $y^*(2) = 0$ and $W(2) = 0.79/0.21 = 3.7619$.
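The backward induction used in the two three-level examples of this part of the appendix can be reproduced with a short script. The following is a minimal sketch that assumes only the value recursions displayed in these examples and the relation $q_M(e) = p_M(e) + \alpha(e)\,p_I(e)$; the names `Level` and `backward_induction` are illustrative and do not appear in the thesis.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Level:
    p_m: float    # p_M(e)
    p_i: float    # p_I(e)
    alpha: float  # alpha(e), so that q_M(e) = p_M(e) + alpha(e) * p_I(e)


def backward_induction(levels: List[Level], mu_m: float, mu_i: float,
                       tau_m: float, tau_i: float) -> Tuple[List[int], List[float]]:
    """Return (y*, W): y*(e) in {0, 1} and the optimal value W(e) per level."""
    n = len(levels) - 1
    policy = [0] * (n + 1)
    value = [0.0] * (n + 1)
    for e in range(n, -1, -1):
        lv = levels[e]
        q_m = lv.p_m + lv.alpha * lv.p_i
        if e == n:
            # Highest level: there is no further progression, so tau drops out.
            w_offer = (lv.p_m * mu_m + lv.p_i * mu_i) / (1 - lv.p_m - lv.p_i)
            w_skip = q_m * mu_m / (1 - q_m)
        else:
            nxt = value[e + 1]  # continuation value W(e + 1)
            den_offer = 1 - lv.p_m * (1 - tau_m) - lv.p_i * (1 - tau_i)
            w_offer = ((lv.p_m * mu_m + lv.p_i * mu_i) / den_offer
                       + (lv.p_m * tau_m + lv.p_i * tau_i) / den_offer * nxt)
            den_skip = 1 - q_m * (1 - tau_m)
            w_skip = q_m * mu_m / den_skip + q_m * tau_m / den_skip * nxt
        policy[e] = 1 if w_offer > w_skip else 0
        value[e] = max(w_offer, w_skip)
    return policy, value


# Example with Assumption 2.5.5 violated.
example_255 = [Level(0.23, 0.54, 0.75), Level(0.34, 0.52, 0.81), Level(0.42, 0.45, 1.0)]
policy_255, value_255 = backward_induction(example_255, 1.0, 0.27, 0.99, 0.25)

# Example with Assumption 2.5.7 violated.
example_257 = [Level(0.03, 0.51, 0.59), Level(0.05, 0.50, 0.62), Level(0.34, 0.45, 1.0)]
policy_257, value_257 = backward_induction(example_257, 1.0, 0.20, 0.91, 0.47)

print(policy_255, [round(w, 4) for w in value_255])
print(policy_257, [round(w, 4) for w in value_257])
```

Both parameter sets return the policy `[0, 1, 0]`. The values agree with the hand calculations up to the rounding of intermediate quantities, which the script does not perform.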
At level 1,
\[
W(1, y=1) = \frac{p_M(1)\mu_M + p_I(1)\mu_I}{1 - p_M(1)(1-\tau_M) - p_I(1)(1-\tau_I)} + \frac{p_M(1)\tau_M + p_I(1)\tau_I}{1 - p_M(1)(1-\tau_M) - p_I(1)(1-\tau_I)} \cdot \frac{q_M(2)\mu_M}{q_Q(2)}
= \frac{0.15}{0.7305} + \frac{0.2805}{0.7305}(3.7619) = 0.2053 + 0.3840(3.7619) = 1.6498,
\]
\[
W(1, y=0) = \frac{q_M(1)\mu_M}{1 - q_M(1)(1-\tau_M)} + \frac{q_M(1)\tau_M}{1 - q_M(1)(1-\tau_M)} \cdot \frac{q_M(2)\mu_M}{q_Q(2)}
= \frac{0.36}{0.9676} + \frac{0.3276}{0.9676}(3.7619) = 0.3721 + 0.3386(3.7619) = 1.6459,
\]
therefore $y^*(1) = 1$ and $W(1) = W(1, y=1) = 1.6498$. Moreover, $C(1) = 0.2053 - 0.3721 = -0.1668$ and $F(1) = (0.3840 - 0.3386)(3.7619) = 0.0454(3.7619) = 0.1708$. Finally, we look at level 0, where the continuation value is $W(1)$:
\[
W(0, y=1) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} + \frac{p_M(0)\tau_M + p_I(0)\tau_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} \, W(1)
= \frac{0.1320}{0.7270} + \frac{0.2670}{0.7270}(1.6498) = 0.1816 + 0.3673(1.6498) = 0.7876,
\]
\[
W(0, y=0) = \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} + \frac{q_M(0)\tau_M}{1 - q_M(0)(1-\tau_M)} \, W(1)
= \frac{0.3309}{0.9702} + \frac{0.3011}{0.9702}(1.6498) = 0.3411 + 0.3104(1.6498) = 0.8532.
\]
As we can see, $y^*(0) = 0$ and $W(0) = W(0, y=0) = 0.8532$. Besides, $C(0) = 0.1816 - 0.3411 = -0.1595$ and $F(0) = (0.3673 - 0.3104)(1.6498) = 0.0569(1.6498) = 0.0939$. The optimal policy is not a threshold policy.

In fact, Assumption 2.5.5 is satisfied because $\alpha(1) - \alpha(0) = 0.62 - 0.59 = 0.03$ while $q_M(1) - q_M(0) = 0.3600 - 0.3309 = 0.0291$. But Assumption 2.5.7 is violated since $1 - \alpha(1) = 1 - 0.62 = 0.38$ and
\[
(1 - \alpha(0)) \, \frac{p_M(1)\tau_M + p_I(1)\tau_I}{1 - p_M(1)(1-\tau_M) - p_I(1)(1-\tau_I)} = (1 - 0.59)\frac{0.2805}{0.7305} = 0.1574 .
\]

Proof of Proposition 2.6.1

We restrict ourselves to threshold policies. By backward induction, in order to prove that the optimal threshold is non-decreasing in $\mu_I$, we only need to show that $W^2(e) - W^1(e)$ is non-decreasing in $\mu_I$ given that $y^1(e') = 0$ for $e' > e$.
Because the optimal threshold e∗is solved by W 2(e∗)−W 1(e∗) > 0 where y1(e′) = 0 for e′ > e∗ and W 2(e∗+1)−W 1(e∗+1) ≤ 0where y1(e′) = 0 for e′ > e∗ + 1.We have already characterized the explicit expression for W 2(e)−W 1(e) which isW 2(e)−W 1(e) =pI(e)[1−qM (e)(1−τM )][1−pM (e)(1−τM )−pI(e)(1−τI)]{δ1(e) + δ2(e)∑e′>e ny1(e+1)e+1,e′ r(e′, y(e′))}, e < NpI(N)pQ(N)qQ(N){µI − α(N)µM + qM (N)(µM − µI)}, e = Nwhere δ1(e) = [µI − α(e)µM + qM (e)(1− τI)µM − qM (e)(1− τM )µI ]δ2(e) = [τI − α(e)τM + qM (e)(τM − τI)]93Appendix A. Proofs of Results in Chapter 2Clearly, W 2(N) −W 1(N) will increase in µI . For any e < N , δ1(e) will increase in µI whileδ2(e) will keep constant. In addition, both ny1(e+1)e+1,e′ and r(e′, y(e′)) will remain the same sincewe fix y1(e′) = 0 unchanged for all e′ > e. Hence, W 2(e) −W 1(e) will increase in µI . Let eˆ∗be the largest engagement level such that W 2(e) −W 1(e) > 0. By definition, eˆ∗ is actuallythe new optimal threshold under a larger µI . Since originally W2(e∗) −W 1(e∗) > 0 and thedifference W 2(e)−W 1(e) is increasing in µI , we conclude that eˆ∗ ≥ e∗. The optimal thresholdmust be non-decreasing in µI .Proof of Proposition 2.6.2Similarly as Proposition 2.6.1, we will still restrict ourselves to threshold policies. It suffices toshow W 2(e)−W 1(e) is non-decreasing in τI given that y1(e′) = 0 for e′ > e.Obviously, W 2(N) − W 1(N) does not depend on τI . For e < N , since y1(e′) = 0 fore′ > e, we have r(e′, y(e′)) and ny1(e+1)e+1,e′ unrelated with τI . Therefore, W1(e) = qM (e)µM1−qM (e)(1−τM ) +qM (e)τM1−qM (e)(1−τM )∑e′>e ny1(e+1)e+1,e′ r(e′, y(e′)) will not be affected by τI . Now we would like to showW 2(e) = pM (e)µM+pI(e)µI1−pM (e)(1−τM )−pI(e)(1−τI) +pM (e)τM+pI(e)τI1−pM (e)(1−τM )−pI(e)(1−τI)∑e′>e ny1(e+1)e+1,e′ r(e′, y(e′)) will in-crease in τI . 
In fact, if $\tau_I$ increases by $\varepsilon > 0$, we have
\[
\left[ \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1 - p_M(e)(1-\tau_M) - p_I(e)(1-\tau_I-\varepsilon)} + \frac{p_M(e)\tau_M + p_I(e)(\tau_I+\varepsilon)}{1 - p_M(e)(1-\tau_M) - p_I(e)(1-\tau_I-\varepsilon)} \sum_{e'>e} n^{y^1(e+1)}_{e+1,e'} \, r(e', y(e')) \right]
- \left[ \frac{p_M(e)\mu_M + p_I(e)\mu_I}{1 - p_M(e)(1-\tau_M) - p_I(e)(1-\tau_I)} + \frac{p_M(e)\tau_M + p_I(e)\tau_I}{1 - p_M(e)(1-\tau_M) - p_I(e)(1-\tau_I)} \sum_{e'>e} n^{y^1(e+1)}_{e+1,e'} \, r(e', y(e')) \right]
\]
\[
= \frac{p_I(e)\,\varepsilon\,p_Q(e)}{[1 - p_M(e)(1-\tau_M) - p_I(e)(1-\tau_I-\varepsilon)][1 - p_M(e)(1-\tau_M) - p_I(e)(1-\tau_I)]} \left\{ \sum_{e'>e} n^{y^1(e+1)}_{e+1,e'} \, r(e', y(e')) - \frac{p_M(e)\mu_M + p_I(e)\mu_I}{p_Q(e)} \right\}
\]
\[
= \frac{p_I(e)\,\varepsilon\,p_Q(e)}{[1 - p_M(e)(1-\tau_M) - p_I(e)(1-\tau_I-\varepsilon)][1 - p_M(e)(1-\tau_M) - p_I(e)(1-\tau_I)]} \left\{ W^{y^1}(e+1) - \frac{p_M(e)\mu_M + p_I(e)\mu_I}{p_Q(e)} \right\} > 0, \qquad \text{(A.8)}
\]
where $W^{y^1}(e+1)$ is the revenue at engagement level $e+1$ if the publisher follows the policy $y^1(e') = 0$ for all $e' > e$. The inequality (A.8) holds because of Lemma A.0.2. As a result, $W^2(e)$ increases in $\tau_I$ and consequently $W^2(e) - W^1(e)$ increases in $\tau_I$ for all $e < N$. This implies that the optimal threshold must be non-decreasing in $\tau_I$.

Details of Example 6

Consider the following two-engagement-level example. Assume $\mu_M = 1$, $\mu_I = 0.05$, $\tau_M = 0.8$, $\tau_I = 0.2$. At level 0, $p_M(0) = 0.3$, $p_I(0) = 0.5$, $\alpha(0) = 0.7$ and thereby $q_M(0) = 0.65$. At level 1, $p_M(1) = 0.5$, $p_I(1) = 0.4$, $\alpha(1) = 1$ and thereby $q_M(1) = 0.9$.

Because $\alpha(1) = 1$, we easily get $y^*(1) = 0$ and $W(1) = \frac{q_M(1)\mu_M}{1 - q_M(1)} = 9$. Now we solve for level 0. At level 0,
\[
W(0, y=1) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} + \frac{p_M(0)\tau_M + p_I(0)\tau_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} \cdot \frac{q_M(1)\mu_M}{q_Q(1)}
= \frac{0.325}{0.54} + \frac{0.34}{0.54}(9) = 0.6019 + 0.6296(9) = 6.2683,
\]
\[
W(0, y=0) = \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} + \frac{q_M(0)\tau_M}{1 - q_M(0)(1-\tau_M)} \cdot \frac{q_M(1)\mu_M}{q_Q(1)}
= \frac{0.65}{0.87} + \frac{0.52}{0.87}(9) = 0.7471 + 0.5977(9) = 6.1264,
\]
hence $y^*(0) = 1$ and $W(0) = \max\{W(0, y=1), W(0, y=0)\} = 6.2683$.

Now we change the parameters as follows: $\mu_M = 1$, $\mu_I = 0.05$, $\tau_M = 0.8$, $\tau_I = 0.25$. At level 0, $p_M(0) = 0.1$, $p_I(0) = 0.7$, $\alpha(0) = 0.7$ and thereby $q_M(0) = 0.59$. At level 1, $p_M(1) = 0.3$, $p_I(1) = 0.6$, $\alpha(1) = 1$ and thereby $q_M(1) = 0.9$.

We still have $y^*(1) = 0$ and $W(1) = \frac{q_M(1)\mu_M}{1 - q_M(1)} = 9$.
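The level-0 decision under both parameter sets of Example 6 can be checked with a few lines of code. This is a minimal sketch under the assumption, taken from the example itself, that level 1 is the top level with $y^*(1) = 0$ and $W(1) = q_M(1)\mu_M / (1 - q_M(1))$; the function name `level0_decision` is illustrative only.

```python
def level0_decision(p_m0, p_i0, alpha0, q_m1, mu_m, mu_i, tau_m, tau_i):
    """Return (y*(0), W(0)) for a two-level game in which y*(1) = 0."""
    w1 = q_m1 * mu_m / (1 - q_m1)          # W(1) when no incented action is offered
    q_m0 = p_m0 + alpha0 * p_i0            # q_M(0) = p_M(0) + alpha(0) p_I(0)

    # Value of offering the incented action at level 0.
    den_offer = 1 - p_m0 * (1 - tau_m) - p_i0 * (1 - tau_i)
    w_offer = ((p_m0 * mu_m + p_i0 * mu_i) / den_offer
               + (p_m0 * tau_m + p_i0 * tau_i) / den_offer * w1)

    # Value of not offering it at level 0.
    den_skip = 1 - q_m0 * (1 - tau_m)
    w_skip = q_m0 * mu_m / den_skip + q_m0 * tau_m / den_skip * w1

    return (1 if w_offer > w_skip else 0), max(w_offer, w_skip)


# First parameter set: tau_I = 0.2, p_M(0) = 0.3, p_I(0) = 0.5.
y_a, w_a = level0_decision(0.3, 0.5, 0.7, 0.9, mu_m=1.0, mu_i=0.05, tau_m=0.8, tau_i=0.2)

# Second parameter set: tau_I = 0.25, p_M(0) = 0.1, p_I(0) = 0.7.
y_b, w_b = level0_decision(0.1, 0.7, 0.7, 0.9, mu_m=1.0, mu_i=0.05, tau_m=0.8, tau_i=0.25)

print(y_a, round(w_a, 4))
print(y_b, round(w_b, 4))
```

The sketch returns $y^*(0) = 1$ for the first parameter set and $y^*(0) = 0$ for the second, so the incented action is dropped at level 0 once $\tau_I$ and $p_I$ rise and $p_M$ falls.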
But at level 0,
\[
W(0, y=1) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} + \frac{p_M(0)\tau_M + p_I(0)\tau_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} \cdot \frac{q_M(1)\mu_M}{q_Q(1)}
= \frac{0.135}{0.455} + \frac{0.255}{0.455}(9) = 0.29670 + 0.56044(9) = 5.3407,
\]
\[
W(0, y=0) = \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} + \frac{q_M(0)\tau_M}{1 - q_M(0)(1-\tau_M)} \cdot \frac{q_M(1)\mu_M}{q_Q(1)}
= \frac{0.59}{0.882} + \frac{0.472}{0.882}(9) = 0.6689 + 0.5351(9) = 5.4852,
\]
hence $y^*(0) = 0$ and $W(0) = \max\{W(0, y=1), W(0, y=0)\} = 5.4852$.

Suppose the incented action becomes more attractive (e.g., as in Candy Crush), which increases $\tau_I$. At the same time, it draws players away from monetization toward the incented action; in other words, $p_I$ increases and $p_M$ decreases. The example above shows that the optimal threshold may then decrease.

Details of Example 8

Consider the following two-engagement-level example. Assume $\mu_M = 1$, $\mu_I = 0.0001$, $\tau_M = 0.01$, $\tau_I = 0.009$. At level 0, $p_M(0) = 0.05$, $p_I(0) = 0.68$. At level 1, $p_M(1) = 0.3$, $p_I(1) = 0.65$. In addition, we fix the step size of $\alpha$ at 0.6, i.e., $\alpha(1) = \alpha(0) + 0.6$.

We start with $\alpha(0) = 0.25$ and $\alpha(1) = 0.85$. Therefore, $q_M(0) = 0.05 + 0.68(0.25) = 0.22$ and $q_M(1) = 0.3 + 0.65(0.85) = 0.8525$. We solve for the optimal policy by backward induction. At level 1,
\[
W(1, y=1) = \frac{p_M(1)\mu_M + p_I(1)\mu_I}{1 - p_M(1) - p_I(1)} = \frac{0.300065}{0.05} = 6.0013,
\qquad
W(1, y=0) = \frac{q_M(1)}{1 - q_M(1)} = \frac{0.8525}{0.1475} = 5.7797,
\]
therefore $y^*(1) = 1$ and $W(1) = 6.0013$. Now we solve for level 0. At level 0,
\[
W(0, y=1) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} + \frac{p_M(0)\tau_M + p_I(0)\tau_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} \, W(1)
= \frac{0.050068}{0.27662} + \frac{0.00662}{0.27662}(6.0013) = 0.18099 + 0.02393(6.0013) = 0.3246,
\]
\[
W(0, y=0) = \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} + \frac{q_M(0)\tau_M}{1 - q_M(0)(1-\tau_M)} \, W(1)
= \frac{0.22}{0.7822} + \frac{0.0022}{0.7822}(6.0013) = 0.28125 + 0.00281(6.0013) = 0.2981,
\]
hence $y^*(0) = 1$ and $W(0) = \max\{W(0, y=1), W(0, y=0)\} = 0.3246$. In conclusion, the optimal policy is $y^* = (1, 1)$ in this case.

Next we increase $\alpha(0)$ by 0.1 but keep all the other parameters unchanged, i.e., $\alpha(0) = 0.35$ and $\alpha(1) = 0.95$.
Correspondingly, $q_M(0) = 0.05 + 0.68(0.35) = 0.288$ and $q_M(1) = 0.3 + 0.65(0.95) = 0.9175$. In this case, at level 1,
\[
W(1, y=1) = \frac{p_M(1)\mu_M + p_I(1)\mu_I}{1 - p_M(1) - p_I(1)} = \frac{0.300065}{0.05} = 6.0013,
\qquad
W(1, y=0) = \frac{q_M(1)}{1 - q_M(1)} = \frac{0.9175}{0.0825} = 11.1212,
\]
therefore $y^*(1) = 0$ and $W(1) = 11.1212$. Now we solve for level 0. At level 0,
\[
W(0, y=1) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} + \frac{p_M(0)\tau_M + p_I(0)\tau_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} \cdot \frac{q_M(1)\mu_M}{q_Q(1)}
= \frac{0.050068}{0.27662} + \frac{0.00662}{0.27662}(11.1212) = 0.18099 + 0.02393(11.1212) = 0.4471,
\]
\[
W(0, y=0) = \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} + \frac{q_M(0)\tau_M}{1 - q_M(0)(1-\tau_M)} \cdot \frac{q_M(1)\mu_M}{q_Q(1)}
= \frac{0.288}{0.71488} + \frac{0.00288}{0.71488}(11.1212) = 0.40286 + 0.00403(11.1212) = 0.4477,
\]
hence $y^*(0) = 0$ and $W(0) = \max\{W(0, y=1), W(0, y=0)\} = 0.4477$. In conclusion, the optimal policy is $y^* = (0, 0)$ in this case. The optimal threshold decreases as $\alpha(0)$ increases.

Now we further increase $\alpha(0)$ by 0.05 but still keep all the other parameters unchanged, i.e., $\alpha(0) = 0.4$ and $\alpha(1) = 1$. Therefore $q_M(0) = 0.05 + 0.68(0.4) = 0.322$ and $q_M(1) = 0.3 + 0.65 = 0.95$. Under the new parameters, at level 1,
\[
W(1, y=1) = \frac{p_M(1)\mu_M + p_I(1)\mu_I}{1 - p_M(1) - p_I(1)} = \frac{0.300065}{0.05} = 6.0013,
\qquad
W(1, y=0) = \frac{q_M(1)}{1 - q_M(1)} = \frac{0.95}{0.05} = 19,
\]
so $y^*(1) = 0$ and $W(1) = 19$. At level 0,
\[
W(0, y=1) = \frac{p_M(0)\mu_M + p_I(0)\mu_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} + \frac{p_M(0)\tau_M + p_I(0)\tau_I}{1 - p_M(0)(1-\tau_M) - p_I(0)(1-\tau_I)} \cdot \frac{q_M(1)\mu_M}{q_Q(1)}
= \frac{0.050068}{0.27662} + \frac{0.00662}{0.27662}(19) = 0.18099 + 0.02393(19) = 0.63566,
\]
\[
W(0, y=0) = \frac{q_M(0)\mu_M}{1 - q_M(0)(1-\tau_M)} + \frac{q_M(0)\tau_M}{1 - q_M(0)(1-\tau_M)} \cdot \frac{q_M(1)\mu_M}{q_Q(1)}
= \frac{0.322}{0.68122} + \frac{0.00322}{0.68122}(19) = 0.47268 + 0.004727(19) = 0.5625,
\]
therefore we get $y^*(0) = 1$ and $W(0) = \max\{W(0, y=1), W(0, y=0)\} = 0.63566$. In conclusion, the optimal policy is $y^* = (1, 0)$. The optimal threshold increases as $\alpha(0)$ increases.

To summarize, the optimal threshold may increase or decrease with $\alpha(0)$, and a U-shape is possible.

Appendix B
Proofs of Results in Chapter 3

We first introduce a lemma.

Lemma B.0.3. At each t = 1, . .
. , n, the solution of (3.2) satisfies: 1 +∂q∗t∂xt> 0, i.e. thepost-order inventory xt + q∗t (xt) strictly increases in xt.Proof of Lemma B.0.3: Suppose q∗n−k(xn−k) satisfies (3.2), i.e.R′(Qn)k∏j=1(1 +∂q∗n−k+j∂xn−k+j) +∂w∗n−k(xn−k, q∗n−k)∂qn−kq∗n−k = 0 .We apply the Implicit Function Theorem and take derivative with respect to xn−k on both sideswhich yieldsddxn−k[R′(Qn)k∏j=1(1 +∂q∗n−k+j∂xn−k+j) +∂w∗n−k∂qn−kqn−k]|qn−k=q∗n−k(1 +∂q∗n−k∂xn−k) +∂w∗n−k(xn−k, q∗n−k)∂qn−k∂q∗n−k∂xn−k= 0 .(B.1)q∗n−k(xn−k) satisfies the second-order condition, i.e., at qn−k = q∗n−k(xn−k)ddxn−k[R′(Qn)k∏j=1(1 +∂q∗n−k+j∂xn−k+j) +∂w∗n−k∂qn−kqn−k] ≤ 0 . (B.2)Suppose the derivative (B.2) is equal to 0, then from (B.1),∂w∗n−k(xn−k, q∗n−k)∂qn−k∂q∗n−k∂xn−k= 0 =⇒ ∂q∗n−k∂xn−k= 0 =⇒ 1 + ∂q∗n−k∂xn−k= 1 > 0 .Suppose the derivative (B.2) is less than 0, then from (B.1),1 +∂q∗n−k∂xn−k=∂w∗n−k(xn−k,q∗n−k)qn−kddxn−k [R′(Qn)∏kj=1(1 +∂q∗n−k+j∂xn−k+j ) +∂w∗n−k∂qn−k qn−k]|qn−k=q∗n−k(xn−k)> 0 ,where the inequality holds because w∗n−k(xn−k, q∗n−k) must decrease in q∗n−k. Otherwise, if∂w∗n−k(xn−k,q∗n−k)∂qn−k ≥ 0, (3.2) is positive for all qn−k ≥ 0, and the equilibrium solution at t = n−kdoes not exist.In conclusion, we have shown 1 +∂q∗t∂xt> 0, and the proof is complete.98Appendix B. Proofs of Results in Chapter 3Proof of Proposition 3.2.1The proof is by induction.At the last offer t = n, given pre-order inventory level xn and wholesale price wn, the retailerdecides upon his optimal order quantity to maximize his profit which is pinR = R(xn + qn) −w1q1 − · · · − wnqn. The problem is concave in qn and the first-order derivative is dpinR/dqn =R′(xn + qn) − wn. If wn ≥ R′(xn), then ∂pinR/∂qn = R′(xn + qn) − wn < R′(xn) − wn ≤ 0 forall qn > 0. 
Thus the retailer should order 0; else if wn < R′(xn), then the retailer will order apositive quantity, q∗n(xn, wn), which satisfies the first-order condition wn = R′(xn + qn).As for the supplier, she anticipates the retailer’s order for any wholesale price and chooses wnto maximize pinS = w1q1 + · · ·+wn−1qn−1 +wnq∗n(xn, wn). Trivially, if wn > R′(xn), the supplierwill earn nothing since the retailer will not order. We focus on the case wn ≤ R′(xn). Since R′is strictly decreasing, there exists a one-to-one map between wn (wn ≤ R′(xn)) and q∗n(xn, wn)(q∗n ≥ 0). We can focus on an equivalent problem where the supplier faces the inverse demandcurve w∗n(xn, qn) = R′(xn+ qn) and has to decide upon an optimal quantity qn to maximize herprofit pinS = w1q1 + · · ·+wn−1qn−1 +w∗n(xn, qn)qn, where the first-order derivative is ∂pinS/∂qn =R′(xn + qn) +R′′(xn + qn)qn. The optimal q∗n has to be either the boundary solution 0, or thesolution of first-order condition. If xn ≥ QFB, R′(xn + qn) +R′′(xn + qn)qn ≤ R′(xn + qn) < 0for all qn > 0, hence we have q∗n = 0; else if xn < QFB, ∂pinS/∂qn|qn=0 = R′(xn) > 0, so theoptimal solution q∗n cannot be the boundary solution, and thereby must be a solution of thefirst order condition R′(xn + qn) + R′′(xn + qn)qn = R′(Qn) +∂w∗n∂qnqn = 0. The Theorem holdsfor the last offer.Suppose the equilibrium solution {w∗n−k+j(xn−k+j , qn−k+j), q∗n−k+j(xn−k+j)} satisfy (3.1)and (3.2) for j = 1, . . . , k. For t = n− k, the retailer’s profit ispinR = R(xn−k + qn−k + q∗n−k+1 + · · ·+ q∗n)− w1q1 − · · · − wn−kqn−k − w∗n−k+1q∗n−k+1 − · · · − w∗nq∗n ,and its corresponding first-order derivative is∂pinR∂qn−k= R′(Qn)dQndqn−k− wn−k − ddqn−k(w∗n−k+1q∗n−k+1 + · · ·+ w∗nq∗n) (B.3)= R′(Qn)(1 +k∑j=1dq∗n−k+jdqn−k)− wn−k −n∑j=1(q∗n−k+jdw∗n−k+jdqn−k+ w∗n−k+jdq∗n−k+jdqn−k) .Next, we closely investigate the termsdq∗n−k+jdqn−k anddw∗n−k+jdqn−k . Note that both q∗n−k+j and w∗n−k+jare functions of xn−k+j which is equal to xn−k + qn−k + q∗n−k+1 + · · ·+ q∗n−k+j−1. 
Furthermore,99Appendix B. Proofs of Results in Chapter 3for all 1 ≤ m ≤ j − 1, q∗n−k+m also depends on qn−k. Therefore, we havedq∗n−k+j(xn−k+j)dqn−k=dq∗n−k+j(xn−k + qn−k +∑j−1m=1 q∗n−k+m)dqn−k=∂q∗n−k+j∂x∗n−k+j{1 +j−1∑m=1dq∗n−k+mdqn−k}=∂q∗n−k+j∂x∗n−k+j{1 + ∂q∗n−k+1∂x∗n−k+1+∂q∗n−k+2∂x∗n−k+2(1 +∂q∗n−k+1∂x∗n−k+1) + . . .+∂q∗n−k+j−1∂x∗n−k+j−1(1 +∂q∗n−k+1∂x∗n−k+1)(1 +∂q∗n−k+2∂x∗n−k+2) . . . (1 +∂q∗n−k+j−2∂x∗n−k+j−2)}=∂q∗n−k+j∂x∗n−k+jj−1∏m=1(1 +∂q∗n−k+m∂xn−k+m) ,anddw∗n−k+j(xn−k+j , q∗n−k+j(xn−k+j))dqn−k=∂w∗n−k+j∂x∗n−k+j(dxn−k+jdqn−k+dq∗n−k+j(xn−k+j)dqn−k)=∂w∗n−k+j∂x∗n−k+j{j−1∏m=1(1 +∂q∗n−k+m∂x∗n−k+m) +∂q∗n−k+j∂x∗n−k+jj−1∏m=1(1 +∂q∗n−k+m∂x∗n−k+m)}=∂w∗n−k+j∂x∗n−k+jj∏m=1(1 +∂q∗n−k+m∂xn−k+m) .In addition, we simplify dQndqn−k as follows:dQndqn−k=d(xn−k + qn−k + q∗n−k+1 + · · ·+ q∗n)dqn−k= 1 +k∑j=1dq∗n−k+j(xn−k+j)dqn−k= 1 +k∑j=1∂q∗n−k+j∂x∗n−k+jj−1∏m=1(1 +∂q∗n−k+m∂xn−k+m)= 1 +∂q∗n−k+1∂x∗n−k+1+∂q∗n−k+2∂x∗n−k+2(1 +∂q∗n−k+1∂x∗n−k+1) + · · ·+ ∂q∗n∂x∗nn−1∏m=1(1 +∂q∗n−k+m∂xn−k+m)=k∏j=1(1 +∂q∗n−k+j∂xn−k+j) .100Appendix B. 
Proofs of Results in Chapter 3By the inductive assumption, we haveq∗n−k+jdw∗n−k+jdqn−k= q∗n−k+j∂w∗n−k+j∂x∗n−k+jj∏m=1(1 +∂q∗n−k+m∂q∗n−k+m−1)= −R′(Qn)k∏m=1(1 +∂q∗n−k+j+m∂xn−k+j+m)j∏m=1(1 +∂q∗n−k+m∂xn−k+m)= −R′(Qn)k∏m=1(1 +∂q∗n−k+m∂xn−k+m) .Finally, by the first-order condition, we end up with the inverse demand functionw∗n−k(xn−k, qn−k) = R′(Qn)dQndqn−k−n∑j=1(q∗n−k+jdw∗n−k+jdqn−k+ w∗n−k+jdq∗n−k+jdqn−k)= R′(Qn)k∏j=1(1 +∂q∗n−k+j∂xn−k+j) +k∑j=1R′(Qn)k∏m=1(1 +∂q∗n−k+m∂xn−k+m)−k∑j=1w∗n−k+j∂q∗n−k+j∂xn−k+jj−1∏m=1(1 +∂q∗n−k+m∂xn−k+m)= (k + 1)R′(Qn)k∏j=1(1 +∂q∗n−k+j∂xn−k+j)−k∑j=1w∗n−k+j∂q∗n−k+j∂xn−k+jj−1∏m=1(1 +∂q∗n−k+m∂xn−k+m) .Given the inverse demand function w∗n−k(xn−k, qn−k), the supplier chooses qn−k to maximizeher profit which ispinS = w1q1 + · · ·+ w∗n−kqn−k + w∗n−k+1q∗n−k+1 + · · ·+ w∗nq∗nand the corresponding first-order derivative is∂pinS∂qn−k= w∗n−k +∂w∗n−k∂qn−kqn−k +ddqn−k(w∗n−k+1q∗n−k+1 + · · ·+ w∗nq∗n)= R′(Qn)k∏j=1(1 +∂q∗n−k+j∂xn−k+j) +∂w∗n−k∂qn−kqn−k , (B.4)where the equality holds because w∗n−k satisfies (B.3).Now if xn−k ≥ QFB, we have ∂pinS∂qn−k = R′(Qn)∏kj=1(1+∂q∗n−k+j∂xn−k+j )+∂w∗n−k∂qn−k qn−k ≤ R′(Qn)∏kj=1(1+∂q∗n−k+j∂xn−k+j ) < 0 for all qn−k > 0. So the optimal solution is the boundary solution q∗n−k = 0. Ifxn−k < QFB, then∂pinS∂qn−k |qn−k=0 = R′(Qn)∏kj=1(1 +∂q∗n−k+j∂qn−k+j−1 ) > 0 (by Lemma B.0.3), so theoptimal solution must satisfy the first-order condition rather than the boundary solution, andthe proof is completed.101Appendix B. Proofs of Results in Chapter 3Proof of Theorem 3.2.2(a) Consider Qn = q1 + q∗2 + · · ·+ q∗n as a function of q1. Lemma B.0.3 implies that Qn strictlyincreases in q1 because∂Qn∂q1=n−1∏j=1(1 +∂q∗j+1∂xj+1) > 0 .Notice that the equilibrium total inventory Qn−1,∗ under the (n − 1)-offer case is equivalentto the total inventory Qnq1=0 under the n-offer case by forcing q1 = 0. The equilibrium totalinventory Qn,∗ under the n-offer case has q1 = q∗1 > 0. 
Therefore, we conclude that $Q^{n,*} = Q^n|_{q_1 = q_1^*} > Q^n|_{q_1 = 0} = Q^{n-1,*}$, and we have shown that $Q^{n-1,*} < Q^{n,*}$.

Finally, we would like to show that the equilibrium total inventory satisfies $Q^{n,*} \le Q^{FB}$. We prove it by contradiction. Suppose $Q^{n,*} > Q^{FB}$; then there exists a certain $\hat{t}$, $1 \le \hat{t} \le n-1$, such that $x^*_{\hat{t}} \le Q^{FB}$ but $x^*_{\hat{t}+1} > Q^{FB}$. The retailer's total profit is $\pi^{n,*}_R = R(Q^{n,*}) - w_1^* q_1^* - w_2^* q_2^* - \cdots - w_n^* q_n^*$.

We construct a new strategy for the retailer where $\hat{q}_j$ is the same as $q_j^*$ except that (i) $\hat{q}_j(x_j) = 0$ whenever $x_j > Q^{FB}$ and (ii) at $j = \hat{t}$, we decrease the order quantity from $q^*_{\hat{t}}(x_{\hat{t}})$ to $\hat{q}_{\hat{t}}(x_{\hat{t}}) = Q^{FB} - x_{\hat{t}}$. Under this strategy, the retailer's order quantities will be $\{q_1^*, \ldots, q^*_{\hat{t}-1}, \hat{q}_{\hat{t}}, 0, \ldots, 0\}$ and the retailer's total profit will be $\hat{\pi}^n_R = R(Q^{FB}) - w_1^* q_1^* - w_2^* q_2^* - \cdots - w^*_{\hat{t}} \hat{q}_{\hat{t}}$. Thus,
\[
\hat{\pi}^n_R - \pi^{n,*}_R = R(Q^{FB}) - R(Q^{n,*}) + w^*_{\hat{t}} q^*_{\hat{t}} + \cdots + w_n^* q_n^* - w^*_{\hat{t}} \hat{q}_{\hat{t}} .
\]
Since the function $R(Q)$ strictly decreases when $Q \ge Q^{FB}$, we have $R(Q^{FB}) - R(Q^{n,*}) > 0$ as $Q^{n,*} > Q^{FB}$. Further, the supplier's wholesale price $w_t^*$ must be non-negative and $q^*_{\hat{t}} > \hat{q}_{\hat{t}}$. Therefore, $w^*_{\hat{t}} q^*_{\hat{t}} + \cdots + w_n^* q_n^* - w^*_{\hat{t}} \hat{q}_{\hat{t}} > 0$. We have found a strictly profitable unilateral deviation for the retailer, which is a contradiction. In conclusion, we must have $Q^{n,*} \le Q^{FB}$.

We prove (b) by contradiction. Suppose the retailer's total profit, $\pi^{n,*}_R$, or the supplier's total profit, $\pi^{n,*}_S$, is not monotonically increasing in $n$. In other words, there exists some $\hat{n} > 1$ such that $\pi^{\hat{n},*}_R < \pi^{\hat{n}-1,*}_R$ and/or $\pi^{\hat{n},*}_S < \pi^{\hat{n}-1,*}_S$.

As we can see, the pre-order inventory, $x_t$, at each offer provides all necessary information to determine the equilibrium strategy. Therefore, in the $\hat{n}$-offer case, if the retailer (mistakenly) orders zero at the first offer, then the remaining game will have the same equilibrium outcome as the game with $\hat{n} - 1$ offers.

Now suppose the retailer's equilibrium total profits satisfy $\pi^{\hat{n},*}_R < \pi^{\hat{n}-1,*}_R$.
Under the $\hat{n}$-offer game, the retailer is able to unilaterally deviate to a strategy where he orders 0 in the first period but returns to his equilibrium strategy in later periods. By doing so, the retailer will get total profit $\pi^{\hat{n}-1,*}_R$. So there exists a strictly profitable unilateral deviation for the retailer, which is a contradiction. Therefore, we must have $\pi^{\hat{n},*}_R \ge \pi^{\hat{n}-1,*}_R$.

Similarly, suppose the supplier's equilibrium total profit satisfies $\pi^{\hat{n},*}_S < \pi^{\hat{n}-1,*}_S$. Under the $\hat{n}$-offer game, the supplier can unilaterally deviate to a strategy where she uses the equilibrium strategy in periods $t \ge 2$ but in the first period she sets a wholesale price so high that the retailer's best response is to order 0. In particular, if the supplier proposes a wholesale price of $\max R(Q)$, the retailer will definitely order nothing. Otherwise the retailer will surely get negative profit if he orders, since his profit is not greater than $R(Q) - w_1 q_1 < R(Q) - \max R(Q)\, q_1 < 0$. As a result, by applying this new strategy, the supplier can achieve a total profit $\pi^{\hat{n}-1,*}_S$. The supplier has a strictly profitable unilateral deviation, which is a contradiction. Therefore, we must have $\pi^{\hat{n},*}_S \ge \pi^{\hat{n}-1,*}_S$.

In conclusion, we proved that $\pi^{n,*}_R \ge \pi^{n-1,*}_R$ and $\pi^{n,*}_S \ge \pi^{n-1,*}_S$ for all $n \ge 2$. Furthermore, the supply chain profit $\pi^{n,*}_T = R(Q^{n,*})$ strictly increases in $Q^{n,*}$ as long as $Q^{n,*} \le Q^{FB}$. Therefore, Theorem 3.2.2(a), which gives $Q^{n-1,*} < Q^{n,*}$, implies $\pi^{n,*}_T \ne \pi^{n-1,*}_T$. We conclude that $\pi^{n,*}_T > \pi^{n-1,*}_T$, i.e., the supply chain total profit is strictly increasing.

Remark: An implicit assumption made here is that $\max R(Q)$ is finite. However, we merely need the existence of a price $w$, high enough, for which the retailer will not order. Taking $\max R(Q)$ is only an example.

Proof of Theorem 3.2.3

By Theorem 3.2.2, the equilibrium total inventory level $Q^{n,*}$ strictly increases in $n$. Moreover, Theorem 3.2.2 guarantees that $Q^{n,*}$ is bounded from above by $Q^{FB}$.
Therefore, the limit of $Q^{n,*}$ exists as $n$ goes to infinity. We denote it as $Q^*$, i.e., $\lim_{n\to+\infty} Q^{n,*} = Q^*$. By a proof similar to that of Theorem 3.2.2, we can show that, on the equilibrium path, $x_n^*$ strictly increases in $n$ and is bounded from above by $Q^{FB}$, so its limit also exists, denoted as $\lim_{n\to+\infty} x_n^* = x^*$. Consequently, $\lim_{n\to+\infty} q_n^*(x_n^*) = \lim_{n\to+\infty} (Q^{n,*} - x_n^*) = Q^* - x^*$, i.e., the limit of $q_n^*(x_n^*)$ also exists. Finally, since the last-period wholesale price is $w_n^* = R'(Q^{n,*})$, it strictly decreases in $n$ and is bounded from below by 0. So the limit of $w_n^*(x_n^*)$ exists and is denoted as $\lim_{n\to+\infty} w_n^*(x_n^*) = w^*$.

We now prove the theorem by contradiction. Suppose that $Q^* < Q^{FB}$. As $n$ goes to infinity, the optimality condition (3.1) for the last period becomes $w^* = R'(Q^*) > 0$. Because the equilibrium supply chain profit $\pi^{n,*}_T$ is strictly increasing in $n$ and bounded from above by the first-best supply chain profit $\pi^{FB}$, $\pi^{n,*}_T$ will converge to a constant when $n$ goes to infinity, which also implies that
\[
\lim_{n\to+\infty} \frac{d\pi^n_T}{dx_n} = 0 .
\]
However, since $\pi^n_T = R(x_n + q_n)$ and $w_n = R'(x_n + q_n)$, we have
\[
\frac{d\pi^n_T(x_n)}{dx_n} = R'(x_n + q_n^*)\left[1 + \frac{dq_n^*(x_n)}{dx_n}\right] = w_n^*(x_n)\left[1 + \frac{dq_n^*(x_n)}{dx_n}\right] .
\]
Therefore
\[
\lim_{n\to+\infty} \frac{d\pi^n_T(x_n)}{dx_n} = \lim_{n\to+\infty} w_n^*(x_n)\left[1 + \frac{dq_n^*(x_n)}{dx_n}\right] = w^* \lim_{n\to+\infty}\left[1 + \frac{dq_n^*(x_n)}{dx_n}\right] = w^* \lim_{n\to+\infty} \frac{R''(x_n + q_n^*(x_n))}{2R''(x_n + q_n^*(x_n)) + R^{(3)}(x_n + q_n^*(x_n))\, q_n^*(x_n)} .
\]
The last equality follows from (B.1). Specifically, by Proposition 3.2.1, $q_n^*(x_n)$ is the solution of $R'(Q_n) + \frac{\partial w_n^*}{\partial q_n} q_n = R'(Q_n) + R''(Q_n) q_n = 0$.
We apply the Implicit Function Theorem,R′′(Qn)(1 +dq∗n(xn)dxn) +R(3)(Qn)(1 +dq∗n(xn)dxn)q∗n(xn) +R′′(Qn)dq∗n(xn)dxn= 0 ,which leads to1 +dq∗n(xn)dxn=R′′(xn + q∗n(xn))2R′′(xn + q∗n(xn)) +R(3)(xn + q∗n(xn))q∗n(xn).As we assume w∗ 6= 0, in order to achieve limn→+∞ dpinTdxn= 0, we should havelimn→+∞R′′(xn + qn(xn))2R′′(xn + qn(xn)) +R(3)(xn + qn(xn))q(xn)=R′′(x∗ + q∗)2R′′(x∗ + q∗) +R(3)(x∗ + q∗)q∗= 0 .However, if the equality holds, R′′(x∗+q∗) = 0, the optimality condition (3.2) for the last periodbecomes R′(x∗ + q∗) + R′′(x∗ + q∗)q∗ = R′(x∗ + q∗) = 0, implying that x∗ + q∗ = QFB. ButR′′(QFB) < 0 and as a result, limn→+∞dpinTdxn6= 0, which leads to a contradiction. In conclusion,we must have limn→+∞Qn,∗ = QFB.Proof of Proposition 3.2.4By (3.1), we havew∗n−k = (k + 1)R′(Qn)k∏j=1(1 +∂q∗n−k+j∂xn−k+j)−k∑j=1w∗n−k+j∂q∗n−k+j∂xn−k+jj−1∏m=1(1 +∂q∗n−k+m∂xn−k+m)w∗n−k+1 = kR′(Qn)k−1∏j=1(1 +∂q∗n−k+1+j∂xn−k+1+j)−k−1∑j=1w∗n−k+1+j∂q∗n−k+1+j∂xn−k+1+jj−1∏m=1(1 +∂q∗n−k+1+m∂xn−k+1+m) .104Appendix B. Proofs of Results in Chapter 3Therefore,w∗n−k − w∗n−k+1= (1 +∂q∗n−k+1∂xn−k+1)[(k + 1)R′(Qn)k∏j=2(1 +∂q∗n−k+j∂xn−k+j)− w∗n−k+1 −k∑j=2w∗n−k+j∂q∗n−k+j∂xn−k+jj−1∏m=1(1 +∂q∗n−k+m∂xn−k+m)]= (1 +∂q∗n−k+1∂xn−k+1)[(k + 1)R′(Qn)k∏j=2(1 +∂q∗n−k+j∂xn−k+j)− kR′(Qn)k∏j=2(1 +∂q∗n−k+j∂xn−k+j)]= R′(Qn)k∏j=1(1 +∂q∗n−k+j∂xn−k+j) ≥ 0 , (B.5)where the inequality holds by Lemma B.0.3.Proof of Proposition 3.2.5Since (B.5) provides a recursive equation for the equilibrium wholesale price, we havew∗n−k = R′(Qn)k∏j=1(1 +∂q∗n−k+j∂xn−k+j) +R′(Qn)k−1∏j=1(1 +∂q∗n−k+1+j∂xn−k+1+j) + · · ·+R′(Qn)(1 + ∂q∗n∂xn) +R′(Qn)= R′(Qn){1 +n∑m=1k∏j=m(1 +∂q∗n−k+j∂xn−k+j)}= R′(Qn)αn−k .According to Proposition 3.2.1, we havew∗n−kw∗n−k+1=αn−kαn−k+1q∗n−k+1q∗n−k=−R′(Qn)∏k−1j=1(1 + ∂q∗n−k+1+j∂xn−k+1+j )/∂w∗n−k+1∂qn−k+1−R′(Qn)∏kj=1(1 + ∂q∗n−k+j∂xn−k+j )/∂w∗n−k∂qn−k =∂w∗n−k∂qn−k(1 +q∗n−k+1xn−k+1 )∂w∗n−k+1∂qn−k+1.105Appendix B. 
Proofs of Results in Chapter 3As w∗n−k =αn−kαn−k+1w∗n−k+1, we further obtainq∗n−k+1q∗n−k=∂αn−kαn−k+1w∗n−k+1∂qn−k(1 +q∗n−k+1xn−k+1 )∂w∗n−k+1∂qn−k+1=αn−kαn−k+1∂w∗n−k+1∂qn−k(1 +q∗n−k+1xn−k+1 )∂w∗n−k+1∂qn−k+1+∂(αn−kαn−k+1 )∂qn−kw∗n−k+1(1 +q∗n−k+1xn−k+1 )∂w∗n−k+1∂qn−k+1=αn−kαn−k+1+∂(αn−kαn−k+1 )∂qn−kw∗n−k+1(1 +q∗n−k+1xn−k+1 )∂w∗n−k+1∂qn−k+1=w∗n−kw∗n−k+1+∂(αn−kαn−k+1 )∂qn−kw∗n−k+1(1 +q∗n−k+1xn−k+1 )∂w∗n−k+1∂qn−k+1.As a result, w∗n−k+1q∗n−k+1 − w∗n−kq∗n−k, or, equivalently, whetherq∗n−k+1q∗n−k− w∗n−kw∗n−k+1is positiveor negative completely depends on the term∂(αn−kαn−k+1)∂qn−kw∗n−k+1(1+q∗n−k+1xn−k+1)∂w∗n−k+1∂qn−k+1. Since w∗n−k+1 ≥ 0,(1 +q∗n−k+1xn−k+1 ) > 0 and∂w∗n−k+1∂qn−k+1 < 0, we conclude thatq∗n−k+1q∗n−k− w∗n−kw∗n−k+1(>,=, <)0 if and only if∂(αn−kαn−k+1)∂qn−k (<,=, >)0.Proof of Theorem 3.2.6(1) Exponential demand: We will show that at each offer n − k, the equilibrium strategy isas follows:w∗n−k(xn−k, qn−k) = (k + 1)pe−λQn (B.6)q∗n−k(xn−k) =1λ(k + 1). (B.7)Under exponential demand, the revenue function is R(Q) = p∫ Q0 e−λξdξ, so R′(Q) =pe−λQ and R′′(Q) = pλe−λQ.According to Proposition 3.2.1, at the last offer, w∗n(xn, qn) = R′(Qn) = pe−λQnand q∗n isthe solution of pe−λQn − pλe−λQnqn = 0. Therefore, q∗n(xn) = 1/λ. Suppose these resultshold for offers n − k + j, with j = 1, . . . , k, and consider offer n − k. By the inductiveassumption, q∗n−k+j is a constant for all j ≥ 1, hence∂q∗n−k+j∂xn−k+j = 0. Proposition 3.2.1 leadstow∗n−k(xn−k, qn−k) = (k + 1)pe−λQnq∗n−k solves pe−λQn − p(k + 1)λe−λQnqn−k = 0 ,106Appendix B. Proofs of Results in Chapter 3hence q∗n−k =1λ(k+1) and we have proven (B.6) and (B.7).Finally, on the equilibrium path, q∗j =1λ(n−j+1) , so Qn,∗ =∑nl=1 1/l and w∗j = (n − j +1)pe−λ∑nl=1 1/l.(2) Uniform demand: Instead of directly using Proposition 3.2.1, we prove the result byinduction. 
We will show that at each offer j, given any initial inventory xj , the equilibriumstrategy satisfies the following equation:q∗j (xj) =M2(n− j + 1) −xj2(n− j + 1) (B.8)w∗j (xj , q∗j (xj)) = βj [1−xjM]p . (B.9)Under uniform demand, the revenue function is R(Q) = p(Q − Q2/2M). So R′(Q) =p(1 − Q/M) and R′′(Q) = −p/M . According to Proposition 3.2.1, at the last offer,w∗n(xn, qn) = R′(Qn) = p(1−Qn/M) and q∗n is the solution of p(1−Qn/M)−pqn/M = 0.Hence, q∗n =M2 − xn2 . For simplicity we denote w∗n(xn, q∗n(xn)) as w∗n, and note thatw∗n =12 [1− xnM ]p.Suppose the result holds for offer j ≥ k + 1, and consider the offer k. For j ≥ k + 1, q∗jand w∗j satisfy the following recursive equations:q∗j+1 =M2(n− j) −xj+12(n− j) =M2(n− j) −xj + q∗j2(n− j) =M2(n− j) −xj +M2(n−j+1) −xj2(n−j+1)2(n− j)=2(n− j) + 12(n− j) [M2(n− j + 1) −xj2(n− j + 1)] =2(n− j) + 12(n− j) q∗j ,w∗j+1 = βj+1[1−xj+1M]p = βj+1[1−xj + q∗jM]p = βj+1[1−xj +M2(n−j+1) −xj2(n−j+1)M]p=2(n− j) + 12(n− j + 1)βj+1[1−xjM]p =2(n− j)2(n− j) + 1βj [1−xjM]p =2(n− j)2(n− j) + 1w∗j .Moreover, the total inventory level Qn can be expressed asQn = xn + q∗n =12(M + xn) =12(M + xn−1 + q∗n−1) =12(M + xn−1 +M − xn−14)= M [12+12 ∗ 4] +12∗ 34xn−1 = M [12+12 ∗ 4] +12∗ 34(xn−2 + q∗n−2)= . . .= Mn∑i=k+112(n− i+ 1)n∏l=i+12(n− l) + 12(n− l + 1) +n∏l=k+12(n− l) + 12(n− l + 1)xk+1= Ak+1 ∗M +Bk+1 ∗ (xk + qk) ,where Ak+1 =∑ni=k+112(n−i+1)∏nl=i+12(n−l)+12(n−l+1) and Bk+1 =∏nl=k+12(n−l)+12(n−l+1) .107Appendix B. Proofs of Results in Chapter 3We next compute the equilibrium for period k. The retailer’s total profit is equal topinR = p(Qn − (Qn)2/2M)− w1q1 − · · · − wkqk − w∗k+1q∗k+1 − · · · − w∗nq∗n= p[Ak+1 ∗M +Bk+1 ∗ (xk + qk)− (Ak+1 ∗M +Bk+1 ∗ (xk + qk))22M]−w1q1 − · · · − wkqk − (n− k)w∗k+1q∗k+1= p(Ak+1 ∗M +Bk+1 ∗ (xk + qk)− (Ak+1 ∗M +Bk+1 ∗ (xk + qk))22M)−w1q1 − · · · − wkqk − βk+1p(M − (xk + qk))22M,where the third equality follows the observation that w∗k+1q∗k+1 = w∗k+2q∗k+2 = · · · =w∗nq∗n = βk+1p(M−(xk+qk))22(n−k)M . 
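The observation that the payment $w_j^* q_j^*$ is the same at every offer can be checked numerically by following (B.8) and (B.9) forward along the equilibrium path. The sketch below is illustrative only: it assumes the relation $\beta_j = 2(n-j+1)B_j^2$ (the proof's $\beta_{k+1} = 2(n-k)B_{k+1}^2$ with $j = k+1$), a starting inventory $x_1 = 0$, and hypothetical values $M = 100$, $p = 10$; the function name is not from the thesis.

```python
from math import prod


def uniform_equilibrium_path(n: int, M: float, p: float, x1: float = 0.0):
    """Follow (B.8)-(B.9) from pre-order inventory x1 at the first offer.

    Returns the per-offer order quantities, wholesale prices and the
    total inventory Q_n after the last offer.
    """
    def B(j: int) -> float:
        # B_j = prod_{l=j}^{n} (2(n-l)+1) / (2(n-l+1))
        return prod((2 * (n - l) + 1) / (2 * (n - l + 1)) for l in range(j, n + 1))

    x = x1
    quantities, prices = [], []
    for j in range(1, n + 1):
        beta_j = 2 * (n - j + 1) * B(j) ** 2   # assumed relation between beta_j and B_j
        q_j = (M - x) / (2 * (n - j + 1))      # order quantity, (B.8)
        w_j = beta_j * (1 - x / M) * p         # wholesale price, (B.9)
        quantities.append(q_j)
        prices.append(w_j)
        x += q_j                               # next pre-order inventory
    return quantities, prices, x


q_path, w_path, total = uniform_equilibrium_path(4, M=100.0, p=10.0)
payments = [q * w for q, w in zip(q_path, w_path)]
totals = [uniform_equilibrium_path(n, M=100.0, p=10.0)[2] for n in range(1, 5)]

print([round(v, 4) for v in payments])
print(totals)
```

Under these assumptions the payments coincide across offers, and the total inventory grows with the number of offers while staying below $M$, in line with Theorems 3.2.2 and 3.2.3.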
Hence, the first-order condition is∂pinR∂qk= p(1− Ak+1 ∗M +Bk+1 ∗ (xk + qk)M)Bk+1 − wk + βk+1pM − (xk + qk)M= 0 .Therefore, the supplier’s inverse demand function isw∗k(xk, qk) = p(1−Ak+1 ∗M +Bk+1 ∗ (xk + qk)M)Bk+1 + βk+1pM − (xk + qk)M.The supplier needs to determine an order quantity qk to maximize her total profit, givenbypinS = w1q1 + · · ·+ w∗kqk + w∗k+1q∗k+1 + · · ·+ w∗nq∗n= w1q1 + · · ·+ wk−1qk−1 + [p(1− Ak+1 ∗M +Bk+1 ∗ (xk + qk)M)Bk+1 + βk+1pM − (xk + qk)M]qk+βk+1p(M − (xk + qk))22M.The first-order condition is∂pikS∂qk= p(1− Ak+1 ∗M +Bk+1 ∗ (xk + qk)M)Bk+1 − [pB2k+1M+rβk+1M]qk = 0 .As a result,q∗k(xk) =M(1−Ak+1)Bk+12B2k+1 + βk+1− B2k+1xk2B2k+1 + βk+1.However, we have βk+1 = 2(n− k)B2k+1 and Ak+1 +Bk+1 = 1, which finally leads toq∗k(xk) =M2(n− k + 1) −xk2(n− k + 1) .108Appendix B. Proofs of Results in Chapter 3The corresponding wholesale price isw∗k = w∗k(xk, q∗k(xk)) = p(1−Ak+1 ∗M +Bk+1 ∗ (xk + q∗k)M)Bk+1 + βk+1pM − (xk + q∗k)M= [pB2k+1M+pβk+1M]q∗k + βk+1pM − (xk + q∗k)M=pB2k+1M(M2(n− k + 1) −xk2(n− k + 1)) + βk+1pM − xkM= p(B2k+12(n− k + 1) + βk+1)[1−xkM]= p(2(n− k + 1))22(n− k + 1)2(n− k)βk+1[1−xkM]= pβk[1− xkM] ,and the proof is complete.(3) Linear demand: Under linear (price-sensitive) demand, the revenue function is R(Q) =(a− bQ)Q. Note that if we re-write a = p and b = p/2M , the revenue function becomesR(Q) = r(Q−Q2/2M), the same as in the uniform demand case. Therefore, by parametertransformation, the problem under linear demand is equivalent to the one under uniformdemand.109Appendix CProofs of Results in Chapter 4Proof of Lemma 4.4.1The proof is almost the same as Zhang et al. [57] with little modification.(1) ⇒ direction: Suppose {q1(x1), s1(x1)} satisfies the global IC constraint. By the enveloptheorem, as analyzed in the main text, we can show q1(x1) satisfies the local IC constraint.Because ∂2v1(x1+q1)∂x1∂q1= v′′1(x1 + q1) =−λre−λ(x1+q1) x1 + q1 > 00 x1 + q1 < 0. 
The function $v_1(x_1 + q_1) - s_1(x_1)$ therefore has decreasing differences, and thereby $q_1(x_1)$ must be weakly decreasing in $x_1$ (see Topkis [55]).

$\Leftarrow$ direction: Suppose $q_1(x_1)$ satisfies the local IC constraint and is weakly decreasing in $x_1$. Without loss of generality, we only consider the case $x_1 > \hat{x}_1$. The case $x_1 < \hat{x}_1$ is similar.
\begin{align*}
u_1(x_1) - [v_1(x_1 + q_1(\hat{x}_1)) - s_1(\hat{x}_1)]
&= \Big[u_1(\hat{x}_1) + \int_{\hat{x}_1}^{x_1} v_1'(\xi + q_1(\xi))\,d\xi\Big] - \big[u_1(\hat{x}_1) + v_1(x_1 + q_1(\hat{x}_1)) - v_1(\hat{x}_1 + q_1(\hat{x}_1))\big] \\
&= \Big[u_1(\hat{x}_1) + \int_{\hat{x}_1}^{x_1} v_1'(\xi + q_1(\xi))\,d\xi\Big] - \Big[u_1(\hat{x}_1) + \int_{\hat{x}_1}^{x_1} v_1'(\xi + q_1(\hat{x}_1))\,d\xi\Big] \\
&= \int_{\hat{x}_1}^{x_1} \big[v_1'(\xi + q_1(\xi)) - v_1'(\xi + q_1(\hat{x}_1))\big]\,d\xi \ge 0.
\end{align*}
The last inequality holds because $q_1(\xi) \le q_1(\hat{x}_1)$ and $v_1'$ is a decreasing function. So the global IC constraint holds.

(2) The IC constraint says $v_1(x_1 + q_1(x_1)) - s_1(x_1) - v_1(x_1) \ge v_1(x_1 + q_1(y_0)) - s_1(y_0) - v_1(x_1)$. In addition, since $v_1'' \le 0$, i.e. $v_1$ has decreasing differences, we have $v_1(x_1 + q_1(y_0)) - v_1(x_1) \ge v_1(y_0 + q_1(y_0)) - v_1(y_0)$. Combining the two inequalities, we get $v_1(x_1 + q_1(x_1)) - s_1(x_1) - v_1(x_1) \ge v_1(y_0 + q_1(y_0)) - s_1(y_0) - v_1(y_0) \ge 0$ for all $x_1 < y_0$. The supplier wants to maximize her expected profit, therefore she should let $v_1(y_0 + q_1(y_0)) - s_1(y_0) - v_1(y_0) = 0$ at the optimum. In conclusion, the IR constraint must be binding at $y_0$ and redundant at $x_1 < y_0$.

Proof of Theorem 4.4.2

Because the IR constraint must be binding at $y_0$, we replace $s_1(x_1)$ with $v_1(x_1 + q_1(x_1)) - u_1(x_1) = v_1(x_1 + q_1(x_1)) - v_1(y_0) + \int_{x_1}^{y_0} v_1'(z + q_1(z))\,dz$ and rewrite the supplier's objective function as
\[
\int_{-\infty}^{y_0} [s_1(x_1) - cq_1(x_1)]\,dG_1(x_1) = \int_{-\infty}^{y_0} J_1(q_1(x_1)|x_1)\,g_1(x_1)\,dx_1 - u_1(y_0).
\]
The detailed analysis is in the main text. Here
\[
J_1(q_1|x_1) = v_1(x_1 + q_1) - cq_1 + v_1'(x_1 + q_1)\,\frac{G_1(x_1)}{g_1(x_1)}
\]
is the so-called virtual surplus. The optimal quantity plan $q_1^*(x_1)$ maximizes $J_1(q_1|x_1)$.
We look at the derivative of $J_1(q_1|x_1)$ with respect to $q_1$:
\[
\frac{\partial J_1(q_1|x_1)}{\partial q_1} = v_1'(x_1 + q_1) - c + v_1''(x_1 + q_1)\,\frac{G_1(x_1)}{g_1(x_1)} =
\begin{cases}
r e^{-\lambda(x_1 + q_1)} - c - \lambda r e^{-\lambda(x_1 + q_1)}\,\dfrac{G_1(x_1)}{g_1(x_1)}, & x_1 + q_1 > 0, \\[1ex]
r - c, & x_1 + q_1 < 0.
\end{cases}
\]
Since we assume $x_1 = y_0 - D_0$ for some constant $y_0$, we have CDF $G_1(x_1) = e^{-\lambda(y_0 - x_1)}$ and PDF $g_1(x_1) = \lambda e^{-\lambda(y_0 - x_1)}$. Therefore, $G_1(x_1)/g_1(x_1) = \lambda^{-1}$. When $x_1 + q_1 > 0$, $\partial J_1(q_1|x_1)/\partial q_1 = -c < 0$. But when $x_1 + q_1 < 0$, $\partial J_1(q_1|x_1)/\partial q_1 = r - c > 0$. This indicates that $J_1(q_1|x_1)$ increases when $x_1 + q_1 < 0$ but decreases when $x_1 + q_1 > 0$. As a result, the optimal quantity should be $q_1^*(x_1) = \max\{0, -x_1\}$, or equivalently $y_1^*(x_1) = \max\{0, x_1\}$. The corresponding payment is given by
\begin{align*}
s_1^*(x_1) &= v_1(x_1 + q_1^*(x_1)) - u_1^*(x_1) \\
&= v_1(x_1 + q_1^*(x_1)) - v_1(y_0) + \int_{x_1}^{y_0} v_1'(\xi + q_1^*(\xi))\,d\xi \\
&= \begin{cases}
v_1(0) - v_1(y_0) + \int_{x_1}^{0} v_1'(0)\,d\xi + \int_{0}^{y_0} v_1'(\xi)\,d\xi, & x_1 < 0, \\
v_1(x_1) - v_1(y_0) + \int_{x_1}^{y_0} v_1'(\xi)\,d\xi, & x_1 \ge 0
\end{cases} \\
&= \begin{cases}
-r x_1, & x_1 < 0, \\
0, & x_1 \ge 0.
\end{cases}
\end{align*}

Proof of Proposition 4.5.2

We prove the result by contradiction. Suppose that under the optimal contract there exists a “bump” at $[x_1^-, x_1^+]$ in the retailer's profit-to-go function. We focus on the interval $[x_1^-, x_1^+]$ and solve a subproblem $P[x_1^-, x_1^+]$, where we keep the IC constraint unchanged, let the IR constraint bind at the two endpoints $x_1^-$ and $x_1^+$, and ignore the IR constraint at points in between:
\[
(P[x_1^-, x_1^+]) \qquad \max_{s_1(x_1),\,y_1(x_1)} \int_{x_1^-}^{x_1^+} \{s_1(x_1) - c\,y_1(x_1) + c\,x_1 + \delta\Pi_2(y_1(x_1))\}\,dG_1(x_1)
\]
subject to
\[
\mu_1(x_1 + q_1(x_1)) - s_1(x_1) \ge \mu_1(x_1 + q_1(\hat{x}_1)) - s_1(\hat{x}_1), \qquad x_1, \hat{x}_1 \in [x_1^-, x_1^+],
\]
\[
\mu_1(y_1(x_1^-)) - s_1(x_1^-) = u_1(x_1^-) \quad\text{and}\quad \mu_1(y_1(x_1^+)) - s_1(x_1^+) = u_1(x_1^+).
\]
If the hypothesis holds, the optimal solution of $P[x_1^-, x_1^+]$ will automatically satisfy the IR constraint at $x_1 \in (x_1^-, x_1^+)$.

Next, we replace the global IC constraints by the local IC constraints $u_1'(x_1) = \mu_1'(y_1(x_1))$ and the monotonicity condition $y_1'(x_1) \le 1$. Then we rewrite $u_1(x_1) = u_1(x_1^-) + \int_{x_1^-}^{x_1} \mu_1'(y_1(\xi))\,d\xi$ and replace $s_1(x_1)$ with $\mu_1(x_1 + q_1(x_1)) - u_1(x_1)$ in the objective function.
By doing so, we obtain the virtual surplus anchored at the bottom endpoint $x_1^-$:
\[
J_1(y_1|x_1) = c\,x_1 - c\,y_1 + \mu_1(y_1) + \delta\Pi_2(y_1) - \mu_1'(y_1)\,\frac{G_1(x_1^+) - G_1(x_1)}{g_1(x_1)}. \tag{C.1}
\]
As a result, the subproblem $P[x_1^-, x_1^+]$ can be reformulated as follows:
\[
(P[x_1^-, x_1^+]) \qquad \max_{y_1(x_1)} \int_{x_1^-}^{x_1^+} J(y_1(x_1)|x_1)\,g_1(x_1)\,dx_1
\]
subject to
\begin{align*}
& u_1'(x_1) = \mu_1'(y_1(x_1)), && x_1 \in [x_1^-, x_1^+], \\
& u_1(x_1^-) = u_1(x_1^-) \ \text{and}\ u_1(x_1^+) = u_1(x_1^+), \\
& y_1(x_1) \ge x_1 \ \text{and}\ y_1'(x_1) \le 1, && x_1 \in [x_1^-, x_1^+].
\end{align*}
Following the standard notation in the optimal control literature, we define $x_1 \to t$ as the time; $u_1(x_1) \to x(t)$ as the state variable; $y_1(x_1) \to u(t)$ as the control variable; $\mu_1'(y_1) \to g(t, x, u)$ as the state transition function; $J(y_1(x_1)|x_1)\,g_1(x_1) \to f(t, x, u)$ as the objective function; $x_1^- \to t_0$ and $x_1^+ \to t_f$ as the initial and final time; and $u_1(x_1^-) \to x_0$ and $u_1(x_1^+) \to x_f$ as the initial and final state. Therefore, the subproblem $P[x_1^-, x_1^+]$ is translated into an optimal control problem with two fixed endpoints and two constraints on the control variable, $y_1'(x_1) \le 1$ and $y_1(x_1) \ge x_1$. The corresponding Hamiltonian is $H(y_1|x_1, \eta) = J(y_1|x_1)\,g_1(x_1) + \eta(x_1)\,\mu_1'(y_1)$ and the Lagrangian is $L(y_1|x_1, \eta, \rho) = H(y_1|x_1, \eta) + \rho_1(y_1 - x_1) + \rho_2(y_1'(x_1) - 1)$.

The maximum principle requires that the optimal control $y_1^*(x_1)$ and the optimal state variable $u_1^*(x_1)$ satisfy the following conditions:

(1) Feasibility: $\dot{u}_1^*(x_1) = \mu_1'(y_1^*(x_1))$, $u_1^*(x_1^-) = u_1(x_1^-)$ and $u_1^*(x_1^+) = u_1(x_1^+)$; $y_1^*(x_1) \ge x_1$ and $\dot{y}_1^*(x_1) \le 1$.

(2) Adjoint equation for $\eta$: $\eta$ should satisfy $\eta'(x_1) = \frac{\partial L}{\partial u_1} = 0$ and $\eta(x_1^+) = \beta$ for some constant $\beta$. Clearly, $\eta$ is a constant, i.e. $\eta(x_1) = \beta$ for all $x_1 \in [x_1^-, x_1^+]$.

(3) Condition for the Lagrange multipliers: $\rho$ is such that $\frac{\partial L}{\partial y_1}\big|_{y_1 = y_1^*} = 0$; $\rho \ge 0$ and satisfies the complementary slackness conditions $\rho_1(y_1^*(x_1) - x_1) = 0$ and $\rho_2(\dot{y}_1^*(x_1) - 1) = 0$.

(4) Hamiltonian maximization condition: $y_1^*(x_1)$ maximizes the Hamiltonian, i.e. $H(y_1^*(x_1)|x_1, \eta) \ge H(y_1|x_1, \eta)$ for all $y_1$.

The analysis proceeds as follows.
We will first characterize the optimal control $y_1^*(x_1)$. Then we will show that under the optimal control, the IR constraint cannot be binding at the two endpoints simultaneously. This leads to a contradiction, and thereby we conclude that there is no “bump”.

We start by characterizing the optimal control $y_1^*(x_1)$. First of all, we look at the first-order derivative of the Hamiltonian:
\[
\frac{\partial H}{\partial y_1} = \frac{\partial J_1(y_1|x_1)}{\partial y_1}\,g_1(x_1) + \beta\mu_1''(y_1) =
\begin{cases}
g_1(x_1)\{b + \delta c - c\}, & y_1 < 0, \\
g_1(x_1)\{-h - c + \delta c e^{-\lambda y_1} + (b + h + \delta r\lambda y_1)e^{-\lambda y_1}e^{\lambda(y_0 - x_1)}(G_1(x_1^+) - \beta)\}, & y_1 > 0.
\end{cases} \tag{C.2}
\]
Clearly, when $y_1 < 0$, we have $\frac{\partial H}{\partial y_1} > 0$ because $b > c(1 - \delta)$. The function $H$ increases in $y_1$ when $y_1$ is negative. However, when $y_1 > 0$, the sign of $\frac{\partial H}{\partial y_1}$ is determined by the function
\[
\varphi(y_1|x_1, \beta) = -h - c + \delta c e^{-\lambda y_1} + (b + h + \delta r\lambda y_1)e^{-\lambda y_1}e^{\lambda(y_0 - x_1)}(G_1(x_1^+) - \beta). \tag{C.3}
\]
In other words, the first-order condition $\frac{\partial H}{\partial y_1} = 0$ is equivalent to $\varphi(y_1|x_1, \beta) = 0$.

One observation is that $\beta$ must be smaller than $G_1(x_1^+)$ under the optimal control. If not, i.e. $\beta \ge G_1(x_1^+)$, we have $\varphi(y_1|x_1, \beta) \le -h - c + \delta c e^{-\lambda y_1} \le -h - c + \delta c < 0$. Therefore, the function $H$ decreases in $y_1$ when $y_1$ is positive. Correspondingly, the optimal control will be $y_1^*(x_1) = \max\{0, x_1\}$. Notice that $y_1^*(x_1) < y_1^R(x_1)$ for all $x_1$. We obtain:
\begin{align*}
u_1^*(x_1^+) - u_1(x_1^+) &= \Big[u_1^*(x_1^-) + \int_{x_1^-}^{x_1^+} \mu_1'(y_1^*(z))\,dz\Big] - \Big[u_1(x_1^-) + \int_{x_1^-}^{x_1^+} \mu_1'(y_1^R(z))\,dz\Big] \\
&= \int_{x_1^-}^{x_1^+} \big(\mu_1'(y_1^*(z)) - \mu_1'(y_1^R(z))\big)\,dz < 0,
\end{align*}
where the equality holds because $u_1^*(x_1^-) = u_1(x_1^-)$ and the inequality holds because $y_1^*(x_1) < y_1^R(x_1)$. Hence, the constraint $u_1^*(x_1^+) = u_1(x_1^+)$ cannot be satisfied, which is a contradiction. From now on, we only consider the case $\beta < G_1(x_1^+)$.

Another observation is that there does not exist any constant $\beta$ such that $y_1^*(x_1) = y_1^R(x_1)$ for all $x_1 \in [x_1^-, x_1^+]$. This can be easily checked by comparing the definition of $y_1^R$ and the first-order condition $\frac{\partial H}{\partial y_1} = 0$.

Now we would like to show that the optimal control $y_1^*(x_1)$ is a decreasing function of $x_1$. We examine the structure of $\varphi(y_1|x_1, \beta)$.
First of all, $\varphi$, as a function of $y_1$, decreases in $x_1$, i.e. $\varphi(y_1|x_1, \beta) > \varphi(y_1|\tilde{x}_1, \beta)$ for all $y_1$ if $x_1 < \tilde{x}_1$. This is because
\[
\frac{\partial \varphi(y_1|x_1, \beta)}{\partial x_1} = -\lambda(b + h + \delta r\lambda y_1)e^{-\lambda y_1}e^{\lambda(y_0 - x_1)}(G_1(x_1^+) - \beta) < 0, \tag{C.4}
\]
where the inequality holds because $\beta < G_1(x_1^+)$.

Secondly, we have
\[
\frac{\partial \varphi(y_1|x_1, \beta)}{\partial y_1} = \lambda e^{-\lambda y_1}\big\{-\delta c + [\delta r - (b + h + \delta r\lambda y_1)]e^{\lambda(y_0 - x_1)}(G_1(x_1^+) - \beta)\big\}. \tag{C.5}
\]
Given $x_1$ and $\beta$, there are two possible scenarios: either (i) $\frac{\partial \varphi(y_1|x_1, \beta)}{\partial y_1}$ is always negative, or (ii) $\frac{\partial \varphi(y_1|x_1, \beta)}{\partial y_1}$ is first positive and then becomes negative. This implies that $\varphi(y_1|x_1, \beta) = 0$ has at most two solutions. More precisely, if $\varphi(y_1|x_1, \beta) = 0$ has no solution, the optimal control is $y_1^*(x_1) = \max\{0, x_1\}$; if $\varphi(y_1|x_1, \beta) = 0$ has a unique solution, the optimal control $y_1^*(x_1)$ is the maximum of this solution and $x_1$; and if $\varphi(y_1|x_1, \beta) = 0$ has two solutions, we pick the larger one. When $x_1$ is negative, we compare it with 0 and choose the one that leads to a larger $H$ as the optimal control. When $x_1$ is positive, $y_1^*(x_1)$ is the maximum of the solution and $x_1$. See Figure C.1 for the different scenarios. However, no matter how many zeros $\varphi(y_1|x_1, \beta)$ has, we argue that as long as $y_1^*(x_1)$ is determined by the first-order condition, i.e. $\varphi(y_1|x_1, \beta) = 0$, $y_1^*(x_1)$ must be decreasing in $x_1$. This is illustrated by Figure C.2.

Now we want to show that under the optimal control $y_1^*(x_1)$, the IR constraint will be violated at some $x_1 \in (x_1^-, x_1^+)$. We need to introduce the following lemma:

Lemma C.0.4. (a) Suppose at some point $0 \le \hat{x}_1 < y_0$, the optimal control is such that $y_1^*(\hat{x}_1) = \hat{x}_1$. Then we must also have $y_1^*(x_1) = x_1$ for $\hat{x}_1 < x_1 \le y_0$.
(b) Suppose at some point $\hat{x}_1 < 0$, the optimal control is such that $y_1^*(\hat{x}_1) = 0$. Then we must also have $y_1^*(x_1) = 0$ for $\hat{x}_1 < x_1 \le 0$.

Proof of Lemma C.0.4:

(a) According to the feasibility condition, we must satisfy $y_1^*(x_1) \ge x_1$ and $\dot{y}_1^*(x_1) \le 1$. However, $\dot{y}_1^*(x_1) \le 1$ implies that the order quantity $q_1^*(x_1)$ is weakly decreasing in $x_1$. As a result, once $y_1^*(\hat{x}_1) = \hat{x}_1$, i.e. $q_1^*(\hat{x}_1) = 0$, we will have $q_1^*(x_1) = 0$ for all $x_1 > \hat{x}_1$.
So $y_1^*(x_1) = x_1$ for all $x_1 > \hat{x}_1$.

(b) We know $y_1^*(\hat{x}_1) = 0$. This can happen only when $\varphi(y_1|x_1, \beta) = 0$ has no solution or two solutions. We need to discuss these two possible cases.

If $\varphi(y_1|x_1, \beta) = 0$ has no solution, it means $\varphi(y_1|\hat{x}_1, \beta) < 0$ for all $y_1 \ge 0$. We have already shown that $\varphi$, as a function of $y_1$, decreases in $x_1$; therefore $\varphi(y_1|x_1, \beta) < \varphi(y_1|\hat{x}_1, \beta) < 0$ for all $x_1 \ge \hat{x}_1$ and $y_1 \ge 0$. So we conclude that the optimal control will be $y_1^*(x_1) = 0$ for $\hat{x}_1 < x_1 \le 0$.

If $\varphi(y_1|x_1, \beta) = 0$ has two solutions, as mentioned earlier, we pick the larger solution, denoted as $y_1^{\text{large-FOC}}(x_1)$, and compare it with 0. Then we select the one which gives us a higher $H$ as the optimal control.

Figure C.1: Illustration of function $\varphi(y_1|x_1, \beta)$ under different $x_1$ and $\beta$.

Figure C.2: Function $\varphi(y_1|x_1, \beta)$ when choosing two different $x_1$ (curves for $x_1 = 0.2$, $\beta = 0.1$ and $x_1 = 0$, $\beta = 0.1$, plotted against $y_1$). Parameters: $r = 10$, $c = 5$, $b = 2$, $h = 3$, $y_0 = 3$, $\beta = 0.1$, $\lambda = 1$ and $\delta = 0.9$.

Figure C.3: Function $\varphi(y_1|x_1, \beta) = 0$ has 2 solutions but the optimal control is $y_1^*(x_1) = 0$.

That is to say, $y_1^*(\hat{x}_1) = 0$ implies that $H(y_1^{\text{large-FOC}}(\hat{x}_1)|\hat{x}_1, \beta) \le H(0|\hat{x}_1, \beta)$. Now we compare $H(y_1^{\text{large-FOC}}(x_1)|x_1, \beta)$ and $H(0|x_1, \beta)$ for $\hat{x}_1 < x_1 \le 0$. Notice that
\[
H(y_1^{\text{large-FOC}}(\hat{x}_1)|\hat{x}_1, \beta) = H(0|\hat{x}_1, \beta) + \int_0^{y_1^{\text{large-FOC}}(\hat{x}_1)} g_1(\hat{x}_1)\,\varphi(z|\hat{x}_1, \beta)\,dz,
\]
and $H(y_1^{\text{large-FOC}}(\hat{x}_1)|\hat{x}_1, \beta) \le H(0|\hat{x}_1, \beta)$ implies $\int_0^{y_1^{\text{large-FOC}}(\hat{x}_1)} \varphi(z|\hat{x}_1, \beta)\,dz \le 0$. Moreover, we have $\varphi(z|\hat{x}_1, \beta) > \varphi(z|x_1, \beta)$ for all $z$ when $\hat{x}_1 < x_1$. Therefore,
\[
\int_0^{y_1^{\text{large-FOC}}(x_1)} \varphi(z|x_1, \beta)\,dz < \int_0^{y_1^{\text{large-FOC}}(x_1)} \varphi(z|\hat{x}_1, \beta)\,dz.
\]
Also, we have claimed that $y_1^{\text{large-FOC}}(\hat{x}_1) > y_1^{\text{large-FOC}}(x_1)$ if both of them are determined by the first-order condition. In addition, the function $\varphi(z|\hat{x}_1, \beta)$ is positive when $z \in (y_1^{\text{large-FOC}}(x_1), y_1^{\text{large-FOC}}(\hat{x}_1))$. Please refer to Figure C.3.
As a result, we obtain
\[
\int_0^{y_1^{\text{large-FOC}}(x_1)} \varphi(z|\hat{x}_1, \beta)\,dz < \int_0^{y_1^{\text{large-FOC}}(\hat{x}_1)} \varphi(z|\hat{x}_1, \beta)\,dz.
\]
Finally,
\[
\int_0^{y_1^{\text{large-FOC}}(x_1)} \varphi(z|x_1, \beta)\,dz < \int_0^{y_1^{\text{large-FOC}}(x_1)} \varphi(z|\hat{x}_1, \beta)\,dz < \int_0^{y_1^{\text{large-FOC}}(\hat{x}_1)} \varphi(z|\hat{x}_1, \beta)\,dz \le 0,
\]
which implies
\[
H(y_1^{\text{large-FOC}}(x_1)|x_1, \beta) = H(0|x_1, \beta) + \int_0^{y_1^{\text{large-FOC}}(x_1)} g_1(x_1)\,\varphi(z|x_1, \beta)\,dz < H(0|x_1, \beta).
\]
Therefore, it is optimal to have $y_1^*(x_1) = 0$ for $0 \ge x_1 > \hat{x}_1$.

Next, we show that the IR constraint cannot be satisfied at the two endpoints simultaneously. One necessary condition for the IR constraint to be satisfied at $x_1 \in [x_1^-, x_1^+]$ is $y_1^*(x_1^-) \le y_1^R(x_1^-)$ and $y_1^*(x_1^+) \ge y_1^R(x_1^+)$. We consider three possible cases:

Case 1) Suppose $x_1^- < x_1^+ \le 0$. In this case, $y_1^R(x_1) = y_1^R(0)$ is a constant for all $x_1 \in [x_1^-, x_1^+]$.

If $y_1^*(x_1^-)$ happens to be 0, Lemma C.0.4 implies that $y_1^*(x_1^+) = 0 < y_1^R(x_1^+)$. This violates the necessary condition.

If $y_1^*(x_1^-)$ is the (larger) solution of the first-order condition, we have $0 < y_1^*(x_1^-) \le y_1^R(x_1^-)$. However, at the endpoint $x_1^+$, we may get $y_1^*(x_1^+) = 0$ or $y_1^*(x_1^+) > 0$. If $y_1^*(x_1^+) = 0$, it already violates the necessary condition since $0 < y_1^R(x_1^+)$. If $y_1^*(x_1^+) > 0$, it implies that the optimal control is determined by the first-order condition. Therefore, we have $y_1^*(x_1^+) < y_1^*(x_1^-) \le y_1^R(x_1^-) = y_1^R(x_1^+)$, which also violates the necessary condition.

In conclusion, the IR constraint must be violated at some point in $(x_1^-, x_1^+)$ in Case 1).

Case 2) Suppose $0 \le x_1^- < x_1^+$. In this case, $y_1^R(x_1)$ increases in $x_1 \in [x_1^-, x_1^+]$ because $\frac{dy_1^R}{dx_1} = \frac{u_1''(x_1)}{\mu_1''(x_1)} > 0$.

If $y_1^*(x_1^-) = x_1^-$, according to Lemma C.0.4, we have $y_1^*(x_1^+) = x_1^+ < y_1^R(x_1^+)$, which violates the necessary condition.

If $x_1^- < y_1^*(x_1^-) \le y_1^R(x_1^-)$, we either have $y_1^*(x_1^+) = x_1^+$ or $y_1^*(x_1^+) > x_1^+$. If $y_1^*(x_1^+) = x_1^+$, it violates the necessary condition since $x_1^+ < y_1^R(x_1^+)$. If $y_1^*(x_1^+) > x_1^+$, it also violates the necessary condition in that $y_1^*(x_1^+) < y_1^*(x_1^-) \le y_1^R(x_1^-) < y_1^R(x_1^+)$.
The first inequality holds because $y_1^*(x_1)$ is the (larger) solution of the first-order condition and is decreasing in $x_1$.

In conclusion, the IR constraint must be violated at some point in $(x_1^-, x_1^+)$ in Case 2).

Case 3) Suppose $x_1^- < 0 < x_1^+$. We apply a similar approach as in the previous two cases.

If $y_1^*(x_1^-) = 0$, we also have $y_1^*(0) = 0$. Therefore, $y_1^*(x_1^+) = x_1^+ < y_1^R(x_1^+)$, which violates the necessary condition.

If $0 < y_1^*(x_1^-) \le y_1^R(x_1^-)$, we either have $y_1^*(x_1^+) = x_1^+$ or $y_1^*(x_1^+) > x_1^+$. If $y_1^*(x_1^+) = x_1^+$, it violates the necessary condition. If $y_1^*(x_1^+) > x_1^+$, it also violates the necessary condition in that $y_1^*(x_1^+) < y_1^*(0) < y_1^*(x_1^-) \le y_1^R(x_1^-) < y_1^R(x_1^+)$.

In conclusion, the IR constraint must be violated at some point in $(x_1^-, x_1^+)$ in Case 3).

In summary, we conclude that the optimal contract does not have a “bump” in the retailer's profit-to-go function.

Proof of Lemma 4.5.4

\begin{align}
\pi_1(x_1) &= \mu_1(y_1^R(x_1)) - c\,y_1^R(x_1) + c\,x_1 + \delta\Pi_2(y_1^R(x_1)) - u_1(x_1) \tag{C.6} \\
&= \begin{cases}
\dfrac{r+h}{\lambda} - \dfrac{b+h}{\lambda}e^{-\lambda y_1^R(0)} - h\,y_1^R(0) - c\,y_1^R(0) + c\,x_1 - r\,x_1 \\
\quad + \delta\Big[\dfrac{r}{\lambda} - \dfrac{r+c}{\lambda}e^{-\lambda y_1^R(0)} - r\,y_1^R(0)e^{-\lambda y_1^R(0)}\Big], & x_1 \le 0, \\[2ex]
\dfrac{r+h}{\lambda} - \dfrac{b+h}{\lambda}e^{-\lambda y_1^R(x_1)} - h\,y_1^R(x_1) - c\,y_1^R(x_1) + c\,x_1 \\
\quad + \delta\Big[\dfrac{r}{\lambda} - \dfrac{r+c}{\lambda}e^{-\lambda y_1^R(x_1)} - r\,y_1^R(x_1)e^{-\lambda y_1^R(x_1)}\Big] \\
\quad - \Big\{\dfrac{r}{\lambda}(1 - e^{-\lambda x_1}) - h\,x_1 + \dfrac{h}{\lambda}(1 - e^{-\lambda x_1}) + \delta\Big[\dfrac{r}{\lambda} - \dfrac{r}{\lambda}e^{-\lambda x_1} - r\,x_1e^{-\lambda x_1}\Big]\Big\}, & x_1 > 0.
\end{cases} \tag{C.7}
\end{align}
Recall the definition of $y_1^R(x_1)$:
\[
(b + h + \delta r + \delta r\lambda y_1^R)e^{-\lambda y_1^R} =
\begin{cases}
(r + h + \delta r\lambda x_1)e^{-\lambda x_1}, & x_1 > 0, \\
r + h, & x_1 \le 0.
\end{cases}
\]
We can further simplify the supplier's profit as follows:
\[
\pi_1(x_1) =
\begin{cases}
-(h + c)\,y_1^R(0) - (r - c)\,x_1 + \delta\,\dfrac{r - c\,e^{-\lambda y_1^R(0)}}{\lambda}, & x_1 \le 0, \\[2ex]
-(h + c)(y_1^R(x_1) - x_1) + \delta\,\dfrac{r\,e^{-\lambda x_1} - c\,e^{-\lambda y_1^R(x_1)}}{\lambda}, & x_1 > 0.
\end{cases} \tag{C.8}
\]
When $x_1 < 0$, $\pi_1(x_1)$ is a linear function of $x_1$ with slope $-(r - c)$. Thus, when $x_1 \to -\infty$, we have $\pi_1(x_1) \to +\infty$. When $x_1 > 0$, $\pi_1(x_1)$ may not be monotone. But we can see that $\pi_1(x_1)$ approaches 0 from the negative side as $x_1 \to +\infty$. Therefore, $\pi_1(x_1)$ has at least one root.
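Now we look at its first-order derivative when $x_1 > 0$:
\begin{align*}
\pi_1'(x_1) &= \mu_1'(y_1^R(x_1))\frac{dy_1^R}{dx_1} - c\,\frac{dy_1^R}{dx_1} + c - \delta(r - c)e^{-\lambda y_1^R(x_1)}\frac{dy_1^R}{dx_1} - u_1'(x_1) \\
&= \big[(r + h + \delta r\lambda x_1)e^{-\lambda x_1} - h - c\big]\Big(\frac{dy_1^R}{dx_1} - 1\Big) - \delta(r - c)e^{-\lambda y_1^R(x_1)}\frac{dy_1^R}{dx_1} \\
&= (h + c)\Big(1 - \frac{dy_1^R}{dx_1}\Big) - \delta r e^{-\lambda x_1} + \delta c\,e^{-\lambda y_1^R(x_1)}\frac{dy_1^R}{dx_1}. \tag{C.9}
\end{align*}
Without loss of generality, we let $\lambda = 1$. We compare (C.8) and (C.9) when $x_1 > 0$:
\begin{align*}
\pi_1(x_1) &= -(h + c)\,q_1^R(x_1) + \delta e^{-x_1}\big[r - c\,e^{-q_1^R(x_1)}\big], \\
\pi_1'(x_1) &= -(h + c)\frac{dq_1^R(x_1)}{dx_1} - \delta e^{-x_1}\big[r - c\,e^{-q_1^R(x_1)}\big] + \delta c\,e^{-x_1 - q_1^R(x_1)}\frac{dq_1^R(x_1)}{dx_1}.
\end{align*}
Therefore, $\pi_1(x_1) + \pi_1'(x_1) = -(h + c)\big[q_1^R(x_1) + \frac{dq_1^R(x_1)}{dx_1}\big] + \delta c\,e^{-x_1 - q_1^R(x_1)}\frac{dq_1^R(x_1)}{dx_1}$. Since $q_1^R(x_1)$ is decreasing in $x_1$, we have $\frac{dq_1^R(x_1)}{dx_1} < 0$. Next we examine the term $q_1^R(x_1) + \frac{dq_1^R(x_1)}{dx_1}$. Recall that $q_1^R(x_1) = y_1^R(x_1) - x_1$, so we have
\[
\frac{dq_1^R(x_1)}{dx_1} = \frac{dy_1^R(x_1)}{dx_1} - 1 = -\frac{\delta r\,(e^{q_1^R(x_1)} - 1)}{b + h + \delta r\,y_1^R(x_1)} = -\frac{\delta r\,(e^{q_1^R(x_1)} - 1)}{(r + h + \delta r x_1)e^{q_1^R(x_1)} - \delta r}.
\]
When $x_1 \ge 2 - (r + h)/(\delta r)$, we further have
\begin{align*}
-\frac{dq_1^R(x_1)}{dx_1} &= \frac{\delta r\,(e^{q_1^R(x_1)} - 1)}{(r + h + \delta r x_1)e^{q_1^R(x_1)} - \delta r} \le \frac{\delta r\,(e^{q_1^R(x_1)} - 1)}{(r + h + \delta r x_1 - \delta r)e^{q_1^R(x_1)}} \\
&= \frac{\delta r}{r + h + \delta r x_1 - \delta r}\big(1 - e^{-q_1^R(x_1)}\big) \le 1 - e^{-q_1^R(x_1)} \le q_1^R(x_1).
\end{align*}
As a result, we get $\pi_1(x_1) + \pi_1'(x_1) = (h + c)\big(-\frac{dq_1^R(x_1)}{dx_1} - q_1^R(x_1)\big) + \delta c\,e^{-x_1 - q_1^R(x_1)}\frac{dq_1^R(x_1)}{dx_1} \le 0$. That is to say, $\pi_1$ and $\pi_1'$ cannot be positive simultaneously when $x_1 \ge 2 - (r + h)/(\delta r)$.

When $x_1 < 2 - (r + h)/(\delta r) < 1$, we rewrite the expression of $\pi_1'(x_1)$ as
\[
\pi_1'(x_1) = [u_1'(x_1) - c]\Big(\frac{dy_1^R}{dx_1} - 1\Big) - \delta(r - c)e^{-\lambda y_1^R(x_1)}\frac{dy_1^R}{dx_1}. \tag{C.10}
\]
We assume $(r + h + \delta r)e^{-1} - (h + c) > 0$. Therefore, we have $u_1'(x_1) \ge u_1'(1) = (r + h + \delta r)e^{-1} - (h + c) \ge 0$. Because $0 < \frac{dy_1^R}{dx_1} < 1$, we conclude from (C.10) that $\pi_1'(x_1) < 0$ in this case.

Combining the two cases, when $x_1 \ge 2 - (r + h)/(\delta r)$, we know that $\pi_1$ and $\pi_1'$ cannot be positive at the same time. When $x_1 < 2 - (r + h)/(\delta r)$, we know $\pi_1'(x_1) < 0$ and thereby $\pi_1(x_1)$ is decreasing in $x_1$. Now we claim that $\pi_1(x_1)$ has only one root. We prove this by contradiction. Suppose $\pi_1$ has at least two roots. One of them must occur where $\pi_1(x_1)$ crosses the x-axis from negative to positive.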
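In other words, there exists some $x_1$ such that $\pi_1(x_1) > 0$ and $\pi_1'(x_1) > 0$. However, we have shown that this cannot happen. So we obtain a contradiction. In conclusion, $\pi_1(x_1)$ has only one root.

Proof of Lemma 4.5.5

$y_1^R(x_1)$ strictly increases in $x_1$ and $y_1^R(x_1) > x_1$. Yet, $y_1^L(x_1)$ strictly decreases in $x_1$ when $y_1^L(x_1) > x_1$. Therefore, the solution of $y_1^R(x_1) = y_1^L(x_1)$ must be unique.

Proof of Lemma 4.6.1

By definition, when $x > 0$, $y^R(x)$ satisfies
\[
[b - r(1 - \delta)]e^{-\lambda y^R(x)} + \frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y^R(x)(1 - \delta)} = \frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda x(1 - \delta)}.
\]
We take the derivative with respect to $x$ on both sides, which leads to
\[
\big\{[b - r(1 - \delta)]e^{-\lambda y^R(x)} + [h + r(1 - \delta)]e^{-\lambda y^R(x)(1 - \delta)}\big\}\frac{dy^R}{dx} = [h + r(1 - \delta)]e^{-\lambda x(1 - \delta)}.
\]
Therefore,
\[
\frac{dy^R}{dx} = \frac{[h + r(1 - \delta)]e^{-\lambda x(1 - \delta)}}{[b - r(1 - \delta)]e^{-\lambda y^R(x)} + [h + r(1 - \delta)]e^{-\lambda y^R(x)(1 - \delta)}} > 0, \tag{C.11}
\]
i.e. $y^R(x)$ is strictly increasing in $x$.

Next, we consider $\frac{d^2y^R}{dx^2}$, which satisfies
\begin{align*}
&\frac{d^2y^R}{dx^2}\big\{[b - r(1 - \delta)]e^{-\lambda y^R(x)} + [h + r(1 - \delta)]e^{-\lambda y^R(x)(1 - \delta)}\big\}/\lambda \tag{C.12} \\
&\quad = \big\{[b - r(1 - \delta)]e^{-\lambda y^R(x)} + (1 - \delta)[h + r(1 - \delta)]e^{-\lambda y^R(x)(1 - \delta)}\big\}\Big(\frac{dy^R}{dx}\Big)^2 - (1 - \delta)[h + r(1 - \delta)]e^{-\lambda x(1 - \delta)}.
\end{align*}
Clearly, whether $\frac{d^2y^R}{dx^2}$ is positive depends on the right-hand side of Equation (C.12). For the sake of presentation, we shorten the notation by letting $K_1 = [b - r(1 - \delta)]e^{-\lambda y^R(x)}$, $K_2 = [h + r(1 - \delta)]e^{-\lambda y^R(x)(1 - \delta)}$ and $K_3 = [h + r(1 - \delta)]e^{-\lambda x(1 - \delta)}$. We have $\frac{dy^R}{dx} = \frac{K_3}{K_1 + K_2}$ from (C.11). Moreover, by the definition of $y^R(x)$, we have $K_1 + \frac{K_2}{1 - \delta} = \frac{K_3}{1 - \delta}$. The right-hand side of Equation (C.12) can be simplified as
\begin{align*}
[K_1 + (1 - \delta)K_2]\Big(\frac{K_3}{K_1 + K_2}\Big)^2 - (1 - \delta)K_3
&= \frac{K_3}{(K_1 + K_2)^2}\big\{[K_1 + (1 - \delta)K_2]K_3 - (1 - \delta)(K_1 + K_2)^2\big\} \\
&= \frac{K_3}{(K_1 + K_2)^2}\big\{[K_1 + (1 - \delta)K_2][(1 - \delta)K_1 + K_2] - (1 - \delta)(K_1 + K_2)^2\big\} \\
&= \delta^2\,\frac{K_3}{(K_1 + K_2)^2}\,K_1K_2 > 0.
\end{align*}
As a result, we conclude $\frac{d^2y^R}{dx^2} > 0$, i.e. $y^R(x)$ is convex in $x$.

Proof of Lemma 4.6.2

Suppose the supplier implements $y^R(x)$ in each period. Notice that $y^R(x)$ is stationary and does not depend on the supplier's belief. In addition, we have seen that as $T \to \infty$, the retailer's profit-to-go $U(y)$ exists and is unique.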
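We can compute $\pi$ iteratively by the following equations:
\[
\pi_n(z) = v(y^R(z)) + \delta U(y^R(z)) - u(z) - c\,y^R(z) + c\,z + \delta\Pi_{n-1}(y^R(z)), \tag{C.13}
\]
\[
\Pi_{n-1}(y(z)) = \int_{-\infty}^{y(z)} \pi_{n-1}(\xi)\,\lambda e^{-\lambda(y(z) - \xi)}\,d\xi. \tag{C.14}
\]
We use the Contraction Mapping Theorem. In other words, for any two functions $\pi_{n-1}$ and $\pi_{n-1}'$ which satisfy $\|\pi_{n-1} - \pi_{n-1}'\| := \max_z |\pi_{n-1}(z) - \pi_{n-1}'(z)| < \epsilon$, we want to prove $\|\pi_n - \pi_n'\| := \max_z |\pi_n(z) - \pi_n'(z)| < \delta\epsilon$.

In fact, for any $z$,
\begin{align*}
|\pi_n(z) - \pi_n'(z)| &= \delta\,|\Pi_{n-1}(y^R(z)) - \Pi_{n-1}'(y^R(z))| \\
&= \delta\,\Big|\int_{-\infty}^{y^R(z)} [\pi_{n-1}(\xi) - \pi_{n-1}'(\xi)]\,\lambda e^{-\lambda(y^R(z) - \xi)}\,d\xi\Big| \\
&\le \delta\int_{-\infty}^{y^R(z)} |\pi_{n-1}(\xi) - \pi_{n-1}'(\xi)|\,\lambda e^{-\lambda(y^R(z) - \xi)}\,d\xi \\
&< \delta\epsilon\int_{-\infty}^{y^R(z)} \lambda e^{-\lambda(y^R(z) - \xi)}\,d\xi = \delta\epsilon,
\end{align*}
and therefore $\|\pi_n - \pi_n'\| := \max_z |\pi_n(z) - \pi_n'(z)| < \delta\epsilon$. Hence, the iteration (C.13)-(C.14) is indeed a contraction mapping. By the Contraction Mapping Theorem, the sequence $\pi_n$ converges and its limit $\lim_{n\to\infty}\pi_n = \pi$ exists and is unique. In addition, $\Pi(y) = \lim_{n\to\infty}\Pi_n(y) = \lim_{n\to\infty}\int_{-\infty}^{y} \pi_{n-1}(\xi)\,\lambda e^{-\lambda(y - \xi)}\,d\xi = \int_{-\infty}^{y} \pi(\xi)\,\lambda e^{-\lambda(y - \xi)}\,d\xi$ also exists and is unique.

Proof of Proposition 4.6.3

We first consider the case $z \le 0$. Note that $\Pi(z) = \int_{-\infty}^{z} \pi(\xi)\,\lambda e^{-\lambda(z - \xi)}\,d\xi$. We assume the supplier implements the quantity plan $y^R$ from the “next” period onwards, i.e. $\pi(\xi) = v(y^R(\xi)) + \delta U(y^R(\xi)) - u(\xi) - c\,y^R(\xi) + c\,\xi + \delta\Pi(y^R(\xi))$. We have shown that $y^R(\xi) = y^R(0)$ is a constant whenever $\xi \le 0$. Therefore, $\pi(\xi) = v(y^R(0)) + \delta U(y^R(0)) - u(\xi) - c\,y^R(0) + c\,\xi + \delta\Pi(y^R(0)) = \pi(0) - (r - c)\xi$. As a result, we have
\begin{align*}
\Pi(z) &= \int_{-\infty}^{z} \pi(\xi)\,\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&= \int_{-\infty}^{z} [\pi(0) - (r - c)\xi]\,\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&= \pi(0) - (r - c)\Big(z - \frac{1}{\lambda}\Big).
\end{align*}
When $z \le 0$, $\Pi(z)$ is a linear function of $z$ with slope $-(r - c)$. Finally,
\[
\frac{dJ^R(z|x)}{dz} = -c + b + \delta r + \delta\Pi'(z) = -c + b + \delta r - \delta(r - c) = b + \delta c - c > 0,
\]
so $J^R(z|x)$ increases when $z < 0$.

Next we examine the case $z > 0$. Note that $\Pi(z) = \int_{-\infty}^{z} \pi(\xi)\,\lambda e^{-\lambda(z - \xi)}\,d\xi$. We take the derivative on both sides with respect to $z$ and get
\begin{align*}
\Pi'(z) &= \lambda\pi(z) - \lambda\int_{-\infty}^{z} \pi(\xi)\,\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&= \lambda[\pi(z) - \Pi(z)] \\
&= \lambda\int_{-\infty}^{z} [\pi(z) - \pi(\xi)]\,\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&= \lambda\int_{-\infty}^{z} \Big[\int_{\xi}^{z} \pi'(\eta)\,d\eta\Big]\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&= \int_{-\infty}^{z} \pi'(\xi)\,\lambda e^{-\lambda(z - \xi)}\,d\xi,
\end{align*}
where the last equality holds by changing the order of integration. We use the equation $\Pi'(z) = \int_{-\infty}^{z} \pi'(\xi)\,\lambda e^{-\lambda(z - \xi)}\,d\xi$ and prove the result by induction. Suppose $-c - \frac{h}{1 - \delta} + \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y(1 - \delta)} + \delta\Pi_{t+1}'(y) < 0$. We want to show that a similar inequality holds for $\Pi_t'(y)$. In fact,
\begin{align*}
\Pi_t'(z) &= \int_{-\infty}^{z} \pi_t'(\xi)\,\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&= \int_{-\infty}^{z} \Big\{\big[v'(y^R(\xi)) + \delta U'(y^R(\xi)) - c + \delta\Pi_{t+1}'(y^R(\xi))\big]\frac{dy^R(\xi)}{d\xi} + c - u'(\xi)\Big\}\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&= \int_{-\infty}^{0} -(r - c)\,\lambda e^{-\lambda(z - \xi)}\,d\xi + \int_{0}^{z} \Big\{\big[v'(y^R(\xi)) + \delta U'(y^R(\xi)) - c + \delta\Pi_{t+1}'(y^R(\xi))\big]\frac{dy^R(\xi)}{d\xi} + c - u'(\xi)\Big\}\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&= -(r - c)e^{-\lambda z} + \int_{0}^{z} \Big\{\big[v'(y^R(\xi)) + \delta U'(y^R(\xi)) - c + \delta\Pi_{t+1}'(y^R(\xi))\big]\frac{dy^R(\xi)}{d\xi} + c - u'(\xi)\Big\}\lambda e^{-\lambda(z - \xi)}\,d\xi.
\end{align*}
We replace $\frac{dy^R(\xi)}{d\xi}$ by $\frac{[h + r(1 - \delta)]e^{-\lambda\xi(1 - \delta)}}{[b - r(1 - \delta)]e^{-\lambda y^R(\xi)} + [h + r(1 - \delta)]e^{-\lambda y^R(\xi)(1 - \delta)}}$ in the integrand, which leads to
\begin{align*}
&\big[v'(y^R(\xi)) + \delta U'(y^R(\xi)) - c + \delta\Pi_{t+1}'(y^R(\xi))\big]\frac{dy^R(\xi)}{d\xi} + c - u'(\xi) \\
&\quad = \Big\{[b - r(1 - \delta)]e^{-\lambda y^R(\xi)} + \frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y^R(\xi)(1 - \delta)} - \frac{h}{1 - \delta} - c + \delta\Pi_{t+1}'(y^R(\xi))\Big\} \\
&\qquad \times \frac{[h + r(1 - \delta)]e^{-\lambda\xi(1 - \delta)}}{[b - r(1 - \delta)]e^{-\lambda y^R(\xi)} + [h + r(1 - \delta)]e^{-\lambda y^R(\xi)(1 - \delta)}} + c - \frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda\xi(1 - \delta)} + \frac{h}{1 - \delta} \\
&\quad = c + \frac{h}{1 - \delta} - \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda\xi(1 - \delta)} + \Big\{\delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y^R(\xi)(1 - \delta)} - \frac{h}{1 - \delta} - c + \delta\Pi_{t+1}'(y^R(\xi))\Big\}\frac{dy^R(\xi)}{d\xi}.
\end{align*}
For the first part,
\[
\int_{0}^{z} \Big\{c + \frac{h}{1 - \delta} - \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda\xi(1 - \delta)}\Big\}\lambda e^{-\lambda(z - \xi)}\,d\xi = \Big(c + \frac{h}{1 - \delta}\Big)(1 - e^{-\lambda z}) - \frac{h + r(1 - \delta)}{1 - \delta}\big(e^{-\lambda z(1 - \delta)} - e^{-\lambda z}\big).
\]
For the second part, by the inductive assumption, for any $\xi$, $\delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y^R(\xi)(1 - \delta)} - \frac{h}{1 - \delta} - c + \delta\Pi_{t+1}'(y^R(\xi)) < 0$. Moreover, Lemma 4.6.1 says $\frac{dy^R(\xi)}{d\xi} > 0$. Therefore, the whole integrand is negative.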
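As a result,
\[
\int_{0}^{z} \Big\{\delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y^R(\xi)(1 - \delta)} - \frac{h}{1 - \delta} - c + \delta\Pi_{t+1}'(y^R(\xi))\Big\}\frac{dy^R(\xi)}{d\xi}\,\lambda e^{-\lambda(z - \xi)}\,d\xi < 0.
\]
Finally,
\begin{align*}
\Pi_t'(z) &= \int_{-\infty}^{z} \pi_t'(\xi)\,\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&= -(r - c)e^{-\lambda z} + \Big(c + \frac{h}{1 - \delta}\Big)(1 - e^{-\lambda z}) - \frac{h + r(1 - \delta)}{1 - \delta}\big(e^{-\lambda z(1 - \delta)} - e^{-\lambda z}\big) \\
&\quad + \int_{0}^{z} \Big\{\delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y^R(\xi)(1 - \delta)} - \frac{h}{1 - \delta} - c + \delta\Pi_{t+1}'(y^R(\xi))\Big\}\frac{dy^R(\xi)}{d\xi}\,\lambda e^{-\lambda(z - \xi)}\,d\xi \\
&\le -(r - c)e^{-\lambda z} + \Big(c + \frac{h}{1 - \delta}\Big)(1 - e^{-\lambda z}) - \frac{h + r(1 - \delta)}{1 - \delta}\big(e^{-\lambda z(1 - \delta)} - e^{-\lambda z}\big) \\
&= c + \frac{h}{1 - \delta} - \frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda z(1 - \delta)}.
\end{align*}
By rearranging the terms, we have
\begin{align*}
\frac{dJ^R(z|x)}{dz} &= -c - \frac{h}{1 - \delta} + \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda z(1 - \delta)} + \delta\Pi_t'(z) \\
&\le -c - \frac{h}{1 - \delta} + \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda z(1 - \delta)} + \delta\Big[c + \frac{h}{1 - \delta} - \frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda z(1 - \delta)}\Big] \\
&= -(1 - \delta)\Big(c + \frac{h}{1 - \delta}\Big) < 0,
\end{align*}
i.e. $\frac{dJ^R(z|x)}{dz} < 0$ when $z > 0$.

In the following, we want to show that the virtual surplus $J^R(z|x)$ is concave. For the sake of analysis, we define $\varphi(z) = c + \frac{h}{1 - \delta} - \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda z(1 - \delta)} - \delta\Pi'(z)$. In fact, $\varphi(z) = -\frac{dJ^R(z|x)}{dz}$. Equivalently, we want to prove that $\varphi(z)$ increases in $z$. We prove the result by induction. Suppose $\varphi_{t+1}(z)$ is an increasing function of $z$.

From the previous analysis, we have seen
\begin{align*}
\varphi_t(z) &= c + \frac{h}{1 - \delta} - \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda z(1 - \delta)} - \delta\Pi_t'(z) \\
&= c + \frac{h}{1 - \delta} - \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda z(1 - \delta)} - \delta\Big\{c + \frac{h}{1 - \delta} - \frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda z(1 - \delta)} \\
&\qquad + \int_{0}^{z} \Big\{\delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y^R(\xi)(1 - \delta)} - \frac{h}{1 - \delta} - c + \delta\Pi_{t+1}'(y^R(\xi))\Big\}\frac{dy^R(\xi)}{d\xi}\,\lambda e^{-\lambda(z - \xi)}\,d\xi\Big\} \\
&= (1 - \delta)\Big(c + \frac{h}{1 - \delta}\Big) + \delta\int_{0}^{z} \varphi_{t+1}(y^R(\xi))\,\frac{dy^R(\xi)}{d\xi}\,\lambda e^{-\lambda(z - \xi)}\,d\xi.
\end{align*}
We consider its derivative
\begin{align*}
\varphi_t'(z) &= \lambda\delta\Big\{\varphi_{t+1}(y^R(z))\,\frac{dy^R(z)}{dz} - \int_{0}^{z} \varphi_{t+1}(y^R(\xi))\,\frac{dy^R(\xi)}{d\xi}\,\lambda e^{-\lambda(z - \xi)}\,d\xi\Big\} \\
&= \lambda\delta\Big\{\varphi_{t+1}(y^R(z))\,\frac{dy^R(z)}{dz}\,e^{-\lambda z} + \int_{0}^{z} \Big[\varphi_{t+1}(y^R(z))\,\frac{dy^R(z)}{dz} - \varphi_{t+1}(y^R(\xi))\,\frac{dy^R(\xi)}{d\xi}\Big]\lambda e^{-\lambda(z - \xi)}\,d\xi\Big\}.
\end{align*}
From the previous analysis, we have $\varphi_{t+1}(y^R(z)) > 0$ and $\frac{dy^R(z)}{dz} > 0$. So $\varphi_{t+1}(y^R(z))\,\frac{dy^R(z)}{dz}\,e^{-\lambda z} > 0$. In addition, by the inductive hypothesis, $\varphi_{t+1}$ is an increasing function. For any $\xi < z$, $y^R(\xi) < y^R(z)$, and thereby $\varphi_{t+1}(y^R(z)) > \varphi_{t+1}(y^R(\xi))$. Moreover, $\frac{dy^R(z)}{dz} > \frac{dy^R(\xi)}{d\xi}$ because $y^R$ is convex. As a result, $\int_{0}^{z} \big[\varphi_{t+1}(y^R(z))\,\frac{dy^R(z)}{dz} - \varphi_{t+1}(y^R(\xi))\,\frac{dy^R(\xi)}{d\xi}\big]\lambda e^{-\lambda(z - \xi)}\,d\xi > 0$. Combining these two terms, we conclude $\varphi_t'(z) > 0$. Therefore, $\frac{d^2J^R(z|x)}{dz^2} < 0$, i.e.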
$J^R(z|x)$ is concave.

Proof of Proposition 4.6.4

We examine the first-order condition $\frac{dJ^L(y|x)}{dy} = 0$. When $y \le 0$, we have seen in the main text that $\frac{dJ^L(y|x)}{dy} = -c + b + \delta c > 0$. We now consider the case $y > 0$:
\[
-c - \frac{h}{1 - \delta} + \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y(1 - \delta)} + \delta\Pi'(y) + \big\{[b - r(1 - \delta)]e^{-\lambda y} + [h + r(1 - \delta)]e^{-\lambda y(1 - \delta)}\big\}e^{\lambda(y_0 - x)} = 0.
\]
By rearranging the terms, we end up with the following equation:
\[
\frac{c + \frac{h}{1 - \delta} - \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y(1 - \delta)} - \delta\Pi'(y)}{[b - r(1 - \delta)]e^{-\lambda y} + [h + r(1 - \delta)]e^{-\lambda y(1 - \delta)}} = e^{\lambda(y_0 - x)}. \tag{C.15}
\]
Proposition 4.6.3 ensures that the numerator $c + \frac{h}{1 - \delta} - \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y(1 - \delta)} - \delta\Pi'(y)$ is increasing in $y$, and the denominator $[b - r(1 - \delta)]e^{-\lambda y} + [h + r(1 - \delta)]e^{-\lambda y(1 - \delta)}$ is decreasing in $y$. Therefore, the fraction as a whole is an increasing function of $y$. As a result, for any fixed $x$, the solution of (C.15) is unique. Moreover, the right-hand side of (C.15), $e^{\lambda(y_0 - x)}$, is decreasing in $x$, so the solution $y^L(x)$ also decreases in $x$.

Proof of Theorem 4.6.5

We apply a proof similar to that of the two-period case. We prove the theorem by contradiction. Suppose that under the optimal contract there exists a “bump” at $[x^-, x^+]$ in the retailer's profit-to-go function. We focus our attention on the sub-interval and construct the subproblem as an optimal control problem:
\[
(P[x^-, x^+]) \qquad \max_{y(x)} \int_{x^-}^{x^+} J(y(x)|x)\,g(x)\,dx
\]
subject to
\begin{align*}
& u'(x) = v'(y(x)) + \delta U'(y(x)), && x \in [x^-, x^+], \\
& u(x^-) = u(x^-) \ \text{and}\ u(x^+) = u(x^+), \\
& y(x) \ge x \ \text{and}\ y'(x) \le 1, && x \in [x^-, x^+],
\end{align*}
where $J(y|x)$ is the virtual surplus anchored at the bottom endpoint $x^-$:
\[
J(y|x) = c\,x - c\,y + v(y) + \delta U(y) + \delta\Pi(y) - [v'(y) + \delta U'(y)]\,\frac{G_1(x_1^+) - G_1(x_1)}{g_1(x_1)}.
\]
As before, the subproblem $P[x^-, x^+]$ can be translated into an optimal control problem with two fixed endpoints and two constraints on the control variable, $y'(x) \le 1$ and $y(x) \ge x$. The corresponding Hamiltonian is $H(y|x, \eta) = J(y|x)\,g(x) + \eta(x)[v'(y) + \delta U'(y)]$ and the Lagrangian is $L(y|x, \eta, \rho) = H(y|x, \eta) + \rho_1(y - x) + \rho_2(y'(x) - 1)$.
Furthermore, the adjoint parameter $\eta$ should satisfy $\eta'(x) = \frac{\partial L}{\partial u} = 0$ and $\eta(x^+) = \beta$ for some constant $\beta$. So $\eta$ should be a constant, i.e. $\eta(x) = \beta$ for all $x \in [x^-, x^+]$.

We first investigate the first-order derivative of the Hamiltonian:
\begin{align*}
\frac{\partial H}{\partial y}(y|x) &=
\begin{cases}
g(x)\{b + \delta c - c\} > 0, & y < 0, \\
g(x)\Big\{-c - \frac{h}{1 - \delta} + \delta\,\frac{h + r(1 - \delta)}{1 - \delta}e^{-\lambda y(1 - \delta)} + \delta\Pi'(y) \\
\quad + e^{\lambda(y_0 - x)}(G(x^+) - \beta)\big\{[b - r(1 - \delta)]e^{-\lambda y} + [h + r(1 - \delta)]e^{-\lambda y(1 - \delta)}\big\}\Big\}, & y > 0
\end{cases} \\
&=
\begin{cases}
g(x)\{b + \delta c - c\} > 0, & y < 0, \\
g(x)\Big\{-\varphi(y) + e^{\lambda(y_0 - x)}(G(x^+) - \beta)\big\{[b - r(1 - \delta)]e^{-\lambda y} + [h + r(1 - \delta)]e^{-\lambda y(1 - \delta)}\big\}\Big\}, & y > 0.
\end{cases}
\end{align*}
As a result, the corresponding first-order condition is
\[
\frac{\varphi(y)}{[b - r(1 - \delta)]e^{-\lambda y} + [h + r(1 - \delta)]e^{-\lambda y(1 - \delta)}} = e^{\lambda(y_0 - x)}(G(x^+) - \beta). \tag{C.16}
\]
Proposition 4.6.3 guarantees that the left-hand side of (C.16) is increasing in $y$, whereas the right-hand side of (C.16) is decreasing in $x$. Therefore, (C.16) has at most one solution. Besides, the solution (if it exists) must strictly decrease in $x$. In conclusion, when $x \le 0$, the optimal control $y^*(x)$ is either 0 or the solution of (C.16). When $x > 0$, the optimal control $y^*(x)$ is the maximum of $x$ and the solution of (C.16).

Suppose that for all $x \in [x^-, x^+]$ the optimal control $y^*(x)$ is the solution of (C.16); then we clearly have $y^*(x^-) > y^*(x^+)$. Now we need to take care of the other two cases, $y^*(x) = 0$ and $y^*(x) = x$. Notice that $y^{*\prime}(x) \le 1$, i.e. the order quantity $q^*(x) = y^*(x) - x$ decreases in $x$. Once there is a point $\hat{x} < x^+$ such that $y^*(\hat{x}) = \hat{x}$, we must have $y^*(x) = x$ for all $x > \hat{x}$. On the other hand, recall that $\frac{\partial H}{\partial y}(y|x)$, as a function of $y$, is decreasing in $x$. We can easily argue that if there is a point $\hat{x} < x^+$ such that $y^*(\hat{x}) = 0$, we must have $y^*(x) = 0$ for all $\hat{x} < x < 0$. No matter which case applies, we always end up with $y^*(x^-) \le y^*(x^+)$. However, one necessary condition for the existence of a “bump” is $y^*(x^-) \le y^R(x^-) < y^R(x^+) \le y^*(x^+)$. The necessary condition cannot be satisfied, and thereby the IR constraint will be violated at some point in $[x^-, x^+]$. We obtain a contradiction.
So there cannot exist a “bump” in the retailer's profit-to-go function under the optimal contract.

Proof of Lemma 4.6.6

As in the two-period case, $y^R(x) > x$ strictly increases in $x$ and $y^L(x)$ strictly decreases in $x$. Therefore, the solution of $y^R(x) = y^L(x)$ must be unique.
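
The uniqueness step in the proof of Lemma 4.6.6 can be written out in one line. The following is only a sketch under the two monotonicity properties quoted in the proof; the difference function $\Delta$ is a helper symbol introduced here for illustration and is not notation from the thesis.

```latex
% Sketch only: \Delta is a hypothetical helper symbol, not thesis notation.
% It assumes y^R strictly increasing and y^L strictly decreasing in x.
\[
\Delta(x) := y^R(x) - y^L(x), \qquad
\Delta(x') - \Delta(x)
  = \underbrace{\big[y^R(x') - y^R(x)\big]}_{>\,0}
  + \underbrace{\big[y^L(x) - y^L(x')\big]}_{>\,0} > 0
  \quad \text{for all } x < x'.
\]
% Hence \Delta is strictly increasing and can vanish at most once.
```

Since $\Delta$ is strictly increasing, it has at most one zero, so $y^R(x) = y^L(x)$ can hold at no more than one point. This step alone does not establish that a crossing exists.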
