UBC Theses and Dissertations


Essays in industrial organization Graves, Jonathan Lewis 2017


Essays in Industrial Organization

by

Jonathan Lewis Graves

B.Sc. (hons), The University of Victoria, 2008
M.A., The University of British Columbia, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies
(Economics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

September 2017

© Jonathan Lewis Graves 2017

Abstract

Chapter 2 examines the role of sales (temporary price reductions) in the pricing of perishable products. When products can be stored, periodic sales are explained using inventories: the ability to store lets consumers wait for better prices. When consumers differ in their ability to wait, firms keep prices regularly high, using sales to target the low prices to the most patient consumers. This explanation is not reasonable for perishable goods, since they cannot be stored. Using a retail dataset, I show that nonetheless a cyclic pattern of sales is a major feature of how perishable products are priced. I explain this pattern using a dynamic model of loss leadership. I then test my model using grocery store data.

Chapter 3 studies large contributions to crowdfunding projects and their impact on project success. I find large contributions display a preference for being effective in helping projects succeed: they are often pivotal in the success of a project. These findings match predictions from a consumer choice explanation of how large contributions are made. I then examine the role large contributions play in project success. Using an instrumental variables approach, I show the ability of a project to attract large contributors is important: a project is 40-60% more likely to succeed if they can attract a large contributor.
This inverts the logic of crowdfunding: the crowd may be important, but the success of many projects is driven by large contributors.

Chapter 4 develops a method for determining whether a given observation is a sale or not in the context of a sequence of prices for a retail product. This classification, based on a hidden Markov model framework, has the advantage of using all the information available for classifying sales. I develop identification requirements for this method, and illustrate its utility in directly testing questions of correlation for sales and other variables, allowing models to be evaluated without reduced-form analysis. I perform simulations, demonstrating the method’s accuracy in classifying sales and understanding correlations. This chapter adds to the toolbox industrial economists have for studying sales, with advantages over existing methods of sales classification.

Lay Summary

This essay studies several topics in industrial organization. The second and fourth chapters consider why perishable products are placed on sale (sold at a discount): most models of sales in economics cannot explain these kinds of discounts. In Chapter 2, I show sales are explainable when firms use a strategy whereby one product is put on sale to encourage consumers to buy related products. In Chapter 4, I develop a method to determine which prices are sales, which improves on some of the statistical issues faced by researchers in this area. In Chapter 3, I study the role of large contributions in crowdfunding, a form of fundraising used by entrepreneurs. I show that large contributions are both very important for the success of projects and sophisticated in their timing.

Preface

Chapter 2 is calculated (or Derived) based on data from The Nielsen Company (US), LLC and marketing databases provided by the Kilts Center for Marketing Data Center at The University of Chicago Booth School of Business; data copyright resides with Nielsen.
I was responsible for the design of the research program, the analysis of the data, and the performance of the research. This work is the unpublished, original and independent work of the author.

Chapter 3 is based on data collected by Kicktraq Inc, and shared under a data-distribution agreement. My supervisor, Ralph A. Winter was a signatory to the data-distribution agreement. I was responsible for the design of the research program, the analysis of the data, and the performance of the research. This work is also the unpublished, original and independent work of the author.

Chapter 4 is the unpublished, original and independent work of the author.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Sales and Perishable Products
  2.1 Overview
  2.2 Background
  2.3 Data and Sales Classification
    2.3.1 Data and Sample Selection
    2.3.2 Sales Classification
    2.3.3 Estimation and Classification
    2.3.4 Comment on Structural Models
  2.4 Reduced Form Results: What Drives Sales?
    2.4.1 Model
    2.4.2 Results
  2.5 A Dynamic Model of Loss Leadership
    2.5.1 Analysis
    2.5.2 Discussion
  2.6 Going Behind Sales: Consumer Choice and Sales Pricing
    2.6.1 The Nielsen-Kilts HMS Survey
    2.6.2 Empirical Model and Results
  2.7 Conclusion
3 Large Contributions and Crowdfunding Success
  3.1 Overview
  3.2 Background
    3.2.1 What is Crowdfunding?
    3.2.2 Consumer Crowdfunding on Kickstarter
    3.2.3 Literature Review
  3.3 Data and Facts About Large Contributions
  3.4 A Model of Large Contributions
    3.4.1 Simulation and Predictions
  3.5 Why Do Individuals Provide Large Contributions?
    3.5.1 Assessment of Theoretical Predictions
  3.6 How Important Are Large Contributions to Crowdfunding Success?
  3.7 Discussion
    3.7.1 Effectiveness of large contributions relative to size
    3.7.2 Interpretation and Robustness of IV estimates
    3.7.3 Policy Implications
  3.8 Conclusion
4 Sales Classification via Hidden Markov Models
  4.1 Overview
  4.2 Background on Hidden Markov Models
    4.2.1 Notation and Basics
  4.3 Applications of Hidden Markov Models to Sales
    4.3.1 Identification of Block Left-to-Right Hidden Markov Models
    4.3.2 Heterogeneity and Covariates
    4.3.3 Higher Order Markov Processes
  4.4 Monte Carlo Simulations
    4.4.1 Covariates
    4.4.2 Higher Order Markov Processes
    4.4.3 Small Sample Performance
  4.5 Conclusion
5 Conclusion
Bibliography
Appendices
A Appendix to “Sales and Perishable Products”
  A.1 A More Realistic Error Model
  A.2 k-means Clustering on Pricing Regimes
  A.3 A Duopoly Model of Loss Leadership
    A.3.1 Discussion
    A.3.2 Quantity Discount versus Sales
    A.3.3 Competition and Monopoly
  A.4 Extensions to the Duopoly Model of Loss Leadership
    A.4.1 Myopic Consumers, Both with Inventories
    A.4.2 Forward-looking Consumers
B Appendix to “Large Contributions and Crowdfunding Success”
  B.1 Large Contribution Size
    B.1.1 Discussion and Size Considerations
  B.2 Overview of Crowdfunding Data
  B.3 Holidays Used For Instrumentation
C Appendix to “Sales Classification via Hidden Markov Models”
  C.1 Proofs
    C.1.1 Proof of Lemma 4.3.1
    C.1.2 Proof of Theorem 4.3.1

List of Tables

2.1 Descriptive Statistics
2.2 Estimation Specifications
2.3 Results for Estimation Specifications
2.4 Results for Estimation Specifications (No Volumes)
2.5 Statistics for the HMS Survey
2.6 Summary Statistics for Consolidated Trip-Level Data
2.7 Regression of A/O Spending
2.8 Regressions on number of departments and sales
2.9 Robustness Results I
2.10 Frozen goods
3.1 Summary statistics
3.2 What leads to large contributions?
3.3 IV Robustness Checks
3.4 How important are large contributions?
3.5 Counterfactual “smoothing” exercise
3.6 Robustness I: Pre/Post Success
3.7 Robustness II - By Category
3.8 Specification Robustness and Heterogeneity
4.1 Estimates relative to Monte Carlo Simulation: Model 1 (Basic), no Covariates
4.2 Estimates relative to Monte Carlo Simulation: Model 2 (Complex), no Covariates
4.3 Estimates relative to Monte Carlo Simulation: Model 1 (Basic), Covariates
4.4 Estimates relative to Monte Carlo Simulation: Model 2 (Complex), Covariates
4.5 Higher-Order Markov Chain: Model 1 (Basic), with Covariates
4.6 Higher-Order Markov Chain: Model 2 (Complex), with Covariates
B.1 FE Regression on Number of Backers - Baseline Specification
B.2 FE Regression on Number of Backers - Lagged Dependent Variable
B.3 List of Holidays

List of Figures

2.1 A typical time series (20 oz packaged meat), levels omitted for privacy
2.2 Clustering on simulated and actual pricing data
2.3 Fitted values of model
2.4 Classification of sales from data
2.5 Classification tests, two methods, p = 0.9
2.6 Plot of crossing points in Table 2.8
3.1 Comparison of Featured Projects on GoFundMe (top panel) and IndieGoGo (bottom panel) as of April 24, 2017
3.2 Illustration of Inferring Large Contribution-Days
3.3 Histogram of Large Contributions versus % Elapsed and % of Goal
3.4 Pivotality of Contributions
3.5 Consumer Choice Models (Panel A)
3.6 Consumer Choice Models (Panel B)
3.7 Consumer Choice Models (Panel C)
3.8 Illustration of Numerical Solutions to Model v=0.8
3.9 Evolution of Numerical Solution to Model v=0.8, Varying a values
3.10 Evolution of Numerical Solution to Model v=0.8, Varying m values
4.1 A graphical depiction of a first-order hidden Markov model
4.2 Histogram of State Errors, Model 2 (Complex), Covariates
4.3 Histogram of γ-fit Errors in Small Sample Estimates (Model 2)
4.4 Histogram of µ Percentage Errors in Small Sample Estimates (Model 2)
4.5 Histogram of γ-fit Errors in Small Sample Estimates, Higher Order (Model 2)
4.6 Histogram of µ Percentage Errors in Small Sample Estimates, Higher Order (Model 2)
A.1 Different Sales Patterns
B.1 Distribution of Large Contributions
B.2 % of total backers raised today versus % of project-time elapsed
B.3 Number of Projects by Year
B.4 Project histogram of final % of goal (capped at 1)

Acknowledgements

I would like to thank my advisor Ralph A. Winter for his superlative supervision and encouragement. This thesis is much improved for his suggestions, ideas, and insight. I would also like to thank Thomas Lemieux, Vadim Marmer, and Michael Peters for their assistance and encouragement in the development of this thesis. Additionally, I would like to thank the faculty and staff at the Vancouver School of Economics, particularly Nancy Gallini, Matilde Bombardini, and the members of the empirical lunch for their feedback. I would also be remiss not to mention the help of the current and past students, who were my colleagues and friends during this thesis, particularly Jacob Schwartz, Anujit Chakrabotry, and Hugo Jales.

Professionally, I would like to thank Kicktraq Inc for their assistance with the data for Chapter 3, and the Marketing Data Center at The University of Chicago Booth School of Business Kilts Center for Marketing for their provision of the data for Chapter 2. I would also like to thank the participants at the 2016 CEA Conference who saw early versions of some of these chapters. Their feedback was invaluable. I would also like to recognize the assistance of the SSHRC committee for providing fellowship support for this thesis, and the University of British Columbia for financial and logistical support, particularly with the University-Industry Liaison Office.

Finally, I would like to thank my parents, family, and friends for their support and encouragement during this process.
This would not have been possible without all of you. A special note of gratitude must be conveyed to my biggest supporter, my wife, Giuliana, to whom this thesis is dedicated.

Dedication

To my beloved wife, Giuliana

Chapter 1

Introduction

Industrial organization studies the way businesses interact with other individuals in the economy. This dissertation focuses on the interaction between consumers and firms, studying the strategies firms use to maximize profit and compete for the attention and support of consumers in different environments. Using a mixture of theoretical modelling and empirical analysis, this dissertation studies this important economic intersection, shedding new light on the actions of firms and providing insight into the implications these have for consumers, policy-makers, and the economy in general.

Chapter 2 studies a unique but critically important market: grocery stores. One of the hallmarks of grocery store pricing in the last century has been the “sale”, a temporary reduction in the price of certain products. While different explanations have been put forward for why firms would want to hold sales, the answers provided are far from comprehensive. This chapter looks to fill a gap in the literature, studying the role of sales in the pricing of perishable products using a combination of theoretical modelling and empirical analysis. Models provide the structure to make predictions about the real world, which can then be analysed with data, providing support for the theory.

The original contributions of Chapter 2 are three-fold. First, I establish the fact that periodic sales are a major part of grocery store pricing for highly perishable products. This is done using the largest existing dataset of grocery store pricing data, the Nielsen-Kilts SCANTRACK database, which provides weekly price and quantity information for a wide range of stores in the United States.
Second, I demonstrate that the observed patterns of sales are difficult to reconcile with existing models of sales, which tend to rely on inventory-based explanations. To explain the pattern, I develop a new model based on an intuitive idea: that of loss-leadership. In this context, loss-leadership is when a store discounts a particular product in order to attract consumers to their store, who then subsequently purchase more of other products.

In my model, consumers purchase baskets of goods which contain different mixtures of perishable and storable products. Shopping costs induce consumers to buy their entire basket at a single store. Because these baskets differ, firms can use them to discriminate between different groups of consumers. Specifically, firms want to attract perishable-buying consumers when they are also purchasing relatively many storable goods. In order to do this, they offer these consumers a better total price for their whole basket. It is optimal to do this by lowering just the price of the perishable good, since this targets the price reduction to the group the firm wishes to attract, keeping the total basket price high for other consumers. Firms time these price reductions to allow the target group of consumers to run down their inventory, which entails trading off present and future profit in an inter-temporal optimization problem. The link between storable good inventories and perishable products creates, in equilibrium, periodic sales on the perishable good.

Finally, I test my model by linking the retail data to consumer choice data, a process which requires the development of a data-driven method of classifying prices into sales. The results validate the central prediction of the loss leadership model: when consumers buy perishables on sale they also buy more of other products, particularly storable ones, relative to their purchasing in non-sale periods.
These findings highlight the role multi-product competition has on pricing dynamics, rationalize an empirical finding which is difficult to explain with most models of sales, and demonstrate the important role the connection between firms and consumers plays in this market.

Next, in Chapter 3, I study the behaviour of firms in an emerging market: crowdfunding. Since its emergence in the early 2010s, crowdfunding has become a major source of financing for individuals and for small or medium-sized businesses. Characterized by large numbers of consumers who pledge money for projects in return for a reward, consumer crowdfunding is an area in which economists are only beginning to comprehensively understand the key driving forces. This chapter examines one aspect of crowdfunding which has been previously overlooked in the literature: the presence and role of large contributors to projects. By their very nature, most contributions to crowdfunding are small: on the order of less than $100 (US). However, as I show, many projects include individuals who pledge many hundreds of times the typical amount, for no obvious benefit. This chapter studies the ways these large contributors behave and estimates their impact on the success of projects in reaching their fundraising goals. I do this using a primarily empirical framework, but rely on consumer modelling to provide structure for the empirical work.

This chapter makes three substantial contributions. First, I systematically document the existence of large contributors in this market. With these facts, I then examine the driving forces behind large contributions, finding that they display an apparent preference for being effective in helping projects succeed; indeed, a substantial proportion of large contributions occur simultaneously with projects reaching their goal, often being pivotal in the success of a project.
Second, I develop a consumer choice explanation of how large contributions are made, which creates predictions from a theoretical model to explain the observed facts. Finally, I further examine the role large contributions play in project success using an instrumental variables approach. I find that the ability of a project to attract large contributors is important: a project is approximately 40-60% more likely to succeed if it can attract a large contributor. Large contributions also appear to be disproportionately effective relative to their size, indicating they not only provide support in the amount needed, but also when it is needed.

Finally, Chapter 4 returns to the problem of sales, and addresses an elementary question in this area: how do we know when a product is on sale? Many studies, such as Berck et al. (2008), Pesendorfer (2002), and my own work in Chapter 2, need to know whether or not a given product is on sale at a particular time before they can move on to address questions of more specific interest. Generally, this is based on very little data: a sequence of prices, and a handful of other potentially useful or interesting variables. Papers in this area adopt a heuristic rule to tell whether a product is on sale, based on the observed pattern of prices. This problem of price classification is fundamental to research in this area, and every author has their own specific method, based on their understanding of the product and the environment in which their data is generated. For example, Chapter 2 uses a clustering and fitted mixture model to tell whether or not a given product is on sale. Nielsen-Kilts suggests calling a sale any price more than 5% below the average for a product. There are many such methods, including fixed discounts from the previous price, rolling averages, etc. In this chapter, I develop a new method to classify sales, which explicitly takes advantage of the structure of how sales arise in the data.
This not only allows me to use more information to classify sales and to provide statistical assessments of how certain a classification is, but also helps link the classification step and the reduced-form analysis many papers seek to carry out. For instance, my method is able to directly test whether certain variables are correlated with sales, a central element of interest for many models of sales. This is advantageous because it circumvents any question of whether anomalies in the classification method could create spurious correlations detected by reduced-form estimation.

My method of sales classification is based on the hidden Markov model framework. After reviewing the basics of this model, I develop explicit identification results for left-to-right Markov models which can be applied to sales data. I also demonstrate how to include covariates and lagged state variables in the model, allowing for a variety of different patterns for sales to be classified and studied. I then perform Monte Carlo simulations of sales, and test the simulated data against my classification method. The performance is remarkably good, demonstrating that with good initialization of a sales model, sales can be classified with a high degree of accuracy using this method. I also show that this is robust to even “difficult” or complex sales environments, and that the direct testing of covariances is highly feasible with this method.
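
As a rough illustration of the classification idea described above, the following Python sketch decodes a simulated price series with a plain two-state hidden Markov model (regular versus sale) using the Viterbi algorithm. It is a minimal sketch under stated assumptions, not the block left-to-right specification developed in Chapter 4: the transition matrix, the price means, the common standard deviation, and all function names are hypothetical, and the parameters are treated as known rather than estimated.

```python
import math
import random

# Hypothetical parameters (assumptions for illustration only).
TRANS = [[0.8, 0.2],   # from regular: stay regular, move to sale
         [0.7, 0.3]]   # from sale: return to regular, stay on sale
MEANS = [5.0, 4.0]     # mean price in the regular (0) and sale (1) states
SD = 0.1               # common standard deviation of prices


def simulate(n, trans, means, sd, seed=0):
    """Simulate hidden states and Gaussian prices from a two-state chain."""
    rng = random.Random(seed)
    states, prices = [], []
    s = 0
    for _ in range(n):
        s = 0 if rng.random() < trans[s][0] else 1
        states.append(s)
        prices.append(rng.gauss(means[s], sd))
    return states, prices


def log_normal_pdf(x, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def viterbi(prices, trans, means, sd, init=(0.5, 0.5)):
    """Return the most likely state sequence given the prices."""
    k = len(means)
    score = [math.log(init[j]) + log_normal_pdf(prices[0], means[j], sd)
             for j in range(k)]
    back = []  # back[t-1][j]: best previous state if the state at t is j
    for x in prices[1:]:
        new_score, pointers = [], []
        for j in range(k):
            best = max(range(k), key=lambda i: score[i] + math.log(trans[i][j]))
            pointers.append(best)
            new_score.append(score[best] + math.log(trans[best][j])
                             + log_normal_pdf(x, means[j], sd))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the final period.
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


true_states, prices = simulate(200, TRANS, MEANS, SD)
decoded = viterbi(prices, TRANS, MEANS, SD)
accuracy = sum(a == b for a, b in zip(true_states, decoded)) / len(prices)
print(f"share of periods classified correctly: {accuracy:.3f}")
```

Because the two simulated price levels are far apart relative to the noise, decoding is easy in this example; the harder cases the chapter considers (covariates, higher-order dependence, small samples) are not represented here.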
I also investigate the small-sample performance of this method, to provide guidance for situations where data may be scarce. This chapter develops a new and powerful method for researchers in industrial organization to study sales. It is grounded in structural considerations of how sales come about and serves as a natural alternative to heuristics, including in situations where researchers may be unsure of which classification method is appropriate for their sales data.

Chapter 2

Sales and Perishable Products

2.1 Overview

Sales are everywhere in the retail landscape: a visit to nearly any store will find products being sold at a discount, regardless of whether that store sells power tools or heads of lettuce. Products with very different properties for consumers and retailers all display similar pricing strategies, characterized by periodic sales (temporary price reductions). Why is this the case? Economists understand the phenomenon of sales in the case of storable products. In most models of periodic sales, temporary price reductions occur because firms want to charge different prices to different types of consumers. When consumers can store, they can wait for lower prices, which allows firms to discriminate between them based on their ability to wait; the more patient the consumer, the more elastic the demand. Periodic price reductions keep the price high for most consumers, while offering a low price to only the most elastic (patient) group (see Berck et al. (2008); Pesendorfer (2002) for examples). In this chapter, I analyse sales on perishable products, which are particularly puzzling because they are not explained by conventional models for sales: since perishable goods cannot be stored, inventory-based explanations cannot be applied. These kinds of products are also important, since many staple goods such as meat and vegetables fall into this category.
In this chapter, I develop and evaluate a new model to try to explain this phenomenon.

The first step is to determine whether sales are an important part of the pricing strategy used by firms for perishable goods. Previous papers such as Hosken and Reiffen (2004) have found some evidence of this, while others such as Berck et al. (2008) have found suggestive results for semi-perishable goods such as orange juice. This literature tends to focus on models in which sales occur periodically, as most studies strongly reject models such as Varian (1980) in which sales are uncorrelated over time. I focus on an extreme case of perishability, looking at highly perishable goods: specifically, packaged meat. Packaged meat is an excellent case study because (1) it is uniform across stores and time, (2) it is extremely perishable, lasting less than a week at home, and (3) it is commonly purchased by consumers [1]. My dataset is the packaged meat segment from the Nielsen-Kilts SCANTRACK database from 2010-2013, including 17,708,731 observations across 9,605 stores and 461 products.

The results validate the findings suggested by other studies: using a linear probability approach, I find that the strongest effect is cyclic in nature. The marginal likelihood of a new sale rises from -3.4% one week after the last sale to +2.5% four weeks after it. This kind of periodicity is usually held up as evidence to support an inventory model for sales. However, as described, this is implausible for my dataset. One alternative possibility I examine concerns products expiring: I conclude that while stores are responsive to expiring inventory, this does not play a major role in the occurrence of sales.
This is sensible, since we should not expect it to be profitable for stores to discard large amounts of product routinely, especially in the packaged meat segment.

In order to carry out this analysis, similar to other papers, I need to adopt a method to determine whether or not a given price is a sale. My technique is based on a simple observation: all heuristics which determine sales try to separate the data into high (regular) and low (sale) prices over some time frame; for example, based on difference from the modal price in a specific window [2]. I do this explicitly, by estimating the price sequence as a mixture of regular (high) and sale (low) prices. Rather than choosing a single time frame, I break the series of prices into sections using a clustering algorithm to intelligently separate the sequence based on within-group variation. My mixture model is then estimated on each of these sections in turn. This is necessary because the large number of products and the long time frame make any single existing method unlikely to perform well; for instance, the suggested method in the SCANTRACK database is inadequate. This method can be given explicit economic context in the idea of two unobserved states: a slowly varying pricing “regime” and a quickly varying “sale.” Under regularity conditions that I develop, the long run “regime” can be recovered and observations classified into different pricing regimes. These conditions amount to a limitation on the price variability of products; prices cannot change “too much” or “too little” over time. This method also allows more detailed examination of whether points are sales or not, and makes explicit the conditions for the effective classification of the heuristic. In Chapter 4, I demonstrate how my method can be extended into a structural model, using a technique known as hidden Markov modelling.

[1] The consumers in my dataset are American and, based on the NHANES survey, consume an average of about 5.5 oz (130g) of meat per day, a rate that is one of the highest in the world (Daniel et al. (2011)).

[2] Examples include techniques like a sale is a price “5% below the average price” (as in SCANTRACK’s method) or “10% below the R3 month average price”.

Having verified that sales are an important part of the pricing strategy for perishable goods, I develop a model of dynamic loss leadership to explain these results. In my model, there is a single firm selling two goods, a perishable and a storable, to a large number of consumers. There are two kinds of consumers: singles and families. These consumers differ in (1) their tastes for goods and (2) their ability to store. Families consume perishables (which are not storable) and also have the ability to inventory the storable good. Singles do not store, and also do not consume the perishable good. The ability to store on the part of the families means that, over time, their stock of the storable good changes. When families are running out of stock, they would like to purchase both the storable and perishable goods; when they are in stock, they would like to purchase just the perishable good. Both types of consumers shop only once per period, and have the option of shopping at the firm or at a local source who sells only the perishable good at cost.

I assume that consumers are distributed in space, with some families located closer to the firm; consequently, the distant families face a higher cost of shopping at the firm. The firm then faces a problem: it cannot sell to all three groups (the singles, the local families, and the distant families) at their reservation prices for their baskets, since shopping costs drive a wedge between the different groups. In a situation where the distant families have high inventories (and low demand for the storable), the firm would prefer to sell to the local families and the singles at their willingness to pay.
In order to sell to the distant families the firm must provide a discount (a sale) on the bundle, which means lowering the price of one of the two goods. If the firm wants to have a sale, it is optimal to lower the price of the perishable good under the condition that singles buy more in total than the distant families. However, the monopolist wants to have a sale only if the new market it attracts (the distant families) is more valuable than the profit it gives up by discounting the perishable good for the existing market (the local families). If distant family inventories are high, the firm would prefer not to have a sale; if they are low, a sale becomes more attractive. The monopolist also does not want to keep the price of the perishable good permanently low since it loses money by competing in this fashion.

The firm makes this decision inter-temporally, understanding that if it delays a sale it benefits because inventories will drop over time, increasing the profit from the distant families. However, it also understands that when it holds a sale, the out-of-stock families refill their inventories and become in-stock again, resulting in an inter-temporal trade-off. Nonetheless, a sale will eventually occur, because as time elapses eventually everyone runs out of stock. The exact timing of the sale depends on the parameters of the model, but it will reoccur periodically and will be at least one period after the last sale, resulting in a temporary price change (not a permanent one). Families have rational expectations about prices, and sort over time into periods where their bundle is most affordable. The central prediction of this model is typical of loss leadership: periods with sales should be associated with higher-than-normal purchasing of other goods, particularly storables, relative to non-sale periods.
I also demonstrate a number of extensions, illustrating that this model is fairly robust to different assumptions about consumer behaviour and firm expectations.

Finally, I try to evaluate my model by going beyond the retail data, using the large-scale nature of my classification methodology to forge a link to the Nielsen-Kilts HMS consumer dataset, matching products at the product-store level. This highlights the usefulness of my heuristic and a large retail sample; in this data, since I need to look at just the perishable-buying consumers, I end up being able to focus on only 26,862 trips. Looking closely at the consumer data, I find that direct inventory explanations (the freezing of meat products) are largely ruled out but the central prediction of loss leadership holds up. The behaviour of consumers demonstrates that they spend more in total (across all goods) when they buy products on sale, and in particular on long-lasting (and highly profitable, from the store’s perspective) sundry products such as detergent, soap, and toilet paper. I also find that there is heterogeneity among different types of consumers, with loss-leadership being targeted to certain subsets of the consumers, indicating a substantial degree of sophistication on the part of stores in how they price products.

This chapter makes three main contributions. First, it shows explicitly that regular, periodic sales are an important part of the story surrounding the pricing of perishable products. This is important because it means that when we talk about sales, or model them, our explanations need to either take into account the special nature of these kinds of products or acknowledge that a “one size fits all” approach to explaining sales is not going to work. Second, I illustrate how to model sales for these products by developing a model which explains the periodic nature of perishable sales. As the discussion will show, I also explain how this model fits into the set of possible models and frameworks.
I further show that this model is plausible given the data by showing that the central causal connection necessary is supported by consumer choice data. This demonstrates the third contribution of this paper: showing how the connection of retail and consumer choice data can help evaluate different types of theoretical models. This necessitates the development of more flexible tools to determine when sales occur, and focuses attention on the economic content of these kinds of heuristic decisions empirical researchers make.

The remainder of this chapter is organized as follows: in section 2.2, I review some relevant background, while in section 2.3 I discuss the data and the details of the heuristic method I develop to classify sales. Section 2.4 contains my reduced form results for the importance of sales in the pricing of perishable products. Section 2.5 develops my theoretical model of sales, while section 2.6 evaluates my model using the linked retail-consumer dataset. Finally, section 2.7 summarizes and concludes, while Appendix A contains proofs, extensions of the model, and extra material. A further chapter of this thesis, Chapter 4, develops the structural version of the heuristic method presented in section 2.3.

2.2 Background

The role sales play in the economy has been the subject of several different lines of analysis, both theoretical and empirical. Sales are important because they are fundamental to prices in the real world, with implications for issues as varied as competition policy and macroeconomic price stickiness. They are also puzzling because, as Chevalier et al. (2000) describe, the evidence is firmly in favour of the causal relationship that high demand periods lead to low prices, instead of vice versa; sales occur in periods when demand is highest, in a causal sense. Furthermore, this occurs even in relatively competitive markets, such as grocery or retail environments.
Sales also appear to occur with regularity, ruling out supply-side problems on the part of the firm.³ Understanding the motivation behind sales is further complicated by the fact that most markets are complex in several dimensions at once, making it difficult to pin down any one explanation. For example, pricing a single product in a single grocery store may involve (1) a brand-name product versus substitutable generic products, (2) an upstream/downstream relationship between retailer and wholesaler, (3) store and chain pricing strategies, (4) inter-temporal and geographic competition, (5) consumer inventories, search behaviour, and expectations, (6) store-level managerial and performance incentives, and (7) many others. Given this range of factors to consider, many papers find mixed support for different motivations for sales (e.g. Berck et al. (2008); Hosken and Reiffen (2004)).

³ For example, a firm purchases too much product and needs to reduce inventory or sell product before it expires.

On the theoretical side, most papers focus on a single mechanism which can create sales, generally for simplicity of exposition and to provide insight into the economic mechanisms behind sales. Most models rely on asymmetry between consumers, which gives firms some way to discriminate between the consumers and maximize profits. This is often price discrimination, which, as Varian (1989) summarizes (based on Stigler (1987)), can “[be] present when two or more similar goods are sold at prices that are in different ratios to marginal costs.” In many cases it is difficult to call the competition a form of price discrimination specifically. For example, in models of loss leadership like Chevalier et al. (2000), or advertising-based models like Lal (1990) and Lal and Matutes (1994), the asymmetry between consumers is more subtle, induced, or non-existent.
Generally, these kinds of models will rely on consumer limitations, such as imperfect information, choice frictions, or travel costs, which firms exploit to maximize profit, with the result looking like sales.

One class of models can be described as search-based. Salop and Stiglitz (1977) model consumers who are unsure of where the low prices are in the environment, and spend time searching. A heterogeneous search cost leads to differentiation between monopolistically competitive firms which looks cross-sectionally like sales. Salop and Stiglitz (1982) extend this, moving the heterogeneity to an inventory level held by consumers instead, producing similar results while endogenizing the reason for consumer variation. One drawback of these models is that they imply cross-sectional variation in prices across firms, and not necessarily variation within a given firm; this is largely a consequence of their static nature, and different prices over time must be rationalized as “rotations” of the low prices through the economy. The chief problem with this is that such a “rotation” requires the equilibrium to change from period to period, without any model-based motivation for it to do so; such a pattern is acceptable, but there is no incentive creating it, nor ruling out any other pattern of equilibria over time.

To deal with this, Varian (1980) improved upon this model by changing consumer behaviour. Consumers make a decision about whether or not to perform costly search, then approach the store with the lowest price. This produces groups of searching and non-searching consumers which firms compete over using price. The equilibrium outcome of this model is a mixed strategy on the part of the firms, rationally supported by the expectations of the consumers about the pricing distribution. This naturally creates both cross-sectional and inter-temporal variation in the pricing, which looks like sales.
The major drawback of this model is that, for mixing to be effective, it must be uncorrelated over time; truly random, as pointed out in many empirical studies on the topic (e.g. Hosken and Reiffen (2004); Berck et al. (2008); Pesendorfer (2002)). This is problematic because the cyclic nature of sales is considered to be a major feature of most environments, and this randomization does not generally stand up to economic scrutiny.

One of the first papers to directly address this cyclicality was Conlisk et al. (1984). In this model, sales are caused by a monopolist responding to varying willingness to pay of long-lived consumers in the market. Over time, consumers with a low willingness to pay accumulate, until they reach a critical mass and the firm lowers prices to capture them all at once. Sales are created in a cyclic way by this dynamic pattern of arrivals. The question of why willingness to pay would change over time is not explicitly endogenized, but other models have tried to adapt this: this forms a second class of models with the basic idea being some form of inventory management.

For example, in Blattberg et al. (1981), sales are the result of long-lived consumers and firms facing differential storage costs. From the firm’s point of view, it is immaterial when a consumer buys a product; they will buy it sooner or later, so really the decision is when to move the product between the store and the household. Variation in stock-holding costs between the two economic agents leads to periodic price reductions: a sort of “warehouse clear-out” model. Pesendorfer (2002) explicitly builds a consumer inventory management model to explain the dynamics of pricing he observes in his data, as do Hendel and Nevo (2013). On a more macroeconomic note, Hendel and Nevo (2006) build a model focusing on consumers, and use it to estimate the elasticities of products which are likely to be inventoried.
Because inventories form a kind of buffer for firms, and a type of savings for households, this has economy-wide implications: for example, Wong (2016) finds that consumer inventory stockpiles are a significant form of savings for low to middle-income households.

A third branch of analysis tries to reconcile sales in light of the multi-product nature of most retailers, and the (often very complicated) multi-level firm-brand-chain structure many markets display. Many of these models show a sharp divide between marketing, business, and economic explanations. For example, in Lal (1990), national brands use retailers as proxies in a competition with local brands over a pool of price-sensitive consumers. It is profit maximizing for the national brands to vary their price randomly over time to capture part of the market for the price-sensitive consumers. This relies on the ability of national brands and chains to create and maintain brand equity to create “loyalist” consumers, something most economic models don’t try to address. Lal and Villas-Boas (1998) build on this model by extending it to include multi-product retailers along with the competing national brands. Competition remains over groups of price-sensitive consumers, with equilibrium outcomes interrelated in a similar way, but now across substitutable or complementary goods. Shelegia (2012) extends the idea of loyalists to multi-product retailing, demonstrating optimal differential pricing strategies based on the cross-elasticities of the goods in question.

Other models use spatial differentiation. Shilony (1977) describes how monopolistically competitive firms compete over prices using their locations for monopoly power. Sales pricing is the result of a mixed strategy Nash equilibrium where competition is over the marginal consumer.
DeGraba (2006) uses a similar spatial model to show sales are the result of price discriminating between different kinds of consumers who buy different bundles of products, explicitly using the multi-product nature of the firms to create what he describes as “loss-leadership.” Behavioural models also abound, examining sales pricing as part of a mix of potential marketing decisions (e.g. Yoo et al. (2000)), with a complicated relationship between pricing, quality, brand premia and sales (see, for example, considerations in Aaker (1996)).

The proliferation of so many different theoretical models, the macroeconomic implications of price variability, and the obvious importance of the problem of how to administer sales from a management point of view have led to a rich empirical literature studying sales. The papers closest in nature to this paper essentially follow a “differentiation” program of research; they note that in general, the different theoretical models make different predictions for how products, firms, and situations should appear in the patterns of sales which we observe. They try to evaluate these different explanations in light of the data, and possibly provide their own model for which results seem the most plausible. This has been enabled by the dramatic increase in the availability of scanner data from supermarkets, and the increasing affordability of both computational and data storage resources.

For example, Hosken and Reiffen (2004) look very broadly at the monthly data used by the United States Bureau of Labor Statistics to create the Consumer Price Index used to measure inflation (among other things). They find evidence against many of the mixed strategy models, such as Varian (1980), but have difficulty pinning down a single alternative model as the definitive winner.
This is largely due to the fact that most of the inventory models they consider fail to properly explain the patterns related to perishable goods in their data, coupled with the low granularity of the data available to them. On the other hand, Pesendorfer (2002) uses a study of ketchup bottles to develop and support an inventory model, again finding mixed results for some models of sales, but also looking at the role of chains explicitly. In one of the broadest studies to date, Berck et al. (2008) use orange juice (fresh and frozen) to test a wide variety of sales models, again finding mixed results for different kinds of sales, but relying explicitly on the time series nature of the data under examination. Hendel and Nevo (2013) develop an inventory model of sales based on soda pop data, similar to Salop and Stiglitz (1982) and Blattberg et al. (1981). This study is capable of quantifying the effects of sales, finding they are substantially profit improving for firms.

In general, the cyclic pattern of sales has proven difficult to explain; most mixed strategy models must explicitly rule it out in order for the equilibrium to be supported, or rely on very narrow and difficult-to-imagine equilibrium arrangements to support such a pattern. Accordingly, inventory models, which create an endogenous variation in the willingness to pay of consumers, are an appealing alternative. However, these are difficult to consider when it comes to inventory-less products like perishables; you cannot have a variation in inventory driving sales if no inventory can be held. Hosken and Reiffen (2004) point this out explicitly, noting that the same inventory model appears to apply to all of their products, even when it is difficult to explain why.
This paper examines the same point in a different context, and presents a model which connects inventories and perishable products through the multi-product nature of consumer buying.

2.3 Data and Sales Classification

2.3.1 Data and Sample Selection

This project uses four years of the Kilts-Nielsen SCANTRACK dataset, 2010-2013 inclusive. These data are a collection of weekly store-level sales data for every UPC⁴ processed by a large number of stores located in the United States (see Bronnenberg et al. (2009, 2012); Danaher et al. (2008) for examples of this dataset). For each store in each week, we observe the price of the product and the volume, along with several other covariates which will be discussed later. The products also have a number of hierarchical details, such as their product category, description, and for selected products additional information specific to the UPC description; the detail given is structured so that it is associated with the UPC registration. In the United States, the registration and use of UPCs is designed primarily to facilitate the differentiation of different products by packaging type (size, number, etc.) rather than by qualitative features relevant to the consumer. Accordingly, the SCANTRACK dataset has a similar structure for UPCs. Similarly, the data contain information on the stores, including chain membership and locations; however, precise identification of the stores (or chains) is not available or recoverable under the terms of the data-sharing agreement.

⁴ UPC stands for Universal Product Code, a barcode system widely used in English-speaking countries for tracking products of a uniform type across stores. It facilitates inventory tracking, checkout, and store management at both the retail and wholesale level and is connected to the company registration number used for firms. The term UPC code is equivalent, albeit somewhat redundant.

In this paper, I focus on perishable products; as explained in Hendel and Nevo (2013) and Hosken and Reiffen (2004), the pricing patterns of perishable products are difficult to explain with inventory-based models. Specifically, I look at a specific subset of perishable goods: all UPC-coded packaged fresh meat products.⁵ I focus on packaged fresh meat products (as opposed to other kinds of grocery store perishables) for three reasons. First, packaged meat is highly perishable, typically lasting for less than a week after it is packaged. This makes it an excellent example of a perishable good, since it is very difficult to inventory or store. Equally useful is the fact that the shelf-life is less than the data collection period in the SCANTRACK dataset, which means stores seek to sell through their weekly inventory within the period observed. This means we have a close connection between the life of the product and the periods we observe it for. Second, unlike many perishable products (lettuce, apples, etc.) packaged meat is highly uniform in appearance and quality, making it practical to compare products across stores and across time. Finally, it is important: for most Americans perishable meat products like the ones studied form an important part of their weekly or daily diet. This makes them not only inherently economically interesting, but also ensures that consumers are likely to include them in their consumption bundles.
These features will prove very important when we turn our attention to SCANTRACK’s sister database at the consumer level, which I discuss in section 2.6.

⁵ For example, this means including only individually packaged 16 oz units of lean ground beef, and not random weight ground beef sold through the butcher’s counter or packaged in store, or 16 oz packages of ground beef sold 3-for-1 as a “multi-pack.”

One challenge with these data is that, as in Hendel and Nevo (2013), the collection method of the SCANTRACK dataset can cause certain problems for inferring pricing: specifically, because SCANTRACK is collected on a weekly basis, prices are reported as the volume-weighted average of the prices sold during the week. When the pricing period for a product agrees with SCANTRACK’s window, this is not a problem. However, if (as some stores do) the period in which prices change occurs in the middle of the week, the observed pricing becomes very noisy and difficult to impute. Fortunately, this is easy to detect in the data: stores which change their pricing regularly during the week have a very large number of unique prices relative to observations for each product. We exclude such stores based on a fraction criterion: stores with more than 1/3rd of their observations demonstrating unique prices are deemed to be those which have changes during the week. Given that products with higher frequencies would involve changing all pricing and signage every other week, this is a reasonable restriction. Additionally, in order to make observations more similar, I include only grocery stores (channel “F”) in the data; this principally excludes mass merchandise stores (such as Walmart, Costco, Sam’s Club, etc.) which sell products other than food. I also focus on products with at least three years of data.

Since individual stores may have distinct pricing schemes (based on regional demand, location, etc.)
for each product, my unit of observation is the UPC-Store combination. After the preceding restrictions have been made to the data, I am left with 17,708,731 observations across 9,605 stores and 461 different UPCs, for a combination of 92,465 unique UPC-Store level observations. These are dispersed across 66 different chains of grocery stores located in 173 distinct counties in 49 states. I report the numerical descriptive statistics for my sample in Table 2.1. One interesting feature of the SCANTRACK dataset is that Nielsen has collected information on some products using store-level audits. A “feature” indicates whether or not a product was featured in advertising for the week. A “display” indicates whether or not a product was put on display in the store in a given week. Unfortunately, this is only carried out on a random sample of the population of stores; in my data, about 19.8% have this information.

However, this information alone is not sufficient to study sales; critically, the incompleteness of the “feature” and “display” metrics means we need to adopt a method to classify which observations are on sale from the time series of prices. The administrators of the SCANTRACK dataset make the suggestion that a sale is any price 5% below the average price for that product in a given store. However, this is inaccurate for most products, especially over a long time frame. Accordingly, I develop an alternative method in the following section, and apply it to the sample of data. Due to the large scale of the dataset, this is impossible to monitor manually; this is partly the purpose behind the method I develop. I apply two low-level diagnostic tools to eliminate poor fits: first, I reject any product for which the estimation of regular and sale prices could not converge within a reasonable number of repetitions. Second, I apply a “warning” label to any observations for which the procedure failed to converge in 1000 iterations. The first is clearly a serious mistake and should be excluded; the second is less obviously serious, since it merely implies that the clustering fit is not “optimal” for at least one repetition of the clustering method. After estimating, I see about 0.07% fail critically, and exclude them from the reduced form results.

2.3.2 Sales Classification

When is a product on sale? If we want to understand the decisions behind why firms put products on sale, we need to have some sense of when a product is actually on sale or not. This seems simple, but is actually not straightforward at all. Generally, a researcher will only observe a time series of prices (and perhaps quantities) for a particular product in a particular store.⁶ Some of these prices may be high, some may be low, but classifying these prices into sale or regular prices is complicated by two features: (1) long-term price fluctuations driven by structural change in the business (like wholesaler price increases or market-wide shocks to demand) and (2) idiosyncratic fluctuations driven by store-level unobservables, such as store price changes. To make these concerns concrete, consider Figure 2.1; this depicts the price pattern associated with 20 oz packages of a single meat product in a single grocery store over several years. As we can see, there is a repeated pattern of high, then low, prices; what we would normally call “sales.” There is also a long run change in the price, increasing over time for both the high and low prices. Additionally, we can see that in certain weeks, prices are “near” the usual low price but not exactly; are these sales or not? The problem of determining which observations are sales is not straightforward; it requires a method to classify them.

Accordingly, most studies rely on a heuristic methodology based on particular characteristics of the dataset.
For example, Hosken and Reiffen (2004) look at the difference below the modal price, while Hendel and Nevo (2013) use a fixed deviation from a typical price point. The basic idea is that we can study sales in these environments by choosing a product we understand well, with desirable features,⁷ then use that information to determine what is, or is not, a sale. This has the advantage that the economist’s understanding of the product informs the definition of a sale, but also has attendant disadvantages. To be clear, this approach is generally a good one, but is intractable in situations where we have large amounts of data on relatively heterogeneous products or no clear priors for how sales should appear in the dataset. It’s hard to give a clear answer to the question of “what is a sale” when presented with an arbitrary product, and a “one size fits all” approach may not be appropriate in many situations.

⁶ Even the creation of variables indicating whether or not a product is on sale is suspect. For example, even if you collected information on sales from flyers at a chain level, the likelihood that a given store would be in perfect compliance is low. Store-specific sales could occur, or sales could run out of the sale-marked product. Even store-level data is suspect; without retrospective analysis of prices, even apparent sales can be (intentionally) misleading. This is particularly acute in retail environments where “price anchoring” and misrepresentation of regular product prices skirt closely to the legal limitations on such practices (see, for example, Manjoo (2010); Tuttle (2014)).

A natural way to understand all of these heuristics is that they separate the prices into two groups: high “regular” prices and low “sale” prices.
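To make the two-group view concrete, the following is a minimal sketch of two simple rules of the kind mentioned so far: a cut-off a fixed margin below the mean price (in the spirit of the 5% SCANTRACK suggestion) and a rule flagging anything below the modal price. The function names, the example prices, and the exact form of each rule are illustrative assumptions, not the implementations used in the cited papers or in this thesis.

```python
from collections import Counter


def sales_below_mean(prices, margin=0.05):
    """Flag prices more than `margin` below the series mean (assumed mean cut-off rule)."""
    mean_price = sum(prices) / len(prices)
    cutoff = (1.0 - margin) * mean_price
    return [p < cutoff for p in prices]


def sales_below_mode(prices):
    """Flag prices strictly below the modal (most frequent) price (assumed modal rule)."""
    modal_price = Counter(prices).most_common(1)[0][0]
    return [p < modal_price for p in prices]


# Hypothetical weekly prices: a regular price of 4.99, deep discounts to 3.99,
# and one "near-regular" week at 4.79.
prices = [4.99, 4.99, 3.99, 4.99, 4.99, 4.79, 4.99, 3.99]
print(sales_below_mean(prices))  # the 4.79 week is not flagged
print(sales_below_mode(prices))  # the 4.79 week is flagged
```

The two rules disagree on the 4.79 week, which is the kind of ambiguity about "near" prices that motivates a more explicit classification method.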
The difference between these two prices is then detected by some kind of averaging or filtering process which tries to determine which price is likely to be the “regular” price, then classifies sales on this basis. For example, SCANTRACK suggests that the mean price is close to the “regular” price, and the filter which screens for sale prices is a 5% margin cut-off. I take this general notion and make it explicit: I assume that the observed series of prices is a mixture of regular prices and sale prices, plus some noise. I then proceed to estimate a mixture model to determine the regular and sale prices from the data; if the price sequence was relatively stable, this would be straightforward. However, as we can see in Figure 2.1, the sale and regular prices change over time. In order to deal with this, I break the time series into sections, and estimate the mixture model separately for each section. This separating process is depicted in Figure 2.2.

However, rather than simply fix a single rule about how to divide the time series into sections, I adopt a more flexible method based on k-means clustering. To understand this method, it is worthwhile to understand the economic intuition behind the idea of both sales and long run variation in pricing. The perspective I take is that stores make weekly decisions about prices which are governed by a long-term pricing strategy (called a “regime”). Returning to Figure 2.1, the idea is that in any given period a store chooses between two prices: a regular price, or a sale price. For many models of sales (including that of section 2.5) this is the natural pricing decision. This also agrees with the idea of a “mixture” of two price levels, developed above. The values of these prices are part of a pricing regime which can change over time, causing both the regular and the sale price to also change.
For example, we can see that in the first three months of 2010, the price of this product varied by about 70 cents, repeating in a cyclic pattern of prices. Then, in April, the price variation began alternating between higher levels; eventually, by September this variation had changed to about 50 cents. This kind of variation is a common occurrence. There are similar examples of such patterns in our dataset, and in other research on the topic (see Pesendorfer (2002); Hendel and Nevo (2013) for example). These changes in the repeating patterns of high and low prices are changes in what I call a pricing regime; the change from high to low price within the regime is the sale. Small variations in these prices which are non-systematic, such as those in February 2013, are examples of the potential idiosyncratic features which may arise.

⁷ For example, most products used have some form of up-stream brand connection to anchor prices, such as the standard price of a 2L bottle of Coca-Cola.

This means we are defining, informally speaking, a sale as a temporary reduction in the price of a good (to the sale price), relative to a regular price which occurs for a more substantial period of time. This agrees with the “time test”⁸ legal definition of a sale, which is one of the main legal ways a sale is regarded. As mentioned, if prices are observed for long enough, this is complicated by the fact that the ordinary price may also change over time. For example, most companies will perform systematic price changes for all their products at least once or twice a year, which are passed through to the retail channel. Similarly, most grocery stores will re-evaluate how they price products at least annually. We have already seen examples of this in Figure 2.1. To make this definition more formal, I now begin to impose some structure on the data we observe.

Consider a representative product being sold at a single grocery store over time.
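For one such price series, the informal idea developed above (split the series into sections, then separate each section into a regular and a sale price level) can be illustrated with a minimal sketch. Several simplifications here are assumptions made purely for illustration: the section boundaries are supplied by hand rather than chosen by clustering, and a two-centre k-means split (Lloyd's algorithm in one dimension) stands in for the mixture-model estimation. The function names and example prices are hypothetical.

```python
def two_means(prices, iters=100):
    """Two-centre k-means on 1-D prices; a stand-in for the mixture estimation."""
    low, high = min(prices), max(prices)
    for _ in range(iters):
        # Assign each price to the nearer centre; ties go to the high (regular) centre.
        low_grp = [p for p in prices if abs(p - low) < abs(p - high)]
        high_grp = [p for p in prices if abs(p - low) >= abs(p - high)]
        new_low = sum(low_grp) / len(low_grp) if low_grp else low
        new_high = sum(high_grp) / len(high_grp) if high_grp else high
        if (new_low, new_high) == (low, high):
            break  # centres stopped moving
        low, high = new_low, new_high
    return low, high


def classify_sales(prices, boundaries):
    """Flag sales section by section; `boundaries` are hand-chosen split indices."""
    flags = []
    edges = [0] + list(boundaries) + [len(prices)]
    for start, end in zip(edges[:-1], edges[1:]):
        section = prices[start:end]
        if not section:
            continue  # skip empty sections from repeated boundaries
        low, high = two_means(section)
        # A price is a sale if it sits nearer the low centre of its own section.
        flags.extend(abs(p - low) < abs(p - high) for p in section)
    return flags


# Hypothetical series: regular/sale of 4.99/4.29, then a new regime of 5.49/4.99.
prices = [4.99, 4.29, 4.99, 4.99, 4.29, 5.49, 4.99, 5.49, 5.49, 4.99]
print(classify_sales(prices, boundaries=[5]))
```

In this example 4.99 is the regular price in the first section and the sale price in the second, so a single series-wide cut-off would misclassify one of the two sections, while the section-by-section split does not.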
The researcher observes the price $Y_t$ and a set of covariates (such as amount sold, product department, weight, etc.) $X_t$ for each (discrete) time period $t = 1, 2, \ldots, T$. We imagine that at each period the grocery store has several different options to choose for its price, $Y_t \in P$. These prices correspond to the pricing strategy the grocery store has settled on using at that time period; we leave this unmodeled, but we could imagine the pricing options are the equilibrium strategies of a more complicated game played against other grocery stores. In any case, the set $P$ is the pricing set, and is defined as follows:

[8] In Canadian law (a similar definition holds in US or EU law), the Canadian Competition Bureau summarizes this (under Subsections 74.01(2) and 74.01(3) of the Competition Act):
[The Act] prohibit[s] the making, or the permitting of the making, of any materially false or misleading representation, to the public, as to the ordinary selling price of a product, in any form whatever. The ordinary selling price is determined by using one of two tests: either a substantial volume of the product was sold at that price or a higher price, within a reasonable period of time (volume test); or the product was offered for sale, in good faith, for a substantial period of time at that price or a higher price (time test).

Definition 2.3.1. A pricing set, denoted $P$, is a set of prices which consists of (1) a regular price $p$ and (2) $k_p$ discounts from the regular price $\delta_1, \delta_2, \ldots, \delta_{k_p} > 0$. The associated sale price is $s_k \equiv p - \delta_k$. Similarly, a sale (at discount $k$) is defined as the event that $Y_t = s_k$ and is written $S_t^k$. The observed price can then be written as $Y_t = p - \sum_{k=1}^{k_p} S_t^k \delta_k$.
The set of all pricing sets is $\mathcal{P}$.

These pricing sets are the options individual stores have to choose from; a pricing regime is the particular set that a store has adopted, which persists for a contiguous time. Since this is not directly observable, it is modelled as a state variable which we index with the integers; we represent this indexing with a function $R : \mathbb{Z}_+ \to \mathcal{P}$ which maps the integers into the set of all pricing sets.[9]

Definition 2.3.2. A pricing regime, denoted $R_t$, is a state variable which indexes the available pricing set for a store at time $t$ and lasts for a contiguous period of time.[10] That is, if $R_t = z$ then $P = R(z)$ is the available set of prices. We denote the associated prices and events by association with $R_t$: $p(R_t) > 0$, $\delta(R_t) = (\delta_1(R_t), \delta_2(R_t), \ldots, \delta_{k_p}(R_t)) \in \mathbb{R}^{k_p}_{++}$, $s_k(R_t) \equiv p(R_t) - \delta_k(R_t)$. If we define $S_t^k$ to be the event that a sale of type $k$ occurs, and $S_t = (S_t^1, S_t^2, \ldots, S_t^{k_p}) \in I^{k_p}$, then the observed price can be written as $Y_t \equiv p(R_t) - \delta(R_t) \cdot S_t'$; to agree with the notion of sales being relatively infrequent, we require that $P(S_t > 0) < 0.5$.

In general, I will make the simplifying assumption that $k_p = 1$, except when explicitly noted. Given that the number of regimes is not ex ante fixed, this is not overly restrictive as long as different sale prices occur in distinct time intervals. Note that, in general, when we discuss a single pricing regime, I will omit the $R_t$ parameter when there is no confusion. Notice that this implies that there are really two kinds of variation in the pricing: high-frequency variation driven by changes in sales ($S_t$) and low-frequency variation driven by changes in the pricing regime ($R_t$). Economically, we can imagine that sales are short-run strategic decisions being made by the store or chain, while pricing regimes are long/medium-run decisions about the overall pricing of a product or group of products. This distinction has been carefully studied in the macroeconomic literature on price stickiness (e.g.
Kashyap (1994); Bils and Klenow (2002)) due to the importance it has for general equilibrium models. This is illustrated in Figure 2.1, but similar figures can be found in most other empirical studies on this topic. As we can see, the price tends to jump up and down in a fairly cyclic pattern, two weeks of sales followed by two to four weeks of regular prices. However, the “regular” prices and the sale prices evolve over time, as does the frequency and structure of sales; for instance, some periods show far fewer sales than other periods. In terms of the model, we would call the long-run evolution of the pricing the “regime” $R_t$ while the week-to-week discounting of the price is the “sale” $S_t$.

[9] Since the number of possible date/firm combinations is discrete, the use of the integers as an index set is without loss of generality.

[10] That is to say, if $R_t = R$ and $R_{t+k} = R$ then for all $s \in [t, t+k]$, $R_s = R$.

If we believe that our data is generated according to a series of pricing regimes, this means that in a given regime $R_t$, the distribution of prices must follow the noise-free mixture model:

\[
f_0(y \mid R_t) =
\begin{cases}
P(S_t = 1 \mid R_t) & \text{if } y = p - \delta \\
P(S_t = 0 \mid R_t) & \text{if } y = p \\
0 & \text{otherwise}
\end{cases}
\tag{2.3.1}
\]

I illustrate a method to simulate this model, with error, in the appendix. Unless otherwise mentioned, the counterfactual data is simulated from the appendix model, except where it is used to illustrate the econometric features of the model. Notice that for our purposes, we are only interested in the parameters $p$ and $\delta$, since they allow us to tell when a product is on sale or not; the parameter $P(S_t = 1 \mid R_t) = 1 - P(S_t = 0 \mid R_t)$ is not of primary interest. This is useful, because we would like to be able to identify $p$ and $\delta$ without imposing a particular structure on sales; they could be independent, they could be serially correlated, etc.

Lemma 2.3.1. In the model defined by equation 2.3.1, $p$ and $\delta$ are identified, provided that $1 > P(S_t = 1 \mid R_t) > 0$ and $n > 2$.

Proof.
This follows from the fact that the distribution is very stark; if $Y \sim f_0(y \mid R_t)$ and $1 > P(S_t = 1 \mid R_t) > 0$ then $p - \delta = \min(Y)$ and $p = \max(Y)$. Therefore, $p$ is identified directly, and $\delta = \max(Y) - \min(Y)$.

The condition that $P(S_t = 1 \mid R_t) \in (0, 1)$ is the assumption that both sales and regular prices occur within the regime. This is not restrictive, since if we are interested in examining sales, it is a sensible requirement to assume that sales actually occur. Notice that this result does not rely on imposing any structure on when sales occur; while this term determines the shape of the distribution, the parameters of interest for our purposes are governed by the bounds of the distribution, which are not affected by the frequency or structure of sales.

           Obs       Mean     Std Dev   Min     Max
Price      All       5.13     2.57      0.01    27.36
Units      All       23.82    46.06     1       6956
Feature    19.81%    0.045    0.207     0       1
Display    19.80%    0.008    0.088     0       1
Table 2.1: Descriptive Statistics

Figure 2.1: A typical time series (20 oz packaged meat), levels omitted for privacy

There are two problems with actually using this model. First, we know that there are small variations in the prices (errors, idiosyncratic noise, etc.) which cannot be captured by this model. Second, because variations from this discrete distribution cannot be captured, estimating the model can be very difficult. This is because the likelihood function, in terms of $p$ and $\delta$, is flat almost everywhere (i.e. except on a set of measure zero). This is not amenable to any estimation method which does not explicitly use the sample analogue of the identification procedure above, which is not feasible due to the first problem outlined above.

In order to work around these difficulties, I make a simplifying assumption by adding an error term: namely, that the observed values are $Y_i = p - \delta S_t + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma)$.
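To make Lemma 2.3.1 and the role of the error term concrete, the following is a minimal Python sketch. It is not the appendix simulation: the function names `simulate_regime` and `identify_by_bounds`, the parameter values, and the i.i.d. draws of sales are all assumptions made purely for illustration.

```python
import numpy as np


def simulate_regime(p, delta, sale_prob, n, sigma=0.0, seed=0):
    """Draw n prices from one regime: Y = p - delta * S + noise.

    S is drawn i.i.d. Bernoulli(sale_prob) purely for convenience
    (an assumption); the identification argument does not need independence.
    """
    rng = np.random.default_rng(seed)
    sale = rng.random(n) < sale_prob
    noise = rng.normal(0.0, sigma, size=n) if sigma > 0 else np.zeros(n)
    return p - delta * sale + noise, sale


def identify_by_bounds(y):
    """Sample analogue of the lemma: p = max(Y), delta = max(Y) - min(Y)."""
    y = np.asarray(y, dtype=float)
    return y.max(), y.max() - y.min()


# Noise-free regime: the bounds recover p and delta (up to floating point).
y_clean, _ = simulate_regime(p=5.0, delta=0.7, sale_prob=0.3, n=200)
print(identify_by_bounds(y_clean))

# Same regime with small errors: the bounds now overstate both p and delta.
y_noisy, _ = simulate_regime(p=5.0, delta=0.7, sale_prob=0.3, n=200, sigma=0.1)
print(identify_by_bounds(y_noisy))
```

In the noisy case the maximum and minimum pick up the most extreme errors rather than the two price levels, which is one way to see why the sample analogue of the identification argument is not a usable estimator once errors are present.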
The additive Gaussian error implies that the simplified model is a two-component mixture of Gaussians, with the restriction that the covariance matrix is diagonal and symmetric. I choose a Gaussian noise term for two reasons: first, it has complete support, so even a badly-specified two-point distribution will put non-zero probability on even very different observations. Second, the Gaussian distribution is amenable to a number of numerical optimizations, which make it extremely fast to evaluate, allowing rapid estimation of such a model. The main drawback is that the distribution is non-skewed and can take on both positive and negative values, something we do not really see in the data; this is the motivation for the model in the appendix.

In general, mixture models suffer from generic non-identification, and require some normalization to be identified. However, this model naturally nests both assumptions required for identification: first, the condition that $\delta > 0$ is equivalent to an assumption on the labels of the mixture. Second, if we maintain the assumption that $1 > P(S_t = 1 \mid R_t) > 0$ (that sales actually occur), we have identification of the model as explained in Titterington (1985)[11]. If this second condition fails, we cannot identify both $p$ and $\delta$. However, since we (by definition) believe sales are infrequent ($P(S_t = 1 \mid R_t) < \tfrac{1}{2}$), this implies we can still identify $p$, the regular price. Since the objective here is to classify sales, being unable to do so in regimes which do not show sales is not a serious problem. It is worth noting that, as in the simpler model above, the potential serial correlation of different components is not material; although from period to period the probability of being a sale may depend on the previous period, if we consider the model ex post and just consider $P(S_t = 1 \mid R_t)$ as an average over all possible histories, this mixture model applies, and identification of $p$ and $\delta$ follows, as before.

[11] The basic requirements for identification of a Gaussian mixture distribution are (1) that the mixtures have non-zero proportion, (2) the means of the component distributions are different, and (3) that the labels of the components are not exchangeable.

However, this requires that we know which points are associated with a given pricing regime. Most heuristic methods essentially amount to assuming that all points are grouped into a single regime, then using some method to produce the mixture division made explicitly before. The problem then becomes how to separate the time series into separate regimes in a way which is tractable for large amounts of data. I do this by focusing on the data as a two-dimensional object $(Y_t, t)$ and then using a clustering method while over-weighting $t$ to preserve the sequencing of the data. Since pricing regimes are contiguous and the timing is evenly spaced, this implies that the centroid (the arithmetic mean of the data points) of each regime is located at the midpoint of the time duration of the regime. The height of this point is governed by the average price; in a clustering method, points are assigned to whichever of these centres is closer. The essential character of this method is to break the sequence of points into blocks which have minimal within-block dispersion. Under conditions developed in the Appendix under Assumption A.2.1, this process not only separates the data “correctly” but also aligns the blocks with the economic structure developed above.

However, this structural interpretation requires some conditions. First of all, since we are interested in determining whether a product is on sale (and not which regime it is in) it is without loss of generality if we imagine all regimes are of the same duration; if they aren’t, we can break the larger ones into two smaller parts.
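The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the reported results: the function name `cluster_regimes`, the standardization of both coordinates, the value of `time_weight`, and the choice to start from equal-length contiguous blocks are all hypothetical, and the number of clusters `k` is taken as given.

```python
import numpy as np


def cluster_regimes(prices, k, time_weight=3.0, n_iter=100):
    """Assign each period to one of k clusters using k-means on (price, time)."""
    y = np.asarray(prices, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    # Standardize both coordinates, then over-weight time so that clusters
    # stay close to contiguous blocks of periods.
    y_std = (y - y.mean()) / (y.std() or 1.0)
    t_std = time_weight * (t - t.mean()) / (t.std() or 1.0)
    points = np.column_stack([y_std, t_std])
    # Starting partition (assumption): k contiguous blocks of near-equal length.
    labels = (np.arange(n) * k) // n
    for _ in range(n_iter):
        # Centroid of each cluster = arithmetic mean of its points.
        centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        # Stop at a fixed point, or if a cluster would become empty.
        if np.array_equal(new_labels, labels) or len(np.unique(new_labels)) < k:
            break
        labels = new_labels
    return labels


# Two regimes of 60 weeks each, with a sale every third week (illustrative data).
weeks = np.arange(120)
on_sale = weeks % 3 == 0
prices = np.where(weeks < 60, 5.0 - 0.7 * on_sale, 6.0 - 0.5 * on_sale)
labels = cluster_regimes(prices, k=2)
print(labels[:60].max(), labels[60:].min())
```

In this example each point is closer to its own regime's centroid in both the price and the time coordinate, so the two blocks are recovered; with regimes of unequal length or very similar price levels the same procedure need not split the series at the true regime boundary.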
Equal durations are required because mechanically larger regimes would have more dispersion even if the price variation was the same. Once we have done this, correct clustering into regimes amounts to determining whether or not points will be closest to their regime’s centroid, as opposed to another. These conditions can be explicitly stated, as I develop in the Appendix in Assumption A.2.1. They amount to the requirement that prices cannot change “too much” or “too little” from regime to regime. Basically, a new regime should not have points too close to the centroid of the old regime.

The intuition for this is straightforward: suppose a regime with a regular price $p_1$ is followed by a new regime with a sale price $s_2 = p_1$. If these regimes border one another and regime 2 starts with a sale, it is impossible to tell whether this point is a sale from regime 2 or a regular price in regime 1. This corresponds to the new regime having prices which have changed “too much”. Similarly, if for regime 2 $p_2 = p_1$, the prices have changed “too little”, and we cannot tell when one regime starts and the other ends. This is not usually a problem, however, since the classification of sales from these two regimes would be the same. In general, increasing sequences of pricing regimes (provided they are not extreme) will be well-classified by this method. The most serious problem can arise when the price falls and a regular price of a new regime is close to the median price of the old regime. In this case, these prices will be misclassified. In the case where there are sales, this is generally fine, since the Gaussian error structure will interpret this failure of identification as noise. However, if there are few sales (or none) in the regime, this may create erroneous sales, which are actually just two regimes without sales.
This classification is not completely absurd; nothing, in the basic model, restricts this kind of sales behaviour. Nonetheless, it does not agree with what we would normally think of as a sale.

This is undesirable, but difficult to control for directly within the classification method. Fortunately, because we know the classification failure must occur in a particular way, the data should display “runs” of sales which are not likely if they were truly generated by the mixture process the model implies. This can be detected using standard runs-detecting techniques; for instance, the Wald-Wolfowitz non-parametric runs test (Wald and Wolfowitz (1940)). I use a slightly more sophisticated method, in which I use the fitted ex post probability of sales to determine the likelihood of a sequence of runs, then reject sequences which are less than 10% likely.

Since, in general, the number of regimes is not known ex ante, the most direct way to do this is by performing k-means clustering for several candidate values of k, then using a cluster comparison measure to decide on the “best” value of k. In order to try to ensure we have similar-sized clusters, and to capture approximately two price changes a year, we check k = 10 to 20. This also agrees with the cut-off for number of prices, which we will discuss in Section 3. I show the results of this technique, on four simulated datasets created via the data generating procedure, in Figure 2.2. It turns out that the most effective cluster comparison metric is the Davies-Bouldin measure (as explained in Davies and Bouldin (1979)), based on a manual comparison of several different metrics on both simulated and actual data. As we can see, the clustering technique does a generally good job, with the exception of the last graph, which incorrectly chooses too few clusters. This mainly occurs when the regimes are of different lengths but similar values; for instance, the same data is depicted in the third graph with a different time arrangement.

2.3.3 Estimation and Classification

Once the data has been divided into different sections, the remaining task is to classify the points within each regime into sale and ordinary prices. I do this by directly estimating the mixture model given above. Since the number of mixture components is small (two), this can be done directly using maximum likelihood estimation on the Gaussian mixture model, since the associated likelihood function is simple:

\[
\Pr(y \mid q, p, \sigma, \delta) = (1 - q)\,\phi\!\left(\frac{y - p}{\sigma}\right) + q\,\phi\!\left(\frac{y - p + \delta}{\sigma}\right)
\]

I demonstrate the performance of this in Figure 2.3, which illustrates the fitted values of the two levels in the model by black lines. As we can see, in the simulated data this fit is very good, largely due to the large sample size and well-defined regimes. However, the actual data performance is relatively good as well. There are a few errors, mainly coincident with the difficulties the model had with certain regimes, but even with these included the average fit is relatively good. I tested my methodology on a random sub-sample of the products, and performance is generally typical of the illustration, agreeing with what we would normally call “sales.”

To next classify the points, I take the fitted values and form a p-statistic: for each regime, I calculate the probability of a given observation being drawn from the regular price distribution for that regime. I then classify points based on this measure: points with relatively high p-values are classified as regular prices, while points with relatively low p-values are classified as sales. We can see this performance in Figure 2.4, where the colour of the point indicates a high probability of being a sale price (with blue being regular and yellow being a sale). As we can see, this performs well; at the standard (p = 0.9) level of significance, the method correctly classifies all points in the actual data.
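For a single regime, the estimation and classification steps can be sketched as below. This is an illustrative sketch under several assumptions, not the procedure used for the reported figures: it maximizes the two-component likelihood with the EM algorithm (the text only specifies maximum likelihood), it includes the usual $1/\sigma$ scaling of the normal density, and it reads the p-statistic as the posterior probability of each component, which is only one possible interpretation. The names `fit_two_price_mixture` and `sale_probability` are hypothetical.

```python
import numpy as np


def normal_pdf(y, mean, sigma):
    """Density of a normal distribution with the given mean and std deviation."""
    z = (y - mean) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))


def fit_two_price_mixture(y, n_iter=500, tol=1e-9):
    """EM for a regular price p and sale price p - delta with a shared sigma.

    Returns (p, delta, sigma, q), where q is the mixing weight on sales.
    """
    y = np.asarray(y, dtype=float)
    p, s = y.max(), y.min()          # start at the sample bounds
    sigma = max(y.std() / 2.0, 1e-6)
    q = 0.5
    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each observation is a sale.
        f_sale = q * normal_pdf(y, s, sigma)
        f_reg = (1.0 - q) * normal_pdf(y, p, sigma)
        total = np.maximum(f_sale + f_reg, 1e-300)
        w = f_sale / total
        loglik = np.log(total).sum()
        # M-step: weighted means, pooled variance, mixing weight.
        q = min(max(w.mean(), 1e-6), 1.0 - 1e-6)
        s = (w * y).sum() / max(w.sum(), 1e-12)
        p = ((1.0 - w) * y).sum() / max((1.0 - w).sum(), 1e-12)
        var = (w * (y - s) ** 2 + (1.0 - w) * (y - p) ** 2).mean()
        sigma = max(np.sqrt(var), 1e-6)
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    if s > p:                        # label normalization: delta > 0
        p, s, q = s, p, 1.0 - q
    return p, p - s, sigma, q


def sale_probability(y, p, delta, sigma, q):
    """Posterior probability that each observation comes from the sale component."""
    y = np.asarray(y, dtype=float)
    f_sale = q * normal_pdf(y, p - delta, sigma)
    f_reg = (1.0 - q) * normal_pdf(y, p, sigma)
    return f_sale / np.maximum(f_sale + f_reg, 1e-300)


# Illustrative regime: regular price 5.00, discount 0.70, 30% of weeks on sale.
rng = np.random.default_rng(1)
is_sale = rng.random(400) < 0.3
y = 5.0 - 0.7 * is_sale + rng.normal(0.0, 0.05, 400)
p_hat, delta_hat, sigma_hat, q_hat = fit_two_price_mixture(y)
classified_sale = sale_probability(y, p_hat, delta_hat, sigma_hat, q_hat) > 0.9
print(p_hat, delta_hat, sigma_hat, q_hat, classified_sale.mean())
```

The 0.9 threshold mirrors the level quoted in the text, but applying it to a posterior probability is an assumption of this sketch; a tail probability under the fitted regular-price distribution would be an equally plausible reading.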
The method does tend to over-fit errors as sales, but this can be corrected by performing the opposite test: instead of forming a statistic based only on regular prices, estimate it also based on sale prices. Then, classify sales only for those points which pass both tests. However, given that in the actual data the definition of what is an “error” and what is a small sale is not transparent, either method can be preferred given the application being considered. This is especially true when we recognize that some regimes may not have any sales in them. I illustrate points which pass zero, one and both tests for the data in Figure 2.5; this generally agrees with the intuition explained above. Using both tests provides a more conservative benchmark, and is definitely more robust to errors in the data. However, it also tends to remove large sections of the data from classification, considering it ambiguous. The performance on the latter actual sample indicates that for our application here, the one-test standard is probably appropriate, especially given the (previously mentioned) ambiguity. This ability to provide probabilistic assessments of the likelihood of a point being a sale is an improvement upon most heuristic methods, which adopt a “cut-off” rule decided ex ante instead of using the actual incidence of sales in the data.

Figure 2.2: Clustering on simulated and actual pricing data

Figure 2.3: Fitted values of model

Once we have classified the data to an appropriate level, we can then use it to answer questions about the dynamics and drivers of sales in the data being observed. The essence of this method is that the pricing regime provides an economically motivated way of dividing the time series into sections. Once this is done, we can separate the prices into two groups, regular and sale prices. From here, we can then estimate the probability that a given observation is a sale.
This method has several advantages when compared to other kinds of heuristics. First, it allows us to flexibly categorize many different types of products, relying on the clustering method to group or separate the data rather than choosing a time frame or ignoring the variation entirely. Secondly, it explicitly provides probabilities for observations to be sales, which gives us better ways of testing when our method is not working properly. Finally, it makes explicit the economic meaning behind the variation we observe and allows us to develop conditions under which we can be sure it is working properly. This is especially beneficial since, as discussed in section 2.3.4, this can be extended to an explicit structural model.

In the next section, we introduce the large-scale data analysis, and illustrate the performance of the classification scheme. We then provide reduced form evidence for the sales detected by the classification scheme in the data, illustrating different patterns and trends.

Figure 2.4: Classification of sales from data

2.3.4 Comment on Structural Models

This paper follows the literature by taking the following analytical approach: classify data into sales, then analyse it. This is typical in this area (cf. Berck et al. (2008); Pesendorfer (2002)) and is the preferred methodology for two reasons. First, it is flexible (you can examine many different explanations easily), and second, it is suitable for large amounts of data. However, as we can see, it requires some conditions which can be difficult to test explicitly. Many of these conditions are economically motivated; for example, the idea of using a runs test to infer misclassification of sales demonstrates that there is some structure to the environment we are not using. The way to address this is to incorporate it explicitly in a structural model.
Essentially, this is a different analytical approach: jointly classify data and analyse it. This has the advantage of doing away with the classification step, but has several disadvantages. Primarily, we need to have a good idea of what we are looking for to produce a model which can capture the dimensions we are interested in. Secondly, it is also difficult for structural models to be used on large datasets, since they can be very resource intensive, as is the case here. This approach is certainly valuable, and because of this value, I explicitly develop a structural model of sales which can be used to address the same reduced form questions I tackle in this paper. However, because it is not the main empirical focus (and adds nothing to the loss-leadership explanation) I develop this model explicitly in Chapter 4. This model is based on a hidden Markov model, adapted for this environment, and provides explicit identification and estimation techniques for use in sales data. This structural model is suitable in two situations. First, it is suitable when a detailed and robust connection between sales regime and sales behaviour is desired. For example, if an economist was primarily interested in examining the dynamics of sales regime changes, or how frequently a grocery store changes its pricing strategy, the structural model would be preferred. Second, it is suitable for smaller-scale data sets, where models can be neatly focused on a few variables of interest and where the number of observations is smaller. For example, this would be useful to economists performing detailed analysis on a few stores, or trying to analyse competition in a duopoly or oligopolistic market.

2.4 Reduced Form Results: What Drives Sales?

2.4.1 Model

Once we have classified the sales observations, we can then ask which features of the data drive sales.
My benchmark specification is a linear probability model:

\[
Y_{it} = X_i' \beta_2 + W_{it}' \beta_1 + \epsilon_{it} \tag{2.4.1}
\]

The dependent variable is an indicator for whether or not the given observation is a sale. The variable $X_i$ contains the different level fixed effects, while $W_{it}$ contains the time-varying values. I present the different estimation specifications in Table 2.2. I try to capture the different effects the literature, and theory, suggest for these kinds of products. I include terms related to whether or not a product has been selling below average for the last several (or rolling) weeks. These terms are included, as I will discuss below, to account for expiring or stale-dated inventory in the stores. Unfortunately, in this dataset I do not observe explicit store inventories of products, so I attempt to measure this using the likely product left on the shelf in a given week instead. I look at this over several time frames, to get an idea of both the incidence and the timing of supply management problems within a store. I also include the time since the last sale, which is the standard way to capture inventory features (as in Pesendorfer (2002)). When no sale has been observed, I use instead the censored time since last sale as a separate variable. These are not expected to have positive coefficients for fresh products, but since meat products can be frozen, this might play some role. Motivated by the patterns I see in the data, I also include a variety of terms related to the periodicity of sales. The simplest is to just count the time since the last sale; this is the “time since” measure for periods of 2, 3, 4, and 6 weeks. I also provide an alternative, which is whether the time since term modulo the period is zero; this captures whether it is a “multiple of the period”[12].
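The timing regressors described above can be constructed along the following lines. This is a hedged sketch for a single store-product series: the function names, the convention that the time since the last sale is zero in a sale week, the way the censored spell is counted, and the exclusion of censored weeks from the dummies are assumptions made for illustration rather than the exact variable definitions used in the estimation.

```python
import numpy as np


def time_since_last_sale(sale):
    """Weeks since the most recent sale (0 in a sale week).

    Before the first observed sale the spell is censored: the count is the
    number of weeks observed so far, and the censored flag is set.
    """
    sale = np.asarray(sale, dtype=bool)
    since = np.zeros(len(sale), dtype=int)
    censored = np.zeros(len(sale), dtype=bool)
    last_sale = None
    for t, is_sale in enumerate(sale):
        if is_sale:
            last_sale = t
        if last_sale is None:
            since[t] = t + 1
            censored[t] = True
        else:
            since[t] = t - last_sale
    return since, censored


def periodicity_dummies(since, censored, periods=(2, 3, 4, 6)):
    """'Time since' dummies (since == k) and modulus dummies (since % k == 0).

    Zero is excluded from the modulus dummies, since every sale week would
    otherwise trivially count as a multiple of the period.
    """
    since = np.asarray(since)
    observed = ~np.asarray(censored, dtype=bool)
    features = {}
    for k in periods:
        features[f"since_{k}"] = observed & (since == k)
        features[f"mod_{k}"] = observed & (since > 0) & (since % k == 0)
    return features


# Illustrative weekly sale indicator for one store-product pair.
sale = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
since, censored = time_since_last_sale(sale)
features = periodicity_dummies(since, censored)
print(since.tolist())
print(np.flatnonzero(features["mod_2"]).tolist())
```

The resulting arrays would enter $W_{it}$ alongside the other time-varying regressors; the difference between the two families of dummies is that `since_k` switches on exactly once per spell, while `mod_k` switches on at every multiple of the period.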
The main difference between the “time since” and “period” measures is based on an interpretation of how sales occur: time since is effectively like a hazard rate of a sale increasing, which has a parallel in the models discussed in Section 2.2 as accumulating demand. Periodic terms are more general, and can include many kinds of periodic explanations such as advertising timing or predictability in sale timing. I also use a variety of different controls, especially for time, state, chain, and UPC code. When possible, I use heteroskedasticity-robust standard errors.

[12] I also exclude zero; formally, 0 mod k = 0, but I omit this possibility since it leads to a strong effect without much meaning, since days on which there is a sale will be in this situation.

Specification           (1)          (2)          (3)          (4)
Method                  OLS          OLS          OLS          OLS
Below average lags      1,2,R3       1,2,R3       R2,R3        R2,R3
Time since last sale    X            X            X            X
Periodicity             2,3,4,6      2,3,4,6      2,3,4,6      2,3,4,6
Period Measure          Time since   Time since   Modulus      Time since
State Dummies           X            X            X            X
UPC F.E.                X            -            X            X
Chain F.E.              X            X            X            X
Month F.E.              X            X            X            X
Year F.E.               X            X            X            X
Errors                  robust       robust       robust       robust
Table 2.2: Estimation Specifications

Figure 2.5: Classification tests, two methods, p = 0.9

2.4.2 Results

Due to the size of the dataset (17 million observations), which causes problems with many estimation procedures purely due to size, I take a random 10% sub-sample of the store-UPC combinations across their entire time period. This results in 1,370,179 observations remaining; the results presented here appear to be relatively stable, having similar coefficients for (different) 1, 5, and 7% sub-samples as well. I present the results for the different specifications in Table 2.3 and Table 2.4. Many of the results are ones we should expect from perishable products, based on previous studies.

First, we can see that perishable products are responsive to sales. The number of units sold is a significant predictor of whether or not a product will be on sale; for every 100 units sold, the likelihood of the associated observation being a sale increases by about 1%. Most models of sales, such as Varian (1980) or Pesendorfer (2002), predict such a relationship. Models which do not, such as DeGraba (2006), generally do not make this prediction because of a focus on multi-product competition (and the size of the sale good is a normalization).

One possible reason for sales on perishable products is based on the idea of expiry. Previous studies of perishable products, such as Sweeting (2012), have shown that even relatively unsophisticated individuals (or, at least, individuals without access to analytical resources) are capable of correctly pricing products which have a diminishing utility over time. Therefore, we should expect that grocery stores, with their experienced managers and staff who are incentivized to reduce losses, would also be responsive to expiring product. Because of the patterns of retail delivery, an increase in stale-dated[13] product would be coincident with selling less relative to the average for a given store’s sales. The natural response would be to lower prices in a bid to remove the product from the shelves faster, reducing the stale inventory. We see this in the data; if a store undersells by 10 units in a given week, it is about 0.1% more likely to have a sale in the next week. However, over longer periods of time this pattern reverses; persistently low sales over three weeks show a decrease in the likelihood of a sale. These types of transient sales would be the most difficult to detect, since they are likely to occur towards the end of a week, and thus would result in only a small average price change for the whole week. With this fact in mind, we can consider this something of a lower bound on this effect.
Notice, this also belies a point made in Hosken and Reiffen (2004), which suggests perishable products should not display sales; this is not generally true, and is certainly not a point against the model suggested (that of Varian (1980)). However, a limited effect of expiring inventory is also something we should expect: stores generally receive several shipments of goods during a week, and would be able to reduce their intake of product if they are not selling to expectations during the first part of a week. Managers are also strongly incentivised to avoid throwing away or reducing margin on products, resulting in strong incentives to both accurately predict demand and then manage inventory carefully during a given week. A dramatic effect of expiry, especially one which would reoccur indefinitely, indicates substantial potential improvements in supply chain management by stores, and is at odds with profit maximization.

[13] Product which is no longer suitable for retail sale; perhaps not strictly expired, but either has minor quality problems or would pose a spoilage risk on the shelf.

We can also see that the suggestion that the duration since the last sale is unimportant for perishable products is upheld by the results. The time since the last sale (whether censored or otherwise) is a small, negligible negative predictor of whether or not a product is on sale; for most specifications, the effect is on the order of declining by 0.5 to 1% for 5 weeks without a sale. This is what we would expect from perishable products; there is no inventory motive, as in Pesendorfer (2002), for delaying sales, so sales should either happen infrequently and be related to underselling (thus being closely related if the underselling problem is not corrected in time) or simply never happen at all. This is likely the case; some UPCs show strong, negative coefficients on the likelihood of a sale. This makes sense for our data here; it also replicates a finding of Berck et al.
(2008) who found with orangesimilar results - despite the perishablility of orange juice being something of a question.14On the other hand, we do find good evidence of periodicity in the data. Specification(4), the modulus, indicates that a “x-weeks” rationale is less strong; but a simple assess-ment of the time since indicates that sales are 2-3% more likely 2 or 3 weeks after thelast sale, then less likely in other periods (-3%). This indicates one reason why the timesince the previous sale was insignificant in this model; it was already captured by lower-frequency variables with stronger effects. This is evidence for a inventory effect of somekind; which is not what we would expect for perishable products. There are clear rationalesfor why a retailer might want periodic (and predictable) sales: in terms of search costs,it is more efficient for the consumer, while it also allows them to accumulate inventory.However, there equally good reasons why a predictable sale period might be undesirable,since in many models it would allow opposing retailers to undercut them (as in Varian(1980)). This finding; that sales not only display temporal dependence, but actual period-icity is difficult to reconcile with the shelf life of the product in question. Is it possible thatwe are simply picking up the freezability of products? Or is there something more sophisti-cated at work here? In order to answer this question, and to distinguish between differentexplanations, I first develop a model of why we might see inventory-like behaviour in per-ishable products. I then use this model, with another consumer-level dataset, to see whichexplanations appear to hold up; this is developed in section 4.14The author notes he is not aware of anyone actually disposing of spoiled orange juice. It is perhapsfortunate that orange juice is certainly inventoriable, lasting upwards of 9 months in the pantry and 10+ daysin the fridge. Frozen juice lasts indefinitely.332.4. 
Table 2.3: Results for Estimation Specifications

Dependent variable in all four specifications: Sale indicator. Each entry is a coefficient with its robust standard error in parentheses. Entries are listed in column order, from specification (1) to (4); a row with fewer than four entries appears in only some specifications.

Units (100s): 0.120*** (0.003) | 0.076*** (0.002) | 0.078*** (0.002) | 0.076*** (0.002)
Below Average 1 (100s): -0.013*** (0.002) | -0.017*** (0.002)
Below Average 2 (100s): -0.001 (0.002) | 0.001 (0.001)
Below Rolling 3 (100s): 0.038*** (0.004) | 0.030*** (0.003) | 0.021*** (0.003) | 0.033*** (0.003)
Time since last sale (cens): -0.001*** (0.000) | -0.002*** (0.000) | -0.002*** (0.000) | -0.002*** (0.000)
Time since last sale: -0.002*** (0.000) | -0.003*** (0.000) | -0.002*** (0.000) | -0.003*** (0.000)
Time since, 2 periods: -0.034*** (0.001) | -0.017*** (0.001) | -0.019*** (0.001)
Time since, 3 periods: 0.025*** (0.001) | 0.038*** (0.001) | 0.038*** (0.001)
Time since, 4 periods: 0.024*** (0.002) | 0.030*** (0.002) | 0.030*** (0.002)
Time since, 6 periods: -0.028*** (0.002) | -0.028*** (0.002) | -0.028*** (0.002)
Below Rolling 2 (100s): 0.001 (0.002) | -0.018*** (0.002)
Time since, mod 2: -0.064*** (0.001)
Time since, mod 3: -0.039*** (0.001)
Time since, mod 4: 0.004*** (0.001)
Time since, mod 6: -0.050*** (0.001)
Observations: 1,370,179 | 1,370,179 | 1,370,179 | 1,370,179
R-squared: 0.101 | 0.069 | 0.078 | 0.069
UPC FE: X | X | X
Time FE: X | X | X | X
State FE: X | X | X | X
Chain FE: X | X | X | X
Errors: Robust | Robust | Robust | Robust

Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.10, † p<0.20. Sampling weights used, outliers excluded.
Table 2.4: Results for Estimation Specifications (No Volumes)

Dependent variable in all four specifications: Sale indicator. Each entry is a coefficient with its robust standard error in parentheses. Entries are listed in column order, from specification (1) to (4); a row with fewer than four entries appears in only some specifications.

Below Average 1 (100s): -0.018*** (0.002) | -0.021*** (0.002)
Below Average 2 (100s): 0.005*** (0.001) | 0.006*** (0.002)
Below Rolling 3 (100s): -0.001 (0.003) | -0.008*** (0.003) | -0.017*** (0.003) | -0.005* (0.003)
Time since last sale (cens): -0.001*** (0.000) | -0.002*** (0.000) | -0.002*** (0.000) | -0.002*** (0.000)
Time since last sale: -0.002*** (0.000) | -0.003*** (0.000) | -0.002*** (0.000) | -0.003*** (0.000)
Time since, 2 periods: -0.035*** (0.001) | -0.018*** (0.001) | -0.021*** (0.001)
Time since, 3 periods: 0.025*** (0.001) | 0.038*** (0.001) | 0.037*** (0.001)
Time since, 4 periods: 0.027*** (0.002) | 0.033*** (0.002) | 0.033*** (0.002)
Time since, 6 periods: -0.026*** (0.002) | -0.026*** (0.002) | -0.027*** (0.002)
Below Rolling 2 (100s): 0.001 (0.003) | -0.017*** (0.003)
Time since, mod 2: -0.064*** (0.001)
Time since, mod 3: -0.038*** (0.001)
Time since, mod 4: 0.006*** (0.001)
Time since, mod 6: -0.049*** (0.001)
Observations: 1,370,179 | 1,370,179 | 1,370,179 | 1,370,179
R-squared: 0.085 | 0.060 | 0.069 | 0.060
UPC FE: X | X | X | X
Time FE: X | X | X | X
State FE: X | X | X | X
Chain FE: X | X | X | X
Errors: Robust | Robust | Robust | Robust

Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.10, † p<0.20. Sampling weights used, outliers excluded.

2.5 A Dynamic Model of Loss Leadership

My model is based on the notion of loss leadership: consumers do not purchase products in isolation, but rather shop for bundles of different goods. Consumers bear a cost to shopping trips, and therefore prefer to buy all goods at the same location, which means a low price on one good can make the entire bundle of goods more attractive. In my model, there is a single firm selling two goods, a perishable and a storable, to a large number of consumers. There are two kinds of consumers: singles and families.
These consumers differ in (1) their tastes for goods (and willingness to pay) and (2) their ability to store. I assume that consumers are distributed in space, with some families (and the singles) located closer to the firm; consequently, distant families face a higher cost of shopping at the firm.[15] Families have the ability to inventory the storable good and also consume perishables (which are not storable). Singles do not store, and also do not consume the perishable good. The ability to store on the part of the families means that, over time, their stock of the storable good changes. When families are running out of stock, they would like to purchase both the storable and perishable goods; when they are in stock, they would like to purchase just the perishable good. Both types of consumers shop only once per period, and have the option of shopping at the firm or at a local source which sells only the perishable good at cost.

The firm then faces a problem: it cannot sell to all three groups (the singles, the local families, and the distant families) at their reservation prices for their baskets, since shopping costs drive a wedge between the different groups. In a situation where the distant families have high inventories (and low demands for the storable), the monopolist would prefer to sell to the local families and the singles at their willingness to pay. This follows since, in order to sell to the distant families, the firm must provide a discount (a sale) on the bundle, which means lowering the price of one of the two goods. If the firm wants to have a sale, it is optimal to lower the price of the perishable good under the condition that the singles are more numerous than the distant families. However, the firm only wants to have a sale if the new market it attracts (the distant families) is more valuable than the profit it gives up by discounting the perishable good for the existing market (the local families).
If distant family inventories are high, the firm would prefer not to have a sale; if they are low, a sale becomes more attractive. It is also unattractive to keep the price of the perishable good permanently low, since the firm loses money by competing in this fashion.

[15] The distribution of the singles is not important for the analysis; for simplicity, I assume they are all located near the firm.

This decision is made inter-temporally, with the firm understanding that delaying a sale is beneficial because inventories will fall over time, increasing the profit from the distant families. However, the firm also understands that when it holds a sale, the out-of-stock families refill their inventories and become in-stock again, resulting in an inter-temporal trade-off. Nonetheless, a sale will eventually occur, because as time elapses everyone eventually runs out of stock. The exact timing of the sale depends on the parameters of the model, but it will reoccur periodically and will be at least one period after the last sale, resulting in a temporary price change (not a permanent one). Families have rational expectations about prices, and sort over time into periods where their bundle is most affordable. The central prediction of this model is typical of loss leadership: sale prices should be associated with higher-than-normal purchasing of other goods, particularly storables. I also demonstrate a number of extensions, illustrating that this model is fairly robust to different assumptions about consumer behaviour and firm expectations.

Formally, we can model this as a dynamic game of complete information as follows. The first player is a monopolistic firm which produces and sells two goods $j \in \{s, p\}$ at constant marginal costs $c_j$. The good $s$ is a storable good, while good $p$ is a perishable good; these properties will be given a precise meaning shortly.
The game takes place in a discrete time, infinite horizon setting, in which all players make decisions in time periods $t = 0, 1, 2, \ldots$. Each period, the firm sets a single price $P_j$ for a unit of good $j$ with the objective of maximizing its total, inter-temporal, profit from the whole game. The firm discounts profit in time period $t$ by the factor $\delta^t$. In competition with the monopolistic firm, there is also a “local” firm which sells only good $p$ at cost; that is, there is always an outside option to buy the perishable good at price $c_p$.[16]

The second set of players is a large number of consumers who occur in two types $i \in \{F, S\}$ with masses $m_i$. These consumers differ in both their tastes for the goods and their ability to store. Each consumer is constrained to shop at exactly one store per period; they must do all their shopping at a single store, if they wish to buy anything. I make this assumption to focus on the consumer’s purchasing decision, rather than purchase frequency. Consumers of type $S$ do not store and consume only good $s$,[17] while consumers of type $F$ can store up to $k$ units of the storable good and may consume both types of goods. Furthermore, suppose that the families are distributed spatially in the economy; a fraction $\lambda \in (0, 1)$ are located at a distance from the firm, while the remainder are local to the firm.

[16] This reflects the fact that the perishable good is widely available and a low-margin product, from the point of view of the monopolistic firm.

[17] This is equivalent to assuming that their valuation for the good is less than $c_p$, and so the store would never sell to them.
This implies that the distant families face a shopping cost $\tau > 0$, while the local families and singles have a cost normalized to zero.[18] I will refer to these two sub-types as distant families ($DF$) and local families ($LF$).

At this point, I must make a decision about how consumers behave inter-temporally. Following Pesendorfer (2002), and in the interests of simplicity, I assume that consumers follow a reservation pricing strategy: specifically, they have reservation prices $v^i_j$ for the goods. This implies that if, in a given period, a one-good buying consumer of type $i$ demands good $j$ and $P_j < v^i_j$, then the consumer buys the good. Similarly, if a consumer is buying more than one good, then the sum of the prices must be less than the sum of the reservation prices. Consumers of type $F$ consume both types of goods, but not every period; they only want to buy good $s$ in periods when their inventory is not completely full. These reservation prices can equivalently be thought of as choke prices for the goods, with the consumers having unit demand for the goods. As I will show, in equilibrium this strategy is rational; the lack of explicit inter-temporal optimization does not restrict the analysis.

Finally, to ground the movement of consumer inventory, notice that the type $F$ consumers can be in one of $k$ states, corresponding to the amount of inventory left. Let the distribution of these consumers across the states at period $t$ be denoted $x_t$. Furthermore, suppose that $x_t$ evolves according to a Markov chain $M$ with the properties that (M1) $M$ is strictly left-to-right[19] and (M2) $M$ has exactly one absorbing state at $x_0 \equiv (0, 0, \ldots, 0, 1)$.[20] These conditions on the Markov chain amount to assuming that if the consumer has an inventory level $I_t$ at period $t$, then $I_{t+1} < I_t$ and $\lim_{t \to \infty} I_t = 0$. In other words, consumer inventories can only decrease over time, and consumers eventually run out of inventory.
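As a rough illustration of conditions (M1) and (M2), the following Python sketch builds one possible left-to-right chain and iterates the distribution forward from a fully stocked start. The depletion rule (a family uses up one unit with probability `p_one` and two units otherwise), the function names, and the choice of four inventory states are all assumptions made for this example; the model itself only requires that inventories fall over time and that the stocked-out state is absorbing.

```python
import numpy as np


def depletion_matrix(n_states: int, p_one: float = 0.7) -> np.ndarray:
    """Column-stochastic matrix M such that x_{t+1} = M @ x_t.

    State 0 is a full inventory; state n_states - 1 is stocked out.
    Hypothetical rule: from any state that is not stocked out, a family
    uses up one unit with probability p_one and two units otherwise.
    """
    M = np.zeros((n_states, n_states))
    last = n_states - 1
    for i in range(last):
        M[min(i + 1, last), i] += p_one        # move one state down
        M[min(i + 2, last), i] += 1.0 - p_one  # move two states down
    M[last, last] = 1.0  # (M2): stocked out is the only absorbing state
    return M


def simulate(M: np.ndarray, x_start: np.ndarray, periods: int) -> np.ndarray:
    """Return the distributions for periods 0..periods as rows of an array."""
    path = [np.asarray(x_start, dtype=float)]
    for _ in range(periods):
        path.append(M @ path[-1])
    return np.array(path)


M = depletion_matrix(n_states=4)
x_full = np.array([1.0, 0.0, 0.0, 0.0])  # every family fully stocked
path = simulate(M, x_full, periods=5)

units_left = np.array([3, 2, 1, 0])      # inventory held in each state
mean_inventory = path @ units_left       # average stock in each period
print(path.round(3))
print(mean_inventory.round(3))
```

In this example the average inventory falls every period until all of the mass sits in the stocked-out state, which is the behaviour the two conditions are meant to guarantee.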
The important assumption here is that this process is Markovian, in the sense that only the mass of consumers in the state matters, not their tenure in that state, and that they eventually all run out of stock. Finally, I assume that consumers shop at the lowest possible cost, subject to the reservation prices. If consumers are indifferent between the monopolist and the local store, they choose the monopolist (since it has a wider selection of products; a consideration I do not model explicitly). Two particular states will be useful to define: $x_1 \equiv (1, 0, \ldots, 0)$, the state in which all consumers have filled up their inventories completely, and $x_2 \equiv M x_1$, the state immediately following the fully-stocked state.

[18] We could make the equivalent assumption about the singles, without loss of generality; however, as we will see, distant singles would not add anything to the analysis.

[19] That is to say, $M(i, j) > 0 \iff j \geq i$ and $M(i, i) = 0$. This strictness can be relaxed, but at the expense of considerable additional complexity and little added to the model.

[20] That is to say, the only stationary point of the distribution is with all consumers stocked out.

For the results, some assumptions are necessary beyond the modelling set-up itself:

• (A1) Assume that $v^S_s = v^F_s = v_s$; both types of individuals have the same reservation price for the storable good.[21]
• (A2) Assume that $v_s > c_s + \frac{\tau}{\lambda k}$; that is, it is possible for the firm to make a profit from attracting the distant families.
Note that this also implies that both goods are attractive for the firm to sell.
• (A3) Assume that $v_p = v^F_p = c_p$; that is, the families choose a reservation price at marginal cost, which is rational since the local source offers at this price.[22]
• (A4) Assume that $m_S > k(1 - \lambda) m_F$; the singles buy more than the local families.

[21] This is primarily to simplify the analysis; a difference in willingness to pay leads to essentially the same conclusions. Note that the distant families effectively must have a lower willingness to pay, since they also face a shopping cost $\tau$.

[22] This assumption is only for notational and expositional convenience. If we relax assumption (A3), the key change is that assumptions about the profitability of the storable good become statements about overall profitability; all the intuition and results are the same.

Under assumptions (A1)-(A4), the model can be analysed as follows.

2.5.1 Analysis

The first step to analysing this model is to note that, from the firm’s point of view, there are three different markets: the local families with mass $(1 - \lambda) m_F$, the distant families with mass $\lambda m_F$, and the singles with mass $m_S$. Suppose a distant family has an inventory of $I_t$; then, they would like to buy up to $k_t \equiv k - I_t$ units of the good. They will buy both items at the firm when the prices they pay, plus the shopping cost, do not exceed their reservation value for the bundle:

$$P_p + k_t P_s + \tau \leq v_p + k_t v_s$$

If this condition is not met, they will forgo buying the storable good, and instead buy just the perishable good locally at a cost of $c_p$. The firm will never set a price $P_p > c_p$, since no one will buy perishables from it at this price; this also implies that there is no situation in which the distant families will buy only the storable good; if they are buying the storable, they will buy the perishable as well, since it must be at least as inexpensive as the outside option.

Now, to begin the analysis, consider a situation in which the firm is only interested in attracting the local families and the singles.
It is optimal for the firm to set $P_p = c_p$ and $P_s = v_s$, since this means that both singles and local families buy from the firm and obtain no surplus; call these “regular prices.” At the regular prices, both types of consumers are charged exactly their reservation prices. Define $k^*(x)$ to be the average of $x$, weighted by the inventory; this is the average inventory demanded by families in state $x$; the total demand will be $m_F k^*$. This yields a total profit of (under (A3)):

$$\pi_R(x_{LF}, x_{DF}) = m_S (v_s - c_s) + (1 - \lambda) m_F k^*(x_{LF}) (v_s - c_s)$$

Now, notice that the firm cannot charge the same prices and attract the distant families, since $\tau > 0$. This means there is a trade-off between which consumers the firm attracts; in order to attract the distant families, the firm must offer a total discount (relative to the regular prices) of $\Delta = \tau$. This discount is precisely why this is a model of “sales” rather than general price-setting. In order to sell to the entire group $DF$, the firm must set either (a) $P_p = c_p - \Delta$ or (b) $P_s = v_s - \Delta$. That is, it can discount either the price of the perishable product or the price of the storable product; the linearity in utility means it would never use a mixture of discounts, and similarly would never increase the price of one good, since this only makes attracting this group more difficult. Notice that the discount on the storable is higher than necessary for most consumers; a consumer with level $k^*$ will be attracted when $P_s = v_s - \frac{\Delta}{k^*}$; the firm could sell to just some of them by setting a lower discount. The firm is trying to decide which of these possible strategies to use; the following Lemma shows that under assumptions (A1)-(A4), it is always better to use (a).

Lemma 2.5.1. Suppose (A1)-(A4) hold. Then, the firm prefers to discount the perishable product, rather than offer any discount on the storable good.

Proof.
From the monopolist’s point of view, the “best” possible discount would be to set $P_s = v_s - \frac{\Delta}{k}$, since this is the smallest possible amount by which it could discount the bundle and still attract consumers. Notice, similarly, that the best possible situation would be to attract $DF$ families all buying $k$; the profit from selling to this (maximal purchasing) group at this discount is the highest possible for any discount level. It is important to note that this is an upper bound on the profit from a strategy of type (b); this cannot actually occur, because not all consumers would be buying $k$ goods and so would want a higher discount. In any state other than the one outlined, the profit from the lowest discount would be lower. The firm also cannot do better by selling to any smaller group than this, since they must obtain the same total discount but buy strictly less of the storable good. In this situation, the profit from strategy (a) is:

$$\pi_a = \lambda m_F \left(v_p - \Delta - c_p + k(v_s - c_s)\right) + (1 - \lambda) m_F \left(v_p - \Delta - c_p + k^*(x_{LF})(v_s - c_s)\right) + m_S (v_s - c_s)$$

Similarly, the upper bound on the profit from (b) is:

$$\pi_b = \lambda m_F \left(v_p - c_p + k(v_s - c_s) - \Delta\right) + (1 - \lambda) m_F \left(v_p - c_p + k^*(x_{LF})(v_s - c_s) - \frac{k^*(x_{LF})}{k} \Delta\right) + m_S \left(v_s - c_s - \frac{\Delta}{k}\right)$$

Substituting in (A3), and comparing the terms, we see that:

$$\pi_a > \pi_b \iff -(1 - \lambda) m_F \Delta + (1 - \lambda) m_F \frac{k^*(x_{LF})}{k} \Delta + m_S \frac{\Delta}{k} > 0 \iff \frac{m_S}{k} > (1 - \lambda) m_F \left(1 - \frac{k^*(x_{LF})}{k}\right)$$

Since the smallest $k^*(x_{LF})$ can be is zero, under assumption (A4), the result follows.

The key intuition is that the two strategies essentially target different groups. If you discount the perishable good, you attract the distant families but lose profit on the existing local families who are buying the perishable good in any case. If you discount the sundry, you lose profit from both the local families and the singles. The best possible situation is when the local families aren’t buying the storable in that period anyway, but the profit from the singles is still foregone.
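The comparison in the proof of Lemma 2.5.1 can be checked numerically. The sketch below evaluates the profit from strategy (a) and the upper bound on the profit from strategy (b), both under (A3), together with the final inequality from the proof. The function names and all parameter values are assumptions chosen purely for illustration; they are not taken from the thesis or its data.

```python
def profit_a(m_S, m_F, lam, k, k_star_LF, v_s, c_s, delta):
    """Profit from discounting the perishable good by delta, using (A3): v_p = c_p."""
    margin = v_s - c_s
    distant = lam * m_F * (-delta + k * margin)
    local = (1 - lam) * m_F * (-delta + k_star_LF * margin)
    singles = m_S * margin
    return distant + local + singles


def profit_b_upper_bound(m_S, m_F, lam, k, k_star_LF, v_s, c_s, delta):
    """Upper bound on profit from discounting the storable good by delta / k."""
    margin = v_s - c_s
    distant = lam * m_F * (k * margin - delta)
    local = (1 - lam) * m_F * (k_star_LF * margin - (k_star_LF / k) * delta)
    singles = m_S * (margin - delta / k)
    return distant + local + singles


def lemma_condition(m_S, m_F, lam, k, k_star_LF):
    """Final inequality in the proof: m_S / k > (1 - lam) * m_F * (1 - k*/k)."""
    return m_S / k > (1 - lam) * m_F * (1 - k_star_LF / k)


# Hypothetical parameters satisfying (A2) and (A4): m_S = 10 > k(1 - lam)m_F = 8.
params = dict(m_S=10.0, m_F=4.0, lam=0.5, k=4, k_star_LF=0.0,
              v_s=3.0, c_s=1.0, delta=1.0)
pi_a = profit_a(**params)
pi_b = profit_b_upper_bound(**params)
print(pi_a, pi_b, pi_a > pi_b)

# Hypothetical violation of (A4): with fewer singles the ranking can flip.
small = dict(params, m_S=6.0)
print(profit_a(**small) > profit_b_upper_bound(**small))
```

With local families demanding none of the storable good, which is the worst case for strategy (a), the ranking of the two profits in this example depends only on whether (A4) holds.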
Consequently, if the singles (relative to the best-case purchasing from the distant families) are more numerous than the local families, the trade-off falls squarely on the side of the perishable good. This is the requirement that (A4) states explicitly.

With this established, the firm’s problem becomes more straightforward; we have essentially eliminated one method to attract the distant families. This also has a very important consequence: for both groups of families, if they shop at the firm’s store, they always completely refill their inventory regardless of their inventory position. This is important, so I will state it below:

Corollary 2.5.1. If a family shops at the firm, they will always refill their inventory of storable goods.

Proof. Notice that given Lemma 2.5.1, there are only two possible price regimes for the monopolist over time, and in both situations the price of the storable good is always $P_s = v_s$. In other words, given the shopping decision, the consumer always earns zero surplus from the storable good. Therefore, it is strictly better for them to buy a maximal quantity of the storable good whenever they purchase it, since they will never see a better price in the future and may run out.

This fact dramatically reduces the complication of the firm’s problem, since in a given period the family demand for sundries is exactly their outstanding inventory position. That is to say, if the average stock position is $k^*$ then families will buy exactly $k^*$ in aggregate, scaled by their mass. There are no inter-temporal dynamics for consumers, given the price sequence; they simply refill their inventories when it is profitable to do so. This similarly implies that the local families will only ever be in state $x_0$ or $x_1$; they have either just refilled their inventory, or will be refilling it this period; they never accumulate losses of product. In effect, they act exactly like the singles.
Notice, additionally, that since families are infinitesimal in size, there is no sense in which they would like to buy “less” than a full inventory in order to try and induce the firm to change its pricing pattern; their individual demand is simply too small to matter.

Turning attention back to the distant families, if the firm wants to attract them, it must hold a sale on the perishable good. The question is, would the firm want to attract them in the first place? If the firm focuses on just the local market, it obtains profit $\pi_R(x_{LF}, x_{DF})$, as defined above. However, if it chooses to hold a sale, Lemma 2.5.1 implies that it earns a profit (under (A3)) of:

$$\pi_\Delta(x_{LF}, x_{DF}) = m_S (v_s - c_s) + (1 - \lambda) m_F k^*(x_{LF}) (v_s - c_s) + \lambda m_F k^*(x_{DF}) (v_s - c_s) - \Delta m_F$$

Comparing these two terms, we see that

$$\pi_R(x_{LF}, x_{DF}) < \pi_\Delta(x_{LF}, x_{DF}) \iff \Delta < \lambda k^*(x_{DF}) (v_s - c_s)$$

Notice that since $\Delta = \tau$, when $k^*(x_{DF}) = k$ we have that $\tau < \lambda k (v_s - c_s) \iff v_s > c_s + \frac{\tau}{\lambda k}$, which is assumption (A2). In other words, assumption (A2) ensures that there is some state $x^*_{DF}$ such that the firm would prefer to hold a sale in this period. Similarly, for all states $x' = M^y x^*_{DF}$ for $y > 1$ the firm would also want to hold a sale, since the evolution of inventories follows a left-to-right Markov chain (condition (M1)). Additionally, we can note that starting from any inventory position, such a state must eventually be attained because condition (M2) implies that:

$$\lim_{t \to \infty} k^*(x_t) = k$$

That is, since eventually all consumers will run out of inventory, it is eventually optimal for the firm to hold a sale. Finally, we can complete the characterization of the behaviour of the model in a temporal setting by noting that after a sale the families all move to state $x_0$.
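The within-period condition $\Delta < \lambda k^*(x_{DF})(v_s - c_s)$ can be traced through time with a short Python sketch. It assumes, purely for illustration, that distant families use up exactly one unit of inventory per period and that state $i$ means a family is $i$ units short of a full inventory; the function names and parameter values are hypothetical. The sketch only finds the first period in which a sale beats regular prices in that period; it does not solve the firm's forward-looking timing problem.

```python
import numpy as np


def average_demand(x: np.ndarray) -> float:
    """k*(x): average number of units families need in order to refill.

    Assumes state i means a family is i units short of a full inventory.
    """
    return float(np.arange(len(x)) @ x)


def first_period_sale_pays(M, x_start, lam, v_s, c_s, delta, max_periods=100):
    """First t with delta < lam * k*(x_t) * (v_s - c_s), or None if never.

    This is only the within-period comparison of sale and regular profit,
    not the firm's optimal (forward-looking) timing.
    """
    x = np.asarray(x_start, dtype=float)
    for t in range(max_periods + 1):
        if delta < lam * average_demand(x) * (v_s - c_s):
            return t
        x = M @ x
    return None


k = 3
M = np.eye(k + 1, k=-1)  # hypothetical rule: one unit is used up each period
M[-1, -1] = 1.0          # the stocked-out state is absorbing
x_full = np.zeros(k + 1)
x_full[0] = 1.0          # all distant families have just refilled

# (A2) holds here: v_s = 3 > c_s + delta / (lam * k) = 2.
t_sale = first_period_sale_pays(M, x_full, lam=0.5, v_s=3.0, c_s=1.0, delta=1.5)
print(t_sale)

# If the required discount is too large for (A2), a sale never pays.
print(first_period_sale_pays(M, x_full, lam=0.5, v_s=3.0, c_s=1.0, delta=4.0))
```

Because average demand rises as inventories run down, the condition eventually holds whenever (A2) is satisfied, and it fails in every period when the required discount exceeds the largest possible gain from the distant families.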
This implies that the model is stationary, in the sense that if it is optimal to have a sale $T$ periods after $x_0$, then it is optimal to have the next sale $T$ periods after that one as well.

The preceding analysis simplifies the problem and helps illustrate the characteristics of the problem facing the firm. However, the firm’s problem is not static; it faces an inter-temporal decision and understands that when it holds a sale, the model will move back to state $x_0$. Essentially, the firm faces a trade-off; by holding a sale now, it refills inventories and obtains the profit from the distant families. However, by delaying a sale, inventories run down more and the marginal profit (relative to the regular prices) increases. The firm solves such a problem in a dynamic setting; however, the preceding work allows us to formulate this as a straightforward dynamic programming problem on the part of the firm, which results in the following theorem.

Theorem 2.5.1. There exists a unique pricing path in which firms place the perishable good on sale.

Proof. Denote $X_t = (x_{DF,t}, x_{LF,t})$ and recall that $x_{LF,t} \in \{x_0, x_1\}$. Letting the sale decision be $a_t \in \{0, 1\}$, with $a_t = 1$ denoting a sale, we can write this problem now as a well-defined value function

$$V(X_t) = \max_{a_t \in \{0, 1\}} (1 - a_t)\left(\pi_R(X_t) + \delta V(X_{t+1}(0))\right) + a_t \left(\pi_\Delta(X_t) + \delta V(X_{t+1}(1))\right),$$

where the transition function is given by:

$$X_{t+1}(a) = \begin{cases} (M x_{DF,t}, x_1) & \text{if } a = 0 \text{ and } x_{LF,t} = x_0 \\ (M x_{DF,t}, x_0) & \text{if } a = 0 \text{ and } x_{LF,t} = x_1 \\ (x_0, x_0) & \text{if } a = 1 \text{ and } x_{LF,t} = x_1 \\ (x_0, x_1) & \text{if } a = 1 \text{ and } x_{LF,t} = x_0 \end{cases}$$

The dynamics on $x_{LF}$ are included for completeness, but do not affect the decision of the firm. We can immediately note that this is a bounded function, since $\pi_R$ and $\pi_\Delta$ are bounded and, by the note above, the time before a sale must be finite.
Next, we can note that the mapping $T(V(X_t)) \equiv \max\{\pi_R(X_t) + \delta V(X_{t+1}(0)),\ \pi_\Delta(X_t) + \delta V(X_{t+1}(1))\}$ meets Blackwell’s Sufficient Conditions. First, it is monotone, since if $W \geq V$ for all $x$ then

$$T(W) = \max\{\pi_R(X_t) + \delta W(X_{t+1}(0)),\ \pi_\Delta(X_t) + \delta W(X_{t+1}(1))\} \geq \max\{\pi_R(X_t) + \delta V(X_{t+1}(0)),\ \pi_\Delta(X_t) + \delta V(X_{t+1}(1))\} = T(V).$$

It is discounted, since for any $A \in \mathbb{R}$,

$$T(V + A) = \max\{\pi_R(X_t) + \delta V(X_{t+1}(0)) + \delta A,\ \pi_\Delta(X_t) + \delta V(X_{t+1}(1)) + \delta A\} \leq T(V) + \delta A.$$

Therefore, it is a contraction mapping and has a fixed point; that is, there is a unique optimal value function which satisfies the above. The existence of a policy sequence $\{a_t\}$ which admits sales follows from the existence of such a value function and the fact that sales must occur, as established above.

Beyond existence, I am also able to use the facts we established earlier about the properties of the equilibrium to make the following observations.

Corollary 2.5.2. (Properties of Equilibrium) The unique equilibrium of the model has the following properties: (1) Sales will occur (in the sense of a temporary price reduction). (2) Sales will occur periodically and indefinitely. (3) When sales occur, they occur on the perishable good (not the storable). (4) Consumers who buy the perishable good on sale also buy more of the storable good than those who buy the perishable good at the regular price. (5) There are exactly two prices charged in equilibrium for the sale (perishable) good, low and high.

Proof. The first property follows immediately from the fact that $a_t$ is not identically zero, since eventually sales must occur. The second property is more subtle, but holds because once a sale occurs, the key state variable in the model is $x_{DF} = x_0$. This implies that after every period in which a sale happens, the state of the economy is the same, and as the comment above mentions, this model is stationary. This implies that sales will occur periodically. Property 3 follows from Lemma 2.5.1, as explained earlier. Property 4 follows
from the fact that the distant families buy $k^*$ units of the sundry good during sale periods, while the local families buy at most $k^*(x_1)$, which can be (functionally) imagined as a normalization. If it is worth it to wait a little while to hold sales, as the empirical results show, the demand will be higher. Property 5 follows from the fact that the price of the perishable good is either $c_p$ or $c_p - \Delta$, a distribution of prices characterized by a (high) regular price and a (low) sale price.

Most of these properties are what we would expect from the evidence presented in Section 4; they capture many of the stylized facts about perishable pricing. However, the key idea of loss leadership is not easy to test at the retail level, since it is a statement about consumer behaviour, not firm behaviour. In particular, property 4 is not testable using the retail data but makes a clear prediction: consumers who buy the perishable good on sale should also buy more of other goods, in particular of profitable storable products. I look at this suggestion, along with other considerations, in a later section.

Discussion

This model explains many features of the sale pricing of perishable goods, with a sharp characterization of both the type of good placed on sale and the manner in which it is sold. Some of the precision of these conclusions is the result of the stark nature of some aspects of the model; these assumptions were made in order to focus on the intuition and dynamics, rather than to present a “realistic” model of store decision-making in a multi-product environment. In effect, the goal of the model is to present one “piece” of the store’s complete pricing puzzle, in which they face many more problems but also have many more tools and dimensions with which to solve them.
In the model, I focus on the family/single distinction, but this is just a framing mechanism for the intuition; in reality, grocery stores, using consumer loyalty cards and purchasing information, are likely able to come up with far more subtle connections between consumer demands. In fact, while it is useful to give the single type of consumer a frame to guide intuition, in the model they play the role of “all other sundry buying consumers” that the grocery store is seeking to sell to. This also helps motivate the assumption that their demand is large, relative to the local families; I am not explicitly making a statement about two demographic groups, but instead saying that the group which the firm seeks to distinguish using loss leadership is small, relative to all other consumers. This is a way of attempting to disentangle the firm’s problem and consider just one class of goods, instead of considering the problem as a whole.

This discussion also highlights the important role of the assumption that perishable purchasing perfectly separates the two consumer types. If singles bought some of the perishable product, the relative sizes of the two groups $m_i$ would matter more directly, since the firm’s trade-off in pricing the products becomes more entangled. Firms would still need to jointly price each bundle, but now they would face the problem that they cannot charge completely independent prices for the different bundles. This is related to the discussion over how firms select certain products to be loss-leaders; in this model, and that of DeGraba (2006), a product is selected for loss leadership on the basis of an association with another, profitable product. My model, with its explicit treatment of shopping costs, is closest to Lal and Matutes (1994), where loss-leaders are selected based on consumer willingness to pay, but trades off dynamic pricing and inventory for explicit modelling of expectations and advertising.
This is similar to the motivation in Chen and Rey (2012), where firms use consumer bundles to attract shoppers to their store. However, it differs from explanations like Johnson (2014), which have to do with the salience of the product being offered. Nonetheless, these two models make similar predictions; if we believe that perishable stock is well-understood by consumers,[23] this should be selected in the model of Johnson (2014) as well as my model. Overall, this makes it difficult to rule out a behavioural component; the two motivations are complementary, from a firm’s point of view. A more subtle point has to do with why we expect the sundry/perishable relationship to be focused on by firms; after all, there are many possible relationships which could be exploited, using different types of consumer demands. The behavioural complementarity is one possibility; an intuitive alternative is that sundry goods are highly profitable, so if a firm was focused on trying to increase volume in its profitable segments, it would focus on sundries, which would (if the model holds) lead it to discount perishables. This would be in contrast to, or alongside, sales on other goods with desirable affiliations; a model which is not developed in this paper.

[23] A good test is as follows: how much chicken do you have in the fridge at home? How many weeks of paper towels? Which of these questions was easier to answer?

The assumption about the inventory process of the singles being very stark (no storage, unit demand) is also not essential; as I mentioned, this group of consumers is really a representation of “all other” demand for the sundry good not related to the perishable product. In this case, unit demand could be easily relaxed. Storage, however, would create a dynamic feature to both the families and singles which would change the condition for within-period optimality of a sale on the perishable to be $m^t_S \geq m_F$, which may or may not hold.
As mentioned in the analysis, the local families act “like” singles in a sense; giving the singles inventory dynamics would make them even more similar. In this case, sales on sundries could occur in periods in which the firm considers the demand to be relatively low. However, there would (as in the original model) be inter-temporal considerations which complicate straightforward analysis. Similarly, the notion that the completely out-of-stock consumers buy no storables when they run out is really just a simplification; we can imagine them buying from the local source some “hold-over” amount at an unattractive price to keep them going while they wait to refill their inventory. This variation on the model does not change the implications, but is omitted since it adds complication at little explanatory benefit.

The Markovian nature of the consumer inventory is also a restriction, primarily for simplicity. Families will always consume some inventory, but the amount they consume is only dependent on their existing stock, not their prior consumption. In other words, at the micro-foundation level the consumers are being hit with independent demand shocks at the end of each period. In aggregate, these effectively average out since there are very many consumers. However, a more robust process is also feasible at the expense of considerable additional complication on the part of the model. I omit this to focus on the connection between products, rather than trying to perfectly model how consumers deplete their inventory of a storable good.

A final general comment is about the reservation price assumption used; this is a limitation in the set of equilibria we are considering. A more complicated model would explicitly model a “waiting cost” to being out of stock, then have consumers inter-temporally sort based on their expectations and understanding of the state. However, within the model, the restrictions our assumptions place are minimal.
For instance, the assumption that $v^F_p = c_p$ is rational; consumers in equilibrium face either a price $P_p = c_p$ or $c_p - \Delta$, both of which meet the reservation price restriction; the reservation price is consistent with consumer expectations. The other reservation price, $v_s$, is also rational since only a single price (the expected price) will be charged in equilibrium. Notice that it is also rational for consumers to fully refill their inventory when they are out of stock and see a sale; since sales are always the same and always repeat, there is no benefit to delaying since the optimally lowest price has already been achieved. These features demonstrate that our equilibrium, while of a particular sort, includes consumers who are both forward looking and fully rational, in a framework which I believe is a good approximation of how consumers might make these kinds of choices in the real world.

Competition, Duopoly, and Monopoly

In this model, the firm serves effectively as a monopoly; it is the only one setting prices. However, the existence of the outside option makes this not a pure monopoly environment; consumers have the opportunity to buy the perishable good elsewhere, if necessary. If the outside option did not exist, there would be no trade-off on the part of the families; they would have to go to the store every period no matter what. This would allow the monopolist to always charge regular prices, and the model would not capture the dynamics we are interested in. Alternatively, we would have to model forgoing the perishable, which greatly complicates the model by adding more (undesirable) inter-temporal dynamics on the part of the families. It is not necessary for the outside option’s price point to be $c_p$; this is primarily for notational and expositional reasons, as explained in the model set-up.
However, it does have the (appealing) feature of capturing the fact that perishables are relatively low margin with many substitutes, while storables are not.

In order to analyse the role of competition more directly, I also develop in Appendix A a general model of loss leadership which nests both the duopoly and competitive alternatives. The set-up is similar: we have two groups of consumers, bachelors and families. These two types of consumers have different lifestyles: bachelors live alone, in small apartments, while families live together in larger houses. This manifests in two related ways: first, families have more storage capacity for the sundries than bachelors, and second, the families consume more perishables per capita from the grocery store, since the bachelors find it difficult to cook and prepare for one (and so eat out more, instead). These basic differences between the consumers will ultimately result in sales, since inventory dynamics on the part of the families lead them to occasionally purchase more than the bachelors. The fact that they also purchase meat gives the stores a way to price discriminate between the two consumers, offering a more attractive total bundle price to the families when they are buying large amounts of sundries. High purchases of sundries occur in a cyclic fashion alongside low prices of the perishable good, resulting in a cyclic pattern for sales.

The key difference between the two models is that in the duopoly setting, the consumers are more limited in their decision making and sophistication, which results in a closer connection between the inventory process and the sale pattern. In our model, this is moderated through the reservation price, inventory dynamics, and inter-temporal sorting. The conclusions from both models are largely similar, with sales driven by the connection between consumer inventories and tastes.
However, the existing model makes sharper predictions and admits a better representation of the consumer decision makers, which makes it my focus in this paper. On the other hand, the conditions required of the inventory process (specifically, its periodicity) are more clearly spelled out in the duopoly setting, as are the implications of competition. As in the pure monopoly case discussed above, competition eliminates sales, but this time as firms compete away their profits; it is clear that in all three settings, the profitability of the different consumer groups to the firm plays a very important role. It is also important to note that the conclusions of this model seem largely robust to variations in the set-up; similar conclusions can be reached through a variety of modelling approaches, which illustrates that the underlying economic intuition is robust.

Quantity Discounts and Other Competitive Strategies

As mentioned earlier, the strategy of discounting perishable products to attract the consumers who buy more of other (profitable) products is just one tool available to firms. A natural alternative is the use of quantity discounts. Note that in the model the single individuals always purchase unit demands of the storable good, while the families purchase up to their k units. A sensible alternative way for the firms to compete would be to offer a quantity discount for the package. This is certainly possible, and could be an alternative pricing strategy, since it would allow them to effectively charge two prices to the two different sets of consumers. However, this relies closely on the assumption of unitary demand on the part of the single individuals. As explained earlier, this is not a fundamental part of the model; it is just a frame of reference, to provide a comparison with the families.

In the discussion of Lemma 2.5.1, it was clear that different consumers with different demands would need different marginal discounts.
Offering a range of quantity discounts would be a way to implement this, resulting in perfect discrimination between consumers and no incentive to put sales on. However, if packages don’t fully cover the range of demands, this would not be an effective strategy; for instance, perhaps the wholesale producers of the goods do not make all package sizes, or perhaps there are costs to offering many products. The heart of the trade-off facing the firm is the cost of lowering the price of the sundry versus the price of the perishable; package size is one way of making this trade-off more granular, and therefore enacting a more profitable strategy. The firm seeks, fundamentally, to discriminate between different types of consumers; different characteristics of demand, including both bundle composition and package size, can do this.

This highlights the importance of viewing sales on perishables as part of a pricing strategy on the part of the firm. Not all pricing or competitive features are captured by this model, and the existence of other kinds of strategies is not evidence against the model. Instead, they show the complex manner in which stores can price products, and the overlapping considerations different strategies must take into account. In particular, there is a close relationship between sales and quantity discounts in this model; this is made even more explicit in Section A.3, which outlines a sense in which quantity discounts can be weakly dominated by sales.

2.6 Going Behind Sales: Consumer Choice and Sales Pricing

Most of the predictions of our theoretical model occur at the retail level, and capture the stylized facts I presented in Section 2.4. However, the prediction that perishable-buying consumers also purchase more storable goods in sale periods when compared to non-sale periods is not testable using retail-level data.
In order to test this prediction (and rule out other explanations), in this section I go beyond the retail-level data by extending my data set using a household panel dataset, the Nielsen-Kilts HOMESCAN or HMS survey.

2.6.1 The Nielsen-Kilts HMS Survey

In addition to the retail-level data used in Section 2.4, both to impute when sales have occurred and to determine the different factors which are related to when they occur, the Nielsen-Kilts dataset also includes a companion dataset known as the HMS. This is a large panel of consumer scanner data; households, selected by Nielsen on the basis of household demographics and other factors, are tracked over several years as they buy products from different stores. The intent of the dataset is to be representative of the United States population as a whole; thus Nielsen employs a particular sampling methodology to select and retain households in the survey. Each household is given a household scanner, with which they record the amount and price of products they purchased on retail shopping “trips.” For each trip, the products and their prices are recorded, as well as information about which store the goods were purchased from. This information allows a link to be made between the consumer level and the retail level, since Nielsen uses the same UPC and same store code at both levels.

The scale of this dataset is very large, although much smaller than the retail dataset used earlier. The dataset is also divided into two sub-groups, referred to as “magnet” and “non-magnet.” The magnet households also record spending on loose (non-UPC-coded) items. The non-magnet households do not record such data. Unfortunately, the sampling methodology used by Nielsen means that the two groups are not comparable; analysis must be done with either the magnet or non-magnet households.
I restrict my sample to just the non-magnet households for three reasons: first, the data is more reliable, since magnet data needs to be entered manually. Second, the magnet households are much smaller in number than the non-magnet group. Third, the number of matches I can make between my retail data and the households is larger for the non-magnet subset. I show some preliminary statistics for this dataset in Table 2.5. Each year contains around 51 million product purchases across roughly 5.5 million trips, with around 60,000 households each spending approximately $75 per trip. However, there is substantial variation in the spend; some households spend nothing on a given trip, while others spend thousands of dollars.

In order to match my retail-level data, I use the years 2010-2013 of the consumer panel dataset. In order to cross-reference with my sales data from the preceding step, I match the UPC-store combinations from the retail level with the UPC-store combinations by week. One problem arises in that the consumer panel is recorded daily (or, at least, whenever a consumer makes a trip) while the retail data is aggregated to the weekly level. In order to match them up, I re-assign the consumer panel observations to the week in which they would have been recorded at the retail level. Effectively, this means that they are recorded as being the Saturday of the week in which they were originally recorded.

Since the goal is to examine the bundles individuals buy when items are, or are not, on sale, I collapse the dataset down to the trip level. Trips which do not include any of the UPC codes in the linked retail-level dataset are omitted from consideration. The resulting dataset is described in Table 2.6. The consolidated dataset contains 28,474 unique trips, of which about 23% include an item purchased on sale.
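The week re-assignment and trip-level collapse described above can be sketched as follows. This is a minimal illustration rather than the code used for the thesis: it assumes weeks run Sunday through Saturday, and the function names, record layout, and example rows are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_ending_saturday(day: date) -> date:
    """Return the Saturday closing the week that contains `day`.

    Assumes weeks run Sunday through Saturday (an assumption of this sketch).
    """
    # date.weekday() numbers Monday as 0, so Saturday is 5 and Sunday is 6.
    return day + timedelta(days=(5 - day.weekday()) % 7)


def collapse_to_trips(purchases, retail_sales):
    """Collapse purchase rows to one record per trip, dropping unlinked trips.

    purchases: iterable of (trip_id, store, upc, day, amount) tuples.
    retail_sales: dict mapping (store, upc, week_ending) to a bool sale flag.
    """
    trips = defaultdict(lambda: {"total": 0.0, "linked": 0.0, "on_sale": False})
    linked_trips = set()
    for trip_id, store, upc, day, amount in purchases:
        record = trips[trip_id]
        record["total"] += amount
        key = (store, upc, week_ending_saturday(day))
        if key in retail_sales:
            # The item matches a UPC-store-week in the retail-level data.
            linked_trips.add(trip_id)
            record["linked"] += amount
            record["on_sale"] = record["on_sale"] or retail_sales[key]
    # Keep only trips containing at least one linked item.
    return {trip_id: trips[trip_id] for trip_id in linked_trips}


# Hypothetical example rows, not drawn from the Nielsen data.
retail_sales = {("store_1", "upc_meat", date(2010, 1, 9)): True}
purchases = [
    ("t1", "store_1", "upc_meat", date(2010, 1, 6), 6.5),
    ("t1", "store_1", "upc_towels", date(2010, 1, 6), 20.0),
    ("t2", "store_1", "upc_towels", date(2010, 1, 7), 15.0),
]
print(collapse_to_trips(purchases, retail_sales))
```

In this toy example only trip `t1` survives, because `t2` contains no item that appears in the linked retail data.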
Typical spending is about $69.68 per trip; this is lower than the complete panel outlined in Table 2.5 primarily because this subset restricts attention to only grocery stores, while the complete panel includes mass-merchandisers (like Walmart) or pharmacies which have higher cost items in general. I also break out the non-linked spending by sale or non-sale; in general, it is slightly lower for sales at $62.75. The average trip is made by a household of 2 adults and 1 or 2 children, who live in a single family residence, and make between $50,000-$60,000 a year. They are generally married or cohabiting, 50-54 years old, with high-school or college education; they work full-time or are retired, and are white, with non-Hispanic origins. In what follows, these individuals are weighted using the proportional sampling weights provided by Nielsen.

2.6.2 Empirical Model and Results

To begin looking behind the data, I first try to see what the characteristics of individual bundles are once they are chosen by consumers. Specifically, I denote the linked items, for which I can tell whether or not they are on sale, as targeted items (as in, they can be targeted for being on sale). I then calculate the amount spent by consumers on products other than those targeted, and compare how this reacts to sales. My general specification is a regression of the form:

$$Y_{ij} = S_{ij} \times H_{ij}\theta + B_j\beta_1 + C_{ij}\beta_2 + \epsilon_{ij}$$

where $i$ is the trip and $j$ is the household. The dependent variable, $Y_{ij}$, is the spending on other goods, $S_{ij}$ is an indicator for whether the targeted good is on sale, and $B_j$ and $C_{ij}$ are controls which deal with the different household characteristics and trip characteristics (for instance, how large their bundle is, or their income level).
The term $H_{ij}$ is an interaction to capture different effects across consumers and goods; in different specifications it has different definitions.

For example, in the first specification, I regress this total all-other (A/O) sales variable on an indicator for whether the targeted product was on sale, and a full set of time, UPC, and demographic controls. I also include the average spending by household, to try to correct for any unobserved heterogeneity in spending patterns. Finally, because not all products are likely to react similarly to sales, I include interactions between the sale indicators and the UPC, creating a “UPC-specific” sales variable. These results are reported in Table 2.7; because the interaction variables are very numerous, I break these out in the table separately. Of the 285 interaction terms which are not naturally coded, 90 are omitted due to insufficient variation. Of the remaining 195, at the 10% level approximately 38 are significant while the remainder cannot be distinguished from zero. I show these, along with the negative of the level of the sale variable in the lower panel; the bounds indicate a 90% confidence interval. Wald tests on the majority of these coefficients indicate that the sum of these two terms is also different from zero at the 10% level.

                          2010      2011      2012      2013
Products purchased        51.67m    54.13m    50.94m    50.93m
Trips                     5.52m     5.76m     5.39m     5.30m
Households                60,423    61,824    60,315    60,916
Mean spending per trip    $73.65    $76.20    $77.56    $79.40

Table 2.5: Statistics for the HMS Survey

VARIABLE                        Obs       Mean     Std. Dev.   Min     Max
Spending on Linked Product      28,474    6.57     4.99        0.59    232.29
Spending on Sale                28,474    1.39     3.23        0.00    107.73
Total Spending                  28,474    69.68    62.74       1.38    1086.99
Average non-sale spending       28,474    63.63    51.28       0.00    679.44
Average sale spending           28,474    62.75    51.72       0.00    690.44
Spending on Non-Linked Items    28,474    63.16    61.99       0.00    1080.12
On Sale Indicator               28,474    0.231    0.421       0       1
Categorical: Higher implies more for numerical variables
Income                          28,474    21.38    5.73        3       27
Household Size                  28,474    2.41     1.17        1       9
Residence Type                  28,474    1.71     1.65        1       7
Composition                     28,474    2.22     2.07        1       8
Age                             28,474    7.75     2.51        1       9
Female Head Age                 28,474    6.61     2.53        0       9
Male Head Age                   28,474    5.75     3.31        0       9
Male Head Employment            28,474    3.74     3.23        0       9
Female Head Employment          28,474    4.81     3.44        0       9
Male Education                  28,474    3.40     2.03        0       6
Female Education                28,474    4.03     1.53        0       6
Male Occupation                 28,474    5.69     4.79        0       12
Female Occupation               26,953    6.48     4.93        0       12
Marital Status                  28,474    1.65     1.08        1       4
Race                            28,474    1.27     0.69        1       4
Hispanic Origin                 28,474    1.96     0.20        1       2
TV Items                        28,310    2.22     0.84        1       3
Internet                        28,474    1.09     0.28        1       2

Table 2.6: Summary Statistics for Consolidated Trip-Level Data

Dependent variable: A/O Sales
Variable                   Coeff       SE
Average Spending           0.939***    0.011
Sale Indicator             -12.699     9.376
UPC Controls               X
Time Controls              X
Demographic Controls       X
UPC X Sale Interactions
  Total Number             285
  Omitted                  90
  Significant (p = 0.10)   38
  Insignificant            157

Table 2.7: Regression of A/O Spending

This indicates that for some products, the purchases of all-other goods increase when the product goes on sale, although the effect is mixed and weak. This makes me suspect that the characteristics of the consumer’s bundle matter; this is explicitly the case in the theoretical model. So, in order to control for the bundle of products being purchased, I divide goods into ten categories, corresponding with the retail departments in a typical grocery store. These categories are Health and Beauty, Non-food Grocery [24], Alcohol, General Merchandise [25], Dry Grocery [26] items, Dairy, Deli, Frozen foods, Packaged meat (the target category), and Fresh produce. The first four are referred to as Sundries; when this is extended to include Dry Grocery, I call the group Sundries+. In Table 2.8, I report the regression of A/O spending on a sale indicator, the number of categories, and the full set of controls; this is the baseline result. In specification (2), I add in the average spend variable. Specification (3) removes this variable, as well as the UPC level controls. Specifications (4) and (5) repeat (1) and (2) focusing just on Sundries. The final specification repeats the baseline for the Sundries+ variable instead. I focus on sundries because my model makes the prediction that these kinds of inventoriable goods should be closely related. Moreover, sundries are known to be one of the more profitable and high volume segments of the retail grocery market.
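To make the department grouping concrete, the sketch below summarizes a single basket using the ten categories named above. It is illustrative only: the function name and input layout are hypothetical, and counting every department with positive spending (including the target category) towards the number of departments is an assumption of the sketch, not something the text specifies.

```python
# The ten departments listed in the text, in the order given there.
DEPARTMENTS = [
    "Health and Beauty", "Non-food Grocery", "Alcohol", "General Merchandise",
    "Dry Grocery", "Dairy", "Deli", "Frozen foods", "Packaged meat",
    "Fresh produce",
]
SUNDRIES = set(DEPARTMENTS[:4])              # the first four departments
SUNDRIES_PLUS = SUNDRIES | {"Dry Grocery"}   # Sundries extended with Dry Grocery


def summarize_basket(spending_by_department):
    """Summarize one trip from a {department: dollars spent} mapping."""
    unknown = set(spending_by_department) - set(DEPARTMENTS)
    if unknown:
        raise ValueError(f"unknown departments: {sorted(unknown)}")
    # Assumption: a department counts only if something was actually spent in it.
    shopped = {d for d, amount in spending_by_department.items() if amount > 0}
    return {
        "num_departments": len(shopped),
        "sundries": sum(spending_by_department[d] for d in shopped & SUNDRIES),
        "sundries_plus": sum(
            spending_by_department[d] for d in shopped & SUNDRIES_PLUS
        ),
    }


# Hypothetical basket: meat plus paper towels and crackers.
basket = {"Packaged meat": 8.0, "Non-food Grocery": 12.0, "Dry Grocery": 5.0}
print(summarize_basket(basket))
```

Keeping the two groupings as sets makes it easy to switch the dependent variable between Sundries and Sundries+ without touching the rest of the calculation.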
As in my model, and based on DeGraba (2006) and Pesendorfer (2002), these desirable (profitable) segments are likely to be bundled by sale-buying consumers.

[24] Paper towels, toilet paper, plastic wrapping, etc.
[25] Glue, pots and pans, camping supplies, etc.
[26] Cookies, crackers, canned soups, condiments, etc.

                          (1)            (2)            (3)            (4)         (5)         (6)
VARIABLES                 A/O spending   A/O spending   A/O spending   Sundries    Sundries    Sundries+
Average A/O spending                     0.72***                                   0.08***
                                         (0.01)                                    (0.01)
Sale indicator            5.29*          6.51***        3.83†          3.42***     3.56***     3.50*
                          (2.73)         (2.13)         (2.64)         (0.83)      (0.80)      (2.03)
Num of Departments        22.30***       11.78***       22.47***       4.93***     3.74***     13.84***
                          (0.31)         (0.30)         (0.31)         (0.11)      (0.11)      (0.23)
Sale × num departments    -2.01***       -1.74***       -1.90***       -0.83***    -0.80***    -1.42***
                          (0.62)         (0.48)         (0.61)         (0.19)      (0.18)      (0.46)
Observations              26,862         26,862         26,862         26,862      26,862      26,862
R-squared                 0.54           0.74           0.51           0.34        0.37        0.47
Demographic Controls      X              X              X              X           X           X
UPC FE                    X              X                             X           X           X
Time FE                   X              X              X              X           X           X
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.10, † p<0.20
Sampling weights used, outliers excluded

Table 2.8: Regressions on number of departments and sales

First, we can see immediately from the results that sales increase the amount spent on other goods, by between $3.83 and $6.51. This indicates that consumers are not substituting away from other goods in favour of the sale product, which would be undesirable from the store’s point of view. However, we can also see that this effect is not the same for all bundles of goods. Consumers purchasing in more departments (having “larger baskets”) mechanically spend more, but the effect of a sale diminishes as the basket grows. This is plotted in the first panel of Figure 2.6; we can see the break-even for all-other goods is somewhere between two and three departments, but it is difficult to rule out no effect for higher values as well.
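The break-even basket size can be read off the Table 2.8 coefficients: the marginal effect of a sale is the sale coefficient plus the interaction coefficient times the number of departments, so it crosses zero where the two terms cancel. The sketch below works this out from the point estimates only; the function name is hypothetical, and the calculation ignores the standard errors that any confidence bounds would reflect.

```python
def break_even_departments(sale_coef, interaction_coef):
    """Number of departments N solving sale_coef + interaction_coef * N = 0."""
    if interaction_coef == 0:
        raise ValueError("no crossing point when the interaction is zero")
    return -sale_coef / interaction_coef


# (sale indicator, sale x num departments) point estimates from Table 2.8.
TABLE_2_8 = {
    "(1) A/O spending": (5.29, -2.01),
    "(2) A/O spending": (6.51, -1.74),
    "(3) A/O spending": (3.83, -1.90),
    "(4) Sundries": (3.42, -0.83),
    "(5) Sundries": (3.56, -0.80),
    "(6) Sundries+": (3.50, -1.42),
}

for label, (b_sale, b_interaction) in TABLE_2_8.items():
    crossing = break_even_departments(b_sale, b_interaction)
    print(f"{label}: sale effect reaches zero at about {crossing:.2f} departments")
```

For the baseline column (1) this gives roughly 2.6 departments, and for the Sundries column (4) roughly 4.1.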
This means that consumers who buy larger baskets of products spend incrementally less on all-other goods when there is a sale on, with the increment eventually becoming negligible or zero. We see similar, but stronger, results when we look at sundries specifically. Despite sundries making up on average 12.3% of all-other spending, we can see that the coefficient on incremental sale spending is 66% as large as the equivalent result for total spending. In fact, it is statistically indistinguishable at most significance levels. This indicates that the incremental spending we see is being driven primarily by the high-margin sundries departments. We can also see from columns (5) and (6) that the coefficient is much more stable, indicating that the main effect is captured by the sundries departments, and not unobserved factors or dry goods. The same U-shaped trade-off between number of categories and the impact of sales is also evident for sundries. This is depicted in the second panel of Figure 2.6; the crossing point, at which sales no longer add spending, occurs somewhere between 4 and 6 departments, indicating a much more robust effect; only consumers buying the most diverse bundles of products are likely to not spend more when buying a sale product. This is potentially because they were already planning to purchase a large number of items when arriving at the store, and the sale does not affect them for this reason. One other possibility is that multiple goods could be on sale at the same time; although I do not have sale variables for the other products, I am able to compare average prices. Looking at the spending-weighted average prices of sundry goods across the consumers who buy the perishable on sale versus those who do not, we find that the price is $5.29 at regular prices versus $5.06 on sale, a difference of less than 25 cents, and not statistically significant.
This implies that the spending variation for sundries is not driven by own-price sales, since the implied elasticity would be implausibly large.

Figure 2.6: Plot of crossing points in Table 2.8

                          (1)              (2)              (3)              (4)         (5)         (6)
VARIABLES                 Total spending   Total spending   Total spending   Sundries    Sundries    Sundries
Num of Departments        22.27***         11.64***         22.40***         5.29***     4.09***     5.35***
                          (0.32)           (0.30)           (0.31)           (0.12)      (0.12)      (0.13)
Sale × num departments    -2.04***         -1.77***         -1.92***         -0.90***    -0.88***    -0.89***
                          (0.62)           (0.49)           (0.62)           (0.21)      (0.20)      (0.22)
Sale indicator            4.46†            5.70***          3.17             3.87***     4.04***     3.29***
                          (2.77)           (2.16)           (2.66)           (0.98)      (0.94)      (0.97)
Single item shopper                                                          6.59***     6.48***     6.44***
                                                                             (0.42)      (0.40)      (0.39)
Sale × SI shopper                                                            -1.19*      -1.29**     -1.42**
                                                                             (0.68)      (0.64)      (0.67)
Average A/O spending                       0.72***                                       0.08***
                                           (0.01)                                        (0.01)
Observations              26,862           26,862           26,862           26,862      26,862      26,862
R-squared                 0.54             0.74             0.51             0.34        0.38        0.31
Demographic Controls      X                X                X                X           X           X
UPC FE                    X                X                                 X           X
Time FE                   X                X                X                X           X           X
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.10, † p<0.20
Sampling weights used, outliers excluded

Table 2.9: Robustness Results I

In order to look more closely at the income and substitution effects of sales, I also perform the same regressions as in Table 2.8, but this time for all spending. These are reported in columns (1)-(3) of Table 2.9. In a standard, two period consumer choice model, we would expect a sale to change the quantity demanded of both products; however, the aggregate spending would remain the same. We can see this is not the case; consumers with smaller basket sizes actually spend more in total across all goods. This is eventually reversed, but only for those who buy relatively more categories of goods (4+). In either case, there is clearly something more going on than a simple substitution and income effect.
One issue might be that these results are highly leveraged by consumers buying a very small number of products; particularly, the targeted or sale products. To test this, I create an indicator variable Single Item Shopper which represents whether or not a consumer spends 50% or more of their spending on the targeted product. These consumers go into the store and primarily purchase just the targeted product. These results are presented in columns (4)-(6) of Table 2.9, which replicate the Sundries results from the main specification, but with this variable (and interactions) added. Interestingly, we can see that single item shoppers spend substantially more on sundries when they shop. This is likely because we also control for the number of departments being shopped in, and thus it is likely they would buy only one or two other groups, which (apparently) happen to be sundries. The main results are not changed by this specification; the magnitude and direction of the sale coefficient are the same, as is the interaction with the number of departments shopped in. One factor, however, is that single item shoppers tend to buy less of everything else when buying on sale; this makes sense, since these are likely to be the most frugal of the consumers. Interestingly, the magnitude of this effect is quite small, indicating it is not a major consideration; for instance, the sum of the coefficients on this interaction and the other related variables is positive, indicating they are still a desirable group since they spend more even during a sale period.

As an additional test of the robustness of this result, I also look at the substitutability of the target products. One issue which complicates the analysis is the ability of certain meat products to be frozen, then consumed later. Frozen meat products take more time to prepare than fresh, and also suffer from quality deterioration.
In particular, even with ideal freezing and thawing procedures to minimize quality loss, meat will lose moisture during the thawing process. The protein structure of the meat is also affected, as is the colour, pH balance, tenderness and microbial balance (Leygonie et al. (2012)). These are exacerbated by long periods in the freezer or non-ideal freezing or thawing methods. Most chefs would agree: frozen meat is generally inferior to fresh.

                          (1)                          (2)
VARIABLES                 Frozen goods   Sundries      Frozen goods   Sundries
Average A/O spending                                   0.09***        0.08***
                                                       (0.00)         (0.00)
Sale indicator            1.57***        4.09***       1.58***        4.10***
                          (0.47)         (0.60)        (0.44)         (0.59)
Num of Departments        2.97***        5.05***       1.53***        3.81***
                          (0.04)         (0.05)        (0.05)         (0.06)
Sale × num departments    -0.38***       -0.98***      -0.31***       -0.92***
                          (0.09)         (0.11)        (0.08)         (0.11)
Observations              28,381         28,381        28,381         28,381
R-squared                 0.27           0.33          0.36           0.36
Demographic Controls      X              X             X              X
UPC FE                    X              X             X              X
Time FE                   X              X             X              X
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.10, † p<0.20
Analytical weights used, outliers excluded, education and occupation controls excluded

Table 2.10: Frozen goods

Nonetheless, the fact that individuals are still able to store fresh meat with a quality decline would result in a sort of inventory-like behaviour, which would explain the patterns found in the data. The consumer level data allows us to test this too, albeit only indirectly. Specifically, since in order to preserve meat [27] it must be frozen, there is a close substitutability between the fresh (to be frozen) meat and extant frozen meat products and other substitutes. This would imply that frozen products should have a much greater degree of substitutability with respect to the targeted meat products than other products do. Specifically, we would expect the effect of a sale on spending on frozen products to be much smaller than the effect on sundries.
I test this using a general linear framework, repeating the benchmark regression with two dependent variables: sundries, and spending on frozen products. These results are reported in Table 2.10; in specification (1), I use the baseline model, while in (2) I include average A/O spending. The results indicate that we have similar effects for frozen foods as we do for sundries, albeit at a different scale. In particular, we note that the coefficients move in the same direction, which indicates that substitution between frozen products and meat products on sale is not a major consideration.

These results together paint a very interesting picture of how grocery stores and consumers are behaving with respect to sales. If we believe supermarkets are behaving rationally, the results indicate that they use sales in order to target profitable (or at least, differentially profitable) sub-segments of the consumer market. These correspond to the consumers who are purchasing different sized bundles. About 50% of the population lies in a region where a sale induces them to buy more overall; in a general sense, this agrees with the idea of loss leadership. Grocery stores place goods at a lower price, so that consumers will come into the store and “buy more” of other goods with a higher profit margin. This story, while appealing at first, requires careful consideration in light of the results we see. Specifically, there appears to be a trade-off between inducing some consumers to buy more while other consumers essentially get a “free lunch” and wind up spending less. Accordingly, in order to explain this, I develop a model of loss leadership, which is elaborated in Section A.3.

[27] For most households; we omit the possibility that consumers could be smoking or curing grocery store product in any substantial amount.

2.7 Conclusion

This chapter makes three main contributions.
First, it shows explicitly that regular, periodic sales are an important part of the story surrounding the pricing of perishable products. This is important because it means that when we talk about sales, or model them, our explanations need to either take into account the special nature of these kinds of products or acknowledge that a "one size fits all" approach to explaining sales is probably not going to work. This also requires us to carefully consider what kinds of products our models apply to, and why.

Second, I illustrate how to do this by developing a model which explains the periodic nature of perishable sales. In the discussion in Section 2.5, I explain how this fits into the set of possible models and frameworks. I further show that this model is plausible given the data by showing that the central causal connection necessary is supported by consumer choice data. This demonstrates the third contribution of this paper: showing how the connection of retail and consumer choice data can help evaluate different types of theoretical models. This necessitates the development of more flexible tools to determine when sales occur, and focuses attention on the economic content of the kinds of heuristic decisions empirical researchers make. Simple heuristics may provide straightforward answers, but they often neglect or overlook potentially important aspects of the data.

This work also leaves room for further development in several areas. First, as developed in Chapter 4, the classification method can be extended to an explicit structural model. Preliminary empirical work on this shows similar results to those reported here, but the structural relationships are potentially more detailed and reflect an alternative analytical framework which would be interesting to compare with the existing literature. Second, the method used in this chapter can be readily applied to other products, particularly sundries.
This would allow the correlation of sales on different products to be explicitly analysed, resulting in a more holistic picture of how stores use sales as a pricing strategy. Finally, the model in question leaves substantial room for refinement or sophistication. In particular, profitability and the role of chain-brand competition are not addressed in the existing framework but are known to be important features of store decision making. These points illustrate that, despite several decades of attention to the questions and problems sales pose, new techniques and data continue to offer opportunities to learn and improve our understanding of this important area.

Chapter 3
Large Contributions and Crowdfunding Success

3.1 Overview

    In recent years, small businesses and startup companies have struggled to raise capital. The traditional methods of raising capital have become increasingly out of reach for many startups and small businesses. [...] Low-dollar investments from ordinary Americans may help fill the void, providing a new avenue of funding to the small businesses that are the engine of job creation. [...] The promise of crowdfunding is that investments in small amounts, made through transparent online forums, can allow the “wisdom of the crowd” to provide funding for small, innovative companies. It allows ordinary Americans to get in on the ground floor of the next big idea. It is American entrepreneurism at its best, which is why it has the support of the President and many in the business community. (Senator Jeff Merkley, Congressional Record, 112th Congress, December 8th, 2011)

Crowdfunding is a new form of financing by which small, unsophisticated individuals or businesses with an idea for a project or event can “crowd-source” the financing for their project and bring it to market. It has been a dramatic success, funding hundreds of thousands of projects and raising billions of dollars (Cantillon (2014)).
The combination of entrepreneurship, consumer interest, and social media which drives crowdfunding has drawn the attention of the media, business owners, and investment experts. While the reception among experts has been mixed (Agrawal et al. (2013)), there is little doubt that crowdfunding is emerging as a legitimate and commonplace method of financing. As the Wall Street Journal put it:

Crowdfunding has the potential to revolutionize the financing of small business, transforming millions of users of social media such as Facebook into overnight venture capitalists, and giving life to valuable business ideas that might otherwise go unfunded. (Gubler (2013))

It has drawn specific attention from the United States government, which has provided it with legal grounding and protection, clarifying its role as an alternative to venture capital or traditional (bank-based) financing. Nonetheless, we are still just beginning to understand the fundamental forces behind crowdfunding. There remain many important questions which we are only starting to answer: why do some projects succeed and others fail? How do you design a project for success? What kinds of things make one project a good fit for crowdfunding, and another a bad fit?

In the rhetoric surrounding crowdfunding, this method of funding takes on something of a Gulliver versus the Lilliputians flavour: many small individuals, motivated by their support for an idea or project, combine their efforts to help it succeed despite it being an insurmountable obstacle to each person individually. This is compounded by the sense that crowdfunding is an alternative to the “big guys”: banks, developers, or studios which are unwilling to support the project. This gives crowdfunding a very populist, individualistic character: the elite, monied interests are out of touch with what the common person cares about, so the little guys will pool their effort and do it anyways.

However, how true is this description of crowdfunding? 
Are many small contributors really the driving force behind project success? In this chapter, I look at a specific aspect of this question: how important is it to attract large contributors to a project? Does this determine success in a meaningful way? What features of projects lead to large contributions? More importantly, can I say anything causal about the connection between project success and large contributions? In order to do this, I turn to a novel panel dataset on crowdfunding projects spanning three years (2012-2014). This dataset comes from the largest crowdfunding platform (Kickstarter), and allows me to look behind the cross-sectional success or failure of a project to examine the different sizes of contributions.

I find that large contributions are driven by a unique economic process, with an understandable motivation. Large contributors appear to care about helping projects succeed, providing more when projects are in need, and when their contributions are likely to be impactful. This is especially apparent on the “day of success”: the pivotal day on which projects succeed, when large contributions frequently push projects over their goal threshold, and into success. This behaviour is rationalizable using a consumer choice model of crowdfunding, in which individuals care about the benefit of a project, but also understand their impact on the success of a project. Large contributions can be regarded as coming from groups of individuals affiliated with the project, or fans of the project: both groups will typically want to donate when a project needs a “final push” to get it over the goal, conditional on the goal, their willingness to pay, and the state of the project. This agrees with a strand of analysis in the literature on how crowdfunding often serves as a way of formalizing previously informal funding networks among family members or friends (Agrawal et al. (2013)). 
Which characterization is at play also has important consequences for policy makers and regulators.

I find that large contributions appear to be important for success: using a linear probability model on the data cross-sectionally, I find that projects which can attract at least one large contribution are 30% more likely to succeed, after controlling for a variety of time-effects and covariates. This is robust to a number of specifications, including discrete choice models. However, there are several sources of endogeneity, not the least of which is the fact that the causality discussed above can be reversed; the success event is endogenous to large contributions. In order to address this, and assess the causal nature of the relationship, I use an instrumental variables approach. My instrument exploits the fact that the underlying data is not cross-sectional, but an unbalanced panel: I choose a set of secular and religious holidays which are associated with giving, purchasing or income shocks (e.g. Christmas, Black Friday). By exploiting the fact that project start times and lengths (called “tenures”) vary, and controlling for time fixed effects, I can use the number of elapsed holidays to instrument for the presence of large contributions. The key identification assumption is that the tenure decision is not sufficiently sophisticated so as to capture these dates at a level more granular than the month; given the fact that crowdfunding is specifically designed for, and used by, unsophisticated individuals, I believe this is reasonable. I try to verify this assumption in several ways, showing that holidays are not proportionally more likely to be included in a project’s tenure. This IV approach shows that the underlying results are in fact an underestimate: the IV results are 10-30% larger than the baseline multiple regression framework used. 
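
As a purely illustrative sketch of the logic behind this instrumental variables strategy (it is not the estimation code used in this chapter), the following simulation compares an ordinary least squares slope with a simple one-instrument IV slope. Everything in it is an assumption made for illustration: the data are simulated, the function names are hypothetical, there are no covariates or time fixed effects, and the unobserved "project strength" term is constructed so that weaker projects attract more large contributions, which biases the naive estimate downward.

```python
import random


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def simulate_projects(n, true_effect=0.3, seed=0):
    """Simulate hypothetical projects; every number here is an assumption."""
    rng = random.Random(seed)
    holidays, large, success = [], [], []
    for _ in range(n):
        z = rng.randint(0, 3)        # holidays elapsed during the tenure
        u = rng.uniform(-1.0, 1.0)   # unobserved project strength
        v = rng.gauss(0.0, 0.5)      # idiosyncratic shock
        # Assumed first stage: holidays raise the chance of a large
        # contribution, and weaker projects attract more of them.
        d = 1 if 0.5 * z - u + v > 1.0 else 0
        # Linear probability of success, always inside [0.05, 0.95].
        p = 0.35 + true_effect * d + 0.3 * u
        y = 1 if rng.random() < p else 0
        holidays.append(z)
        large.append(d)
        success.append(y)
    return holidays, large, success


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV slope: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


holidays, large, success = simulate_projects(50_000)
print("OLS estimate:", round(ols_slope(large, success), 3))
print("IV estimate: ", round(iv_slope(holidays, large, success), 3))
```

In this simulated setting the holiday count shifts the probability of a large contribution but is unrelated to project strength, so the IV slope should sit near the assumed effect of 0.3 while the naive slope is pulled away from it. The direction and size of the gap are consequences of the assumed data-generating process, not evidence about the actual data.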
However, because of local average treatment effects, based on the instrument used, it is difficult to say with certainty that this is typical of all projects, and not specifically those affected by the instrument. Some robustness checks demonstrate that this might be the case, but certain explanations cannot be ruled out.

I also dispel a mechanical interpretation of the results: one might expect that since large contributions are larger than average, they might be helpful simply due to their size alone. In order to examine this, I perform a counterfactual in which we pretend a large contribution occurs as simply more individual small contributions. I find that only about 1/3 to 1/5 of the effect of a large contribution is driven by size alone; the majority of the impact is caused by other factors, such as timing or learning by agents. In other words, I find that large contributions do not simply provide more money, but they provide more money when it is most instrumental for a project’s success in a number of ways. In addition, I also perform several naive counterfactuals, demonstrating that the IV estimates likely reflect this “instrumentality” of large contributions better than the ordinary least squares estimation.

This chapter builds on a new, but growing, literature on the determinants of crowdfunding success. The problem of predicting what features of projects determine success, and consequently how best to structure a project if you are a project-owner, has already been the subject of some attention (see Xu et al. (2014); Zvilichovsky et al. (2013); Belleflamme et al. (2013); Lambert and Schwienbacher (2010); Mollick and Kuppuswamy (2014); Mollick (2014) for a few examples). 
However, because of the nature of most of this literature (cross-sectional data on projects) the role of large contributions has been difficult to assess. The use of a panel dataset in this study allows me to consider this aspect of the project, and additionally to try to correct for endogeneity using an instrumental variables approach. I find that the standard crowdfunding narrative does not paint a complete picture: while the “little guys” might be important, people with relatively more money than the others are also quite important.

These findings are influential in two ways: first, for policy-makers concerned with the administration and legislative structure of crowdfunding, it is important that contribution size is carefully considered and managed. For example, in the debate leading up to the amendments of the 2012 JOBS Act concerning equity crowdfunding, there was substantial discussion regarding contribution limits and total caps; this research indicates that these concerns were appropriate, and should probably be considered for other (non-equity) forms of crowdfunding. Second, this paper informs future modeling of crowdfunding. There is a small, but growing, literature which seeks to model the individual choice decisions involved in crowdfunding (see Marwell (2016); Chang (2016) for examples); however, most ignore the “intensive” margin of contribution size, instead focusing on the number of contributors and assuming the amount they give is fixed. This research indicates that this is a serious omission and conclusions drawn from such structural models do not reflect a substantial fraction of the actual process governing crowdfunding research.

The remainder of this chapter is structured as follows. In Section 3.2, I give some background on crowdfunding and large contributions, specifically in the context of this paper and with an eye to establishing how this new form of financing “archetypically” works. 
In Section 3.3, I explain the data source and show some stylized facts about large contributions and crowdfunding; these motivate the analysis explained above. In Section 3.4, I provide a model to make predictions and explain the results, and then describe my modelling approach by presenting a simple consumer choice model of crowdfunding. I then, in Sections 3.5 and 3.6, outline my empirical specifications and present my results; Section 3.7 discusses these results, and presents some robustness checks. Finally, Section 3.8 concludes, while Appendix B provides additional detail.

3.2 Background

3.2.1 What is Crowdfunding?

Crowdfunding is a new form of financing for projects and businesses, which reached a wide audience beginning in 2010-2012. Since then, it has grown into a billion dollar industry, and become a major source of funding for individuals and small-to-medium sized businesses. This is particularly the case for those in creative or artistic industries like music, comic books, art, or video game design. The fundamental characteristic of crowdfunding is that of the “crowd”: through the use of an internet-based platform, many individuals are solicited for small contributions towards a project. Generally, these are paired with a reward which is also usually conditional on certain fund-raising thresholds being met. Consider the following example, which was the motivation behind the largest crowdfunding platform, Kickstarter:

Imagine you are a promoter trying to put on a concert. You have an agreement with the musicians, and you have arranged with the venue for a location: the only issues that remain are the financing and the audience. Who pays for the down-payment for all these individuals? If you put up the money yourself, and the audience is small, you will lose money. Worse, you may simply not have the funding to go forward in the first place, even if you are sure the audience will materialize. 
The concert does not go ahead, in spite of the fact that if everyone could have cooperated it would have been a success.

This was the situation the founders of Kickstarter saw themselves in: valuable projects they wanted to pursue were failing to materialize because of a combination of demand uncertainty, the speculative nature of the project, and simple credit constraints. Their insight was to try to reach out to the potential consumers to provide funding for the project:

I thought: “What if people could go to a site and pledge to buy tickets for a show? And if enough money was pledged they would be charged and the show would happen. If not, it wouldn’t.” (Kickstarter (2014))

This kind of arrangement, in which project-owners tap the unrealized demand or underlying support for an idea or project to secure start-up funding, is the basic innovation which has driven the success of crowdfunding. This feature also differentiates it from the most common alternatives open to individuals seeking funding:

• Unlike traditional debt funding (from a bank or large investor), crowdfunding raises money through the prospective delivery of a product in the future, not a promise to repay a loan or give up collateral. While in most cases the product is a concrete good (like a consumer durable or consumable), crowdfunding is also used to fund projects which (at best) produce only an artistic or public good, which banks are typically unwilling to support.
• Unlike venture capital funding, crowdfunding raises money from a large number of small, non-expert investors interested only in the final project being created. In contrast, venture capitalists provide most or all of the funding for a new start-up, as well as expert advice, in exchange for an equity or debt position in the business. They generally have little actual interest in the product being created beyond its marketability. 
In most jurisdictions, crowdfunding backers cannot legally be given a financial stake in the company.
• Unlike a traditional pre-sale, the delivery of any reward is not guaranteed; instead, it is explicitly contingent on meeting a pre-arranged funding threshold. While some pre-orders function in a similar way (e.g. John Deere tractors), the explicit and central nature of the contingent funding threshold is not present.

With this said, there are several ways in which the details of crowdfunding can be relaxed. The most prominent examples have to do with the reward and the contingent nature of the goal: crowdfunding traditionally deals with contributions, not donations, in the sense that there is a conditional reward offered. However, there are several large platforms sometimes referred to as “crowdfunding” which deal exclusively in straightforward donations, such as GoFundMe. On these sites, the goal is not contingent, and there is no reward promised. Individuals simply use these websites to donate money to causes they feel are worthy of support. This is not crowdfunding as this paper and most of the literature define it. It lacks any new features which make it stand out from simple internet-based charitable donations, which have existed since the early 1990s. If anything, the crowdfunding label is likely to confuse visitors to these websites, since the goals are non-binding, and are just “wish lists” for how much a cause seeks to raise.28

However, this is complicated by the fact that some websites include both charitable and crowdfunding options for a single project. For example, the website IndieGoGo offers “flexible” and “fixed” funding options for projects. Fixed funding is traditional crowdfunding, as described above, with a contingent goal and reward. However, flexible funding is non-contingent, much like GoFundMe. The pledges made are simply donations, and the reward is open to re-negotiation if the overall goal is not met. 
The lines are further blurred by the fact that many “crowdfunding” projects, especially highly artistic ones, may have very abstract rewards. Is a conceptual art piece, with no actual event, carried out via flexible funding actually different from a simple charitable cause? The difference is not immediately obvious. However, even in this situation, the way these projects are presented to consumers is different. Consider Figure 3.1, which displays several projects featured on the front page of IndieGoGo and GoFundMe during April 2017. A qualitative study of the way these websites, and projects, present themselves demonstrates clear differences. Even flexibly funded projects emphasize the entrepreneurial nature of the undertaking, focusing on “helping a project to succeed” or “achieving our dream” if any charitable aspect is mentioned. On the other hand, explicitly charitable projects focus directly on the need being addressed by the donations: funding for treating a disease, covering unexpected expenses, or delivering aid to people in need. While the line may be blurred from a technical point of view, there are real and tangible differences between crowdfunding and charitable donations which make them different fields of interest.

28 To put this another way, if these websites are crowdfunding, then most churches have been crowdfunding to fix their leaky roofs for hundreds of years, and the term has lost any useful meaning.

Figure 3.1: Comparison of Featured Projects on GoFundMe (top panel) and IndieGoGo (bottom panel) as of April 24, 2017

Contemporary crowdfunding is also part of a continuum of similar fundraising mechanisms. For example, agricultural farm shares often follow a similar arrangement. In these, farmers sell shares in the upcoming harvest to individual consumers, conditional on reaching sufficient funding to plant the fields. When harvest time comes, the consumers are given the produce grown in the fields. 
This helps farmers with the start-up costs, such as purchasing seeds and fertilizer, and preparing the fields during the winter and spring. In the charitable donation literature, so-called “kickers,” or fundraising boosts at given thresholds, have been studied extensively. A closer example from Canadian history, featured prominently in Leacock (2016), is a “whirlwind campaign.” In this type of campaign, individuals pledge money to a cause, conditional on reaching certain fundraising goals. Although this is purely charitable, the conditional nature of the pledges is central to the mechanism (and Leacock’s story). When looked at with the right perspective, many historical and traditional forms of fundraising have a crowdfunding-like aspect to them. Crowdfunding, as carried out by Kickstarter and discussed in the literature, seems to have been a relatively minor innovation. Why has it suddenly become so successful?

The answer lies in two intersecting forces: (1) the rise of social media and the interactive world wide web (Web 2.0) and (2) the technical infrastructure to guarantee and process transactions easily. Both of these innovations occur through the creation of crowdfunding platforms like Kickstarter or IndieGoGo. Large numbers of disaggregated consumers are central to the way crowdfunding operates: if there is no crowd, there can be no crowdfunding. Crowdfunding platforms provide this by allowing for a single, central repository for projects. Rather than having an individual site for each project, interested consumers can peruse thousands of projects at once. This also compels project owners to standardize what they offer, how they explain their project, and how they market it to consumers. This makes comparison “shopping” or project discovery much easier for consumers. This would be impossible without the interactive nature of contemporary websites. Interactive, user-generated content is fundamental to crowdfunding. 
Platforms do not manage or create project pages. They merely provide the technical framework for project-owners to do the work themselves, much like YouTube, Blogger, or other user-content sites. The social nature of the modern web also facilitates this. Rather than relying on the platforms themselves to promote their project, or relying on random discovery, project-owners can promote their ideas and products using social networks. The often tightly-knit communities surrounding niche products can be immediately leveraged through forums like Reddit or Twitter, sharing projects consumers might be interested in. Friends can quickly share and promote interesting projects through sites like Facebook, linking directly to the crowdfunding platform. All of these features allow project owners to amplify the reach of their message in a way that was much more difficult, or impossible, to do before these technologies became widespread.

Most crowdfunding sites, especially Kickstarter, weave social media into every part of the crowdfunding process. Both backers and project owners create and maintain personal profiles (much like on Facebook or Google Plus), complete with photographs and personal vignettes. The number of projects supported, or created, is prominently displayed and discussion between individuals is encouraged through both comment features and private messaging. More traditional forms of social media are closely integrated, allowing individuals to promote projects they are interested in seamlessly through Twitter, Facebook, and many other social media platforms. The modern crowdfunding site seeks to incubate an environment in which there is an easy back-and-forth between investor and investee, between the business network and social network. 
This both provides a critical communal aspect to crowdfunding and furthers the “marketplace” notion which drives the process: when every supporter can reach out to their network with a single click to promote a project, the number of people exposed becomes larger and larger. This “viral” type of exposure is often the driving force behind very large projects which end up attracting an audience of backers far larger than a typical project could hope to be exposed to.

The second force, the existence of robust and sophisticated transaction processing, matters because it allows crowdfunding platforms to exist. Platforms are beneficial from a project-owner point of view, because they provide a single location to host projects. This leads to increased visibility and discovery. They also are designed to be highly accessible and easy-to-use: no web development skills are necessary, making hosting a project as easy as writing a news article or making a post on Facebook. However, platforms also serve as a guarantor in several ways. Most importantly, they process payments and guarantee the contingent nature of the crowdfunding contract. When an individual pledges money to a project, it is the platform which ensures they are charged if and only if the project reaches its goal. This effectively makes them a single digital storefront for a large number of individuals who, alone, would have difficulty being trusted with consumers’ money or credit information. Platforms also provide security by enforcing truthfulness, setting standards for disclosure, and being the first line of recourse if consumers feel misled by a project. For example, Kickstarter mandates that projects disclose potential risks to consumers, and prevents them from using misleading images, such as a photorealistic render of a non-existent final product. 
Kickstarter also restricts items which do not meet community standards, could be dangerous, or could be illegal to own or produce, such as weapons, drugs or drug paraphernalia, or live animals. Projects which violate these rules are subject to shutdown by the platform, forfeiting any funds raised.

This allows platforms to provide security and guarantees to individuals taking part in crowdfunding. Individual projects may have problems or challenges, but there is a clear minimal standard being met and enforced. This lets consumers focus on evaluating the project on offer, rather than worrying about whether their payment system is secure or if there’s some fine print which could lead to them being scammed or otherwise misled. This role is clearly front of mind for platform operators, as many of the rules have evolved over time in response to controversies surrounding certain projects. For instance, Kickstarter’s rule disallowing photorealistic renders of products was likely a response to doubts surrounding the Ouya, an Android-based video game console, and the Pebble, a smartwatch, in which digital renders were easy to mistake for prototypes. Platforms make their money by taking a small percentage (usually in the range of 5%) of the total amount raised. It is in their best interests to build and maintain a reputation for honesty by setting rules and enforcing them. This makes the platform more appealing to consumers, which in turn makes it more appealing for project-owners.

These innovations, coupled with a unique fundraising structure, have been central to the success of crowdfunding. While still evolving, it is clear that crowdfunding plays an important role in the financing of many types of projects, particularly when traditional forms of financing may be difficult to come by. 
It is also clear that while crowdfunding is part of a rich and lengthy history of similar kinds of financing, it is also a markedly contemporary enterprise, with much of its success due to the modern internet and social media landscape. This innately digital nature is also useful to researchers, since it also means that data is much easier to collect and study. The centralized nature of crowdfunding platforms also makes the underlying structure much more consistent, and helps study the topic more broadly than taking a single project-by-project perspective.

3.2.2 Consumer Crowdfunding on Kickstarter

Given the wide, and often flexible, forms crowdfunding can take, it is useful to explicitly summarize crowdfunding as it is studied in this paper. I call this “consumer crowdfunding” since it deals primarily with the financing of consumer goods. It also constitutes an archetypical description of crowdfunding in general, since it describes crowdfunding as it was originally carried out on the first major crowdfunding platform, Kickstarter, circa 2010.29 As discussed in Section 3.2.1, this is not a complete description of all crowdfunding projects or platforms, but it is accurate for the data used in this project, and serves as a guide to the generic structure of other platforms.

As discussed earlier, consumer crowdfunding is associated with two main features: (1) a contingent goal and (2) a specific, tangible deliverable. To illustrate the process in detail, consider the following example, motivated by Kickstarter’s founding story: you (the project-owner) are a promoter who wants to put on an exciting new concert. However, the financing is expensive, and funds are limited; worse, if you put up your own funding and people don’t attend, you stand to lose a great deal of money. Accordingly, you decide to use crowdfunding; to do this, you:

1. 
Put together a pitch which explains who you are (a successful concert promoter), what you’re going to do (put on a concert), and how it’s going to work (we’ve arranged for this venue, these bands, and these dates).
2. Set a funding goal and deadline ($6,000 by July 31st) which is the amount your crowdfunding campaign seeks to raise by the time chosen.
3. Describe the rewards supporters will receive if they support the project and it succeeds (a ticket to the concert).
4. Associate each reward with a price which must be pledged in order to receive the reward (if you pledge $20 you receive a ticket to the concert).
5. Post the pitch, along with the other information, on a crowdfunding platform (Kickstarter).

29 The degree of influence Kickstarter has had on crowdfunding as the first major player is difficult to overstate. As an illustrative point, crowdfunding campaigns are sometimes referred to as “Kickstarters” regardless of where they are being carried out, much as Kleenex has come to define tissue paper: the brand becomes a byword for the object itself.

After your pitch is posted on the website and becomes visible, individuals can view it on the crowdfunding website and decide whether or not they wish to support your campaign. An individual who decides to support the project is called a backer. If enough money is pledged to reach the goal by the deadline, the project is a success and the money pledged is paid to the project owner by the backers, while the project owner is then obligated to deliver the rewards promised. If the goal is not reached by the deadline, the project is a failure and no money is exchanged, but also the rewards are not produced; the contract dissolves. 
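
The all-or-nothing rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how Kickstarter implements it: the class name `Campaign`, its methods, and the backer labels are all hypothetical, and the deadline is represented simply by the moment `settle` is called.

```python
from dataclasses import dataclass, field


@dataclass
class Campaign:
    """A hypothetical all-or-nothing campaign (all names are illustrative)."""
    goal: float
    pledges: dict = field(default_factory=dict)  # backer name -> amount pledged

    def pledge(self, backer, amount):
        """Record a pledge; nothing is charged at this point."""
        self.pledges[backer] = self.pledges.get(backer, 0.0) + amount

    def total_pledged(self):
        return sum(self.pledges.values())

    def settle(self):
        """Resolve the campaign at its deadline.

        Returns (succeeded, charges), where charges maps each backer to the
        amount actually paid. If the goal is missed, nobody pays anything.
        """
        succeeded = self.total_pledged() >= self.goal
        charges = dict(self.pledges) if succeeded else {}
        return succeeded, charges


# The concert example: a $6,000 goal and a $20 pledge per ticket.
concert = Campaign(goal=6000.0)
for i in range(299):
    concert.pledge(f"backer_{i}", 20.0)
print(concert.settle()[0])  # 299 tickets pledged ($5,980): short of the goal
concert.pledge("backer_299", 20.0)
print(concert.settle()[0])  # 300 tickets pledged ($6,000): the goal is met
```

The point of the sketch is that a pledge is a conditional commitment: whether any individual backer pays depends on the total pledged by everyone, which is what makes a single contribution potentially pivotal.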
The progress of a project towards its goal, the number of backers, and comments/updates from backers or the owners are all publicly visible on the crowdfunding platform for potential backers to see.

This contract, if successful, forms a legally binding agreement between the project-owner and the backers; the project-owner must make a good-faith effort to deliver what has been promised, and similarly cannot misrepresent their skills or the deliverables promised. The legalities surrounding crowdfunding in this respect were formalized in 2012; provisions of the 2012 U.S. JOBS Act were specifically enacted to enable forms of crowdfunding, while clarifying that it was not designed for use with investment funds or to supplant the role of traditional venture capital. In the best study to date, Mollick and Kuppuswamy (2014) found that while a large majority of projects eventually delivered on their promises to backers, there were typically some delays or changes to the final products. Additionally, while a large majority of backers were ultimately satisfied by the outcome of the project, a minority were unhappy and a small minority of projects failed to ever produce a usable good or refund the backers’ money.

3.2.3 Literature Review

The literature studying crowdfunding is still in its infancy, and is complicated by the ongoing discussion over what, exactly, constitutes “crowdfunding” as discussed in Section 3.2.1. Two basic programs have emerged, however: in the first, researchers try to study a particular crowdfunding market, looking in detail at what makes projects successful and the consequences this has for consumers and project-owners. This program also tries to place crowdfunding, as a funding mechanism, within the family of other alternatives. This highlights the role it plays as a source of funds, and why it might be attractive to certain project-owners. 
The second program capitalizes on the different structures different markets have, using them to try to analyse consumer behaviour in crowdfunding markets. With this said, there are many crowdfunding platforms and styles which remain largely unstudied.

Within the first set of studies, understanding the dynamics of backing projects is important. Cross-sectional study of projects is highly limiting, because crowdfunding is an explicitly dynamic process. It is also the case that the dynamics of crowdfunding shed light directly on the valuations of individual backers, which can then be connected to the success of the project. The question of what role pro-social or altruistic motivations play in crowdfunding is very important to understanding this market. For example, cross-sectional studies have looked at what makes a project successful. There is evidence of complex dynamics at work for backers; for example Kuppuswamy and Bayus (2013) and Belleflamme et al. (2013), motivated by evidence from their data, and qualitative surveys, suggest that there is a behavioural social benefit to many projects. Indeed, many theoretical papers explicitly require a social benefit to crowdfunding in order to get the qualitative predictions they make to agree with the data. These types of models are the subject of some of the best empirical papers on the topic (see also Rao et al. (2014) for example). This is often couched in a “common value” or “social value” framework; in terms of how crowdfunding platforms present themselves, and much of the early discussion on the subject, this remains an influential view of how this form of financing operates.

However, another important aspect of crowdfunding concerns the role of information, such as demand uncertainty or asymmetric information. These can arise on both the consumer and project-owner side of the market. One prominent example has been the suggestion of social learning and herding within a crowdfunding project. 
As individuals back a project, their action is publicly observed by other individuals. When viewed in this way, crowdfunding becomes a very complicated information setting for backers. The goal becomes an event on which individuals can condition their decision making. For example, imagine a project which requires a dozen backers to reach its goal and is either quite valuable or worthless (if funded), and suppose a single potential backer has some information that indicates the project is worthless. However, suppose they (also) know that their information is of relatively poor quality, and that every other interested backer following them has very good information (say, perfect). Should they back the project? Of course! The structure of the crowdfunding contract is such that they will only actually pay the money when the goal is crossed, which is only the case if the other individuals know that the project is quite valuable. Thus, they condition the expected value of the project on the event that it is funded, which in this case is equivalent to conditioning on the knowledge that the project is of high quality.

This threshold-conditioning effect is closest, in the existing literature, to papers which study sequential voting: for instance Ali and Kartik (2006); Callander (2007). Indeed, with the right perspective, we can imagine that crowdfunding is sort of like a vote: a decision to back is a “vote” in favour of the project being of high quality. However, unlike in sequential voting environments, the decision matters directly and not just instrumentally through the outcome. These herding-type (or bandwagon) concerns inform much of the skepticism about crowdfunding in the light of asymmetric information. When coupled with the viral nature of many crowdfunding campaigns, the ability for private information to be swamped by the crowd appears to be a relevant concern. 
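
The threshold-conditioning example given earlier in this section can be made concrete with a small calculation. The sketch below is illustrative only: the prior, the signal accuracy, the value of the project, and the pledge price are all hypothetical numbers, and the function names are my own. It assumes, as in the example, that the later (perfectly informed) backers fund the project if and only if it is valuable.

```python
from fractions import Fraction


def posterior_valuable(prior, accuracy, signal_says_valuable):
    """Bayes' rule for a binary signal that is correct with prob. `accuracy`."""
    like_if_valuable = accuracy if signal_says_valuable else 1 - accuracy
    like_if_worthless = 1 - accuracy if signal_says_valuable else accuracy
    numerator = prior * like_if_valuable
    return numerator / (numerator + (1 - prior) * like_if_worthless)


def payoff_outright_purchase(belief, value, price):
    """Pay for certain; receive `value` only if the project is valuable."""
    return belief * value - price


def payoff_contingent_pledge(belief, value, price):
    """Pay only if the project is funded.

    Assumption from the example: the project is funded exactly when it is
    valuable, so the first backer is charged only in the valuable state.
    """
    return belief * (value - price)


# Hypothetical numbers: even prior, a weak signal (correct 3 times in 5)
# that says "worthless", a project worth 30 if valuable, and a pledge of 20.
belief = posterior_valuable(Fraction(1, 2), Fraction(3, 5), signal_says_valuable=False)
print("belief that the project is valuable:", belief)
print("outright purchase:", payoff_outright_purchase(belief, 30, 20))
print("contingent pledge:", payoff_contingent_pledge(belief, 30, 20))
```

Under these assumed numbers the pessimistic backer would refuse an outright purchase but still gains in expectation from pledging, because the contingent contract removes the state in which they would have paid for a worthless project.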
One aspect that has largely avoided discussion, however, is the conditioning point raised above: herding, in the sense of Banerjee (1992), relies on the existence of a history of decisions. In crowdfunding, the structure of the contract leads to conditional information: a prospective herd. In the example given, for instance, the individual herds despite being the first person to make a decision.

This can lead to a piling-on effect, which is difficult to study empirically but has been discussed in several papers (see Agrawal et al. (2013); Kuppuswamy and Bayus (2013)) which find some suggestive evidence. Kuppuswamy and Bayus (2013) find some evidence of social learning and herding in previous crowdfunding data. This is usually observed as accelerating co-movements of backing decisions, especially after the threshold has been crossed. In general, though, the previous evidence on how backers behave is mixed; herding is not their dominant explanation for the empirical patterns in the data, which instead relies on the social model of valuation described above. There are certainly time fixed effects: the arrival rate of individuals at a project changes over the lifespan of the project, which makes it difficult to disentangle the different dynamic predictions of a model. For example, a U-shaped pattern of supporters could simply reflect the fact that new (or ending) projects attract more attention, but it can also speak to some of the social aspects which underlie crowdfunding (as in Hekman and Brussee (2013)).

On the other side of the market, Cimon (2017) studies the role uncertainty plays in the decision to undertake crowdfunding. Theoretically, he shows that crowdfunding is particularly valuable for individual investors who face demand uncertainty.
This finding is in line with what qualitative surveys such as Mollick and Kuppuswamy (2014) have found: many individuals use crowdfunding not just as a fundraising tool, but also to help them assess the potential market for their product. It can also serve as a form of free advertising, using the reach of social media and the excitement generated by a new project to interact directly with their consumer base.

These more qualitative aspects of crowdfunding are also highly important, and have received attention from a literature based mainly in marketing. For example, Agrawal et al. (2013) study a wide-ranging set of explanations and motivations, but make the intriguing suggestion that crowdfunding often plays the role of formalizing previous informal fundraising arrangements between friends and family. While crowdfunding platforms generally provide security as middlemen to ensure that the terms of agreements are honoured, and perform a coordination role allowing backers to meet project owners, they can also be a way of explicitly “contractualizing” previous non-contract-based loans or gifts. This is possible because all the coordination is handled by the platform, making crowdfunding projects, as a financing method, extremely simple to set up and accessible to the non-expert. They are entirely self-driven: there are no onerous capital or collateral requirements, there is no interview with a bank, and there is not even a formal legal prospectus required. Only the basic structure of the pitch is specified: the level of detail, expertise, etc. are all left up to the individual. This gives crowdfunding extremely low barriers to entry, making it attractive for this formalizing role.

As mentioned in Section 3.2.1, an important part of crowdfunding deals with what happens after a project reaches its funding threshold. Primarily because data cuts off at this point, this has been largely unstudied.
Mollick and Kuppuswamy (2014) make the first attempt at this, by manually reaching out to the supporters of a selection of successful projects to find out what the ultimate results were. The findings largely support the notion that most projects were carried out in good faith, with a large majority delivering their promised rewards, and backers being largely satisfied with what they received. However, the study also highlights that many projects are late with their delivery, having over-promised on their delivery timelines.

This ultimate outcome can legally be in a gray area: while crowdfunding is novel, it is not fundamentally different in terms of its obligations to its consumers or backers. When a project goes ahead, its owners are legally obligated to fulfill their promises to backers to the best of their abilities. So, while a project may go into production and ultimately fail, this is not prima facie a legal issue: the context in which the failure occurs matters. A project in which the owners simply abscond with the money, never make a serious effort to produce the goods promised, or blatantly lie about the qualities or qualifications involved in the project is open to class action lawsuits by backers. On the other hand, a project simply overtaken by unanticipated challenges or hurdles despite a good faith effort by the owners is legally blameless. For the majority of projects, which lie somewhere in between these two poles, the protection for the backers and the liability of the project owners can be somewhat up in the air. A major part of the 2012 US JOBS Act dealt with clarifying class-action liability for crowdfunding projects, helping to provide a legal framework to address these kinds of concerns.

The second set of papers builds on work such as Agrawal et al. (2013) and studies the choice of platform and funding structure. As mentioned, every platform is slightly different, with different rules.
Kickstarter, IndieGoGo, and GoFundMe were discussed at length in Section 3.2.1, and are typical of many platforms. Other platforms (MicroVentures, SeedUp) relax the idea of the project, and provide equity stakes (akin to “micro-venture” capital) in entire companies, but at a level much smaller than traditional venture funding, in jurisdictions which allow this. Finally, others (Patreon) ask for repeated weekly or monthly contributions to support non-discrete ventures, such as an ongoing comic or video series. Each of these different structures has a different implication for how crowdfunding takes place. This also means that the differences in structure can shed light on consumer behaviour or on the decisions of project-owners. For example, Chang (2016) theoretically studies the implications of different fundraising structures for consumers, project-owners, and overall social welfare. Marwell (2016) exploits the difference between IndieGoGo and Kickstarter to determine empirically how individuals behave and the consequences these kinds of structures have for firms.

However, implicitly or explicitly, the attention in this literature has primarily focused on small individuals, mirroring the rhetoric of the business side of crowdfunding (with Agrawal et al. (2013) the notable exception). That is to say, the focus has been on understanding the (extensive) decision of whether or not to back, rather than understanding the (intensive) choice of how much to pledge. This is usually because most projects have a well-defined reward backers would want, and additional spending does not provide commensurately better deliverables. However, many digital markets feature transaction data with a pattern similar to that traditionally seen in a casino: most of the individuals are small-time players, contributing small amounts, while a small number of very large players spend orders of magnitude more money than the typical player.
Adopting the language of casino managers, game developers and analysts refer to these individuals as “whales” (due to their large size) (Kokkonen et al. (2014)). Many digital environments are designed to encourage these kinds of players to pick up, and continue using, their software (Lescop and Lescop (2014); Drachen et al. (2012)). The large share of total revenue that such contributors account for makes capturing and retaining their patronage extremely important. Analysts associated with some of the largest names in the video game industry acknowledge that the motivations behind these individuals remain opaque, but that they appear to be at least in part motivated by rational concerns about achievement and tangible benefits (Sinclair (2014)).

I present early evidence that this is also the case for crowdfunding data. When we look at the pattern of contributions, they are generally similar on a per-backer basis day over day. This is consistent with the fact that, for most projects, there is a well-defined deliverable with a well-defined price which most backers would desire (e.g. a ticket to the concert). However, there are also a small number of much larger backers, which appear in the data as much higher than normal per-capita contribution levels. These large contributors constitute a small fraction of the total number of backers, but just as in the microtransactions literature, they are disproportionately large sources of income. In the remainder of this chapter, I discuss the data on which these conclusions are based, and present some stylized facts supporting this point of view. I also demonstrate that large contributions are predictable, rational, and not driven by statistical variation in contribution size.

3.3 Data and Facts About Large Contributions

The data for this chapter come from a novel sample of projects on the Kickstarter crowdfunding platform over a three-year period.
Each project was observed at regular intervals (daily, with some exceptions) and a variety of variables were captured, resulting in over 3 million observations; this forms a kind of unbalanced panel. In addition, project-level covariates were collected, including the deadline for the project, the fundraising goal, and the category, which constituted a (fairly granular) self-selected indicator for an approximate “subject” such as “Christian Music” or “Board Games.” The time-varying variables collected include the number of backers, the amount raised, and the number of comments and updates. Comments were publicly visible feedback left for project-owners by backers, while updates were notes appended to the pitch by the project owner. The data consist of 128,172 projects with an average of 25.86 observations each. Within the sample, projects raised an average of $8,621, with a wide standard deviation: a large number of projects raised nothing, while the largest projects raised tens of millions of dollars. These projects were backed by an average of 115 supporters, with the largest projects bringing in over a hundred thousand supporters. The average project had a goal of $34,320, which was reached 38% of the time; about 1.5% of projects succeeded on their first day. Projects, on average, were 32[30] days in length, with a strong peak at 30 days, but also skewed to the right (with a spike at 60 days, the maximum possible). On a typical day, a project raised $347 from 4.5 backers. However, these numbers vary greatly as the project elapses. For a comparison of how this data set fits with other longitudinal studies of crowdfunding, see Appendix B. After cleaning the data for consistency (excluding projects with incomplete data or data that was potentially collected erroneously[31]), I am left with 109,411 projects with a total of 2.7 million observations, whose characteristics are reported in Table 3.1.

[30] A reviewer pointed out the discrepancy between the average of 25.86 observations per panel, and the length being 32. This is primarily due to two features of the data. First, for projects at the beginning or end of the panel, not all dates are captured; a form of either right or left censoring. Second, due to idiosyncrasies in the day projects are captured, some observations correspond to more than one date. For example, the most common collection error is where a date was missing, and two days were collected at once. Dropping this data, or directly correcting for this, does not change the results of this paper.

[31] The overwhelming majority of these exclusions have to do with the data collection mechanism: through mechanical error, a project-day may be accidentally skipped but recorded, resulting in “phantom” days in which the total number of backers declines substantially then rebounds.

VARIABLE                             MEAN      STD DEV    MIN           MAX
Total raised to date ($1000s)        5.134     63.936     0             13,283.28
Total backers to date (100s)         0.701     6.946      0             1,058.52
Total comments to date               12.169    441.679    0             120,498
Total updates to date                1.288     2.861      0             147
Project funding goal ($10k)          0.343     8.766      0.000         1,000
Total backers (100s)                 1.152     9.713      0             1,058.52
Total funding raised ($1000s)        8.621     8.780      0             13,283.28
Duration of project                  31.981    10.760     1             60
# backers today                      4.524     67.554     0             37,968
Amount raised today ($1000s)         0.338     5.418      0             2,351.92
Avg contribution size                82.672    247.329    0             39,840
Is large contribution?               0.008     0.088      0             1
Amount of goal left ($10k)           3.315     99.186     -1,323.328    10,000
% of goal reached, capped at 1       0.263     0.350      0             1
Project successful?                  0.377     0.484      0             1
Size of donation relative to mean    1         1.108      0             32.453
Fraction with large contribution     0.198     0.396      0             1
OBSERVATIONS                         2,774,993 (Panel: 109,411)

Table 3.1: Summary statistics

This study focuses primarily on so-called “large” contributions; this poses a challenge, because the data is at the project-day level. This means that we cannot observe individual backer choices directly, either in terms of the decision to back or the decision about how much to contribute. Ideally, this would be observable to the econometrician, but we are limited by the data collection process. So, to infer when large contributions occur, I instead use the following method to identify a “large contribution day” (LCD). The method is straightforward: for each project \(j\) on a given day \(t = 1, 2, \ldots, T\), I compute the average per-backer contribution \(C_{jt}\) by taking the amount raised on that day \(R_{jt}\) and dividing by the number of backers \(B_{jt}\), so \(C_{jt} = R_{jt}/B_{jt}\). I exclude days on which no one donated, or on which contributions were cancelled. I then, by project, compute a project “price” by taking the average of the day-level per-capita contributions, and also compute their standard deviation:

\[ \bar{C}_j = \frac{1}{T}\sum_t C_{jt}, \qquad \sigma_j = \sqrt{\frac{1}{T}\sum_t \left(C_{jt} - \bar{C}_j\right)^2}. \]

I then define a large contribution day to be a day on which the per-backer contribution exceeds the price by more than three standard deviations: \(LCD_{jt} = I\{C_{jt} > \bar{C}_j + 3\sigma_j\}\).

For an illustration see Figure 3.2; in this example, I would conclude that day 11 was a large-contribution day for this project. Under the assumption that valuations are independently drawn from a single (Gaussian) population, this implies that “large” donations lie above roughly the 99.87th percentile of the distribution (the one-sided three-standard-deviation cutoff). For a statistical discussion of this definition and its properties, see Appendix B. In terms of economics, however, there are several things to note.
Firstly, an LCD has a straightforward interpretation as a day on which there was at least one backer who contributed more than three standard deviations above the average; there might be more, but by the pigeonhole principle there is at least a single large contributor. Unfortunately, this is as much as we can say given the data available; in a sense, this is a conservative way to determine large contributions, because many small contributions may “wash out” a large contribution, making it difficult to detect. Thus, LCDs indicate truly large amounts of money being pledged to a project; likely a single large contribution lying much further than three standard deviations from the average.

Second, it is important to note that this is a project-level definition of what is a “large” contribution; each project, based on its average contribution, defines what makes a contribution large. This means that for some projects, heuristically “large” amounts of money per backer may not qualify as a large contribution. I made this decision primarily because projects on Kickstarter do not actually have a well-defined “price.” Instead, they may have several pledge levels and it is difficult a priori to infer which of these is relevant for backers; accordingly, I remain agnostic here about what is the “correct” price individuals believe they need to pay to support a project and instead infer it from the average. This definition is relatively robust to variations in the number of deviations or the use of the mean versus the median to infer “price.”

Finally, I focus on the size of contributions relative to other contributions and not the goal or absolute size. This is because many projects have very small or very large goals, making it difficult to compare what is large or small in terms of a contribution across projects. For example, the largest project asked for more than $100m[32] while the smallest asked only for a single dollar.
A metric like “20% of the goal,” for example, would yield the perplexing conclusion that a shiny quarter given to the latter would be considered large, while $10m given to the former would not. To avoid these kinds of complexities I focus on the behaviour of individuals relative to one another, and not to the goal, which is driven by a decision on the project-owner’s side. This has the drawback that, for many projects which attract very few backers, qualitatively “large” contributions are not considered large since the “price” for support is inferred to be large; this is probably reasonable, because we could imagine that such a small project would have a very high “premium” for support from its backers. Alternatively, this means we are eliminating “non-credible” large donations, where large sums are given to projects with no hope of succeeding (since the donation is contingent on success).

In the actual data, approximately 0.78% of the days are characterized by large contributions; an indication that the process is not simply a function of variation in the average contribution size, since this is much higher than the frequency we would expect under a normal distribution. Furthermore, large contributions do not occur randomly across the life-cycle of projects. Approximately 71% of large contributions occur before a project meets its goal. As Fig 3.3 shows, it is also the case that large contributions occur towards the end of a project’s life-cycle. Large contributions tend to linearly increase in likelihood up until the project is completed. However, we can also see that large contributions tend to fall off as projects are close to reaching their goal, becoming most frequent when the project is further from success. The interaction of these two terms is unclear from this figure, but it implies there may be different motivations behind large contributions which occur at different times.
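
As a rough illustration of the large-contribution-day rule defined earlier in this section, the following Python sketch computes the per-backer contribution, the project “price,” its standard deviation, and the resulting indicator for a single project. The function name, the argument names, and the example numbers are hypothetical and are not taken from the thesis data or code; dropping days with no backers or with non-positive amounts is one possible reading of the exclusion rule.

```python
from statistics import fmean, pstdev


def large_contribution_days(raised, backers, n_sd=3.0):
    """Flag large contribution days (LCDs) for one project.

    raised[t]  -- amount raised on day t (R_jt)
    backers[t] -- number of new backers on day t (B_jt)
    Returns (flags, price, sigma); flags[t] is True or False for days that
    are kept and None for days that are excluded.
    """
    # Per-backer contribution C_jt = R_jt / B_jt. Days with no new backers,
    # or with non-positive amounts (net cancellations), are excluded.
    per_backer = [
        r / b if (b > 0 and r > 0) else None
        for r, b in zip(raised, backers)
    ]
    kept = [c for c in per_backer if c is not None]
    if not kept:
        return [None] * len(per_backer), None, None

    price = fmean(kept)   # project "price": mean daily per-backer contribution
    sigma = pstdev(kept)  # population standard deviation (1/T normalisation)
    threshold = price + n_sd * sigma

    flags = [None if c is None else c > threshold for c in per_backer]
    return flags, price, sigma


if __name__ == "__main__":
    # Hypothetical 20-day project with 5 backers per day. Per-backer amounts
    # alternate between 20 and 30, day index 5 has no backers, and day
    # index 10 has an unusually large per-backer amount.
    per_backer_amounts = [20 if i % 2 == 0 else 30 for i in range(20)]
    per_backer_amounts[10] = 250
    backers = [5] * 20
    backers[5] = 0
    raised = [c * b for c, b in zip(per_backer_amounts, backers)]

    flags, price, sigma = large_contribution_days(raised, backers)
    print([t for t, f in enumerate(flags) if f])
```

Because the threshold is computed project by project, the same absolute amount can be flagged in one project and not in another, which is the relative notion of “large” described above.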
More interestingly, fully 15% of large contributions coincide with the project’s success; within the data, there is a single day on which each project crosses its threshold, and disproportionately many large contributions occur on this date. A straightforward interpretation of this would be that these are “pivotal” donations: this is the date on which the project succeeds because it is a date which attracts a large contribution.

[32] This may seem like an absurd amount of money, but it is actually feasible for some projects. To date, two crowdfunding projects (Star Citizen and The DAO) have raised more than this goal, although not through Kickstarter directly.

Figure 3.2: Illustration of Inferring Large Contribution-Days (horizontal axis: Day, 1 to 20; vertical axis: $ Contributed; series: Contributions, Average, Bound)

To look closer at these motivations, I consider the “excess size” of a large contribution. To get a sense of this, I take the total raised and subtract off the amount that “would have been raised” if the donations had been typical for the project. This gives an idea of how much a large contribution must have contributed on that day. In terms of a percentage of the average contribution, the typical large contribution day raised 6 times more than the average day. However, this is strongly non-normal: the 80th-percentile contributed almost ten times more, and the 95th-percentile contributed in excess of twenty times. The tail of the distribution is even more extreme. To see how these kinds of contributions are distributed between different projects, we can compare the excess size of these contributions to the percentage of the threshold remaining at the start of the day. On average, a large contribution closes about 38% of the gap remaining. However, this is even more striking when we restrict attention to large contributions on the “pivotal” days.
As illustrated in Fig 3.4, there appears to be good evidence that large contributions are instrumental in closing the gap, with large numbers closing in excess of 70% of the gap remaining, and many closing far more.

Figure 3.3: Histogram of Large Contributions versus % Elapsed and % of Goal

Figure 3.4: Pivotality of Contributions

However, we can also see this is not the entire story. When comparing projects which succeed with those that fail, we can see that as a percentage of the goal, contributions to ultimately successful projects tend to be larger. This implies that backers are concerned with the effectiveness of their contributions: they are more likely to give large sums to projects which are on the road to success and where they have a meaningful effect on the outcome. This provokes the question: do projects which have large contributions prove more likely to succeed? Indeed, projects which have at least one large contribution have a 79% probability of succeeding, while those that do not have only a 27% chance of succeeding. To see whether or not these large contributions are pivotal in a direct manner, I perform a naive counterfactual: for each project, I re-calculate the total raised, assuming that the large contributions were of average size instead. In fact, 1 in 3 of the successful projects no longer reach their goal following this calculation: that is, they would have failed if not for the direct role of the large contributions. However, because we cannot determine the dynamic effects of these large donations, nor can we exclude endogeneity, it is difficult to assess the meaningfulness or causality of this conclusion.
Furthermore, it also poses something of a puzzle: based on this thought experiment, approximately 36% of the effectiveness of large contributions is driven by factors other than the size, potentially including the timing and efficiency of the contributions made.

However, these facts together provide evidence that large contributions play an important direct role in how crowdfunding projects are funded. The ability to attract these “whales” to a project can be an important indicator of success, both because they donate more than average and because they provide funding when necessary. While pointing towards a conclusion, these stylized facts are difficult to analyse in isolation. In order to understand the motivations which might explain the kinds of behaviour we are seeing, I develop a consumer choice model of this kind of decision making, to help motivate the estimation and provide a framework for the results observed. This provides some predictions about how large contributions should behave, which I then test against the data in the subsequent section.

3.4 A Model of Large Contributions

I model large contributions in a non-standard consumer choice framework. In order to focus on the intensive margin, I consider the decision about how much to contribute, rather than whether or not to contribute. This means that utility is the benefit an individual receives, conditional on backing, after contributing an amount b above the minimum to a representative project at time t. That is, these individuals are going to back the project but they have not decided whether to contribute a small or large amount. They derive additional utility from increasing the likelihood of a project succeeding, but prefer to do so at minimum cost.
I assume their utility takes the form \(U(b) = P(b; s_t)[v - b]\), where \(P\) is a function relating the state of the project (\(s_t\)) and the amount donated (\(b\)) to the probability of the project succeeding, and \(v\) is the value of the project (if successful) to the consumer. Note that this captures the contingent nature of crowdfunding contracts, since the individual only receives the good (and pays b) when the project succeeds. In other words, they decide on contribution size based on their expected utility from a given contribution, understanding that more money affects the likelihood of project success. Because b enters both terms of the expression, it is simpler to consider the following “unrestricted” utility function \(U(p, b) = p[v - b]\), where an individual chooses both b and p. Similar to how in production theory not all combinations of output are possible, I assume that this choice is restricted by the “probability curve” \(p = P(b; s_t)\). This allows us to depict this situation in a standard consumer choice diagram; the utility function corresponds to indifference curves, of which the highest is selected by the consumer subject to the probability curve constraint. However, the shape of this “budget” constraint is governed by the shape of the \(P\) function in the given state, and may result in the feasible set not being convex. For simplicity of comparison, I normalize the goal of a project to 100, so b is in terms of the percentage of the goal, and I abstract away from explicit financial constraints; they would arise as a limit on the amount of b possible.

It is important to note that I model the backing decision as a “snap-shot” of the dynamic process of crowdfunding. I assume that a single backer arrives at the project at time t, observes the state of the project \(s_t\) and the associated probability curve, and makes a single irrevocable decision about how much to contribute.
The probability curve captures the consumer’s expectations about how their contributions will affect the success of the project, taking into account expected future dynamics. The shape of this curve and its behaviour reflect both the changing state of the project and the consumer’s beliefs about how that state relates to future success.

Notice first that the unrestricted utility function is a hyperbolic paraboloid (in b, p space) with asymptote at v, the project value. The indifference curves are hyperbolas in terms of b. This means that as v gets smaller, the indifference curves become steeper and never cross the line b = v, which corresponds to the situation in which the individual no longer values the good at the contribution level. One interpretation of this is as the way in which the overall budget constraint enters the model; this is one dimension we are viewing, entering through v. As v increases, the curves become relatively flatter, and take on a modest u-shaped aspect. This form of the utility function U(p, b) is the simplest possible; we could imagine that if individuals incorporate altruistic, warm-glow, or other behavioural preferences into their decisions, the shape of the indifference curves can be different.

As described, the probability curve captures consumer expectations about the future success of a project, given their contributions. This is an expectation-based model, but it is reasonable that it should agree with the observed distribution of project success, if consumers are rational. As described earlier, this tends to be strongly bimodal: projects tend to raise either all of their funding, or very little of it. This means that the probability curve takes on an S-shape which generally rises and flattens out as a project gets closer to its goal (holding time constant).
Projects with different amounts contributed to date have different effective goals \(g^*\), as well as different amounts of time remaining, backer arrival rates, and other time-invariant covariates having to do with the project. The shape of the curve is motivated by individual backer expectations about how likely a project is to succeed given the current state and the goal outstanding. However, notice that this is time-dependent; as projects run out of time, this likelihood declines, resulting in a deformation of the probability curve over time. I model this using a logistic function, in which both the lower asymptote and curvature change as the project evolves over time.

We can see these features of the model illustrated in Figure 3.5. In Panel (A), I depict a pair of typical consumers with different willingnesses to pay facing a project that is just beginning (notice that the effective goal \(g^* = g\); the project is only sure to succeed if it raises g from the individuals). Both consumers have a maximum utility (achieved at Point A) where they contribute nothing in excess but the project succeeds with certainty; as indicated, utility increases in the direction of this point. The probability curve is indicated in bright red, and represents the trade-off for both individuals between contributions and the probability of project success. Since this project is beginning, I depict this as a logistic curve with a small lower asymptote and a steep curvature to reflect the observed distribution of projects. The first individual (solid lines) has high willingness to pay for the project; consequently, they optimally decide to contribute a large amount to see the project succeed (Point B). This point satisfies the standard tangency condition for an interior solution:

\[ \frac{\partial U(p,b)/\partial b}{\partial U(p,b)/\partial p} = -\frac{dP(b)}{db}. \]

The second individual (dashed lines) has lower willingness to pay for the project. The slope of their indifference curve is steeper than the average slope of the probability curve, and consequently they adopt a corner solution, contributing nothing (marginally) to the project. These two individuals correspond to large contributors and small contributors, respectively. The nature of a “large” contribution is such that it forms an interior solution to the consumer’s (constrained) optimization problem[33]; this fact will be useful later.

[33] With the exception where a consumer has infinite willingness to pay, in which case it forms a corner solution on the opposite boundary to the other consumer.

Figure 3.5: Consumer Choice Models (Panel A) (horizontal axis: Contribution Size, goal = 100; vertical axis: Probability of Success; labelled points A, B, C)

In Panel (B), we can see how the probability curve changes shape as time elapses. While this is based on the beliefs of the consumers, I make some reasonable inferences about how such a curve behaves based on the funding patterns and fund-raising goals.

Figure 3.6: Consumer Choice Models (Panel B) (horizontal axis: Contribution Size, goal = 100; vertical axis: Probability of Success; curves for t = 1, 2, 3 and t = 4 (terminal))

Figure 3.7: Consumer Choice Models (Panel C) (horizontal axis: Contribution Size, goal = 100; vertical axis: Probability of Success; curves for declining \(g^*\), starting from \(g^*_1 = g = 100\))

In this illustration, individuals can only arrive at four time periods, \(t = 1, 2, 3, T \equiv 4\), and we hold constant the amount donated to the project, so the outstanding goal \(g^*\) is fixed. I also assume this function is smooth, mainly for simplicity of analysis. As time elapses, the probability of success for low contributions will fall, since the amount of time remaining for a project to succeed is running out.
Notice that in general one might expect that, for a given contribution level, the probability of success should fall monotonically as time elapses, but this is not necessarily true in some situations (pictured by t = 1 versus t = 2, 3). For example, individuals may have different expectations about how certain types of projects are inclined to raise funding based on their current fund-raising; imagine a project which is known to attract attention if it is believed to be “behind” by consumers. Nonetheless, restricting attention to smooth probability curves, we know that the limiting behaviour is such that the probability of success for levels of contribution less than \(g^*\) falls, while the probability of success for a contribution of \(g^*\) is 100%; this necessitates a “steepening” of the logistic-shaped curve into a step-function in the limit. That is to say, the slope of the function becomes locally very close to 0 about \(b = 0\) and \(b = g^*\); \(m(g^*) = m(0) \approx 0\). Furthermore, as time elapses, these flat regions become closer together in terms of the b-axis, but remain separate in terms of p. That is to say that for small \(\epsilon\) at \(t \approx T\), \(P(g^* - \epsilon; s_t) \approx 0\) and \(P(g^*) = 1\), while \(m(g^* - \epsilon; s_t) \approx m(g^*) = 0\). But then, by smoothness of the function \(P\), it must be the case that the slope becomes arbitrarily steep as time runs out: \(\lim_{t \to T} \frac{dP(b)}{db} = \infty\). Essentially, as time runs out, fewer backers are available to support a project in the future, which both makes success less likely and increases the importance of a large contribution in terms of helping a project reach success. In the limit, if you are the last possible contributor, the project will be successful if and only if you contribute the outstanding funds. An opposite situation occurs when we hold the amount of time remaining constant, but vary the goal outstanding, \(g^*\). As illustrated in Panel (C), as the goal outstanding falls, the S-shaped probability set shifts upward and to the left, generally flattening in a similar fashion to the previous example as \(g^* \to 0\).
The upper and lower asymptotes draw closer together as the project’s success becomes more certain, and the curve flattens out.

3.4.1 Simulation and Predictions

In order to investigate the model more precisely, and verify our intuition, I perform a numerical simulation of the solutions to the model. An explicit theoretical solution is difficult because of a partial failure of Berge’s maximum theorem: the “budget constraint” set created by the S-shaped probability curve is non-convex. This means that the set of solutions is not convex as the budget curve changes and the solution can “jump” between different points. In what follows, I assume that consumers have expected utility represented by a hyperbolic paraboloid, U(p, b) = p[v − b], where v is the consumer’s value for the project. For simplicity of illustration, I make the change (relative to the theoretical model) that the contribution is already scaled in terms of % remaining. That is, a contribution of b = 1 corresponds to the total goal remaining, not the total goal. In other words, I adjust the x-axis so that it is in terms of g∗ rather than g. This has no implications for the solutions, save that the interpretation of v in the budget constraint is relative to g∗ not g; since g∗ is fixed in each example, this is without loss of generality.

To form the probability set, I parametrize my S-shaped curve as the generalized logistic function

\[ P(b) = a + \frac{1 - a}{1 + \exp(-k(b - m))} \]

The parameter a governs the lower asymptote; in this case, the likelihood a project with g∗ outstanding would succeed without a large contribution from the donor. The parameter m governs the location of the inflection point; for illustration, I initially set m = 1/2 in what follows. Naturally, in specific situations other locations might be more reasonable; I explore this later in this section. The parameter k governs the slope of the curve; higher values are more aggressively sloped. 
In what follows, I choose k = 20 except where noted.

The consumer’s problem is to maximize U(p, b) subject to the constraint that p ≤ P(b). Since utility is strictly increasing in p and decreasing in b, this implies that this constraint should hold with equality at an optimum. First, I illustrate the two key solutions developed theoretically earlier: small contributions, which arise as a corner solution, and large contributions, which arise as interior solutions. This is depicted for v = 0.8 in Figure 3.8. As we can see, when a = 0.2, so that the initial likelihood of success is 20%, the individual chooses to give a large contribution to the project, in excess of 60% of the outstanding goal. However, when a = 0.3, the individual does not choose to give a large contribution. Essentially, when the individual values the project highly, if the probability of success is not high enough, they are willing to give a large amount to make it happen. However, if the project is in “good shape” from their point of view, they do not feel it necessary to do so. This is the central insight of the model illustrated in this environment.

[Figure 3.8: Illustration of Numerical Solutions to Model, v = 0.8]

[Figure 3.9: Evolution of Numerical Solution to Model, v = 0.8, Varying a values]

I explore this further in Figure 3.9, which illustrates the way the solution to the model evolves as a changes. We can see there is a sharp non-convexity in the solution set, which occurs at a = 0.25. Prior to this value, all of the contributions are large. After this value, all of the contributions are small. The significance of a = 0.25 is that it happens to be the value of a at which this particular consumer is indifferent between providing a large and a small contribution to the project. One interpretation of this is as an illustration of a consumer’s decision as the time remaining falls. 
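The numerical exercise described above can be sketched in a few lines of Python. This is a minimal sketch rather than the code behind the figures: it assumes the parametrization stated in the text (m = 1/2, k = 20, v = 0.8), restricts b to [0, 1] as a share of the goal outstanding, and uses a simple grid search in place of whatever solver was actually used. All function names are hypothetical.

```python
import math

def success_prob(b, a, m=0.5, k=20.0):
    """Generalized logistic P(b) = a + (1 - a) / (1 + exp(-k (b - m)))."""
    return a + (1.0 - a) / (1.0 + math.exp(-k * (b - m)))

def expected_utility(b, a, v=0.8, m=0.5, k=20.0):
    """U(p, b) = p (v - b), evaluated on the binding constraint p = P(b)."""
    return success_prob(b, a, m, k) * (v - b)

def best_contribution(a, v=0.8, m=0.5, k=20.0, steps=1000):
    """Grid search over b in [0, 1]; the grid itself is an assumption."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda b: expected_utility(b, a, v, m, k))

b_low_a = best_contribution(a=0.2)   # interior ("large") solution
b_high_a = best_contribution(a=0.3)  # corner ("small") solution
print(b_low_a, b_high_a)
```

Under these assumptions the sketch picks an interior contribution of somewhat more than half the outstanding goal at a = 0.2 and the corner b = 0 at a = 0.3, with the switch between the two regimes occurring close to a = 0.25. The exact interior value depends on details of the original simulation that the text does not report, so the magnitudes should be read as indicative only.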
Projects with less time remaining for a given outstanding goal have a lower probability of success, which can be modelled as a falling a-value.

Figure 3.10 also illustrates one additional consequence of the model: the inflection point matters. In this figure, I depict the evolution of the numerical solution as the inflection point evolves from 30% to 60% of the goal outstanding. As we can see, this leads to a change in the intra-marginal contribution decision. When a large contribution is relatively effective at creating success, individuals are likely to give less. In terms of the model, when the inflection point is relatively low, individuals feel the need to give less. As the inflection point moves away, individuals are likely to give more, up to a point. Eventually, as the inflection point moves far enough away from the origin, individuals decide that a large contribution is no longer useful; it is simply too costly to support the project at this point, since large contributions are not effective enough at helping it succeed. This also has a natural analogy with the model: when projects are running out of time, the inflection point moves away from the origin, as depicted in Figure 3.6. As the simulations show, conditional on a large contribution occurring, the contribution size is likely to be large. This is also impacted by the value of k, the curvature, which likely also increases along the same dimensions. However, without a movement in the inflection point this is not a major determinant of contribution size.

Based on the numerical investigations of the model, and the theoretical set up, we can formulate some predictions the model will make about when projects should see large contributions. These can be based both on the simulations and on extrapolations based on the theoretical model:

• Prediction 1: Large contributions should be more common as time remaining drops, conditional on goal outstanding. 
Theoretically, this prediction is supported by noting that a large contribution occurs as an interior solution to the consumer optimization problem described above. In particular, it requires that a tangency exists between their indifference curves and the probability curve. In this model, this is more likely as projects run out of time, since the slope of the budget line evolves from being very flat to very steep. The main limitation on this is that the region of steepness must occur before v, the maximum willingness to pay; otherwise the point is not feasible for the consumer to purchase. This can be interpreted as the notion that individuals with high value for the project are more likely to contribute large sums of money towards the end of the project, either because they have high income (and therefore low marginal utility of wealth) or they value the good substantially more than other consumers. We can see direct support for this in our simulated results, in which time running out can be viewed as a fall in the value of a. This prediction is illustrated in Figure 3.9.

[Figure 3.10: Evolution of Numerical Solution to Model, v = 0.8, Varying m values]

• Prediction 2: If backer arrivals change over time, large contributions are more likely in periods in which there is a relatively higher mass of arrivals, all else equal. In particular, if projects attract more attention early in their lifespan, large contributions are more likely early. This prediction is a straightforward extrapolation from Prediction 1; if more individuals arrive, conditional on the model set up, more large contributions should occur relative to a lower arrival period.

• Prediction 3: Large contributions close to a project’s deadline should be more likely to be pivotal than those earlier in the project’s lifespan. 
This follows from the same reasoning as Prediction 1, noting that not only are they more likely to occur, but also the point at which the contribution is placed is closer to g∗, the outstanding goal, for most WTPs (slopes). In the numerical simulations, this follows from the investigation of the change in m. Pivotality, in the numerical model, is described by large values of b. The numerical simulations predict that shorter time frames imply larger m, which means that, conditional on wanting to provide a large contribution, the size should be large. This is illustrated in Figure 3.10.

• Prediction 4: Large contributions should be much less likely after success. Theoretically, this follows from the fact that large contributions are not necessary once a project has succeeded; the optimum point for all agents can be achieved at zero additional contribution to the project. The numerical model also captures this, since a successful project is a point on the extreme upper left of the feasible set, which implies it must be a corner solution (and a small contribution).

Some of these predictions have already been spoken to by some of the preliminary facts addressed in Section 3.3, but there is a problem; primarily, there is a great deal of heterogeneity in projects, with many different dimensions changing at the same time. In order to address this challenge, in the next section I describe a linear probability model to examine when large contributions occur. The linear regression framework used allows me to control for the different covariates of interest, analysing them in a single framework.

3.5 Why Do Individuals Provide Large Contributions?

This chapter first looks at what features of projects are related to the occurrence of large contributions; what determines whether or not a large contribution occurs on a given day? As described in Section 3.4, this is a complicated issue, with many, often-contradictory parts. 
I use a multiple regression framework in order to separate the effects of different parts, while still addressing them together. Specifically, I regress an indicator variable for whether or not a contribution day was “large” on a variety of covariates, both time-varying and static. This implies that the baseline specification is a linear probability model with equation:

\[ LCD_{jt} = X_j'\beta_1 + W_{jt}'\beta_2 + \epsilon_{jt} \]

where $\epsilon_{jt}$ is the error term, $LCD_{jt} \in \{0, 1\}$ is the presence of a large contribution for project j at time t, $X_j$ is a set of project-specific covariates, while $W_{jt}$ is a set of time-varying covariates.

As an alternative, and in order to control for potential project-level effects not captured by the set of covariates $X_j$, I also adopt a fixed-effects model in which:

\[ LCD_{jt} = \beta_j + X_j'\beta_1 + W_{jt}'\beta_2 + \epsilon_{jt} \iff \overline{LCD}_{jt} = \overline{W}_{jt}'\beta_2 + \bar{\epsilon}_{jt} \]

where the bar notation indicates the within-panel transformation. The drawback of this method is that, because of the fixed effects assumption, only the time varying components can be included.

It is important to note that the relationships in this model are correlations, not causal effects; many of the regressors evolve endogenously with the project, and even in the absence of this factor the identification of the model is probably implausible to assume, given the unobserved heterogeneity which is likely in the data. However, these do allow us to see which components of the projects move together, and assess how they balance against the different behavioural explanations for large contributions. I find that, even in the absence of a causal effect, the regression framework supports the story being told by the descriptive evidence: large contributions care about project success, and try to help projects succeed. 
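The within-panel transformation used in the fixed-effects specification can be illustrated with a small sketch. The data below are invented for illustration (two hypothetical projects, one time-varying covariate, no noise), and the helper names are not from the original analysis. The point is only that demeaning within project removes the project effect β_j, so β_2 is recovered from time variation alone.

```python
from collections import defaultdict

def within_transform(groups, values):
    """Subtract each group's own mean from its observations."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, x in zip(groups, values):
        sums[g] += x
        counts[g] += 1
    return [x - sums[g] / counts[g] for g, x in zip(groups, values)]

def slope_no_intercept(x, y):
    """OLS slope for one regressor once both series are demeaned."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical panel: project-level effects differ, slope on w is common.
project = ["A", "A", "A", "B", "B", "B"]
w = [0.0, 1.0, 2.0, 1.0, 3.0, 5.0]
beta_j = {"A": 0.10, "B": 0.40}   # assumed project fixed effects
beta_2 = 0.05                     # assumed common slope
y = [beta_j[p] + beta_2 * x for p, x in zip(project, w)]

beta_2_hat = slope_no_intercept(within_transform(project, w),
                                within_transform(project, y))
print(beta_2_hat)
```

Because any project-specific covariate is constant within a project, it is wiped out by the same demeaning step, which is why only the time-varying components survive in the fixed-effects column.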
We are also able to give an assessment of the predictions made by our theoretical model.

The project level variables used are the length of the project, its goal, its average contribution size, and a detailed indicator variable for the category of the project. The categories comprise 164 different groups selected by the project owner as a description of their project. For example, categories include Video Games, Publishing, Product Design, Sculpture, Performance Art and Classical Music, to name a few. The baseline omitted category is normalized to be the most populous (generic “Art”). The time-varying covariates included change for different specifications, but in total include: time fixed effects for day of week, year, and month, the goal outstanding, a quadratic term based on the percentage of time elapsed, whether or not the project has succeeded, the number of comments, backers and announcements for a project, and interaction terms relating to the goal outstanding and success, or percentage elapsed. The results are summarized for the baseline specification in Table 3.2. Standard errors are heteroskedasticity robust, and clustered at the project level when considering fixed effect models. For time variables, I omit January, Sunday, and 2012 as the baseline comparison for fixed effects, when present. I also omit projects which never receive any backers, since they are generally uninformative for the analysis here.

A first important fact to note is that, contrary to an interpretation of large contributions as simply being a mechanical effect based on the incidence of large values from a population, we can see that both the duration of the project and the average contribution size have very small effects: both on the order of 0.01%; they are significant, but tightly estimated around zero. In fact, the coefficient on longer projects is actually negative, albeit very small. 
Similarly, the number of total comments has a small, negative effect - indicating that while this may directionally be bad news, the aggregate number of backers who comment does not play a direct role in the incidence of large contributions. This can also be seen directly: considering the number of backers on a given day, the coefficient is very small. This means that large contributions are, as we suspected from the stylized facts, truly being driven by a process distinct from the overall arrival and backing rates of projects.

On the other hand, projects do tend to attract more large contributions when they are reaching their deadline: a project at the end of its lifespan is 0.04% more likely to see a large contribution than one at the beginning, peaking at about 60% completion. This effect is small, however, when compared to the primary drivers. First, projects post-success are less likely to attract a large contribution: an effect which becomes larger with time, peaking at about -0.3%. In the baseline and RE model, the size of the goal outstanding is non-significant. The largest coefficient belongs to the “pivotal day” indicator; this is the day on which projects succeeded (and is therefore neither fully pre- nor post-success); projects are between 1% and 9% more likely to see a large contribution on the day of success, depending on when that day occurs in the lifespan of the project.

Turning to the fixed effects model, we see the results are generally robust, but do differ slightly. This indicates that unobserved project-level effects are probably at play. Most notably, we now see a statistically significant effect of the outstanding goal; a larger outstanding goal leads to fewer large contributions, indicating a preference for “pivotality” or being effective in the large donation. Next, projects are still more likely to attract large contributions as they approach the end of their lifespan, but the effect is now purely increasing over the life of the project. 
Furthermore, projects are also much less likely to see large contributions after success. These results are also general, if we move beyond the linear probability model and look at the marginal effects of the outcome of a probit specification. The pivotal-day result should not be read as saying that a “pivotal” day causes a large contribution; rather it is likely the opposite, in which large contributions are very likely to be pivotal themselves, pushing a project over the threshold. This provides good evidence that we should believe large contributions are likely to be instrumental in project success.

It is also clear that, on some level, there is a clear story being told about large contributions. Large contributions show a preference for being effective, occurring more frequently in projects which have not yet succeeded, but are nearer their goals. They also show a strong “pivotality” affiliation, being very likely to occur simultaneously with project success, especially if the project is late in its life. A major drawback with this is the fact that this study is observational; due to dynamic effects and endogeneity, it is unlikely to tell us too much about the direct effects of any of these features.

3.5.1 Assessment of Theoretical Predictions

Our theoretical model makes several different predictions about when large contributions should occur, and what they should look like.

• Evaluation of Prediction 1: Large contributions should be more common as time remaining drops, conditional on goal outstanding. This hypothesis is confirmed by our analysis; first of all, we see that pivotal contributions, which as the hypothesis points out should be more common as time runs out, are strongly related to the time remaining. A pivotal day is about 8% more likely to be a large contribution day towards the end of the project, relative to the beginning. 
We also see that non-pivotal large contributions are more likely towards the end of the project as well by looking at the time remaining, although this is not a linear relationship and peaks at about 60%; the reason for not seeing a monotone relationship is likely due to the trade-off between pivotality and non-pivotality; projects which are closer to their deadline are likely to have a smaller goal outstanding, which implies an equally-sized large contribution is more likely to be pivotal towards the end, causing a hump-shaped relationship as we observe.

• Evaluation of Prediction 2: If projects attract more attention in a given period, then large contributions are more likely in this period. We do not find support for this hypothesis. In particular, it appears that although total numbers of backers (or potential backers, see Marwell (2016) for example) are larger earlier on, this does not translate into large contributors. As we see, the number of backers today is not influential on the number of large contributors, with an essentially zero coefficient. This is likely because large contributors are willing to wait and observe a project to assess whether or not it is actually in need of their aid.

• Evaluation of Prediction 3: Large contributions close to a project’s deadline should be more likely to be pivotal than those earlier in the project’s lifespan. This hypothesis is confirmed by our data. As we note, large contributions are substantially more likely to occur on pivotal days (1% more likely) and this effect is increasing over time, to a maximum of 9% at the end of the project’s lifespan.

• Evaluation of Prediction 4: Large contributions should be much less likely after success. This is somewhat supported by our data; we can see that only about 15% of contributions occur after success, and the coefficient on success is negative. 
However, the fact that some contributions still occur, and the size of the coefficient, indicate that this is not conclusive. This is possibly because of individual preferences which are not captured by the model, or alternatively because projects may still ask for additional money after reaching their goal; these so-called “stretch goals” are unofficial add-ons which serve as quality improvers, and may motivate additional large contributions.

We can also rule out some alternative explanations. Specifically, we can see that large contributions are not a product of statistical variation in the size of contributions: the precisely estimated, near-zero coefficients on length, average contribution size, number of backers, and number of comments left on a project all speak to the fact that large contributions are purposeful and involved. This also rules out naive private value explanations, such as a situation where the size of a contribution is solely driven by exogenous variation in the WTP for the reward. Additionally, we can also see that there are project-level fixed effects at play: the inclusion of fixed effects in the model provides similar results, but generally supports the notion of unobserved heterogeneity in the data, indicating this is a concern for further analysis.

In general, we find that large contributions largely match the dynamic behaviour we would expect from the consumer choice model outlined. Large contributions are interested in the reward from a project, but are also interested in influencing the probability of success. This interest in being influential speaks to the reason we see large contributions and pivotality being so closely linked in the data; the donation which funds a project is a locus of the behaviour of individuals in the model. 
However, as Prediction 4 outlines, there is clearly something more going on in some cases; potentially due to aspects of the projects we cannot see, or due to different types of behaviour on the part of consumers we do not model. The overall picture remains clear: large contributions are motivated by improving the probability of project success, and individuals are willing to pledge large sums of money in order to help make these projects work. This leads to the second question: if this is the motivation for large contributions, then how effective are large contributions at achieving this end? In the balance of the chapter, I seek to address this question, beginning in Section 3.6.

3.6 How Important Are Large Contributions to Crowdfunding Success?

To assess the impact of these large contributions on the ultimate outcome of a crowdfunding project, I consider the project-level data set, and collapse over the time dimension. I use the baseline linear probability model to regress an indicator variable for whether or not a project was successful ($Y_j$) on a set of covariates ($X_j$). The covariates in this model include all of the project-level covariates from the panel data regressions, plus a variety of variables relating to the presence of large contributions: an indicator for having a large contribution, the number of large contributions, and the excess size of the contributions over the average. I omit projects which did not see any contributions from which a price could be generated, since these cannot be informative about the impact of large contributions. These results are reported in Table 3.4.

In the baseline model, we can see that an additional large donation increases the likelihood of a project succeeding by 33.7%; this is tightly estimated. This is an order of magnitude stronger than any other variable on project success. The impact is non-linear, however: regression on an indicator for any large contributions has approximately the same coefficient. 
This could imply that it is the presence of large contributions, not necessarily their number, which matters. This is also likely the case because most projects only have one or zero large contributions, so the further marginal effect is not very present in the data-set. There is a knock-on effect of large donations outside of the simple monetary benefit being provided, which is generally less than the level of impact would indicate.

Table 3.2: What leads to large contributions? (Coefficient, with standard error in parentheses)

                                  Baseline               RE Model               FE Model
Duration of project (10s)         -0.0048*** (4.3e-05)   -0.0047*** (4.3e-05)
Project funding goal ($100k)      0.0001 (0.0002)        0.0001 (0.0002)
Number of backers today (100s)    -9.9e-04*** (1.9e-04)  -9.9e-04*** (1.9e-04)  -0.0015*** (3.4e-04)
Total backers to date (100s)      0.0001*** (0.0000)     0.0001*** (0.0000)     -0.0007*** (0.0001)
Price (100s)                      0.0012*** (4.4e-05)    0.0012*** (4.4e-05)
Amount of goal left ($100k)       -0.0001 (0.0002)       -0.00012 (0.0002)      -0.0008*** (0.0002)
% of project duration elapsed     0.0028*** (0.0008)     0.0027*** (0.0008)     0.0023** (0.0008)
% elapsed squared                 -0.0025*** (0.0007)    -0.0023** (0.0007)     0.0017* (0.0007)
Goal Outstd. X % Elapsed          -2.0e-06 (1.5e-06)     -2.0e-06 (1.5e-06)     -9.5e-07 (5.8e-07)
Total comments to date            -1.2e-06*** (1.7e-07)  -1.2e-06*** (1.7e-07)  -3.1e-07 (3.9e-07)
Total updates to date (100s)      0.1234*** (0.0034)     0.1236*** (0.0034)     0.0686*** (0.0066)
Pivotal day indicator             0.0101*** (0.0024)     0.0101*** (0.0024)     0.0103*** (0.0024)
(Piv Ind)*(Pct elapsed)           0.0837*** (0.0035)     0.0837*** (0.0035)     0.0858*** (0.0036)
Succeeded                         0.0009 (0.0005)        0.0009 (0.0005)        -0.0065*** (0.0008)
(Succeeded)*(Pct elapsed)         -0.0031*** (0.0007)    -0.0032*** (0.0007)    -0.0030** (0.0009)
Constant                          0.0050*** (0.0004)     0.0056*** (0.0006)     0.0107** (0.0041)
N                                 2.5e+06                2.5e+06                2.5e+06
VCE                               robust                 robust                 cluster

Controls, as marked in the table: Category controls X X; Time controls X; Project FE X.

The central problem is that the large contribution measure might be endogenous for several reasons: trivially, large contributions could be correlated with unobserved project-specific variables which are also correlated with project success. More importantly, however, we explicitly have a situation of reverse causality in which large contributions are affected by success, and vice versa. In order to overcome this, I use an instrumental variables approach. Specifically, because we think large contributions may have something to do with individuals caring about a project in an outsized way, I created a list of 15 religious and secular holidays which are related to gift-giving, charity, or generosity [34]. Because projects vary in terms of when they start and the length of time they cover (the “tenure”), the number of holidays they capture also varies, which introduces useful variation in the data. Since I control for the number of backers, this implies that the instrument can only affect the project through the amount given, not the decision about whether or not to back the project. Additionally, since I control for length, month and year fixed effects (this is also robust to more granular time controls, such as week-of-year), the key identification assumption then becomes that the tenure start dates, controlling for month, year and other project covariates, are independent of the success of the project. In particular, this means that we must believe project-owners are not manipulating the start/end dates of their projects to pick off marginal holidays. That is, a decision like “we will start our project in December because of Christmas” is fine but a decision like “we will delay our project from November 23 to 25th to capture Christmas” would bias our instrument. 
I believe this assumption is credible for several reasons; first of all, the reason most individuals use crowdfunding is because they are relatively unsophisticated, and have trouble accessing traditional capital markets. This unsophistication is also likely to mean they are unable to make very granular decisions about project timing. An additional reason is that the project launch date could be difficult for the project-owner to control completely: there are many stakeholders and moving parts involved, which make the exact launch date difficult to set with precision. This is supported by the fact that, with the exception of the end of the year (around New Year’s), project start dates are fairly evenly distributed across the year. If this assumption fails, the results are likely to be overestimates; we are accidentally picking up the effect of “sophistication,” in the time-manipulation sense, on the likelihood of project success, and it seems plausible that more sophisticated project-owners are more likely to succeed.

[34] For example, Christmas, Black Friday, the start of Lent, Boxing Day, Chinese New Year, etc. See appendix for more details.

Table 3.3: IV Robustness Checks

                 Holidays             Non-Holidays         95% CI              Least Likely Difference
                 # projects (mean)    # projects (mean)    (Difference)        (P-value)
All Dates        2766.44              3316.63              [-56.5, 1156.5]     Ha < 0 (0.96)
2013 Only        3195.23              3248.00              [-158.8, 264.83]    Ha < 0 (0.68)
Dec 2013 Only    3112.92              3354.05              [-165.5, 649.5]     Ha < 0 (0.88)

To try to detect this “time-manipulation” I consider whether or not projects are more likely to pick up holidays during their tenure than other, unrelated dates. In Table 3.3, I calculate the number of active projects on each date, then look at the average by holiday/non-holiday. 
As we can see, it appears that projects are less likely to be active during the holiday dates than other days; however, because data collection begins in 2012 and ends in 2014, the early/late data might be incomplete (especially in 2012, since crowdfunding was just beginning). Accordingly, I restrict my sample to just 2013, the best full year in the sample. We see the same result, and are able to reject the hypothesis that projects are more likely to capture holidays. In the IV estimates, since we are particularly concerned about month controls, I also look at selected months; December 2013 is presented here, but the results are similar for other months. It appears that projects are not more likely to be active on holidays than other dates, supporting the idea that project owners are not trying to select additional holidays. I also check whether or not the projects appear different within a month, versus outside; specifically, I focus on December 2013 and perform the same IV regression as before. The results are similar: even comparing projects which elapse during December, the IV estimates are stable, suggesting that unobserved effects are not driving manipulation which might affect the instrument’s validity. Additionally, running the IV regression excluding Christmas, Christmas Eve and Boxing Day (since these are the most “visible” holidays from a US perspective) gives similar results with a point estimate of 0.44 (Std err. 0.09).

The identification assumption, backed by the time controls and the number of backers, means that the number of holidays is not conditionally correlated with the overall project’s success except through its impact on contributions. This allows us to use this as an instrument for large contributions, via the TSLS approach. However, we might be concerned that the instrument effect is very small, causing a weak instrument situation. 
To check this intuition, I perform a regression of whether or not a day has a large contribution on the variables in the baseline model, plus an indicator for the holiday variable, and check whether this indicator has a significant coefficient; it does, at the 5% level. The first stage of the TSLS is similarly strong, with an F-stat of over 40; these facts together provide evidence that the instrumentation is strong enough to be valid.

As we can see, the results are similar but generally stronger when using instrumental variables. The effect of having large donations is about 10-30% stronger in this case; this implies that for many projects, the endogenous project-level factors are negatively correlated. This is plausible if we imagine that some large contributions are not made seriously (i.e. with the goal of obtaining a reward) and instead made for other reasons, such as costlessly signalling interest in a project or simply for a joke. This means that the causal effect of large contributions in the baseline model is actually probably higher than 30%. This indicates that large donations are important for project success - a project’s ability to attract large donors is an important element for its success. This agrees with the naive uncontrolled evidence we saw earlier, in which approximately 20% of projects would have failed if the large donations had been of average size instead.

These results indicate strongly that large contributions play an important role in project success, and are worthy of further study. However, because of the identification methods used, there are interpretation issues surrounding the results. 
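For readers less familiar with TSLS, the logic of comparing the baseline and IV estimates can be sketched with one instrument and one endogenous regressor, in which case the TSLS estimate reduces to the ratio cov(z, y) / cov(z, x). The numbers below are invented purely for illustration, with an unobserved factor u constructed to be uncorrelated with the instrument z and to push the outcome in the opposite direction to the regressor. Nothing here reproduces the thesis data, its controls, or its estimates.

```python
def cov(a, b):
    """Population covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)

# Hypothetical data: z plays the role of the holiday instrument.
z = [0.0, 0.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]            # unobserved factor; cov(z, u) = 0 by construction
x = [zi + ui for zi, ui in zip(z, u)]  # endogenous regressor (large contributions)
true_effect = 0.5                      # assumed causal effect of x on y
y = [true_effect * xi - 0.2 * ui for xi, ui in zip(x, u)]

ols_estimate = cov(x, y) / cov(x, x)   # biased, since u moves both x and y
iv_estimate = cov(z, y) / cov(z, x)    # uses only variation in x induced by z
print(ols_estimate, iv_estimate)
```

In this toy setting the naive estimate understates the assumed effect while the instrumented estimate recovers it, which mirrors the direction of the gap discussed above but says nothing about its size in the actual data.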
I discuss alternative ways to evaluate these results, along with robustness and some further exploration, in the next section.

                            Model 2: LP        Model 2: IV
                            Coeff/Std. err.    Coeff/Std. err.
Price (100s)                0.0190***          0.0126***
                            (0.0022)           (0.0035)
Goal (1000s)                -0.1101**          -0.1019*
                            (0.0417)           (000397)
Length                      -0.0082***         -0.0089***
                            (0.0001)           (0.0003)
Number of Large Contribs    0.3374***          0.5298***
                            (0.0038)           (0.0920)
Total Comments (100s)       -0.0022***         -0.0017***
                            (4.77e-04)         (4.82e-04)
Total updates               0.0256***          0.02014***
                            (0.0007)           (0.0027)
Total Backers (100s)        0.3807***          0.3458***
                            (0.0724)           (0.0694)
Constant                    0.6857***          0.6841***
                            (0.0130)           (0.0133)
Category Controls           X                  X
Time controls               X                  X
N                           95,792             95,792
Errors                      robust             robust
Table 3.4: How important are large contributions?

3.7 Discussion

3.7.1 Effectiveness of large contributions relative to size

                                               Adjustment of Backers
Change in Variable                             29.6
(1) Impact (Baseline)                          11.25%
(2) Impact (IV)                                10.01%
Net Impact of Large Contributions (Baseline)   22.49%
Net Impact of Large Contributions (IV)         42.96%
Table 3.5: Counterfactual “smoothing” exercise

The preceding section has established that there is a relationship between project success and large contributions which goes beyond unobserved project-level effects or statistical noise. In fact, large contributions appear to be very important for overall success. We could imagine this occurs for different reasons: the first is mechanical. Simply put, a large contribution provides more money than a typical contribution; this money goes towards the goal, and is therefore relatively effective at reaching it. While simple, this is actually difficult to assess, primarily because we cannot control for the amount raised and still interpret the regression model sensibly (the dependent variable is a linear combination of the amount raised and the goal). As an alternative, I perform a “smoothing” exercise: in the data, the average large contribution is $2,098 larger than expected.
We could imagine moving this amount around in the model by shifting how it arrived. Basically, we imagine more individuals arrived, and compute, given the median contribution, how many incremental arrivals the typical large contribution “represented”. Then, a measure of the direct effect would be a comparison of the change in the probability by trading off this weight. This is depicted in Table 3.5, which illustrates that the direct effect is about 10%. This means that about 2/3 to 3/4 of the effect of large contributions is being driven by factors other than the simple amount being donated. The simplest explanation is that large contributions are not merely providing more money: they are providing more money when it is needed and in the amount needed. This “targeting” of contributions to when they are most urgent would explain the fact that these contributions are essentially about 3-4 times more effective than we would expect. An alternative explanation, which is difficult to analyse in the data, has to do with potential informational concerns. Some authors (namely Mollick (2014)) have discussed the potential of information transmission or herding following contributions. It is possible that some of this effectiveness is being driven by large contributions providing a positive signal to the market, inducing more success by attracting others to the project.

The magnitude of these effects roughly coincides with the more naive assessment reported previously; when we eliminate the excess of the large contributions, and instead pretend each was a single regular contribution, we see about 2/3rds of projects would have failed without their large contributions. Of course, this does not control for any of the factors discussed previously, but it does agree in magnitude with the total effect reported by the IV estimates, giving us confidence that these results are reasonable in magnitude.

3.7.2 Interpretation and Robustness of IV estimates

We can also try to examine the magnitude of these coefficients by imagining (as discussed earlier) there are two kinds of backers: those interested primarily in the material reward (fans), and those interested in the project’s success (friends of the project). If you are interested in the reward, then a large contribution might be reasonable to make early on, since you have limited liability if the project fails. Otherwise, individuals most concerned with success tend to donate when they are most needed, which generally occurs when the project can be “pushed” over the edge of its goal. An alternative way to think about this is to imagine these “friends” are directly related to the project owner; then, if most projects are funded by a mixture of personal and crowdfunded capital, these large contributions are a way of relaxing the commitment to the funding goal by using more personal capital. This can be advantageous if the project owner is unsure of their ability to raise funds from the crowd; however, there is a cost associated, since the crowdfunding platform will take a fraction of the contribution, making this akin to a kind of self-insurance.[35] The IV approach should address this issue, as discussed earlier, but we can also analyse this directly by separating the indicator for a large contribution into two parts: before and after success. Looking just at large contributions which occurred prior to the project succeeding, we can repeat the previous analysis with this in place of the large contribution variable. The results are reported in Table 3.6.

As expected, the coefficient is more modest: large contributions prior to success lead to a 21% increase in the likelihood of success. On the other hand, the IV regression carried out before is still valid, and even more dramatic: a large contribution prior to success leads to a 70% higher chance of the project succeeding.
This is consistent with our understanding of why these contributions would occur. Individuals who care about helping a project succeed seek to spend their money in the manner which is most effective. This means giving large sums of money to projects in such a way that it “pushes” them over the cusp of success. This is, in general, not by directly filling the gap, but by closing the gap to such an extent that it becomes very likely that other backers will provide the remainder.

To confirm this intuition, I also include variables for large contributions both preceding and post-success. Now, due to endogeneity, we cannot meaningfully interpret the post-success coefficient, but this allows us to compare the baseline and preceding regression. We can see the results are, as expected, intermediate. The effect of the pre-success contributions remains much lower than the baseline, but the IV results are still accordingly high. This concurs with the interpretation that pre-success contributions are causal in the manner described: individuals want to “tip” projects over into success. The low coefficient in the non-IV model is because this isn’t always effective, and many projects are lost causes even with large contributions. The high coefficient in the baseline model is created by a combination of this factor (helping success) and individuals who are primarily self-interested claiming large rewards after the project has succeeded.

[35] I thank Daniel Ershov in particular for this suggestion.

                        Model 3:         Model 3:          Model 3:         Model 3:
                        Pre-success      Pre-success (IV)  Pre/Post         Pre/Post (IV)
                        Coeff/Std. err.  Coeff/Std. err.   Coeff/Std. err.  Coeff/Std. err.
Average price (100s)    0.025***         0.012**           0.020***         0.013***
                        (0.003)          (0.004)           (0.002)          (0.004)
Goal (1000s)            -0.121**         -0.112*           -0.107**         -0.101*
                        (0.045)          (0.044)           (0.040)          (0.039)
Length                  -0.008***        -0.009***         -0.008***        -0.009***
                        (0.000)          (0.000)           (0.000)          (0.000)
Pre-success Large       0.208***         0.703***          0.266***         0.513***
                        (0.004)          (0.135)           (0.004)          (0.113)
Total Comments (100s)   -0.003***        -0.002***         -0.002***        -0.002***
                        (5.49e-06)       (4.94e-06)        (4.77e-06)       (4.71e-06)
Total Updates           0.031***         0.022***          0.025***         0.020***
                        (0.001)          (0.003)           (0.001)          (0.002)
Total Backers (100s)    0.443***         0.444***          0.347***         0.336***
                        (0.080)          (0.078)           (0.069)          (0.067)
Post-success Large                                         0.521***         0.581***
                                                           (0.004)          (0.028)
Category/Time FE        X                X                 X                X
Constant                X                X                 X                X
N                       95,792           95,792            95,792           95,792
Errors                  robust           robust            robust           robust
Table 3.6: Robustness I: Pre/Post Success

One reason why these results may not be generally true for all projects has to do with the IV procedure. Specifically, we know that the coefficient being estimated is not necessarily the effect for the average project in the sample; instead, it is a local average, specifically for those projects which have large contributions influenced by the instrument chosen. As we recall, the instruments I chose were based around holidays which have to do with gift-giving or charity. The idea was to encourage “large” contributions by affiliating the results with days on which individuals feel more like giving large sums of money. Of course, not all individuals feel the same about all types of projects. Backers of a punk rock concert may feel substantially less moved by the onset of Lent[36] than backers of a Christian rock album. Similarly, tech toys and video games may be very appealing on Black Friday, but a performance art event may not. The coefficients measured have to do, specifically, with the pool of individuals (and the projects they are interested in) which are affected by the instrument. Thus, the 70% increase in Table 3.6 reflects the marginal effect of large donations on projects for which giving or charitable holidays affect the backer pool. If we think, as is probably reasonable, that projects whose backers react to charity are slightly more generous or interested in the outcome relative to a generic project, this estimate is overstated.

In order to examine this, I break out my regression into sub-groups, based on an intuitive assessment of which projects might be affected differently by the instrument. These groups are broken out by category, and are presented in Table 3.7. As we can see, different project types clearly react differently. The board games category appears to be strongly influenced by these kinds of support. This is typically because board games are a very niche, fan-oriented product, in which people care a great deal about the project succeeding. On the other hand, video games show a non-significant but relatively tightly estimated zero. This indicates that these concerns are less relevant for supporters of video games. Fashion has a similar, but less tightly estimated coefficient. Documentary is interesting, in that while the coefficient is small, it shows wide variance despite a larger than normal sample size. This is potentially because projects in the documentary category tend to be long shots, or fall into the category discussed. This makes the influence of the individuals attempting to help projects succeed less relevant, as it is diluted by individuals supporting “lost causes” or pet projects.

[36] Lent is the Christian religious month of penance and prayer, celebrated mainly by Catholics, Orthodox, and the more traditionalist members of the Anglican Communion.

A similar issue could be related to unobserved variables, in particular the subjective “quality” of a project.
The IV procedure used is robust to variations in project quality, as long as they do not have an impact on the holidays captured by the tenure. However, if we imagine that individuals who are motivated to give to a project are more likely to give to high quality projects, this could pose a problem. In particular, if “quality” is also related to the amount the project raises, it could be the case that high quality projects are both more influenced by the instrument (i.e. more likely to get a large contribution on a holiday) while also needing a lower size of large contribution to create success. These combinations of effects could overstate the impact of a large contribution, since it would mean the projects impacted the most by the IV are also projects which are relatively sensitive to large contributions. Unfortunately, this is difficult to test for explicitly, since “quality” is difficult to ascertain; it is difficult to infer from the covariates, and even if we assume it was capturable by the covariates (“no selection on unobservables”), approaches which try to control for unobserved heterogeneity, such as propensity score matching, fail to account for the reverse causality built into the standard suggestion. Indeed, propensity score estimates of the effect of a large contribution on success indicate a negligible effect; this is actually expected, since we know that large contributions are less likely after success, which means that projects near the success-failure boundary are likely to be influential in this comparison, but then these over-state the rate of failure for large contributions, once matched. In other words, the decision process behind large contributions is endogenous to the (expected) success or failure of a project and therefore serves as a kind of “levelling” process which makes the liminal projects very similar in terms of their success. At best
At bestthis indicates that unobserved variables might drive some of the variation, but we have tobelieve that controlling for them fails in the IV procedure.A further concern has to do with the second potential issue: the fact remains that thereduced form evidence shows that causality travels both directions: large contributionsmay lead to success, but success may also attract large contributions. As an alternativeto the instrumental variables approach, we could consider this directly. Specifically, thisimplies that the model can be depicted as a system of equations; the probability of a projectsucceeding and the probability of large contributions can both be viewed a dependentvariables. By excluding the number of backers and the comments and updates from theequation determining the likelihood of a large contribution (since these were insignificantand do not control for scale in the baseline model), we can use 3SLS to estimate jointlythe probability of a project attracting a large donation, and succeeding. The coefficienton large donations is insignificant; large variation in the coefficient size is possible. The1163.7. DiscussionBoard games Documentary Fashion Video gamesCoeff/Std. err. Coeff/Std. err. Coeff/Std. err. Coeff/Std. 
err.Pre success lrg 1.53197* 0.1533902 0.8881502 0.362638(0.6236692) (0.6084708) (0.7551741) (0.2613181)Average price -0.0001658 0.0002604 0.000205 0.0001396(0.0003977) (0.0001631) (0.0002982) (0.0001174)Goal ($10k) -.0175291** -0.0096762*** -0.0057097 -0.0055031***(.0064334) (0.0023282) (0.0037605) (0.0009464)Length -0.0162298*** -0.006306* -0.0078877*** -0.0047164***(.0031471) (0.0028043) (0.0014181) (0.0008091)Total comments -0.0000239*** -0.0019844 0.001435 -3.50e-06(5.95e-06) (.0013042) (.0010702) (5.39e-06)Total updates 0.0169161*** 0.0253838 0.028132 0.0234804***(0.001739) (0.0177603) (0.0240123) (0.0070042)Total backers 0.0168696*** 0.0438697** 0.0092527* 0.0026646***(0.002733) (0.0142683) (0.0036724) (0.0004791)Constant 0.7003526*** 0.5376474*** 0.4152503*** 0.2178438***(0.0890382) (0.0513129) (0.0501182) (0.0468504)N 2218 5,040 3,479 3,280vce robust robust robust robustTable 3.7: Robustness II - By Category1173.7. Discussioncoefficient on success is 5%, with a 95% confidence interval between 4.9% and 5.2%,indicating that once we control for the simultaneous nature of the project, the effect isstronger than before. This is likely due to the fact that this method separates out “non-credible” large contributions from those actually determined by the success of the project.The main difficulty in evaluating this method is that the exclusion restriction is difficult toestablish in this framework; we simply have too much noise to assess the coefficient valueaccurately. The relates to the propensity score problem earlier; exclusion restrictions arerequired for identification in both methods, which do not readily arise in either context,leaving the IV estimation as the most credible.I also perform robustness checks to see whether or not outliers or skewness in thecovariates might affect the results. The most skewed and influential variable is the numberof backers. 
I perform several non-linear controls, both in absolute terms and scaled, to check whether or not this matters. I also perform robust regression to control for outliers. These are depicted in Panel (A) of Table 3.8. As we can see, the outliers do not seem to play a large role in the regression; the coefficients are similar to those in the baseline OLS specifications. Similarly, most of the non-linear controls provide similar coefficient signs and magnitudes to the IV estimates found earlier. However, I find that the specification seems sensitive to the log transform; a similar but weaker result is found with the cube-root specification, although not to the same degree. This is somewhat puzzling, since earlier studies (cf. Kuppuswamy and Bayus (2013); Mollick (2014)) which compared the two specifications did not see a difference. A qualitative assessment of the distribution of residuals also does not show large changes; they are relatively well-distributed in both cases. The main difference between this specification and the others is that it makes small and large values “more similar,” effectively compressing the distribution of the covariates.

Motivated by this, I examine whether or not variation in the size of a project (measured in different ways) has a heterogeneous effect on the IV coefficient; this is depicted in Panel (B) of Table 3.8. As we can see, the standard results are driven by projects slightly above the average number of backers, between 150 and 300; these large, positive, and significant coefficients drive the results in the baseline IV specification. A similar result is shown when we look at total funding. This implies that when we compress the number of total backers, we essentially make these groups harder to distinguish, washing out the results in the middle which drive the baseline result.
In other words, what we are detecting is actually heterogeneity in the impact; projects which struggle to attract backers, or which attract a very large number of backers, are not as impacted by a large contribution. This is likely because they are in the process of irrevocably failing, or are already nearly sure to succeed. This agrees with our model of how projects should be funded, but it does require us to bear in mind that not all projects are the same.

The major drawback of this assessment is that the specifications have much smaller sample sizes and are not precisely comparable, due to variable sample composition; this likely understates the problem, since the controls become more likely to be highly predictive as the sample size falls. The result for funding also explains why other literature might not see a difference; many papers trim off projects which are small (typically raising less than $5000), which appears to be influential for these results.

3.7.3 Policy Implications

The implications of large contributions primarily have to do with which interpretation of the motivations you adhere to. Since we cannot directly identify large contributors, this must remain a judgement call on the part of the policy-maker; however, it still gives clear direction regarding what must be taken into account to ensure crowdfunding works as intended. The most benign interpretation is that of the “fans” discussed previously. If large contributions are primarily driven by interested, uninvolved parties, the scope for malfeasance is limited. The main concern revolves around the role of asymmetric information. If consumers are not aware of the prominent role of large contributions, and use the goal (for reasons such as those discussed in Section 3.2) as an indication of the level of support necessary, they may end up being misled into believing a project needs more support (in terms of number of supporters) than it really does.
If a policy maker believes that crowdfunding is useful because it helps to overcome limited information on the part of some consumers, this research indicates that large contributions complicate this role. Finally, a common concern regarding crowdfunding surrounds the sophistication of backers; if we believe these large contributing individuals are more sophisticated, or simply more careful, than the average person, the fact that they are so relevant for success can ameliorate these concerns.

This is compounded by the self-insurance role that friends or family might play. If large contributions are a form of limited commitment to the threshold, this again can have negative implications when asymmetric information is involved. The reason is similar, but is compounded by the fact that if the individuals making the contributions are involved in the project, then the goal conveys even less information than in the preceding case. Especially for very small contributions, this could lead to very misleading beliefs. It also widens the scope for moral hazard issues. Since the main route to address malfeasance on the part of a project owner is through a class action lawsuit, small projects are particularly vulnerable to creators who are not serious about carrying through with their rewards (as the class might be too narrow to profitably certify). Large contributions by involved parties provide a mechanism to fund projects which have no reasonable chance of reaching their goal otherwise, and which are unlikely to deliver their promised rewards.

Panel (A)
                        Robust (OLS)     Cube Root        Quadratic        Cubic            Log
                        Coeff/Std. err.  Coeff/Std. err.  Coeff/Std. err.  Coeff/Std. err.  Coeff/Std. err.
Price (100s)            0.0593***        0.0175***        0.0132***        0.0136***        0.0144***
                        (0.00105)        (0.00367)        (0.00348)        (0.0035)         (0.0000316)
Goal                    -0.0196***       -0.0001**        -0.0001**        -0.0001**        -0.0001*
                        (0.0000)         (0.0000)         (0.0000)         (0.0000)         (0.0000)
Length (10s)            -0.0696***       -0.0673***       -0.0884***       -0.0880***       -0.0422***
                        (0.0013)         (0.0046)         (0.0034)         (0.0034)         (0.0064)
Total comments (100s)   -0.0260***       -0.0038***       -0.0021***       -0.0016***       -0.0014***
                        (1.51e-04)       (7.49e-04)       (4.27e-04)       (4.58e-04)       (3.37e-04)
Total updates           0.0146***        0.0048***        0.0183***        0.0173***        0.0027**
                        (0.0003)         (0.0012)         (0.0026)         (0.0025)         (0.0009)
Total backers           0.1834***                         0.0098***        0.0142***
                        (0.0003)                          (0.0007)         (0.0010)
Total backers^2         -0.0080***                        -0.0000***       -0.0000***
                        (3.79e-07)                        (1.25e-06)       (6.09e-06)
Large cont.             0.2481***        0.1835           0.5220***        0.5130***        -0.1472
                        (0.0033)         (0.1099)         (0.0918)         (0.0919)         (0.1314)
Total backers^(1/3)                      0.4837***
                                         (0.0383)
Total backers^3                                                            2.88e-08***
                                                                           (6.36e-09)
log(total backers)                                                                          0.1963***
                                                                                            (0.0151268)
Constant                0.5819***        0.3791***        0.6827***        0.6823***        1.005***
                        (0.0123)         (0.0262)         (0.0132)         (0.0132)         (0.0276231)
Category Controls       X                X                X                X                X
Time Controls           X                X                X                X                X
N                       95,792           95,792           95,792           95,792           95,792
VCE                                      robust           robust           robust           robust

Panel (B)
Backers (100s)          0-0.5   0.5-1   1-1.5   1.5-2   2-3     3-4     4-5     5-6     6+
Coefficient             -0.12   -0.19   0.09    0.59    0.38    -0.41   -0.10   0.33    -0.17
p-value                 0.71    0.377   0.64    0.15†   0.24    0.44    0.63    0.63    0.86

Total funding ($10k)    0-0.5   0.5-1   1-2     2-3     3-4     4-5     5-6     6-7     7+
Coefficient             0.29    -0.02   0.07    0.34    0.94    -0.25   0.53    0.30    -0.45
p-value                 0.15    0.93    0.7     0.19†   0.83    0.62    0.34    0.72    0.50
Table 3.8: Specification Robustness and Heterogeneity

A second consequence of large contributions has to do with the formalization role discussed in Agrawal et al. (2013). Formalization of previously informal funding arrangements implies some degree of external communication and bargaining; we could imagine this as an explicit equity arrangement. Crowdfunding, because of the all-or-nothing funding arrangement, provides much more leverage for interested parties, especially if a project proceeds to go through several rounds of funding.
Consider a project seeking to raise $1m, which is $50,000 short close to the deadline. An unscrupulous but interested investor could promise to give the $50,000 in exchange for favourable terms in a later round of equity funding. The nature of crowdfunding gives this offer disproportionately more impact than it would have on its own, potentially making these projects vulnerable to predatory investing. From a welfare point of view this is not critical, but it does present a problem for regulators.

3.8 Conclusion

This chapter has shown some of the first evidence that the traditional narrative behind crowdfunding is not the complete story: projects are highly influenced by the presence of large contributors. Specifically, we see that projects are much more likely to be successful in reaching their crowdfunding goals when they are able to attract a large contribution. This is directly relevant for individuals thinking about carrying out projects, researchers, and policy makers. First of all, if you are considering crowdfunding, consider your sources of funding. Is it likely you have some large supporters out there who will be able to assist you? Are they friends, family, or just individuals passionate about your product? If you cannot answer “yes” to the first of these questions, it is unlikely your project will have the resources to succeed. One suggestion indicated by this research is that, when using a mixture of personal and crowdfunded capital, you should reserve the personal capital for later in favour of a more aggressive goal; you can use a large backer holding this personal money later to “rescue” your project if it is in distress.

For academic researchers, in addition to providing some understanding of the crowdfunding process, this chapter also provides salient guidance for modelling. Most structural models of crowdfunding only look at the number of backers, assuming (implicitly or explicitly) that they all pay the same amount.
This paper demonstrates that this is a poor assumption; large contributions appear to be at least equally important to understanding the success or failure of campaigns. The degree to which the failure to model this aspect of crowdfunding affects the structural estimates depends on the model under consideration, as well as the speculative reasons for large contributions; for example, in models such as Marwell (2016) an “insurance” interpretation of large contributions would complicate interpretation but not be critical to the analysis. However, this paper also suggests future ways to improve structural models, and other potential ways to build up these models into a more robust explanatory framework. This also suggests future work: analysing the causal determinants of large contributions is an obvious extension. It would also be useful to go behind the data, and link crowdfunding projects and their different pledge levels, to get a sense of how these large contributions are claimed.

Chapter 4
Sales Classification via Hidden Markov Models

4.1 Overview

What is a sale? In everyday life, we think of a sale as a kind of “deal” - some kind of temporary discount relative to a “regular” price. Sales are attractive to many consumers, since they indicate good value for money, and are frequently advertised. To prevent consumers from being misled, the intuitive notion of a sale is given strength by a precise legal definition in many jurisdictions. For example, in Canada, the Competition Act specifies that a sale may only be advertised if the offered price is a “bargain” relative to the ordinary price of the product.[37] However, it is not always straightforward for consumers or academic researchers to tell whether or not a given price is a sale. Frequently, we only observe the price of a given product, and not information about whether or not that price is a sale.
If these prices are observed for long periods of time, natural price adjustments due to supply-chain variations or changing market conditions may make it very difficult to infer which prices are sales and which are not. For instance, when is a price reduction truly a sale, rather than an adjustment in the face of increased competition? The question is rarely straightforward to answer.

The task of deciding whether or not a given price is a sale is called classification. For consumers, it is important since it helps them know whether they are getting a bargain or not. For researchers, it is important since sales are an integral part of the retail economy, and it is necessary to know which prices should be regarded as sales or not. However, this is not an easy task. The fundamental problem of sales classification amounts to the observation that prices change both in the short run and in the long run, and this may cause confusion. If a retailer lowers their regular price for a product, an observer might be misled into thinking this new, lower price is a sale - especially initially. Similarly, if inflation causes all prices to rise, a consumer might mistakenly think that the new, higher sale price is a regular price.

[37] Which can be defined by either duration or volume, as in the Canadian Competition Act, Sections 74.01, 74.04.

This is compounded by the fact that direct auditing of prices - say, by visiting a store and observing which products are advertised as “on sale” or not - is generally difficult to carry out. From a practical point of view, visiting a large number of stores on a weekly (or more frequent) basis and carefully collecting information on all the products in the store may be costly, if not impossible. Equally difficult is the fact that many firms deliberately obfuscate the different kinds of pricing in their stores.
A given store may offer “sale prices,” “everyday low prices,” “cardholder prices,” “manager specials,” and “buy large and save” offers, to mention only a few of the many schemes available. Which of these are true sales? They are all advertised similarly, with brightly coloured tags calling out their offers, and many even see similar placement in newspaper circulars. This frenzied battle for the attention of the shopping public does little to aid researchers, who might very well make mistakes in judgement about what constitutes a true sale or not.

The solution is, of course, to let the numbers speak for themselves. Faced with the daunting task of trying to infer sales from incomplete and deliberately baffling information, researchers fall back on heuristic definitions based on the observed sequence of prices. These can be recorded accurately and repeatedly with ease at high frequency, generally through the use of retail scanner data (i.e. prices recorded by computer systems at the time of purchase). Since we have a natural definition of a sale as a “lower” price relative to some standard, the researcher’s problem then becomes a question of how to make this definition practically useful. There have been many solutions proposed to this problem, almost as many as there are papers written studying sales. Some papers adopt a fixed standard as sale price (i.e. below $d, as in Pesendorfer (2002) or Hendel and Nevo (2013)), while others choose a discount relative to some average standard (e.g. 25% below the mode, as in Berck et al. (2008) or Hosken and Reiffen (2004)). The largest scanner data provider, Nielsen-Kilts, suggests the method of 5% below the (local) average.
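The three families of heuristic rules just described can be sketched in a few lines. These are illustrative stand-ins rather than the exact rules of any cited paper: the function names, the moving-average window used for the “local” average, the inequality conventions, and the example price series are all assumptions.

```python
import numpy as np


def sale_fixed_threshold(prices, cutoff):
    """Sale if the price is strictly below a fixed cutoff."""
    return np.asarray(prices, dtype=float) < cutoff


def sale_below_mode(prices, discount=0.25):
    """Sale if the price is at least `discount` (a fraction) below the mode."""
    p = np.asarray(prices, dtype=float)
    values, counts = np.unique(p, return_counts=True)
    mode = values[np.argmax(counts)]
    return p <= (1.0 - discount) * mode


def sale_below_local_mean(prices, discount=0.05, window=5):
    """Sale if the price is at least `discount` below a centred moving average."""
    p = np.asarray(prices, dtype=float)
    half = window // 2
    out = np.zeros(p.size, dtype=bool)
    for t in range(p.size):
        lo, hi = max(0, t - half), min(p.size, t + half + 1)
        out[t] = p[t] <= (1.0 - discount) * p[lo:hi].mean()
    return out


# Hypothetical weekly prices: a deep discount in week 3, a shallow one in week 6.
prices = [4.0, 4.0, 3.0, 4.0, 4.0, 3.5, 4.0, 4.0]
fixed = sale_fixed_threshold(prices, cutoff=3.25)
modal = sale_below_mode(prices, discount=0.25)
local = sale_below_local_mean(prices, discount=0.05, window=5)
print(fixed.tolist())
print(modal.tolist())
print(local.tolist())
```

On this series the three rules agree on the deep discount but disagree on the shallow one, which only the local-average rule flags; the classification depends on the cut-off chosen before any analysis is run.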
My own work in Chapter 2 adopted a more flexible method, based on clustering and mixture models. Nonetheless, all of these methods are ad hoc: they rely on the researcher’s understanding of the products, their prices, and their stability to choose a definition which is correct. Moreover, all of this must be carried out before any analysis: prices must be classified into sale and regular prices, after which reduced-form investigations can be carried out. This is typically done without any acknowledgement of, let alone adjustment for, the uncertainty inherent in such a classification.

This chapter develops an alternative model, which directly addresses many of the downsides that existing methods of sales classification face. The way consumers typically resolve the fundamental problem of sales classification is by using more information: if a price is stable over a long period of time, it’s probably the normal price. If the price goes up after a lower price and then stays there, the first price observed was probably a sale. Consumers use information about the price sequence and its dynamics to classify sales. However, in academic research this is often flipped on its head: the rationale is that first one determines what is, or is not, a sale, and then uses that determination to uncover interesting correlations or relationships with other data (like sales dynamics). But then, by definition, we are not using all of the information available when we perform the classification step! In the standard method, classification is a one-way street, showing relationships between sales (and their dynamics) and covariates but never using them in the determination of sales themselves. This oversight is largely ignored by the literature on sales, mainly because it is innate in the structure of classification itself: most methods result in a clean yes/no cut-off (entering into estimation as a binary indicator variable based on a stark cut-off).
If we define a sale as $1.00 less than the modal price, then what if a product is $0.99 off? Why is a dollar discount definitely a sale, while something sold at only a penny more is not? Why don’t we include information like the fact that the second-to-last price was also $0.99 more? Although methods that make this kind of choice are backed by careful consideration of the price distribution, there is something unsatisfying about this kind of rule. A better method should help a researcher quantify this statistical uncertainty, either using it to make a judgement call about what is a sale or not (e.g. more than 90% likely, as in Chapter 2), or better still incorporating it directly into the study. It should also use the dynamics of prices explicitly, as our intuition from consumer behaviour implies.

This chapter develops such a method, using the hidden Markov model framework to describe the way sales function. At their most basic, hidden Markov models are descriptions of structured but random processes where the observed outcomes are determined indirectly by a hidden state. The key hidden state in the study of sales is the central piece of missing information: is this product on sale at a given time? By considering the observed sequence of prices as a result of an unobserved sequence of sales, we can use this model to begin to piece together the driving forces behind price variation. The power of the hidden Markov framework arises through the combination of inference about the unobserved state of the world and its connection to the observed reality. Hidden Markov models capture closely the intuitive mechanism through which sales become observed prices, resulting in a powerful technique for classification. To overcome the fundamental problem, I model the sale and regular prices in the manner originally developed in Chapter 2: as a changing set of regimes.
So, rather than facing a single pair of prices, the researcher regards the observed data as being a sequence of these pairs, changing over time. This long-run pricing evolution can be captured in the same manner as the shorter term sale variation, provided that we impose some conditions on the underlying process governing the states.

To be precise, in this chapter we will adapt the definitions from Chapter 2. Consider a representative product being sold at a single grocery store over time. There is a fundamental price $\hat{Y}_t$ for each (discrete) time period $t = 1, 2, \ldots, T$. We imagine at each period, the grocery store has several different options to choose for their price, $\hat{Y}_t \in P$. These prices correspond to the pricing strategy the grocery store has settled on using at that time period; in any case, the set $P$ is the pricing set, and is defined as follows:

Definition 4.1.1. A pricing set, denoted $P$, is a pair of prices which consists of (1) a regular price $p$ and (2) a discount from the regular price $\delta > 0$. The associated sale price is $s \equiv p - \delta$. Similarly, a sale is defined as the event that $\hat{Y}_t = s$ and is written $S_t = 1$. The fundamental price can then be written as $\hat{Y}_t = p - S_t \delta$. The set of all pricing sets is $\mathcal{P}$.

These pricing sets are the options individual stores have to choose from; a pricing regime is the particular set that a store has adopted, which persists for a contiguous time. Since this is not directly observable, this is modelled as a state variable which we index with the integers; we represent this indexing with a function $R : \mathbb{Z}_+ \to \mathcal{P}$ which maps the integers into the set of all pricing sets.

Definition 4.1.2. A pricing regime, denoted $R_t$, is a state variable which indexes the available pricing set for a store at time $t$ and lasts for a contiguous period of time.[38] That is, if $R_t = z$ then $P = R(z)$ is the available set of prices. We denote the associated prices and events by association with $R_t$: $p(R_t) > 0$, $\delta(R_t) > 0$, $s(R_t) \equiv p(R_t) - \delta(R_t)$.
If we define $S_t$ to be the event that a sale occurs, the fundamental price can be written as $\hat{Y}_t \equiv p(R_t) - \delta(R_t) S_t$.

[38] That is to say, if $R_t = R$ and $R_{t+k} = R$ then for all $s \in [t, t+k]$, $R_s = R$.

This is a simplification of the definition from Chapter 2 for the case where there is only a single discount level being offered, as most sales environments feature. The extension of the methods developed in this chapter to more discounts is straightforward. Notice, also, that this specifies the fundamental price, not necessarily the observed price. Most environments with sales will feature noise in the observed price sequence; for white noise, the interpretation is that $\hat{Y}_t$ is the mean of the distribution of prices at a given period.

As I demonstrate, the correct way to model this kind of a process is through the use of block left-to-right Markov models, a term which we will define precisely in Section 4.3. These kinds of Markov models are typically used to model processes which are one directional, like sequences of words in human speech, or developmental processes over time, of which sales are a good example. In sale pricing, there is both short-run and long-run variation; the short-run variation is the sale, while the long-run variation is how sales change over time. The blocks correspond to periods of sales with particular prices, and the movement between blocks corresponds to the long-run variation. These kinds of models violate the basic assumptions for hidden Markov models, as typically used. Accordingly, I develop explicit identification conditions for these models which rely on cross-sectional variation.

Next, I show that the basic description of sales as a hidden Markov model falls neatly within the general framework developed. I also illustrate that this can neatly incorporate information about multiple covariates or other variables of interest.
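As a rough illustration of the definitions above, the sketch below simulates a fundamental price sequence in which the regime can only stay in place or move forward (the one-directional, left-to-right structure) while sales switch on and off within each regime. All names and parameter values are hypothetical, and sales are drawn independently each period purely for simplicity; this is not the estimation model developed in the chapter.

```python
import numpy as np

def simulate_fundamental_prices(T, regimes, p_next_regime=0.05, p_sale=0.2, seed=0):
    """Simulate Y_hat_t = p(R_t) - delta(R_t) * S_t.

    `regimes` is a list of (regular price p, discount delta) pairs; the
    probabilities are assumed values chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    R = np.zeros(T, dtype=int)  # regime index R_t (0-based)
    S = np.zeros(T, dtype=int)  # sale indicator S_t
    for t in range(T):
        if t > 0:
            # Left-to-right: stay in the current regime or advance, never return.
            can_advance = R[t - 1] < len(regimes) - 1
            advance = can_advance and rng.random() < p_next_regime
            R[t] = R[t - 1] + int(advance)
        S[t] = int(rng.random() < p_sale)  # simplification: i.i.d. sales
    p = np.array([r[0] for r in regimes])
    delta = np.array([r[1] for r in regimes])
    Y_hat = p[R] - delta[R] * S
    return R, S, Y_hat
```

For example, `simulate_fundamental_prices(200, [(4.0, 1.0), (5.0, 1.5)])` returns a regime path that never decreases, together with prices that take only the values 4.0 and 3.0 in the first regime and 5.0 and 3.5 in the second.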
This fact not only incorporates more information into the classification of sales, but also allows researchers to explicitly test correlations between sales and variables of interest at the classification step. This removes the necessity for reduced-form analysis, and provides a powerful tool for the evaluation of competing models of sales. I then show how even complex dependencies between sales over time can be modelled using higher-order Markov models, which again fall within the framework under consideration.

I evaluate the performance of my technique on a variety of Monte Carlo simulated scenarios, ranging from simple to complex. I simulate sequences of prices including sales, then apply the model to the simulated data to evaluate performance: because we know the true generating process, we can easily evaluate how well the model recovers the hidden states (sales). The model generally performs very well on both dimensions of interest. First, the estimates of the relationship between sales and observed outcomes are accurate, allowing for correct inference about the true relationships between sales, prices, and other covariates. More importantly, the classification remains highly accurate, even in situations designed to be difficult to make inferences. This implies that this method can be used to classify sequences of observations into sale and regular prices in a number of ways which will be generally correct. This classification method is highly robust and powerful, presenting itself as a natural alternative to other ad hoc alternatives. Finally, I investigate the small-scale properties of the model, in situations where the cross-sectional nature of the data is dubious. I find that the model, provided it is correctly specified and well-instantiated, remains accurate.
This demonstrates its utility even for problems where a great deal of data is not available.

The remainder of this chapter is organized as follows: in Section 4.2, I provide some background on Markov chains and hidden Markov models which serves as a (largely) self-contained introduction to the general setting. Section 4.3 extends these results to block left-to-right models, and then demonstrates how sales can be modelled using this framework. Section 4.4 provides evidence of the performance of the method using simulations, and investigates the small-scale properties of the model. Finally, Section 4.5 concludes. Some selected proofs are reserved to Appendix C.

4.2 Background on Hidden Markov Models

In this section, I begin with a general background on hidden Markov models and Markov chains. We will connect these to sales explicitly in Section 4.3; this section serves to define terminology and notation, and as an introduction to the area for those unfamiliar with the subject matter. A hidden Markov model (HMM) is a statistical method designed for analysing situations where the observed outcomes depend on an underlying state of the world which is itself not observable. For example, perhaps an economist is trying to understand how their research output varies with their innate productivity. They know that on some days they are in a “productive” mood, while on other days their mood is unproductive. However, they (perhaps due to a lack of objective introspection) cannot observe which mood they’re in on a given day, only some measure of their productivity - say, number of words written. Productive days naturally agree with large numbers of words, while unproductive days correspond with smaller amounts written. The economist is interested in two questions. First, how does the number of words written vary with the unobserved mood? For example, what is the average (or expected) number of words they write on a productive day? Second, how do moods evolve or change over time?
This would let them understand not only the dynamics of their moods, but also how long they expect to be in a productive mood in a given length of time. Understanding these two questions together gives them a good understanding of their research productivity, and would help them do things like plan when to take holidays (during periods of expected low productivity, for example), and forecast their research output in the future. In the context of sales, the economist observes a sequence of prices over time. These prices are driven by an underlying pair of states: whether or not there is a sale being offered, and which pricing regime is currently active. As the underlying state changes between sales and non-sale periods, the price rises and falls. Similarly, the level of these prices also changes as the active regime moves over time. Sales classification is the task of trying to recover the underlying state (sale or non-sale) from only the observed information.

As the examples show, hidden Markov models are particularly useful in trying to analyse behaviour in which the underlying, hidden, state is important for a researcher. The researcher would like to understand not only the relationship between the state and the observed variables, but would also like to understand the hidden state itself. These two sources of randomness are why HMMs are a type of doubly stochastic process; the observations are random and the randomness is itself driven by a random process. This doubly-random feature also forms the key complication which makes them different from a traditional model which could be solved by maximum likelihood: if the hidden state were known, standard techniques would suffice, a fact exploited by Baum et al. (1970). Usually, the way we would solve this would be to “average out” the unobserved state.[39] However, as we will see, this is not feasible in this environment, and would make it difficult to infer the hidden state.
This interest in the underlying state itself also means we are considering a specific use of Markov models, where we are interested in the model parameters and structure itself, not merely as a means of forecasting or data mining a process (as summarized in Cappé et al. (2005)), which differs from many applications (particularly in this area).

[39] For example, if the probability of an observation $Y$ depends on some unobserved $X$ with distribution $g(x)$, we can calculate $P(Y) = \int_X P(Y \mid x)\, g(x)\, dx$, averaging out the $X$ variable.

First explicitly studied by Baum and Petrie (1966), hidden Markov models form part of a rich set of related processes, including Markov-switching models and Markov-jump processes, to name only a few (see Frühwirth-Schnatter (2006); Creal (2012) and Cappé et al. (2005) for examples).

The key assumption which makes hidden Markov models particularly tractable is that the underlying state changes in a manner which depends only on its current value, and not on its entire history. This type of process is called Markovian: in our setting here, we will assume that it takes a specific form called a Markov chain. In a hidden Markov model, the Markov chain governs the behaviour of the hidden states, while the other parts of the model govern the way these hidden states link to the observed variables. In our sales context, the Markov chain would define which regime is active and whether there is a sale or not, while the rest of the model would determine how this translates into the observed prices (and other covariates). Therefore, it is useful to carefully define a Markov chain, and some related terms, as they will be used in this chapter, since they are central to the idea of a hidden Markov model.
It is important to note, however, that there are many generalizations and relaxations of the basic structure given here; I will point out particularly useful ones as necessary.

Intuitively, a Markov chain is a sequence of random variables which change between a fixed number of values or states. The next value of the chain depends only on the current value of the chain, and not its entire history. The analogy of a “chain” is used to highlight the fact that each period is “linked” to others only through the single connections formed by the immediate past and future. In the context of sales, the chain models the “true” state of the world: whether or not in a given period the retailer is holding a sale, and which regime is active. We can mathematically express this definition of a Markov chain, as used in this chapter, as follows:

Definition 4.2.1. A Markov chain is a discrete-time stochastic process $\{X_t\}$ which takes on $k$ values in a finite set $S$, and evolves such that there is a fixed transition probability $a_{ij}$ such that for any time period $t \geq 0$ and states $i, j, i_0, i_1, \ldots, i_{t-1}$ in $S$,

\[ P(X_{t+1} = i \mid X_t = j, X_{t-1} = i_{t-1}, \ldots, X_0 = i_0) = P(X_{t+1} = i \mid X_t = j) = a_{ij}. \]

That is, the conditional distribution of a future state given a history of states depends only on the current state. The value $a_{ij}$ is the probability of the chain moving from state $j$ to $i$, known as the transition probability.

The fact that the transition probabilities do not change over time makes this a time-homogeneous Markov chain, as opposed to a situation in which the underlying chain changes over time. It is also worth noting that this is a finite, discrete-time Markov chain, in contrast to one which exists in continuous time or has infinite states. The finite, discrete nature of the states, and the movement of the chain in time, gives a convenient representation of the evolution of the Markov chain in a transition matrix, formed by collecting all of the transition probabilities:

Definition 4.2.2. Suppose $X_t$ is a Markov chain. Then, the $k$-by-$k$ matrix containing the transition probabilities for all $i, j$ in $S$,

\[ A = [a_{ij}]_{i,j \in S}, \]

is called the (one-step) transition matrix for the Markov chain.

This is also why the states of a Markov chain are typically labelled $1, 2, \ldots, k$, to coincide with the rows and columns of the transition matrix. In many applications these states may have real meaning, while in others they may simply be a data artefact, akin to the meaningfulness of clusters in cluster estimation (see Cappé et al. (2005) chapter 1 for a discussion). This kind of process is natural in many economic environments, such as consumer and firm decision making. For example, many learning processes such as Banerjee (1992) turn out to be Markovian, in that the history can be summarized in such a way that it only enters through the current state of the world. Implicitly, many models of consumer decision making are similarly structured, in that consumers base their decisions only on the current state of their finances and not on their past consumption choices.

The property that the future state of the Markov chain only depends on the previous state is an assumption; this kind of Markov chain is referred to as a first-order Markov chain. Higher-order chains, where the future state depends on more past values, are also possible. However, attention is usually restricted only to first-order chains, since by a suitable redefinition, higher-order Markov chains can be transformed into first-order chains in many applications. For example, in the context of sales, we may be interested in sales which are periodic and become more likely after a certain number of periods. This will be discussed in more detail in Section 4.3.3.

We can also introduce some notation for the state of a Markov chain. The distribution of states in a Markov chain evolves over time, according to the transition matrix. This is defined as follows:

Definition 4.2.3.
The state distribution in a given Markov chain at time $t$ is denoted $\pi_t$, where $\sum_k \pi_t(k) = 1$. In particular, the initial distribution is denoted $\pi_1 = \pi$.

Notice that this definition, combined with the transition matrix, gives a convenient expression for the dynamics of the states of a Markov chain:

\[ \pi_{t+1} = A \pi_t \]

This allows for easy calculation of distributions at arbitrary periods into the future, since by the Markovian condition, $\pi_{t+k} = A^k \pi_t$. This summarizes the basic facts about a Markov chain necessary to introduce a hidden Markov model. Markov chains, by themselves, form an interesting and complex field of study, with many interesting features. I have focused only on the facts and properties necessary for this chapter; for more details, interested readers are referred to Ross (2014), Ching and Ng (2006), or Freedman (2012).

4.2.1 Notation and Basics

We now begin our development of a general hidden Markov model, using the machinery built to understand Markov chains in the previous section. At its most basic level, a hidden Markov model is a series of realizations of a pair of random variables $\{X_t, Y_t\}$ over a period of time $t = 1, 2, \ldots, T$. However, only the variable $Y_t$ is observed at each period, resulting in a single series of observations $\{y_1, y_2, \ldots, y_T\}$. In what follows here, I will use the generally accepted notation of Rabiner (1989) and Cappé et al. (2005), where appropriate.

As explained earlier, the connection between $Y_t$ and $X_t$ in a hidden Markov model is that the underlying variable $\{X_t\}$ is a Markov chain which evolves over time according to its transition matrix $A$, and it produces an observable output in the form of $Y_t$. This leads $Y_t$ to often be referred to as the emission from the underlying state space. This process is sometimes depicted in graphical form as shown in Figure 4.1.
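The dynamics of the state distribution described earlier in this section can be illustrated with a short numerical sketch. The two-state matrix below is an assumed example (state 0 standing for a regular price and state 1 for a sale), written in the convention of Definition 4.2.1, so that column $j$ of $A$ holds the probabilities of leaving state $j$ and each column sums to one.

```python
import numpy as np

# Assumed example values: A[i, j] = P(X_{t+1} = i | X_t = j).
A = np.array([[0.9, 0.5],
              [0.1, 0.5]])
pi = np.array([1.0, 0.0])  # start in the regular-price state with certainty

def distribution_after(A, pi, k):
    """Return pi_{t+k} = A^k pi_t for a column-stochastic transition matrix."""
    return np.linalg.matrix_power(A, k) @ pi

pi_1step = A @ pi                     # one period ahead
pi_3 = distribution_after(A, pi, 3)   # three periods ahead
```

Because every column of `A` sums to one, each `pi` produced this way remains a valid probability distribution.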
In the context of sales,the underlying state of the world determines where or not there is a sale (Xt), while theMarkov model determines how this translates into observed prices (Yt). The mathematicalexpression of this notion is summarized in the following two assumptions, one on theobserved variable and one on the hidden variable:Assumption 4.2.1. {Yt}Tt=1, conditional on {Xt}Tt=1 is a sequence of independent randomvariables where the conditional distribution of Yt depends only on Xt (the current state).Assumption 4.2.2. The sequence of random variables {Xt} forms a Markov chain (withtransition matrix A).In order to give the emissions Yt some structure, we assume that they come from aknown family of distributions and depend on the hidden variable Xt. Each state of Xtselects a member from the family of distributions, which then results in the emission.1324.2. Background on Hidden Markov ModelsXt−2 Xt−1 Xt+0 Xt+1Yt−2 Yt−1 Yt+0 Yt+1f(·; θX)Figure 4.1: A graphical depiction of a first-order hidden Markov modelIn this chapter, I consider real-valued continuous random variables which come from aparametric family. We can make this definition precise in the following assumption:Assumption 4.2.3. Let {f(·; θ)|θ ∈ Θ} be a family of density functions on the real line,parametrized by θ and {θ1, θ2, . . . , θk} ⊆ Θ. Then, the distribution of Yt conditional on Xtis given by f(·; θXt).In our new notation, we can write that {Yt}Tt=1 is generated as a sequence of draws from{f(·; θXt)}Tt=1 where the sequence of distributions is given by the realization of {Xt}Tt=1.Each state selects a member of the parametric family θXt , which in turn produces a par-ticular emission for that state. In many applications (including sales in situations withvery stark pricing), these variables need not be continuous. 
The study of discrete-valued emissions (known as “symbols”) is one of the major applications of hidden Markov models in computer science and signals processing (Callander (2007); Creal (2012); Frühwirth-Schnatter (2006)). For example, in sales, a sale period may translate into a distribution with a low average price, while a non-sale period will have a higher average price.

With the basic ideas pinned down, we can now define a hidden Markov model as a pair of random variables $X_t$ and $Y_t$ which meet the assumptions laid out above. In mathematical terms, I define a hidden Markov model as follows:

Definition 4.2.4. A hidden Markov model is a sequence of random variables $\{X_t, Y_t\}$ over a period of time $t = 1, 2, \ldots, T$, in which only $\{Y_t\}_{t=1}^T$ is observed, and which meet the conditions of Assumptions 4.2.1, 4.2.2, and 4.2.3. The parameters associated with the model are denoted $\Psi \equiv (A, \{\theta_k\}_{k=1}^K, \pi)$.

From an applied point of view, if we wish to study a phenomenon using a hidden Markov model, the estimation of the model parameters $\Psi$ is critical, since they fully characterize the HMM. Rabiner (1989) introduces three basic problems for analysing a HMM:

1. Given an observation sequence $y \equiv \{y_1, y_2, \ldots, y_T\}$ and a set of parameters $\Psi$, can we calculate $P(y \mid \Psi)$?
2. Given an observation sequence $y$ and a set of parameters $\Psi$, what is the distribution of hidden states $\{X_1, X_2, \ldots, X_T\}$ which produces the observed data?
3. Given an observation sequence $y$ and a calculation of $P(y \mid \Psi)$, how do we adjust $\Psi$ to maximize $P(y \mid \Psi)$?

The first and third problems basically amount to the standard question of maximum likelihood: given a set of observed data, what is the probability of the data given the model, and how do we maximize that probability? The second problem is particular to the hidden nature of a HMM.
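To fix ideas about Definition 4.2.4, the following sketch draws one sequence from a hidden Markov model with Gaussian emissions. The Gaussian family, the function name, and all parameter values are assumptions made for illustration; the transition matrix follows the convention of Definition 4.2.1, in which column $j$ holds the probabilities of leaving state $j$.

```python
import numpy as np

def simulate_hmm(A, means, sds, pi, T, seed=0):
    """Draw hidden states x and emissions y from a Gaussian HMM (illustrative).

    A[i, j] = P(X_{t+1} = i | X_t = j); theta_k = (means[k], sds[k]).
    """
    rng = np.random.default_rng(seed)
    K = len(pi)
    x = np.empty(T, dtype=int)
    x[0] = rng.choice(K, p=pi)                    # initial distribution pi
    for t in range(1, T):
        x[t] = rng.choice(K, p=A[:, x[t - 1]])    # column of the current state
    # Each state selects its own emission distribution f(.; theta_{x_t}).
    y = rng.normal(np.asarray(means)[x], np.asarray(sds)[x])
    return x, y

# Assumed example: state 0 is a regular price near 4.00, state 1 a sale near 3.00.
A = np.array([[0.9, 0.5],
              [0.1, 0.5]])
x, y = simulate_hmm(A, means=[4.0, 3.0], sds=[0.05, 0.05], pi=[0.8, 0.2], T=50)
```

In an application only `y` would be available to the researcher; the three problems listed above concern what can be learned about `x` and the parameters from `y` alone.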
Inferring the most likely sequence of hidden states is important, first for calculation of the likelihood, but also because in many applications the states are a key object of study. For example, in sales the first and third problems amount to correctly specifying the relationship between sales and prices. The second problem is recovering which periods are sales or not. Unfortunately, direct calculation of the likelihood function (by summing over the unobserved hidden states) involves on the order of $T \cdot K^T$ elementary operations (Maruotti (2007)).[40]

[40] That is, for even a small HMM with 4 states observed for only 30 periods, there would be $3.4 \times 10^{19}$ additions and multiplications required for each evaluation of the likelihood function.

To overcome this difficulty, the key innovation in hidden Markov models, developed by Baum et al. (1970), is the forward-backward or Baum-Welch algorithm. Essentially, this method calculates the likelihood function in a convenient, inductive manner which also facilitates other calculations. This is explained in Rabiner (1989), specifically for discrete symbols, and I will illustrate it here for completeness in our environment.

We can do this by introducing some new variables which will help us calculate the probability of observed sequences of emissions, and their hidden states. The first variable to be considered is $\alpha_t(k)$, which is defined as the probability of the sequence of observations up to time $t$, together with the state at time $t$ being $k$. Precisely:

\[ \alpha_t(k) \equiv P(y_1, y_2, \ldots, y_t, X_t = k \mid \Psi) \tag{4.2.1} \]

This term is particularly useful because it can be defined inductively. Intuitively, in the first time period, the probability of seeing a value $y_1$ and being in state $k$ is just the probability of being in state $k$ (given by $\pi$, the initial distribution) multiplied by the emission distribution created by $k$.
From here, we can simply use the transition matrix, the probabilities from the previous step, and the emission distributions, to iterate forward until time $T$. Recalling from Definition 4.2.1 that $a_{jk}$ is the probability of moving from state $k$ to state $j$, that is:

Basis: $\alpha_1(k) = \pi(k) f(y_1; \theta_k)$, for $1 \leq k \leq K$

Induction: $\alpha_{t+1}(j) = \left[ \sum_{k=1}^K a_{jk}\, \alpha_t(k) \right] f(y_{t+1}; \theta_j)$, for $1 \leq j \leq K$

The process ends with period $T$, at which point all of the observations have been exhausted, and the induction step gives the expression for the likelihood of the entire sequence of observations by summing over the possible states:

\[ P(y \mid \Psi) = \sum_{k=1}^K \alpha_T(k) \tag{4.2.2} \]

This resolves the first of the questions given above, yielding a simple way to calculate the likelihood function. In order to answer the next question, about the hidden states, we need to define a counterpart to the forward probability $\alpha$. This “backward” term asks the opposite question: suppose we are in state $k$ at time $t$. What is the probability of the future observations, conditional on this fact? Specifically, we define this as:

\[ \beta_t(k) \equiv P(y_{t+1}, y_{t+2}, \ldots, y_T \mid X_t = k, \Psi) \tag{4.2.3} \]

Again, this term is useful due to an inductive calculation. We start by defining the term at time $T$ as one, since at the last period there are no future states (and so any future path has probability 1). We can then iterate backwards in the same manner as we did for $\alpha$, using the transition matrix and the future probabilities. This can be carried out as follows:

Basis: $\beta_T(k) = 1$, for $1 \leq k \leq K$

Induction: $\beta_t(k) = \sum_{j=1}^K a_{jk}\, f(y_{t+1}; \theta_j)\, \beta_{t+1}(j)$, for $1 \leq k \leq K$

The key use of $\beta$ is not in calculating the likelihood function, but rather to infer the hidden states in the model. The probability of being in a given state, given the observations, is the most natural way of thinking about this. We call this distribution $\gamma$, defined as:

\[ \gamma_t(k) = P(X_t = k \mid y, \Psi) \tag{4.2.4} \]

That is, $\gamma_t$ is the likelihood of being in a given state in period $t$, given the observed data and the parameters of the model.
Then, by Bayes’ rule, we can calculate that:

\[ \gamma_t(k) = \frac{\alpha_t(k)\, \beta_t(k)}{\sum_{i=1}^K \alpha_t(i)\, \beta_t(i)} \tag{4.2.5} \]

This allows for the recovery of the underlying states. The related question of what is the most likely set of states can be calculated in several different ways, as discussed in Rabiner (1989) or Cappé et al. (2005). The estimation of the transition matrix can then be recovered from the distribution of $\gamma$, by counting up the probability of transitions of states. All of these methods generalize easily for multiple sequences of independently drawn Markov models, as explained in Cappé et al. (2005). Essentially, all of the above terms ($\alpha$, $\beta$, $\gamma$) can be calculated sequence-by-sequence, and the aggregate likelihoods calculated as the products of the individual components. Transition matrices can be re-estimated by summing over all transitions for each sequence.

The final problem, that of actually maximizing the likelihood functions calculated above, is more difficult. Essentially any method desired can be used, since given the above discussion, the underlying calculation of the likelihood function is straightforward. In most applications, a version of the EM algorithm is used, which is guaranteed to find a local maximum of the likelihood function (see Bilmes et al. (1998) for a clear illustration of this procedure, or McLachlan and Peel (2004)). Along with proper initialization of the Markov chain, this will coincide with the true optimal set of parameters. The EM algorithm, in the context of a hidden Markov model, is also called the Baum-Welch algorithm, in light of the specific innovations necessary to make computation feasible in this environment.

The existence and uniqueness of a global solution to this problem is a related and important question: that of identification.
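The forward and backward recursions and the state probabilities $\gamma$ can be sketched in a few lines. The version below is a minimal illustration under stated assumptions: Gaussian emissions, the transition convention of Definition 4.2.1 (column $j$ of $A$ holds the probabilities of leaving state $j$), and no numerical scaling, so it is suitable only for short sequences. The function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(y, mean, sd):
    """Density of a normal distribution, used here as the assumed f(.; theta)."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def forward_backward(y, A, means, sds, pi):
    """Return P(y | Psi) and gamma_t(k) for a Gaussian HMM (unscaled sketch)."""
    y = np.asarray(y, dtype=float)
    pi = np.asarray(pi, dtype=float)
    T, K = len(y), len(pi)
    # f[t, k] = f(y_t; theta_k)
    f = gaussian_pdf(y[:, None], np.asarray(means)[None, :], np.asarray(sds)[None, :])
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))               # basis: beta_T(k) = 1
    alpha[0] = pi * f[0]                 # basis: alpha_1(k) = pi(k) f(y_1; theta_k)
    for t in range(T - 1):
        # (A @ alpha[t])[j] sums P(move from k to j) * alpha_t(k) over k
        alpha[t + 1] = (A @ alpha[t]) * f[t + 1]
    for t in range(T - 2, -1, -1):
        # (A.T @ v)[k] sums P(move from k to j) * v[j] over j
        beta[t] = A.T @ (f[t + 1] * beta[t + 1])
    likelihood = alpha[-1].sum()         # equation (4.2.2)
    joint = alpha * beta
    gamma = joint / joint.sum(axis=1, keepdims=True)   # equation (4.2.5)
    return likelihood, gamma
```

Each pass costs on the order of $T \cdot K^2$ operations rather than $T \cdot K^T$. For long sequences the products in `alpha` and `beta` underflow, so practical implementations rescale at each step or work in logarithms.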
Unfortunately, because much of the research in this area comes from a practical (engineering, computer science) point of view, many sources do not explicitly make it clear what conditions are required. For most basic applications of a hidden Markov model, researchers are concerned with a single sequence of observations, and the limiting results deal with the process as $T \to \infty$. That is, we observe the Markov chain over an arbitrarily long period of time.

The basic requirements placed on such a process concern the two key elements of the model: the Markov chain and the emissions. We can think about this as follows: in order to determine the transitions and other elements of the Markov chain, we need to observe (in the population) a large number of transitions between states. We wouldn’t want a chain to get “stuck” in only a small subset of the states, since then we couldn’t learn about the other states. In other words, we want a chain which, as time goes on, visits every state an infinite number of times. We also want to be able to tell the emissions apart; if we’re not sure about the state, the observed emission will be a combination of our best estimates of the emissions associated with the hidden states. We want to make sure that these emissions can be teased apart statistically. Mathematically, the way to express these is as follows:

Assumption 4.2.4. The Markov chain $X_t$ is ergodic.

Assumption 4.2.5. The distribution formed by a mixture of at most $K$ components of $\{f(\cdot; \theta) \mid \theta \in \Theta\}$ is identifiable (up to relabelling of the components).

Assumption 4.2.4 for a discrete, finite Markov chain has a convenient representation. Specifically, it is sufficient that two conditions are met: (1) that the Markov chain is irreducible, i.e. that it is possible to reach any state from any state (including itself) and (2) that there is some number $N$ such that it is possible to reach a given state in at most $N$ steps from any other state.
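One way to examine Assumption 4.2.4 numerically for a small chain is sketched below. It relies on a standard fact that is swapped in here rather than taken from the text: a finite chain is irreducible and aperiodic exactly when some power of its transition matrix has all entries strictly positive, and for a $K$-state chain it suffices to look at powers up to $(K-1)^2 + 1$ (Wielandt's bound). The function name and example matrices are hypothetical.

```python
import numpy as np

def is_ergodic(A):
    """Check whether some power of the transition matrix is strictly positive.

    Works on the zero/non-zero pattern of A, so either the row or the column
    convention for A gives the same answer.
    """
    A = np.asarray(A, dtype=float)
    K = A.shape[0]
    reach = (A > 0).astype(int)        # which one-step moves are possible
    power = reach.copy()
    for _ in range((K - 1) ** 2 + 1):  # Wielandt's bound on the exponent
        if power.all():
            return True
        power = ((power @ reach) > 0).astype(int)
    return bool(power.all())

# Assumed examples, with A[i, j] = P(X_{t+1} = i | X_t = j).
switching = np.array([[0.9, 0.5],
                      [0.1, 0.5]])     # moves freely between both states
left_to_right = np.array([[0.9, 0.0],
                          [0.1, 1.0]]) # the second state is never left
```

Here `is_ergodic(switching)` holds, while `is_ergodic(left_to_right)` does not, because a chain that only moves forward can never return to an earlier state.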
In particular, this means that the Markov chain must eventually return to any given state an infinite number of times as $t \to \infty$, as we intuitively desired. Assumption 4.2.5 highlights the close connection between hidden Markov models and finite mixture models (see McLachlan and Peel (2004) for a discussion of these models). As we discussed, in a given period, the probability of a given $y_t$ is a mixture of the $f(\cdot; \theta_k)$ distributions, weighted by $\gamma_t$, the distribution over hidden states. Since $\gamma_t$ itself is a combination of $A$ and the values of $\theta_k$, the assumption of identifiability (up to relabelling) implies that the mixture parameters and the weights are recoverable from the observed distribution over time. The structure of the Markov model itself allows for a natural labelling, since the underlying states have meaning and transit in a particular pattern. This allows for both $A$ and $\theta_k$ to be recovered, provided that all states are reached; something which follows from Assumption 4.2.4.

This also highlights why it is necessary to look specifically at sales: the assumption of ergodicity is not reasonable for most sale situations. In particular, the long-run variation caused by regime changes means that it is not possible to return to a past regime from a future one. This means there isn’t enough information available to calculate transition probabilities accurately, since each regime transition happens exactly once. In addition, we typically do not imagine that $T \to \infty$ in the study of sales, since the time dimension is usually small. These limitations necessitate an extension in order to use hidden Markov models for sales classification.

Identifiability of a mixture distribution seems like a tall order, but fortunately, there are many families of density functions which meet these conditions.
As cited in McLachlan and Peel (2004), Titterington (1985) points out that most finite mixtures for continuous density functions are identifiable in the above sense. One particularly useful category is that of multivariate Gaussian density functions, which not only are identifiable but also are particularly computationally tractable, leading to their natural adoption in many applications, especially where no explicit alternative is obvious.

This summarizes the basic material needed for the following study of the role of hidden Markov models in sales. Next, I show how a generalization of the basic hidden Markov model framework can solve our problem with ergodicity as a necessary condition, and has a natural role in classifying and understanding sales. This requires the development of some conditions for more complicated hidden Markov models, with special attention paid to the role of identification in this extended environment.

4.3 Applications of Hidden Markov Models to Sales

Having established the necessary background and notation, we can now specialize our environment to the classification of sales. As we have discussed earlier in this chapter, and in Chapter 2, the typical way sales are studied in industrial organization is a two-stage process. First, observations are classified into sales, then the classified data is used to perform reduced-form analysis. Unfortunately, most classification techniques focus on either observation-by-observation classification, or small windows about a given observation, rather than the entire sequence of prices. They also typically do not use information provided by variables other than the price itself. Many are also unable to provide information about the uncertainty of a given classifier. This means that in many applications we may be unsure of how a classifier performs, or under what conditions it is accurate.
As I note in Section 2.1, the classification method suggested by Nielsen-Kilts itself, for their own dataset, does not perform well in many applications. In this section, I demonstrate how hidden Markov models form a natural way of classifying sales which overcomes these difficulties, and which can also allow the researcher to directly inspect features of interest without the need for an additional reduced-form analysis step.

This technique allows the researcher to explicitly capture sales dynamics in the classification itself, either at a lower level, to provide a more “robust” version of the classification technique developed in Section 2.3, or at a more structural level, to avoid reduced-form inference entirely if the problem is suitable. The drawback of this method is that it involves significantly more computational and analytical overhead than other alternatives. As discussed in Section 4.2, finding an initial starting point for the parameter estimation in a hidden Markov model is very important, since only local maxima are guaranteed by the Baum-Welch algorithm. While the literature suggests general heuristics for finding such a point, in a specific application like this, best practice suggests that a basic classification method must be performed first in order to provide a robust starting point for estimation. Additionally, the technique developed is multiplicatively more cumbersome to use as the number of lagged variables within the model increases, as we will discuss in Section 4.3.3.

In this environment, consider a representative product being sold at a single grocery store over time. We observe the price $Y_t$ and a set of covariates $W_t$ which may vary for each time period $t = 1, 2, \dots, T$. I assume that the underlying data-generating process consists of $R$ pricing regimes, $R_t \in \{1, 2, 3, \dots, R\} \equiv \mathcal{R}$.
These reflect the underlying channel prices being offered to the grocery store over time for the product; they may rise, or fall, based on market conditions and relationships with suppliers. However, in particular, I assume that these trends occur at a level “higher” than the individual store; the regime changes are independent of any individual store’s sales or performance. I also normalize the regimes such that they occur in order; at time $t = 1$, $R_t = 1$, and if $R_s > R_{s'}$ then $s > s'$.

This is a natural environment for a hidden Markov model. We have an underlying, hidden state of the world, which manifests in observable outcomes $Y_t$ and $W_t$ that are generated through some process which is not directly observed. If we take this seriously, and suppose the probability of shifting from regime $i$ to $j$ follows a Markov chain with rate $q_{ij}$, this means that the transition matrix takes the upper-bidiagonal form:

\[
A_r =
\begin{pmatrix}
q_{11} & 1-q_{11} & 0 & \cdots & 0 & 0 \\
0 & q_{22} & 1-q_{22} & \cdots & 0 & 0 \\
0 & 0 & q_{33} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & 1-q_{R-2,R-2} & 0 \\
0 & 0 & 0 & \cdots & q_{R-1,R-1} & 1-q_{R-1,R-1} \\
0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix}
\tag{4.3.1}
\]

However, we have not yet introduced sales. As defined in the introduction, I assume that a regime, in the context of this model, will consist of a pair of prices: one “regular” price $\mu_r \equiv p(r)$ and one potential discount $\delta_r \equiv \delta(r)$. This results in two prices: a regular price and a sale price. The discount applies, and the sale price is charged, when a product intermittently undergoes a sale.

Given the Markov structure of the pricing regime, we can incorporate sales by considering them as a higher-frequency variable layered on top of the slowly changing regime. For exposition, suppose that lagged variables are not important here (we will return to this topic in Section 4.3.3).
Then, we can consider the joint movement of regimes and sales as a large Markov chain by assigning to each period a unique state consisting of two elements $X_t = (R_t, S_t)$, where $S_t$ is the hidden variable indicating the presence of a sale in period $t$. In total there are $2R$ such states. The first would be $(1, 0)$, consisting of regime 1 without a sale. The second would be $(1, 1)$, which consists of regime 1 with a sale, and so forth. This has an associated transition matrix with some restrictions on values, since not all states can be reached from each value. This is a block upper-bidiagonal matrix, as depicted in Equation 4.3.2:

\[
A \equiv
\begin{array}{c|ccccccc}
 & (1,0) & (1,1) & (2,0) & (2,1) & \cdots & (R,0) & (R,1) \\
\hline
(1,0) & \alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14} & \cdots & 0 & 0 \\
(1,1) & \alpha_{21} & \alpha_{22} & \alpha_{23} & \alpha_{24} & \cdots & 0 & 0 \\
(2,0) & 0 & 0 & \alpha_{33} & \alpha_{34} & \cdots & 0 & 0 \\
(2,1) & 0 & 0 & \alpha_{43} & \alpha_{44} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
(R,0) & 0 & 0 & 0 & 0 & \cdots & \alpha_{2R-1,2R-1} & \alpha_{2R-1,2R} \\
(R,1) & 0 & 0 & 0 & 0 & \cdots & \alpha_{2R,2R-1} & \alpha_{2R,2R}
\end{array}
\tag{4.3.2}
\]

Then, with this chain defined, we can specify the price in state $X_t = (R_t, S_t)$ as:

\[
Y_t(X_t) = \mu_{R_t} + \delta_{R_t} S_t + \epsilon_t
\tag{4.3.3}
\]

where $\epsilon_t$ is a noise term. Essentially, the model assumes that prices are generated from a hidden Markov model with emissions governed by the distribution of the noise term. The objects of key interest are the parameters of this model, and the distribution of likely states for each observation, $\gamma_t$.

However, by inspection we can see the regime Markov chain given by Equation 4.3.1 is “left-to-right” (or Bakis, see Rabiner (1989) or Cappé et al. (2005)); that is, we make each move between regimes at most once. This is generally the case with state space models where the transitions model an evolving process. However, this is a problem: how can one correctly recover the transition probabilities if we only observe them once? Standard hidden Markov models make the assumption that the underlying Markov chain is ergodic (as discussed in Section 4.2), which implies that the long-run distribution of the states in the model matches the stationary distribution.
In particular, all ergodic Markov chains are recurrent, and so every state occurs infinitely often in the long run. This is not the case with left-to-right Markov chains.

The necessary approach is to use cross-sectional variation to study these kinds of environments. Essentially, by looking at several different Markov chains, rather than just a single long one, we can infer the desired transition probabilities. However, our model here (Equation 4.3.2) is not explicitly left-to-right, and while some attention has been paid to the estimation of these kinds of models, a careful development of identification and its application to sales is still desirable. In the following section, I do exactly this, highlighting how these kinds of models can be useful for economists.

4.3.1 Identification of Block Left-to-Right Hidden Markov Models

This section establishes the econometric properties of hidden Markov models (HMM) in a generalization of the environment developed earlier in this chapter, then provides sufficient conditions for parametric identification of the structured Markov model developed earlier in this section. The notation and set-up are standard, following Section 4.2. However, because we now consider cross-sectional variation as well, we suppose we have a panel $i = 1, 2, \dots, N$ of observations $Y_{it}$, each observed for a series of periods $t = 1, 2, \dots, T$. In general, we will imagine that $T$ is fixed, while $N$ may vary.
This is the opposite of the assumption for a standard HMM, which assumes $N = 1$ is fixed, while $T$ is large and may vary. Let the unobserved variable be $X_{it}$, the elements of a stationary first-order Markov chain with state space $S = \{1, 2, \dots, K\}$ and associated transition probability matrix $A = [\alpha_{ij}]_{ij}$. Then, suppose $\{f(\cdot; \theta) \mid \theta \in \Theta\}$ is a family of density functions on the real line and let $\{\theta_1, \theta_2, \dots, \theta_K\}$ be elements of $\Theta$.

To make this a hidden Markov model, we assume that $\{Y_{it}\}_{t=1}^{T}$ is generated as a sequence of draws from $\{f(\cdot; \theta_{X_{it}})\}_{t=1}^{T}$, where the sequence of distributions is given by the realization of $\{X_{it}\}_{t=1}^{T}$. This process occurs identically and independently for each $i = 1, 2, \dots, N$ in the panel. In other words, we assume that each member of the panel is generated as a standard hidden Markov model. Furthermore, for generality, let’s assume that the parameters of the model are (possibly) driven by some underlying parametric specification $\phi$: $\alpha_{ij}(\phi)$, $\theta_i(\phi)$. In the general case, $\phi$ coincides with the parameters of the hidden Markov model itself.

This underlying specification is useful because we may have some structural parameters in mind for testing, which (through an economic model) can directly connect to the hidden Markov model. For example, suppose that we believe that regime $r$ changes with probability $q_r$, but across regimes the probability of a sale is the same, $q_s$. Then, starting in regime $r$, the probability of transitioning from a regular price into a sale in regime $r$ is $(1 - q_r) q_s$ and the probability of transitioning from a regular price into a sale in regime $r + 1$ is $(1 - q_{r+1}) q_s$. These can be linked to the transition probabilities in the underlying model, and the sale probability recovered directly. This allows a researcher to make statements like “the probability of a sale is $q_s$” in a grounded and specific way.⁴¹

⁴¹ This also makes the communication of the results more transparent, since it avoids a discussion of the hidden Markov model framework.

As discussed, we are particularly interested in considering HMMs which are non-ergodic; their underlying chains typically are not recurrent. In this case, the initial starting point of the underlying Markov chain is very important. Denote the distribution of starting states in the population by $\pi$. Consider the following related definitions, which will be used to motivate which states can be identified within the model:

Definition 4.3.1. A transition matrix (of size $K$) is left-to-right if it takes the form:

\[
\begin{pmatrix}
p_{11} & 1-p_{11} & 0 & \cdots & 0 \\
0 & p_{22} & 1-p_{22} & \cdots & 0 \\
0 & 0 & p_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\]

where $p_{ii} \in (0, 1)$ for $i < K$.

Essentially, a left-to-right matrix is one in which the state can only move forward in time, never backwards, and although it may linger in particular states for a substantial period of time it will not remain there indefinitely.

Definition 4.3.2. A transition matrix (of size $K$) is block left-to-right if it takes the form

\[
\begin{pmatrix}
A_{11} & A_{12} & 0 & \cdots & 0 \\
0 & A_{22} & A_{23} & \cdots & 0 \\
0 & 0 & A_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & F
\end{pmatrix}
\]

where the $A_{ij}$ are square and have elements in $[0, 1)$, the entries in each row of $A_{ii}$ and $A_{i(i+1)}$ together sum to 1 (i.e. they form a well-defined transition matrix), and $F$ is an absorbing state (i.e. one from which the chain cannot move away).

A block left-to-right matrix is a generalization of the left-to-right form, in which the “states” in a standard left-to-right matrix are composed of sets of states in a larger matrix. Next, I introduce some conditions which will be useful for identification:

Assumption 4.3.1. Identification Conditions:
• Condition 1: The family of mixtures of at most $K$ components of $\{f(\cdot; \theta) \mid \theta \in \Theta\}$ is identifiable.
• Condition 2: For each $i, j, k$ the functions $\alpha_{ij}(\phi)$ and $\theta_k(\phi)$ are continuous, 1:1 mappings.
• Condition 3: For each $i$, the Markov chains are independent and identically generated from an initial distribution $\pi$.

Some of these assumptions are familiar from the discussion in Section 4.2. Condition 1 is identical to Assumption 4.2.5 for regular Markov chains, and the rationale is the same: it provides a way to identify the emission parameters and the mixing coefficients, which are key parts of the Markov chain. Condition 3 is what allows us to use cross-sectional variation to capture the behaviour of a left-to-right Markov chain. Condition 2 allows for the identification of lower-level parameters from the structure of the hidden Markov model.

The first result is a lemma, something of a folk theorem, which establishes the identifiability of left-to-right matrices.

Lemma 4.3.1. Under Conditions 1 and 3, suppose the transition matrix $A \equiv [a_{ij}]$ is left-to-right, $\pi = (1, 0, \dots, 0)'$, and $K < T$. Then, we can identify (1) $A$, and (2) $\theta_k$ for all $k$.

Proof. See appendix.

This shows that a generic left-to-right transition matrix creates an identifiable Markov model. However, what if the structure is more flexible, as in our block left-to-right model? In this case, it will depend on the initial vector $\pi$ and the structure of the chain. We can generalize the basic result using the notion of accessibility, which is defined for our environment as:

Definition 4.3.3. We say that a state $j$ is accessible by $T$ from state $i$ if the $(i, j)$ entry of the $T$-th power of the transition matrix is positive, $[A^T]_{ij} > 0$. Similarly, a state is accessible from $\pi$ (a vector) if the state is accessible from a state associated with a non-zero member of that vector.

Theorem 4.3.1. Suppose Conditions 1 and 3 hold. Let $S'$ be the set of states which are accessible by $T$ from $\pi$. Suppose $K < T$. Then, we can identify: (1) $[a'_{ij}]$, the sub-matrix of $[\alpha_{ij}]$ composed of transitions between states in $S'$, (2) $\theta_k$ for $k \in S'$, and (3) the initial state distribution $\pi$.

Proof. See appendix.

This result is very useful, because it generalizes the left-to-right structure. It means that if, during the period of observation ($T$), there is a chance that a given transition (and state) will be observed, then we can recover both the likelihood of that transition and the attendant state parameters associated with it. We can also extend this result to the underlying parameter space, if it differs from the hidden Markov model, via the following:

Corollary 4.3.1. Under Condition 2, the underlying parameter vector $\phi$ is identified for a hidden Markov model which meets the assumptions of Theorem 4.3.1.

In other words, we can recover the underlying parameters which generate a hidden Markov model if the mapping is sufficiently smooth and 1:1. This is a sufficient condition, but is certainly not necessary. Weaker conditions are possible (e.g. rank-order conditions) given details of the relationship between the model and the more primitive parametrization.

We can also apply this result to the block left-to-right environment which we have developed for sales, using the following result:

Corollary 4.3.2. Under Conditions 1, 2 and 3, the parameters underlying a block left-to-right hidden Markov model which begins from block $A_{11}$ are identified if (1) all the states within a given block ($r$) are accessible from every other state in that block within time $t_r$, (2) the sum of all such $t_r$ is less than or equal to $T$, and (3) $K < T$.

This is a stronger result which depends on the particular structure of a block left-to-right matrix, in a similar way to Lemma 4.3.1.
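The accessibility requirement of Definition 4.3.3, on which Theorem 4.3.1 and Corollary 4.3.2 rely, can be checked mechanically for a candidate transition matrix. The following is a minimal sketch, assuming the matrix is stored as a NumPy array with zero-indexed states; the helper name `accessible_states`, the variable `pi0`, and the example matrix are hypothetical and not part of the thesis.

```python
import numpy as np

def accessible_states(A, pi0, T):
    """States j with [A^T]_{ij} > 0 for some i that has positive initial mass."""
    A = np.asarray(A, dtype=float)
    pattern = (A > 0).astype(int)            # which one-step moves are possible
    reach = np.eye(A.shape[0], dtype=int)    # pattern of A^0
    for _ in range(T):
        # After this loop, reach holds the zero/non-zero pattern of A^T.
        reach = ((reach @ pattern) > 0).astype(int)
    start = np.asarray(pi0, dtype=float) > 0
    return set(np.flatnonzero(reach[start].any(axis=0)).tolist())

# Hypothetical three-state left-to-right chain starting in the first state.
A_example = [[0.9, 0.1, 0.0],
             [0.0, 0.8, 0.2],
             [0.0, 0.0, 1.0]]
pi_example = [1.0, 0.0, 0.0]
```

In this example only the first two states are accessible after one period, while all three are accessible after two, which illustrates why the observation window $T$ has to be long enough relative to the number of states.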
In this situation, since all states in a block are accessible within a time which can be met within $T$, the block left-to-right model can be identified completely.

Corollary 4.3.2 is the key piece of theory we need to connect hidden Markov models to sales classification. The framework developed earlier in this section is a block left-to-right hidden Markov model which meets the three conditions of the corollary provided the time frame is reasonably long. This means that a hidden Markov model such as the one we have developed for sales is identifiable using cross-sectional data on these kinds of Markov chains. We can therefore determine the parameters which govern our model of sales, and thereby recover which periods are most likely to be sales: a classification method. However, what if there are underlying covariates or heterogeneity? For example, maybe we are considering a particular product, but perhaps at different stores? Additionally, how do we incorporate information about other variables, like volume, into the determination of a sale (and the associated prices)? These questions are analysed in the following section.

4.3.2 Heterogeneity and Covariates

So far, we have only considered hidden Markov models with a single observable variable ($Y_{it}$) and used the cross-sectional variation in $i$ to identify the parameters of interest. However, in some applications it is unlikely that we will find a large panel of identical draws from a single Markov model, meeting Condition 3. The first possible extension of the model is to include observable heterogeneity in $i$. For example, suppose that we can observe that products come from different chains of stores. In this case, we would like to classify sales understanding that these different stores might offer different prices. This section considers such a situation. Denote the vector of observed panel-level variables by $W_i$.
Then, we can admit the possibility that $\phi$, the vector of underlying model parameters, may depend on $W_i$, in the sense that we have $\alpha_{ij}(\phi_i)$, $\theta_k(\phi_i)$ and $\phi_i = \phi(W_i)$. Then, we can consider the following replacement for Condition 3 in Assumption 4.3.1:

Assumption 4.3.2. (Condition 3a) The observed heterogeneity $W_i$ is independently and identically distributed for all $i$. Denote the distribution of $W_i$ by $g(\cdot)$.

This assumption means that the heterogeneous component of the variation in $W_i$ is created by a process independent of the model. That is, while there may be heterogeneity which informs the manner in which the different components transit the model, this variation is itself exogenous. We immediately have the following result:

Corollary 4.3.3. Under Conditions 1, 2, and 3a, the parameters underlying a block left-to-right hidden Markov model which begins from block $A_{11}$ are identified if (1) all the states within a given block ($r$) are accessible from every other state in that block within time $t_r$ for all $W_i$, (2) the sum of all such $t_r$ is less than or equal to $T$, and (3) $K < T$.

This is the heterogeneous version of Corollary 4.3.2; the proof is identical. The intuition is that since $W_i$ is exogenous, the model with heterogeneity is a combination of the unconditional models in proportions given by $g(\cdot)$. Since $W_i$ is observable, $g$ is known, and the heterogeneity can be separated out and the identification carried out piecewise. Essentially, it is like identifying the model separately for each $W_i$, then putting them together. The critical assumption is that the observed heterogeneity is independent of the model; this means, in particular, that it is independent of $X_{it}$ and $Y_{it}$, except through the parameters of the model.
In particular, this rules out cases where two different $W_i$ have different families for their emissions, or transit a different number of states.⁴²

A more challenging problem arises if we consider time-varying covariates $W_{it}$. For example, we may not only observe pricing information for a given store, but perhaps volume or another set of variables as well. Such a variable might be informative about the presence of a sale in a given time period. As discussed in Section 4.2, this is one of the major drawbacks of most sales classification methods: the inability to use all available information to determine whether something is a sale or not. The use of such other information in classification also assesses the degree to which it is associated with a sale in general, which is a direct way of testing many models of sales.

In the hidden Markov model framework, the way to incorporate covariates is by treating them as a joint emission from the underlying Markov chain: $\mathbf{Y}_{it} \equiv \{Y_{it}, W_{it}\}$. Suppose $W_{it}$ consists of $M - 1 > 0$ variables. Then, we define $\{f(\cdot; \theta) \mid \theta \in \Theta\}$ to be a family of density functions on $\mathbb{R}^M$ (and let $\{\theta_1, \theta_2, \dots, \theta_K\}$ be elements of $\Theta$, as before). Then, the covariate-robust extension of the original hidden Markov model is to assume that $\{\mathbf{Y}_{it}\}_{t=1}^{T}$ is generated as a sequence of draws from $\{f(\cdot; \theta_{X_{it}})\}_{t=1}^{T}$, where the sequence of distributions is given by $\{X_{it}\}$, the states of the underlying Markov chain.

Essentially, the key difference is that we consider all variables together as a set of joint emissions, and adjust the distributions and other parts of the model accordingly. The assumption which needs to be adjusted is as follows:

Assumption 4.3.3. (Condition 1a) The family of mixtures of at most $K$ components of $\{f(\cdot; \theta) \mid \theta \in \Theta\}$ is identifiable.

If we replace Condition 1 with Condition 1a, Corollary 4.3.2 goes through without modification. This assumption restricts the number of families which are acceptable for emissions from the underlying model.
However, there are still a large number of choices: most notably, the multivariate Gaussian distribution meets the requirements of Condition 1a. This generalization of the basic model means that the underlying model is identifiable using the joint distribution of all the variables. In terms of sales, this means that sales can be classified using this more complete set of information.

⁴² This is particularly restrictive for models like those discussed in Section 4.3.3; if regimes are the only major difference between two models, the covariate with fewer regimes will be over-fitted, resulting in a spurious set of regimes for this model.

This also means, as explained, that we can test the association of sales with different covariates. For example, suppose we were interested in knowing whether or not the volume of product sold increased during a sale (in addition to price), and we chose a multivariate Gaussian specification. Then, we could test whether volume increased during a sale by comparing $\mu_{W,X}$, the mean parameter of the Gaussian distribution for volume ($W$), between states $X$ with and without a sale. Under the null hypothesis of no relationship, the means should be the same in both states. The alternative hypothesis would indicate a difference in the means between the states, and would provide evidence for this variable being associated with sales. In general, if there are $R$ regimes, this would require $R$ comparisons. A multiple-testing procedure like the Bonferroni correction can be used to correct the size of the test to take this into account if desired.

While this allows a researcher to robustly test the relationship between underlying states and different variables, it does not necessarily make the causal relationship explicit. For example, it could be the case that a covariate is higher during a sale because the sale causes it to increase. However, it could also be the case that a higher level of that variable led to a sale.
This method does not give insight into which of these is the case, in the same way that a regression model is silent on the causal direction once the model is specified. The researcher must rely on an underlying economic model or motivation to understand the implications of a model. However, most models provide predictions about correlations between given variables (like sales and volume), which this framework is ideal for testing.

4.3.3 Higher Order Markov Processes

One important situation which does not fall within the framework developed in Sections 4.3.1 or 4.3.2 is where we would like to include some kind of temporal dependence in the Markov chain. For example, in many regression models of sales, researchers will include lagged values of the sale indicator variable. This is usually to try to capture some manner of periodicity in the underlying data-generating process which is not recoverable from exogenous timing variables. For example, Chapter 2 explicitly studies this kind of model.

The framework we have developed relies exclusively on first-order, finite, left-to-right Markov chains, which do not admit higher-order Markov dependence. However, as mentioned in Section 4.2, this is largely without loss of generality (at least formally). We can include higher-order Markov dependence, and therefore the presence of lagged state variables, by converting the higher-order Markov chain into a first-order chain. This is done by redefining the state space so that each vector of past states (up to the order of the original chain) is a unique state. This then implies restrictions on the transition matrix, since some states are by definition inaccessible from other states.

For example, consider our basic model with a left-to-right structure for the pricing regimes. For exposition, suppose that the only lagged variable of importance is the unobserved sale variable last period.
In other words, we are considering a second-order Markov chain. Then, we can imagine the system as a very large Markov chain by assigning each period a unique state consisting of three elements $X_t = (R_t, S_{t-1}, S_t)$; there are in total $2^2 R$ such states. The first would be $(1, 0, 0)$, consisting of regime 1 without a present sale, and without a sale last period. Another would be $(1, 1, 0)$, which consists of regime 1 without a present sale but with a sale last period, and so forth in this manner. This model has an associated transition matrix with restrictions on values, since not all states can be reached from each value. This is again a block upper-bidiagonal matrix, with block elements of the form:

\[
\begin{array}{c|ccccccccc}
 & (1,0,0) & (1,0,1) & (1,1,0) & (1,1,1) & (2,0,0) & (2,0,1) & (2,1,0) & (2,1,1) & \cdots \\
\hline
(1,0,0) & p_{11} & p_{12} & 0 & 0 & p_{15} & p_{16} & 0 & 0 & \\
(1,0,1) & 0 & 0 & p_{23} & p_{24} & 0 & 0 & p_{27} & p_{28} & \\
(1,1,0) & p_{31} & p_{32} & 0 & 0 & p_{35} & p_{36} & 0 & 0 & \\
(1,1,1) & 0 & 0 & p_{43} & p_{44} & 0 & 0 & p_{47} & p_{48} & \\
\vdots & & & & & & & & & \ddots
\end{array}
\]

This redefinition of the higher-order Markov chain allows for all of the results of Sections 4.3.1 and 4.3.2 to be applied to models with this kind of dependence. The fact that the identification and general method of application is identical to the simpler model is somewhat deceptive: this kind of dependence comes at a cost. Suppose a model has $R$ regimes and we consider a Markov process of order $L$. Then, the model has $R \cdot 2^{L+1}$ states and approximately $(L + 1) R \cdot 2^{L+2}$ transition probabilities. For even relatively small orders of lagged dependence, this can become very large. From an identification point of view, the condition that $T > K$ means that lagged models require exponentially more periods of observation than the basic model. From a practical point of view, even if the identification condition is met, it may be the case that, for a given set of data drawn from a model, very large amounts of data are required to get accurate estimates of the many parameters of the model.
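To make the growth of the expanded state space concrete, the sketch below enumerates the states $(R_t, S_{t-L}, \dots, S_t)$ and counts the transitions that are not structurally zero. It is only an illustration under stated assumptions: the helper names `expand_states`, `is_allowed` and `count_transitions` are hypothetical, regimes are assumed to either stay put or advance by one, and the final regime is treated as absorbing in the regime dimension.

```python
from itertools import product

def expand_states(R, L):
    """All states (regime, s_{t-L}, ..., s_t) for R regimes and L lagged sale indicators."""
    return [(r,) + h for r in range(1, R + 1)
            for h in product((0, 1), repeat=L + 1)]

def is_allowed(src, dst, R):
    """True unless the transition src -> dst is a structural zero."""
    r_src, h_src = src[0], src[1:]
    r_dst, h_dst = dst[0], dst[1:]
    # The regime may stay the same or advance by one (the last regime only stays).
    regime_ok = r_dst in (r_src, min(r_src + 1, R))
    # The destination's lagged indicators must equal the source's most recent ones.
    history_ok = h_dst[:-1] == h_src[1:]
    return regime_ok and history_ok

def count_transitions(R, L):
    """Number of structurally non-zero entries in the expanded transition matrix."""
    states = expand_states(R, L)
    return sum(is_allowed(s, d, R) for s in states for d in states)
```

With $R = 4$ and $L = 1$ this enumerates 16 states, matching $R \cdot 2^{L+1}$, and 56 structurally non-zero transitions, which is of the same order as the approximation given above.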
This can lead to difficulty making inferences, and in particular recovering structural parameters or testing hypotheses as in Section 4.3.2.

These kinds of problems are particularly acute in the case where the lagged state variable is of a long duration. For sales, a long lag may not be that reasonable, but it depends on the length of the period being considered.⁴³ However, it is much more reasonable in the case where the researcher believes that the regime itself has some kind of internal dynamics. For example, it may follow a fragility process, wherein it is very unlikely to change shortly after creation, but then suddenly becomes exponentially more fragile after a certain point in time is crossed. If this were modelled as straightforward time dependence, it would probably become infeasible for most datasets, since a large number of lags would need to be included in the Markov chain.

An alternative is to take advantage of the hidden nature of the Markov model and instead model the regime fragility as a sequence of states itself: one of lower order than the expected duration. In a left-to-right Markov chain, the mean duration in a given state is approximately proportional to the inverse of the transition probability from that state.⁴⁴ This means that an appropriately granular underlying set of states with sufficient duration can capture fragility concerns for regimes in a natural way. The method is identical to the transformation of the process for lagged state variables above: merely define a state as the combination of hidden variables.

4.4 Monte Carlo Simulations

In order to investigate the utility of this framework in a practical setting, in this section I report the results of a series of Monte Carlo simulations of sales. Generating simulated data is useful, because it allows me to test the accuracy of the hidden Markov model framework directly. We can compare the estimates from the model to the true underlying parameters. I generate data using a process which is plausible for sales.
To be specific, as described in Section 4.3, there is a hidden regime and a hidden sale variable which jointly govern the observed distribution of prices (and covariates). I model the emissions as being (multivariate) normal and uncorrelated in terms of noise. As each regime changes, the regular and sale prices in the model change. The central objects of interest are (1) the prices (sale and otherwise) imputed to the different situations and (2) the classification of observations into sales. For simplicity, I focus on models which start in regime 1 at a regular price (that is, $\pi$ is known); investigation of alternative simulations indicates that this assumption is not critical to the results.

⁴³ For example, a lag of 10 periods may be unreasonable for weekly sales data, but might be very reasonable for daily sales data.

⁴⁴ In a general Markov chain, the expected duration in a state $j$ conditional on starting from state $i$ is defined by $s_{ij} = \delta_{i,j} + \sum_k P_{ik} s_{kj}$, where $\delta$ is the Kronecker delta and $P$ is the matrix of transition probabilities of transient states. This can be solved as in Ross (2014) for the durations.

For the basic model (Model 1), I choose $R = 4$ regimes, and consider only first-order Markov behaviour, resulting in eight total states. I assume a panel size of $N = 500$ and suppose this is observed weekly for four years, resulting in $T = 208$ time periods. Motivated by Chapter 2, I suppose sales happen every four weeks, along with idiosyncratic random sales. The fragility of the regimes is $q = \frac{1}{104}$, resulting in an average regime change about every two years (104 weeks). Sale prices within a regime are a \$5 discount from the regular price, which follows an inflationary trajectory $(25, 28, 30, 32)$ as the regimes change. It is worth noting that this model meets the assumptions required for Corollary 4.3.2 to hold.

I estimate the model using the techniques developed in Section 4.2, implemented for the panel structure of the data.
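A data-generating process along the lines of Model 1 can be sketched as follows. This is a simplified illustration rather than the simulation code used for the reported results: the function name `simulate_panel` is hypothetical, sales are assumed to be independent draws with probability `p_sale = 0.25` (a stand-in for the roughly four-weekly sale pattern), and the noise standard deviation of 0.01 is an assumption read off the scale of the $\sigma$ rows in Table 4.1. Regimes are zero-indexed in the code.

```python
import numpy as np

def simulate_panel(n_panels=500, n_periods=208, q=1 / 104, p_sale=0.25,
                   regular=(25.0, 28.0, 30.0, 32.0), delta=-5.0,
                   noise_sd=0.01, seed=0):
    """Simulate hidden regimes, hidden sale indicators and observed prices."""
    rng = np.random.default_rng(seed)
    n_regimes = len(regular)
    # Left-to-right regimes: advance by one with probability q each period,
    # and remain in the final regime once it has been reached.
    advance = rng.random((n_panels, n_periods - 1)) < q
    regimes = np.zeros((n_panels, n_periods), dtype=int)
    regimes[:, 1:] = np.minimum(np.cumsum(advance, axis=1), n_regimes - 1)
    # Assumed sale process: independent draws, starting at a regular price.
    sales = (rng.random((n_panels, n_periods)) < p_sale).astype(int)
    sales[:, 0] = 0
    # Emission as in Equation 4.3.3: regular price plus discount plus noise.
    mu = np.asarray(regular)[regimes]
    prices = mu + delta * sales + rng.normal(0.0, noise_sd, size=mu.shape)
    return regimes, sales, prices
```

Because the true `regimes` and `sales` arrays are returned alongside `prices`, any classification produced by an estimated model can be compared directly against the truth.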
As a starting point, I choose locations near the true parameters.In order to calculate confidence intervals (and standard errors), I use 500 panel bootstrapre-estimates of the sample. These results are robust to other bootstrap sizes. My results forthe basic model are presented in Table 4.1. In short, the model does a remarkably goodjob. The means and deviations of the emissions are all accurate, being close to the trueparameters. This means that the price imputation from the model is essentially correct.More importantly, from a classification point of view, the fit of γ, the predicted state foreach observation, is essentially perfect - nearly 100% accuracy45 One point to note is thatbecause the fragility of each regime is q = 1104 , this means that the expected number ofregime changes is two during the 208 time periods of the model. This means that not allof the panels each the fourth regime. This is why we see increasing standard errors for thevariables as the regime numbers increase: every panel reaches regime 1 but only a quarteror so reach regime 4.In order to pressure the framework, I also simulated a more complicated model (Model2). This model retains the sample and panel sizes (N = 500, T = 208) but greatly reduces45I calculate accuracy for γ by comparing the true state to the vector of probabilities imputed by the model.For example, if we represent the distribution across states as a vector v = (v1, v2, . . . , vN ) (for N states) andthe true state by t = (t1, t2, . . . , tN ), then the error rate is 12∑Nn=1 |vn − tn|, which can then be averaged overall observations to get a fit. For example, if every predicted state was 75% correct, then the fit would be 75%.1514.4. 
the separation between the different regime prices. I choose R = 4 regimes, with the same fragility of q = 1/104 as before. However, I deliberately make the regimes more difficult to tell apart. The regular prices are set at (25, 25, 30, 32) while the sale prices are chosen to be (22, 20, 25, 25). This means that regimes 1 and 2 have the same regular prices, and only differ in their sale price (by $2). Additionally, regimes 2 and 3 have the feature that the sale price of regime 3 is the regular price of regime 2. This kind of switch-over is difficult for many classification methods to handle (as discussed in Chapter 2). I estimate the model in the same manner as for the basic model, with no change in the procedures.

              Regime 1            Regime 2            Regime 3            Regime 4
              Estimate  Actual    Estimate  Actual    Estimate  Actual    Estimate  Actual
µr            24.999    25.000    27.999    28.000    29.999    30.000    31.998    32.000
              (0.0006)  -         (0.0006)  -         (0.0009)  -         (0.0011)  -
µs            20.000    20.000    23.000    23.000    25.002    25.000    28.002    28.000
              (0.0009)  -         (0.0011)  -         (0.0013)  -         (0.0016)  -
σr (1000-ths) 9.9       10        10.1      10        9.9       10        10.4      10
              (0.080)   -         (0.091)   -         (0.127)   -         (0.158)   -
σs (1000-ths) 9.8       10        9.9       10        10.2      10        10.3      10
              (0.125)   -         (0.160)   -         (0.213)   -         (0.254)   -
γ-fit         100.00%

Table 4.1: Estimates relative to Monte Carlo Simulation: Model 1 (Basic), no Covariates

The results are presented in Table 4.2. As we can see, the results remain broadly similar to the simpler model. The parameter estimates for the means and variances of the emissions in each of the four regimes are very close to their true values. This is despite the fact that these were chosen to be deliberately misleading and difficult to infer from regime to regime. Consequently, the γ fit is also very good: 99.4% on average. The majority of the errors occurred in either Regime 1 or Regime 2, and were generally small. In general, this implies that the classification of the model (especially in terms of sales) remains very accurate.
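The γ-fit figures reported for both models follow the error-rate definition given in the footnote to the Model 1 results: half the sum of absolute differences between the imputed state probabilities and the true state indicator, averaged over observations. A minimal sketch of that calculation is below; the function name `gamma_fit` and the array layout (one row of state probabilities per observation) are assumptions made for illustration.

```python
import numpy as np

def gamma_fit(gamma, true_state):
    """Average fit of imputed state probabilities against the true states.

    gamma:      (n_obs, n_states) array, each row a probability vector.
    true_state: (n_obs,) array of integer state labels.
    Returns (fit, per-observation error rates).
    """
    gamma = np.asarray(gamma, dtype=float)
    true_state = np.asarray(true_state)

    # One-hot encoding of the true state for each observation.
    truth = np.zeros_like(gamma)
    truth[np.arange(gamma.shape[0]), true_state] = 1.0

    # Error rate per observation: (1/2) * sum_n |v_n - t_n|.
    error = 0.5 * np.abs(gamma - truth).sum(axis=1)
    return 1.0 - error.mean(), error

# Every observation puts 75% of its probability on the true state,
# so the fit is 0.75, matching the example in the footnote.
gamma = np.array([[0.75, 0.25, 0.00],
                  [0.25, 0.75, 0.00]])
fit, error = gamma_fit(gamma, [0, 1])
```

The per-observation errors returned alongside the fit are what a histogram of error sizes would be built from.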
We will return to this point in the next section, where we look at the inclusion of covariates and the consequences of this extension.

              Regime 1            Regime 2            Regime 3            Regime 4
              Estimate  Actual    Estimate  Actual    Estimate  Actual    Estimate  Actual
µr            24.9996   25        24.9996   25        29.9994   30        31.9988   32
              (0.0006)  -         (0.0006)  -         (0.0009)  -         (0.0011)  -
µs            22.0003   22        20.0001   20        25.0020   25        25.0026   25
              (0.0009)  -         (0.0011)  -         (0.0013)  -         (0.0016)  -
σr (1000-ths) 9.9207    10        9.9032    10        9.9123    10        10.3615   10
              (0.0798)  -         (0.0907)  -         (0.1271)  -         (0.1581)  -
σs (1000-ths) 9.8494    10        10.1299   10        10.1766   10        10.3343   10
              (0.125)   -         (0.1592)  -         (0.2145)  -         (0.2585)  -
γ-fit         99.40%

Table 4.2: Estimates relative to Monte Carlo Simulation: Model 2 (Complex), no Covariates

4.4.1 Covariates

In this section, I examine the role covariates can play in the specification of a hidden Markov model. As discussed in Section 4.3, one of the major roles (besides classification) that hidden Markov models can play in the analysis of sales is evaluating the association of different covariates and the (hidden) sale variable. This provides additional power, since it lets other associated variables guide the model's definition of a sale, and not just the price.

I first extend Model 1 by including two covariates of interest. The first, X1, is correlated with sales; we can imagine it as volume. In periods without a sale, it has a mean value of 0.66. In periods with a sale, however, it jumps up to a mean of 1.52. The second variable, X2, is uncorrelated with sales. In every period, regardless of whether it is a sale period or otherwise, it takes on an average value of 0.86. I estimate this new model in the same manner as before, except that I also include these covariates as emissions from the underlying state variables.

The results of the estimation are presented in Table 4.3. In this table, I omit the estimates of the noise variance; they are also accurate and similar to those of Table 4.1.
They confirm the predictions of the model, and agree with the true interpretation of the covariates. As we can see, similar to the result from the basic model, we have extremely good estimates of the average price in each regime and sale state. As we would expect from the previous results, this also leads to a nearly perfect fit of the γ probabilities, indicating that the classification of observations into sales is accurate. The results for the covariates are just as good: accurate predictions of the means and variances of the emissions. Equally importantly, the accuracy of these variables also means that the interpretation of the hidden Markov model agrees with the true interpretation of the variables. A hypothesis test (either individually or jointly) robustly rejects the null hypothesis that $\mu_{X_1,r} = \mu_{X_1,s}$ at standard confidence levels. This indicates that the X1 variable significantly co-varies with the hidden sale variable. Similarly, a hypothesis test of $\mu_{X_2,r} = \mu_{X_2,s}$ is not rejected at most confidence levels. This would imply to a researcher that X1 is associated with sales while X2 is not; a conclusion which is correct. We can also see the fact repeated from the basic model that as the regimes increase, variance also rises, consistent with the relatively scarcer data in later regimes.

As with the basic model without covariates, I also estimate the intentionally complicated model (Model 2) with the inclusion of covariates. I include X1 and X2, generated precisely as in the basic model, with the same means and variances. I also include another covariate, X3, which is an indicator variable for being in the fourth period. The idea of this covariate is to highlight how indicator variables for certain periodic patterns, as in Chapter 2, can be included in the model. The results are presented in Table 4.4. They highlight a few things: first of all, even with the more complicated model, the fit remains very good.
The hidden Markov framework remains able to correctly recover the means of the prices and of the other generating variables from the model. The standard errors of the fits generally increase, mainly due to uncertainty over some of the state fitting in the model. The variances are not presented in this table, but are similarly accurate. The periodic variable, X3, which was not generated as a multivariate normal emission, also performs well. In particular, it correctly captures the fact that there is a 4-period pattern in sales. Sales are 86% more likely in a sale period than in a non-sale period. Given that there are 3 non-sale periods, each with a 5% chance to have an idiosyncratic sale, this is a good assessment of the relative probability.

More importantly, we can see that even within a framework designed to confuse the estimation, the γ fit remains high at 99.5%, meaning that most of the states are accurately imputed by the model. The fact that this is marginally better than using pricing alone indicates that the inclusion of more covariates can help classify sales more accurately than just using price alone, a fact typically overlooked by most classification methods. Looking more in-depth, we can see that the model primarily made one of two mistakes: either confusing Regime 1 and Regime 2 or Regime 3 and Regime 4, as predicted. The first error is much more common, mainly because Regimes 1 and 2 are more frequent, as are regular prices. Importantly, most of the errors also came from the regular prices, with only about 25% of the error rate (0.1%) attributable to the sale periods. This is likely because sales are less frequent; if we correct for this, errors are still less common for sales, albeit not as dramatically (approximately equally likely). The typical size of the mistakes is also not
large: on average, errors are not complete failures to impute the state, being only a mistake of about 36.1%. On average, the state imputed is more correct than not, and very few are highly incorrect. I illustrate the distribution of errors by size (1 = 100%) as a histogram in Figure 4.2. This fact implies that most methods, such as the naive γ best-fit method or the Viterbi algorithm (as discussed in Rabiner (1989)), would impute the correct states. It also means that classification on such a basis, such as simply choosing the most likely state, would be very likely to classify the states correctly.

Figure 4.2: Histogram of State Errors, Model 2 (Complex), Covariates

4.4.2 Higher Order Markov Processes

In this section, I consider the performance of hidden Markov models for examining higher-order processes, as developed in Section 4.3.3. The first model is an extension of the basic model for lower order processes. I retain R = 4 regimes, but consider third-order Markov behaviour, resulting in sixteen total states. Basically, we assume that sales occur four weeks after the last sale, in addition to a 5% idiosyncratic chance each period. This is different from the absolute reference point (every four weeks) used in the first-order process considered before. I continue to use a panel size of N = 500 observed for T = 208 time periods. The fragility of the regimes remains q = 1/104. Sale prices within a regime are
the same: a $5 discount from the regular price, which follows an inflationary trajectory (25, 28, 30, 32) as the regimes change. However, there are now three regular price states within a regime, and one sale state. The emissions from the model are Gaussian, and I denote the average of the emission by $\mu_{r,L}$ for regular prices at a given lag L. I also include the two covariates, X1 and X2, used before, with the same generating process: X1 is either 0.66 or 1.52 on average in non-sale and sale periods, while X2 is always 0.86 on average in every period.

              Regime 1            Regime 2            Regime 3            Regime 4
              Estimate  Actual    Estimate  Actual    Estimate  Actual    Estimate  Actual
µr            24.999    25.000    27.999    28.000    29.999    30.000    31.998    32.000
              (0.0006)  -         (0.0006)  -         (0.0009)  -         (0.0011)  -
µs            20.000    20.000    23.000    23.000    25.002    25.000    28.002    28.000
              (0.0009)  -         (0.0011)  -         (0.0013)  -         (0.0016)  -
µX1,r         0.660     0.66      0.660     0.66      0.659     0.66      0.660     0.66
              (0.0002)  -         (0.0002)  -         (0.0003)  -         (0.0003)  -
µX1,s         1.519     1.52      1.519     1.52      1.519     1.52      1.519     1.52
              (0.0003)  -         (0.0003)  -         (0.0004)  -         (0.0006)  -
µX2,r         0.860     0.86      0.860     0.86      0.860     0.86      0.859     0.86
              (0.0002)  -         (0.0002)  -         (0.0003)  -         (0.0003)  -
µX2,s         0.860     0.86      0.859     0.86      0.859     0.86      0.860     0.86
              (0.0003)  -         (0.0003)  -         (0.0004)  -         (0.0006)  -
γ-fit         100.00%

Table 4.3: Estimates relative to Monte Carlo Simulation: Model 1 (Basic), Covariates

The results are displayed in Table 4.5. As before, I present results for the means of the emission variables; the variances are also similarly accurate. The model once again performs extremely well. As we can see, the model correctly imputes the prices associated with the true values in all of the states, and across regimes. It also correctly estimates the emissions associated with the other covariates, X1 and X2, in all of the states. This also means that the fit of the predicted states γ is essentially completely accurate.
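The sixteen-state structure of the higher-order model (four regimes, each with three regular-price lag states and one sale state) can be written out as a transition matrix. The sketch below is an illustration only: the function names, the state ordering (L1, L2, L3, S), the assumption that a sale may be followed immediately by an idiosyncratic sale, the absorbing final regime, and the assumption that regime changes are independent of the within-regime transitions are all choices made here and are not taken from the estimation code.

```python
import numpy as np

def within_regime_matrix(p_idio=0.05):
    """Transitions among (L1, L2, L3, S): lag states count periods since a sale."""
    W = np.zeros((4, 4))
    W[0, 1], W[0, 3] = 1 - p_idio, p_idio   # L1 -> L2, or idiosyncratic sale
    W[1, 2], W[1, 3] = 1 - p_idio, p_idio   # L2 -> L3, or idiosyncratic sale
    W[2, 3] = 1.0                           # L3 -> scheduled sale (4th week)
    W[3, 0], W[3, 3] = 1 - p_idio, p_idio   # S -> L1, or repeat sale (assumed)
    return W

def regime_matrix(n_regimes=4, q=1 / 104):
    """Left-to-right regime chain: stay with prob. 1 - q, advance with prob. q."""
    R = np.eye(n_regimes) * (1 - q)
    idx = np.arange(n_regimes - 1)
    R[idx, idx + 1] = q
    R[-1, -1] = 1.0                         # final regime is absorbing (assumed)
    return R

# Full 16 x 16 transition matrix, assuming regime changes and within-regime
# moves are independent, so the joint chain is a Kronecker product.
P = np.kron(regime_matrix(), within_regime_matrix())
```

Because the three lag states share the same emission distribution, the rows of `P` are the only place where they differ, which is what makes them detectable only through their transitions.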
This is remarkable for two reasons: first, it means that classification, even in a much more complicated environment, is just as accurate as in the simplest model. More importantly, however, it also means that the model is able to correctly impute states which are detectable only through their transitions. Note that there is no observable difference, in terms of emissions, between $\mu_{r,L1}$ and $\mu_{r,L2}$. The only way to detect these states is through (1) the structure of the left-to-right model and (2) their estimated correlation with emission-correlated states (like sales). This means that even very subtle modelling assumptions for sales can be captured accurately by a hidden Markov model, provided sufficiently good initial values can be supplied to start the model near the global maximum.

                  Price               X1                  X2                  X3
          Mean    Estimate  Actual    Estimate  Actual    Estimate  Actual    Estimate
Regime 1  µr      24.9996   25        0.6601    0.66      0.8601    0.86      0.0000
                  (0.0006)  -         (0.0002)  -         (0.0002)  -         (0.0000)
          µs      22.0003   22        1.5197    1.52      0.8601    0.86      0.8736
                  (0.0009)  -         (0.0003)  -         (0.0003)  -         (0.0026)
Regime 2  µr      24.9996   25        0.6600    0.66      0.8600    0.86      0.0000
                  (0.0006)  -         (0.0002)  -         (0.0002)  -         (0.0000)
          µs      20.0001   20        1.5196    1.52      0.8599    0.86      0.8753
                  (0.0011)  -         (0.0003)  -         (0.0003)  -         (0.0032)
Regime 3  µr      29.9994   30        0.6595    0.66      0.8601    0.86      0.0000
                  (0.0009)  -         (0.0003)  -         (0.0003)  -         (0.0000)
          µs      25.0020   25        1.5199    1.52      0.8597    0.86      0.8800
                  (0.0013)  -         (0.0004)  -         (0.0004)  -         (0.0041)
Regime 4  µr      31.9988   32        0.6603    0.66      0.8599    0.86      0.0000
                  (0.0011)  -         (0.0003)  -         (0.0003)  -         (0.0000)
          µs      25.0026   25        1.5192    1.52      0.8604    0.86      0.8737
                  (0.0016)  -         (0.0006)  -         (0.0006)  -         (0.0052)
γ-fit             99.5%

Table 4.4: Estimates relative to Monte Carlo Simulation: Model 2 (Complex), Covariates

In Table 4.6, I present the results for the complex version (Model 2) of the parameters, which are intentionally designed to be difficult to impute sales from. As we can see, the results remain very good.
The hidden Markov model correctly infers the means of the price variables, as well as the covariates. The estimates for the variances are also similarly accurate. As a consequence, the fit for the model is also very good, with most errors being less than 50%. This means that, as discussed earlier, methods for imputing the true state (and classifying sales) will typically be correct. Generally, the errors arise due to the model being unsure about which price belongs to which regime, which is compounded by the fact that some of the underlying states are hidden (based only on the distance from the most recent sale). Overall, the model performs very well, even in the complicated environment with many variables necessary to estimate.

4.4.3 Small Sample Performance

In this section, I examine a question of practical, rather than theoretical, interest: how useful is the hidden Markov framework for classification of sales when a researcher does not have very many observations? The identification results developed in Section 4.3.1 relied on the panel structure of the data to get precise predictions, and in the preceding Monte Carlo investigation, I used samples of size N = 500 to simulate the model. However, in many applications we may only have a limited amount of data: N = 1 is a common situation. In this section, I repeat my Monte Carlo exercise, but looking specifically at small sample sizes.

I consider the complex model (Model 2), with T = 208 and N = 1. I choose only samples which reach the fourth regime (making the specification correct), and I repeat the estimation carried out in the previous section. Of the 150 candidate samples, 5 (3.3%) fail to converge during estimation. This mainly occurs when a regime (typically Regime 4) is too short for estimation to be carried out well; the resulting local maximum is too far from the initial point for estimation to converge.
Looking at the remainder of the sample shows the results are still fairly good: in terms of classification, the fit is approximately 94.4% accurate on average. The accuracy varies between about 89% and 98% within the panel, which is depicted in Figure 4.3. This implies that the classification, even with much smaller samples than required, still performs very well on average. However, fit errors, when they occur, tend to be complete: that is, states are mistaken for one another nearly completely. This means that errors, when they occur, are likely to change the classification of the mistaken state. However, most of these occur as expected between regimes and not between the sales themselves.

Figure 4.3: Histogram of γ-fit Errors in Small Sample Estimates (Model 2)

The fit for the means of the emission variables is similarly accurate. On average, the means of the emission variables have an error rate46 of 1.03%. They range between 0% and 2% within the samples considered; their distribution is depicted in Figure 4.4. This is largely expected, given the results for the overall classification fit, above. However, this still demonstrates that sale prices can be properly imputed, as can the values of associated covariates. This implies that even for small samples the techniques and inferences developed in Section 4.3.2 can be applied.

46 Calculated as the %-difference between the estimated value and the true value.

I finally consider the small sample performance of the classification method with a higher-order Markov chain. I continue to use Model 2, the complex version, this time repeating the analysis in the preceding section for N = 1, as above. In this model, there are also 150 candidate samples, of which 5 (3.3%) also fail to converge for similar reasons as in the more basic model. Of the remainder, the γ-fit is 99.2% accurate, and the fits range between 97% and 100% accurate.
This indicates that even with the more complicated model driving the data, the fit remains very good. This carries over to the means of the emission variables, which show an average error rate of 1.62%, ranging between approximately zero and 2.5%. The overall fit of the model is comparable to the simpler model, without the lagged covariates.

This demonstrates the utility of the overall framework, even with very small sample sizes. The hidden Markov model framework, structured carefully to be used for sales, is robust enough to perform well even in environments in which it is not guaranteed to perform well. However, it is still very important to correctly specify the underlying model for the data generating process, in order to structure the model properly. It is equally important to provide a good initial condition for the estimation, so that it can both efficiently and correctly locate the local maximum corresponding to the true values of the parameters.

Figure 4.4: Histogram of µ Percentage Errors in Small Sample Estimates (Model 2)

4.5 Conclusion

In this chapter, I have developed and investigated a new method for classifying sales from data. Based on the hidden Markov model framework, this method is both intuitive economically and robust empirically. The underlying structure of the model closely mirrors the way sales intuitively arise in the economy, and conditions necessary for this framework to work are both natural and practical in many applications. The idea that sales manifest as co-movements in prices and other variables in the economy is a natural insight, and one which my method takes full advantage of. By using both price information as well as other variables, my method is able to provide a careful and capable classification procedure for sales, which uses all of the available information to infer whether a given observation is a sale or not. This also allows researchers to side-step the need for reduced-form analysis in many cases.
Since many important questions about sales have to do with the correlations between a sale and a given covariate, all that is necessary to examine these relationships is a measure of that covariance. My classification method allows researchers to do this directly, jointly estimating both the covariance of a given variable with sales and the classification of sales itself. This eliminates the need for the regression framework used in many studies, and is a powerful tool for understanding sales. It also provides a natural default for sales classification, this time based on explicit economic reasoning and with clear identification requirements; it answers the question “which method should we use” by providing a powerful and robust framework which works in many environments.

Figure 4.5: Histogram of γ-fit Errors in Small Sample Estimates, Higher Order (Model 2)

My technique is particularly well-suited to sales environments typical of retail scanner data. Choosing products which are observed for a long period of time, and have similar pricing structures, is a natural fit for this model. For example, products like beer or soft drinks, which have clear competitive structures governing their prices, are an excellent product group for study. Alternatively, chains of stores with homogeneous prices are also a natural fit, since then a given product across the stores will help provide a natural panel for study.

As my Monte Carlo investigation shows, the method is highly accurate both at classifying sales from data and recovering the underlying relationships which appear alongside sales. This is true with both complicated environments, including many lagged state variables, and small sample sizes (something not guaranteed by the theory). However, this requires a good initial position for the model to estimate from, including accurate state classification and emission estimates. Many alternatives, such as that developed in Chapter 2, provide such a method in certain circumstances. A researcher can run a lower-level classification scheme to provide initial conditions, then run the method developed in this chapter to obtain a final classification. In many situations this is not markedly more complex than running this method alone.

Figure 4.6: Histogram of µ Percentage Errors in Small Sample Estimates, Higher Order (Model 2)

This chapter also provides room for additional research, both theoretically and empirically. Theoretical work on the small-sample properties of the estimation remains an open question, as do the consequences of misspecification of the underlying model. Additionally, the key requirement for the use of panel data was the lack of ergodicity in most sales data. However, it is possible that a transformation of the data could restore this property. This would be advantageous since it would allow the more robust set of tools developed for ergodic hidden Markov models to be applied here, and it would make the technique more widely applicable. Empirically, the application of this to more types of sale environments will help provide more information on robustness and overall performance on real-world data. Performing a similar exercise as this with actual sale information gathered first-hand from stores would be an interesting counterpoint to this chapter.
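The two-stage procedure suggested above (a simple classifier to supply initial conditions, followed by the hidden Markov estimation) could start from something as basic as the sketch below. This is a hypothetical rule chosen for illustration, not the Chapter 2 method: the function name `initial_values`, the centred rolling-maximum reference price, the window of nine periods, and the $1 threshold are all assumptions.

```python
import numpy as np

def initial_values(price, window=9, threshold=1.0):
    """Crude sale classification used only to seed an estimation routine.

    A period is flagged as a sale when its price sits more than `threshold`
    below the maximum price in a centred window around it (assumed rule).
    Returns the sale flags and the implied initial regular and sale means.
    """
    price = np.asarray(price, dtype=float)
    half = window // 2

    # Pad with edge values so the centred window is defined at both ends.
    padded = np.pad(price, half, mode="edge")
    reference = np.array([padded[i:i + window].max()
                          for i in range(price.size)])

    sale = price < reference - threshold
    mu_r = price[~sale].mean()
    mu_s = price[sale].mean() if sale.any() else np.nan
    return sale, mu_r, mu_s

# A single series with a $5 sale every fourth week.
price = np.tile([25.0, 25.0, 25.0, 20.0], 5)
sale, mu_r, mu_s = initial_values(price)   # mu_r is 25.0, mu_s is 20.0
```

In a multi-regime setting the same calculation would be applied within each candidate regime to obtain regime-specific starting means; how the regimes are first delimited is left open here.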
                    Price               X1                  X2
          Variable  Estimate  Actual    Estimate  Actual    Estimate  Actual
Regime 1  µr,L1     25.000    25        0.6606    0.66      0.8602    0.86
                    (0.0009)  -         (0.0003)  -         (0.0003)  -
          µr,L2     25.000    25        0.6600    0.66      0.8600    0.86
                    (0.0009)  -         (0.0003)  -         (0.0003)  -
          µr,L3     24.999    25        0.6598    0.66      0.8602    0.86
                    (0.0009)  -         (0.0003)  -         (0.0003)  -
          µs        19.999    20        1.5196    1.52      0.8599    0.86
                    (0.0009)  -         (0.0003)  -         (0.0003)  -
Regime 2  µr,L1     27.998    28        0.6596    0.66      0.8598    0.86
                    (0.0011)  -         (0.0003)  -         (0.0003)  -
          µr,L2     28.001    28        0.6604    0.66      0.8600    0.86
                    (0.0012)  -         (0.0003)  -         (0.0003)  -
          µr,L3     28.000    28        0.6604    0.66      0.8602    0.86
                    (0.0012)  -         (0.0004)  -         (0.0003)  -
          µs        23.000    23        1.5192    1.52      0.8600    0.86
                    (0.0011)  -         (0.0003)  -         (0.0003)  -
Regime 3  µr,L1     29.999    30        0.6591    0.66      0.8603    0.86
                    (0.0013)  -         (0.0004)  -         (0.0005)  -
          µr,L2     30.001    30        0.6595    0.66      0.8592    0.86
                    (0.0016)  -         (0.0005)  -         (0.0004)  -
          µr,L3     30.000    30        0.6602    0.66      0.8603    0.86
                    (0.0016)  -         (0.0005)  -         (0.0004)  -
          µs        25.001    25        1.5196    1.52      0.8602    0.86
                    (0.0015)  -         (0.0004)  -         (0.0004)  -
Regime 4  µr,L1     32.003    32        0.6596    0.66      0.8598    0.86
                    (0.0018)  -         (0.0006)  -         (0.0006)  -
          µr,L2     31.996    32        0.6605    0.66      0.8605    0.86
                    (0.0019)  -         (0.0006)  -         (0.0005)  -
          µr,L3     32.001    32        0.6599    0.66      0.8602    0.86
                    (0.0023)  -         (0.0006)  -         (0.0006)  -
          µs        28.000    28        1.5202    1.52      0.8598    0.86
                    (0.0016)  -         (0.0006)  -         (0.0006)  -
γ-fit               100.00%

Table 4.5: Higher-Order Markov Chain: Model 1 (Basic), with Covariates
                    Price               X1                  X2
          Variable  Estimate  Actual    Estimate  Actual    Estimate  Actual
Regime 1  µr,L1     25.0005   25        0.6606    0.66      0.8602    0.86
                    (0.0009)  -         (0.0003)  -         (0.0003)  -
          µr,L2     25.0005   25        0.6600    0.66      0.8600    0.86
                    (0.001)   -         (0.0003)  -         (0.0003)  -
          µr,L3     24.9992   25        0.6599    0.66      0.8602    0.86
                    (0.001)   -         (0.0003)  -         (0.0003)  -
          µs        21.9990   22        1.5196    1.52      0.8599    0.86
                    (0.001)   -         (0.0003)  -         (0.0003)  -
Regime 2  µr,L1     24.9974   25        0.6597    0.66      0.8598    0.86
                    (0.0011)  -         (0.0003)  -         (0.0003)  -
          µr,L2     25.0008   25        0.6604    0.66      0.8600    0.86
                    (0.0012)  -         (0.0003)  -         (0.0003)  -
          µr,L3     25.0003   25        0.6604    0.66      0.8602    0.86
                    (0.0012)  -         (0.0004)  -         (0.0003)  -
          µs        20.0004   20        1.5192    1.52      0.8600    0.86
                    (0.0011)  -         (0.0003)  -         (0.0003)  -
Regime 3  µr,L1     29.9994   30        0.6592    0.66      0.8603    0.86
                    (0.0013)  -         (0.0004)  -         (0.0005)  -
          µr,L2     30.0005   30        0.6595    0.66      0.8592    0.86
                    (0.0016)  -         (0.0005)  -         (0.0004)  -
          µr,L3     30.0001   30        0.6602    0.66      0.8603    0.86
                    (0.0015)  -         (0.0005)  -         (0.0004)  -
          µs        25.0005   25        1.5196    1.52      0.8602    0.86
                    (0.0015)  -         (0.0004)  -         (0.0004)  -
Regime 4  µr,L1     32.0026   32        0.6596    0.66      0.8598    0.86
                    (0.0018)  -         (0.0006)  -         (0.0006)  -
          µr,L2     31.9965   32        0.6605    0.66      0.8605    0.86
                    (0.0019)  -         (0.0006)  -         (0.0005)  -
          µr,L3     32.0011   32        0.6599    0.66      0.8602    0.86
                    (0.0023)  -         (0.0006)  -         (0.0006)  -
          µs        24.9997   25        1.5202    1.52      0.8598    0.86
                    (0.0016)  -         (0.0006)  -         (0.0006)  -
γ-fit               99.48%

Table 4.6: Higher-Order Markov Chain: Model 2 (Complex), with Covariates

Chapter 5
Conclusion

In this dissertation, I have contributed to the literature on industrial organization in several related ways. First, Chapter 2 looks in a detailed fashion at the role of sales in the pricing of perishable products. It makes several contributions, first by establishing that periodic sales are an important part of the pricing of perishable products. This is the opposite of the predictions several models of sales make for these kinds of products, and necessitates the second contribution: developing a model which can explain the periodic nature of sales for these kinds of products.
I also demonstrate how this model fits into the set of other models which are used to study sales, and highlight some of the pros and cons of using these models. Next, I go beyond the theoretical model to show how the central causal connection necessary is supported by consumer choice data. This highlights the important, and largely overlooked, role that joint retail and consumer data can play in the study of sales. This also demonstrates the need for flexible heuristic tools to classify and study sales, a subject I return to in Chapter 4.

I next move from the area of traditional retail sales to the emerging world of selling products (or developing products to sell) through crowdfunding in Chapter 3. After reviewing the history and economic content of crowdfunding, I focus on the specific context of consumer crowdfunding. After defining this unique type of fundraising, I provide some of the first evidence that large contributions play a very important role in the financing of crowdfunding projects. This is interesting because not only does it have important policy consequences for the regulation and supervision of crowdfunding, but it also goes against the way crowdfunding portrays itself as largely driven by many small consumers. This chapter demonstrates that large contributors are not only very common, but also appear to follow a rational economic motivation: they seek to help projects succeed in reaching their goal. I develop an intuitive model of this behaviour, based on a modified consumer choice framework, which succeeds in capturing the main features we observe for large contributions. Next, I assess the extent to which large contributions are important in helping projects succeed. Using an instrumental variables framework, I find that they are highly important, inverting the standard logic of crowdfunding.
I also show that this is not purely due to their size, but also their timing, demonstrating sophistication on the part of large contributors.

In Chapter 4, I return to the question of sales, this time to consider their classification and the nature of reduced-form study in this area. I develop a new method for classification based on a hidden Markov model, which has two key advantages over other methods: first, it uses all of the information available, including not just the entire price sequence but also other variables of interest. Second, it allows for estimation of the connection between sales and other covariates jointly with the classification of sales. This allows for many interesting questions to be answered without the need for reduced-form estimation and its attendant uncertainties. I also demonstrate empirically that the performance of these kinds of models is very good, even in complicated environments. This shows the utility of my classification method for many areas of study, and highlights a new tool available for the economic study of topics like sales in industrial organization.

Bibliography

Aaker, D. A. (1996). Measuring brand equity across products and markets. California management review, 38(3):102–120.
Agrawal, A. K., Catalini, C., and Goldfarb, A. (2013). Some simple economics of crowdfunding. National Bureau of Economics Working Papers.
Ali, S. N. and Kartik, N. (2006). A theory of momentum in sequential voting. UC San Diego, Stanford University.
Banerjee, A. V. (1992). A simple model of herd behavior. The Quarterly Journal of Economics, pages 797–817.
Baum, L. E. and Petrie, T. (1966). Statistical inference for probabilistic functions of finite state markov chains. The annals of mathematical statistics, 37(6):1554–1563.
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of markov chains.
The annals of mathematical statistics, 41(1):164–171.
Belleflamme, P., Lambert, T., and Schwienbacher, A. (2013). Crowdfunding: Tapping the right crowd. Journal of Business Venturing.
Berck, P., Brown, J., Perloff, J. M., and Villas-Boas, S. B. (2008). Sales: tests of theories on causality and timing. International Journal of Industrial Organization, 26(6):1257–1273.
Bilmes, J. A. et al. (1998). A gentle tutorial of the em algorithm and its application to parameter estimation for gaussian mixture and hidden markov models. International Computer Science Institute, 4(510):126.
Bils, M. and Klenow, P. J. (2002). Some evidence on the importance of sticky prices. Technical report, National Bureau of Economic Research.
Blattberg, R. C., Eppen, G. D., and Lieberman, J. (1981). A theoretical and empirical evaluation of price deals for consumer nondurables. The Journal of Marketing, pages 116–129.
Bronnenberg, B. J., Dhar, S. K., and Dubé, J.-P. H. (2009). Brand history, geography, and the persistence of brand shares. Journal of political Economy, 117(1):87–115.
Bronnenberg, B. J., Dubé, J.-P. H., and Gentzkow, M. (2012). The evolution of brand preferences: Evidence from consumer migration. The American Economic Review, pages 2472–2508.
Callander, S. (2007). Bandwagons and momentum in sequential voting. The Review of Economic Studies, 74(3):653–684.
Cantillon, R. (2014). Kickstarter entering crowded irish scene. Irish Times.
Cappé, O., Rydén, T., and Moulines, E. (2005). Inference in Hidden Markov Models. Springer Science & Business Media.
Chang, J.-W. (2016). The economics of crowdfunding. UCLA Job Market Paper.
Chen, Z. and Rey, P. (2012). Loss leading as an exploitative practice. The American Economic Review, 102(7):3462–3482.
Chevalier, J. A., Kashyap, A. K., and Rossi, P. E. (2000). Why don’t prices rise during periods of peak demand? evidence from scanner data. Technical report, National Bureau of Economic Research.
Ching, W. K. and Ng, M. K. (2006). Markov chains.
Springer.
Cimon, D. A. (2017). Crowdfunding and risk. CEA Working Paper.
Conlisk, J., Gerstner, E., and Sobel, J. (1984). Cyclic pricing by a durable goods monopolist. The Quarterly Journal of Economics, pages 489–505.
Creal, D. (2012). A survey of sequential monte carlo methods for economics and finance. Econometric reviews, 31(3):245–296.
Danaher, P. J., Bonfrer, A., and Dhar, S. (2008). The effect of competitive advertising interference on sales for packaged goods. Journal of Marketing Research, 45(2):211–225.
Daniel, C. R., Cross, A. J., Koebnick, C., and Sinha, R. (2011). Trends in meat consumption in the usa. Public health nutrition, 14(04):575–583.
Davies, D. L. and Bouldin, D. W. (1979). A cluster separation measure. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (2):224–227.
DeGraba, P. (2006). The loss leader is a turkey: Targeted discounts from multi-product competitors. International journal of industrial organization, 24(3):613–628.
Drachen, A., Sifa, R., Bauckhage, C., and Thurau, C. (2012). Guns, swords and data: Clustering of player behavior in computer games in the wild. In Computational Intelligence and Games (CIG), 2012 IEEE Conference on, pages 163–170. IEEE.
Freedman, D. (2012). Markov chains. Springer Science & Business Media.
Frühwirth-Schnatter, S. (2006). Finite mixture and Markov switching models. Springer Science & Business Media.
Graves, J. (2015). Crowdfunding: a structured approach.
Gubler, Z. (2013). Inventive funding deserves creative regulation. Wall Street Journal.
Hekman, E. and Brussee, R. (2013). Crowdfunding and online social networks.
Hendel, I. and Nevo, A. (2006). Measuring the implications of sales and consumer inventory behavior. Econometrica, 74(6):1637–1673.
Hendel, I. and Nevo, A. (2013). Intertemporal price discrimination in storable goods markets. American Economic Review, 103(7):2722–51.
Hosken, D. and Reiffen, D. (2004). Patterns of retail price variation.
RAND Journal of Economics, pages 128–146.
Hotelling, H. (1990). Stability in competition. Springer.
Johnson, J. P. (2014). Unplanned purchases and retail competition. Available at SSRN 2319929.
Kashyap, A. K. (1994). Sticky prices: New evidence from retail catalogs. Technical report, National Bureau of Economic Research.
Kickstarter (2014). Press - kickstarter.
Kokkonen, T. et al. (2014). Microtransactions in an Android game. Master’s thesis, Karelia University of Applied Sciences.
Kuppuswamy, V. and Bayus, B. L. (2013). Crowdfunding creative ideas: The dynamics of project backers in Kickstarter. SSRN Electronic Journal.
Lal, R. (1990). Price promotions: Limiting competitive encroachment. Marketing Science, 9(3):247–262.
Lal, R. and Matutes, C. (1994). Retail pricing and advertising strategies. Journal of Business, pages 345–370.
Lal, R. and Villas-Boas, J. M. (1998). Price promotions and trade deals with multiproduct retailers. Management Science, 44(7):935–949.
Lambert, T. and Schwienbacher, A. (2010). An empirical analysis of crowdfunding. Social Science Research Network, 1578175.
Leacock, S. (2016). Sunshine sketches of a little town. Read Books Ltd.
Leroux, B. G. (1992). Maximum-likelihood estimation for hidden Markov models. Stochastic Processes and their Applications, 40(1):127–143.
Lescop, D. and Lescop, E. (2014). Exploring mobile gaming revenues: the price tag of impatience, stress and release. Communications & Strategies, (94):99.
Leygonie, C., Britz, T. J., and Hoffman, L. C. (2012). Impact of freezing and thawing on the quality of meat: Review. Meat Science, 91(2):93–98.
Manjoo, F. (2010). Online deals for holiday shopping: Buyer beware.
Maruotti, A. (2007). Hidden Markov models for longitudinal data.
Marwell, N. B. (2016). Competing fundraising models in crowdfunding markets. University of Wisconsin-Madison Job Market Paper.
McLachlan, G. and Peel, D. (2004). Finite mixture models. John Wiley & Sons.
Mollick, E. (2014).
The dynamics of crowdfunding: An exploratory study. Journal of Business Venturing, 29(1):1–16.
Mollick, E. and Kuppuswamy, V. (2014). After the campaign: Outcomes of crowdfunding. Available at SSRN.
Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica: Journal of the Econometric Society, pages 1417–1426.
Pesendorfer, M. (2002). Retail sales: A study of pricing behavior in supermarkets. The Journal of Business, 75(1):33–66.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286.
Rao, H., Xu, A., Yang, X., and Fu, W.-T. (2014). Emerging dynamics in crowdfunding campaigns. In Social Computing, Behavioral-Cultural Modeling and Prediction, pages 333–340. Springer.
Ross, S. M. (2014). Introduction to probability models. Academic Press.
Salop, S. and Stiglitz, J. (1977). Bargains and ripoffs: A model of monopolistically competitive price dispersion. The Review of Economic Studies, 44(3):493–510.
Salop, S. and Stiglitz, J. E. (1982). The theory of sales: A simple model of equilibrium price dispersion with identical agents. The American Economic Review, 72(5):1121–1130.
Shelegia, S. (2012). Multiproduct pricing in oligopoly. International Journal of Industrial Organization, 30(2):231–242.
Shilony, Y. (1977). Mixed pricing in oligopoly. Journal of Economic Theory, 14(2):373–388.
Sinclair, B. (2014). Free-to-play whales more rational than assumed. GamesIndustry.biz.
Sweeting, A. (2012). Dynamic pricing behavior in perishable goods markets: Evidence from secondary markets for Major League Baseball tickets. Journal of Political Economy, 120(6):1133–1172.
Teicher, H. (1967). Identifiability of mixtures of product measures. The Annals of Mathematical Statistics, 38(4):1300–1302.
Titterington, D. M. (1985). Statistical analysis of finite mixture distributions. Wiley.
Tuttle, B. (2014). Why all those holiday deals aren’t such good deals. Time Magazine.
Varian, H. R. (1980).
A model of sales. The American Economic Review, 70(4):651–659.
Varian, H. R. (1989). Price discrimination. Handbook of Industrial Organization, 1:597–654.
Wald, A. and Wolfowitz, J. (1940). On a test whether two samples are from the same population. The Annals of Mathematical Statistics, 11(2):147–162.
Wong, A. (2016). Wealth in the pantry: Implications of consumer inventory stockpiling for household savings. Technical report, Northwestern University Working Paper Series.
Xu, A., Yang, X., Rao, H., Fu, W.-T., Huang, S.-W., and Bailey, B. P. (2014). Show me the money!: An analysis of project updates during crowdfunding campaigns. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 591–600. ACM.
Yoo, B., Donthu, N., and Lee, S. (2000). An examination of selected marketing mix elements and brand equity. Journal of the Academy of Marketing Science, 28(2):195–211.
Zvilichovsky, D., Inbar, Y., and Barzilay, O. (2013). Playing both sides of the market: Success and reciprocity on crowdfunding platforms. SSRN Working Paper.

Appendix A
Appendix to “Sales and Perishable Products”

This appendix contains several sections covering different aspects, extensions, and details of the main material in Chapter 2. Each section is independent of the others, except where noted, and constitutes a separate extension of the underlying material. To summarize, in Section A.1 I develop a more realistic error model for price fluctuations than the Gaussian assumption made in the paper, in order to test whether the error pattern is material. In Section A.2, I develop Assumption A.2.1, the conditions under which k-means clustering recovers the underlying pricing regimes in sales data. In Section A.3, I develop a duopoly version of the monopolistic model in Section 2.5, which also examines the role of competition and other kinds of discounting.
Finally, in Section A.4, I extend the duopoly model to relax some of the assumptions about consumer behaviour, such as myopia or inventory restrictions.

A.1 A More Realistic Error Model

The distribution used in the model is very stark; it does not allow for measurement error, or for the possibility of a misspecification of kp within the regime. Denote $P(S_t = 1 \mid R_t) = q$. Accordingly, I allow for the addition of a uniform error term $\epsilon \sim \mathrm{unif}[a, b]$ with $a \le 0 \le b$, which occurs with probability $m$:

\[ Y_t \mid R_t = p - \delta S_t + M_t \epsilon_i \]

where $M_t$ is the event that an error has occurred. Note that the error term is not necessarily white noise; this is because, in general, pricing mistakes or errors are not equally likely to be above or below a given price point. This implies the following distribution:

\[ P(Y_t = y \mid R_t) = (1-m) f_0(Y_t = y \mid R_t) + m f_1(y) \]
\[ f_1(y) = q \cdot u(a + p - \delta,\ b + p - \delta) + (1-q)\, u(a + p,\ b + p) \]

where $u(a, b)$ is the PDF of the uniform distribution on $[a, b]$. The associated CDF is:

\[ P(Y_t \le y \mid R_t) = (1-m) F_0(y \mid R_t) + m F_1(y) \]
\[ F_0 = \begin{cases} 0 & \text{if } y < p - \delta \\ q & \text{if } y \in [p - \delta, p) \\ 1 & \text{if } y \ge p \end{cases} \]
\[ F_1 = q \cdot U(a + p - \delta,\ b + p - \delta) + (1-q)\, U(a + p,\ b + p) \]

where $U(a, b)$ is the CDF of the uniform distribution on $[a, b]$. Let $Q_\tau(Y_t)$ denote the $\tau$-th quantile of the distribution. Notice that:

\[ Q_0(Y_t \mid R_t) = p - \delta + a \]
\[ Q_{0.5}(Y_t \mid R_t) = p - \delta q + \frac{m}{2}(a + b) \]
\[ Q_1(Y_t \mid R_t) = p + b \]

Now, in order to identify the distribution more easily, assume (A1) that $p + a > p - \delta + b$; that is, the largest negative deviation from the top half of the distribution is still larger than the greatest positive deviation from the bottom half of the distribution. Then, if we define $Q'_\tau$ to be the $\tau$-th quantile of the distribution conditional on $Y_t \le p - \delta + b$, the quantiles of this lower distribution are $Q'_\tau = m\{(1-\tau)(a + p - \delta) + \tau(b + p - \delta)\} + (1-m)(p - \delta)$. It is then clear that we can recover $a$, $b$ and $c = p - \delta$ by examining the quantiles of this lower distribution.
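As a rough numerical check of the error model in this section, the following sketch simulates prices within a single regime and compares the sample extremes with the bounds $p - \delta + a$ and $p + b$, and the sample mean with the expected value $p - \delta q + \frac{m}{2}(a + b)$. The function name `simulate_prices`, the parameter values, and the seed are illustrative assumptions rather than anything taken from the estimation in Chapter 2; the values are chosen so that (A1) holds.

```python
import random

def simulate_prices(n, p, delta, q, m, a, b, seed=0):
    """Draw n prices from Y = p - delta*S + M*eps within one pricing regime."""
    rng = random.Random(seed)
    prices = []
    for _ in range(n):
        s = 1 if rng.random() < q else 0          # S_t: sale indicator, P(S_t = 1) = q
        error = rng.random() < m                  # M_t: an error occurs with probability m
        eps = rng.uniform(a, b) if error else 0.0 # uniform error on [a, b]
        prices.append(p - delta * s + eps)
    return prices

# Hypothetical parameters satisfying (A1): p + a > p - delta + b.
p, delta, q, m, a, b = 5.0, 1.0, 0.3, 0.2, -0.2, 0.1
prices = simulate_prices(20000, p, delta, q, m, a, b)

lowest, highest = min(prices), max(prices)
mean = sum(prices) / len(prices)

print("lower bound p - delta + a:", p - delta + a, "sample minimum:", lowest)
print("upper bound p + b:", p + b, "sample maximum:", highest)
print("expected value:", p - delta * q + m * (a + b) / 2, "sample mean:", mean)
```

In this sketch the sample minimum and maximum approach the two extreme quantiles as the number of draws grows, which is the sense in which the support of the price distribution pins down $a$ and $b$ once $p$ and $\delta$ are known.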
Given these terms, the three overall quantiles above have only three unknowns ($q$, $p$ (or $\delta$) and $m$). Thus, this distribution is identifiable from the distribution of prices.

In order to distinguish between different pricing regimes, assume that $m$, $a$, $b$ are the same across regimes. Then $Q_1(Y_t \mid R_t) - Q_1(Y_t \mid R'_t) = p(R_t) - p(R'_t)$ and $Q_0(Y_t \mid R_t) - Q_0(Y_t \mid R'_t) = p(R_t) - p(R'_t) - [\delta(R_t) - \delta(R'_t)]$. Since at least one of these parameters must differ across regimes, considering the maximum, minimum, and span of the two distributions shows that they must be distinguishable. It is also convenient to assume that $Q_{0.5}(Y_t \mid R_t) \ne Q_{0.5}(Y_t \mid R'_t)$ if two distributions differ.

A.2 k-means Clustering on Pricing Regimes

For simplicity, assume that regimes have equal lengths ($L$), and consider the case of two sequential regimes, $R_1$ and $R_2$. Because any regime can be divided into two smaller regimes continuously, this is not a restrictive assumption. Furthermore, as will become apparent, the choice of considering two regimes immediately generalizes, since regimes are separated contiguously in time. Associate with regime $i$ a starting point $t_i$ and ending point $T_i = t_i + L$, a regular price $p_i$ and a discount $\delta_i$. Let the discount occur with rate $\frac{1}{2} > q_i > 0$. Also, suppose that the observations have a time delay $d$; for example, if they are weekly, $d = 7$, while if they are daily, $d = 1$. Then, without loss of generality, assume that $t_2 = T_1 + d$, and normalize $t_1 = 0$. In the population, the centroid of regime 1 is $C_1 = (\frac{1}{2}L,\ p_1 - q_1\delta_1)$ and the centroid of regime 2 is $C_2 = (\frac{1}{2}L + L + d,\ p_2 - q_2\delta_2)$. Now, a point $P$ is classified as in regime 1 if $d(P, C_1) < d(P, C_2)$. Naturally, then, we need only consider points on $[\frac{1}{2}L, \frac{3}{2}L + d]$ (since points on the far side of either centroid will mechanically be closer to the given centroid than to the other).
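The nearest-centroid rule just described can be sketched in a few lines. The helper names `population_centroids` and `classify`, and all numerical values, are hypothetical and chosen only for illustration; the sketch simply evaluates the two population centroids and assigns a (time, price) point to the closer one.

```python
import math

def population_centroids(L, d, p1, q1, delta1, p2, q2, delta2):
    """Population centroids of two consecutive regimes in (time, price) space."""
    c1 = (0.5 * L, p1 - q1 * delta1)
    c2 = (0.5 * L + L + d, p2 - q2 * delta2)
    return c1, c2

def classify(point, c1, c2):
    """Return 1 if the point is strictly closer to c1, otherwise 2."""
    return 1 if math.dist(point, c1) < math.dist(point, c2) else 2

# Hypothetical daily data: regimes of length 10 separated by a one-day gap.
L, d = 10.0, 1.0
c1, c2 = population_centroids(L, d, p1=5.0, q1=0.25, delta1=1.0,
                              p2=5.5, q2=0.25, delta2=1.0)

# Boundary observations: end of regime 1 and start of regime 2, at regular and sale prices.
end_regular = classify((L, 5.0), c1, c2)
end_sale = classify((L, 4.0), c1, c2)
start_regular = classify((L + d, 5.5), c1, c2)
start_sale = classify((L + d, 4.5), c1, c2)
print(end_regular, end_sale, start_regular, start_sale)
```

Because time and price enter the Euclidean distance on their raw scales, the classification depends on the units chosen for $d$ and $L$ relative to the price levels.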
So, consider $P_L = (L, Y_L)$, the last point in regime 1, or $P_{L+d} = (L + d, Y_{L+d})$, the first point in regime 2; these are the critical classification points, since if they are classified correctly then all of the other points in the respective groups will be as well. Suppose initially that $P_L$ is a regular price: $P_L = (L, p_1)$. Then, letting $\Delta = p_1 - p_2$:

\[ d(P_L, C_1)^2 = \left(\tfrac{1}{2}L\right)^2 + (q_1\delta_1)^2 \]
\[ d(P_L, C_2)^2 = \left(\tfrac{1}{2}L + d\right)^2 + (\Delta + q_2\delta_2)^2 \]

Then,

\[ d(P_L, C_2)^2 - d(P_L, C_1)^2 = d^2 + dL + 2q_2\delta_2\Delta + q_2^2\delta_2^2 - q_1^2\delta_1^2 + \Delta^2 \]
\[ \implies d(P_L, C_2)^2 > d(P_L, C_1)^2 \iff d^2 + dL + 2q_2\delta_2\Delta + \Delta^2 > (q_1\delta_1)^2 - (q_2\delta_2)^2 \tag{A.2.1} \]

Thus, this point is classified correctly as long as, relative to the length of the regime and the gap $d$, the change in prices is relatively large and positive. Essentially, the change in the prices needs to be larger than the change in the average deviation from the regular price, controlling for $d$ and $L$. For instance, suppose $\delta_1 = \delta_2$ and $q_1 = q_2$. Then this condition reduces to:

\[ d^2 + dL + 2q_2\delta_2\Delta + \Delta^2 > 0 \]

Since

\[ d^2 + dL + 2q_2\delta_2\Delta + \Delta^2 \ge d^2 + dL + 2q_2\delta_2\Delta, \]

it is sufficient that $\Delta > -\frac{d^2 + dL}{2q_2\delta_2}$. In other words, if we believe that only the regular price is changing, and everything else stays the same, this amounts to not allowing the regular price to rise by more than $\frac{d(d+L)}{2q_2\delta_2}$ between regimes (recall that $\Delta = p_1 - p_2$). For instance, a set of regime changes in which the regular price falls, or rises only gradually, would suffice to meet this condition.

Now, when $P_L$ is a sale price, $P_L = (L, p_1 - \delta_1)$ and:

\[ d(P_L, C_1)^2 = \left(\tfrac{1}{2}L\right)^2 + ((1 - q_1)\delta_1)^2 \]
\[ d(P_L, C_2)^2 = \left(\tfrac{1}{2}L + d\right)^2 + (\Delta + q_2\delta_2 - \delta_1)^2 \]
\[ \implies d(P_L, C_2)^2 > d(P_L, C_1)^2 \iff d^2 + dL + (\Delta + q_2\delta_2 - \delta_1)^2 - ((1 - q_1)\delta_1)^2 > 0 \tag{A.2.2} \]

Similarly, for the other point:

\[ \left(\tfrac{1}{2}L + d\right)^2 + (\Delta - q_1\delta_1)^2 > \left(\tfrac{1}{2}L\right)^2 + (q_2\delta_2)^2 \tag{A.2.3} \]
\[ \left(\tfrac{1}{2}L + d\right)^2 + (\Delta - q_1\delta_1 + \delta_2)^2 > \left(\tfrac{1}{2}L\right)^2 + ((1 - q_2)\delta_2)^2 \tag{A.2.4} \]

which can be simplified in the same way as the preceding expressions. The following assumption then ensures that the k-means clustering algorithm will successfully identify clusters:

Assumption A.2.1.
Equations A.2.1-A.2.4 are met for all regimes in the data.

A.3 A Duopoly Model of Loss Leadership

My duopoly model of loss leadership is based on that of DeGraba (2006); in it, two monopolistically competitive grocery stores compete over a population of consumers who buy different bundles composed of perishables and sundries. To capture the intuition, suppose that these consumers come in two types ($j$): (1) bachelors ($j = B$) and (2) families ($j = F$). These two types of consumers have different lifestyles: bachelors live alone, in small apartments, while families live together in larger houses. This manifests in two related ways: first, families have more storage capacity for the sundries than bachelors; second, families consume more perishables per capita from the grocery store, since the bachelors find it difficult to cook and prepare for one (and so eat out more instead). These basic differences between the consumers ultimately result in sales, since inventory dynamics on the part of the families lead them to occasionally purchase more than the bachelors. The fact that they also purchase meat gives the stores a way to price discriminate between the two consumers, offering a more attractive total bundle price to the families when they are buying large amounts of sundries. High purchases of sundries occur in a cyclic fashion alongside low prices of the perishable good, resulting in a cyclic pattern for sales.

To make things concrete, consider a discrete time environment $t = 1, 2, 3, \ldots$, and suppose that bachelors always demand one unit of the sundry good every period, $S_{Bt} = 1$, since they cannot store goods very well. Similarly, suppose that families always demand one unit of the perishable product every period (since it expires and cannot be stored), while the bachelors never demand the perishable product.
The families also buy the sundry good, but do so in an inventory-based fashion; that is, they have an inventory level $I_t$ which is replenished to some (full) level $K$ when the sundry good is purchased. The demand per period is denoted $S_{Ft} \ge 0$ and depends on the number of families buying the sundry in a given period. Both types of consumers are risk-neutral expected utility maximizers, and occur in masses $m_j$.

On the firms’ side, I model competition using a standard Hotelling model (as in Hotelling (1990)) where the stores are located at either end of the unit line. Stores compete in prices over two goods $i = M, S$, and I assume that grocery stores face constant marginal costs of production for both goods, $c_i$, and that the stores are risk-neutral profit maximizers. Consumers are located uniformly along the unit line, and face a marginal cost of transport $k > 0$, which is the degree of differentiation between the firms. The timing of the game is demand realization, price-setting, purchasing, and then consumption.

The decision framework of the agents depends on the degree of consumer sophistication and the way in which sundries deplete. The simplest benchmark is to suppose that consumers are completely myopic, and follow a heuristic rule to replenish their inventories only when they have exhausted their supply. Economically, this could correspond to a situation where it is only cost-effective for the families to refill their supply when they run out; for instance, if there is only one package size available it is very “costly” to have excess product spilling out of the inventory. To model the family inventory, suppose each period a representative family receives an independent shock to their expected consumption, $j \in \{0, 1, 2, \ldots, K\}$, occurring with probability $s_j$, which is the amount of the sundry they will use up in the upcoming week.
If they have inventory level $I$, then they will need to purchase a new package of sundries when this consumption shock is at least $I$, since otherwise they will run out of the good.

If we consider the inventory level the associated state variable, then the probability a family with inventory $I$ purchases the sundry is $P(I) \equiv \sum_{j=I}^{K} s_j$, where $s_j$ is the probability that the family consumes $j$ units. The path of their inventory state is determined by a Markov chain, $I_{t+1} = M I_t$, where the transition matrix $M$ is given by

\[ M = \begin{pmatrix}
s_K & s_{K-1} & s_{K-2} & s_{K-3} & \cdots & s_1 & s_0 \\
0 & s_0 + s_K & s_{K-1} & s_{K-2} & \cdots & s_2 & s_1 \\
0 & s_1 & s_0 + s_K & s_{K-1} & \cdots & s_3 & s_2 \\
0 & s_2 & s_1 & s_0 + s_K & \ddots & s_4 & s_3 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
s_K & s_{K-1} & s_{K-2} & s_{K-3} & \cdots & s_1 & s_0
\end{pmatrix} \]

A critical feature of this matrix is the value of $s_0$. If $s_0 = 0$, then the associated Markov chain is periodic and may not admit a stationary distribution; the number of individuals in each state may never settle down, rotating forever between different values. More importantly, if the Markov chain does have a stationary distribution, then there is no requirement that it is also a limiting distribution. This will be very important for the dynamics of sales.

Denote the distribution of families with a given inventory by the vector $x$. Then, the probability a randomly selected family purchases the product is $\sum_{i=0}^{K} P(i) x_i$; thus, in aggregate they purchase this many units, multiplied by their mass and $K$. Now, because consumers are myopic, they do not inter-temporally adjust their purchasing decisions. Thus, no decision on the part of the firms will affect their choices from period to period, and firms will play a repeated static game against one another. We therefore have the following results:

Proposition A.3.1. The price of the leading good increases or decreases depending on the value $r(x) \equiv \sum_{i=0}^{K} P(i) x_i$, independently of $m_j$.

Proof.
As in DeGraba (2006), the profit function of firm $i$ is given by:

\[ \Pi_i = \left(P_{Mi} - c_M + S_{Ft}(P_{Si} - c_S)\right)\left(\frac{1}{2} + \frac{P_{M(-i)} - P_{Mi} + S_{Ft}(P_{S(-i)} - P_{Si})}{2k}\right) m_F + S_{Bt}(P_{Si} - c_S)\left(\frac{1}{2} + \frac{S_{Bt}(P_{S(-i)} - P_{Si})}{2k}\right) m_B \]

which has associated first-order conditions

\[ P_{Mi} - c_M + S_{Ft}(P_{Si} - c_S) = k \]
\[ S_{Bt}(P_{Si} - c_S) = k \]

and these imply that:

\[ P^*_{Mi} = c_M + k\left(1 - \frac{S_{Ft}}{S_{Bt}}\right) \]
\[ P^*_{Si} = c_S + \frac{k}{S_{Bt}} \]

However, as we established earlier, the average consumption of $S$ by the $F$-type consumers is

\[ S_{Ft} = K \sum_{i=0}^{K} P(i) x_t(i) \]

and similarly

\[ S_{Bt} = 1 \]

The result immediately follows by inspection, since $P^*_{Mi} = c_M + k(1 - S_{Ft}) = c_M + k\left(1 - K \sum_{i=0}^{K} P(i) x_t(i)\right)$.

This leads to the conclusion that sales depend on varying levels of demand for the associated sundry good. It also means that we have the following results, which illustrate directly the importance of the periodicity of the inventory process:

Corollary A.3.1. (1) Suppose that the inventory Markov chain $M$ is periodic and the distribution of consumers does not start in a stationary distribution for the Markov chain. Then, sales on the perishable product occur in a cyclic pattern forever. (2) Suppose that the inventory Markov chain $M$ is aperiodic; then, sales will occur temporarily, before eventually ceasing.

The condition that the inventory process be periodic may seem restrictive, but it is actually not; for example, it is sufficient that $s_0 = 0$. Many common inventory processes fall into this category; for instance, a deterministic process in which families always consume one unit of the sundry is periodic. Similarly, a situation where the family may consume one or two units is periodic as long as the package size is three or higher. If this condition does not hold, then sales may occur temporarily but eventually stop as the distribution of inventory across consumers becomes stationary.
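The contrast between the periodic and aperiodic cases can be illustrated numerically. The sketch below uses a simplified reading of the inventory process, which is an assumption of this example and not the transition matrix $M$ itself: a family holding inventory $I$ draws consumption $j$ with probability $s_j$, refills to $K$ if $j \ge I$, and then consumes $j$. It tracks the inventory distribution $x_t$ and evaluates the perishable price $c_M + k(1 - S_{Ft})$ with $S_{Ft} = K \sum_i P(i) x_t(i)$ and $S_{Bt} = 1$. The function names and the parameter values ($K = 3$, $c_M = 2$, $k = 0.5$) are hypothetical.

```python
def step(x, s, K):
    """Advance the inventory distribution x (index = inventory level 0..K) by one period."""
    nxt = [0.0] * (K + 1)
    for inv, mass in enumerate(x):
        for j, prob in enumerate(s):
            # Refill to K when the draw j would exhaust the stock, then consume j units.
            stock = K if j >= inv else inv
            nxt[stock - j] += mass * prob
    return nxt

def purchase_rate(x, s):
    """r(x) = sum_i P(i) x_i, where P(i) = sum_{j >= i} s_j."""
    return sum(sum(s[i:]) * xi for i, xi in enumerate(x))

def price_path(x0, s, K, c_M, k, periods):
    """Perishable price c_M + k (1 - S_F) each period, with S_F = K r(x_t) and S_B = 1."""
    x, prices = list(x0), []
    for _ in range(periods):
        S_F = K * purchase_rate(x, s)
        prices.append(c_M + k * (1.0 - S_F))
        x = step(x, s, K)
    return prices

K, c_M, k = 3, 2.0, 0.5
x0 = [0.0, 0.0, 1.0, 0.0]             # every family starts with two units in stock

s_periodic = [0.0, 1.0, 0.0, 0.0]     # always consume exactly one unit (s_0 = 0)
s_aperiodic = [0.5, 0.5, 0.0, 0.0]    # consume zero or one unit with equal probability

periodic_prices = price_path(x0, s_periodic, K, c_M, k, periods=6)
aperiodic_prices = price_path(x0, s_aperiodic, K, c_M, k, periods=6)
print(periodic_prices)
print(aperiodic_prices)
```

Under these assumed parameters, deterministic one-unit consumption makes the perishable price alternate between a high and a low value indefinitely, whereas allowing $s_0 > 0$ lets the inventory distribution settle so that the price becomes constant after the first period.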
In the stationary distribution, although individuals will go through cycles of buying products, the aggregate population will not, which results in permanently lower, stable prices.

This model also generates some predictions about sales. First of all, they should be cyclic, since they occur alongside the periodicity of the inventory process. However, this is not unique; many different models predict that sales should be cyclic. Fortunately, this model also makes predictions about the composition of the consumer’s bundle: specifically, perishable-purchasing consumers should also purchase more of other sundry goods when the perishable good is on sale. This prediction is distinct from other models, especially inventory-based explanations for sales, because it tells us that there should be a link between sales and the kinds of goods being purchased in the whole bundle (rather than considering the discounted product alone). This prediction is also testable, although not at the retail level; by looking at consumers’ choice bundles, we should be able to assess whether or not consumers purchase more sundries when the perishable is on sale. This is summarized in the corollary below:

Corollary A.3.2. In periods when consumers purchase perishable products on sale, the demand for the sundry products should be higher than in periods where consumers purchase the perishable product at its regular price. In terms of the model, $S_{Ft} > S_{Ft'}$ when the product is on sale at time $t$ and not at time $t'$.

A.3.1 Discussion

This model produces fairly precise predictions about what conditions are necessary for sales; this precision is largely due to the starkness of the assumptions made on the different parts of the model. This starkness is used to focus on the intuition and economics of the model, which are complicated even in this basic environment.
Essentially, consumers differ in a single dimension (the “type”), which is associated with both their ability to store and their desire to purchase perishable products. In the model, I call this the family/bachelor distinction, but this is just a framing mechanism for the intuition; in reality, grocery stores, using consumer loyalty cards and purchasing information, are likely able to come up with far more subtle connections between consumer demands. The basic intuition of the model is that consumers who hold sundry inventory in a periodic fashion will go through periods of higher purchases of the sundries; as consumers run out, their willingness to pay increases, which manifests in an increase in demand. In the myopic model, this variation is quite stark, but the story is similar if this is relaxed. When consumers demand larger bundles, they are more valuable to attract to your store; competition between the two stores creates the incentive for them to lower the price of this bundle to attract these valuable consumers.

However, because firms can choose prices for both the sundry goods and the perishable goods, the problem becomes deciding which price they should lower in order to change the price of the total bundle. The optimal equilibrium behaviour uses the fact that perishable purchasing distinguishes the two types of consumers. To see this, imagine that initially firms just tried to sell to the bachelors; the optimal price would be as given above as the sundry price. Then, suppose they decide to just sell to the families, taking the sundry price as given. The equilibrium price would, again, be as given above. If a firm chose to lower $P_S$ and raise $P_M$, they would lose profit from the bachelors while keeping the profit from the families the same.[47]
Intuitively, changing $P_M$ has no impact on profit from the bachelors, and so it is optimal to effectively “segregate” the market into two parts; this is similar to how a monopolist would try to discriminate between the two groups of consumers.

As mentioned, the dynamic aspect of sales is generated primarily through the inventory process. In the myopic model, this process is easy to translate from the underlying stochastic demands because there are no inter-temporal decisions made by the consumers, and therefore none by the firms. The inclusion of forward-looking behaviour on the part of consumers changes this demand process significantly, but the real challenge arises in the behaviour of the firms, who now may compete inter-temporally as well. I explore this model in the Appendix, illustrating that while myopia gives a simpler characterization, it is not essential; however, firms need to be long-lived in order to close down inter-temporal competition.

The assumption that the inventory process of the bachelors is very stark (no storage) is also not essential. The essential requirements are the same as in the simpler model, but the statements concern the differences in the stationary or periodic distributions of the inventory process. One way to see this intuitively is to pretend that the assumption that the bachelors always demand a single unit of the sundry is a “normalization” instead of an assumption; then, Proposition 15 becomes a statement of relative, as opposed to absolute, demand. This variation of the model is developed further in the appendix.

Related to the above is the question of firm expectations; in the model, we assume that consumers basically commit to making trips to the store, then firms set prices to attract them to their particular store.
This is not really realistic, since firms actually are uncertain about demand (which occurs over a period of time) and set prices ahead of time. The assumption that consumer demand is observable is probably not very realistic; it is best understood as an assumption about expectations of consumer demand, which are (in equilibrium) correct. The fact that consumers form a mass with infinite density makes this possible. However, how reasonable are these expectations? Would firms understand when demand is high or low? This is difficult to assess directly, but we can imagine that firms have good planning processes and understand (in aggregate) what consumer replenishment of particular products must look like. This could be from features like loyalty cards or club cards. This is the process by which they would understand the perishable/sundry connection in the first place, so it would also be reasonable for them to make predictions of consumer demand. Indeed, in equilibrium, these could be based explicitly only on past demand and pricing patterns, since these would re-occur over time.

[47] See DeGraba (2006) for a detailed explanation of this in the static context, which follows the same logic.

A.3.2 Quantity Discount versus Sales

A second comment has to do with the use of price discounts on different products versus quantity discounts. Because consumers are myopic, the families refill their inventory entirely; this is equivalent to assuming they purchase a package of size $K$ at the store. However, they pay $K P_S$, which is equivalent to buying $K$ units of the bachelor’s package size. An alternative way for the firms to compete would be in terms of a quantity discount for the package. This is certainly possible, and could be an alternative pricing strategy. However, any uniform discount ($\delta$ for the larger package) across both firms would result in the same incentive to discount perishable products, and so would not change the model.
More importantly, it cannot increase profit, since the equilibrium strategy maximizes profit for both types of consumers individually; any segregation on the basis of bundle size rather than bundle composition can at best recover the same profit level. This means this is more of an “alternative” strategy which could be adopted for competitive reasons. In this model, these are equivalent; however, relaxing the inventory size assumption for the bachelors makes this no longer true, and perishable sales become more attractive again.

This can be formally stated in the following setup: suppose that there is a measure $m_{SB}$ of “storing bachelors”; that is, bachelors who have the ability to store the sundry good. These are similar to the agents in the model developed in the appendix where both types of agents can store. Now, consider the limit as $m_{SB} \to 0$; for small values of $m_{SB}$ the aggregate per-bachelor demand is $\frac{m_B}{m_B + m_{SB}} + \frac{m_{SB}}{m_B + m_{SB}} S_{SB} \approx 1$, which means the equilibrium is approximately the same as above. Then, we have the following result:

Proposition A.3.2. Suppose $m_{SB} \to 0$. Then, sales on sundries weakly dominate quantity discounts.

Proof. Let the equilibrium prices in the original model be $p^*_m$ and $p^*_s$. Then, bachelors pay a total price of $p^*_s$ and families pay a total price of $p^*_m + k p^*_s$. We know that these two bundle prices constitute an equilibrium. So, consider the alternative $p_{s1} = p^*_s$, $p_m = p = \sup\{p^*_m\}$ (a constant) and $p_{sk} = k p^*_s + p^*_m - p$. Notice that $p_{sk} > p^*_s = p_{s1}$ and $p_m - p^*_m \ge 0$. That is, bachelors do not want to purchase the bundled sundry, and the meat price is fixed at a level (weakly) higher than the meat price in the sales model. However, notice that the total spending for both consumer bundles is the same as in the original equilibrium. Therefore, sales do as well as quantity discounts.
However, for the $m_{SB}$ bachelors who purchase the inventory-sized bundle, quantity discounts do strictly worse, since $p_{sk} < k p^*_s$, which is their spending in the original equilibrium. Therefore, sales are preferred to quantity discounts.

This is intuitive, because a quantity discount on a product bought by both consumers may attract the smaller customers to “buy up”, which is undesirable if an alternative exists; only the stark assumption on the bachelor’s inventory makes this possible. Of course, in the real world, the assumption about complete separation of perishable purchasing by consumers is also too stark; a relaxation of both these assumptions likely explains why, in the real world, we see both sales and quantity discounts being used.

A.3.3 Competition and Monopoly

The simplest way to examine the importance of the assumption that competition is monopolistic is by considering the limiting properties of $k$, the differentiation parameter. As $k \to 0$, firm differentiation is eliminated and the market becomes Bertrand competition. As in the standard model, equations 1 and 2 show that the prices approach the marginal costs as markets become more competitive. In particular, because prices become constant, this eliminates sales both intermittently and periodically. This occurs because loss leadership (and sales in general) are only useful through attracting valuable customers to your store. These customers are attracted by a lower price; however, there is a trade-off between price and location. This leads stores to shade their pricing, which in turn creates sales. When all consumers are equally price sensitive, small deviations in price become more valuable, resulting in a Bertrand-style race to the bottom which also eliminates price variation.

At the other extreme, as $k \to \infty$ the market becomes monopolistic. In this case, the optimal price for a bundle approaches $M_j$, the maximal consumer willingness to pay for a bundle.
This result holds in both the myopic and forward-looking models under commitment. Since consumers must buy the product at some point, the monopolist appropriates all the surplus. One issue is that since consumers of different types may have different willingnesses to pay, the monopolist will effectively cross-subsidize if $M_f < M_b$, setting $P_s = M_b$ and $P_m = M_f - kP_s$ as long as $P_m > c_m$. If this does not hold, it is not optimal for the monopolist to offer perishables. However, choice of product line is beyond the scope of this model, so we can set this case aside. Since these prices are fixed, monopoly leads to a similar situation as competition: fixed prices. This eliminates sales of both kinds, just as in the case when $k \to 0$ but for the opposite reason. Sales are useful because they allow stores to compete over consumers: if there is no need to compete, there is no need to offer sales.

A.4 Extensions to the Duopoly Model of Loss Leadership

This appendix introduces several extensions to the duopoly loss leadership model. The basics are essentially the same, but are summarized below for completeness; the key differences are (1) allowing the bachelors to store inventory and (2) relaxing the assumption of myopic consumers. These two variations are explored together in the following sections. To make things clearer, and to highlight the differences in the model, I use the notation $L$ and $S$ to refer to the leading good (the perishable) and the side good (the sundry); the consumers, who now differ only in their taste for the perishable, are identified by whether or not they consume that good. In the context of the original model, the bachelors would be the $S$-consumers, while the families would be the $L$-consumers.
The remainder of the model is similar; changes are pointed out where they are relevant.

These stores are monopolistically competitive, which is represented by a standard Hotelling model where the stores are located at either end of the unit line. Stores compete in prices over two goods indexed by $j$: the leading good $L$ and the sundry good $S$. I assume that grocery stores face constant marginal costs of production for both goods, $c_j$, and that the stores are risk-neutral profit maximizers. Consumers are located uniformly along the unit line, and face a marginal cost of transport $\tau > 0$. Each consumer demands $S_j$ units of good $j$, and is a risk-neutral expected utility maximizer.

I assume that the preceding occurs within a discrete time framework, $t = 1, 2, 3, \ldots$, and the two goods differ in terms of their durability. The sundry good $S$ is inventoriable; each consumer has an inventory level $I$ which can be replenished to a maximum amount $K$ by purchasing the sundry. Unlike in the baseline model, because both consumers buy the same type of good, I omit package-size or unit considerations; there is only one size ($K$) available, sold for a single price. I assume there are two kinds of consumers: $L$-buyers and $S$-buyers. These consumers differ in their tastes over the two goods. Specifically, the $L$-buyers always demand one unit of $L$ every period, while the $S$-buyers never demand $L$. However, both types of consumers consume $S$, albeit in different amounts $S_j$ depending on their decision to buy the sundry good. These consumers occur in measures $m_j$, uniformly distributed along the unit line. Within each period, I assume there is a demand realization step, a purchasing step, then a consumption step. So, for instance, a consumer will purchase and add to their inventory before consuming any goods.

The decision framework of the agents depends strongly on the degree of consumer sophistication, the way in which sundries deplete, and idiosyncratic noise.
The simplest benchmark is to imagine that consumers are completely myopic, and follow a heuristic rule to replenish their inventories only when they have exhausted their supply. I develop this in the following section.

A.4.1 Myopic Consumers, Both with Inventories

In this section, suppose that inventories deplete according to a random walk, which can differ for each type of consumer (I suppress this at the moment, for clarity). Unlike in the baseline model, now both consumers are capable of holding inventory. Each period, a consumer may need to consume $j$ units of the sundry, $j \in \{0, 1, 2, \ldots, K\}$, independently of any other instance of consumption, each with probability $s_j$. Then, for an individual with inventory $I$ the probability that they purchase the sundry is $P(I) \equiv \sum_{j=I}^{K} s_j$. If we imagine the inventory levels for a consumer are states, the path of their inventory, under optimal decision making, forms a discrete time Markov chain with transition matrix:

\[
M = \begin{pmatrix}
s_K & s_{K-1} & s_{K-2} & s_{K-3} & \cdots & s_1 & s_0 \\
0 & s_0 + s_K & s_{K-1} & s_{K-2} & \cdots & s_2 & s_1 \\
0 & s_1 & s_0 + s_K & s_{K-1} & \cdots & s_3 & s_2 \\
0 & s_2 & s_1 & s_0 + s_K & \cdots & s_4 & s_3 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
s_K & s_{K-1} & s_{K-2} & s_{K-3} & \cdots & s_1 & s_0
\end{pmatrix}
\]

In general, $M$ may differ for the different types of consumers, written $M_j$. Notice that this Markov chain is recurrent; all states are visited with positive probability provided $s_0, s_K > 0$. Under this condition, the chain is also aperiodic. Then, this Markov chain has an associated stationary distribution $\pi$ where:

\[
\pi_i = \frac{C}{M_i} = \lim_{n \to \infty} M^{(n)}_{ii}
\]

where $M_i$ is the mean recurrence time and $C$ the normalizing constant. If the preceding condition is not met, then the Markov chain is periodic and may not converge to a stationary distribution; the number of individuals in each state may never settle down, rotating forever between different values.
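The contrast between a chain that settles down and one that rotates forever can be illustrated with a small simulation. The sketch below is only an illustration under simplifying assumptions: it does not use the matrix $M$ above, but a hypothetical cyclic chain in which a consumer uses either zero units (probability `s0`) or one unit per period and refills to $K$ when a unit is needed at an empty inventory. The function names are invented for this example.

```python
def cyclic_inventory_chain(K, s0):
    """Row-stochastic transition matrix over inventory levels 0..K.

    Illustrative assumption (not the matrix M in the text): each period the
    consumer uses no unit with probability s0 and one unit with probability
    1 - s0; needing a unit at inventory 0 triggers a refill to K.
    """
    s1 = 1.0 - s0
    n = K + 1
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] += s0             # no consumption: inventory unchanged
        M[i][(i - 1) % n] += s1   # one unit used; level 0 wraps around to K
    return M


def evolve(x, M, steps):
    """Push the population shares x through the chain for `steps` periods."""
    n = len(x)
    for _ in range(steps):
        x = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]
    return x


start = [0.0, 0.0, 1.0]  # everyone begins with a full inventory (K = 2)

periodic = cyclic_inventory_chain(K=2, s0=0.0)   # s0 = 0: deterministic cycle
aperiodic = cyclic_inventory_chain(K=2, s0=0.5)  # s0 > 0: chain mixes

for t in range(1, 7):
    shares_periodic = evolve(start, periodic, t)
    shares_aperiodic = [round(v, 3) for v in evolve(start, aperiodic, t)]
    print(t, shares_periodic, shares_aperiodic)
```

In the periodic case the whole population moves through the inventory levels in lockstep, so the share of consumers who must purchase in a given period never stabilizes. In the aperiodic case the shares approach the uniform distribution, which is the stationary distribution of this particular (doubly stochastic) example.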
More importantly, if the Markov chain does have a stationary distribution, then there is no requirement that it is also a limiting distribution.

Suppose the distribution of consumers with a given inventory is given by the vector $x$. Then, the probability a randomly selected consumer of type $j$ purchases the product is $\sum_{i=0}^{K} P_j(i) x_i$; thus, in aggregate they purchase this many units, multiplied by their mass. Now, because consumers are myopic, they do not inter-temporally adjust their purchasing decisions. Thus, no decision on the part of the firms will affect the choices from period to period, and firms will play a repeated static game against one another. We therefore have the following results:

Proposition A.4.1. The price of the leading good increases or decreases depending on the ratio

\[
r(x_L, x_S) \equiv \frac{\sum_{i=0}^{K} P_L(i)\, x_{L,i}}{\sum_{i=0}^{K} P_S(i)\, x_{S,i}},
\]

independently of $m_j$.

Proof. The proof is identical to before; the first-order conditions imply that the prices are:

\[
P^*_{Li} = c_L + k\left(1 - \frac{S_{Lt}}{S_{St}}\right), \qquad P^*_{Si} = c_S + \frac{k}{S_{St}}
\]

However, as we established earlier, the average consumption of S by the L-type consumers is now:

\[
S_{Lt} = \sum_{i=0}^{K} P_L(i)\, x_{Lt}(i)
\]

and similarly

\[
S_{St} = \sum_{i=0}^{K} P_S(i)\, x_{St}(i)
\]

The result immediately follows by inspection.

These results yield a variation of the original proposition.

Corollary A.4.1. (1) Suppose $M_j$ is periodic, $x_j$ does not begin in a stationary distribution of $M_j$, and either (i) $M_{-j}$ is periodic and $x_{-j} \neq x_{-j}$ or (ii) $M_j$ is aperiodic. Then, sales will re-occur infinitely for $j = S$ or $j = L$.
(2) If the conditions of (1) do not hold, then sales will either (i) never occur, and prices will be stable or (ii) occur temporarily, diminishing before

We can see the consequences of these different outcomes in Figure A.1. The idea of this result is that the properties of the Markov chain determined by consumer inventory choices translate into statements about sales. Sales, in this context, are reductions in the price of a given product.
In this model, this corresponds to periods where the ratio $r(x_L, x_S)$ is smaller than in other periods. This depends on the distribution of individuals in each state $x_j$; as the Markov chain causes these distributions to evolve, so too does the ratio, and therefore the prices. However, if the Markov chain is ergodic, the long-term (and steady-state) distributions of both types of consumers become more and more similar, which causes $r(x_L, x_S)$ to approach unity, which means that price variation eventually ceases. This can occur in two ways, with demands rising or falling for the two types of consumers; it is the relative size which is critical here.

Again, the key requirement is that the inventory accumulation be periodic. Consumer inventories need to rotate in order to generate cyclic patterns of sales. If they do not, then the Markov chain eventually converges to a steady state; the duration of this period (and the length of time in which there are detectable sales) is governed by the eigenvalues of the inventory process. We can see, as discussed previously, that the essential character of the model is unchanged, although the interpretation and some of the predictions are more subtle. In particular, the prediction of when sales should occur is now relative to the customer groups; this implies that the original prediction is a sufficient, but not necessary, condition for there to be sales. In other words, sales can still occur even without the central prediction of the baseline model; the behaviour necessary is, however, very difficult to detect from data.

Figure A.1: Different Sales Patterns

A.4.2 Forward-looking Consumers

In this section, suppose that inventories deplete according to a random walk, as in the myopic model, with consumers at time $t$ requiring $c_t$ units of the sundry. However, instead of assuming consumers are myopic, suppose that they live for $T$ periods, at which point they die and are replaced with more consumers. Furthermore, suppose that consumers who meet their demand for the good in a given period obtain utility $u$, but are punished with disutility $v$ if they run out of inventory in a given period. I denote the price of the bundle by $P$, understanding that for L-type consumers the utility includes the price of the L-good (which they must purchase each period). All consumers share a common discount factor $\beta < 1$, while firms have a discount factor $\delta \le 1$. Furthermore, following Hendel and Nevo (2006), suppose that consumers believe prices follow a first-order Markov process. Then, consumers solve a dynamic optimization problem with Bellman equation:

\[
V_t(I_t, c_t, P_t) = \max_{a_t \in \{0,1\}} \; u - \mathbb{I}(I_t \le c_t)(1 - a_t)v - P_t a_t + \beta E V_{t+1}(I_{t+1}, c_{t+1}, P_{t+1})
\]
\[
I_{t+1} = a_t K + (1 - a_t) I_t - c_t
\]
\[
V_T(I_T, c_T, P_T) =
\begin{cases}
\max\{u - v,\; u - P_T\} & \text{if } I_T - c_T \le 0 \\
u & \text{if } I_T - c_T > 0
\end{cases}
\]

The profit of the firm at time $T$ is given by the static solution found above, and is therefore:

\[
\Pi^* = \frac{1}{4} m_L + \frac{1}{4} m_S
\]

which does not depend on the number of consumers purchasing at time $T$. Then, we can find an equilibrium as follows:

Proposition A.4.2. Suppose firms are very long-lived, relative to consumers, so $\delta = 1$. Then, there is a rational expectations equilibrium in which firms set prices according to Proposition 14, and consumers behave according to the Bellman equation above.

Proof. Since at time $T$ the only equilibrium action for the firms is to play according to Proposition 14, the result holds in this period. But, in this case, there is no change in the last-period profit as the inventory varies. Therefore, at time $T - 1$, if the firms set prices according to Proposition 14, there is no difference in obtaining the profit now or in the future since $\delta = 1$. Thus, setting prices according to Proposition 14 is optimal in period $T - 1$. The result follows, since the Bellman equation given above is a contraction mapping, and consumers are rationally correct in their expectations about the pricing process.

The basic intuition behind this result is that in the terminal period of the consumers' lives, firms do not compete against themselves in the future for the current consumers. This fixes their profit at a level given by the static equilibrium of the Hotelling game. The equilibrium of this game is such that it does not depend on the relative amounts the consumers buy, since the surplus is competed away by the two firms. So, when firms are sufficiently long-lived, they do not care whether consumers buy in the current period, later, or earlier. This allows the static equilibrium outcome to persist, because inter-temporal competition does not affect the profitability of the firms. This kind of assumption is similar to Blattberg et al. (1981), in that the management of the inventory of consumers is viewed by firms as being parallel to the inventory in-store: essentially, because firms are aware consumers will buy at some point, goods can be “earmarked” for future consumption, making it immaterial when the good is actually moved from shelf to pantry (unless there are inventory issues on the part of the firm, which is the focus of the paper).

From the Bellman equations, it is possible to show the following result:

Corollary A.4.2. Consider a particular time $t$. Then, consumers are more likely to buy the sundry good when the price is low at time $t$.

Proof. This follows from the fact that, since consumers form a unit mass (with measure $m_j$), an individual consumer's decision will not affect the overall distribution of states or demand, either now or in the future. Now, at time $T$, we can see that $dV_T(I_T, c_T, P_T)/dP_T \le 0$. Now, at time $T - 1$, if the price goes up, fewer consumers will buy now, therefore increasing the number buying at time $T$.
An increase in the number buying at time $T$ lowers the price of the sundry. So,

\[
\frac{dV_{T-1}(a_t = 1)}{dP_{T-1}} = -1 + \beta E \frac{dV_T}{dP_{T-1}} = -1 + \beta E \frac{dV_T}{dP_T}\frac{dP_T}{dP_{T-1}} = -1 + \beta(+)(-) < 0
\]

This means that an increase in the price lowers the propensity to consume the good. Repeating this process backwards in time produces the result.

Appendix B
Appendix to “Large Contributions and Crowdfunding Success”

B.1 Large Contribution Size

Similar to the literature on crowdfunding developed in Graves (2015), a benchmark model of the process of backer arrival is to consider a Poisson process. Suppose that backers arrive at the project on a given date $t$ at a Poisson rate $\lambda_t$. Furthermore, suppose each of them has a “most-preferred” option to back the project, which is either low $L$ or high $H > L$, with $H$ much larger than $L$. Suppose the probability of a person having valuation $V$ is $p_V$. Then, the arrival rate of L-backers is $p_L \lambda_t$ and the arrival rate of H-backers is $p_H \lambda_t$. Furthermore, suppose that the amount V-backer $i$ contributes is $V + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma_V)$ is a normally distributed white noise process. Then, for a given number of V-backers, the per capita contribution is normally distributed with mean $V$ and variance $\sigma_V$. Suppose that $p_L \sim 1$ and $\lambda$ is bounded for all $t$. Then, the probability in a discrete number of periods of an H-backer arriving is zero. Thus, under the standard assumption, the average donation should be normally distributed as above. Therefore, the likelihood of a “large” contribution coming from the underlying L distribution is 0.3%. Given that $H$ is large, the Bayesian probability that a large donation comes from the L distribution is:

\[
\mathrm{Prob}(V_i = L \mid D_i > L + 3\sigma_L) = \frac{\mathrm{Prob}(D_i > L + 3\sigma_L \mid V_i = L)\, P_L}{\mathrm{Prob}(D_i > L + 3\sigma_L \mid V_i = L)\, P_L + \mathrm{Prob}(D_i > L + 3\sigma_L \mid V_i = H)\, P_H}
\]
\[
\implies \mathrm{Prob}(V_i = L \mid D_i > L + 3\sigma_L) = \frac{0.0003\, P_L}{0.0003\, P_L + \left(1 - \Phi\left(\frac{L + 3\sigma_L - H}{\sigma_H}\right)\right)(1 - P_L)}
\]
\[
\sim \frac{0.0003\, P_L}{1 - 0.9997\, P_L} \sim \frac{0.0003}{\frac{1}{P_L} - 0.9997}
\]

which is very small in general: for $P_L = 0.99$, this is still less than 5%, and for $P_L = 0.995$, it is less than 8%.
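The posterior above is easy to check numerically. The following is a minimal sketch, assuming (as the approximation in the last display does) that an H-backer always exceeds the threshold, so the H-tail probability is 1; the tail probability 0.0003 for L-backers is taken from the displayed formula. The function names are hypothetical.

```python
def posterior_low_type(p_low, tail_low=0.0003, tail_high=1.0):
    """P(V = L | D > L + 3*sigma_L) by Bayes' rule.

    tail_low:  P(D > threshold | V = L), value taken from the formula above.
    tail_high: P(D > threshold | V = H); set to 1 under the assumption that
               H is far above the threshold.
    """
    numerator = tail_low * p_low
    return numerator / (numerator + tail_high * (1.0 - p_low))


def break_even_share(tail_low=0.0003, tail_high=1.0):
    """Share of L-backers at which L and H are equally likely sources."""
    # posterior = 1/2 exactly when tail_low * p = tail_high * (1 - p)
    return tail_high / (tail_low + tail_high)


for p in (0.99, 0.995, break_even_share()):
    print(round(p, 4), round(posterior_low_type(p), 4))
```

Varying `tail_high` below 1 shows how sensitive the conclusion is to the assumption that H is far above the L threshold.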
The break-even point (where H and L are equally likely) occurs at around 0.9997.

B.1.1 Discussion and Size Considerations

One challenge with interpreting the large contributions in the data is that they are determined (essentially) as being outliers from the mean of the time series of per-capita contributions. Furthermore, this time series is computed based on averages; essentially, they are outliers from an average of averages. This indicates that the large contribution measure tends to underestimate the number of large contributions, since on a particular day a large number of small backers can “dilute” a large contribution. This mainly affects the interpretation of large contributions: since the definition chosen systematically understates their presence in certain kinds of projects, our analysis is specific to the kinds of projects which are likely to have large contributions detected, namely small to mid-sized ones. However, this seems to be a modest effect. Plotting the 5-percentiles of the dataset for those projects that attract at least one backer, we can see in Figure B.1 that the frequency of large contributions is generally monotonic except for the very large projects. However, even for projects in the top 5% of the data, we still detect a large contribution in more than 50% of them, indicating that this problem does not fully exclude all large projects or unduly stratify the sample.

B.2 Overview of Crowdfunding Data

This section gives a general overview of the data used in this chapter for comparison with other papers working in the same literature. We see the same basic patterns most authors have mentioned, particularly (1) a hump-shaped pattern of backers over time and (2) a clearly bimodal distribution of projects, with most being either very successful or not successful at all.

As Figure B.2 shows, the number of backers raised today (as a fraction of the total) varies in a U-shaped manner as projects approach completion.
This is typical of crowdfunding data (see, for example, Kuppuswamy and Bayus (2013)), although the reasons behind this feature of the data are unclear. The first possible interpretation is simply that projects garner more attention at the beginning and end of the project cycles, which mechanically introduces more backers. However, previous studies have shown that even conditional on this effect, there is still a recoverable U-shape in the data. An alternative suggestion is that a public good effect is at work, which depending on the formulation induces a positive effect at the beginning and end of the project cycle. Another possibility is that individuals select endogenous arrival periods, in a manner which creates the observed variation. Trying to tease apart these different explanations is a major goal of my research.

Figure B.1: Distribution of Large Contributions

We can see from Figure B.3 that the number of projects over time is growing; this is mainly due to the fact that Kickstarter, as a platform, was still in its early stages, leading to growth over time. The only reason the 2014 bar is not higher is that the data lack the last three months of 2014; including these, it is likely to be approximately as large. This fact is also well known, since Kickstarter has shown year-over-year growth.

Figure B.2: % of total backers raised today versus % of project-time elapsed

Figure B.3: Number of Projects by Year

Figure B.4: Project histogram of final % of goal (capped at 1)

Figure B.4 demonstrates that projects tend to fall into two categories: very successful, or very unsuccessful. There are very few liminal projects, which get close to their goal but don't quite reach it. In fact, empirically, the probability of being successful contingent on having raised only 40% of your goal is in excess of 90%.
This demonstrates a kind of separation in the data; however, what drives this is up in the air. The natural implication from the previous discussion is that individuals tend to pull projects across the line if they are on the bubble, but there could be other factors at work, such as a sorting of projects by goal into feasible and infeasible groups. This speaks to an aspect of crowdfunding which is not well explored: the role project creators play in choosing goals and other aspects of the project.

In the run-up to projects succeeding, the days just prior to success have a greatly increased number of backers. This effect diminishes linearly as we look further ahead of the success date. This indicates that as projects are about to succeed, backers become more likely to back the project. We can also see, from a selection of projects, that a large “reaction” is typical of many projects, and while in general it relates to the success threshold, it may be more general. For example, see Figure B.2.

In order to understand how the different stylized facts are related, I first use a standard within FE model to estimate the results. I chose the FE model because the data strongly suggest there are idiosyncratic project-level effects which have a serious causal impact on the data. This can be verified using a Hausman test to demonstrate that a RE model is resoundingly ruled out by the data, but the empirical evidence is suggestive enough on its own. The FE model has two drawbacks. First, it cannot take into account time-invariant project characteristics; these must be captured by the project-level fixed effects, which are being differenced out from the panel.

The second issue relates to endogeneity in the model. A number of the variables (such as the pre-success indicators or % of goal) are functions of the lagged values of the dependent variable. Indeed, in one specification presented I directly include the lagged dependent variable.
Because of the within transformation, these estimates are known to be biased. However, mitigating this is the fact that the panel is (on average) 35 days in length, and (as in Nickell (1981)) the bias is known to be of order $O(1/35) \approx O(0.029)$, so the Nickell bias is likely to be small since the panel is sufficiently long. Secondly, because we know that the bias tends to be negative when (as in my model) we expect a positive correlation in values, this implies the results are an underestimate of the true values.

Finally, I present two specifications: one including the lagged dependent variable, and one without. The inclusion of the lagged dependent variable is used to try to assess the extent of herding, and to control for an acceleration effect in the size of the project. The specification excluding this variable allows us to more carefully examine the first period of the model, which is known, from the stylized facts, to be very relevant to the project outcomes. These are included in the appendix. The dependent variable is the number of backers arriving on a given day; this is preferred to % of goal or a funding-based measure, because it better captures the backer dynamics, and any scale effects (big versus small projects) should be differenced out by the within transformation.

Looking first at Table B.1, I include a quadratic term in both % of project elapsed and % of goal reached; this is to try to capture the non-stationary time and goal effects noted earlier. Both coefficients demonstrate the expected U-shaped curve; as projects become closer to their deadline (crossing the threshold) they become more likely to be supported. In addition to this, we can see there are large first observation effects. However, these are small in comparison to the pre-success indicator: an increase of (on average) 69 backers on the day just prior to (or including) success occurring.
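To make the shape implied by the Table B.1 estimates concrete, the sketch below evaluates two quantities from the reported coefficients: the turning point of the quadratic in time elapsed, and the size of the pre-success effect once the interaction with time elapsed is included. It assumes that "% elapsed" enters the regression as a fraction between 0 and 1, which the source does not state explicitly; the function names are hypothetical.

```python
# Coefficients copied from Table B.1 (baseline specification).
B_ELAPSED = -31.745          # % of time elapsed
B_ELAPSED_SQ = 27.28646      # % elapsed squared
B_PRE = 69.43112             # Pre-success indicator
B_PRE_X_ELAPSED = -70.3452   # Pre-success X % elapsed


def quadratic_turning_point(linear, quadratic):
    """Value of x at which linear*x + quadratic*x**2 has its extremum."""
    return -linear / (2.0 * quadratic)


def pre_success_effect(elapsed):
    """Pre-success effect on daily backers at a given elapsed fraction.

    Assumes `elapsed` is measured on a 0-1 scale (an assumption).
    """
    return B_PRE + B_PRE_X_ELAPSED * elapsed


turning_point = quadratic_turning_point(B_ELAPSED, B_ELAPSED_SQ)
late_share = pre_success_effect(0.75) / pre_success_effect(0.0)

print("time-elapsed turning point:", round(turning_point, 3))
print("effect at 75% elapsed relative to baseline:", round(late_share, 3))
```

Under that scaling assumption, the time-elapsed quadratic bottoms out a little past the midpoint of the project, and a pre-success day three quarters of the way through the project carries roughly a quarter of the baseline effect.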
In addition, we can see that the post-success variable has a much smaller, but opposite, effect; post-success, backers fall by an average of 7.

Also of note is that these effects are non-stationary: over time, both diminish, with (for instance) a pre-success effect occurring in the last 25% of the project's lifespan having only about 25% of the baseline effect. Similarly, the post-success drop in the last part of the project is reduced in scale. Communication between project owners and backers seems to have a small effect on the number of backers, but it is difficult to argue this is a fully causal effect. Finally, the day-of-week terms largely agree with the facts presented earlier (although they are not shown here for space reasons).

Next, Table B.2 indicates that these results are generally consistent when we include a lagged dependent variable term. The coefficient on backers today is positive and substantial. This could indicate some kind of herding effect, or simply a “spread of information” effect, as individuals who back a project share the word about the project. The directional results from before are largely unchanged; however, the magnitude of most of the terms has been reduced. Regardless, there is still a large and substantial pre-success effect, and a post-success drop in the number of backers arriving. However, it is difficult to interpret the meaning of these coefficients in a vacuum. First of all, the lagged term necessitates changing the sample (losing one day to the lagging), which happens to exclude the first day, and therefore a large number of projects which succeed on day 1, as well as the indicator for the first observation. Secondly, it is difficult to interpret the coefficient on the lagged dependent variable, and thus it is hard to say whether the decline in the size of the coefficients is meaningful here.

These results are generally robust to the inclusion of other time fixed effects, or changing the sample specifications. It is clear that some of the coefficient values are driven by the largest projects; by restricting the sample to projects of less than $200,000 in total funding, we obtain smaller coefficient estimates more in line with the second specification. This indicates the difference between the two specifications is probably mostly driven by large projects which have their first observations dropped on the first day (on which they usually succeeded). Indeed, for large projects, the rate of success on the first day is 28%, compared with 1.5% for the entire sample.

VARIABLE                    Coef        Std dev     t         p       95% Confidence Interval
First observation           5.911519    0.184623    32.02     0       5.549664     6.273374
Update today?               2.822745    0.122702    23        0       2.582254     3.063237
% of time elapsed           -31.745     0.500512    -63.42    0       -32.7259     -30.764
% elapsed squared           27.28646    0.42752     63.82     0       26.44854     28.12439
Days in capture             0.235139    0.072432    3.25      0.001   0.093174     0.377103
Days in capture sq          0.044555    0.002392    18.62     0       0.039866     0.049244
Pre-success indicator       69.43112    0.639596    108.55    0       68.17753     70.6847
Pre-success X % elapsed     -70.3452    0.770081    -91.35    0       -71.8546     -68.8359
Post-success indicator      -7.44758    0.425197    -17.52    0       -8.28095     -6.6142
Post-success X % elapsed    2.216039    0.410276    5.4       0       1.411912     3.020165
% of goal reached           20.24087    0.892339    22.68     0       18.49191     21.98982
% of goal squared           -12.3724    0.881135    -14.04    0       -14.0994     -10.6454
Constant                    6.274616    2.065283    3.04      0.002   2.226735     10.3225
Year indicators             Yes
Month indicators            Yes
Day of week indicators      Yes
N                           3,186,363

Table B.1: FE Regression on Number of Backers - Baseline Specification

B.3 Holidays Used For Instrumentation

Table B.3 presents the holidays used to instrument for large contributions.
All of the dates are for American holidays in the years listed (when different countries disagree, e.g. Thanksgiving); for moveable Christian religious feasts such as Easter or Lent, the dates published by the American Council of Catholic Bishops are used. The Islamic calendar used is the official sighting-based Hijri calendar produced by Saudi Arabia and used by the Islamic Society of North America (to coincide with the Hajj preparations in Saudi Arabia).

VARIABLE                    Coef        Std dev     t          p       95% Confidence Interval
Backers Today (t-1)         0.396599    0.000336    1181.24    0       0.395941     0.397257
Update today?               4.00926     0.072452    55.34      0       3.867257     4.151264
% of time elapsed           -12.7182    0.299331    -42.49     0       -13.3049     -12.1315
% elapsed squared           11.25743    0.254383    44.25      0       10.75885     11.75601
Days in capture             -1.29638    0.693902    -1.87      0.062   -2.6564      0.063648
Days in capture sq          0.222935    0.230228    0.97       0.333   -0.2283      0.674174
Pre-success indicator       23.19426    0.413043    56.15      0       22.38472     24.00381
Pre-success X % elapsed     -20.0114    0.493131    -40.58     0       -20.9779     -19.0449
Post-success indicator      -11.7195    0.253527    -46.23     0       -12.2164     -11.2226
Post-success X % elapsed    9.214561    0.243205    37.89      0       8.737888     9.691234
% of goal reached           1.109525    0.574982    1.93       0.054   -0.01742     2.23647
% of goal squared           2.873862    0.552795    5.2        0       1.790402     3.957321
Constant                    5.001013    1.327311    3.77       0       2.399531     7.602496
Year indicators             Yes
Month indicators            Yes
Day of week indicators      Yes
N                           3,186,363

Table B.2: FE Regression on Number of Backers - Lagged Dependent Variable

HOLIDAY NAME                   DATE                                      FREQUENCY
Black Friday                   Friday following Thanksgiving             3
Boxing Day                     December 26th                             3
Chinese New Years              1st day, Chinese Lunar Calendar           3
Christmas                      December 25th                             3
Christmas Eve                  December 24th                             3
Cyber Monday                   Monday following Thanksgiving             3
Easter                         Complicated (Sunday in March or April)    3
Father's Day                   3rd Sunday in June                        3
Halloween                      October 31st                              3
Mother's Day                   Second Sunday of May                      3
New Year's                     January 1st                               3
Ramadan Starts                 9th Month, Islamic Lunar Calendar         3
Lent Starts (Ash Wednesday)    46 Days Prior to Easter                   3
Thanksgiving                   Fourth Thursday in November               3
Valentine's Day                February 14th                             3

Table B.3: List of Holidays

Appendix C
Appendix to “Sales Classification via Hidden Markov Models”

C.1 Proofs

C.1.1 Proof of Lemma 4.3.1

Proof. We can write the probability of the observations at time $t = 1$ simply, since the Markov chain begins in state 1: $f(y_1) = f(y_1; \theta_1)$. This implies that from the distribution of the time 1 values we can recover $\theta_1$ by Condition 1. Next, note that:

\[
f(y_2) = a_{11} f(y_2; \theta_1) + (1 - a_{12}) f(y_2; \theta_2)
\]

Again, by Condition 1, we can identify $a_{11}$, $a_{12}$ and $\theta_2$. We can then repeat the process, since in general, $f(y_t) = [a_{ij}]^t \pi \cdot [f(y_t \mid \theta_j)]_j$. Since each of these steps adds exactly one element of $\theta_k$ and $a_{ij}$, this implies we can recover all the parameters as long as the maximal number of steps, $T$, is at least $K$, which is assumed.

C.1.2 Proof of Theorem 4.3.1

Proof. The first step is noticing that at the first observation, $t = 1$, the observed variables are $Y_{i1}$, which have a mixture distribution given by

\[
f(y_{i1}) = \pi(1) f(y_{i1}; \theta_1) + \pi(2) f(y_{i1}; \theta_2) + \ldots + \pi(K) f(y_{i1}; \theta_K)
\]

Now, this is a mixture distribution with (at most) $K$ components. By Condition 1, this is identifiable, and by Condition 3 we observe a population of this from the data. Thus, we can identify $\pi$ and $\theta_k$ where $\pi(k) > 0$. Then, consider $t = 2$. At time 2, the distribution of the states evolves according to the underlying Markov chain, $\pi_2 = [a_{ij}]\pi$. Then, at this stage, the distribution of the observable is given by:

\[
f(y_{i2}) = \pi_2(1) f(y_{i2}; \theta_1) + \pi_2(2) f(y_{i2}; \theta_2) + \ldots + \pi_2(K) f(y_{i2}; \theta_K)
\]

and again, by Condition 1, this is identifiable, so we have recovered $\pi_2$ and any $\theta_k$ where $k$ is accessible by 2 from $\pi$. Clearly, this is repeatable for any $t = 1, 2, \ldots, T$. Therefore, Result (2) follows immediately since $K < T$ and $k \in S'$. We also have the vector $\pi_t$ for each time period. Now, we can note that for $t < T$, the joint distribution of $y_{nt}$ and $y_{n,t+1}$ is given by:

\[
f(y_{nt}, y_{n,t+1}) = \sum_i \sum_j \pi_t(i)\, a_{ij}\, f(y_{nt}; \theta_i) f(y_{n,t+1}; \theta_j) = \sum_i \pi_t(i) \underbrace{\sum_j a_{ij}\, f(y_{nt}; \theta_i) f(y_{n,t+1}; \theta_j)}_{\equiv g(a_{ij})}
\]

This is exactly a mixture of products with parameters $a_{ij}$. Then, as in Leroux (1992), we know that this is itself identifiable via a result in Teicher (1967). In particular, this means the mixing coefficients of this mixture are also identifiable, and therefore we can recover $a_{ij}$ for all $i$ such that $\pi_t(i) > 0$. But, since for some $t$, any state in $S'$ has that $\pi_t(i) = [a_{ij}]^T \pi > 0$, Result (1) immediately follows.
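The role of the state distribution $\pi_t$ as the set of mixing weights at each date can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes $a_{ij}$ to be the probability of moving from state $i$ to state $j$, uses a hypothetical three-state chain, and assumes normal emission densities, none of which are specified in the proof. It shows how a state that is unreachable at early dates receives zero mixing weight until enough periods have passed.

```python
import math


def propagate(pi, A, steps):
    """State distribution after `steps` transitions from the initial pi.

    Assumes A[i][j] = P(state j next period | state i this period).
    """
    n = len(pi)
    for _ in range(steps):
        pi = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
    return pi


def normal_pdf(y, mean, sd):
    """Density of a normal distribution (an assumed emission family)."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(y, weights, means, sds):
    """Marginal density of the observable: sum_k weight_k * f(y; theta_k)."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))


# Hypothetical chain: state 3 can only be reached by passing through state 2.
A = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
pi_1 = [1.0, 0.0, 0.0]        # chain starts in state 1
means = [0.0, 2.0, 5.0]       # assumed emission means
sds = [1.0, 1.0, 1.0]         # assumed emission standard deviations

for t in range(1, 4):
    weights = propagate(pi_1, A, t - 1)
    print(t, weights, round(mixture_density(1.0, weights, means, sds), 4))
```

In this example the third component only enters the mixture from $t = 3$ onward, which is why the argument needs enough periods for every state of interest to be reached with positive probability.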

