UBC Theses and Dissertations


Essays in estimating air transport demand processes and the formation of oligopolies Hasheminia, Hamed 2012

Essays in Estimating Air Transport Demand Processes and the Formation of Oligopolies

by

Hamed Hasheminia

B.Sc., University of Tehran, 2003
M.Sc., Sharif University of Technology, 2005
M.A., The University of British Columbia, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate Studies (Business Administration)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

October 2012

© Hamed Hasheminia 2012

Abstract

This dissertation studies three different topics in estimating air transport demand processes and the formation of oligopolies. Chapter 1 provides an overview of the thesis.

Chapter 2 investigates the sensitivity of demand for air travel by singleton passengers, couples, and families. It examines how the demand for air travel by these groups is potentially different. In this study, a compound Poisson structure of the demand of different passenger groups is considered, and aggregate demand observations and decompounding techniques are used to estimate the demand sensitivity of each group of customers to price, time, season, and the economic cycle. The methodology is applied to Canadian market data, and the results indicate there are significant differences among the different groups of customers.

In Chapter 3, a new decompounding procedure based on rudimentary number theory is developed. The advantages and disadvantages of this new framework are discussed, and the efficiency of these methodologies for a certain class of problems is demonstrated. The framework is capable of decompounding when group sizes are either pairwise co-prime or composed of two elements. Under some conditions, the methodologies are generalized to cases where data are recorded in non-equal intervals. The framework also does not depend on the restrictive assumption, present in conventional decompounding algorithms such as the Panjer recursion algorithm, that the data contain some zero observations.
Chapter 4 shows how hierarchical decision making and franchising are used as a fine-tuned strategy for brands to compete both aggressively and softly. In a hierarchical decision making process, as part of a long-run plan, the head office of a brand first decides how many franchises it will grant. At the second stage, the flagships or company owned divisions decide on their level of output, and lastly the franchises decide how much to produce. We show that brands can use this strategy both as a commitment not to compete fiercely with other brands that share the same cost efficiency and as a way to credibly threaten inefficient brands or possibly keep them out of the market. The efficient brands’ incentive to pre-empt the competition is high when either the market size is small or their cost advantage is substantial.

Preface

This section presents a statement of co-authorship for the works provided in this dissertation.

The research conducted in Chapter 2 was initiated based on the results of a series of weekly discussions I had with Prof. David Gillen. I contributed to this research by developing the econometric models, coding the models, and interpreting most of the results. A version of the chapter has been prepared and submitted for publication. In addition to supervising the whole project, Prof. David Gillen edited the manuscript of the paper and wrote parts of the literature review.

The work in Chapter 3 was identified and initiated by me. Some of the proofs and results were derived as a result of a couple of meetings I had with Dr. Masoud Kamgarpour. A version of this paper is currently being prepared for submission for publication. Prof. David Gillen edited the manuscripts and supervised the project.

The work found in Chapter 4 was initiated in collaboration with Changmin Jiang and Prof. David Gillen. Initial results in Section 4.2.1 and Section 4.2.2 were developed in collaboration with Changmin Jiang.
Results in Section 4.2.3 were mainly developed by me. Section 4.1 was mainly written by Changmin Jiang, and the rest of the sections in Chapter 4 were written by me. Prof. David Gillen contributed to editing the manuscripts, revising Section 4.1, and extending the literature review. Prof. David Gillen also directed and supervised the entire project. Many people, including Prof. Robin Lindsey and Prof. Nicole Adler, provided invaluable comments and suggestions for this project.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Estimating the Demand Responses for Different Size of Air Passenger Groups
  2.1 Introduction
  2.2 Count Data
    2.2.1 Poisson Distributions
    2.2.2 More Advanced Models for Count Data
    2.2.3 Compound Poissons and Challenges
  2.3 Data Specification and Research Question
  2.4 Econometric Model
    2.4.1 Model 1 - Compound Poisson Process
    2.4.2 Model 2 - Compound Poisson Process with Error Terms
    2.4.3 Constructing Confidence Intervals
    2.4.4 Pseudo R²
    2.4.5 Parameters to Be Estimated and Possible Sources of Endogeneity
    2.4.6 Computational Challenges
  2.5 Empirical Results
  2.6 Conclusion and Discussion
3 Decompounding Poisson Random Sums Via Arithmetic
  3.1 Introduction
    3.1.1 Statement of the Problem
    3.1.2 Relevant Literature
  3.2 Clarifying Examples
    3.2.1 Evenly Spaced Observations
    3.2.2 Unevenly Spaced Observations
    3.2.3 Remarks
  3.3 Estimation Methodology
    3.3.1 Decompounding When Set A Has Two Elements
    3.3.2 Decompounding When Set A Has More Than Two Elements
  3.4 Simulation
    3.4.1 Simulation for A = {M1, M2} and gcd(M1, M2) = 1
    3.4.2 Simulation for A = {1, M2}
    3.4.3 Simulation for Non-equal Intervals between Successive Observations
  3.5 Summary and Results
4 Hierarchical Decision Making and Franchising: A Mechanism to Compete both Aggressively and Softly
  4.1 Introduction
  4.2 Models
    4.2.1 The Structure of Strategic Interaction
    4.2.2 Homogenous Brands
    4.2.3 Heterogeneous Brands
  4.3 Summary and Discussion
5 Conclusions
  5.1 Summary
  5.2 Future Research Suggestions
    5.2.1 Future Research Suggestions for Chapter 2
    5.2.2 Future Research Suggestions for Chapter 3
    5.2.3 Future Research Suggestions for Chapter 4
Bibliography
Appendices
A Proof of Lemma 3.3.1
B Vandermonde Matrix
C Matrix Inversion
D Simulation Results for Chapter 3
E Nonlinear Demands - Chapter 4
  E.1 P = Q−T
  E.2 P = A − Q 1nT

List of Tables

2.1 The results of applying the first model on the first set of data on hand
2.2 The results of applying the second model on the first set of data on hand
2.3 The results of applying the first model on the second set of data on hand
2.4 The results of applying the second model on the second set of data on hand
2.5 Pseudo R²
3.1 The horse kick data - estimated parameters
3.2 The horse kick data - expected value versus observations

List of Figures

2.1 Calculated price elasticities for each demand group
2.2 Percentage of lost demand with respect to base price $350
2.3 Proportion of different groups of customers with respect to price
2.4 Expected frequencies versus frequency of data
3.1 In each panel of figure 3.1, λ̂1 is estimated under the process with parameters λ1 and λ2, and arrival sizes of M1 and M2. The regression line shows the relationship between the variance of the estimators and N, the number of observations. The extremely high R² suggests the variance of the introduced estimators decreases by order of N and is in line with expectations
3.2 In each panel of figure 3.2, λ̂2 is estimated under the process with parameters λ1 and λ2, and arrival sizes of M1 and M2. The regression line shows the relationship between the variance of the estimators and N, the number of observations. The extremely high R² suggests the variance of the introduced estimators decreases by order of N and is in line with expectations
3.3 The distribution of estimated parameters based on simulated data. n = 500, 0 < T < 2, (λ1, λ2) = (1, 1)
3.4 The distribution of estimated parameters based on simulated data. n = 500, 0 < T < 2, (λ1, λ2) = (0.5, 1)
4.1 Decision of the most efficient brand for different CA and n. Region A corresponds to conditions where the efficient brand will not open any franchise. In area C it will open 1 franchise and in area B it will produce more than one franchise.
D.1 The distribution of estimated parameters based on simulated data. n = 500, 0 < T < 3, (λ1, λ2) = (1, 1)
D.2 The distribution of estimated parameters based on simulated data. n = 1000, 0 < T < 2, (λ1, λ2) = (1, 1)
D.3 The distribution of estimated parameters based on simulated data. n = 1000, 0 < T < 3, (λ1, λ2) = (1, 1)
D.4 The distribution of estimated parameters based on simulated data. n = 2000, 0 < T < 2, (λ1, λ2) = (1, 1)
D.5 The distribution of estimated parameters based on simulated data. n = 2000, 0 < T < 3, (λ1, λ2) = (1, 1)
D.6 The distribution of estimated parameters based on simulated data. n = 4000, 0 < T < 2, (λ1, λ2) = (1, 1)
D.7 The distribution of estimated parameters based on simulated data. n = 4000, 0 < T < 3, (λ1, λ2) = (1, 1)
D.8 The distribution of estimated parameters based on simulated data. n = 500, 0 < T < 3, (λ1, λ2) = (0.5, 1)
D.9 The distribution of estimated parameters based on simulated data. n = 1000, 0 < T < 2, (λ1, λ2) = (0.5, 1)
D.10 The distribution of estimated parameters based on simulated data. n = 1000, 0 < T < 3, (λ1, λ2) = (0.5, 1)
D.11 The distribution of estimated parameters based on simulated data. n = 2000, 0 < T < 2, (λ1, λ2) = (0.5, 1)
D.12 The distribution of estimated parameters based on simulated data. n = 2000, 0 < T < 3, (λ1, λ2) = (0.5, 1)
D.13 The distribution of estimated parameters based on simulated data.
n = 4000, 0 < T < 2, (λ1, λ2) = (0.5, 1)
D.14 The distribution of estimated parameters based on simulated data. n = 4000, 0 < T < 3, (λ1, λ2) = (0.5, 1)

Acknowledgements

Prof. David Gillen, my knowledgeable supervisor: I express my sincere gratitude and appreciation for your guidance, support, and encouragement throughout my entire Ph.D. journey. Your eternal patience, brilliant judgment, and valuable advice inspired and motivated me from the very early stages of the Ph.D. program. Without your guidance and support this thesis would not have been possible.

Prof. Robin Lindsey and Prof. Nicole Adler: I owe my deepest gratitude to you for your knowledgeable advice and for serving on my thesis committee. You were always supportive and willing to help.

Dr. Masoud Kamgarpour, Alberto Romero, and Changmin Jiang: It gives me immense pleasure to acknowledge you for being there whenever I needed you. This thesis would not have become a reality if it were not for your invaluable advice, exceptional suggestions, and patience.

Elaine Cho and Lorra Ward: please accept my special and sincere thanks. I cannot recall a single occasion on which I asked you for help and was refused. I cannot say enough about your positive attitude, your willingness to help, and your sense of responsibility.

All my wonderful family members, friends, and colleagues: A very special and heartfelt thanks to all my family members and friends, whose unconditional love, friendship, and faith gave me such a pleasant time while working at UBC.

Dedication

To my lovely parents, Farideh and Hamid, and my amazing brother, Omid.

Chapter 1
Introduction

The airline industry has mastered setting different prices for different market segments.
To maximize their profit, airlines use revenue management techniques and continuously vary ticket prices against select criteria such as deviation from the expected booking curves and prices offered by competitors ([Su, 2007] and [Mantin and Koo, 2009]). The airline industry is not the only industry that adopts such a strategy. Hotels, car rental companies, and concert halls frequently set different prices for different market segments that show different demand elasticities.

The price elasticity and price sensitivity of passengers with different trip purposes, such as business or leisure, have been studied extensively in the air transport literature. Summaries of the results of hundreds of such studies can be found in [Oum et al., 1992] and [Gillen and Morrison, 2007]. One interesting question is whether price elasticities differ as the size of the travel group increases. Despite anecdotal evidence supporting the assumption that larger groups of travellers are more price sensitive than smaller groups, we have not found a systematic empirical investigation of how price elasticities may differ among singleton passengers, couples, and larger groups of travellers.

In Chapter 2, we use booking data for a long-haul route of a Canadian airline to estimate the price sensitivity of different group sizes. In addition to price sensitivity, we also measure the sensitivity of different group sizes to seasonality, the economic cycle, and time to the flight. The booking data provided to us by a Canadian airline contained no information on group sizes.[1] To overcome the problem of not having access to transactional-level data, we considered compound Poisson models and decompounding methods.

[1] It is very rare to receive transactional-level booking data, that is, data with all actual records from the firms.

In order to measure
the economic variables of interest, such as the price elasticity of demand, we considered the exponential functional form for the parameters of the Poisson distributions. We used numerical optimization to solve the maximum likelihood problems; each optimization problem took 7 to 30 days to solve.

Despite the difficulty of dealing with aggregated data, the data were unique in a couple of aspects. The airline was new to the route under study and had a significant market share. The observed volatile pricing of the airline had fares ranging from $100 to $700, which suggests the airline was changing prices to learn the demand function. As discussed in Section 2.4.5, such strategies will potentially alleviate the problem of endogeneity.

Chapter 2 also discusses how to interpret the coefficients of compound Poisson problems. In this chapter, the formula for the elasticity is derived and it is shown that the elasticity is a function of price. After developing the models and applying them to the data sets, it was found that singletons were the least price sensitive group. After singletons, couples showed the least price sensitivity. Finally, as expected, groups of size 3 were less price sensitive than groups of size 4.

In this study, the weighted average of the elasticities of all groups was between 0.31 and 2.03. This range is in line with findings in the air transportation literature. [Oum et al., 1992] reported that air travel elasticities estimated by choice models vary between 0.18 and 0.62. The results of time series models also suggest that the price elasticity of air travel demand varies between 0.42 and 1.98. [Gillen and Morrison, 2007] also reported that the demand elasticities of long-haul domestic leisure travellers reported in the literature were between 0.44 and 3.2.

In addition to price elasticities, Chapter 2 shows the sensitivity of each group to changes in the economic cycle. It is found that singleton demand is the least sensitive to economic downturns, whereas groups of size three are the most sensitive. The weighted average of the sensitivity of customers to recession suggests that, on average, the late-2000s recession resulted in a 20% decrease in demand. The latter finding is in line with empirical evidence (see [“Air passenger numbers”, 2009]).

The most challenging problems in dealing with compound Poisson models in Chapter 2 were computational. The likelihood functions we used had more than
It is found that singleton demand is least sensitive to economic down turns whereas groups of size three are the most sensitive. The weighted average of sensitivity of customers to recession suggested that on average, the late 2000’s recession has resulted in a 20% decrease in demand. The latter finding is in line with empirical evidences (see [“Air passenger numbers”, 2009]). The most challenging problems in dealing with compound Poisson models in Chapter 2 were the computational challenges. The likelihood functions we used had more than 2 Chapter 1. Introduction 1, 000, 000 non-linear terms. We were able to estimate over 20 parameters for data sets with more than 40, 000 data points with over a month of computation. It seemed that working with larger data sets and larger candidates for group sizes is computationally infeasible. Chapter 3 was developed to improve on the current techniques used to estimate com- pound Poisson parameters. [Panjer, 1981] introduced one the most famous algorithms for decompounding. The algorithm is called the Panjer recursion algorithm. One of the most impressive facets of the Panjer recursion algorithm was that even in 1982, pocket calculators could be used to solve small size problems (see e.g. [Gerber, 1982]). [Buchmann and Grubel, 2003] used the Panjer algorithm and introduced plug-in esti- mators based on Panjer regression algorithms. Later, in [Buchmann and Grubel, 2004] the authors introduced a more advanced version of their plug-in estimators. Despite simplicity and elegance of the Panjer recursion algorithms, the application of such algorithms is restricted to some strong assumptions. For example, Panjer recursion algorithms can only work if we have some zero observations. In the absence of zero ob- servations the algorithm fails to provide any information on any of the parameters. This problem can sometimes become quite serious. 
For example, the data set we had on hand for the empirical research introduced in Chapter 2 did not contain any zero observations. To use the Panjer recursion algorithms, the data should also be recorded in equal intervals. Again, in the absence of data recorded in equally spaced intervals, the Panjer recursion method does not provide any practical solution. For instance, the data set we used in Chapter 2 was recorded in irregular and non-equal intervals. The last but not least restriction of Panjer recursion algorithms is that they can only be applied to four classes of distributions: Poisson, Binomial, Negative Binomial, and Geometric (see [Sundt and Jewell, 1981]).

In Chapter 3 we introduce a new class of decompounding techniques. We discuss how it is possible to use the properties of integers and number theory to solve a certain class of compound processes. For example, it is discussed how such an approach overcomes the zero observation problem. It is also shown that by applying this new methodology, one is able to recover the Poisson parameters even when the Poisson distribution is compounded by an unknown distribution. The simulation results also show how such methodologies can be used under certain circumstances to estimate Poisson parameters even for irregular and unequal time intervals. In Chapter 3, numerical simulation is used to test the efficiency of the models. We discuss the limitations of the new approach. A couple of open questions from Chapter 3 are discussed in Chapter 5.

Chapter 4 focuses on the incentives of competing suppliers to open company owned and non-company owned franchises. Competition between firms can take place along different dimensions such as pricing, quantities, service levels, and access. Both theoretically and empirically, the outcomes of monopolistic markets as well as fully competitive markets are clear. However, the outcomes of oligopolistic markets are not as clear.
The degree of firm independence as well as the governance structure affect the equilibrium outcomes.

One of the interesting observations in oligopolistic markets where franchising is a common practice is that the proportions of company owned units and franchised units are stable in most brands (see [Lafontaine and Shaw, 2005]; [Cliquet, 2000]; [Windsperger, 2004]). Though franchising has proven to be a successful business model, such a strategy can only be found in some industries and is completely absent in others. Economic theory predicts that if the number of independent competing units increases, profit decreases and eventually converges to zero. However, even in industries such as the coffee shop industry, where the number of franchises is exceedingly high, we still observe positive profits. Such an observation suggests that these brands may not be competing fiercely with each other. This may suggest that only cost efficient brands are opening franchises [Barros and Perrigot, 2007].

Numerous papers have studied different aspects of franchising, including the pattern of franchising systems and the contractual structure of franchises ([Lafontaine, 1992]; [Brickley and Dark, 1987]; [Manolis et al., 1995]; [Dant and Kaufmann, 2003]), yet some mysteries, such as why there exists a stable plural form (a mixture of company-owned retailers and franchises), have remained unexplained.

After the seminal paper of [Salant et al., 1983], [Schwartz and Thompson, 1986] were the first to propose that divisionalization could be a tool to credibly threaten any entry. [Polasky, 1992] showed that with linear demand and costless divisionalization, the competing brands will continuously expand the number of their divisions, and as a result the outcome of the market, even with two competing brands, will converge to full competition.

Several attempts have been made to solve the seemingly unreasonable predictions made by [Polasky, 1992].
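The convergence to full competition can be illustrated with the textbook symmetric Cournot benchmark. This is a sketch under stated assumptions, not the model developed in Chapter 4: inverse demand is taken to be linear, P = a − Q, every independent unit has the same constant unit cost c, and the function name and parameter values are hypothetical. Under these assumptions each of N units produces q = (a − c)/(N + 1), so industry profit is N(a − c)²/(N + 1)², which shrinks towards zero as N grows.

```python
def cournot_symmetric(a: float, c: float, n_units: int) -> dict:
    """Symmetric Cournot equilibrium with inverse demand P = a - Q
    and constant unit cost c for each of n_units independent units."""
    q = (a - c) / (n_units + 1)        # equilibrium output of each unit
    price = a - n_units * q            # market price at total output n_units * q
    unit_profit = (price - c) * q      # equals q ** 2 in this model
    return {
        "q": q,
        "price": price,
        "unit_profit": unit_profit,
        "industry_profit": n_units * unit_profit,
    }

# Hypothetical demand intercept and unit cost.
for n in (1, 2, 5, 20, 100):
    eq = cournot_symmetric(a=10.0, c=2.0, n_units=n)
    print(n, round(eq["price"], 3), round(eq["industry_profit"], 3))
```

As the number of units increases, the printed price approaches the unit cost and industry profit falls, which is the benchmark against which positive profits in heavily franchised industries look puzzling.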
For instance, [Baye et al., 1996a] argued it is costly to open new franchises. By introducing a cost of opening new units (franchises or divisions), there exists a Nash equilibrium where each brand will choose a multiple but finite number of divisions. [Yuan, 1999] tried another approach and argued that product differentiation can ensure the existence of a sub-game perfect Nash equilibrium where firms tend to open a multiple yet finite number of franchises.

To our knowledge, none of the current literature is capable of explaining the unreasonable findings that follow from the key assumptions originally made by [Polasky, 1992]. Chapter 4 argues that hierarchical decision making and franchising may be the answer to this 20-year-old dilemma. It is shown that even under the extreme assumptions of [Polasky, 1992] on linearity of demand and costless divisionalization, hierarchical decision making can deter competing firms from over-divisionalization. The chapter also demonstrates why some brands, such as McDonald’s, have both company owned outlets and franchises at the same time.

In Chapter 4 we argue that hierarchical decision making, in addition to the mixture of company owned outlets and franchised units, can serve as a unique strategic tool to compete fiercely with less cost efficient brands and at the same time not compete fiercely with similarly cost efficient brands. Finally, it is also explained why usually only the most efficient brands adopt the franchising practice.

Chapter 2
Estimating the Demand Responses for Different Size of Air Passenger Groups

2.1 Introduction

The airline industry has turned price differentiation almost into an art form, as it sets prices for different market segments and varies these prices against select criteria on a continuous time basis (see, for example, [Su, 2007] and [Mantin and Koo, 2009]).
This practice, called yield management, is now also applied in pricing hotels, rental cars, concert tickets, and in other markets where there are a number of market segments with differing demand elasticities. There is also intertemporal price discrimination. For example, as the departure date approaches, airfares generally rise but not at the same rate across all fare classes. Such yield management requires highly detailed data as well as sophisticated forecasting systems and measures of customer responsiveness to changes in prices.

The price sensitivity of passengers with differing trip purposes (i.e. business, leisure, visiting friends and relatives) has been examined in both the economics and yield management literature. Summaries of hundreds of papers on the price sensitivity of different classes of customers and different routes can be found in [Oum et al., 1992] and [Gillen and Morrison, 2007]. Although there is anecdotal evidence that price sensitivity differs as the size of the travel group increases, there has not been a systematic, rigorous investigation of whether and by how much price elasticities may differ among singletons, couples, and larger groups of travellers.[2] Additionally, there is the challenge of how such differences in price sensitivity would be revealed. In the case of business versus leisure travel, airlines can identify which passenger fits into which group; last-minute bookings, short trips, time of flight, and choice of destination are all possible means of discrimination. Booking data, unless one has access to actual records with the firm (i.e. airline, hotel, car rental, etc.), do not provide information on group size. Perhaps if the time of purchase and seat choice could be observed, one could make a judgment that three seats purchased at one time for the same flight is a group of three traveling together, but this may not necessarily be true. There is also nothing to reveal whether the group of three is more or less price sensitive than a couple, for example.[3] An airline will observe the booking curve for a given flight and adjust fares depending on the degree to which the booking curve is deviating from the forecast booking behaviour.

[2] Some demand models have included a variable for the size of the group in the estimation equation. For example, [Alperovich and Machnes, 1994] reported that they included family size in the regression models, but they did not discuss it since its coefficient turned out insignificant.

[3] Airlines do offer special pricing to larger groups, but these are generally groups of nine or more.

The research questions asked in this chapter are, first, whether there are differences in price sensitivity contingent on the size of the group traveling together and, second, how demand functions for different sizes of purchasing (in our case passenger) groups can be estimated when only aggregated data are available. A key requirement for a successful revenue management system is accurate demand information, including measures of the sensitivity of demand to changes in prices, service, and other factors affecting demand. For example, when purchasing tickets, singletons behave differently from couples or other groups of customers due to the simple fact that they face fewer constraints. Knowing how different groups of customers are affected by different demand variables can equip airlines with the information to effectively price differentiate among different groups of customers. As an example, they can provide special pricing packages in special seasons to families with 1 or 2 children in order to motivate them to travel more or travel to select destinations.

[Davis, 1994] showed that American Airlines generated over $1.4 billion over the course of 3 years by successfully applying revenue management techniques. This resulted in a sizable $891 million in net profit. Such a tremendous increase in profit undoubtedly justifies why airlines were among the first industries to apply sophisticated mathematical and operations research pricing tools. The results of this chapter suggest that the significant differences in price sensitivity among different groups of customers, in some cases nearly tenfold, can potentially be translated into an opportunity for airlines to reap more profit by considering the size of the travel group in both price differentiation and yield management. As an example, based on the data used here, the airline could increase profit by giving discounts to larger groups in times of economic downturn while maintaining the price charged to singletons. Currently, pricing assumes all passengers are singletons, and an average elasticity is used in price setting, which results in fare adjustments for all passengers regardless of group size.

One property used extensively in this chapter is that of integers and count data. Section 2.2 provides an introduction to count data models and describes their role and importance in developing the models used in the estimations. Section 2.3 describes the specifications of the data and the empirical research question this data is used to address, and describes why compound Poisson processes and decompounding techniques are the most appropriate models among all possible candidates. An important contribution of this chapter and this dissertation is the econometric method used in the estimation. Section 2.4 describes the development of the method and why it is appropriate for both the research question asked and the type of data available. Section 2.5 reports the empirical results, provides methods to use the results of the models to compute the elasticities and marginal effects, and illustrates how the results can be used in developing pricing and other revenue strategies. A summary is provided in Section 2.6.
2.2 Count Data

2.2.1 Poisson Distributions

Data counted in the form of integer numbers are referred to as count data. Usually count data are non-negative numbers; for instance, the number of days individuals are absent from work is considered count data. Upon reflection, a significant proportion of data used in the operations management and transportation literature are count data; the number of people in a queue, the number of available seats on a particular flight, the inventory level, etc. are all count data. In general, anything that can be counted as discrete entities can be regarded as count data.

The Poisson distribution is commonly used to represent the distribution of count data. This distribution has the feature that if the expected number of occurrences in a predefined interval is $\lambda > 0$, then the probability of exactly $k$ occurrences is computed by:

\[ f(Y = k; \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!} \tag{2.1} \]

An interesting but restrictive characteristic of Poisson distributions is that their mean and variance are both $\lambda$. This property allows us to fully identify the whole distribution by knowing the expected number of occurrences.4 Another attribute of Poisson distributions is regularity when time intervals are changed. If an interval which is $t$ times the original time interval is considered, the probability of $k$ occurrences in such an interval can easily be computed from a Poisson distribution with parameter $\lambda t$:

\[ f(Y = k; \lambda, t) = \frac{(\lambda t)^{k} e^{-\lambda t}}{k!} \tag{2.2} \]

It is possible to consider functional forms for parameters of Poisson processes.5 Adding functional forms to the parameter(s) makes it possible to analyse how different factors change the count data outcomes. For example, by defining $\lambda = f(X_1, X_2, X_3)$, the way that changes in $X_1$, $X_2$, and $X_3$ affect the outcome, $Y$, could be analyzed. One of the main issues with considering functional forms for the parameter(s) is the $\lambda > 0$ restriction.
One of the most commonly used transformations which ensures the transformation result will lie in the $]0,\infty[$ interval is $\lambda = e^{\theta_1 X_1 + \theta_2 X_2 + \dots + \theta_n X_n + \theta_{n+1}}$ (see [Hausman et al., 1984] and [Gourieroux et al., 1984]). It is also possible to introduce an error term to the functional form. However, it is essential that the final result never lies outside the $]0,\infty[$ interval. One of the common ways to add such an error term is to consider the functional form $\eta = e^{\epsilon}$, where $\epsilon$ has any arbitrary distribution with mean zero and variance $\sigma^2$. The Poisson distribution with a random parameter is called a Mixed Poisson distribution.6

4 However, such a property restricts us from using Poisson distributions for data that exhibit a significant difference between the mean and variance.
5 A useful feature of Poisson processes is that the sum of Poisson processes is still Poisson; for example, when two or more Poisson processes $Y_1, Y_2, \dots, Y_n$ with respective parameters $\lambda_1, \lambda_2, \dots, \lambda_n$ are added together, the resulting process is still Poisson with parameter $\Lambda = \sum_{i=1}^{n} \lambda_i$.

There are several ways to estimate the parameters $\theta_1, \theta_2, \dots, \theta_{n+1}$. Maximum Likelihood, Pseudo Maximum Likelihood (see [Gourieroux et al., 1984]), and Normal Distribution approximation are among the most common estimation techniques. If $\lambda$ is large enough (say more than 1000) the Poisson distribution can be approximated by a normal distribution with mean and variance $\lambda$. The square root of a Poisson variable with parameter $\lambda > 10$ can be approximated by a normal distribution with mean $\sqrt{\lambda}$ and variance $\frac{1}{4}$.

2.2.2 More Advanced Models for Count Data

In addition to Poisson distributions, there exist more advanced and flexible models that can be applied to count data; negative binomial regression models are the most well-known models for count data after Poisson models. Negative binomial regression models are usually used to solve overdispersion (variance greater than the mean) problems.
There also exist methodologies to deal with common problems that may occur with certain types of count data, such as an excess number of zeros and underdispersion. Zero-inflated Poisson, zero-inflated Negative Binomial, and Gamma distributions (see [Oh et al., 2006], [Winkelmann and Zimmermann, 1995], and [Lord et al., 2005]) are the most common distributions used to solve the problems of excess zeros and underdispersion. One can refer to [Lord and Mannering, 2010] for a detailed list of the most relevant models used for count data in transportation science, and more specifically for crash data.

Recently, due to the advent of more powerful computers, more sophisticated models such as finite mixture and Markov switching models have received attention [Schnatter, 2006]. Since it may be easy to confound the notion of finite mixture models with compound Poisson models, we describe what finite mixture models do and then compare them with the compound Poisson literature through an example.

6 Sometimes mixed Poisson processes, the Poisson processes with a random parameter, are incorrectly termed compound Poisson processes.

Assume a population is made up of $m$ subgroups which are randomly mixed in proportion to the size of each group, $\eta_1, \eta_2, \dots, \eta_m$. Also assume a random feature $Y$ is heterogeneous across and homogeneous within each of the $m$ subgroups. $Y$ has a probability distribution, $p(y|\theta)$, with $\theta$ differing for each subgroup. If, without knowing the group indicator, we randomly sample from our population, the mixture density will become

\[ p(y) = \sum_{i=1}^{m} \eta_i \, p(y \mid \theta_i) \tag{2.3} \]

One transportation example for mixture models is as follows. Assume passengers are all singletons and are either high or low income passengers. Assume, as is reasonable, that the price elasticity of these two groups differs.
Assume you cannot observe the type of a customer directly, but that it is possible to observe all transactions, meaning you can observe the price at which tickets are purchased in addition to some of the attributes of each passenger. Since each transaction is generated by one of the groups with some probability, applying finite mixture models will help to disentangle the specific distributions of each of the groups.

2.2.3 Compound Poissons and Challenges

Another class of Poisson processes is Compound Poisson Processes. The probability distribution of the sum of a "Poisson distributed number" of iid (independent, identically distributed) random variables is called a Compound Poisson Process. Let $Y = \sum_{i=0}^{N} G_i$, where $N$ is Poisson distributed with rate $\lambda$ and $G_i$, $i \ge 0$, are iid variables with distribution function $F$. It is assumed that $F$ and $N$ are independent. Compound Poisson processes are frequently used in a number of fields including, for example, insurance mathematics [Lin and Pavlova, 2006], inventory models ([Srinivasan and Lee, 1991], [Song et al., 2010], and [Nenes et al., 2010]), queueing theory [Carrillo, 1991], and modeling the scattering of charged particles [Ning et al., 1995].

Compound Poisson processes can be regarded as a convolution of Poisson processes if the $w_i$'s are known, finite, and mutually exclusive real numbers that represent the $i$th group size and the $N_{w_i}$'s are independent Poisson processes:

\[ N\{t\} = \sum_{i=1}^{n} w_i N_{w_i}\{t\} \tag{2.4} \]

$N\{t\}$ is a convolution of Poisson processes. In this chapter compound Poisson processes are represented by (2.4). In order to show the distinction between compound Poisson processes and mixture models, consider the analogy used to depict the mixture density. If you sample from the convolution of Poisson distributions introduced in (2.4), the compound density will become

\[ p(y) = \sum_{\{x_1, x_2, \dots, x_n\} \in \mathbb{R}^n} \eta_i \, p(x_i \mid \theta_i) \quad \forall\, x_i\text{'s which satisfy } \sum x_i w_i = y \tag{2.5} \]

Now, consider a simple example of a compound Poisson context.
Assume we are living in a world where the passengers are either singletons or couples. Also assume that at the end of each day you can only observe the total volume of tickets purchased; for example, you can observe that 5 tickets are sold at the end of day 1, 10 tickets at the end of day 2, etc. What is unobservable is how many tickets are bought by each group on each day: 5 tickets can be bought by 2 couples and 1 singleton, or 1 couple and 3 singletons, or 5 singletons. This is what is depicted as a condition on the $x_i$'s in Equation (2.5) above. Therefore, in the absence of transactional level data, finding the parameters which would allow one to identify the total volume of purchases made by each of the groups is contingent on considering all feasible combinations of the $x_i$'s.

One of the most challenging aspects of considering compound Poisson distributions is to estimate parameters based on aggregated observations; such a procedure is called decompounding. One of the best methods available for decompounding was developed by [Buchmann and Grubel, 2004]. This methodology is based on the well-known Panjer recursion algorithm [Panjer, 1981]. Such methods are easy to use and efficient for estimating the parameters of compound Poisson processes when the parameters do not have any functional form. This method, and similar methods, cannot be used to estimate $\lambda$ when $\lambda$ itself is a function of other parameters. In short, should $\lambda = f(X_1, \dots, X_n)$ where $n$ parameters need to be estimated, the methods cannot be applied. Without imposing a functional form on $\lambda$, the effect of different exogenous variables and attributes on the arrival rate, $\lambda$, cannot be measured. Functional forms will be imposed on $\lambda$ in Section 2.4.

As will be discussed in Section 2.3, the data specification, namely the availability of aggregate observations instead of transactional level data, prohibited us from using any conventional count data models and required us to use computationally expensive models of compound Poisson processes and decompounding concepts.

2.3 Data Specification and Research Question

The data are composed of passenger booking information for a given flight. For a given long-haul route, both from A to B and from B to A, the data begin from the date the first seat is booked on a given flight and end when the flight departs or is full.7

A challenging feature of the data is that for each flight the data are available as non-regular snapshots, meaning there is no regular pattern in the intervals at which the data are reported; for example, it is possible to have a snapshot of data for Jan 19th 2010, Jan 21st 2010, and Jan 29th 2010. We know the time path of bookings and therefore know the overall number of tickets sold for a specific flight for a given period; we can trace out the booking curve for the entire booking history for a given flight number. In addition to the flight number, the time of the flight, the date of the flight, and the price of the tickets are reported. Generally, there is intertemporal price discrimination and price rises as the departure date approaches.8 The detailed data allowed us to create seasonal dummy variables in addition to the state of the economy dummy variable.9 Non-regular intervals between two successive snapshots added to the complexity of the estimation.

The challenge is to estimate the price and time sensitivity for four different customer groups: singleton customers, couples, groups of size 3, and groups of size 4, while controlling for season and income (GDP) effects.

The most challenging aspect of the data on hand is that they represent the aggregated number of tickets sold in a specific period of time. For example, 20 tickets could be purchased in a 2 day interval. Other than this aggregated number, no additional information was available on the specific type of customers, i.e.
how many of them were singleton customers, couples, couples with a child, or groups of size 4.10

The research question of this chapter is to estimate the price elasticity and characteristics of different groups of customers, such as singletons, couples, etc., in the absence of detailed information on how many tickets have been purchased by each of the groups of customers, that is, in the presence of only aggregate information on demand in each period. Due to the data specification and the availability of only aggregate observations, the only feasible model among all models discussed in Section 2.2 was the compound Poisson model. Access to detailed transactional level data is necessary for using any other count model discussed in Section 2.2.

7 There is no overbooking to be considered.
8 Last minute travellers pay a higher fare than those booking weeks in advance of the flight.
9 The data on hand covered three years and spanned both pre-recession and post-recession periods. The dummy variable was set to 1 for recession periods and zero for non-recession periods.
10 We only considered four groups of customers and disregarded the bigger families because the preliminary tests showed only a small portion of customers were composed of groups of size 5 or more.

2.4 Econometric Model

We specify a basic model, and modify it to estimate the parameters of interest. To begin, assume there are $n$ independent processes and each process forms a Poisson process with rate $\lambda_i$, $1 \le i \le n$. The set $A$ is defined as $\{w_1, w_2, \dots, w_n\}$, and it is assumed that the $w_i$, $1 \le i \le n$, are non-identical integer numbers. Also, $N_{w_i}\{t\}$ is defined as the number of events that have occurred for process $w_i$ by time $t$ (starting from 0). Next define

\[ N\{t\} = \sum_{i \in A} i\, N_i\{t\} = \sum_{i=1}^{n} w_i N_{w_i}\{t\} \tag{2.6} \]

$N\{t\}$ is a convolution of different Poisson processes. $N\{t\}$ can be understood using a transportation example. Consider three modes of transportation available for students to commute to campus: bicycles, cars, and buses. Bicycles carry only 1 rider, personal cars carry 4 riders, and buses can carry 30 riders. Assume that whenever any mode of transportation is used, its capacity is fully utilized. Therefore, each time a bicycle arrives, only one rider is added to the total population of the campus, cars add 4 to the total population, etc. In addition, assume that the total numbers of bicycles, cars, and buses that commute to campus by time $t$ can be modeled by Poisson processes with parameters 1.2, 2, and 4 respectively. In this example, $A = \{1, 4, 30\}$, $\lambda_1 = 1.2$, $\lambda_2 = 2$, $\lambda_3 = 4$, and $N\{t\}$ in Equation (2.6) represents the total number of students added to the total population of the campus by time $t$. In Section 2.4.1 it will be shown that $N\{t\}$ can be expressed as a compound Poisson process.

The non-regular intervals (time periods) between two successive observations create complexity for the estimation. A solution to this problem is to use a well-known characteristic of the Poisson distribution. If $\lambda$ is considered as the Poisson process parameter for a unit time interval, then $\lambda_{new} = \lambda t$ for an arbitrary interval $t$.

In what follows we consider two different models: compound Poisson processes without additional error terms and compound Poisson processes with additional error terms.

2.4.1 Model 1 - Compound Poisson Process

Keeping in mind that the process of interest is $Y = \sum_{i=1}^{n} w_i N_{w_i}\{t\}$, we assume the $N_i$'s are independent from each other and that each $N_i$ is a Poisson process. In order to introduce the first model, two assumptions are made. The first assumption is that the customers are either singletons, couples, couples with a child, or couples with two children; that is, $A = \{1, 2, 3, 4\}$.
The second assumption is that the aggregate number of customers, $Y$, in each snapshot is less than 25, so any observation larger than 25 is censored in the empirical analysis.11 The $\lambda_i$'s, the Poisson parameters of the $N_i$'s, are assumed to have the functional form of (2.7):

\[ \lambda_i = e^{\sum_{q=1}^{Q} \theta_{q,i} X_q} \tag{2.7} \]

The $X_q$'s are independent variables and the $\theta_{q,i}$'s are parameters to be estimated. Throughout the chapter, aggregate observations are all integer numbers. In addition to the variables, only $Y$, the aggregate number of purchased tickets, was available; in this situation it was not possible to know the total number of events produced by each process. Finally, we assume the likelihood functions to be introduced in Equations (2.8) and (2.9) have unique optimizers.12

11 Section 2.4.6 will describe why this assumption should be made.
12 The global optimal solutions of Maximum Likelihood functions are consistent estimators for parameters of interest. There only exists one consistent estimator for parameters of interest if the Maximum Likelihood problem has a unique solution.

The following notation is used in specifying the likelihood functions. Define the set $G_Y$ as the collection of all sets of non-negative integers $g_i \in \mathbb{Z}$ such that $\sum_i w_i g_i = Y$. For example, for $A = \{1, 2, 3\}$, $w_1 = 1$, $w_2 = 2$, and $w_3 = 3$, we have $G_5 = \{\{5, 0, 0\}, \{3, 1, 0\}, \{1, 2, 0\}, \{0, 1, 1\}, \{2, 0, 1\}\}$. Since $5 = 5 \cdot 1 + 0 \cdot 2 + 0 \cdot 3$, the first subset of $G_5$ is $\{5, 0, 0\}$. $G_{Y,j}$ corresponds to the $j$th subset of set $G_Y$, and $g_{j,i}$ corresponds to the $i$th element of the $j$th subset of $G_Y$. For instance, $G_{5,1} = \{5, 0, 0\}$, and $g_{1,1} = 5$, $g_{1,2} = 0$, $g_{1,3} = 0$, etc.

It is now possible to introduce the likelihood function for $k$ observations.

\[ l = \prod_{k} \sum_{j=1}^{\mathrm{size}\{G_{Y_k}\}} \prod_{i=1}^{4} \frac{(\lambda_i T_k)^{g_{j,i}} e^{-\lambda_i T_k}}{g_{j,i}!}; \qquad \lambda_i = e^{\sum_{q=1}^{Q} \theta_{q,i} X_q} \tag{2.8} \]

And the log likelihood function can be written as

\[ L = \log l = \sum_{k} \log\Bigg( \sum_{j=1}^{\mathrm{size}\{G_{Y_k}\}} \prod_{i=1}^{4} \frac{(\lambda_i T_k)^{g_{j,i}} e^{-\lambda_i T_k}}{g_{j,i}!} \Bigg); \qquad \lambda_i = e^{\sum_{q=1}^{Q} \theta_{q,i} X_q} \tag{2.9} \]

In this setting, the $X_q$'s are observed variables, and the $\theta_{q,i}$'s are the parameters to be estimated. It is not feasible to further simplify the likelihood functions. Therefore, finding a closed form solution for asymptotic results is not possible. However, it is feasible to write the objective function as in (2.9) for the likelihood function and run the maximization/minimization problem.

Showing that $L$ in (2.9) has a bounded solution is straightforward. $L$ and all elements of $L$ are smooth functions and are therefore differentiable over all real numbers. Moreover, for vector $X_q > 0$ it is straightforward to show:

\[ \lim_{\theta_{q,i} \to \pm\infty} L = -\infty \tag{2.10} \]

\[ \lim_{\theta_{q,i} \to -\infty} \frac{\partial L}{\partial \theta_{q,i}} = 1 \tag{2.11} \]

\[ \lim_{\theta_{q,i} \to +\infty} \frac{\partial L}{\partial \theta_{q,i}} = -\infty \tag{2.12} \]

It is also easy to show that if the $\theta_{q,i}$'s are set to zero, $L$ has a bounded solution which is always superior to the results of (2.11). This fact together with Equations (2.10), (2.11), and (2.12) is sufficient for the maximization problem of (2.8) to be bounded.

2.4.2 Model 2 - Compound Poisson Process with Error Terms

Generally, adding an error term to the Poisson distribution makes the computations more complex, though the complexity provides a benefit in that the models become more flexible. It is well known that the expected value and the variance of Poisson distributions are identical, which can be restrictive. However, adding an extra error term to the Poisson distributions gives us the flexibility of having expected values different from the variance of the models. A necessary property of the error terms that are added is that they must be positive. This is because the error term is added to a Poisson parameter and these parameters should be greater than zero. One functional form to consider is $e^{\epsilon_i}$, and any arbitrary distribution could be considered for the $\epsilon_i$'s.
For the sake of convenience, assume that $u_i = e^{\epsilon_i}$ is distributed as a gamma distribution $\gamma(\alpha, \beta)$, with pdf:

\[ \frac{u^{\alpha-1} e^{-u/\beta}}{\beta^{\alpha} \Gamma(\alpha)} \tag{2.13} \]

By assuming a gamma distribution the computations are simplified. By setting $E(e^{\epsilon_i}) = 1$ and $\beta = \frac{1}{\alpha}$, it is straightforward to show $\beta = \frac{1}{\alpha} = \eta^2 = \mathrm{Variance}(e^{\epsilon_i})$; see Hausman et al. [1984]. Moreover, the conditional distribution of Poisson occurrences, the $g_{j,i}$'s, with Poisson parameter $\lambda_i$ for a time interval $T_k$, given the independent variables, will form a negative binomial distribution given as

\[ \frac{\Gamma(\frac{1}{\eta^2} + g_{j,i})}{\Gamma(\frac{1}{\eta^2})\,\Gamma(g_{j,i} + 1)} (\eta^2 \lambda_i T_k)^{g_{j,i}} (1 + \eta^2 \lambda_i T_k)^{-(\frac{1}{\eta^2} + g_{j,i})}; \qquad \lambda_i = e^{\sum_{q=1}^{Q} \theta_{q,i} X_q} \tag{2.14} \]

where $\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt$.

Assuming the error terms for each Poisson process are independent of each other, and the variance of the error term of Poisson process $i$ is $\eta_i^2$, the likelihood function can be written as:

\[ L = \log(l) = \log \prod_{k} \sum_{j=1}^{\mathrm{size}\{G_{Y_k}\}} \prod_{i=1}^{4} \frac{\Gamma(\frac{1}{\eta_i^2} + g_{j,i})}{\Gamma(\frac{1}{\eta_i^2})\,\Gamma(g_{j,i} + 1)} (\eta_i^2 \lambda_i T_k)^{g_{j,i}} (1 + \eta_i^2 \lambda_i T_k)^{-(\frac{1}{\eta_i^2} + g_{j,i})}; \qquad \lambda_i = e^{\sum_{q=1}^{Q} \theta_{q,i} X_q} \tag{2.15} \]

or:

\[ L = \log(l) = \sum_{k} \log \sum_{j=1}^{\mathrm{size}\{G_{Y_k}\}} \prod_{i=1}^{4} \frac{\Gamma(\frac{1}{\eta_i^2} + g_{j,i})}{\Gamma(\frac{1}{\eta_i^2})\,\Gamma(g_{j,i} + 1)} (\eta_i^2 \lambda_i T_k)^{g_{j,i}} (1 + \eta_i^2 \lambda_i T_k)^{-(\frac{1}{\eta_i^2} + g_{j,i})}; \qquad \lambda_i = e^{\sum_{q=1}^{Q} \theta_{q,i} X_q} \tag{2.16} \]

Maximum Likelihood procedures enable us to estimate the $\theta_{q,i}$'s and $\eta_i$'s in the model.

2.4.3 Constructing Confidence Intervals

To begin, $\hat{\theta}$ is defined as the vector of estimated parameters using MLE. If certain conditions hold, the maximum likelihood estimator will have the following asymptotic result.13

\[ \sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow{d} N(0, I^{-1}) \tag{2.17} \]

where $I = E[\nabla_{\theta\theta} \ln f(x|\theta_0)]$, $f(\cdot)$ is the distribution function, and $x$ is the vector of observations. $\theta$ is the vector of parameters of the model and $I$ is called the Fisher information matrix.14

13 The conditions which need to hold in order to find the variance-covariance matrix are as follows (see [Newey and McFadden, 1994]):
1. $f(x|\theta) > 0$ is twice differentiable and continuous in some neighbourhood of $\hat{\theta}$
2. $\int \|\nabla_{\theta}\|\,dx < \infty$ ($\nabla$ is the gradient operator)
3. $\int \|\nabla_{\theta\theta}\|\,dx < \infty$
4. $I = E[\nabla_{\theta} \ln f(x|\theta)\, \nabla_{\theta} \ln f(x|\theta)']$ exists and is non-singular
5. $E[\sup_{\theta} \|\nabla_{\theta\theta} \ln f(x|\theta)\|] < \infty$.
14 The computational aspects of the problem are discussed in Section 2.4.6.

2.4.4 Pseudo R2

Since the functional forms of the demand processes that are modelled are highly non-linear, it is not possible to use a standard $R^2$. Instead, Pseudo $R^2$s can be used to measure how much of the variability of the data is captured by the proposed models. However, there is no unique acceptable way to compute Pseudo $R^2$s. [Veall and Zimmermann, 1996] summarize the set of current methods for computing it. The Pseudo $R^2$ calculated here uses the method introduced by [Mcfadden, 1973]. According to [Veall and Zimmermann, 1996] the most commonly used Pseudo $R^2$ is McFadden's.15 In this methodology, the likelihood values of proposed models are compared with the likelihood value of a model composed of only one variable; the less the difference, the worse the fit of the models.

2.4.5 Parameters to Be Estimated and Possible Sources of Endogeneity

As explained earlier, singletons, couples, groups of size 3, and groups of size 4 are the four groups of customers considered in this research. The objective is to estimate the price sensitivity of different types of customers, the sensitivity of different groups of customers to the time to the flight, the sensitivity of different groups of customers in recession and non-recession periods, and also the sensitivity of different groups with respect to seasonality.

Price is generally considered endogenous in airline markets. It is a decision variable of airlines and does affect the customers' decision on whether or not to travel. The market structure will also influence the level and structure of fares.
Fortunately, the data used in this chapter are from an airline that recently entered a route. With a lack of competition, the airline learned the demand function in the market by varying prices. The large variation in prices, from $100 to $700, supports this assumption. The data also suggest that the airline is not using any form of price differentiation and that prices are not set based on characteristics of different groups. This also alleviates the endogeneity problem. Finally, there is only one fare class: economy.

15 It was also the only Pseudo-R2 provided by STATA (1995).

2.4.6 Computational Challenges

With continual increases in computational power and increasingly efficient numerical methods being introduced, problems which seemed intractable a few years ago are now being solved. For the current research, we used a six-core Xeon-based workstation to estimate the parameters and construct the confidence intervals. Depending on the size of the data and the initial values, the time it took to estimate the parameters varied. The data sets are described in Section 2.3 and had 42007 and 43918 observations respectively. The processing time for these cases varied from 2 to 30 days. This research is among the first, if not the first, to handle such a large data set, composed of over 40,000 observations, and to estimate more than 20 parameters for each model in a decompounding context. It is worth noting that the optimization problems contained more than 1 million nonlinear terms. We believe such a computational achievement can motivate researchers to use some of the models developed 30 years ago and left mostly as theories due to their computational complexity.
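To illustrate where the nonlinear terms in the objective come from, the per-observation log-likelihood of Model 1 (Equation (2.9)) can be sketched in Python. This is only a minimal sketch, not the implementation used for the estimates: the function names `rate`, `feasible_counts`, `log_poisson_pmf`, and `log_lik_obs` are hypothetical, the brute-force enumeration of $G_Y$ is chosen for readability, and the use of `math.lgamma` and a log-sum-exp step to avoid large factorials is an assumption made here rather than something described in the text.

```python
import math
from itertools import product


def rate(theta, x):
    """Poisson rate lambda_i = exp(sum_q theta_q * X_q), as in Equation (2.7)."""
    return math.exp(sum(t * v for t, v in zip(theta, x)))


def feasible_counts(y, weights):
    """Enumerate G_Y: all tuples g of non-negative integers with sum(w_i * g_i) == y."""
    ranges = [range(y // w + 1) for w in weights]
    return [g for g in product(*ranges)
            if sum(w * gi for w, gi in zip(weights, g)) == y]


def log_poisson_pmf(k, mu):
    """log of mu^k e^{-mu} / k!, using lgamma(k + 1) instead of k! (assumed choice)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def log_lik_obs(y, t, lambdas, weights):
    """Log of the inner sum in Equation (2.9) for one snapshot with total y over interval t."""
    terms = [sum(log_poisson_pmf(gi, lam * t) for gi, lam in zip(g, lambdas))
             for g in feasible_counts(y, weights)]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in terms))


# Example: the set G_5 for A = {1, 2, 3} from Section 2.4.1.
print(sorted(feasible_counts(5, (1, 2, 3))))
```

The full objective would sum `log_lik_obs` over all snapshots, with each `lambdas` entry produced by `rate` from that snapshot's covariates; every feasible combination contributes one product term, which is why the number of nonlinear terms grows quickly with the observed totals.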
In order to construct confidence intervals for the estimated parameters, the Hessian matrix of the process for every single observation must be evaluated.16 From the computational perspective, the main difference between constructing the confidence intervals and finding the parameters of interest is the ability to use parallel versus serial processing. It is feasible to divide a data set into many subgroups, use different processors to compute the Hessian matrix for each subgroup of data, and then combine the results. This means that if a six-core processor is being used, all six cores can be used while constructing the confidence intervals by dividing the data into 6 subgroups and subsequently combining the results. In contrast, to estimate the parameters, the objective function in the maximization or minimization problem consists of all the observations in the data set. Therefore, the computation process in parameter estimation is a serial computation.

16 The process takes as much or sometimes more time than finding the parameters.

Since the objective function is non-linear, and the approximation methods that are used for this research are all numerical, the initial values are important. The objective function is not necessarily concave and therefore there is always the possibility of local maxima/minima. One method that can be used to ensure the results are not a local maximum is to use significantly different initial values; if the results after using different initial values converge to one solution, it is likely that this is the global optimum solution.17

Recall the functional forms of the Poisson parameters are of the form $\lambda_i = e^{\sum_{q=1}^{Q} \theta_{q,i} X_q}$. Finding large negative values for the $\theta_{q,i}$'s, for any specific $i$, will result in large confidence intervals for those $\theta_{q,i}$'s. In these cases the cells of the Fisher information matrix will become quite small and therefore the inverse of the matrix will have large values.
Finding such high values will result in large confidence intervals that are not desirable.

Standardizing the data set by mapping all data to the $[-1, 1]$ interval can reduce the computational time. Dividing all of the data points by the maximum absolute value in the set will map the data between -1 and 1.

The last comment on solving the Maximum Likelihood optimization problem concerns large factorials. It is necessary to censor observations that have values over 25. Factorials grow very fast and will result in large computational errors. Fortunately, in this research, more than 99% of observations had values less than 25.

17 If by using different initial values one ended up with different results, as many initial values as possible should be tried and the objective values of the resulting solutions compared to decide which one is globally optimum (or at least closest to the globally optimum point). In this research, fortunately, consistent results have been obtained.

2.5 Empirical Results

There are two measures of demand sensitivity that are of interest: the price elasticity and the marginal effect. In this research, though it is not feasible to show elasticities and marginal effects in a conventional way, elasticities and marginal effects can be computed for any point of observation. The complex structure of the functional forms precludes the simplification of these two distinct definitions. The only straightforward concepts that can be taken into account are the signs of the parameters of interest. Since Exp() is a monotonic and increasing function, a positive sign on a parameter affirms a positive relation between the corresponding variable and the outcome; as the variable increases, the outcome will increase. Similarly, a negative coefficient asserts a negative relation between the variable and the outcome.

Economic theory predicts a negative relation between the price and purchases.
Economic intuition also suggests that a negative correlation would exist between the time to the flight and the number of tickets purchased. The underlying rationale is that last minute travellers are generally business travellers, and these are also generally singletons. Larger groups require more coordination and therefore can and generally will purchase tickets well before the flight departure date. Ticket prices also tend to be lower up to within two weeks of the flight. It is also well documented that as GDP or the growth of GDP goes down, air travel declines; 2009 is a good example of precisely this phenomenon, when with the recession and financial crisis air travel fell in some markets by more than 20 percent (see ["Air passenger numbers", 2009]). Finally, for a given price, a given time to the flight, and the same economic situation, we expect to have more ticket purchases in the June to November period or the Christmas break.

For different groups of customers we can also make some predictions. Business travellers are generally traveling alone, whereas family members and couples mostly travel as leisure passengers. Therefore, singleton passengers are expected to be less price sensitive than couples, families of size 3, and families of size 4.18

For the second model specification, with a separate variance term, some predictions can also be made before running the model. The singleton group is composed of travellers who could be business or leisure travellers. This heterogeneity may bring about some variation in purchasing behaviour. In the case of couples, or two passengers who are traveling together, they can also be two business passengers traveling together, two leisure passengers traveling with each other, or a business passenger traveling with his/her spouse. So again we expect to observe more variance in the behaviour of couples than singletons. This trend can be observed in its extreme case for groups of 4.
The most likely combinations of customers for groups of 4 are either 2 couples traveling together or a family of size 4. It is also possible that a family of size 3 travels with another passenger. However, the latter case is less likely than the first two scenarios. Due to the different nature of different groups of size 4, we do expect to observe more variability, and therefore more variance, for the case of groups of 4.

18 If fares are too high, families or couples may switch modes or not travel at all. A fare increase of $X would be an n × $X increase in total expenditures, where n is the size of the group traveling.

The results of the models for two different routes are summarized in Tables 2.1, 2.2, 2.3, and 2.4. The coefficients are consistent with prior expectations. As a general rule, singletons are less price sensitive than couples, couples are less price sensitive than groups of three, and groups of three are less price sensitive than groups of four. Another result, reflected in the second column of the tables, illustrates the sensitivity of customers to the time to the flight. The results show that, with everything else fixed, the closer the time to the flight, the more likely customers are to buy tickets. The recession had a negative effect on all types of passengers. The results show that the effect of the recession is moderately small for singleton passengers, whereas the effect is large for other groups of passengers, most likely the leisure passengers. The output of the model shows how the volume of travel by singleton passengers remains the same in high season and low season and how other groups of passengers tend to buy more tickets in high season. Finally, as expected, the larger the size of the group, the greater the variance of the behaviour of the group.
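The link between the dispersion parameter and the variance of each group's purchases can be made explicit. The following is a short derivation rather than a result stated in the text; it assumes, as in Section 2.4.2, a multiplicative gamma error $u_i$ with $E(u_i) = 1$ and $\mathrm{Var}(u_i) = \eta_i^2$, and writes $N_i$ for the number of purchases by group $i$ over an interval of length $T$ (so that $N_i \mid u_i$ is Poisson with mean $u_i \lambda_i T$):

```latex
\begin{align*}
E[N_i] &= E\big[E[N_i \mid u_i]\big] = \lambda_i T, \\
\mathrm{Var}[N_i] &= E\big[\mathrm{Var}(N_i \mid u_i)\big] + \mathrm{Var}\big(E[N_i \mid u_i]\big)
                  = \lambda_i T + \eta_i^2 (\lambda_i T)^2 .
\end{align*}
```

Under these assumptions, a larger $\eta_i$ implies more variance relative to the mean, which is the sense in which the estimated $\eta$ values can be compared across groups.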
Table 2.1: The results of applying the first model on the first set of data on hand (n = 43918)

Group        Price       Time to the Flight  Economy Situation  Seasonal Variable  Intercept
Singletons   -0.1747     -2.1513             -0.0236            0.0322             0.0774
             (0.0046)    (0.0092)            (0.0034)           (0.0026)           (0.0017)
Couples      -3.0882     -30.7051            -0.6076            -0.2996            1.3372
             (0.0327)    (0.3477)            (0.0332)           (0.0246)           (0.0128)
Group of 3   -17.8874    1.131               -0.2532            0.6817             0.4717
             (0.1867)    (0.1891)            (0.0778)           (0.0736)           (0.0517)
Group of 4   -2.0335     -5.7781             -0.2139            0.1882             -1.173
             (0.0272)    (0.0810)            (0.0212)           (0.0154)           (0.0099)

Table 2.2: The results of applying the second model on the first set of data on hand (n = 43918)

Group        Price       Time to the Flight  Economy Situation  Seasonal Variable  Intercept   η
Singletons   -0.4783     -3.0497             -0.0758            0.0705             0.8301      0.6171
             (0.0546)    (0.1205)            (0.0546)           (0.0315)           (0.05140)   (0.0022)
Couples      -4.3076     -43.0961            -0.7238            -0.49987           1.8336      1.3383
             (0.0447)    (0.2411)            (0.0261)           (0.0283)           (0.0662)    (0.0675)
Group of 3   -4.4851     -65.797             -1.4017            4.9998             -3.4802     1.865
             (0.0568)    (0.4822)            (0.0526)           (0.0113)           (0.0438)    (0.0459)
Group of 4   -9.4958     -1.1602             -0.2539            0.7126             -0.4884     3
             (0.0682)    (0.0314)            (0.0363)           (0.0238)           (0.0042)    (0.0053)

Table 2.3: The results of applying the first model on the second set of data on hand (n = 42007)

Group        Price       Time to the Flight  Economy Situation  Seasonal Variable  Intercept
Singletons   -0.0569     -2.0152             -0.0034            0.01772            -0.0782
             (0.0046)    (0.0099)            (0.0035)           (0.0027)           (0.0174)
Couples      -4.13879    -28.1669            -0.6647            -0.32375           1.5175
             (0.0365)    (0.3446)            (0.03611)          (0.0278)           (0.0142)
Group of 3   -1.6577     2.9581              -0.4461            1.15408            -5.1515
             (0.1174)    (0.1405)            (0.0902)           (0.0500)           (0.0423)
Group of 4   -4.1393     -6.3833             -0.3033            0.2571             -0.5863
             (0.0341)    (0.1020)            (0.0252)           (0.0189)           (0.0122)

Table 2.4: The results of applying the second model on the second set of data on hand (n = 42007)

Group        Price       Time to the Flight  Economy Situation  Seasonal Variable  Intercept   η
Singletons   -0.2453     -2.6365             -0.0884            0.0979             0.5803      0.7287
             (0.0241)    (0.1842)            (0.0174)           (0.0236)           (0.0017)    (0.0003)
Couples      -6.0666     -37.1022            -0.7022            -0.4999            2.2331      1.526
             (0.0485)    (0.4883)            (0.0182)           (0.0084)           (0.0126)    (0.0297)
Group of 3   -8.5734     -49.0799            -1.295             4.9993             -2.1125     2.5491
             (0.0975)    (0.1992)            (0.0243)           (0.1011)           (0.0548)    (0.0384)
Group of 4   -20.6108    -1.2711             -0.598             0.7287             2.5677      3
             (0.0783)    (0.0167)            (0.0193)           (0.0300)           (0.0416)    (0.0050)

For Poisson processes with parameter \(\lambda_i = e^{\sum_{q=1}^{Q} \theta_{q,i} X_q}\), it is straightforward to show

\[
E(Y \mid X_1, X_2, \dots, X_Q, i) = \lambda_i = e^{\sum_{q=1}^{Q} \theta_{q,i} X_q}
\;\rightarrow\;
\ln E(Y \mid X_1, X_2, \dots, X_Q, i) = \sum_{q=1}^{Q} \theta_{q,i} X_q
\;\rightarrow\;
\frac{\partial \ln E(Y \mid X_{q \in \{1,\dots,Q\}}, i)}{\partial X_{q \in \{1,\dots,Q\}}} = \theta_{q,i}
\tag{2.18}
\]

Therefore, the \(\theta_{q,i}\) provide a measure of the percentage change in demand given a unit increase in an independent variable. It is possible to calculate the change in demand for each of the four demand groups given a change in any of the three variables used in the estimation: ticket price, seasonality, and time. We can show the expected percentage change in demand for groups of 1, 2, 3, and 4 due to a change of $X in ticket price. Similarly, it is possible to examine the percentage difference in demand between high season and low season, or the difference in demand between yesterday and today, or between today and tomorrow (two successive days), for each of the groups.
The interpretation of the results in this way is, in our view, as informative as notions of elasticities and marginal effects. For example, one can compute the percentage of demand lost by each group due to a $X increase in the price. The calculations do require the coefficients to be transformed back to the original units by dividing them by the largest observed values of the price and time variables, since the data had been standardized to the ]0, 1] interval (1 representing the highest observed value).

It is possible to compute elasticities as well as marginal effects, but the values would be calculated for each data point. The elasticities can be calculated as:

\[
\frac{\partial \ln E(Y \mid X_{q \in \{1,\dots,Q\}}, i)}{\partial \ln X_{q \in \{1,\dots,Q\}}} = \theta_{q,i} X_q
\tag{2.19}
\]

The elasticities depend only on the values of the estimated parameters and the corresponding variables. It is possible to depict price elasticities of demand in a two-dimensional space. A result that holds for any Poisson regression of this form is that the magnitude of the elasticity increases as the price increases. The results are illustrated in Figure 2.1 for each of the demand groups.[19]

The marginal effect can be computed by the following equation:

\[
\frac{\partial E(Y \mid X_{q \in \{1,\dots,Q\}}, i)}{\partial X_{q \in \{1,\dots,Q\}}} = \theta_{q,i}\, e^{\sum_{q=1}^{Q} \theta_{q,i} X_q}
\tag{2.20}
\]

The marginal effects depend not only on each particular parameter and its corresponding variable, but also on the values of all other parameters and variables.

As an illustration, using the first data set and the second model that was estimated, the following calculations can be made:
• each $12 increase in the ticket price will result in a 1 percent decrease in demand by singletons.
• each $1.42 increase in the ticket price will result in a 1 percent decrease in demand by couples.
• each $1.36 increase in the ticket price will result in a 1 percent decrease in demand by groups of three.
• each $0.64 increase in the ticket price will result in a 1 percent decrease in demand by groups of four.
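Equation (2.19) can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the price coefficients are the Model 2 estimates for the first data set, and `MAX_PRICE = 700.0` is a hypothetical standardizing constant chosen to match the $100 to $700 price range mentioned in the text; the exact value used by the author is not given here.

```python
# Price coefficients (Model 2, first data set), on the standardized price scale.
price_coeff = {
    "singletons": -0.4783,
    "couples": -4.3076,
    "group of 3": -4.4851,
    "group of 4": -9.4958,
}

MAX_PRICE = 700.0  # assumption: the price that maps to X = 1 after standardization


def price_elasticity(theta, price, max_price=MAX_PRICE):
    """Equation (2.19): elasticity = theta * X, with X = price / max_price."""
    return theta * (price / max_price)


for group, theta in price_coeff.items():
    low = abs(price_elasticity(theta, 100.0))
    high = abs(price_elasticity(theta, 700.0))
    print(f"{group}: |elasticity| from {low:.2f} at $100 to {high:.2f} at $700")
```

Because the elasticity is linear in the standardized price, the coefficient itself is the elasticity at the highest observed price, which is the reading given in footnote 19.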
[19] In the tables the elasticities are calculated based on standardized data. For each group of customers, the coefficient of the price variable represents the elasticity of demand at the highest observed price.

Figure 2.1: Calculated price elasticities for each demand group. Panels: (a) Singletons, (b) Couples, (c) Group of size 3, (d) Group of size 4.

It is clear that as the size of the group increases, the price sensitivity does as well. Any price increase is multiplied by the size of the group, so that the amount paid can become quite large. These differences in price sensitivity provide an opportunity to use yield management across groups to increase total revenue per flight. Price differentiation can especially be used for those flights that have load factors of less than 100 percent.

At first glance, such a difference in price sensitivity among different groups of customers may seem counterintuitive. To better grasp the meaning of the numbers, it is worthwhile to look at the proportion of different groups of customers in the market. In order to find the proportion of the customers that exist in each market, it is necessary to maximize the likelihood function introduced in Equations (2.8) and (2.9) with only 4 parameters, i.e. only constants. After solving the maximum likelihood problem, 65% of customers are singleton customers, 26% are couples, 4% are traveling as a group of three, and 5% are traveling as a group of four.[20] Thus, non-singleton, non-couple demand accounts for 9% of the total demand.

We use some clarifying examples to show how the estimated parameters can be used to predict the sensitivity of different groups to an increase in price. A $100 increase in the price translates into a \(1 - 0.99^{100/12} = 9\%\) decrease in demand of singleton customers, a \(1 - 0.99^{100/1.42} = 51\%\) decrease in demand of couples, a \(1 - 0.99^{100/1.36} = 52\%\) decrease in demand of groups of three, and a \(1 - 0.99^{100/0.64} = 79\%\) decrease in demand of groups of four.
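The compounding arithmetic above can be reproduced directly. The sketch below takes the rounded "dollars per 1 percent" steps and the group shares quoted in the text as inputs (the variable names are ours) and also forms the share-weighted aggregate. Because the inputs are rounded, the outputs can differ from the percentages quoted in the text by a point or two.

```python
# Dollar increase that removes 1% of demand, per group size (rounded values from the text).
dollars_per_percent = {1: 12.0, 2: 1.42, 3: 1.36, 4: 0.64}
# Estimated share of each group size in the market (from the text).
shares = {1: 0.65, 2: 0.26, 3: 0.04, 4: 0.05}


def demand_loss(price_increase, step):
    """Fraction of demand lost when every `step` dollars removes 1% of remaining demand."""
    return 1.0 - 0.99 ** (price_increase / step)


losses = {g: demand_loss(100.0, step) for g, step in dollars_per_percent.items()}

# Share-weighted demand that remains, and the implied overall loss.
remaining = {g: shares[g] * (1.0 - losses[g]) for g in shares}
overall_loss = 1.0 - sum(remaining.values())

# Group mix after the price increase.
new_shares = {g: remaining[g] / (1.0 - overall_loss) for g in shares}

for g in sorted(losses):
    print(f"group of {g}: loss {losses[g]:.1%}, new share {new_shares[g]:.1%}")
print(f"overall loss: {overall_loss:.1%}")
```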
Therefore, the overall decrease in demand due to a $100 increase in the price is 1 − ((1 − 9%) × 65% + (1 − 51%) × 26% + (1 − 52%) × 4% + (1 − 79%) × 5%) = 26%. As Figure 2.1 illustrates, the prices of tickets in the data set have always ranged between $100 and $700. It is not surprising to see that a $100 increase on an average price of $350 results in a 26% decrease, roughly a one-quarter loss, in overall demand. Figure 2.2 shows the percentage of demand lost due to an increase in price with respect to a base price of $350.

Figure 2.2: Percentage of lost demand with respect to base price $350

It is also possible to measure the relative proportion of different groups of travellers at each price. The calculations show that, due to a $100 increase in the ticket price from the average price, the proportion of singletons increases from the original 65% to \(\frac{(1 - 9\%) \times 65\%}{1 - 26\%} = 79\%\), the proportion of couples decreases from 26% to \(\frac{(1 - 51\%) \times 26\%}{1 - 26\%} = 17\%\), the proportion of customers who come in batches of three decreases from 4% to \(\frac{(1 - 52\%) \times 4\%}{1 - 26\%} = 2.5\%\), and the proportion of customers who come in batches of four decreases from 5% to \(\frac{(1 - 79\%) \times 5\%}{1 - 26\%} = 1.5\%\). Figure 2.3 shows how the proportion of different groups of customers changes due to an increase in price with respect to a base price of $350. As the price increases, the proportion of singletons increases and the proportions of the other groups decrease.

It is also possible to see how the timing of a flight will change demand. Our estimates suggest that for each week closer to the departure date, the demand by single customers will increase 11 percent; for couples, groups of three, and groups of four the corresponding values are 166 percent, 244 percent, and 4.48 percent respectively.

[20] Finding a greater percentage for groups of 4 than for groups of 3 might be caused by the fact that two couples also form a group of 4.
These non-intuitive results seem to reflect the non-linear pattern of customer arrival times for couples and groups of three. For groups of 4 the result seems intuitive, since such a large group would have to plan well before the flight departure date. For couples and groups of 3 it may be that a few come into the market well before the departure date, then a large number show up, for example 3 to 8 weeks prior to the flight, and then very few just prior to departure. Our model is not powerful enough to capture this trend and is therefore not reliable for the time parameters.

The estimated results based on Model 2 presented in Table 2.4 suggest that the average demand in the Great Recession for singletons, couples, groups of 3, and groups of 4 was, respectively, \(1 - \frac{1}{1+0.08} = 7\%\), \(1 - \frac{1}{1+0.70} = 41\%\), \(1 - \frac{1}{1+1.3} = 56\%\), and \(1 - \frac{1}{1+0.60} = 37\%\) less than in non-recession periods. The weighted average demand loss of the different groups of customers on that specific route suggests that, on average, 20% of demand was lost in the Great Recession. Using the same method and arguments for the seasonal dummy variable suggests that the total demand of passengers increases 15% in high seasons.[21] The interesting observation is that the effect of a recession on demand is larger than the effect of the seasonality shift. In other words, the total demand loss in recession periods exceeds the demand loss due to the low-season effect.

[21] The results for the seasonal variable suggested a 50% decrease in demand for couples. Such an odd result might come from strong collinearity between the price variable and the seasonal variable. The results for the seasonal variable are not as reliable as the other estimated parameters.
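The recession figures can be checked with a short calculation. The sketch below applies the transformation used in the text, 1 − 1/(1 + |θ|), to the "Economy Situation" coefficients of Table 2.4 and then weights the group losses. One assumption to flag: the weights are the 65/26/4/5 percent group shares reported earlier, which may not be exactly the weights the author used for this route.

```python
# "Economy Situation" coefficients by group size (Table 2.4, second model).
recession_coeff = {1: -0.0884, 2: -0.7022, 3: -1.295, 4: -0.598}
# Assumption: group shares reported earlier in the chapter are used as weights.
shares = {1: 0.65, 2: 0.26, 3: 0.04, 4: 0.05}


def recession_loss(theta):
    """Demand loss implied by the transformation used in the text: 1 - 1 / (1 + |theta|)."""
    return 1.0 - 1.0 / (1.0 + abs(theta))


group_loss = {g: recession_loss(t) for g, t in recession_coeff.items()}
weighted_loss = sum(shares[g] * group_loss[g] for g in shares)

for g in sorted(group_loss):
    print(f"group of {g}: {group_loss[g]:.1%}")
print(f"share-weighted loss: {weighted_loss:.1%}")
```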
Figure 2.3: Proportion of different groups of customers with respect to price. Panels: (a) Singletons, (b) Couples, (c) Groups of Size 3, (d) Groups of Size 4.

In order to compare the computed elasticities of each group with what exists in the literature, it is necessary to find a weighted average of the elasticities of the different groups of customers. As illustrated in Figure 2.1, depending on the price, the price elasticity of singleton customers varies between 0.07 at the minimum price and 0.47 at the maximum price. It varies between 0.6 and 4.2 for couples, between 0.7 and 4.5 for groups of 3, and between 1.6 and 9 for groups of 4. The weighted average of all elasticities ranges from 0.31 at the minimum observed price to 2.03 at the maximum observed price.

[Oum et al., 1992] summarize the empirical estimates of passenger air travel elasticities available up to 1992: 0.18 to 0.62 for studies based on choice models; for leisure travellers, 0.42 to 1.98 from time-series studies and 1.52 from cross-section studies; and for business travellers, 0.65 from time-series models and 1.15 from cross-section methods. [Gillen and Morrison, 2007] also reported a summary of air travel elasticities found in the literature. The demand elasticity for long-haul domestic leisure travellers is reported to be between 0.44 and 3.2. The weighted average of elasticities found in this research is in line with what has been found in the literature.

In order to compare the two alternative models, pseudo R2 values are computed for each set of data and each model. The results are presented in Table 2.5.
Table 2.5: Pseudo R2

             Data set 1               Data set 2
             1st Model   2nd Model    1st Model   2nd Model
Pseudo R2    21%         24%          19%         21%

The results suggest that the models with error terms capture 2 to 3 percentage points more of the variability of the data than the models without an error term. Figure 2.4 compares the expected frequencies predicted by the models with the actual frequencies observed in the data sets. The dotted lines correspond to the models, and the blue histograms correspond to actual observations.

Figure 2.4: Expected frequencies versus frequency of data. Panels: (a) Data Set 1 vs 1st Model, (b) Data Set 1 vs 2nd Model, (c) Data Set 2 vs 1st Model, (d) Data Set 2 vs 2nd Model.

2.6 Conclusion and Discussion

This dissertation reports on research to estimate price sensitivity, sensitivity to time to departure, sensitivity to recession, and sensitivity to seasonal demand variability for four different groups of customers: singletons, couples, groups of size three, and groups of size four. The data consisted of two-way information on a Canadian long-haul route and recorded only aggregate demand, so they could not be used directly to estimate the demand sensitivities of each group of customers.

Two models based on compound Poisson processes were considered. Both models use an exponential functional form linking the explanatory variables to the Poisson parameters. The second model includes additional error terms. The estimated results for that specific route suggest that on average 65% of customers are singletons, 26% are couples, and only 9% are groups of three or four. The results also show a large difference in price sensitivity between the different groups of customers. The model predicts that adding $100 to the average price of $350 will result in an overall 26% reduction in total demand and change the proportions of customers to 79% singletons, 17% couples, and only 4% groups of three or four.
The results also show that the recession led to an overall 20% loss of demand. The loss of demand was least severe for singletons, with a 7% decrease in demand, and most severe for groups of three, with a 56% loss in demand. The results also suggest that demand will decrease 15% in the low season. The difference in the magnitude of the effect of a recession on different groups of customers suggests that airlines can offset some of their losses in recession periods by offering different price packages to different groups of customers.

The weighted average elasticity is computed to range from 0.31 at the minimum observed price to 2.03 at the maximum observed price. For singletons it is between 0.07 at the minimum price and 0.47 at the maximum price. For couples, the price elasticity varied between 0.6 and 4.2, for groups of three it varied between 0.7 and 4.5, and for groups of 4 it varied between 1.6 and 9. Due to such differences in the price elasticity of different groups, price differentiation can potentially increase both the airline's profit and total welfare.[22]

The results of the models, and their alignment with economic theory, affirm the validity and strength of the models. The integer numbers are the key elements of the model. If the observations were not discrete numbers, the models would not be useful. In our models, each observation on the aggregate number of passengers, each integer number, looks like DNA, which contains significantly more information than could be imagined. By analogy with DNA, such properties could be termed "the miracle of integer numbers".

[22] In theory, third-degree price discrimination will not necessarily lead to more welfare. Whether or not such a strategy results in more profit or more total welfare depends on the specifics of the market under study (see [Varian, 1989]).
Chapter 3

Decompounding Poisson Random Sums Via Arithmetic

3.1 Introduction

3.1.1 Statement of the Problem

Let S be a random variable representing the aggregate number of customers coming into a market in a given period of time. For example, assume S is the total number of customers visiting a specific location. Write S as the sum

\[
S = X_1 + X_2 + \dots + X_N, \tag{3.1}
\]

where \(X_i\) is the total number of customers in the i-th arrival. For example, \(X_1 = 1\) represents the fact that the first customer has arrived in the market alone, \(X_2 = 3\) states that the second arrival was a group of 3 customers who visited a specific store simultaneously, and so on.

The following assumptions are used in this chapter:

(i) The variable S has a compound Poisson distribution; see [Adelson, 1966]. This means that S can be rewritten as

\[
S = \sum_{i \in A} i \cdot N_i, \tag{3.2}
\]

where the \(N_i\) are mutually independent and each \(N_i\) has a Poisson distribution.

(ii) The set A of possible types of customers (or bulk sizes) consists of known integers.

(iii) Unless otherwise stated, the data recording S are collected at equally spaced time intervals.

More precisely, let \(\lambda_i\) denote the Poisson parameter of \(N_i\) in Equation (3.2). The problem addressed in this chapter, known as decompounding Poisson processes, is the following: how can the \(\lambda_i\) be recovered from the aggregated observations? Under the above assumptions, this question is solved for the following two cases:

(I) there are exactly two elements in A, or
(II) the elements in A are pairwise coprime.

The question is answered by employing the properties of congruences of integers and by using the fact that the Vandermonde matrix is invertible.

3.1.2 Relevant Literature

One of the most commonly used methods for dealing with decompounding questions is the Panjer recursion algorithm [Panjer, 1981].
Panjer showed that if there exist constants a and b such that

\[
p_n = p_{n-1}\left(a + \frac{b}{n}\right) \tag{3.3}
\]

and if one defines the compound density function as

\[
g(x) =
\begin{cases}
\sum_{n=1}^{\infty} p_n f^{\star n}(x), & x > 0, \\
p_0, & x = 0,
\end{cases}
\]

then

\[
g(x) = p_1 f(x) + \int_0^x \left(a + b\,\frac{y}{x}\right) f(y)\, g(x - y)\, dy, \quad x > 0. \tag{3.4}
\]

In this chapter, \(\star\) denotes convolution. [Sundt and Jewell, 1981] showed that the only members of the family of distributions satisfying Equation (3.3) are the Poisson, Binomial, Negative Binomial, and Geometric distributions. It is easy to show that for the compound Poisson case the function \(f(x) := \Pr(S = x)\) can be expressed as

\[
f(x) =
\begin{cases}
e^{-\lambda} \sum_{k=1}^{x} \frac{\lambda^k}{k!} P^{\star k}(x), & x > 0, \\
e^{-\lambda}, & x = 0.
\end{cases}
\tag{3.5}
\]

The function f(x) is the compound Poisson probability mass function with Poisson rate \(\lambda\). Moreover,

\[
\lambda_i = \lambda\, p(i), \quad \text{where } p(i) = \Pr(X = i) \;\; \forall i \in A, \tag{3.6}
\]

and

\[
\lambda = \sum_{i \in A} \lambda_i. \tag{3.7}
\]

The decompounding problem can alternatively be stated as follows: how can \(\lambda\) and P in Equation (3.5) be recovered?

The methodologies based on Panjer recursion lead to easy-to-implement algorithms for small problems. The implementation was so easy that even in 1982 pocket calculators could be programmed to solve small problems (see e.g. [Gerber, 1982]).

[Buchmann and Grubel, 2003] used Panjer recursion and introduced plug-in type estimators. They proved consistency and asymptotic normality for those estimators. One of the drawbacks of those estimators was that the estimated base distribution was in general not a probability mass function, because it could contain negative values. [Buchmann and Grubel, 2004] overcame this problem by adopting new methodologies named the projected plug-in estimator, the truncated plug-in estimator, and the truncated maximum likelihood estimator. Another restriction of these types of plug-in algorithms is that the estimation relies on some zero observations.
In other words, if zero observations are missing in the data set due to a high frequency of arrivals or long time intervals between two successive observations, the plug-in estimators introduced in [Buchmann and Grubel, 2003] and [Buchmann and Grubel, 2004] will not be useful.

Decompounding procedures are not limited to Poisson sums. [Hansen and Pitts, 2006] considered decompounding a geometric random sum and applied this methodology to the performance evaluation of communication networks. Subsequently, [Hansen and Pitts, 2010] extended the decompounding literature to a general but known distribution of N, a random number of identically distributed summands.

One of the main limitations of decompounding procedures in general is that the methodologies apply only to observations that are tracked over evenly spaced periods. In other words, most of the easy-to-compute methodologies, if not all, are not applicable to unevenly spaced observations. It is shown that for specific types of applications, this chapter's methodology can be generalized to the case where the data are tracked in non-equal intervals.

This chapter introduces another class of plug-in estimators that benefit from applying simple arithmetic. The estimators exploit the inherent characteristics of integer numbers. We believe this research is the first to look at decompounding Poisson random sums in an arithmetical framework.

The chapter is organized in the following way. In Section 3.2, some clarifying examples are used to elaborate the problem. Section 3.3 describes the proposed methodology for different scenarios. Section 3.4 evaluates this new estimation technique using different sets of simulations. Section 3.5 describes the main advantages and the few disadvantages of the new approach and compares them to conventional ones.
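To make the role of zero observations in Section 3.1.2 concrete, here is a minimal sketch of the discrete Panjer recursion for a compound Poisson sum and of its inversion. It illustrates the general idea behind recursion-based plug-in decompounding and is not the exact estimator of the cited papers; the function names are ours, and exact probabilities stand in for empirical frequencies.

```python
import math


def compound_poisson_pmf(lam, p, x_max):
    """Discrete Panjer recursion for S = X_1 + ... + X_N with N ~ Poisson(lam).

    p maps each batch size y >= 1 to Pr(X = y). Returns [g(0), ..., g(x_max)].
    """
    g = [math.exp(-lam)] + [0.0] * x_max
    for x in range(1, x_max + 1):
        g[x] = (lam / x) * sum(y * p.get(y, 0.0) * g[x - y] for y in range(1, x + 1))
    return g


def decompound(g, x_max):
    """Invert the recursion: recover lam and lam_i = lam * p(i) from g.

    Every step divides by g[0], so the method breaks down when no zeros are observed.
    """
    if g[0] <= 0.0:
        raise ValueError("g(0) must be positive: zero observations are required")
    lam = -math.log(g[0])
    lam_i = {}
    for x in range(1, x_max + 1):
        known = sum(y * lam_i[y] * g[x - y] for y in range(1, x))
        lam_i[x] = (x * g[x] - known) / (x * g[0])
    return lam, lam_i


# Example: total rate 2.0, 65% singletons and 35% couples.
g = compound_poisson_pmf(2.0, {1: 0.65, 2: 0.35}, 60)
lam_hat, lam_i_hat = decompound(g, 4)
print(lam_hat, lam_i_hat)
```

With empirical frequencies in place of `g`, the recovered values are no longer guaranteed to be non-negative, which is the drawback of plug-in estimators noted above.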
3.2 Clarifying Examples

Suppose there exist only two types of customers, where type one customers are single and type two customers are couples. The set A (discussed in the introduction) then equals {1, 2}, and the variable S can be defined as

\[
S = \sum_{i \in A} i \cdot N_i. \tag{3.8}
\]

Recall that \(N_1\) and \(N_2\) are independent, and each \(N_i\) has a Poisson distribution with parameter \(\lambda_i\). Note that the observed total (that is, the total number of customers) is odd if and only if an odd number of singleton customers have visited the specific domain. This corresponds to the obvious congruence relation

\[
N_1 + 2N_2 \equiv N_1 \pmod{2}. \tag{3.9}
\]

Given a sample \(Y_1, \dots, Y_n\) of values of S, define

\[
\hat{q}_{n,i,j} = \frac{1}{n}\, \#\{k \mid Y_k \equiv i \pmod{j}\}. \tag{3.10}
\]

In the example, j = 2 and i = 0 or 1. The quantity \(\hat{q}_{n,0,2}\) (resp. \(\hat{q}_{n,1,2}\)) is the empirical proportion of even (resp. odd) numbers in the sample. It is relatively straightforward to show that the probability of an even number being generated by the Poisson process \(N_1\) with parameter \(\lambda_1\) is

\[
\mathrm{Prob}(\text{Observation} = \text{even} \mid \lambda_1) = \frac{1 + e^{-2\lambda_1}}{2}. \tag{3.11}
\]

3.2.1 Evenly Spaced Observations

If it is assumed that the observations are evenly spaced, then Equations (3.8), (3.9), (3.10), and (3.11) imply that the following plug-in estimator can estimate \(\lambda_1\), the population parameter of process \(N_1\):

\[
\hat{\lambda}_1 = \frac{1}{2} \log \frac{1}{2\hat{q}_{n,0,2} - 1}. \tag{3.12}
\]

Alternatively,

\[
\hat{\lambda}_1 = \frac{1}{2} \log \frac{1}{1 - 2\hat{q}_{n,1,2}}. \tag{3.13}
\]

3.2.2 Unevenly Spaced Observations

Now assume that a sample of size n, \(Y_1, Y_2, \dots, Y_n\), recorded over non-equally spaced time frames \(T_1, T_2, \dots, T_n\), is available. One can show that \(\hat{\lambda}_1\) satisfies the equation

\[
\sum_{i=1}^{n} e^{-2 T_i \hat{\lambda}_1} = n\,(2\hat{q}_{n,0,2} - 1). \tag{3.14}
\]

It is important to notice that

\[
\frac{\partial}{\partial \hat{\lambda}_1} \sum_{i=1}^{n} e^{-2 T_i \hat{\lambda}_1} = -2 \sum_{i=1}^{n} T_i e^{-2 T_i \hat{\lambda}_1} < 0. \tag{3.15}
\]

Because this derivative is negative, the left-hand side of Equation (3.14) is strictly decreasing in \(\hat{\lambda}_1\), so there is a one-to-one relationship between the left-hand side and the right-hand side of Equation (3.14).
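The parity estimators of Equations (3.12) and (3.14) can be tried on simulated data. The following is a minimal sketch: the rates, sample size, and interval lengths are illustrative choices of ours, `poisson_sample` is a simple textbook sampler rather than anything from the source, and the uneven-spacing equation is solved by bisection, which the monotonicity in (3.15) justifies.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's multiplication method; fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def lambda1_even_spacing(ys):
    """Equation (3.12): requires more than half of the observations to be even."""
    q_even = sum(1 for y in ys if y % 2 == 0) / len(ys)
    return 0.5 * math.log(1.0 / (2.0 * q_even - 1.0))


def lambda1_uneven_spacing(ys, ts, hi=50.0, iters=60):
    """Solve Equation (3.14) by bisection; its left-hand side is decreasing in lambda."""
    q_even = sum(1 for y in ys if y % 2 == 0) / len(ys)
    target = len(ys) * (2.0 * q_even - 1.0)
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(math.exp(-2.0 * t * mid) for t in ts) > target:
            lo = mid  # left-hand side still too large: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)
lam1, lam2, n = 0.7, 0.4, 20000  # illustrative values only

# Evenly spaced: S = N1 + 2 * N2 observed over unit intervals.
ys_even = [poisson_sample(lam1, rng) + 2 * poisson_sample(lam2, rng) for _ in range(n)]
est_even = lambda1_even_spacing(ys_even)

# Unevenly spaced: interval lengths drawn between 0.5 and 1.5.
ts = [rng.uniform(0.5, 1.5) for _ in range(n)]
ys_uneven = [poisson_sample(lam1 * t, rng) + 2 * poisson_sample(lam2 * t, rng) for t in ts]
est_uneven = lambda1_uneven_spacing(ys_uneven, ts)

print(est_even, est_uneven)
```

Note that neither estimator needs any observation equal to zero; only the split between even and odd totals is used.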
3.2.3 Remarks

The above example illustrates how looking at properties of integer numbers can potentially reduce the size of the problem. Specifically, some obvious arithmetic rules such as Equation (3.9) can disentangle the effects of one variable from others when only aggregate observations are available, e.g. aggregate claims, the aggregate number of customers visiting a restaurant, or the aggregate number of people who buy airplane seats. This example also shows that there is potential to find an easy-to-compute plug-in estimator with less restrictive assumptions than Panjer recursion based methodologies. For example, this approach does not require the observation set to include zero. It also shows that, for some specific examples, one is not limited to an evenly spaced time horizon between two successive observations. It is shown below how this methodology can be extended to more general settings.

3.3 Estimation Methodology

Let i, j be nonnegative integers. Assume j ≥ 2 and i ≤ j. Let \(\lambda\) be a positive real number. Define

\[
H(j, i, \lambda) :=
\begin{cases}
\sum_{k=1}^{\infty} \dfrac{e^{-\lambda} \lambda^{jk-i}}{(jk-i)!}, & \text{if } 0 < i < j, \\[6pt]
\dfrac{e^{-\lambda}}{j} \sum_{k=0}^{j-1} e^{\lambda \cos\frac{2\pi k}{j}} \cos\left(\lambda \sin\frac{2\pi k}{j}\right), & \text{if } i = j \text{ or } i = 0.
\end{cases}
\tag{3.16}
\]

Note that for i < j, the above sum can be written as

\[
\frac{e^{-\lambda}}{j}\, \frac{\partial^i}{\partial \lambda^i} \left[ \sum_{k=0}^{j-1} e^{\lambda \cos\frac{2\pi k}{j}} \cos\left(\lambda \sin\frac{2\pi k}{j}\right) \right].
\]

3.3.1 Decompounding When Set A Has Two Elements

This entire section is devoted to the methodological framework for decompounding Poisson processes when two processes with different group sizes are compounded. Two numbers are either co-prime or not co-prime. Recall that \(M_1\) and \(M_2\) are coprime if the greatest common divisor of \(M_1\) and \(M_2\) is one; in this case, the co-prime numbers are specified by \(\gcd(M_1, M_2) = 1\).[23] If two numbers are co-prime, either one of them is 1, or both are different from 1.
In the following sections it will be shown how one can decompound the compound Poisson process under each of the three possible cases mentioned.

Decompounding when A = {M1, M2} and M1 ≠ 1, M2 ≠ 1 are co-prime

In what follows, we show how properties of co-prime numbers ease the decompounding procedure.

Lemma 3.3.1. Assume evenly spaced aggregate claim observations of size n are recorded. Suppose the \(\hat{\lambda}_{1,i}\) and \(\hat{\lambda}_{2,i}\) satisfy the following equations. For all \(i \in \{1, \dots, M_2\}\), with \(0 \le j < M_2\) such that \(M_1 i \equiv j \pmod{M_2}\),

\[
H(M_2, M_2 - i, \hat{\lambda}_{1,i}) = \hat{q}_{n,j,M_2}, \tag{3.17}
\]

and for all \(i \in \{1, \dots, M_1\}\), with \(0 \le j < M_1\) such that \(M_2 i \equiv j \pmod{M_1}\),

\[
H(M_1, M_1 - i, \hat{\lambda}_{2,i}) = \hat{q}_{n,j,M_1}.
\]

Then the \(\hat{\lambda}_{1,i}\) and \(\hat{\lambda}_{2,i}\) are consistent estimators of the true Poisson parameters \(\lambda_1\) and \(\lambda_2\).

Lemma 3.3.1 states that if the set A is composed of two coprime numbers, the parameters of the two Poisson processes, \(\lambda_1\) and \(\lambda_2\), can be consistently estimated in \(M_2\) and \(M_1\) different ways. The proof of Lemma 3.3.1 can be found in Appendix A. The proof illustrates how the solution is equivalent to a closed-form solution of a maximum likelihood problem.

Lemma 3.3.2. Under the conditions of Lemma 3.3.1, the asymptotic variances of the estimated parameters \(\hat{\lambda}_{1,i}\) and \(\hat{\lambda}_{2,i}\) are as follows. For all \(i \in \{1, 2, \dots, M_2\}\), with \(0 \le j < M_2\) such that \(M_1 i \equiv j \pmod{M_2}\),

\[
\mathrm{var}(\hat{\lambda}_{1,i}) = \frac{\hat{q}_{n,j,M_2}\,(1 - \hat{q}_{n,j,M_2})}{n} \left[ (H^{-1})'(M_2, M_2 - i, \hat{\lambda}_{1,i}) \right]^2 \le \frac{\left[ (H^{-1})'(M_2, M_2 - i, \hat{\lambda}_{1,i}) \right]^2}{4n}, \tag{3.18}
\]

and for all \(i \in \{1, 2, \dots, M_1\}\), with \(0 \le j < M_1\) such that \(M_2 i \equiv j \pmod{M_1}\),

\[
\mathrm{var}(\hat{\lambda}_{2,i}) = \frac{\hat{q}_{n,j,M_1}\,(1 - \hat{q}_{n,j,M_1})}{n} \left[ (H^{-1})'(M_1, M_1 - i, \hat{\lambda}_{2,i}) \right]^2 \le \frac{\left[ (H^{-1})'(M_1, M_1 - i, \hat{\lambda}_{2,i}) \right]^2}{4n}. \tag{3.19}
\]

The \(M_2\) and \(M_1\) different estimators for \(\lambda_1\) and \(\lambda_2\) can be combined. As will be illustrated in Appendix A, it can be asserted that \(M_2 - 1\) and \(M_1 - 1\) of the estimators are based on independent sets of observations. In other words, after removing one estimator, each of the remaining estimators is based on an independent set of observations.

[23] The chance that two randomly chosen positive integers are co-prime is \(6/\pi^2\) (approximately 61%).
All the results can be combined by the following approach.

Lemma 3.3.3. The combined estimates of \(\lambda_1\) and \(\lambda_2\) based on Lemma 3.3.1 and Lemma 3.3.2 are

\[
\hat{\lambda}_{1,c} = \frac{\sum_{i=1}^{M_2 - 1} \frac{\hat{\lambda}_{1,i}}{\mathrm{var}(\hat{\lambda}_{1,i})}}{\sum_{i=1}^{M_2 - 1} \frac{1}{\mathrm{var}(\hat{\lambda}_{1,i})}} \tag{3.20}
\]

\[
\mathrm{var}(\hat{\lambda}_{1,c}) = \frac{1}{\sum_{i=1}^{M_2 - 1} \frac{1}{\mathrm{var}(\hat{\lambda}_{1,i})}} \tag{3.21}
\]

\[
\hat{\lambda}_{2,c} = \frac{\sum_{i=1}^{M_1 - 1} \frac{\hat{\lambda}_{2,i}}{\mathrm{var}(\hat{\lambda}_{2,i})}}{\sum_{i=1}^{M_1 - 1} \frac{1}{\mathrm{var}(\hat{\lambda}_{2,i})}} \tag{3.22}
\]

\[
\mathrm{var}(\hat{\lambda}_{2,c}) = \frac{1}{\sum_{i=1}^{M_1 - 1} \frac{1}{\mathrm{var}(\hat{\lambda}_{2,i})}} \tag{3.23}
\]

The proof of Lemma 3.3.3 is straightforward.

One of the inevitable questions that may arise is whether the H functions introduced in Equation (3.16) are monotonic, or one-to-one, functions. The general answer is that these functions are neither monotonic nor one-to-one. In what follows, the conditions under which the functions are monotonic are first clarified. Later in Section 3.3.1 it is shown how to overcome this problem and find a unique answer under general assumptions and conditions.

Lemma 3.3.4. For all \(M_1\) and \(M_2\), the H(.) functions are monotonic if \(\lambda\) belongs to the interval (0, 1). In addition, it can be shown that for at least one i in 0 ≤ i < M, H(M, i, \(\lambda\)) is monotonic with respect to \(\lambda\) in the interval (0, M).[24]

Lemma 3.3.4 can be verified by looking at the first-order derivative of H(.):

\[
\frac{\partial}{\partial \lambda} H(M, i, \lambda) = e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{Mk - i - 1}\,(Mk - i - \lambda)}{(Mk - i)!}.
\]

For i = 0 and \(\lambda < M\), the partial derivative is always positive; thus, the function is monotonic.

The methodology developed here is not limited to these sufficient conditions. It can be generalized to non-equal time intervals if the time intervals are small enough or satisfy sufficient conditions; that is, if time intervals are recorded such that for every recorded time interval \(T_i\), \(\lambda T_i\) satisfies the conditions of Lemma 3.3.4, then \(\hat{\lambda}_1\) and \(\hat{\lambda}_2\) can be estimated by solving a simple non-linear equation.

[24] Note that these conditions are sufficient rather than necessary.
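The estimating equations of Lemma 3.3.1 and the weighting of Lemma 3.3.3 can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: H is evaluated by direct series summation as the probability that a Poisson count is congruent to −i modulo j, exact population probabilities stand in for the empirical frequencies, the true rates are placed in (0, 1) so that the monotonicity condition of Lemma 3.3.4 holds, and the equal variances passed to `combine` are placeholders.

```python
import math


def H(j, i, lam, terms=200):
    """H(j, i, lam) of Equation (3.16), computed as Pr(N = -i mod j) for N ~ Poisson(lam)."""
    residue = (-i) % j
    total, term = 0.0, math.exp(-lam)  # term holds Pr(N = n), starting at n = 0
    for n in range(terms):
        if n % j == residue:
            total += term
        term *= lam / (n + 1)
    return total


def solve_lambda(j, i, q, lo=1e-9, hi=1.0, iters=80):
    """Solve H(j, i, lam) = q by bisection on (lo, hi), assuming H is monotonic there."""
    f_lo = H(j, i, lo) - q
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = H(j, i, mid) - q
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def combine(estimates, variances):
    """Inverse-variance weighting, as in Equations (3.20) and (3.21)."""
    weights = [1.0 / v for v in variances]
    combined = sum(e * w for e, w in zip(estimates, weights)) / sum(weights)
    return combined, 1.0 / sum(weights)


def poisson_pmf(lam, n):
    return math.exp(-lam) * lam ** n / math.factorial(n)


# Illustrative coprime batch sizes and rates (our choice).
M1, M2 = 2, 3
lam1, lam2 = 0.6, 0.4

# Exact distribution of S = M1*N1 + M2*N2 modulo M2, standing in for q-hat.
q = [0.0] * M2
for a in range(60):
    for b in range(60):
        q[(M1 * a + M2 * b) % M2] += poisson_pmf(lam1, a) * poisson_pmf(lam2, b)

# One estimate of lambda_1 per i in {1, ..., M2}, as in Equation (3.17).
estimates = [solve_lambda(M2, M2 - i, q[(M1 * i) % M2]) for i in range(1, M2 + 1)]
combined, combined_var = combine(estimates[: M2 - 1], [1.0] * (M2 - 1))
print(estimates, combined)
```

With sample frequencies instead of exact probabilities, the individual estimates differ from one another, and the weights would come from the variances of Lemma 3.3.2.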
Therefore, irrespective of the size of the problem, and irrespective of the fact that different observations are recorded over different time intervals, the compound Poisson process can be decompounded under the conditions of Lemma 3.3.4. Again, it is important to emphasize that the conditions contained in Lemma 3.3.4 are only sufficient, not necessary.

Lemma 3.3.5. Suppose aggregate claim observations \(Y_1, Y_2, \dots, Y_n\) are recorded over non-equal time intervals \(T_1, T_2, \dots, T_n\), with \(T_k \lambda_1 \le M_2\) and \(T_k \lambda_2 \le M_1\) for all \(k \in \{1, \dots, n\}\). If \(\gcd(M_1, M_2) = 1\), the parameters of the Poisson processes, \(\lambda_1\) and \(\lambda_2\), can be estimated by solving the following equations. For all \(i \in \{1, \dots, M_2\}\), with \(0 \le j < M_2\) such that \(M_1 i \equiv j \pmod{M_2}\),

\[
\frac{1}{n} \sum_{k=1}^{n} H(M_2, M_2 - i, \hat{\lambda}_{1,i} T_k) = \hat{q}_{n,j,M_2}, \tag{3.24}
\]

and for all \(i \in \{1, \dots, M_1\}\), with \(0 \le j < M_1\) such that \(M_2 i \equiv j \pmod{M_1}\),

\[
\frac{1}{n} \sum_{k=1}^{n} H(M_1, M_1 - i, \hat{\lambda}_{2,i} T_k) = \hat{q}_{n,j,M_1}. \tag{3.25}
\]

Under the conditions of Lemma 3.3.5, \(\frac{1}{n} \sum_{k=1}^{n} H(M_2, M_2 - i, \hat{\lambda}_{1,i} T_k)\) is a monotonic function; thus there exists a one-to-one relationship between the empirical probability \(\hat{q}_{n,j,M_2}\) and \(\frac{1}{n} \sum_{k=1}^{n} H(M_2, M_2 - i, \hat{\lambda}_{1,i} T_k)\). Lemma 3.3.5 shows how this new methodology can be applied to a wider range of problems than conventional plug-in estimators such as those based on Panjer recursion. Comparing Lemma 3.3.5 with Lemma 3.3.1, it can easily be verified that Lemma 3.3.1 is the special case of Lemma 3.3.5 where \(T_k = 1\) for all k.

Decompounding when A = {1, M2} and M2 ≠ 1

If A = {1, M2} and M2 ≠ 1, \(\hat{\lambda}_1\) can be estimated with a method similar to that introduced in Lemma 3.3.1. However, estimating \(\hat{\lambda}_2\) should be accomplished through a different approach.

Lemma 3.3.6. If A = {1, M2} and M2 ≠ 1, consistent estimators of the true parameters \(\lambda_1\) and \(\lambda_2\), and the asymptotic variances of the estimates, can be obtained by solving the following non-linear equations.
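For all \(i \in \{1, \dots, M_2\}\), with \(0 \le j < M_2\) such that \(M_1 i \equiv j \pmod{M_2}\),

\[
H(M_2, M_2 - i, \hat{\lambda}_{1,i}) = \hat{q}_{n,j,M_2}, \tag{3.26}
\]

\[
\mathrm{var}(\hat{\lambda}_{1,i}) = \frac{\hat{q}_{n,j,M_2}\,(1 - \hat{q}_{n,j,M_2})}{n} \left[ (H^{-1})'(M_2, M_2 - i, \hat{\lambda}_{1,i}) \right]^2 \le \frac{\left[ (H^{-1})'(M_2, M_2 - i, \hat{\lambda}_{1,i}) \right]^2}{4n},
\]

\[
\hat{\lambda}_{1,c} = \frac{\sum_{i=1}^{M_2 - 1} \frac{\hat{\lambda}_{1,i}}{\mathrm{var}(\hat{\lambda}_{1,i})}}{\sum_{i=1}^{M_2 - 1} \frac{1}{\mathrm{var}(\hat{\lambda}_{1,i})}}, \tag{3.27}
\]

and \(\hat{\lambda}_{2,i}\) solves

\[
H(2M_2, 2M_2 - i, \hat{\lambda}_{1,c}) \cdot H(2, 0, \hat{\lambda}_2) + H(2M_2, M_2 - i, \hat{\lambda}_{1,c}) \cdot H(2, 1, \hat{\lambda}_2) = \hat{q}_{n,i,2M_2}. \tag{3.28}
\]

Define \(H_1\) as \(H(2M_2, 2M_2 - i, \hat{\lambda}_{1,c})\) and \(H_2\) as \(H(2M_2, M_2 - i, \hat{\lambda}_{1,c})\). It is straightforward to show that \(H(2, 0, \hat{\lambda}_2) = \frac{1 + e^{-2\hat{\lambda}_2}}{2}\) and \(H(2, 1, \hat{\lambda}_2) = \frac{1 - e^{-2\hat{\lambda}_2}}{2}\). Therefore Equation (3.28) can be rewritten as

\[
\frac{H_1 + H_2}{2} + \frac{(H_1 - H_2)\, e^{-2\hat{\lambda}_2}}{2} = \hat{q}_{n,i,2M_2}.
\]

It is apparent that

\[
\mathrm{sign}\left( \frac{\partial}{\partial \hat{\lambda}_2} \left[ \frac{H_1 + H_2}{2} + \frac{(H_1 - H_2)\, e^{-2\hat{\lambda}_2}}{2} \right] \right) = \mathrm{sign}(H_2 - H_1).
\]

Since \(H_1\) and \(H_2\) do not depend on \(\hat{\lambda}_2\), the left-hand side is always monotonic with respect to \(\hat{\lambda}_2\).

Lemma 3.3.7. Assume aggregate claim observations \(Y_1, Y_2, \dots, Y_n\) are recorded over non-equal time intervals \(T_1, T_2, \dots, T_n\), and A = {1, M2}. The estimators \(\hat{\lambda}_1\) and \(\hat{\lambda}_2\) of the true Poisson parameters \(\lambda_1\) and \(\lambda_2\) can be obtained as follows. For all \(i \in \{1, \dots, M_2\}\), with \(0 \le j < M_2\) such that \(M_1 i \equiv j \pmod{M_2}\),

\[
\frac{1}{n} \sum_{k=1}^{n} H(M_2, M_2 - i, \hat{\lambda}_{1,i} T_k) = \hat{q}_{n,j,M_2}, \tag{3.30}
\]

and for all \(i \in \{1, 2, \dots, M_2\}\),

\[
\frac{1}{n} \sum_{k=1}^{n} \left[ H(2M_2, 2M_2 - i, \hat{\lambda}_{1,c} T_k) \cdot H(2, 0, \hat{\lambda}_2 T_k) + H(2M_2, M_2 - i, \hat{\lambda}_{1,c} T_k) \cdot H(2, 1, \hat{\lambda}_2 T_k) \right] = \hat{q}_{n,i,2M_2}. \tag{3.31}
\]

A close look at Lemma 3.3.7 shows that Lemma 3.3.6 is the special case of Lemma 3.3.7 in which observations are recorded over evenly spaced time frames of \(T_k = 1\). Monotonicity helps to find consistent estimators of the parameters of interest with marginal computational effort. The same argument made about the first derivative of Equation (3.28) in the first paragraph after Lemma 3.3.6 shows that the left-hand side of (3.31) is monotonic in \(\hat{\lambda}_2\).

Decompounding when A = {M1, M2} and gcd(M1, M2) ≠ 1

When A = {M1, M2} and \(\gcd(M_1, M_2) \ne 1\),[25] it is always possible to redefine the problem such that it reduces to one of the two cases treated above: the case of two co-prime batch sizes different from 1, or the case A = {1, M2}.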
By dividing $M_1$, $M_2$, and the aggregate observations by the greatest common divisor (gcd) of $M_1$ and $M_2$, the problem reduces to the conditions of one of the cases in Section 3.3.1.

^25 $\gcd(M_1, M_2) \ne 1$ denotes that $M_1$ and $M_2$ are not co-prime.

Decompounding Poisson processes when A has two elements - uniqueness of the estimators

Lemma 3.3.1 showed that $\lambda_1$ and $\lambda_2$ can be estimated by $M_2 - 1$ and $M_1 - 1$ different plug-in estimators, respectively.^26 Lemma 3.3.6 also pointed out that $\lambda_2$ can be estimated uniquely and that there exist $M_2 - 1$ different plug-in estimators for $\lambda_1$. It is also possible that more than one $\hat\lambda_i$ satisfies each equation in Lemma 3.3.1 and Lemma 3.3.6.

^26 Readers who are not familiar with the Vandermonde matrix and its properties are encouraged to review Appendix B.

Theorem 3.3.1. There is a unique asymptotically consistent estimator of the true Poisson parameters that simultaneously solves all equations considered in Lemma 3.3.1 and Lemma 3.3.6.

To prove that only one consistent estimator can satisfy all equations, we use the properties of roots of unity, complex numbers, and Vandermonde matrices. The equation $x^n = 1$ has $n$ roots $\xi_1, \xi_2, \xi_3, \ldots, \xi_n$, where $\xi_k = e^{2\pi i k / n}$ is the $k$th root of unity. The $H$ functions can be rewritten in terms of roots of unity:

\[ H(j, i, \lambda) = \sum_{k=1}^{\infty} \frac{e^{-\lambda}\lambda^{jk - i}}{(jk - i)!} = \frac{e^{-\lambda}}{j}\, \frac{\partial^i}{\partial \lambda^i}\left[\sum_{k=0}^{j-1} e^{\xi_k \lambda}\right] \tag{3.32} \]

For consistency of notation, and without loss of generality, $H(j, i, \hat\lambda)$ is denoted $H_i$. The empirical probabilities on the right-hand side of the equations in Lemma 3.3.1 and Lemma 3.3.6 are denoted $q_i$. Writing $\xi = e^{2\pi i / M_2}$, we therefore have

\[ H_0 = H_{M_2} = \frac{e^{-\lambda}}{M_2}\left(e^{\lambda} + e^{\xi\lambda} + e^{\xi^2\lambda} + \cdots + e^{\xi^{M_2 - 1}\lambda}\right). \]

Similarly,

\[ H_1 = \frac{e^{-\lambda}}{M_2}\left(e^{\lambda} + \xi e^{\xi\lambda} + \xi^2 e^{\xi^2\lambda} + \cdots + \xi^{M_2 - 1} e^{\xi^{M_2 - 1}\lambda}\right) \]

and

\[ H_{M_2 - 1} = \frac{e^{-\lambda}}{M_2}\left(e^{\lambda} + \xi^{M_2 - 1} e^{\xi\lambda} + \xi^{2(M_2 - 1)} e^{\xi^2\lambda} + \cdots + \xi^{(M_2 - 1)(M_2 - 1)} e^{\xi^{M_2 - 1}\lambda}\right). \]

If the results of Lemma 3.3.1 or Lemma 3.3.6 are applied, $M_2$ equations are formed as follows:

\[ \begin{aligned}
H_0 &= \frac{e^{-\lambda}}{M_2}\left(e^{\lambda} + e^{\xi\lambda} + e^{\xi^2\lambda} + \cdots + e^{\xi^{M_2 - 1}\lambda}\right) = q_0 \\
H_1 &= \frac{e^{-\lambda}}{M_2}\left(e^{\lambda} + \xi e^{\xi\lambda} + \xi^2 e^{\xi^2\lambda} + \cdots + \xi^{M_2 - 1} e^{\xi^{M_2 - 1}\lambda}\right) = q_1 \\
&\ \ \vdots \\
H_{M_2 - 1} &= \frac{e^{-\lambda}}{M_2}\left(e^{\lambda} + \xi^{M_2 - 1} e^{\xi\lambda} + \xi^{2(M_2 - 1)} e^{\xi^2\lambda} + \cdots + \xi^{(M_2 - 1)(M_2 - 1)} e^{\xi^{M_2 - 1}\lambda}\right) = q_{M_2 - 1}
\end{aligned} \tag{3.33} \]

Now define the matrix $V$ as

\[ V = \begin{pmatrix}
1 & 1 & \cdots & 1 & 1 \\
1 & \xi & \cdots & \xi^{M_2 - 2} & \xi^{M_2 - 1} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
1 & \xi^{M_2 - 2} & \cdots & \xi^{(M_2 - 2)(M_2 - 2)} & \xi^{(M_2 - 2)(M_2 - 1)} \\
1 & \xi^{M_2 - 1} & \cdots & \xi^{(M_2 - 2)(M_2 - 1)} & \xi^{(M_2 - 1)(M_2 - 1)}
\end{pmatrix}, \]

the matrix $A$ as

\[ A = \begin{pmatrix} 1 \\ e^{(\xi - 1)\lambda} \\ \vdots \\ e^{(\xi^{M_2 - 2} - 1)\lambda} \\ e^{(\xi^{M_2 - 1} - 1)\lambda} \end{pmatrix}, \]

and the matrix $Q$ as

\[ Q = \begin{pmatrix} q_0 \\ q_1 \\ \vdots \\ q_{M_2 - 2} \\ q_{M_2 - 1} \end{pmatrix}. \]

The simultaneous equations in (3.33) can then be written in matrix form:

\[ VA = M_2 Q \tag{3.34} \]

The matrix $V$ is a Vandermonde matrix, so it is invertible. Therefore,

\[ A = M_2 V^{-1} Q \tag{3.35} \]

The expression $M_2 V^{-1} Q$ is independent of $\lambda$. Therefore $D := M_2 V^{-1} Q$ is an $M_2 \times 1$ matrix in which each element is either a complex or a real number:

\[ D = \begin{pmatrix} D_0 \\ D_1 \\ \vdots \\ D_{M_2 - 2} \\ D_{M_2 - 1} \end{pmatrix} \]

In contrast, the left-hand side matrix in Equation (3.34) is composed of functions of $\lambda$. Suppose two distinct real values of $\lambda$ solve the equations. In particular, both must satisfy $e^{(\xi - 1)\lambda} = D_1$; call these two real numbers $\hat\lambda_\alpha$ and $\hat\lambda_\beta$. Then $e^{(\xi - 1)\hat\lambda_\alpha} = e^{(\xi - 1)\hat\lambda_\beta}$, so $(\xi - 1)(\hat\lambda_\alpha - \hat\lambda_\beta) = 2\pi i k$ for some integer $k$, and thus $\xi - 1 = \frac{2\pi k}{\hat\lambda_\alpha - \hat\lambda_\beta}\, i$. However, $\operatorname{Re}(\xi - 1) = \cos\left(\frac{2\pi}{M_2}\right) - 1 < 0$ whereas $\operatorname{Re}\left(\frac{2\pi k}{\hat\lambda_\alpha - \hat\lambda_\beta}\, i\right) = 0$, which is a contradiction. Therefore, the estimator is always unique.^27

3.3.2 Decompounding When Set A Has More Than Two Elements

Previous sections showed how to decompound a compound Poisson process when A has two elements, irrespective of the elements of A.
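As a numerical check on the two-element case, the roots-of-unity form of $H$ implied by Equation (3.32) can be compared with the direct series. The sketch below is illustrative only: the function names `h_series` and `h_roots`, the argument names, and the truncation at 150 terms are assumptions introduced here and are not part of the thesis.

```python
import cmath
import math


def h_series(modulus, shift, lam, terms=150):
    """Direct sum of Poisson probabilities over counts m with
    m + shift divisible by modulus (truncated after `terms` terms)."""
    total = 0.0
    pmf = math.exp(-lam)              # P(N = 0)
    for m in range(terms):
        if (m + shift) % modulus == 0:
            total += pmf
        pmf *= lam / (m + 1)          # P(N = m + 1) from P(N = m)
    return total


def h_roots(modulus, shift, lam):
    """Same quantity from the roots-of-unity form: differentiating
    exp(xi**k * lam) `shift` times gives the factor xi**(k * shift)."""
    xi = cmath.exp(2j * math.pi / modulus)   # primitive root of unity
    s = sum(xi ** (k * shift) * cmath.exp(xi ** k * lam)
            for k in range(modulus))
    return (math.exp(-lam) * s / modulus).real


if __name__ == "__main__":
    # The two evaluations should agree up to floating-point error.
    print(h_series(3, 1, 0.8), h_roots(3, 1, 0.8))
```

For modulus 2 the second function reproduces the closed forms $(1 + e^{-2\lambda})/2$ and $(1 - e^{-2\lambda})/2$ used after Lemma 3.3.6.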
In this Section the focus is on sets A with more than two elements. Again, arithmetic is used to establish the results. The results are established for sets A composed of pairwise coprime numbers;^28 they could not be generalized to less restrictive assumptions on A.

^27 Re() is the real part of a complex number.
^28 A set of integers is defined to be pairwise coprime if every pair of integers in the set is coprime.

To demonstrate how the procedure works, an illustrative example is used. Assume set A is composed of three elements, $A = \{2, 3, 5\}$, so the claim sizes are either 2, 3, or 5 dollars. The aggregate claim in each period, $Y_i$, can be written as $Y_i = 2a + 3b + 5c$, where $a$, $b$, and $c$ are the total numbers of claims of size 2, 3, and 5. Assume also that claims of size 2 follow a Poisson distribution with parameter $\lambda_1$, claims of size 3 a Poisson distribution with parameter $\lambda_2$, and claims of size 5 a Poisson distribution with parameter $\lambda_3$.

The following three relationships can be stated:

\[ \begin{aligned}
Y_i &\equiv 2a + 3b + 5c \equiv 3b + 5c \equiv b + c \pmod{2} \\
Y_i &\equiv 2a + 3b + 5c \equiv 2a + 5c \equiv 2(a + c) \pmod{3} \\
Y_i &\equiv 2a + 3b + 5c \equiv 2a + 3b \equiv 2(a - b) \pmod{5}
\end{aligned} \tag{3.36} \]

Following the procedures introduced in Section 3.3.1, three different matrix equations can be written as follows.

\[ \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & \xi \end{pmatrix}\begin{pmatrix} 1 \\ e^{(\xi - 1)(\hat\lambda_2 + \hat\lambda_3)} \end{pmatrix} = \begin{pmatrix} q_0 \\ q_1 \end{pmatrix} \tag{3.37} \]

\[ \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 \\ 1 & \xi & \xi^2 \\ 1 & \xi^2 & \xi^4 \end{pmatrix}\begin{pmatrix} 1 \\ e^{(\xi - 1)\hat\lambda_1 + (\xi^2 - 1)\hat\lambda_3} \\ e^{(\xi^2 - 1)\hat\lambda_1 + (\xi - 1)\hat\lambda_3} \end{pmatrix} = \begin{pmatrix} q_0 \\ q_1 \\ q_2 \end{pmatrix} \tag{3.38} \]

\[ \frac{1}{5}\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & \xi & \cdots & \xi^4 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \xi^4 & \cdots & \xi^{16} \end{pmatrix}\begin{pmatrix} 1 \\ e^{(\xi - 1)(\hat\lambda_1 + \hat\lambda_2)} \\ \vdots \\ e^{(\xi^4 - 1)(\hat\lambda_1 + \hat\lambda_2)} \end{pmatrix} = \begin{pmatrix} q_0 \\ q_1 \\ \vdots \\ q_4 \end{pmatrix} \tag{3.39} \]

By introducing empirical probabilities in the right-hand side matrices, $\hat\lambda_2 + \hat\lambda_3$ can be consistently estimated from Equation (3.37), $\hat\lambda_1 + \hat\lambda_3$ from Equation (3.38), and $\hat\lambda_1 + \hat\lambda_2$ from Equation (3.39). It is also possible to estimate the estimation variances $\hat\sigma_2^2 + \hat\sigma_3^2$, $\hat\sigma_1^2 + \hat\sigma_3^2$, and $\hat\sigma_1^2 + \hat\sigma_2^2$ from Equations (3.37), (3.38), and (3.39), and to recover $\hat\lambda_1$, $\hat\lambda_2$, $\hat\lambda_3$, $\hat\sigma_1^2$, $\hat\sigma_2^2$, and $\hat\sigma_3^2$ from the relationships among them.

Let $\hat c_1 = \hat\lambda_2 + \hat\lambda_3$, $\hat c_2 = \hat\lambda_1 + \hat\lambda_3$, and $\hat c_3 = \hat\lambda_1 + \hat\lambda_2$. These equations can be rewritten in matrix form:

\[ \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} \hat\lambda_1 \\ \hat\lambda_2 \\ \hat\lambda_3 \end{pmatrix} = \begin{pmatrix} \hat c_1 \\ \hat c_2 \\ \hat c_3 \end{pmatrix} \tag{3.40} \]

Using the matrix results introduced in Appendix C, it is straightforward to show:

\[ \hat\lambda_1 = \tfrac{1}{2}(\hat c_2 + \hat c_3 - \hat c_1) \tag{3.41} \]
\[ \hat\lambda_2 = \tfrac{1}{2}(\hat c_1 + \hat c_3 - \hat c_2) \tag{3.42} \]
\[ \hat\lambda_3 = \tfrac{1}{2}(\hat c_1 + \hat c_2 - \hat c_3) \tag{3.43} \]

The same procedure can be used to recover $\hat\sigma_1^2$, $\hat\sigma_2^2$, and $\hat\sigma_3^2$.

Now assume $A = \{M_1, M_2, \ldots, M_n\}$ with $\gcd(M_i, M_j) = 1$ for all $i \ne j \in \{1, 2, \ldots, n\}$, and $M_i \ne 1$ for all $i \in \{1, 2, \ldots, n\}$. The same methodology can be applied to find estimators for the true Poisson parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$:

\[ Y_j \equiv \sum_{i=1}^{n} K_i M_i \equiv \sum_{i=1,\, i \ne l}^{n} K_i M_i \pmod{M_l} \tag{3.44} \]

For all $i \in \{1, 2, \ldots, n\}$ it is straightforward to show that the following relationship holds:

\[ \frac{1}{M_i}\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & \xi & \cdots & \xi^{M_i - 1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \xi^{M_i - 1} & \cdots & \xi^{(M_i - 1)(M_i - 1)} \end{pmatrix}\begin{pmatrix} 1 \\ e^{\sum_{k,\, k \ne i}(\xi^{c_{k,1}} - 1)\lambda_k} \\ \vdots \\ e^{\sum_{k,\, k \ne i}(\xi^{c_{k,M_i - 1}} - 1)\lambda_k} \end{pmatrix} = \begin{pmatrix} q_0 \\ q_1 \\ \vdots \\ q_{M_i - 1} \end{pmatrix} \tag{3.45} \]

For every $k$, the numbers $c_{k,j}$, $j \in \{1, 2, \ldots, M_i - 1\}$, are mutually distinct. Using the same logic introduced earlier in this Section, for every $M_i$ the sum of the parameters $\lambda_j$, $j \ne i$, that is,

\[ \hat c_i = \sum_{j=1,\, j \ne i}^{n} \hat\lambda_j \tag{3.46} \]

and the sum of the parameters $\sigma_j^2$, $j \ne i$, that is,

\[ \hat d_i = \sum_{j=1,\, j \ne i}^{n} \hat\sigma_j^2 \tag{3.47} \]

can be estimated.
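The recovery step in Equations (3.40) to (3.43) can be sketched in a few lines. The function below is a minimal illustration, assuming the leave-one-out sums $\hat c_i$ of (3.46) have already been estimated; it applies the closed form that the thesis states as Equation (3.48), which reduces to (3.41) to (3.43) when $n = 3$. The function name `recover_rates` and the numerical inputs are hypothetical.

```python
def recover_rates(leave_one_out_sums):
    """Recover lambda_1..lambda_n from c_i = sum of all lambda_j with j != i.

    Uses lambda_i = ((2 - n) * c_i + sum of the other c_j) / (n - 1).
    """
    n = len(leave_one_out_sums)
    if n < 2:
        raise ValueError("at least two leave-one-out sums are required")
    total = sum(leave_one_out_sums)
    return [((2 - n) * c_i + (total - c_i)) / (n - 1)
            for c_i in leave_one_out_sums]


if __name__ == "__main__":
    # Hypothetical true rates (0.6, 0.8, 1.1) give c = (1.9, 1.7, 1.4).
    print(recover_rates([1.9, 1.7, 1.4]))
```

The same function can be applied to the $\hat d_i$ of (3.47) to recover the variance terms, since the two systems have the same coefficient matrix.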
Since all $n$ processes are independent, the results of Appendix C can be used to find consistent estimates of the true parameters $\lambda_1$ to $\lambda_n$.

Lemma 3.3.8. If set A is composed of $n$ pairwise coprime numbers such that all elements of A are different from 1, the Poisson process can be decompounded by using Equations (3.45), (3.46), and

\[ \hat\lambda_i = \frac{1}{n - 1}\left[(2 - n)\hat c_i + \sum_{j=1,\, j \ne i}^{n} \hat c_j\right] \tag{3.48} \]

Asymptotic variances can also be estimated by using Equations (3.45), (3.47), and

\[ \hat\sigma_i^2 = \frac{1}{n - 1}\left[(2 - n)\hat d_i + \sum_{j=1,\, j \ne i}^{n} \hat d_j\right] \tag{3.49} \]

The proof of Lemma 3.3.8 follows directly from the results of Appendix C.

3.4 Simulation

3.4.1 Simulation for $A = \{M_1, M_2\}$ and $\gcd(M_1, M_2) = 1$

To validate the methodologies, several sets of simulations are run. The first set of simulations is run for the case where A is composed of two co-prime numbers which are both different from 1. A is assumed to be {3, 11} and 6 different combinations of Poisson parameters $(\lambda_1, \lambda_2) = \{(0.6, 0.8), (1.2, 1.6), (0.8, 0.6), (0.9, 0.95), (0.8, 0.7), (1.2, 0.8)\}$ are considered. For each combination of the Poisson parameters, a series of simulated data was generated. The data are assumed to be recorded in equal time intervals. The procedures stated in Lemmas 3.3.1, 3.3.2, and 3.3.3 are used to estimate the parameters of interest. This procedure was repeated many times to capture the variance of the estimated parameters. The results are depicted in Figure 3.1 and Figure 3.2. The results were in line with what the theory developed earlier would have predicted.

3.4.2 Simulation for $A = \{1, M_2\}$

The second exercise considers the well-known Prussian horse-kick data (see e.g. [Quine and Seneta, 1987] and [Buchmann and Grubel, 2003]). It is assumed that $A = \{1, 2\}$. Of the 200 observations, 109, 65, 22, 3, and 1 take the values $k = 0, 1, 2, 3$, and 4, respectively.
Based on the assumptions outlined earlier, the Poisson parameters are estimated and presented in Table 3.1.

Parameter of interest      Estimated parameter      Variance of the estimated parameter
λ1                         0.5697                   0.0121
λ2                         0.0398                   0.0071
λ = λ1 + λ2                0.6095                   0.0192

Table 3.1: The horse kick data - estimated parameters

To make the results comparable with those of [Buchmann and Grubel, 2003] and [Buchmann and Grubel, 2004], a transformation is necessary. Both papers used parameters in line with Equation (3.5), whereas we have used Equation (3.7). Therefore, to compare the results, the transformation introduced in Equation (3.6) has to be used. The two estimates of $\lambda$ are very close to each other: here $\hat\lambda$ is estimated as 0.6095, and in [Buchmann and Grubel, 2004] it is estimated as 0.6069. Our results predict $\hat P_1 = \frac{0.5697}{0.6095} = 0.9347$ and $\hat P_2 = \frac{0.0398}{0.6095} = 0.0652$, compared with $P_1^{PPI} = 0.9422$, $P_2^{PPI} = 0.0380$, and $P_4^{PPI} = 0.0198$ in [Buchmann and Grubel, 2004].^29

^29 PPI is an abbreviation for Projected Plug-In estimators.

Based on the estimated parameters, the expected numbers of horse kicks can be computed. The expected values are compared with the observed data in Table 3.2.

Number of kicks      0        1       2       3      4      > 4
Observations         109      65      22      3      1      0
Expected value       108.72   61.94   21.95   5.81   1.26   0.28

Table 3.2: The horse kick data - expected value versus observations

3.4.3 Simulation for Non-equal Intervals between Successive Observations

In this Section the data are assumed to be recorded in non-equally spaced intervals. To be consistent with the results of previous sections, the data are assumed to be generated by the convolution of two Poisson processes. It is assumed that $A = \{1, 2\}$.

Random observations were generated based on pre-known Poisson parameters. Random time intervals between two successive observations were also generated.
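The data-generating step just described, followed by the estimation of $\lambda_1$ from Equation (3.30) with $M_2 = 2$, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the thesis: the interval distribution (uniform on $(0, 2)$) and the rates $(0.5, 1)$ mirror one of the settings reported for this Section, while the Poisson sampler, the bisection solver, the seed, and all function names are choices made here for illustration.

```python
import math
import random


def draw_poisson(rng, mean):
    """Knuth's multiplication sampler; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(n, lam1, lam2, t_max, rng):
    """Return (T_k, Y_k) pairs with Y_k = singles + 2 * pairs over an interval T_k."""
    data = []
    for _ in range(n):
        t = rng.uniform(0.0, t_max)
        singles = draw_poisson(rng, lam1 * t)
        pairs = draw_poisson(rng, lam2 * t)
        data.append((t, singles + 2 * pairs))
    return data


def estimate_lam1(data, upper=50.0, tol=1e-10):
    """Solve (1/n) * sum_k (1 + exp(-2 * lam * T_k)) / 2 = share of even Y_k.

    The left-hand side is decreasing in lam, so bisection applies.
    """
    n = len(data)
    q_even = sum(1 for _, y in data if y % 2 == 0) / n

    def mean_even_probability(lam):
        return sum((1.0 + math.exp(-2.0 * lam * t)) / 2.0 for t, _ in data) / n

    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_even_probability(mid) > q_even:
            lo = mid       # modelled share too high: lam must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    sample = simulate(4000, 0.5, 1.0, 2.0, rng)
    print(estimate_lam1(sample))
```

Only the parity of each observation enters the estimate of $\lambda_1$, because arrivals of size 2 never change the parity of the aggregate count.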
The simulated data were used, and the procedures introduced in Lemma 3.3.6 and Lemma 3.3.7 were applied, to recover the parameters of interest. For each set of conditions this process was repeated 1000 times and the distribution of the estimated parameters was recorded.

Overall, 16 runs of simulation were performed. The simulations were run for two different pairs of $(\lambda_1, \lambda_2)$. The true parameters were assumed to be $(\lambda_1, \lambda_2) = (1, 1)$ and $(\lambda_1, \lambda_2) = (0.5, 1)$ and were hidden from the statistician. Time intervals were generated from random draws of uniform distributions on $0 < T < 2$ and $0 < T < 3$. Another control variable was the total number of observations: 500, 1000, 2000, and 4000 were considered. The summary of the results can be found in Figures 3.3, 3.4, and D.1 to D.14. Consistency and asymptotic normality are the two most apparent features of these simulation runs.

3.5 Summary and Results

This chapter investigates a special case of the classical problem of recovering parameters from a convoluted distribution, known as decompounding. The chapter sought to introduce easy-to-compute methodologies for decompounding Poisson parameters from a compound Poisson distribution. It introduces a new class of plug-in estimators that benefits from applying simple arithmetic. The inherent characteristics of integers and the congruence relationships among them form the foundation of these methodologies. As a point of reference, the advantages and disadvantages of this methodology are compared with the most commonly used algorithm for decompounding, the Panjer recursion algorithm.

One of the restrictions of the Panjer recursion algorithm for decompounding is that its validity is contingent, among other features, on zero observations. In other words, in the absence of zero observations the Panjer recursion algorithm is not useful.
The first advantage of the proposed algorithms in comparison with Panjer's approach is that they are not dependent on zero observations.

The second restriction of almost all existing algorithms, including Panjer recursion, is that they cannot be applied if the observations are recorded in non-equal intervals. Under some conditions, the algorithms introduced in this chapter can be applied for estimating parameters even if the data are recorded in non-equal intervals. Such a powerful property can be traced back to the arithmetic properties of integers. Though the application of this new approach is restricted to limited circumstances, such as pairwise co-prime support, it may open a gateway for future researchers.

The third distinction between the two algorithms is that Panjer recursion solely considers Formulation (3.5) whereas our approach uses (3.6); therefore, to identify the whole system, one less parameter is estimated in comparison with Panjer's algorithm.

The fourth difference between these two algorithms is that the Panjer recursion algorithm only takes into account the empirical probability of one number to estimate each parameter, whereas the proposed algorithm looks at the sequence of occurrences of each bulk size to estimate the parameters of interest. As an example, assume that the groups of customers who visit a specific restaurant are either singletons or couples. The Panjer recursion algorithm focuses solely on zero observations to estimate $\lambda$, considers only records of 1 aggregate customer to estimate $P_1$, and takes into account only aggregate observations of 2 customers to estimate $P_2$. However, the proposed algorithms consider the whole spectrum of observations to estimate $\lambda_1$ and $\lambda_2$, the parameters of the two Poisson processes.
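The singletons-and-couples setting corresponds to $A = \{1, 2\}$, the same support assumed for the horse-kick data in Section 3.4.2, so those data give a small worked illustration of how every observation contributes to the estimate. With $M_2 = 2$, Equation (3.26) reduces to $\frac{1 + e^{-2\hat\lambda_1}}{2} = \hat q_{\text{even}}$, where $\hat q_{\text{even}}$ is the share of even aggregate counts, and this can be inverted in closed form. The sketch below assumes exactly that reduction; the function name `lam1_from_parity` is hypothetical.

```python
import math

# Horse-kick counts from Section 3.4.2: value of k -> number of observations.
HORSE_KICKS = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}


def lam1_from_parity(counts):
    """Invert (1 + exp(-2 * lam1)) / 2 = q_even for lam1."""
    n = sum(counts.values())
    q_even = sum(c for value, c in counts.items() if value % 2 == 0) / n
    if q_even <= 0.5:
        raise ValueError("share of even observations must exceed 1/2")
    return -0.5 * math.log(2.0 * q_even - 1.0)


if __name__ == "__main__":
    print(round(lam1_from_parity(HORSE_KICKS), 4))  # 0.5697
```

The result agrees with the estimate of $\lambda_1$ reported in Table 3.1. All 200 observations enter through their parity, in contrast to an estimator that uses only the frequency of zeros.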
The last distinction between the proposed algorithm and Panjer's algorithm is that the former can be used to estimate Poisson parameters in some very general settings. Let us borrow the example of the previous paragraph but add the following assumption to the setting of the problem. Assume single-customer arrivals follow a Poisson distribution with parameter $\lambda_1$ and couples arrive according to a half-normal distribution with parameters $\mu_1, \sigma_1$. With exactly the same estimation technique, $\lambda_1$ can be consistently estimated. Indeed, even if the arrivals of couples followed any known or unknown distribution, the process of estimating $\lambda_1$ would be identical. The properties of integers are the main reason behind this flexibility.

Despite the advantages of these models relative to Panjer recursion, Panjer recursion can sometimes be used under more general settings. For instance, so far this algorithm cannot solve the cases where A is composed of 3 or more numbers that are not pairwise co-prime.

To validate our theories, a series of simulations was used in this study. The simulated results showed the efficiency of the new methodology. In our view, in the realm of decompounding compound Poisson processes, the methodologies developed can be regarded as a new approach to plug-in estimators. We hope future work will remove some of the restrictions that this algorithm currently has and widen the application of the approach.

An open question is left in Section 5.2.2 for avid readers. Showing that Equation (5.1) always has a unique solution would generalize the results to the cases where data are recorded in non-equal time intervals.

[Figure 3.1, panels: (a) λ1 = 0.6, λ2 = 0.8, M1 = 11, M2 = 3; (b) λ1 = 1.2, λ2 = 1.6, M1 = 11, M2 = 3; (c) λ1 = 0.8, λ2 = 0.6, M1 = 4, M2 = 3; (d) λ1 = 0.9, λ2 = 0.95, M1 = 4, M2 = 3; (e) λ1 = 0.8, λ2 = 0.7, M1 = 7, M2 = 5; (f) λ1 = 1.2, λ2 = 0.8, M1 = 7, M2 = 5]

Figure 3.1: In each panel of Figure 3.1, $\hat\lambda_1$ is estimated under the process with parameters $\lambda_1$ and $\lambda_2$, and arrival sizes of $M_1$ and $M_2$. The regression line shows the relationship between the variance of the estimators and N, the number of observations. The extremely high $R^2$ suggests the variance of the introduced estimators decreases by the order of N, which is in line with expectations.

[Figure 3.2, panels: (a) λ1 = 0.6, λ2 = 0.8, M1 = 11, M2 = 3; (b) λ1 = 1.2, λ2 = 1.6, M1 = 11, M2 = 3; (c) λ1 = 0.8, λ2 = 0.6, M1 = 4, M2 = 3; (d) λ1 = 0.9, λ2 = 0.95, M1 = 4, M2 = 3; (e) λ1 = 0.8, λ2 = 0.7, M1 = 7, M2 = 5; (f) λ1 = 1.2, λ2 = 0.8, M1 = 7, M2 = 5]

Figure 3.2: In each panel of Figure 3.2, $\hat\lambda_2$ is estimated under the process with parameters $\lambda_1$ and $\lambda_2$, and arrival sizes of $M_1$ and $M_2$. The regression line shows the relationship between the variance of the estimators and N, the number of observations. The extremely high $R^2$ suggests the variance of the introduced estimators decreases by the order of N, which is in line with expectations.

[Figure 3.3, panels: (a) mean($\hat\lambda_1$) = 1.0282, var($\hat\lambda_1$) = 0.0422; (b) mean($\hat\lambda_2$) = 1.0653, var($\hat\lambda_2$) = 0.3383]

Figure 3.3: The distribution of estimated parameters based on simulated data. n = 500, 0 < T < 2, $(\lambda_1, \lambda_2) = (1, 1)$.

[Figure 3.4, panels: (a) mean($\hat\lambda_1$) = 0.5033, var($\hat\lambda_1$) = 0.0048; (b) mean($\hat\lambda_2$) = 1.0414, var($\hat\lambda_2$) = 0.0899]

Figure 3.4: The distribution of estimated parameters based on simulated data. n = 500, 0 < T < 2, $(\lambda_1, \lambda_2) = (0.5, 1)$.

Chapter 4

Hierarchical Decision Making and Franchising: A Mechanism to Compete both Aggressively and Softly

4.1 Introduction

Competition between firms takes place across a number of dimensions including, for example, pricing, quantities, service levels, access, and R&D.
The choice of strategy is influenced by several factors including the market structure, the nature of the product, and the source of competitive advantage. The outcomes in monopoly and full competition are clear but oligopoly presents challenges, as the degree of firm interdependence and governance structure will affect equilibrium outcomes.

In some service industries such as electronic product retailing, firms like Future Shop and Best Buy compete in local markets with only company-owned units. In other retail sectors such as fast food restaurants we witness a mixture of both company-owned and franchised units under a single brand name.^30 In cases of the mixed governance structure, we observe the following features. First, in sectors where franchising has become a common practice, the ratio between company-owned units and franchised units is stable for most of the mature brands ([Lafontaine and Shaw, 2005]; [Cliquet, 2000]; [Windsperger, 2004]). Second, the presence of company-owned units and franchised units seems to be correlated with the type of industry. In other words, although franchising has proved to be a successful business model, it exists only in some industries but not in others. Moreover, the extent to which franchising is adopted is substantially different across various sectors. For example, Subway (a sandwich chain restaurant) has franchised all its retail outlets, while Starbucks, a cafe chain, on the other hand, has never even tried franchising.

^30 We as customers are usually not able to tell from outside which McDonald’s restaurants are company-owned and which ones are franchised, because of the characteristics of the “business model franchising”.
An interesting observation is that even in the coffee shop industry, despite the large number of franchises that exist in the market, the existence of positive economic profit (see [“CNN Report on Annual Ranking of America’s Largest Corporations”, 2011]) suggests the competing multi-national brands are not competing fiercely with each other. This may also be a consequence of the fact that only seemingly efficient firms open franchises [Barros and Perrigot, 2007]. We also observe that in many industries where franchising is a common practice there exists a company-owned outlet, a flagship retail store, or a major division along with other smaller franchises or sub-divisions in the market. For example, it is common for fast fashion brands to establish a big outlet in the commercial centre and a few counters inside shopping malls to serve the same local market.

In this chapter, we propose a framework that unites all of the above-mentioned aspects of these oligopolistic markets. Basically, we argue that the seemingly diverse market equilibria and company structures under oligopoly competition can be due to a common mechanism adopted by firms to prevent aggressive competition and utilize any exogenous advantages.

Franchising has become a common practice in the modern business world. In the U.S., for example, according to the World Franchise Council, there are more than 1,500 franchise systems, representing a total of 760,000 franchisees. These franchisees generate one seventh of U.S. jobs (18 million), 11% of the private sector payroll ($506 billion), and about 10% of the private sector economic output ($1.53 trillion) in the U.S. It is thus arguably the fastest growing governance form of retailing in the world [Dant et al., 2008].
Numerous theoretical and empirical papers have discussed the reasons for franchising, the pattern of franchise systems, and the structure of franchise contracts ([Lafontaine, 1992]; [Brickley and Dark, 1987]; [Manolis et al., 1995]; [Dant and Kaufmann, 2003]; etc.). However, there are still some mysteries, including why there is a stable plural form (a mixture of franchisees and company-owned downstream retailers) in many franchise industries/companies. Traditional approaches for franchising analysis fail to foresee the existence of the plural form. Explicitly or implicitly, these approaches lead to different predictions. On the one hand, the resource constraint theory and ownership redirection hypothesis proposed by [Oxenfeldt and Kelly, 1968] predict that a franchise chain will converge to a totally company-owned pattern as it matures. On the other hand, several other frameworks such as Agency Theory ([Lafontaine, 1992]; [Brickley and Dark, 1987]), Transaction Cost Analysis [Manolis et al., 1995], Signaling Theory [Dant and Kaufmann, 2003], and Property Rights Theory [Mathewson and Winter, 1985] show the superiority of franchising and imply that a brand should become totally franchised as it grows. The reality is, however, that neither totally company-owned nor totally franchised systems are common in real life. Instead, most chains have a stable proportion between company-owned stores and franchised stores after several years of operations ([Lafontaine and Shaw, 2005]; [Cliquet, 2000]; [Windsperger, 2004]). In fact, many of the most successful franchise systems continue to open and maintain both company-owned and franchisee-owned outlets for growth, in apparent support of the stable plural governance forms thesis of franchising ([Bradach and Eccles, 1989]; [Bradach, 1997]; [Dant et al., 1992]; [Dant and Kaufmann, 2003]; [Harrigan, 1984]).

Some argue that different economic circumstances require different organizational solutions.
Whereas company-owned stores are preferred in some cases, franchises are better options in others ([Brickley and Dark, 1987]; [Castrogiovanni et al., 2006]; [Combs and Ketchen, 2003]). There are numerous factors that affect this “threshold”, such as managerial control [Lafontaine and Shaw, 2005], risk sharing [Burkle and Posselt, 2008], and incomplete contracting [Hendrikse and Jiang, 2005]. Others try to show that the plural form can provide a synergy effect^31 that cannot be obtained with a pure company-owned system or a pure franchise system, such as innovation and authority [Lewin-Solomons, 1999] and a balance between “exploration” and “exploitation” through organizational learning [Sorenson and Sørensen, 2001].

^31 Synergy effects can be defined as two or more things functioning together to produce a result not independently obtainable [Synergy Definition, (n.d.)].

To our knowledge, the literature on franchising focuses almost entirely on internal governance issues and ignores the market structure within which the plural form of governance operates. The proposition advanced here is that franchising can be, and is, used to affect both market structure and the strategic interaction of firms, and that the plural governance structure is the strategic outcome in some market structures.^32

Since the seminal paper of [Salant et al., 1983], the so-called “unprofitable merger” has drawn significant attention in the Industrial Organization literature. Their major conclusion is that, in the absence of cost efficiency gains, any merger is profitable only if it involves more than 50 percent of the firms in the industry. This result emerges because any internalization of market power gains brought about by a contraction in output of the merging parties is more than offset by an expansion in output of the non-merging parties.
As a dual of this phenomenon, the potential profitability of divisionalization (a company splitting itself into more than one independent unit) is readily apparent. [Schwartz and Thompson, 1986] are the first to show that divisionalization can be used as a strategic tool, by demonstrating that delegating decision making to independent divisions allows the firm to credibly commit to an entry-deterring level of output. They argue that the ability to divisionalize leads perfectly informed incumbents to preempt all rational entry into their markets. In a subsequent paper, [Veendorp, 1991] shows that by decentralizing operating decisions but centralizing investment decisions, divisionalized firms can crowd the market and prevent entry, while limiting the negative impact of the independent actions on overall profitability.

^32 One issue that is highly related to the adoption of franchising but largely ignored by the traditional franchising literature is the strategic divisionalization problem.

[Corchon, 1991] proves that parent firms competing in a Cournot environment have an incentive to form independent competing units. However, [Polasky, 1992] shows that with linear demand and costless divisionalization, although firms possess a unilateral incentive to divisionalize, there is no equilibrium, as firms wish continually to expand their number of divisions. Several attempts to deal with this seemingly unreasonable prediction followed. [Corchon and Gonzalez-Maestre, 2000] rectify the nonexistence problem by placing an exogenous upper bound on the permissible number of divisions per firm. [Baye et al., 1996a], on the other hand, demonstrate that when it is costly to form competing units (divisions or franchises), there exists a Nash equilibrium in which firms choose multiple divisions in an attempt to commit to a greater level of output, thus mimicking a Stackelberg-type outcome. They further illustrate that in a simple duopoly environment the Nash equilibrium number of competing units is socially optimal. They treat the number of divisions as a continuous variable, but essentially the same results hold even with an integer constraint on divisions (see e.g. [Baye et al., 1996b]).

[Yuan, 1999] shows that product differentiation will also ensure the existence of an interior sub-game perfect Nash equilibrium, and that the equilibrium number of divisions increases with the degree of substitution among products and the number of firms. Product differentiation is seen to discourage firms from going into perfect competition because it deters rival firms from developing similar products and therefore potentially increases market power.

A few relatively recent studies go back to the beginning of the idea and successfully bridge divisionalization with merger, and thus break the non-profitable merger curse at some level or another. [Ziss, 2001] shows that a strategic adoption of divisionalization can significantly increase the profitability of mergers in Cournot oligopoly and thus reduce the minimum market share that merging parties require in order to merge profitably without efficiency gains. [Creane and Davidson, 2004] propose a similar result in which firms merge in order to take advantage of strategies that are available only to multidivisional firms. Merger with divisionalization is shown to be a superior alternative to pure divisionalization, since creating a division increases the number of competing divisions, while a merger leaves that number unchanged. With both merger and divisionalization as strategic options, [Qiu and Zhou, 2010] show that the interaction between the two restructuring activities weakens the incentive to divest and strengthens the incentive to merge.
This chapter’s key findings are that when firms make investment decisions in new sub-divisions or franchises in a hierarchical way, the outcome depends on market size, relative cost advantages, and the number of inefficient firms in the market. We find that how aggressively relatively more efficient firms compete in markets depends on whether firms are homogeneous or heterogeneous in their cost structure. If n homogeneous firms compete, each firm’s best strategy is to choose no franchising or divisionalization, and the outcome is Cournot. However, we also find that when there is cost heterogeneity, depending on the relative differences in costs, more efficient firms have an incentive to ’flood the market’ with franchise capacity, both to keep inefficient incumbents out of the market and to leave insufficient market for potential entrants. We also show that the franchising strategy is not open to relatively cost-inefficient firms and is more likely to occur in smaller markets. Once efficient firms have made sure that inefficient firms are kept out of the business, they have no incentive to compete more than necessary and will share market profits.

The major contribution of this chapter to the economics literature is two-fold, given that we build a bridge between the franchise literature and the strategic divisionalization literature. The first contribution is to reveal the strategic role of franchising and to suggest that the franchise plural form may exist because of the need to keep companies from competing too fiercely with each other through divisionalization. The second is that this chapter is the first to show that a commonly seen market mechanism (the franchise plural form) can serve as a credible way out of the divisionalization dilemma. The minor contribution of the chapter is that it shows that among n heterogeneous brands, only the more efficient brands can potentially open franchises.
The remainder of this chapter is organized as follows. Section 4.2.1 is devoted to the structure of the game. The game is solved under three different scenarios. The results of the competition among homogenous companies are presented in Section 4.2.2, and the results of the competition among heterogeneous companies are shown in Section 4.2.3. Within Section 4.2.3, the general setting of competition among heterogeneous brands is discussed first, followed by the competition between 1 efficient and n − 1 inefficient companies, and then the more general setting of m efficient brands competing with n inefficient brands. Section 4.3 summarizes the results of all sections.

4.2 Models

4.2.1 The Structure of Strategic Interaction

The mechanism in which companies make decisions in a hierarchical order is common in the business world. The structure of the brand (i.e., the number of sub-divisions or franchisees to have) is a long-term decision made by head office. It reflects the goals and visions of the company and has to be determined before any other economic activities can be pursued. The existence of a major division or unit on top of all other sub-divisions is also a common phenomenon. For example, in the aviation industry it is common for two airlines to sign a franchise contract in which the prime carrier (e.g. Air Canada, United, Lufthansa) allows a smaller downstream carrier (e.g. Jazz, United Express, Cityline, respectively) to fully use its brand (aircraft livery and interior, crew uniforms, method of customer service delivery, and flight designator code). The smaller carrier, which is the franchisee in this case, would offer service only after the prime carrier does. In some markets both carriers offer service and in others only one does, but the prime carrier always has priority in decision making [Denton and Dennis, 2000].
Other examples include main factories and subcontract factories in manufacturing, and flagship outlets and mall stores/counters in retailing.[33] A company normally puts more investment in its major division and also endows it with extra tasks (for instance, flagship stores are often showcases for the company and thus responsible for the brand image). Therefore, it is natural for a company to give higher priority to its major division over other sub-divisions or franchisees. Once the decision of the major division is made, the sub-divisions or franchisees can make their decisions based on the subsequent market and resource conditions. The following discussion aims to formally state the hierarchical order of decision making in an easy-to-follow game structure.

There are three decisions to be made: the number of franchises or subdivisions, the output level of the prime franchise or major division, and the output of the remaining sub-divisions or franchises. The model is structured so that these decisions are made in the following way. In stage 1, the head office of each of the n competitors decides how many franchises or subdivisions it would like to have.[34] In stage 2, the management team of each brand’s major division[35] decides on its production level. In stage 3, all subdivisions of all companies decide on their production levels. Once all major divisions and franchises have decided on their levels of production, the major divisions and franchises of all brands compete in quantity.

[33] For cosmetics or fashion brands, it is very common to find a big conspicuous outlet in the central commercial area as well as a few smaller shops/counters in shopping malls like Sears or the Bay. Shiseido, Levi’s and Esprit are all perfect examples.
[34] As mentioned earlier, this decision reflects the long-run goal of the company.
[35] Usually this major division is the first division opened by the brand, or the first manufacturer of a specific brand.

The headquarters’ decisions are made to maximize the overall profit of the brand, i.e. the sum of the profits of the major division and the franchises.[36] The decisions of the managers of the major division and of each franchise are made to maximize the profit of their own retail store or division.[37]

[36] We implicitly assume the brands know how to extract all the rents from both major divisions and franchises through some contractual structure.
[37] It is also assumed that the contractual structure between divisions or franchises and the company is such that every management team has full incentive to maximize profit.

4.2.2 Homogenous Brands

We establish the Nash-equilibrium results of the competition among n homogenous brands using backward induction. Therefore, the decisions are solved in the following order: first the decisions of the franchisees, second those of the major divisions, and last those of the headquarters. Demand is considered to be linear and marginal costs are considered constant for each decision maker throughout this study. Assume there are n homogenous brands competing in a market and the market inverse demand function is defined as

$$P = A - Q_T \tag{4.1}$$

Each brand has a major division[38] and the major division of brand i produces $Q_i$ units. Brand i has $\delta_i$ franchisees and the jth franchise of brand i produces $q_{i,j}$ units. Therefore $Q_T$, the total quantity of the product available in the market, can be expressed as

$$Q_T = \sum_{i=1}^{n} Q_i + \sum_{i=1}^{n}\sum_{j=1}^{\delta_i} q_{i,j} \quad\text{or}\quad \sum_{i=1}^{n}\Big(Q_i + \sum_{j=1}^{\delta_i} q_{i,j}\Big) \tag{4.2}$$

Define the marginal cost of production as C. By introducing $\theta = A - C$, the profit function of each franchise can be written as

$$\pi_{i,j} = (\theta - Q_T)\,q_{i,j} \tag{4.3}$$

[38] Brands can have multiple major divisions; this does not affect the results as long as all major divisions are managed by one decision making team.

In the 3rd stage of the game, all franchises maximize their own profit by choosing a profit-maximizing $q^*$.
The first order condition of the jth franchise of the ith brand is

$$\frac{\partial \pi_{i,j}}{\partial q_{i,j}} = \theta - Q_T - q_{i,j} = 0 \tag{4.4}$$

In total there are exactly $\sum_{i=1}^{n}\delta_i$ equations of form (4.4). Solving this system of equations, the symmetric Nash equilibrium output of each franchise is

$$q^* = \frac{\theta - \sum_{i=1}^{n} Q_i}{1 + \sum_{i=1}^{n}\delta_i} \tag{4.5}$$

We can update the total quantity produced by all divisions and franchises of all n brands by plugging $q^*$ into (4.2):

$$Q_T = \frac{\sum_{i=1}^{n} Q_i + \theta\sum_{i=1}^{n}\delta_i}{1 + \sum_{i=1}^{n}\delta_i} \tag{4.6}$$

Next, turn to stage 2 and concentrate on the decision of the major divisions. Considering (4.6), the profit function of the major division of brand i can be written as

$$\Pi_i = \left(\theta - \frac{\sum_{j=1}^{n} Q_j + \theta\sum_{j=1}^{n}\delta_j}{1 + \sum_{j=1}^{n}\delta_j}\right) Q_i \tag{4.7}$$

It is worth noting that in this stage the management teams of the company-owned divisions solely maximize the profit of the major divisions. The first order condition of the profit function of each major division with respect to its output can be expressed as

$$\frac{\partial \Pi_i}{\partial Q_i} = \frac{\theta - \sum_{j=1}^{n} Q_j - Q_i}{1 + \sum_{j=1}^{n}\delta_j} = 0 \tag{4.8}$$

In total there are exactly n equations of form (4.8), and the symmetric Nash equilibrium is given by

$$Q^* = \frac{\theta}{n+1} \tag{4.9}$$

$Q^*$ is exactly the result we would get if there were no franchises and the n brands were competing with each other in Cournot fashion. We can update the quantity produced by each franchisee and the total quantity produced by all divisions and franchisees as functions of the $\delta_i$:

$$q^* = \frac{\theta}{(1+n)\big(1 + \sum_{j=1}^{n}\delta_j\big)} \tag{4.10}$$

$$Q_T^* = \frac{\theta\big(n + (1+n)\sum_{j=1}^{n}\delta_j\big)}{(1+n)\big(1 + \sum_{j=1}^{n}\delta_j\big)} \tag{4.11}$$

Now consider the first stage. In the first stage the head office decides on the number of franchises it would like to have in the long run. It is natural to assume it will decide on a number that maximizes the profit of the whole brand, i.e. the profit that the major division makes plus the total profit earned by all franchisees.
Therefore, the brand seeks to choose the optimal $\delta_i$ to maximize $\tilde{\Pi}_i$, defined as

$$\tilde{\Pi}_i = (\delta_i q^* + Q^*)(\theta - Q_T^*) = \frac{\theta^2\big(1 + \delta_i + \sum_{j=1}^{n}\delta_j\big)}{(1+n)^2\big(1 + \sum_{j=1}^{n}\delta_j\big)^2} \tag{4.12}$$

Lemma 4.2.1. If n homogenous brands compete under the hierarchical structure introduced in Section 4.2.1, the Nash-equilibrium decision is 0 franchises and the result of the competition will be exactly equivalent to Cournot competition, where each major division produces $\frac{\theta}{n+1}$ units.

Proof. Consider the partial derivative of the profit function introduced in (4.12) with respect to $\delta_i$:

$$\frac{\partial \tilde{\Pi}_i}{\partial \delta_i} = \frac{-2\theta^2\delta_i}{(1+n)^2\big(1 + \sum_{j=1}^{n}\delta_j\big)^3} \le 0 \tag{4.13}$$

Since the first derivative is never positive, the best strategy each brand can adopt is to choose $\delta_i^* = 0$, and therefore $Q_i^* = \frac{\theta}{n+1}$. This result is equivalent to the result of Cournot competition among n homogenous brands.[39] Q.E.D.

[39] If the management team of the major division also considered the profit of the franchises, the results of Lemma 4.2.1 would still hold. Under this assumption, $Q_i^* = \frac{(1 + \sum_{j=1}^{n}\delta_j - \delta_i)\,\theta}{(n+1) + (n-1)\sum_{j=1}^{n}\delta_j}$ and $\frac{\partial \tilde{\Pi}_i}{\partial \delta_i} = \frac{-2(n-1)\big(1 + \sum_{j=1}^{n}\delta_j - \delta_i\big)\theta^2}{\big(1 + n + (n-1)\sum_{j=1}^{n}\delta_j\big)^3} \le 0$.

Lemma 4.2.1 states that under the hierarchical decision making process homogenous brands do not have an incentive to open any franchises or to make the competition fiercer than Cournot competition. Rivals will not be intimidated by a threat to open franchises, because they know the threatening brand would hurt itself as well by applying such a strategy. Brands also know that other brands have no incentive to open any franchises.[40] In contrast to the results of some papers such as [Baye et al., 1996a], which predict outcomes equivalent to full competition in some extreme cases, the results of Lemma 4.2.1 indicate there is no need for fierce competition when the competitors believe the other brands are applying the hierarchical structure defined in Section 4.2.1.
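The claim in Lemma 4.2.1 can be checked numerically with a minimal sketch. The code below evaluates the brand profit in (4.12) with exact rational arithmetic and searches over integer franchise counts for one brand while its rivals’ counts are held fixed. The function names (`brand_profit`, `best_response`), the search cap `max_franchises`, and the sample values θ = 10 and n = 3 are illustrative assumptions, not part of the model.

```python
from fractions import Fraction


def brand_profit(theta, n, deltas, i):
    """Total profit of brand i from (4.12), given every brand's franchise count."""
    total = sum(deltas)
    numerator = Fraction(theta) ** 2 * (1 + deltas[i] + total)
    return numerator / ((1 + n) ** 2 * (1 + total) ** 2)


def best_response(theta, n, deltas, i, max_franchises=20):
    """Search integer franchise counts for brand i, holding rivals' counts fixed.

    Returns (best_count, best_profit). The cap max_franchises is an
    arbitrary illustrative bound on the search, not a model parameter.
    """
    best_count, best_profit = 0, None
    for count in range(max_franchises + 1):
        trial = list(deltas)
        trial[i] = count
        profit = brand_profit(theta, n, trial, i)
        if best_profit is None or profit > best_profit:
            best_count, best_profit = count, profit
    return best_count, best_profit


if __name__ == "__main__":
    # Three identical brands, nobody franchising: brand 0 should stay at zero.
    print(best_response(10, 3, [0, 0, 0], 0))
    # Rivals already franchising: zero is still brand 0's best response.
    print(best_response(10, 3, [0, 2, 5], 0))
```

With θ = 10 and n = 3, `best_response(10, 3, [0, 0, 0], 0)` returns zero franchises and a profit of 25/4, which equals the Cournot profit (θ/(n+1))². Zero also remains the best response when rivals hold positive franchise counts, in line with the sign of (4.13).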
To summarize, such a hierarchical structure is used as a commitment among homogenous brands not to compete more extensively than Cournot. They act soft when they encounter a rival who is identical to them. Lemma 4.2.1 probably explains why in some markets, such as the telecommunication industry, we do not see any franchises. The closer the cost structures of companies, the more likely they are to end up in a no-franchise structure.[41] Next we show that this hierarchical structure can change from a commitment into a credible threat and a potential tool for pre-emption in cases where competing brands have different cost efficiencies.

[40] Throughout the chapter we intentionally disregard the fixed costs associated with opening new franchises: if the models show sustainable equilibria when opening new franchises is costless, the results will also hold when opening new franchises incurs positive fixed costs.
[41] Some airlines adopt an airline-within-airline strategy to compete with low cost carriers; Qantas is an example. An interesting observation is that airlines only adopt such a strategy when they face competition from an airline whose cost structure is totally different from theirs. In this case the airline-within-airline strategy, which can be considered a form of franchising, is adopted solely to compete with low cost carriers.

4.2.3 Heterogeneous Brands

General setting

In this section we consider the dynamics of competition among heterogeneous brands, that is, brands with different marginal costs. Assume the efficiency of each brand is different from one another. Specifically, assume that to produce an extra unit of product brand i incurs marginal cost $C_i$. By considering the linear inverse demand function introduced in (4.1), the profit function of the jth franchise of the ith brand can be written as

$$\pi_{i,j} = (P - C_i)\,q_{i,j} \tag{4.14}$$

Similar to (4.4), the first order condition of each franchise’s profit with respect to its output level can be written as

$$\frac{\partial \pi_{i,j}}{\partial q_{i,j}} = A - C_i - Q_T - q_{i,j} = 0 \tag{4.15}$$

Therefore, the Nash-equilibrium output of each franchise of the ith brand is

$$q_{i,j}^* = \frac{A - \sum_{k=1}^{n} Q_k + \sum_{k=1}^{n} C_k\delta_k - \big(1 + \sum_{k=1}^{n}\delta_k\big)C_i}{1 + \sum_{k=1}^{n}\delta_k} \tag{4.16}$$

and the total number of units produced in all franchises of all brands is

$$\sum_{i=1}^{n}\sum_{j=1}^{\delta_i} q_{i,j}^* = \frac{\big(A - \sum_{i=1}^{n} Q_i\big)\big(\sum_{i=1}^{n}\delta_i\big) - \sum_{i=1}^{n} C_i\delta_i}{1 + \sum_{i=1}^{n}\delta_i} \tag{4.17}$$

Now solve the second-stage game to see how the major divisions decide on the number of units they produce. The profit function of the major division of brand i can be written as

$$\Pi_i = (P - C_i)\,Q_i \tag{4.18}$$

Because the franchises’ output in (4.17) falls as the major divisions produce more, an extra unit of $Q_i$ raises $Q_T$ by only $1/\big(1 + \sum_{j=1}^{n}\delta_j\big)$. The first order condition of the major division’s profit with respect to its decision variable, its output level, can therefore be expressed as

$$\frac{\partial \Pi_i}{\partial Q_i} = A - C_i - Q_T - \frac{Q_i}{1 + \sum_{j=1}^{n}\delta_j} = 0 \tag{4.19}$$

By substituting (4.17) in (4.19) and solving the first order conditions, the Nash-equilibrium output of each major division becomes

$$Q_i^* = \frac{A + \big(\sum_{j=1}^{n} C_j - (n+1)C_i\big)\big(1 + \sum_{j=1}^{n}\delta_j\big) + \sum_{j=1}^{n} C_j\delta_j}{n+1} \tag{4.20}$$

Therefore, the output of each franchise can be expressed as

$$q_{i,j}^* = \frac{Q_i^*}{1 + \sum_{j=1}^{n}\delta_j} \tag{4.21}$$

The margin of each major division or franchise, i.e. $P - C_i$, can be expressed as

$$P - C_i = A - Q_T - C_i = q_{i,j}^* = \frac{Q_i^*}{1 + \sum_{j=1}^{n}\delta_j} \tag{4.22}$$

In the first stage of the game, each brand’s head office decides on the total number of franchises. It considers the total profit available from its major division(s) and its $\delta_i$ franchises. The total profit it can earn can be written as

$$\tilde{\Pi}_i = (P - C_i)\big(Q_i^* + \delta_i q_{i,j}^*\big) \tag{4.23}$$

By substituting (4.20), (4.21), and (4.22) in (4.23), the brand’s profit function can be re-expressed as

$$\tilde{\Pi}_i = \frac{\big(1 + \delta_i + \sum_{j=1}^{n}\delta_j\big)\,{Q_i^*}^2}{\big(1 + \sum_{j=1}^{n}\delta_j\big)^2} \tag{4.24}$$

Lemma 4.2.2.
Out of the heterogeneous brands which compete under the hierarchical structure introduced in Section 4.2.1, only the more cost-efficient brands (the ones whose marginal cost is less than the average marginal cost of all brands) may open franchises.

Proof. The result becomes apparent if we take the first derivative of (4.24) with respect to the number of the brand’s franchises:

$$\frac{\partial \tilde{\Pi}_i}{\partial \delta_i} = \frac{-2\delta_i\,{Q_i^*}^2}{\big(1 + \sum_{j=1}^{n}\delta_j\big)^3} + \frac{2Q_i^*\big(\sum_{j=1}^{n} C_j - nC_i\big)\big(1 + \delta_i + \sum_{j=1}^{n}\delta_j\big)}{(n+1)\big(1 + \sum_{j=1}^{n}\delta_j\big)^2} \tag{4.25}$$

The first term satisfies $\frac{-2\delta_i\,{Q_i^*}^2}{(1 + \sum_{j=1}^{n}\delta_j)^3} \le 0$, and $C_i \ge \frac{\sum_{j=1}^{n} C_j}{n} = \bar{C}$ implies $\sum_{j=1}^{n} C_j - nC_i \le 0$; therefore $\frac{\partial \tilde{\Pi}_i}{\partial \delta_i} \le 0$ for $C_i \ge \bar{C}$. Hence, the brands whose marginal cost exceeds the average marginal cost of all existing brands have no incentive to establish any franchises; only more efficient brands may have an incentive to open franchises in addition to their major division. Q.E.D.

The results in Lemma 4.2.2 are intuitive. More efficient brands will not be intimidated by the threat of less efficient brands duplicating themselves. However, the less efficient brands have legitimate reasons to take the efficient brands’ duplication as a serious threat.[42] Below we show under what circumstances the efficient brands open franchises, and under what conditions inefficient brands that could compete under Cournot competition can no longer enter a market once the efficient brands consider opening franchises. In the following sections we first consider competition between 1 efficient and n − 1 inefficient brands and then the general case of m efficient and n inefficient brands.

[42] Note that less efficient brands cannot expand their output beyond the original Cournot quantity, because overproduction would decrease their own profit.

1 efficient and n − 1 inefficient brands

Assume there is 1 efficient brand and n − 1 inefficient brands.
Without loss of generality we normalize the marginal cost of the efficient brand to zero, and C stands for the marginal cost of an inefficient brand. The higher C is, the more efficient the efficient brand is in comparison with the inefficient ones. Based on the results of Lemma 4.2.2 we know the inefficient brands will not open franchises, since C is more than the average cost of all brands, $\bar{C} = \frac{(n-1)C}{n} \le C$. Therefore, by following the same procedure introduced in Section 4.2.3, it is straightforward to show that the Nash-equilibrium output of the franchises of the efficient brand is

$$q_i^* = \frac{A - \sum_{j=1}^{n} Q_j}{1 + \delta_i} \tag{4.26}$$

The Nash-equilibrium output of the major division of the more efficient brand is

$$Q_i^* = \frac{A + (n-1)(1+\delta_i)C}{n+1} \tag{4.27}$$

and the Nash-equilibrium output of the major division of each less efficient brand is

$$Q_j^* = \frac{A - 2(1+\delta_i)C}{n+1} \tag{4.28}$$

Therefore, for $\delta_i \ge \frac{A}{2C} - 1$ the inefficient brands become unprofitable. By substituting (4.27) and (4.28) in (4.26) we have

$$q_i^* = \frac{A + (1+\delta_i)(n-1)C}{(n+1)(1+\delta_i)} \tag{4.29}$$

Lemma 4.2.3. Under the hierarchical structure introduced in Section 4.2.1, if there exist one efficient and n − 1 inefficient brands, either the efficient brand will never open a franchise and will compete in a Cournot fashion, or it will open $\frac{A}{2C} - 1$ franchises and will pre-empt the market from inefficient competitors. The efficient brand’s decision depends on the following conditions:

$$\delta_i^* = \begin{cases} 0 & \text{if } \frac{A}{C} \ge 1 + n^2 \\ \frac{A}{2C} - 1 & \text{otherwise} \end{cases} \tag{4.30}$$

Proof. By looking at $\frac{\partial \tilde{\Pi}_i}{\partial \delta_i}$ it is easy to see that $\tilde{\Pi}_i$ is an increasing function of $\delta_i$ from 0 to $\delta_i = \frac{A - \sqrt{A(A - 4C(n-1))}}{2C(n-1)} - 1$. It is a decreasing function of $\delta_i$ from $\delta_i = \frac{A - \sqrt{A(A - 4C(n-1))}}{2C(n-1)} - 1$ to $\delta_i = \frac{A + \sqrt{A(A - 4C(n-1))}}{2C(n-1)} - 1$, and an increasing function of $\delta_i$ from $\delta_i = \frac{A + \sqrt{A(A - 4C(n-1))}}{2C(n-1)} - 1$ to $\delta_i = \frac{A}{2C} - 1$. Therefore, the optimal number of franchises is either $\delta_i^{**} = \frac{A}{2C} - 1$ or $\delta_i^* = \frac{A - \sqrt{A(A - 4C(n-1))}}{2C(n-1)} - 1$.
It is straightforward to show that $0 \le \delta_i^* \le 1$. By imposing integer conditions on the total number of franchises, the values of $\tilde{\Pi}_i(\delta_i = 0)$, $\tilde{\Pi}_i(\delta_i = 1)$, and $\tilde{\Pi}_i(\delta_i = \frac{A}{2C} - 1)$ can be compared. It can be verified that the optimal number of subdivisions for the efficient brand is

$$\delta_i^* = \begin{cases} 0 & \text{if } \frac{A}{C} \ge 1 + n^2 \\ \frac{A}{2C} - 1 & \text{otherwise} \end{cases} \tag{4.31}$$

Q.E.D.

Figure 4.1 shows how the decision of the efficient brand is contingent on the ratio of the cost of the inefficient brands to the market size, $\frac{C}{A}$, and on n, the total number of brands in the market. There are three distinct areas in Figure 4.1. The first is region C, on the right side of the graph. When the market is small or the efficient brand is considerably more efficient than the other brands, the efficient brand opens only 1 franchise and keeps the inefficient brands out of the market. Another case is region B, where the efficient brand can keep competitors out of the market when there is a large number of competitors and all are less efficient than the efficient brand. In this case the efficient brand opens numerous franchises and keeps the other competitors out of the market. The larger the number of inefficient brands, the more the efficient brand can use its superior efficiency to gain market power. Of course, being an efficient brand among a large number of competitors means the brand is truly exceptional, which is why in highly competitive markets it is the efficient brands that capture most of the market.

Figure 4.1: Decision of the most efficient brand for different $\frac{C}{A}$ and n. Region A corresponds to conditions where the efficient brand will not open any franchise. In region C it will open 1 franchise and in region B it will open more than one franchise.
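The cut-off in Lemma 4.2.3 can be sanity-checked by comparing the efficient brand’s profit at the two candidate franchise counts named in the lemma. The sketch below is a minimal illustration under stated assumptions: it plugs (4.27) into (4.24) with only the efficient brand franchising, treats δ as a real number (no integer constraint), assumes integer inputs with A ≥ 2C so that the pre-emption level is non-negative, and does not search intermediate values. All function names are hypothetical.

```python
from fractions import Fraction


def efficient_brand_profit(A, C, n, delta):
    """Profit of the single efficient brand: (4.24) with Q taken from (4.27)."""
    delta = Fraction(delta)
    Q = (A + (n - 1) * (1 + delta) * C) / (n + 1)       # major-division output, (4.27)
    return (1 + 2 * delta) * Q ** 2 / (1 + delta) ** 2   # brand profit, (4.24)


def preemption_level(A, C):
    """Smallest delta at which the inefficient brands' output in (4.28) hits zero."""
    return Fraction(A, 2 * C) - 1


def preferred_franchises(A, C, n):
    """Return the more profitable of delta = 0 and the pre-emption level.

    Only these two candidates are compared, as in the statement of the lemma.
    """
    pre = preemption_level(A, C)
    if efficient_brand_profit(A, C, n, 0) >= efficient_brand_profit(A, C, n, pre):
        return Fraction(0)
    return pre


if __name__ == "__main__":
    n, C = 3, 1                       # cut-off is A/C = 1 + n**2 = 10
    for A in (8, 10, 12):
        print(A, preferred_franchises(A, C, n))
```

For n = 3 and C = 1 the cut-off is A/C = 10: at A = 12 the sketch returns zero franchises, at A = 8 it returns the pre-emption level A/(2C) − 1 = 3, and at A = 10 the two profits coincide.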
The next region of Figure 4.1, region A, shows the outcome when markets are large or less competitive, for example when the difference in efficiency between efficient and less efficient brands is small. In such cases the incentive for the efficient brand to open franchises is small. This is because, in such environments, both less efficient and more efficient brands can reap reasonable profits by coexisting and engaging in Cournot competition. Lemma 4.2.3 can, for example, explain why there is usually 1 toy store with a couple of franchises in North American cities. The toy store market is relatively small because only certain age groups can enjoy the products, and even these age groups visit these stores with their parents only occasionally.[43]

m efficient and n inefficient brands

Now consider the case where there are m efficient brands and n inefficient brands in the market. We normalize the cost of efficient brands to 0 and take the marginal cost of inefficient brands to be C. Again, based on the results of Lemma 4.2.2, we can expect that the total number of franchises opened by less efficient brands is 0. Therefore, we only consider the number of franchises opened by efficient brands, $\delta_i$. It is relatively straightforward to compute the Nash equilibrium for the units produced by franchises, $q_i^*$, the units produced by the major division of the efficient brands, $Q_i^*$, and the units produced by the major division of inefficient brands, $Q_j^*$, as (4.32), (4.33), and (4.34):

$$q_i^* = \frac{A + n\big(1 + \sum_{i=1}^{m}\delta_i\big)C}{\big(1 + \sum_{i=1}^{m}\delta_i\big)(1 + n + m)} \tag{4.32}$$

$$Q_i^* = \frac{A + n\big(1 + \sum_{i=1}^{m}\delta_i\big)C}{1 + n + m} \tag{4.33}$$

$$Q_j^* = \frac{A - (m+1)\big(1 + \sum_{i=1}^{m}\delta_i\big)C}{1 + n + m} \tag{4.34}$$

Therefore, for $\delta_i \ge \frac{A - C(m+1)}{Cm(1+m)}$ the inefficient brands cannot produce profitably.

Lemma 4.2.4. Under the hierarchical structure introduced in Section 4.2.1, if there exist m efficient and n inefficient brands, either the efficient brands never open a franchise and compete in a Cournot fashion, or they open $\frac{A - C(m+1)}{Cm(1+m)}$ franchises each and pre-empt the market from inefficient competitors. The efficient brands’ decision depends on the following conditions:

$$\delta_i^* = \begin{cases} 0 & \text{if } \frac{A}{C} \ge 1 + \frac{(1+n)^2}{m} \\ \frac{A - C(m+1)}{Cm(1+m)} & \text{otherwise} \end{cases} \tag{4.35}$$

[43] According to [Toy Association Report, (n.d.)], in 2011 the annual sales of toys, including infant toys, dolls, building sets, plush, and youth electronics, were around $11B, very close to Starbucks’ annual sales (see [“CNN Report on Annual Ranking of America’s Largest Corporations”, 2011]).

Proof. By considering the change in $\tilde{\Pi}_i$ with respect to $\delta_i$ and imposing symmetry on the $\delta_i$ we have

$$\frac{\partial \tilde{\Pi}_i}{\partial \delta_i} = \frac{2\big(A + Cn(1 + m\delta_i)\big)\big(-A\delta_i + Cn(1 + m\delta_i)^2\big)}{(1 + n + m)^2(1 + m\delta_i)^3} \tag{4.36}$$

Examining (4.36), it can be established that $\tilde{\Pi}_i$ is increasing in $\delta_i$ from 0 to $\delta_i = \frac{A - 2Cmn - \sqrt{A(A - 4Cmn)}}{2Cm^2n}$, decreasing from $\delta_i = \frac{A - 2Cmn - \sqrt{A(A - 4Cmn)}}{2Cm^2n}$ to $\delta_i = \frac{A - 2Cmn + \sqrt{A(A - 4Cmn)}}{2Cm^2n}$, and increasing from $\delta_i = \frac{A - 2Cmn + \sqrt{A(A - 4Cmn)}}{2Cm^2n}$ to $\frac{A - C(m+1)}{Cm(1+m)}$.[44] It is again easy to check that $\frac{A - 2Cmn - \sqrt{A(A - 4Cmn)}}{2Cm^2n} \le 0.5$ for all $m \ge 2$. Therefore the symmetric Nash equilibrium is

$$\delta_i^* = \begin{cases} 0 & \text{if } \frac{A}{C} \ge 1 + \frac{(1+n)^2}{m} \\ \frac{A - C(m+1)}{Cm(1+m)} & \text{otherwise} \end{cases} \tag{4.37}$$

Q.E.D.

Lemma 4.2.4 indicates that as much as the efficient brands have an incentive to keep the inefficient brands out of the market, they have no incentive to over-compete with brands that are as efficient as they are. For instance, in small markets they have more incentive to produce so much that the inefficient brands go out of business. However, even in small markets they do not compete so hard that they leave zero profit for everyone. They also have no incentive to open a franchise if the market size is large enough.
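The sign pattern used in the proof of Lemma 4.2.4 and the cut-off in (4.37) can be explored numerically. The sketch below is illustrative only: it assumes symmetric franchise counts among the m efficient brands, treats δ as continuous, takes the sign of the marginal profit from the factor −Aδ + Cn(1 + mδ)² in (4.36), and solves that quadratic in δ directly. All function names and sample values are hypothetical.

```python
import math
from fractions import Fraction


def sign_term(A, C, m, n, delta):
    """Factor of (4.36) that determines the sign of the marginal brand profit."""
    return -A * delta + C * n * (1 + m * delta) ** 2


def stationary_points(A, C, m, n):
    """Real roots in delta of sign_term, or None when A < 4*C*m*n."""
    disc = A * (A - 4 * C * m * n)
    if disc < 0:
        return None
    root = math.sqrt(disc)
    denom = 2 * C * m * m * n
    centre = A - 2 * C * m * n
    return (centre - root) / denom, (centre + root) / denom


def preemption_level(A, C, m):
    """Per-brand delta at which the inefficient brands' output (4.34) hits zero."""
    return Fraction(A - C * (m + 1), C * m * (1 + m))


def symmetric_profit(A, C, m, n, delta):
    """Brand profit (4.24) when every efficient brand opens delta franchises."""
    Q = (A + n * (1 + m * delta) * C) / Fraction(1 + n + m)   # (4.33)
    return (1 + (m + 1) * delta) * Q ** 2 / (1 + m * delta) ** 2


if __name__ == "__main__":
    m, n, C = 2, 3, 1                 # cut-off 1 + (1 + n)**2 / m = 9
    print(stationary_points(40, C, m, n))
    for A in (8, 40):
        pre = preemption_level(A, C, m)
        print(A, symmetric_profit(A, C, m, n, 0), symmetric_profit(A, C, m, n, pre))
```

With m = 2, n = 3, and C = 1 the cut-off 1 + (1+n)²/m equals 9. At A = 8 pre-emption yields 7/2 per efficient brand versus 121/36 without franchises, while at A = 40 staying at zero franchises is the more profitable of the two symmetric outcomes. The comparison is between symmetric profiles only; it does not by itself test unilateral deviations.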
In large markets, efficient brands prefer to play passively: they never open a franchise, coexist with less efficient brands, and share the profits. As in the case with 1 efficient brand, the decision by efficient brands regarding how aggressively to compete depends on the market size, the cost advantage of the efficient brands, and the number of inefficient brands, in addition to the number of efficient brands.[45]

Lemma 4.2.4 provides an explanation as to why the coffee chain market is composed of relatively few giant brands and why, despite the ubiquity of each brand’s franchises, they are profitable. In contrast to the toy industry, the coffee shop industry has very large demand. Also, entering the market is relatively simple, which results in a large number of potential entrants. Therefore, in order to capture the market, the giant multi-national and cost-efficient brands aggressively open franchises. They open franchises until no demand is left for new inefficient entrants, but once they reach that point they stop. They stop because they realize that competing with other efficient brands would ruin the profitability of their business. These brands are considered very aggressive by potential new entrants and soft players by their giant competitors.

[44] If $\frac{A - C(m+1)}{Cm(1+m)} \ge \frac{A - 2Cmn + \sqrt{A(A - 4Cmn)}}{2Cm^2n}$.
[45] Because such a decision also depends on the number of efficient brands in the market, we would need a 3-dimensional graph to show the different decision making planes.

4.3 Summary and Discussion

In this chapter we propose a mechanism with which companies can avoid the rent dissipation brought about by over-divisionalization. We argue that this mechanism is actually widely adopted in the real business world and that it might be one of the reasons why oligopoly competition results in diverse market equilibria and company structures. In many conventional approaches the new divisions or franchises make decisions at the same level as the major divisions.
This point of view leads to over-divisionalization and fierce competition. For instance, [Baye et al., 1996a] argued that if there is absolutely no cost to divisionalize, the outcome of the competition between two homogenous brands is identical to the result of full competition. However, in reality this outcome is not always, or even generally, the case. In many industries, including semi-regulated industries such as airlines, the rights of the primary firm come before those of new entrants. As another example, the interests of a flagship retail store come before those of any other franchise of the brand. One may see this as a brand image related phenomenon. We argue that such an approach is a fine-tuned strategic move, specifically by efficient brands.

We showed that assigning a hierarchical order between major divisions and franchises helps divisions mitigate the unpleasant result of fierce competition among homogenous companies through over-divisionalization or opening too many franchises. The result is striking because it holds even if the competing companies can only observe the hierarchy and not the actual decisions made by the headquarters over the number of franchises they are going to open. It is shown that if n homogenous brands compete, the best strategy of each company’s headquarters is to choose no franchises and keep only its major division. Under this scenario the result is equivalent to the outcome of Cournot competition.

As much as this strategy opens the door to cooperation or tacit collusion among homogenous brands, it can keep the inefficient brands out of the market. It is shown that in a competitive environment those companies which have a higher than average marginal cost will never have an incentive to open a franchise. This latter fact provides an opportunity for more efficient brands to use their cost advantage to open franchises and keep the inefficient brands out of the business.
The results of two specific cases with only two types of brands showed that the incentive for more efficient brands to open new franchises in order to get rid of inefficient brands is stronger in smaller markets. They also suggested that when the number of potential inefficient competitors is high, a small cost advantage is enough to open one franchise and pre-empt the competition.

When there are a few homogenous efficient and a few homogenous inefficient companies in the market, the efficient brands may have an incentive to keep the inefficient brands out of the market. If the homogenous efficient brands find it profitable to keep the inefficient brands out of the market, they will open franchises until they get rid of the inefficient brands, and will stop opening new franchises when there is insufficient market to support an entrant. Essentially, the efficient brands are all aligned with each other in getting rid of inefficient brands, whereas when it comes to competing with their own kind they do not compete more aggressively than in Cournot competition. Under no circumstances does the model predict a pure franchising scenario, with franchises but no company-owned divisions, as a Nash equilibrium.

Chapter 5 Conclusions

5.1 Summary

Three topics are covered in this dissertation. Chapter 2 estimates the demand responses of air passenger groups of different sizes. Demand processes are assumed to form a compound Poisson distribution, and the parameters of interest are estimated based on compound Poisson models with and without error terms. Four different groups of customers are considered for analysis: singletons, couples, families of size three, and families of size four. The estimated parameters suggest that the price elasticity of demand increases with the size of the group.
For instance, it was found that the price elasticity of demand varies between 0.07 and 2.03 for singletons, between 0.6 and 4.2 for couples, between 0.7 and 4.5 for families of size three, and between 1.6 and 9 for families of size four. The estimated results also suggest that overall 20% of demand was lost in the great recession period. The recession reduced the demand of the singleton group by only about 7%, whereas it most severely affected groups of three people, with a total demand loss of 56%. The results also suggest that demand decreases on average by 15% in low seasons.

In Chapter 3 we introduced a new, easy-to-compute method for decompounding Poisson processes. The method uses the properties of integers and number theory. It is shown that the method can decompound compound Poisson processes when we face a convolution of two Poisson processes or a convolution of several Poisson processes with pairwise co-prime group sizes. The method is capable of decompounding under some general settings, such as when data are collected at irregular time intervals, when the data do not contain any zero observations, or when the Poisson distribution is convolved with some unknown distributions.

In Chapter 4 we showed that hierarchical decision making combined with company-owned divisions and franchises is a fine-tuned mechanism by which homogenous efficient brands not only avoid competing fiercely with each other by means of over-divisionalization, but also credibly commit to fierce competition with inefficient brands. If the market is solely composed of competing homogenous brands, no brand has any incentive to open any franchise. In this scenario, opening new franchises hurts not only the competitors but also the brand that opens them. It is also shown that only the more efficient firms have an incentive to open franchises.
Finally, it is shown that in addition to cost, the market size and the number of competitors in the market are the main determinants of whether or not efficient brands open new franchises.

5.2 Future Research Suggestions

5.2.1 Future Research Suggestions for Chapter 2

We only had access to the data of a Canadian long-haul route. In order to generalize the empirical results found in Chapter 2 it will be useful to look at other routes. Since the models introduced in Chapter 2 can be applied to any data set, any researcher who has access to data on new routes can test whether the results can be generalized.

As discussed by [Varian, 1989], third degree price discrimination will not necessarily result in more profit or welfare. It will be helpful to study, either theoretically or numerically, the range of parameters of compound Poisson demand models for which third degree price discrimination allows airlines to reap more profit or increases total welfare. Finding simple cut-off points will help airlines to easily adopt such models in their revenue management systems.

5.2.2 Future Research Suggestions for Chapter 3

Decompounding when set A has more than two elements and the elements are not pairwise co-prime

It was shown in Chapter 3 that the decompounding methods introduced are only applicable to either a convolution of two Poisson processes or a convolution of several Poisson processes with pairwise co-prime elements in set A. One of the open questions is whether the methodologies introduced in Chapter 3 can be generalized to any set A.

Decompounding when set A has two elements and data are recorded in non-equal time intervals

In general it is not easy to prove the uniqueness of the estimators when data are recorded in non-equal time intervals. Proving the uniqueness of the estimators requires a proof that Equation (5.1) can only be solved by a unique λ.
We were unable to prove the uniqueness and have left this as an open question.

\[
\frac{1}{n^{2}}
\begin{pmatrix}
1 & \cdots & \sum_{i}^{n} e^{(\xi^{(n-1)}-1)T_i\lambda} \\
\vdots & \ddots & \vdots \\
\sum_{i}^{n} T_i^{\,n-1} & \cdots & \sum_{i}^{n} \xi^{(n-1)(n-1)}\, T_i^{\,n-1}\, e^{(\xi^{(n-1)}-1)T_i\lambda}
\end{pmatrix}
\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}
=
\begin{pmatrix} q_0 \\ \vdots \\ q_{n-1} \end{pmatrix}
\tag{5.1}
\]

5.2.3 Future Research Suggestions for Chapter 4

The results of Chapter 4 are derived under the assumption of linear demand functions and linear cost functions. It would be useful to test how robust the results are under more general cost and demand structures. In Appendix E (page 107) we discuss why we were unable to test the procedures on more general demand functions.

Empirically testing the theoretical results of Chapter 4 would be very useful. One industry in which such results can be tested is the airline industry. One can regard the airline-within-airline structure, or different routings between a city pair, as different franchises. One can test what types of airlines tend to introduce multiple routes for identical city pairs, or what types of markets motivate airlines to adopt airline-within-airline strategies.

One of the assumptions we made in Chapter 4 was that brands can fully incentivize the franchisees and not only maximize profits but also extract all rents through a contractual agreement. With deterministic demand and in the absence of hidden information and hidden action, such contracts are easily achievable by setting a fixed franchising fee equal to the maximum achievable profit. One future research direction is to relax the deterministic demand assumption and to consider hidden information or hidden action. Combining principal-agent theory with the hierarchical decision-making structure in the franchise literature may yield interesting findings.

Bibliography

Air passenger numbers fall by 20 per cent, new figures reveal, (2009, June 16th).
The Daily Mail. Retrieved September 14th, 2011, from http://www.dailymail.co.uk/travel/article-1193358/Air-passenger-numbers-fall-20-cent-new-figures-reveal.html.
CNN Report on Annual Ranking of America’s Largest Corporations, (n.d.). Retrieved October 1st, 2012, from http://money.cnn.com/magazines/fortune/fortune500/2011/snapshots/10567.html.
Synergy Definition, (n.d.). Retrieved June 10th, 2012, from http://en.wikipedia.org/wiki/Synergy.
Toy Association Report, (n.d.). Retrieved June 10th, 2012, from http://www.toyassociation.org/AM/PDFs/Research/RollingData.pdf.
R. M. Adelson. Compound Poisson distributions. OR, 17(1):73–75, 1966. ISSN 14732858. URL http://www.jstor.org/stable/3007241.
G. Alperovich and Y. Machnes. The role of wealth in the demand for international air travel. Journal of Transport Economics and Policy, 28(2):163–173, 1994. ISSN 00225258. URL http://www.jstor.org/stable/20053033.
C. P. Barros and R. Perrigot. Franchised network efficiency: A DEA application to US networks. In G. Cliquet, M. Tuunanen, G. Hendrikse, and J. Windsperger, editors, Economics and Management of Networks, Contributions to Management Science, pages 191–212. Physica-Verlag HD, 2007. ISBN 978-3-7908-1758-4.
M. R. Baye, K. J. Crocker, and J. Ju. Divisionalization, franchising, and divestiture incentives in oligopoly. American Economic Review, 86(1):223–36, March 1996a.
M. R. Baye, K. J. Crocker, and J. Ju. Divisionalization and franchising incentives with integral competing units. Economics Letters, 50(3):429–435, March 1996b.
J. L. Bradach. Using the plural form in the management of restaurant chains. Administrative Science Quarterly, 42(2):276–303, 1997.
J. L. Bradach and R. G. Eccles. Price, authority, and trust: From ideal types to plural forms. Annual Review of Sociology, 15(1):97–118, 1989.
J. A. Brickley and F. H. Dark. The choice of organizational form: The case of franchising. Journal of Financial Economics, 18(2):401–420, 1987.
B. Buchmann and R. Grubel. Decompounding: An estimation problem for Poisson random sums. Annals of Statistics, 31(4):1054–1074, Aug 2003. ISSN 0090-5364.
B. Buchmann and R. Grubel. Decompounding Poisson random sums: Recursively truncated estimates in the discrete case. Annals of the Institute of Statistical Mathematics, 56(4):743–756, Dec 2004. ISSN 0020-3157.
T. Burkle and T. Posselt. Franchising as a plural system: A risk-based explanation. Journal of Retailing, 84(1):39–47, 2008.
M. J. Carrillo. Extensions of Palm’s theorem: A review. Management Science, 37(6):739–744, 1991. ISSN 00251909. URL http://www.jstor.org/stable/2632529.
G. J. Castrogiovanni, J. G. Combs, and R. T. Justis. Resource scarcity and agency theory predictions concerning the continued use of franchising in multi-outlet networks. Journal of Small Business Management, 44(1):27–44, 2006.
G. Cliquet. Plural forms in store networks: A proposition of a model for store network evolution. International Review of Retail, Distribution and Consumer Research, 10(4):369–387, 2000.
J. G. Combs and D. J. Ketchen. Why do firms use franchising as an entrepreneurial strategy?: A meta-analysis. Journal of Management, 29(3):443–465, 2003.
L. C. Corchon. Oligopolistic competition among groups. Economics Letters, 36(1):1–3, May 1991.
L. C. Corchon and M. Gonzalez-Maestre. On the competitive effects of divisionalization. Mathematical Social Sciences, 39(1):71–79, January 2000.
A. Creane and C. Davidson. Multidivisional firms, internal competition, and the merger paradox. The Canadian Journal of Economics / Revue canadienne d’Economique, 37(4):951–977, 2004.
R. P. Dant and P. J. Kaufmann. Structural and strategic dynamics in franchising. Journal of Retailing, 79(2):63–75, 2003.
R. P. Dant, P. J. Kaufmann, and A. K. Paswan. Ownership redirection in franchised channels. Journal of Public Policy Marketing, 11(1):33–44, 1992.
R. P. Dant, R. Perrigot, and G. Cliquet. A cross-cultural comparison of the plural forms in franchise networks: United States, France, and Brazil. Journal of Small Business Management, 46:286–311, 2008.
P. Davis. Airline ties profitability to yield management. Society for Industrial and Applied Mathematics (SIAM) News, 27(5):12 and 18, 1994.
N. Denton and N. Dennis. Airline franchising in Europe: benefits and disbenefits to airlines and consumers. Journal of Air Transport Management, 6(4):179–190, 2000.
H. U. Gerber. On the numerical evaluation of the distribution of aggregate claims and its stop-loss premiums. Insurance: Mathematics and Economics, 1(1):13–18, 1982. ISSN 0167-6687. doi: 10.1016/0167-6687(82)90016-6. URL http://www.sciencedirect.com/science/article/B6V8N-45S97SP-4/2/661dc8cab0ce1ab6c7b0458176198af6.
D. Gillen and M. Morrison. Air Travel Demand Elasticities: Concepts, Issues and Measurement, Chapter 16 (365–411) in Darin Lee (ed.), Advances in Airline Economics. The Economics of Airline Institutions, Operations and Marketing, Amsterdam: Elsevier, 2007.
C. Gourieroux, A. Monfort, and A. Trognon. Pseudo maximum-likelihood methods - theory. Econometrica, 52(3):681–700, 1984. ISSN 0012-9682.
M. B. Hansen and S. M. Pitts. Nonparametric inference from the M/G/1 workload. Bernoulli, 12(4):737–759, 2006. ISSN 1350-7265.
M. B. Hansen and S. M. Pitts. Decompounding random sums: a nonparametric approach. Annals of the Institute of Statistical Mathematics, 62:855–872, 2010.
K. R. Harrigan. Formulating vertical integration strategies. Academy of Management Review, 9(4):638, 1984.
J. Hausman, B. Hall, and Z. Griliches. Econometric models for count data with an application to the patents R and D relationship. Econometrica, 52(4):909–938, 1984. ISSN 0012-9682.
G. Hendrikse and T. Jiang. Plural form in franchising: An incomplete contracting approach. Research Paper ERS-2005-090-ORG, Erasmus Research Institute of Management (ERIM), Erasmus University Rotterdam, 2005.
F. Lafontaine. Agency theory and franchising: some empirical results. RAND Journal of Economics, 23(2):263–283, 1992.
F. Lafontaine and K. L. Shaw. Targeting managerial control: evidence from franchising. RAND Journal of Economics, 36(1):131–150, 2005.
S. Lewin-Solomons. Innovation and authority in franchise systems: An empirical exploration of the plural form. Research paper, Iowa State University, 1999.
X. Lin and K. Pavlova. The compound Poisson risk model with a threshold dividend strategy. Insurance Mathematics & Economics, 38(1):57–80, Feb 24 2006.
D. Lord and F. Mannering. The statistical analysis of crash-frequency data: A review and assessment of methodological alternatives. Transportation Research Part A: Policy and Practice, 44(5):291–305, 2010. ISSN 0965-8564. doi: 10.1016/j.tra.2010.02.001. URL http://www.sciencedirect.com/science/article/pii/S0965856410000376.
D. Lord, S. Washington, and J. Ivan. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accident Analysis and Prevention, 37(1):35–46, 2005. URL http://eprints.qut.edu.au/38176/.
C. Manolis, R. Dahlstrom, and A. Nygaard. A preliminary investigation of ownership conversions in franchised distribution systems. Journal of Applied Business Research, 11(2):1–8, 1995.
B. Mantin and B. Koo. Dynamic price dispersion in airline markets. Transportation Research Part E: Logistics and Transportation Review, 45(6):1020–1029, 2009. ISSN 1366-5545. doi: 10.1016/j.tre.2009.04.013. URL http://www.sciencedirect.com/science/article/pii/S1366554509000489.
G. F. Mathewson and R. A. Winter. The economics of franchise contracts. Journal of Law and Economics, 28(3):503–526, 1985.
D. McFadden. Conditional logit analysis of qualitative choice behavior. Frontiers in Econometrics, pages 105–142, 1973. URL http://ci.nii.ac.jp/naid/10007461386/en/.
G. Nenes, S. Panagiotidou, and G. Tagaras. Inventory management of multiple items with irregular demand: A case study. European Journal of Operational Research, 205(2):313–324, Sep 1 2010. ISSN 0377-2217. doi: 10.1016/j.ejor.2009.12.022.
W. Newey and D. McFadden. Large sample estimation and hypothesis testing. Handbook of Econometrics, 4:2111–2245, 1994. ISSN 15734412. doi: 10.1016/S1573-4412(05)80005-4. URL http://dx.doi.org/10.1016/S1573-4412(05)80005-4.
X. Ning, L. Papiez, and G. Sandison. Compound-Poisson-process method for the multiple scattering of charged particles. Phys. Rev. E, 52(5):5621–5633, Nov 1995. doi: 10.1103/PhysRevE.52.5621.
J. Oh, S. P. Washington, and D. Nam. Accident prediction model for railway-highway interfaces. Accident Analysis & Prevention, 38(2):346–356, 2006. ISSN 0001-4575. doi: 10.1016/j.aap.2005.10.004. URL http://www.sciencedirect.com/science/article/pii/S0001457505001776.
T. H. Oum, W. G. Waters II, and J.-S. Yong. Concepts of price elasticities of transport demand and recent empirical estimates: An interpretative survey. Journal of Transport Economics and Policy, 26(2):139–154, 1992. ISSN 00225258. URL http://www.jstor.org/stable/20052976.
M. R. Oxenfeldt and A. O. Kelly. Will successful franchise systems ultimately become wholly-owned chains? Journal of Retailing, 44(4):69–83, 1968.
H. H. Panjer. Recursive evaluation of a family of compound distributions. Astin Bulletin, 12:22–26, 1981.
S. Polasky. Divide and conquer: On the profitability of forming independent rival divisions. Economics Letters, 40(3):365–371, November 1992.
D. Qiu and W. Zhou. Mergers, divestitures and industry reorganization. Research paper, University of Hong Kong, 2010.
M. P. Quine and E. Seneta. Bortkiewicz’s data and the law of small numbers. International Statistical Review / Revue Internationale de Statistique, 55(2):173–181, 1987. ISSN 03067734. URL http://www.jstor.org/stable/1403193.
S. W. Salant, S. Switzer, and R. J. Reynolds. Losses from horizontal merger: The effects of an exogenous change in industry structure on Cournot-Nash equilibrium. The Quarterly Journal of Economics, 98(2):185–199, 1983.
F. S. Schnatter. Finite mixture and Markov switching models. Springer Verlag, 2006.
M. Schwartz and E. A. Thompson. Divisionalization and entry deterrence. The Quarterly Journal of Economics, 101(2):307–322, 1986.
J.-S. Song, H. Zhang, Y. Hou, and M. Wang. The effect of lead time and demand uncertainties in (r, q) inventory systems. Operations Research, 58(1):68–80, Jan-Feb 2010. ISSN 0030-364X. doi: 10.1287/opre.1090.0711.
O. Sorenson and J. B. Sørensen. Finding the right mix: franchising, organizational learning, and chain performance. Strategic Management Journal, 22(6-7):713–724, 2001.
M. Srinivasan and H. Lee. Random review production inventory systems with compound Poisson demands and arbitrary processing times. Management Science, 37(7):813–833, Jul 1991. ISSN 0025-1909.
X. Su. Intertemporal pricing with strategic customer behavior. Management Science, 53:726–741, 2007.
B. Sundt and W. Jewell. Further results on recursive evaluation of compound distributions. Astin Bulletin, 12:27–39, 1981.
H. R. Varian. Price discrimination. In R. Schmalensee and R. Willig, editors, Handbook of Industrial Organization, volume 1 of Handbook of Industrial Organization, chapter 10, pages 597–654. Elsevier, 1989. URL http://ideas.repec.org/h/eee/indchp/1-10.html.
M. R. Veall and K. F. Zimmermann. Pseudo-R2 measures for some common limited dependent variable models. Journal of Economic Surveys, 10(3):241–259, 1996. ISSN 1467-6419. doi: 10.1111/j.1467-6419.1996.tb00013.x. URL http://dx.doi.org/10.1111/j.1467-6419.1996.tb00013.x.
E.
C. H. Veendorp. Entry deterrence, divisionalization, and investment decisions. The Quarterly Journal of Economics, 106(1):297–307, February 1991.
J. Windsperger. Economics and management of franchising networks. Contributions to management science. Physica-Verlag, 2004.
R. Winkelmann and K. F. Zimmermann. Recent developments in count data modelling: Theory and application. Journal of Economic Surveys, 9(1):1–24, March 1995. URL http://ideas.repec.org/a/bla/jecsur/v9y1995i1p1-24.html.
L. Yuan. Product differentiation, strategic divisionalization, and persistence of monopoly. Journal of Economics and Management Strategy, 8(4):581–602, 1999.
S. Ziss. Horizontal mergers and delegation. International Journal of Industrial Organization, 19(3-4):471–492, March 2001.

Appendix A Proof of Lemma 3.3.1

If Y is the main process based on the convolution of Poisson processes introduced before, a new process X_k can be defined to take the value 1 if the kth observation belongs to {Y_k, 1 ≤ k ≤ n : Y_k ≡ i (mod M_1)} and 0 otherwise. It is assumed that the total number of observations is n and that X_1, X_2, ..., X_n are i.i.d. and can only take the values 0 and 1.

\[
\mathrm{Prob}[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n]
= \prod_{j=1}^{n} \mathrm{Prob}[X_j = x_j]
= P^{\sum_{j=1}^{n} x_j}\,(1-P)^{\,n-\sum_{j=1}^{n} x_j}
\]
\[
\Rightarrow\ \log \mathrm{Prob}(\cdot) = \sum_{j=1}^{n} x_j \log P + \Bigl(n - \sum_{j=1}^{n} x_j\Bigr)\log(1-P)
\]
\[
\Rightarrow\ \text{(F.O.C.)}\quad \hat{P} = \frac{\sum_{j=1}^{n} x_j}{n} = \hat{q}_{n,j,M_2}
\]

It is straightforward to show that

\[
P = \mathrm{Prob}\bigl(Y \in \{Y_k,\ 1 \le k \le n : Y_k \equiv i \ (\mathrm{mod}\ M_2)\}\bigr) = H(M_2, M_2 - i, \lambda_1).
\]

Therefore, by applying the Continuous Mapping Theorem, λ̂1 can be consistently estimated from

\[
H(M_2, M_2 - i, \hat{\lambda}_1) = \hat{q}_{n,j,M_2}, \qquad \forall i \in \{1, \ldots, M_2\},\quad 0 \le j < M_2 \ \text{such that}\ M_1 i \equiv j \ (\mathrm{mod}\ M_2).
\]

It is also straightforward to show that

\[
(P - \hat{P}) \xrightarrow{d} N\Bigl(0, \frac{P(1-P)}{n}\Bigr)
\qquad\text{and}\qquad
\mathrm{var}(\hat{P}) = \frac{P(1-P)}{n} \le \frac{1}{4n}.
\]

Now the univariate delta method can be applied to establish the asymptotic properties of λ̂1.
\[
(\lambda_1 - \hat{\lambda}_1) \xrightarrow{d} N\Bigl(0,\ \frac{\hat{q}_{n,V_i,M_2}\,(1-\hat{q}_{n,V_i,M_2})}{N}\,\bigl[H^{-1\prime}(M_2, M_2 - U_i, \hat{\lambda}_1)\bigr]^{2}\Bigr)
\]

Therefore,

\[
\mathrm{var}(\hat{\lambda}_1) = \frac{\hat{q}_{n,j,M_2}\,(1-\hat{q}_{n,j,M_2})}{n}\,\bigl[H^{-1\prime}(M_2, M_2 - i, \hat{\lambda}_1)\bigr]^{2}
\le \frac{\bigl[H^{-1\prime}(M_2, M_2 - i, \hat{\lambda}_1)\bigr]^{2}}{4n}.
\]

Q.E.D.

Appendix B Vandermonde Matrix

Let V_n be a Vandermonde matrix of order n defined as:

\[
V_n =
\begin{pmatrix}
x_1 & x_2 & \cdots & x_{n-1} & x_n \\
x_1^{2} & x_2^{2} & \cdots & x_{n-1}^{2} & x_n^{2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_1^{n-1} & x_2^{n-1} & \cdots & x_{n-1}^{n-1} & x_n^{n-1} \\
x_1^{n} & x_2^{n} & \cdots & x_{n-1}^{n} & x_n^{n}
\end{pmatrix}
\]

The inverse V_n^{-1} exists whenever the x_i are distinct and non-zero, and can be specified by a matrix B whose element B_{i,j} is calculated by

\[
B_{i,j} =
\frac{\displaystyle\sum_{\substack{1 \le k_1 < \cdots < k_{n-j} \le n \\ k_1, \ldots, k_{n-j} \ne i}} (-1)^{j-1}\, x_{k_1} \cdots x_{k_{n-j}}}
     {\displaystyle x_i \prod_{\substack{1 \le k \le n \\ k \ne i}} (x_k - x_i)}
\]

Appendix C Matrix Inversion

Assume B is an n by n matrix with zero diagonal elements and all other elements equal to 1.

\[
B_{n \times n} =
\begin{pmatrix}
0 & 1 & \cdots & 1 & 1 \\
1 & 0 & \cdots & 1 & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
1 & 1 & \cdots & 0 & 1 \\
1 & 1 & \cdots & 1 & 0
\end{pmatrix}
\]

The inverse of B can be written as follows:

\[
B^{-1} = \frac{1}{n-1}
\begin{pmatrix}
2-n & 1 & \cdots & 1 & 1 \\
1 & 2-n & \cdots & 1 & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
1 & 1 & \cdots & 2-n & 1 \\
1 & 1 & \cdots & 1 & 2-n
\end{pmatrix}
\]

Appendix D Simulation Results for Chapter 3

(a) mean(λ̂1) = 1.1087, var(λ̂1) = 0.1725
(b) mean(λ̂2) = 1.1830, var(λ̂2) = 2.3721
Figure D.1: The distribution of estimated parameters based on simulated data. n = 500, 0 < T < 3, (λ1, λ2) = (1, 1).

(a) mean(λ̂1) = 1.0166, var(λ̂1) = 0.0191
(b) mean(λ̂2) = 1.0257, var(λ̂2) = 0.0999
Figure D.2: The distribution of estimated parameters based on simulated data. n = 1000, 0 < T < 2, (λ1, λ2) = (1, 1).

(a) mean(λ̂1) = 1.0247, var(λ̂1) = 0.0485
(b) mean(λ̂2) = 1.0608, var(λ̂2) = 0.3513
Figure D.3: The distribution of estimated parameters based on simulated data. n = 1000, 0 < T < 3, (λ1, λ2) = (1, 1).
(a) mean(λ̂1) = 1.0121, var(λ̂1) = 0.0092
(b) mean(λ̂2) = 1.0072, var(λ̂2) = 0.0452
Figure D.4: The distribution of estimated parameters based on simulated data. n = 2000, 0 < T < 2, (λ1, λ2) = (1, 1).

(a) mean(λ̂1) = 1.0202, var(λ̂1) = 0.0193
(b) mean(λ̂2) = 1.0350, var(λ̂2) = 0.1344
Figure D.5: The distribution of estimated parameters based on simulated data. n = 2000, 0 < T < 3, (λ1, λ2) = (1, 1).

(a) mean(λ̂1) = 1.0010, var(λ̂1) = 0.0042
(b) mean(λ̂2) = 1.0103, var(λ̂2) = 0.0240
Figure D.6: The distribution of estimated parameters based on simulated data. n = 4000, 0 < T < 2, (λ1, λ2) = (1, 1).

(a) mean(λ̂1) = 1.0086, var(λ̂1) = 0.0093
(b) mean(λ̂2) = 1.0099, var(λ̂2) = 0.0539
Figure D.7: The distribution of estimated parameters based on simulated data. n = 4000, 0 < T < 3, (λ1, λ2) = (1, 1).

(a) mean(λ̂1) = 0.5078, var(λ̂1) = 0.0068
(b) mean(λ̂2) = 1.1344, var(λ̂2) = 0.3696
Figure D.8: The distribution of estimated parameters based on simulated data. n = 500, 0 < T < 3, (λ1, λ2) = (0.5, 1).

(a) mean(λ̂1) = 0.5003, var(λ̂1) = 0.0022
(b) mean(λ̂2) = 1.0244, var(λ̂2) = 0.0358
Figure D.9: The distribution of estimated parameters based on simulated data. n = 1000, 0 < T < 2, (λ1, λ2) = (0.5, 1).

(a) mean(λ̂1) = 0.5036, var(λ̂1) = 0.0030
(b) mean(λ̂2) = 1.0502, var(λ̂2) = 0.0881
Figure D.10: The distribution of estimated parameters based on simulated data. n = 1000, 0 < T < 3, (λ1, λ2) = (0.5, 1).

(a) mean(λ̂1) = 0.5015, var(λ̂1) = 0.0012
(b) mean(λ̂2) = 1.0076, var(λ̂2) = 0.0172
Figure D.11: The distribution of estimated parameters based on simulated data.
n = 2000, 0 < T < 2, (λ1, λ2) = (0.5, 1).

(a) mean(λ̂1) = 0.4999, var(λ̂1) = 0.0015
(b) mean(λ̂2) = 1.0150, var(λ̂2) = 0.0352
Figure D.12: The distribution of estimated parameters based on simulated data. n = 2000, 0 < T < 3, (λ1, λ2) = (0.5, 1).

(a) mean(λ̂1) = 0.5007, var(λ̂1) = 0.0005
(b) mean(λ̂2) = 1.0025, var(λ̂2) = 0.0085
Figure D.13: The distribution of estimated parameters based on simulated data. n = 4000, 0 < T < 2, (λ1, λ2) = (0.5, 1).

(a) mean(λ̂1) = 0.5004, var(λ̂1) = 0.0007
(b) mean(λ̂2) = 1.0154, var(λ̂2) = 0.0189
Figure D.14: The distribution of estimated parameters based on simulated data. n = 4000, 0 < T < 3, (λ1, λ2) = (0.5, 1).

Appendix E Nonlinear Demands - Chapter 4

We tried different demand functions to test the validity of our models in more general settings. The only demand functions for which we could find closed-form solutions were linear demand functions. Here we show our computations for two demand families.

E.1 P = Q−T

We first tried the constant-elasticity demand family. Even for the case of two homogeneous firms, we could not find any closed-form solutions. Assume the marginal cost is fixed and equal to C. We can write

\[
Q_T = Q_1 + Q_2 + \sum_{j=1}^{\delta_1} q_{1,j} + \sum_{j=1}^{\delta_2} q_{2,j}.
\]

Each franchise’s profit can be written as:

pii, j = (P − QT )qi, j (E.1)

∂pii, j ∂qi, j = −C + QT−1−(QT − qi, j) (E.2)

We could not find a closed-form solution for the first-order condition of each franchise’s profit function, (E.2), and that stopped us from further calculations.

E.2 P = A − Q_T^{1/n}

n = 2 is the easiest case to study.
If we assume that only two homogeneous brands are competing, construct the profit functions of the franchises, and solve the first-order conditions, we find that each franchise chooses the output level:

\[
q^{*}_{i,j} =
\frac{\bigl(1 - 3(\delta_1 + \delta_2)\bigr)(Q_1 + Q_2) + \sqrt{(Q_1 + Q_2)^{2} + 3(\delta_1 + \delta_2)\bigl(-2 + 3(\delta_1 + \delta_2)\bigr)}}
     {(\delta_1 + \delta_2)\bigl(-2 + 3(\delta_1 + \delta_2)\bigr)}
\tag{E.3}
\]

Plugging these findings into the profit functions of the major divisions and computing the best response of each brand’s major division leads to very cumbersome equations, which precluded us from further analysis of such demand functions.
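The residue argument behind the Chapter 3 estimators (see the proof of Lemma 3.3.1 in Appendix A) and the kind of experiment summarized in Appendix D can be illustrated with a minimal sketch. It assumes only two group sizes, 1 and 2, and unit-length observation intervals, so it is simpler than the irregular-interval design reported in Appendix D. The function names, the seed, and the moment-based step for λ2 are illustrative assumptions and are not taken from the thesis. The only external fact used is the standard identity P(N even) = (1 + e^{-2λ})/2 for a Poisson variable N with rate λ.

```python
import math
import random

# Illustrative assumptions (not from the thesis): group sizes {1, 2},
# unit-length intervals, a moment step for lam2, and all function names.


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_aggregate(lam1, lam2, n, rng):
    """Aggregate passengers per interval: singletons plus 2 * couples."""
    return [sample_poisson(lam1, rng) + 2 * sample_poisson(lam2, rng)
            for _ in range(n)]


def decompound_sizes_1_and_2(observations):
    """Estimate (lam1, lam2) from aggregate counts only.

    The parity of Y = N1 + 2*N2 equals the parity of N1, so the share of
    even observations depends on lam1 alone:
    P(Y even) = (1 + exp(-2*lam1)) / 2.
    """
    n = len(observations)
    share_even = sum(1 for y in observations if y % 2 == 0) / n
    if share_even <= 0.5:
        raise ValueError("even share must exceed 1/2 to invert for lam1")
    lam1_hat = -0.5 * math.log(2.0 * share_even - 1.0)
    # E[Y] = lam1 + 2*lam2, so the sample mean pins down lam2.
    lam2_hat = (sum(observations) / n - lam1_hat) / 2.0
    return lam1_hat, lam2_hat


rng = random.Random(2012)
data = simulate_aggregate(lam1=0.5, lam2=1.0, n=20000, rng=rng)
lam1_hat, lam2_hat = decompound_sizes_1_and_2(data)
print(f"lam1_hat = {lam1_hat:.3f}, lam2_hat = {lam2_hat:.3f}")
```

With a sample this large, both estimates should land close to the true rates (0.5, 1). Shrinking n shows the pattern visible in the Appendix D figures, where the estimate of λ2 is noisier than the estimate of λ1.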

