ESSAYS ON HETEROGENEITY IN CHOICE MODELING

By Kwangpil Chang

B.A. (Spanish) Hankuk University of Foreign Studies
M.Sc. (Advertising) University of Illinois at Urbana-Champaign

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1998
© Kwangpil Chang, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Faculty of Commerce and Business Administration
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1Z1

Date:

Abstract

This thesis includes three essays which examine the implications of incorporating parameter heterogeneity, consideration set heterogeneity, and decision rule heterogeneity, respectively, in brand choice models.

In the first essay, we identify the conditions under which unaccounted for price response heterogeneity results in a spurious sticker shock effect.
We show, using an analytical derivation, a simulation study and an empirical application to scanner panel data, that estimates of the sticker shock effect may be biased if households that are price sensitive in their brand choice decision are also more likely to respond to category marketing activity in their purchase timing decision.

The empirical results, from two product categories, show that the sticker shock coefficient from a Hierarchical Bayes model (which continuously accounts for price response heterogeneity) is statistically insignificant, providing no evidence of the existence of a sticker shock effect. In contrast, the corresponding coefficient from the standard model, which ignores this heterogeneity, is highly significant and supports the existence of a sticker shock effect. A posterior analysis of household parameters confirms the hypothesized relationship between price sensitivity in brand choice and responsiveness to promotional activity in purchase incidence, and is consistent with our explanation of the underlying cause of the bias in the standard model.

The second essay develops a new consideration set model that can be estimated with scanner panel data. In contrast to many previous approaches, which require enumeration of all possible consideration sets, we directly model uncertainty about including a brand in the consideration set. The resulting inclusion probabilities for brands reflect a "fuzzy" consideration set in the sense that a brand belongs to the consideration set only probabilistically. The proposed fuzzy set model outperforms several previous consideration set models in two product categories (yogurt and ketchup). We then apply the fuzzy set approach to examine the role of the consideration set in moderating the impact of advertising on price sensitivity. In contrast to the experimental findings of Mitra and Lynch (1995), we find no positive relationship between consideration set size and price sensitivity.
Further empirical tests may be necessary to confirm the hypothesized relationship.

In the third essay, we investigate the role of decision rule heterogeneity in brand choice behavior. We develop a flexible model that allows for uncertainty about the decision rules used by the consumer. Specifically, we develop a Hierarchical Bayes model of reference price effects that accommodates both the sticker shock and reference-dependent formulations. In addition, we incorporate the possibility that consumers may mix the two decision rules probabilistically. The proposed model therefore allows for three different decision hierarchies, which incorporate sticker shock, reference-dependent and mixed rules, respectively. The empirical results show that consumers differ not only in their preference and response but also in their decision rules. On average, half the sample households appear to show loss aversion, i.e., follow a reference-dependent decision rule, while the remaining households do not seem to respond to reference prices. The proposed model provides a richer description of consumer choice processes than the comparison models that allow for only one model structure and ignore model uncertainty.
Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement
1 Introduction
    1.1 Overview
2 Response Heterogeneity and Spurious Sticker Shock Effects
    2.1 Overview
    2.2 Theory
        2.2.1 Description of the True Model
        2.2.2 Description of the Misspecified Model
        2.2.3 Analysis of the Misspecified Model
    2.3 Simulation Study
        2.3.1 Data Generation
        2.3.2 Predictions and Estimated Models
        2.3.3 Simulation Results
    2.4 Empirical Application
        2.4.1 Data
        2.4.2 Models Estimated
        2.4.3 Estimation
    2.5 Results and Discussion
        2.5.1 NL and NL-ST models
        2.5.2 HB-ST Model
        2.5.3 Posterior Analysis
        2.5.4 Attenuation of Reference Price Effects
    2.6 Conclusion
    2.7 Limitations and Future Research
3 Consideration Set Heterogeneity, Advertising, and Price Sensitivity
    3.1 Overview
        3.1.1 Approaches to Incorporating Consideration Sets in Choice Models
        3.1.2 Advertising, Price Sensitivity and Consideration Sets
    3.2 Proposed Fuzzy Set Model
        3.2.1 Modeling Approach
    3.3 Advertising, Price Sensitivity and Consideration Sets
    3.4 Empirical Application
        3.4.1 Estimation
        3.4.2 Data
        3.4.3 Model Variables
        3.4.4 Models Estimated
    3.5 Results and Discussion
    3.6 Managerial Implications
    3.7 Conclusion
    3.8 Limitations and Future Research
4 Decision Rule Heterogeneity
    4.1 Overview
    4.2 Model Development
        4.2.1 Model Uncertainty
        4.2.2 Parameter Heterogeneity
        4.2.3 Joint Model Specification
    4.3 Model Features
    4.4 Estimation
    4.5 Empirical Application
        4.5.1 Data
        4.5.2 Models Estimated
    4.6 Results and Discussion
    4.7 Conclusions
    4.8 Limitations and Future Research
5 Conclusions
Bibliography
Appendices
A Derivation of the Conditions Underlying Biased Sticker Shock Effects
    A.1 True model specification
    A.2 Misspecified model
B Priors and the Markov Chain Monte Carlo Simulations
    B.1 Priors
    B.2 Full Conditionals and Simulations
        B.2.1 Generating 12Q
        B.2.2 Generating ^(m+1)
        B.2.3 Generating $\Sigma^{-1(m+1)}$
C Priors and the Markov Chain Monte Carlo Simulations
    C.1 Priors and Pseudopriors
        C.1.1 Priors on Model Probabilities
        C.1.2 Priors on $\beta^h \mid M = s$
        C.1.3 Pseudopriors on $\beta^h \mid M \neq s$
    C.2 Full Conditionals and Simulations
        C.2.1 Generating Model Indicators
        C.2.2 Generating A h(m+1)
        C.2.3 Generating p^(m+1)
        C.2.4 Generating /^(m+1)
        C.2.5 Generating $\Sigma^{-1(m+1)}$

List of Tables

2.1 Estimation Results on Simulated Data
2.2 Parameter Estimates for MNL-ST
2.3 Sensitivity Analysis on Category Value Parameters
2.4 Sensitivity Analysis on Price Parameters
2.5 Sensitivity Analysis on Price Parameters
2.6 Market Share/Average Price for the Yogurt Category
2.7 Market Share/Average Price for the Ketchup Category
2.8 NL and NL-ST Parameter Estimates and Fit Statistics
2.9 HB-ST Parameter Estimates and Fit Statistics
2.10 Regression of CV Parameters on Price Parameters
2.11 Reference Price Elasticity for the Yogurt Category
2.12 Reference Price Elasticity for the Ketchup Category
2.13 Price Elasticity for the Yogurt Category
2.14 Price Elasticity for the Ketchup Category
3.1 Fit Statistics for the Yogurt Data
3.2 Fit Statistics for the Ketchup Data
3.3 Parameter Estimates for the Yogurt Data
3.4 Parameter Estimates for the Ketchup Data
3.5 FZENT: Yogurt Data
3.6 Inclusion Probability and Market Share
3.7 Share Gains Across Brands: FZSET
3.8 Share Gains Across Brands: MNL
4.1 Parameter Estimates and Fit Statistics for HB-ST and -RD models
4.2 Parameter Estimates and Fit Statistics for HB-STRD model

List of Figures

3.1 Functional Shapes of Compensatory vs. Non-Compensatory Rule
3.2 Relationship Between Advertising and Price Sensitivity

Acknowledgement

First of all, I thank God for helping me go through the ups and downs of completing my dissertation and giving me an opportunity to meet two wonderful advisors with high moral and academic standards.

Siddarth is the one with whom I have spent most of my time in developing ideas and translating them into my dissertation. Ongoing discussions with him, and his insightful suggestions, have helped me shape the core idea of my dissertation. Without his help I could not have finished my dissertation on time. Not only did he devote endless hours to reading the entire manuscript and helping me refine it, but he also demanded of me critical thinking and reasonable progress during the entire process. I could not have been more fortunate in finding a supervisor like Siddarth and I think of him as my big brother. Whoever has him as a supervisor is half-done in his/her dissertation.

Chuck Weinberg is the one whose part I would describe as that of an uncle in supervising my dissertation. His broad interests and knowledge in many areas in marketing have been a guiding light on the way I made progress at each stage. At first he seemed slow to understand the issues that I brought up in his presence (mainly because of my inarticulateness), but later on I was always amazed when he raised deeper issues in return that I had not thought of. Like an uncle, he has always been generous and encouraging (yet reasonably demanding). Again, whoever has him as a supervisor is half-done in his/her dissertation.

Toward the end of the dissertation, the insightful reviews and suggestions I received from them helped to make significant refinements in this final version of the text. Whoever has both of them as supervisors is all-done in his/her dissertation.
Finally, the backbone to this entire effort is my family: my parents, for instilling in me the sense of perseverance needed to bring this dissertation to fruition; my wife, Sunghye, for appeasing our crying baby and putting up with the late night hours; and my son, Jay, for being what I strive to work for.

Chapter 1

Introduction

1.1 Overview

With the advent of scanner panel data, discrete choice analysis in marketing has advanced our understanding of choice processes and consumer response to the marketing mix. The advantage of scanner panel data is that it represents actual behavior from a set of consumers, and offers the opportunity to observe the longitudinal choice behavior of panelists and a number of environmental variables in an unobtrusive way. This provides the researcher with an opportunity to either test theories or to use a scanner study to validate what has been shown in the laboratory. An important area of investigation in this research stream has been the development of methodologies to account for heterogeneity across households.

One aspect of heterogeneity is the notion that consumers differ in their preference structure and in their response to the marketing mix. This has been incorporated by allowing the parameters of the brand choice model to vary across consumers. Several alternative methods have been proposed, notably, a latent class approach (Kamakura and Russell 1989), a fixed effects approach (Rossi and Allenby 1993) and a random effects approach (Chintagunta 1994, Jain and Vilcassim 1991, Gonul and Srinivasan 1993, Rossi, McCulloch and Allenby 1996).

Another type of heterogeneity is derived from the notion that consideration or choice sets vary across consumers. Several approaches to model heterogeneity in consideration sets have been proposed in the literature (Andrews and Srinivasan 1995, Chiang, Chib and Narasimhan 1998, Siddarth, Bucklin and Morrison 1995, Gaudry and Dagenais 1979, Swait and Ben-Akiva 1987, Fotheringham 1988, Bronnenberg and Vanhonacker 1996). In addition, some of these approaches also incorporate parameter heterogeneity (Bronnenberg and Vanhonacker 1996, Chiang et al. 1998).

While it is generally accepted that scanner panel data analysis should include parametric uncertainty and consideration set heterogeneity, much less attention has been paid to incorporating uncertainty about the model itself. That is, when faced with competing yet plausible structural assumptions, analysts have most often used the data to identify a single best model and then proceeded as if the model chosen on the basis of fit statistics were known to be true. The chosen model is then used to make inferences and predictions even though the rejected models are not necessarily wrong. In the statistics literature, this issue has been addressed by the Bayesian model averaging approach (Chatfield 1995, Draper 1995), which acknowledges that model uncertainty is likely to be more serious than other sources of uncertainty, e.g., parametric uncertainty, which have received far more attention from statisticians. The spirit of this approach is well represented in the following quote:

"The main way to avoid noticing after the fact that a set of modeling assumptions, different from those originally assumed, turned out to be correct is for one's model prospectively to have been sufficiently large to encompass the retrospective truth (Draper 1995, p. 55)."

In a broader sense, all kinds of heterogeneity corrections are reflections of the researcher's uncertainty about the true model structure. Therefore, as long as model complexity is manageable, the researcher seeks to build a bigger (more complex) model. Failure to do this may result in wrong inferences and misunderstanding of choice behavior.
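The model averaging idea cited above can be stated compactly. As a sketch in generic notation that is not taken from this thesis, let $M_1, \dots, M_K$ denote the candidate models, $D$ the data, and $\Delta$ a quantity of interest such as a predicted choice probability. The model-averaged posterior and the posterior model probabilities are then

```latex
p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D),
\qquad
p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)}.
```

Proceeding as if a single selected model were true corresponds to setting one posterior model probability to one, which understates uncertainty whenever the remaining probabilities are not negligible.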
In this thesis, we seek to examine the implications of heterogeneity in choice modeling, i.e., parametric uncertainty, consideration set heterogeneity and structural uncertainty. More specifically, we investigate the relationship between reference price effects and parameter heterogeneity, the impact of advertising on price sensitivity through its influence on consideration set heterogeneity, and structural uncertainty about two competing reference price formulations: sticker shock (Winer 1986) and reference-dependent (Hardie, Johnson and Fader 1993).

Chapter 2 discusses the consequences of ignoring parametric uncertainty in investigating reference price effects. Several papers in the marketing literature have captured the impact of reference price on brand choice via the sticker shock formulation in which consumers are assumed to use the difference between the shelf price and reference price to evaluate an alternative. In this research, reference prices are brand-specific and typically imputed from the prices of the brands that a consumer is supposed to have "observed" on previous purchases in the category. However, since category marketing activity can also differentially affect the purchase timing of households, we argue that these measures of reference price may show certain systematic patterns and result in a spurious sticker shock effect. Specifically, we show that estimates of the sticker shock effect may be biased if households that are price sensitive in their brand choice decision are also more likely to respond to category marketing activity in their purchase timing decision. We present an analytical derivation to establish some general conditions under which the bias occurs and conduct a simulation experiment that confirms our specific hypotheses. We follow this up with an empirical application using scanner panel data from the yogurt and ketchup categories.
The results show that the 95% probability interval of the posterior distribution of the sticker shock coefficient in a Hierarchical Bayes model (which continuously accounts for price response heterogeneity across households) contains the value of zero, providing no evidence of the existence of a sticker shock effect. In contrast, the corresponding coefficient from a standard model (which ignores this heterogeneity) is highly significant and supports the existence of a sticker shock effect. A posterior analysis on household parameters confirms the hypothesized positive relationship between responsiveness to category promotion activities and price sensitivity in the brand choice decision and is consistent with our explanation of the underlying cause of the bias in the standard model.

Chapter 3 presents a new "fuzzy" consideration set model that can be estimated with scanner panel data. Unlike previous non-fuzzy (well-defined) set approaches, which model uncertainty about the sets from which a brand is chosen, the proposed fuzzy set approach directly models the probability of a brand being included in the consideration set. The resulting inclusion probabilities for brands reflect a "fuzzy" consideration set, in the sense that a brand belongs to the consideration set only probabilistically, in contrast to the non-fuzzy consideration sets, in which the sets are well-defined but there is uncertainty about the sets from which a brand will be chosen. In contrast to an existing fuzzy set model (Bronnenberg and Vanhonacker 1996), in which variables have a compensatory influence on consideration, we model consideration as a non-compensatory process. We apply the approach to scanner panel data in two product categories and show that the proposed fuzzy set model outperforms several previous consideration set models.

We use our modeling approach to examine the role of the consideration set in moderating the effect of advertising on price sensitivity.
This analysis seeks to generalize the findings of Mitra and Lynch (1995) by using purchase data rather than experimental data, and by overcoming some of the limitations of their experimental study. Specifically, unlike their experimental study, in which advertising could only increase consideration set size (since subjects only evaluated unfamiliar brands), we use a natural setting in which advertising can both increase and decrease consideration set size.

Empirical results show that consideration set size has no significant (positive) effect on price sensitivity. The experimental findings of Mitra and Lynch (1995) do not seem to generalize to this particular data set. Further empirical tests may be necessary to confirm their hypothesized relationship.

In marketing, a significant body of research has investigated the reference price construct and modeled its impact on consumer choice behavior via two major competing models, the sticker shock (Winer 1986) and reference-dependent (Hardie, Johnson and Fader 1993) formulations. The sticker shock formulation captures reference price effects via a "sticker shock term" defined as the difference between brand-specific reference prices and the current prices. The reference-dependent formulation, on the other hand, captures asymmetric responses to positive and negative price deviations from a category-specific reference price, which are hypothesized to result in loss aversion when the current price exceeds the reference price (Tversky and Kahneman 1991). These two models therefore represent two different decision rules or choice processes underlying consumer brand choice. Both are empirically supported and well grounded in psychological theory: Adaptation-Level Theory (Helson 1964) and Prospect Theory (Kahneman and Tversky 1979).

Behaviorally, however, there is no reason to believe that every household follows the same decision rule.
Some consumers may be well described by the sticker shock model because they have sufficient cognitive capacity and motivation to recall the shelf prices observed on the last purchase occasion. Other consumers may follow the reference-dependent model because they want to simplify the decision process by using only one category-specific reference price to decrease the cognitive burden. One could also hypothesize a third segment of consumers that mixes the two decision rules probabilistically, i.e., uses one or the other of the decision rules on different purchase occasions.

Previous research has not attempted to account for this structural uncertainty about the way reference prices affect the choice process. Chapter 4 discusses the general problems of ignoring structural uncertainty and develops a Hierarchical Bayes model of reference price effects that accounts for heterogeneity in the way that consumers respond to reference prices in making brand choices. Our empirical application of the proposed model to the ketchup category shows that consumers differ not only in their preference and response but also in their decision rules. On average, half the sample households show loss aversion while the remaining households do not respond to reference prices. The proposed model provides a richer description of the consumer choice process than the base models that allow for only one model structure and ignore model uncertainty.

Chapter 2

Response Heterogeneity and Spurious Sticker Shock Effects

2.1 Overview

A significant body of recent research in marketing has investigated the reference price construct and modeled its impact on consumer brand choice behavior. Both aggregate (Raman and Bass 1988, Putler 1992) and disaggregate analysis (Winer 1986, Lattin and Bucklin 1989, Kalwani et al. 1990, Rajendran and Tellis 1994, Briesch et al. 1997) have been used to describe the nature of consumer response to reference price.
In addition, a considerable body of work suggests an asymmetry in response to reference price with consumers treating gains differently from losses (Putler 1992, Hardie, Johnson and Fader 1993). In their review paper, Kalyanaram and Winer (1995) interpret these, and other, research findings as strong support for the existence of reference price effects.

Previous disaggregate analysis has typically employed some variant of a logit choice model (Guadagni and Little 1983) and has used scanner panel data in model estimation. Since reference price is a latent construct and is difficult to measure through conventional data sources, it has been imputed from the price series in the data. In a majority of the models reported in the literature, the hypothesized impact of reference price is captured by including in the utility function of a logit choice model a variable, commonly referred to as the "sticker shock" term (Winer 1986), that represents the difference between a brand's reference price and its current shelf price. A positive estimate of the associated coefficient is interpreted as evidence for the "sticker shock" effect.

With few exceptions (Bell and Bucklin 1998, Briesch et al. 1997), previous sticker shock models have not accounted for heterogeneity in price response among consumers. The importance of capturing heterogeneity, and its implications for parameter estimates, has been well documented in the marketing literature (Chintagunta 1994, Jain and Vilcassim 1991, Kamakura and Russell 1989, Gonul and Srinivasan 1993, Rossi and Allenby 1993), and suggests that accounting for price response differences is an important prerequisite to obtaining accurate measures of the sticker shock effect. In a study of employment decisions, Heckman (1981) also discusses the relationship between heterogeneity and state dependence.
He argues that "individuals may differ in certain unmeasured variables that influence their probability of experiencing the event. If these variables are correlated over time, and are not properly controlled, previous experience may appear to be a determinant of future experience solely because it is a proxy for such temporally persistent unobservables" (Heckman 1981, pp. 91-92).

Bell and Lattin (1996) highlight a similar issue but, unlike the current work that examines the sticker shock effect, focus on the reference-dependent formulation [1] and the role of unaccounted for heterogeneity in biasing estimates of loss aversion. They argue that price-sensitive consumers may predominantly observe losses (actual prices higher than a reference price) while price-insensitive consumers may predominantly observe gains (actual prices lower than a reference price). In this case, the best fit to the data is given by a model that allows for different response parameters for gains and losses. They show that, in a standard cross-sectional analysis, the parameter for losses (price-sensitive segment) tends to be greater in absolute magnitude than that for gains (price-insensitive segment) even in the absence of true loss aversion. In contrast to their work, we also include an analytical derivation and a simulation study to describe the mechanism causing the bias, and examine the role played by factors other than unaccounted for response heterogeneity.

[1] In the reference-dependent formulation, the price of each choice alternative is compared to the same reference price, as opposed to the alternative-specific reference prices used in the sticker shock models. Consumers' responses to positive or negative deviations from this reference price are hypothesized to be asymmetric, resulting in loss aversion.
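To make the contrast between the two reference price formulations concrete, the sketch below computes the price-deviation terms that each one would feed into a utility function. It is only a minimal illustration under stated assumptions: the function names and the example prices are hypothetical, and the split into gains and losses is the usual way of coding asymmetric deviations rather than a specification taken from the studies cited above.

```python
def sticker_shock_terms(reference_prices, prices):
    """Brand-specific deviation: each brand's own reference price minus its current price."""
    return [rp - p for rp, p in zip(reference_prices, prices)]


def gain_loss_terms(common_reference_price, prices):
    """Deviations from one common reference price, split into gains and losses.

    A gain is the amount by which a price falls below the reference price;
    a loss is the amount by which it exceeds the reference price. Separate
    coefficients on the two terms would allow an asymmetric response.
    """
    gains = [max(common_reference_price - p, 0.0) for p in prices]
    losses = [max(p - common_reference_price, 0.0) for p in prices]
    return gains, losses


# Hypothetical three-brand example (all values are illustrative assumptions).
prices = [1.0, 1.5, 2.0]          # current shelf prices
brand_refs = [1.25, 1.5, 1.75]    # brand-specific reference prices
common_ref = 1.5                  # single reference price shared by all brands

print(sticker_shock_terms(brand_refs, prices))
print(gain_loss_terms(common_ref, prices))
```

In the sticker shock case a single coefficient multiplies the signed deviation, whereas in the reference-dependent case the gain and loss terms can carry different coefficients, which is where an apparent loss aversion effect can arise.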
Bell and Bucklin (1998) use a latent class approach to account for price response heterogeneity in a sticker shock model of brand choice and purchase incidence. They do not, however, discuss the bias issue but focus on the impact of reference prices on the purchase incidence decision. Similarly, Briesch et al. (1997) also use the latent class approach to compare alternative models of reference price. While these studies have accounted for heterogeneity in parameters, the techniques used have not been fully Bayesian. For example, the latent class approach used by Bell and Bucklin (1998) and Briesch et al. (1997) approximates the continuous distribution of heterogeneity by a discrete distribution. The Bayesian approach to fixed effects (Rossi and Allenby 1993) used by Bell and Lattin (1996) requires ad hoc specification of prior tightness. Since a thorough heterogeneity correction is crucial in examining "true" sticker shock effects, we apply a more sophisticated Hierarchical Bayes approach (see Tanner and Wong 1987, Gelfand and Smith 1990, Gelfand et al. 1990 for general discussion of this method) to account for heterogeneity in preference and response in modeling the brand choice decision.

Besides not accounting for response heterogeneity, a second, more subtle, characteristic of previous research is that the reference price measure is based on those trips on which consumers actually made a purchase in the category. Such a measure, therefore, directly includes the impact of pricing and promotion activity on the purchase timing decision, an effect that has been extensively studied in the marketing literature (Gupta 1988, Jain and Vilcassim 1991, Bucklin, Gupta and Siddarth 1998).
In this work, we show that if the impact of promotional activity on a consumer's category purchase decision depends on the underlying price response heterogeneity in the brand choice decision, then the resulting reference price measure can cause biased estimates of the sticker shock effect. Specifically, we present an analytical derivation to show that this bias can occur if consumers who are more price-sensitive in the brand choice decision are also more likely to respond to promotional activity in the category in their purchase timing decision, a relationship for which there is some previous empirical evidence (Bucklin and Gupta 1992, Bucklin, Gupta and Siddarth 1998).

While unaccounted for price response heterogeneity alone may cause biased sticker shock estimates, results from a simulation study show that this requires very large variations in price responsiveness, which are unlikely to be observed in practice. In contrast, results from the simulation also demonstrate that even moderate differences in brand choice price responsiveness, combined with systematic differences in purchase timing, are sufficient to produce biased estimates of the sticker shock effect.

This research seeks to provide the following contributions:

• Identify some general conditions under which unaccounted for price response heterogeneity can result in a spurious sticker shock effect.
• Describe the role that heterogeneity in purchase timing behavior plays in this process.
• Demonstrate, by way of an analytical derivation, a simulation study, and an analysis of scanner panel data, that properly accounting for price response heterogeneity can provide better measures of the sticker shock effect.
• Quantify the attenuation of the sticker shock effect as a result of heterogeneity correction.

This chapter is organized as follows.
In Section 2.2 we present an analysis of a simple two segment setting to illustrate how the combination of unaccounted for price response heterogeneity and the impact of category promotion activity on purchase timing behavior can result in a spurious sticker shock effect. Section 2.3 presents a simulation study that confirms the predictions from our theoretical analysis. In Section 2.4 we present an empirical application to scanner panel data from two product categories. Results from our empirical analysis are consistent with our predictions. We find that the sticker shock coefficient is statistically significant in a standard model, which does not account for price response heterogeneity. On the other hand, the 95% probability interval [2] of the posterior distribution of the sticker shock coefficient in a Hierarchical Bayes model contains the value of zero. Therefore, at least for the datasets used in this study, there is no evidence for the sticker shock effect. A posterior analysis of household parameters shows that, consistent with our explanation of the underlying cause of the spurious effects, households that are price sensitive in the choice decision are also more sensitive to category promotion activity in their decision to purchase in the category. Finally, we quantify the attenuation of the sticker shock effects via an elasticity analysis on estimated models. We close with our conclusions and directions for future research.

2.2 Theory

In this section, we discuss the conditions that can produce a spurious reference price effect. For ease of exposition, we consider the situation in which the market consists of two homogeneous groups of consumers, differing in price sensitivity, but unresponsive to reference prices. We first specify the true model and follow with a description and analysis of the misspecified model.
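As a rough numerical illustration of the two-segment setting just described, the sketch below computes multinomial logit choice probabilities for a high and a low price-sensitivity segment when utility depends only on a brand constant and price, so that reference prices play no role. All numeric values and the function name `mnl_probabilities` are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
import math


def mnl_probabilities(alpha, beta_price, prices):
    """Multinomial logit choice probabilities for one purchase occasion.

    The utility of brand i is alpha[i] + beta_price * prices[i]. There is no
    reference price term, mirroring a market that is unresponsive to
    reference prices.
    """
    utilities = [a + beta_price * p for a, p in zip(alpha, prices)]
    max_u = max(utilities)                       # subtract the max for numerical stability
    exp_u = [math.exp(u - max_u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


# Hypothetical two-brand market: brand 0 is cheaper, brand 1 has a higher constant.
alpha = [0.0, 0.5]
prices = [1.0, 1.5]

# Hypothetical price coefficients for the two segments (both negative).
beta_high = -4.0   # high price-sensitivity segment
beta_low = -1.0    # low price-sensitivity segment

p_high = mnl_probabilities(alpha, beta_high, prices)
p_low = mnl_probabilities(alpha, beta_low, prices)

print("high-sensitivity segment:", [round(p, 3) for p in p_high])
print("low-sensitivity segment: ", [round(p, 3) for p in p_low])
```

With these assumed values the price-sensitive segment concentrates its choices on the cheaper brand, while the less sensitive segment is indifferent between the two. This difference in choice behavior is what a single pooled price coefficient cannot capture.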
2 A Bayesian probability interval for an unknown quantity of interest can be directly regarded as having high probability of containing the unknown quantity, in contrast to a frequentist's confidence interval, which is strictly interpreted only in the context of repeated sampling (Gelman et al. 1995). Chapter 2. Response Heterogeneity and Spurious Sticker Shock Effects 12 2.2.1 Description of the True Model We begin by specifying the deterministic component of household h's utility for brand i on occasion t as, tf£ = a i + / 3 c P R I C E « , (2.1) where, OJJ is a brand-specific constant, PRICEjt is the price of brand i on occasion t, and Pc is the price response coefficient. To illustrate more clearly the logic of the underlying process, we omit other terms from the utility specification. The Multinomial Logit (MNL) model specifies the probability that household h chooses brand i on occasion t as : Including price response heterogeneity Consider a situation in which consumers belong to one of two segments with either high, or low, price sensitivities, respectively (A straightforward extension to a situation for a more general distribution is provided in Appendix A) . The utility function can then be modified as follows: U% = at + / ? w P R I C E f t + pdSEGh • P R I C E i t , (2.3) where S E G h is a dummy variable that takes the value 0 if household h is in the high price-sensitive segment and 1 otherwise. The price parameter for the high price-sensitive segment is Pu{< 0) and that for the low price-sensitive segment is Phi + Pd{< 0) , where fid > 0- Equation (2.3) represents the true model that underlies the choice process and, in this formulation reference prices play no role in the household's choice decision. Chapter 2. 
2.2.2 Description of the Misspecified Model

We now take the position of an analyst who incorrectly ignores differences in price sensitivity across households and omits the dummy variable, $\mathrm{SEG}^{h}$, from the utility function. Price responsiveness is captured by a single price parameter, $\beta_c$, common to both segments. Also, an additional misspecification error is introduced by including in each brand's utility function a sticker shock term, i.e., the difference between its reference price and its current actual price. The resulting utility function, which is similar to that used in previous research, is

$$U_{it}^{h} = \alpha_i + \beta_c\,\mathrm{PRICE}_{it} + \beta_{rp}\,(\mathrm{RPRICE}_{it}^{h} - \mathrm{PRICE}_{it}), \tag{2.4}$$

which can be rewritten as

$$U_{it}^{h} = \alpha_i + (\beta_c - \beta_{rp})\,\mathrm{PRICE}_{it} + \beta_{rp}\,\mathrm{RPRICE}_{it}^{h}, \tag{2.5}$$

where $\mathrm{RPRICE}_{it}^{h}$ is a brand-specific reference price for household h on occasion t. We expect $\beta_c$ to lie in the range of the true price response parameters of each segment, i.e., $\beta_{hi} < \beta_c < \beta_{hi} + \beta_d$. An estimate of $\beta_{rp} > 0$ implies that consumers have reference prices and experience "sticker shock" in the brand choice decision.

Measuring reference price

Since reference price is an internal, latent construct, it has to be imputed from the prices paid (or observed) by the consumer on previous shopping occasions. The standard approach has been to operationalize reference price as a function of brand prices "observed" by the consumer on those trips on which a category purchase actually occurred. Kalyanaram and Little (1994) use an exponentially smoothed measure of past shelf prices encountered by a consumer and estimate the smoothing parameter to be 0.82, implying that the most recent price has the maximum influence on reference price. Briesch et al. (1997) use a similar measure and report price history of about six periods in forming reference prices. Kalwani et al.
(1990) summarize the impact of the previous five prices and find the most recent price to be weighted most heavily. Mayhew and Winer's (1992) exponential smoothing of past prices resulted in an optimal smoothing constant of one and led them to set reference price equal to that observed on the previous purchase occasion, i.e., $\mathrm{RPRICE}_{it}^{h} = \mathrm{PRICE}_{i(t-1)}$. A similar approach is also used by Bell and Bucklin (1998).

Consistent with these findings, and earlier operationalizations of the reference price construct, we set the reference price of a brand equal to its price on the previous purchase occasion. While this definition has the added advantage of simplifying our exposition, we should point out that the underlying logic that we present carries over to other, more complex, measures of reference price. In the empirical analysis, we also re-estimate the model using a smoothed reference price measure and find the results to be unchanged.

2.2.3 Analysis of the Misspecified Model

In Appendix A, we present a formal analysis of the misspecified model and derive, in an OLS context, the conditions that can cause biased estimates of the sticker shock effect. Ideally, we would examine the bias issue in the setting of the logit choice model. However, the maximum likelihood estimates of model parameters are complicated nonlinear functions of data and do not provide much insight (see Yatchew and Griliches 1984 for a discussion of bias issues in the setting of the probit and logit models). Therefore, we choose to analyze bias in the OLS context. As we show in a later section, analytical results from a linear model are consistent with simulation results from the logit model.

The misspecified model involves the problem of omitted variables. Omission of a relevant variable from the model will bias the sticker shock term if the omitted variable is correlated with it. What would cause the omitted variable to be also directly correlated to the reference price variable? Our
results show two general conditions that must be satisfied. First, consumers must differ in their price sensitivities, i.e., display heterogeneity in price response. Second, the omitted variable must be related significantly to both the utility (the left-hand side variable) and the sticker shock term. In the current instance, the omitted segment membership dummy variable is directly related to the brand utility derived by a household, since the utility depends on the price sensitivity. In this work we focus on the reference price measure, and its relationship to the purchase timing decision, as a possible cause of the bias. We now discuss the intuition underlying the results of our formal analysis.

Role of Price Heterogeneity

From Equation (2.5) it is clear that, for the segment of price-sensitive households, any positive value of $\beta_{rp}$ increases its effective response to price by making $(\beta_c - \beta_{rp})$ more negative and, therefore, closer in magnitude to the true parameter value, $\beta_{hi}$. In other words, a positive estimate of the sticker shock parameter compensates for ignoring heterogeneity in price response. Moreover, this compensating mechanism also permits $\beta_c$ to be estimated closer to the true parameter value, $\beta_{hi} + \beta_d$, of the segment of less sensitive households. In summary, the presence of $\beta_{rp}$ makes the model more flexible in representing the underlying price heterogeneity: $\beta_c$ captures the response of the price-insensitive segment while $\beta_c$ and $\beta_{rp}$, together, capture the response of the price-sensitive segment.

Role of Purchase Timing

A significant body of previous research has investigated the impact of marketing activity on the purchase timing decision (Gupta 1988, Jain and Vilcassim 1991, Helsen and Schmittlein 1993). Bucklin and Gupta (1992) and Bucklin, Gupta and Siddarth (1998) provide some empirical evidence to show that price-sensitive segments have a greater
tendency to change purchase timing to coincide with promotional activity than price-insensitive segments. This empirical finding is of considerable significance to the current analysis. Such a pattern of behavior implies that the price-sensitive segment is more likely to purchase in the category when it is more "attractive", i.e., when one or more brands is on promotion. The price-insensitive segment, on the other hand, is less likely to change its purchase timing due to promotional activity, resulting in a smaller proportion of its purchases coinciding with promotional activity. Therefore, the observed across-brand shelf prices will tend to be lower for the price-sensitive segment and higher for the insensitive segment. Since reference prices are set equal to observed prices on the previous purchase occasion, they will also display the same regularity.

Equation (2.5) shows that the estimated coefficient for $\beta_{rp}$ depends upon the measure of reference price used in the choice model. Also, as shown in Equation (A.8) (Appendix A), the bias in the reference price coefficient depends upon the relationship between the reference price measure and the underlying segment membership. Since the aggregate pattern of the reference prices across consumers, brands, and purchase occasions varies systematically with the underlying segment membership (leading to a positive and significant #3 in Equation (A.9)), and the associated price heterogeneity, it may result in biased estimates of the sticker shock effect. It should be noted that if price response heterogeneity is present only in the brand choice decision (not in the purchase incidence decision), only one reference price³ (not the aggregate pattern of reference prices) would vary systematically with the segment membership, resulting in a smaller #3 and hence less serious bias.
Therefore, a tendency to change purchase timing plays a crucial role in biasing estimates of the sticker shock effect.

³ Unobserved price sensitivity is revealed by the price actually paid for the one chosen brand, which will be used as the reference price for the brand on the next purchase occasion. Other observed prices will also be used for reference prices, but will provide no information on the price sensitivity in the brand choice decision.

Extreme Price Heterogeneity

If the two segments display extreme heterogeneity, i.e., a large difference in price responsiveness, it could result in a distinct pattern of prices in which the lower-priced brands are exclusively purchased by the price-sensitive segment and the higher-priced brands by the price-insensitive segment. This alternative mechanism could cause the reference price variable and the omitted variable to be correlated, even without differences in purchase timing, and result in biased estimates of the sticker shock parameter. In our simulation, we study the extent of heterogeneity required to produce the bias and find that the implied range of price elasticities is much larger than has been observed in practice (Tellis 1988a). In contrast, simulation results show that even moderate differences in price responsiveness, when combined with heterogeneity in purchase timing, can result in a biased sticker shock parameter.

In summary, our analysis suggests that the combination of price response heterogeneity and the relationship between purchase timing and price sensitivity together make it more likely that the coefficient for the sticker shock term in the misspecified model is significant. Since reference prices are imputed from those occasions on which consumers actually purchase in the category, the purchase timing decision influences imputed reference prices which, in turn, biases estimates of the sticker shock effect.
Also, while unaccounted-for response heterogeneity in brand choice is a necessary condition for the existence of a spurious reference price effect, as can be inferred from Heckman (1981), it is not a sufficient one.

2.3 Simulation Study

Our theoretical analysis allows us to derive several predictions about the bias in the misspecified model. To confirm these predictions, we carried out a simulation study in which choices of hypothetical panelists are generated according to the true model, represented by Equation (2.3), after which the misspecified model is estimated on the same data.

2.3.1 Data Generation

We simulated category purchases, via a nested logit, and brand choice decisions, via a multinomial logit, for 400 hypothetical households, of which 240 (60%) were assumed to be price-sensitive in the choice decision while the rest were price-insensitive. Each household was assumed to encounter each of the 60 different store environments (i.e., make 60 shopping trips). We generated 60 prices for each of two hypothetical brands, of which the first brand (A) was assumed to be a national brand and the second (B) a non-promoted lower-priced brand. Regular prices for A were assumed to be uniformly distributed between $2.90 and $3.10, with a mean of $3, while the range for B was between $2.40 and $2.60, with a mean of $2.50. The combinations of prices for the two brands, therefore, represented 60 possible store environments that could potentially be encountered by the household on a shopping trip. The promoted price of Brand A was in the range of $2.30 to $2.50 (about 80% of its mean regular price) while Brand B was assumed never to be on promotion. The promotional frequency for Brand A was assumed to be about once in every three weeks, i.e., it was on promotion on 20 out of 60 occasions. These levels are quite similar to those found in the yogurt and ketchup categories.
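The price design described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the text gives the range of promoted prices but not their distribution, so a uniform draw is assumed here, and the placement of the 20 promoted occasions among the 60 environments is assumed to be random. The function name and the seed are hypothetical.

```python
import random

def make_store_environments(n_env=60, n_promo=20, seed=0):
    """Return a list of (price_A, price_B, A_on_promotion) store environments."""
    rng = random.Random(seed)
    # Assumption: promoted occasions are placed at random among the environments.
    promo_flags = [True] * n_promo + [False] * (n_env - n_promo)
    rng.shuffle(promo_flags)
    envs = []
    for on_promo in promo_flags:
        if on_promo:
            # Assumption: promoted prices of brand A are uniform on [2.30, 2.50].
            price_a = rng.uniform(2.30, 2.50)
        else:
            # Regular prices of brand A are uniform on [2.90, 3.10].
            price_a = rng.uniform(2.90, 3.10)
        # Brand B is never promoted; its price is uniform on [2.40, 2.60].
        price_b = rng.uniform(2.40, 2.60)
        envs.append((price_a, price_b, on_promo))
    return envs

envs = make_store_environments()
mean_regular_a = sum(pa for pa, _, promo in envs if not promo) / 40
mean_promo_a = sum(pa for pa, _, promo in envs if promo) / 20
print(round(mean_regular_a, 2), round(mean_promo_a, 2))
```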
The following steps were used to generate the data. Each household was assumed to encounter each of the 60 different store environments (i.e., make 60 shopping trips). For each store environment, we calculated the utility of each brand based on the following choice utility functions:

Segment             Utility of Brand A     Utility of Brand B
Price Sensitive     $3 + \beta^{h} P_A$    $2 + \beta^{h} P_B$
Price Insensitive   $3 + \beta^{l} P_A$    $2 + \beta^{l} P_B$

In addition, the decision to purchase in the category was modeled using a nested logit in which the incidence utility was specified as a function of a constant and a category value term (CV) as follows:

$$V^{s} = \alpha^{s} + \beta_{inc}^{s}\,\mathrm{CV},$$

where $\beta_{inc}^{s}$ is the response parameter for CV. CV for each trip is calculated as $\ln[\exp(U_A) + \exp(U_B)]$, and s represents the segment to which the consumer belongs. The parameter $\alpha^{s}$ was chosen such that the overall incidence probability for each household resulted in approximately 10 category purchases. The probability of category purchase is given by:

$$P(inc) = \frac{\exp(V^{s})}{1 + \exp(V^{s})}.$$

A binomial draw with probability P(inc) was used to simulate an incidence decision. If the draw resulted in a predicted category purchase, then the brand choice decision was simulated using the multinomial logit choice probability. If the draw resulted in a non-purchase, we repeated the above steps for the next store environment. Thus, the resulting data set contained about 4000 (10 × 400) "choice" observations.

Price parameters in choice decision

In a meta-analysis of 367 price elasticities, Tellis (1988a) reports elasticities in the range from +2 to -10, with a vast majority falling in the range -1 to -9. The choice parameters used in the simulation reflect this range of elasticity differences. Thus, for the base case we choose the price parameters $\beta^{h} = -5$ and $\beta^{l} = -1$, which implies that price elasticities range from -0.96 to -8.94.
In addition, we also analyzed several other parameter combinations outside this base range.

CV parameters in incidence decision

For each pair of choice parameters ($\beta^{h}$, $\beta^{l}$), we generated data for two conditions. In the first, we explicitly allow the decision to purchase in the category to be linked to price sensitivity of the household, i.e., price-sensitive households are more likely to change their purchase timing decision based on price and promotion activity in the category. Consistent with results of earlier empirical analysis (Bucklin and Gupta 1992, Bucklin, Gupta and Siddarth 1998), the CV parameters were set at 0.7 and 0.2 for the high and low price-sensitive segments, respectively. In the second condition, the category value response parameters were equal across segments ($\beta_{inc}^{h} = \beta_{inc}^{l} = 0.2$), reflecting price response heterogeneity without accompanying differences in the purchase timing responsiveness.

2.3.2 Predictions and Estimated Models

Our theoretical analysis allows us to make four predictions about parameters of models estimated on the simulated data. First, in the presence of unaccounted-for heterogeneity, the parameter $\beta_{rp}$ will tend to be positive and significant. Second, when compared to the estimate obtained from a model which omits the sticker shock term (a "plain" MNL model), the common price parameter, $\beta_c$, which tends to compensate for the price-insensitive segment, will have a smaller absolute value. Third, a latent class model, which accounts for price response heterogeneity but (incorrectly) includes a sticker shock term, will not result in a spuriously significant sticker shock coefficient. Fourth, the misspecified model which does not account for price response heterogeneity will show no bias in the sticker shock parameter if the underlying heterogeneity in purchase timing is the same across the segments.
To test these predictions, we estimate three models on the simulated data: the "plain" MNL model, the corresponding sticker shock model (MNL-ST), and a multi-segment latent class model, which (mistakenly) includes a sticker shock term (SEG-ST).

2.3.3 Simulation Results

Base case results

CV parameters different across segments

Maximum likelihood estimation results for the base case ($\beta^{h} = -5$ and $\beta^{l} = -1$) are displayed in Table 2.1. The price coefficient (-2.89) for the MNL model lies near the midpoint of the true values (-1 and -5). The coefficient for the sticker shock term in the MNL-ST model is positive and significant even though the true model does not include a reference price term in the utility function. Additionally, in line with our predictions, the price coefficient for the MNL-ST model has a smaller absolute value than its counterpart in the MNL model (2.61 < 2.89). These results confirm that introducing the sticker shock term increases model flexibility and permits the common coefficient, $\beta_c$, to be estimated closer to the true price coefficient of the insensitive segment. Finally, the results from the latent class model, SEG-ST, show that accounting for price response heterogeneity via separate price coefficients for each segment causes the spurious sticker shock effect to disappear.⁴

⁴ We also generated data for the case in which the true model included sticker shock terms with known parameters (1.0 for the high sensitive segment and 0.5 for the low sensitive segment, respectively) in both segments and found that the latent class model was successful at recovering the true sticker shock parameters.

CV parameters equal across segments

Model results for the MNL-ST fitted to the base case data are presented in Table 2.2. In contrast to the results from the previous simulation, we find that the coefficient for the sticker shock term is insignificant.
These results show that unaccounted-for heterogeneity in price response is a necessary, but not sufficient, condition to obtain a spuriously significant sticker shock effect.

Robustness check

We estimated the MNL-ST model on fifty additional data sets generated for each of several different combinations of the CV parameters, $\beta_{inc}^{h}$ and $\beta_{inc}^{l}$.⁵ Results, presented in Table 2.3, show that the mean bias increases as the difference in the CV response parameters across the two segments increases, and is statistically insignificant when the value of the difference term is less than 0.4. These results confirm that differences in purchase timing responsiveness play an important role in determining the extent of bias.

⁵ If $\beta_{rp}^{(t)}$ is the estimate obtained from a single data set t, and $\bar{\beta}_{rp}$ represents the mean across the 50 simulations, then we can write a standard error as S.E.($\bar{\beta}_{rp}$) =

Price Parameters Different From the Base Case

CV parameters different across segments

Table 2.4 reports the mean value of the sticker shock parameter, $\beta_{rp}$, and the associated t-statistic obtained from estimating the MNL-ST model on fifty different data sets generated for each of seven parameter combinations. The results indicate that the term $\beta_{rp}$ is insignificant when differences in price sensitivity are small but becomes significant as price heterogeneity increases.

CV parameters equal across segments

Results for this case are reported in Table 2.5. Only very extreme variations in the price parameters result in a spurious sticker shock effect. The implied range of elasticities is much higher than that reported in previous research (Tellis 1988a). This contrasts with our earlier findings in which even moderate differences in price responsiveness, when combined with purchase timing heterogeneity, resulted in a spurious sticker shock coefficient.
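The role of purchase timing in this simulation design can be illustrated with a small deterministic calculation. The sketch below is not the estimation procedure used in the study. It works with expected purchase probabilities rather than random draws, fixes prices at their means (40 regular environments at $3.00 and $2.50, and 20 promoted environments at $2.40 and $2.50), treats the price coefficients as negative numbers entering utility additively, and calibrates the incidence intercept by bisection so that each household expects 10 category purchases. All function names, and the bisection helper itself, are assumptions made for illustration.

```python
import math

def category_value(beta, price_a, price_b, const_a=3.0, const_b=2.0):
    """CV = ln[exp(U_A) + exp(U_B)] for the two-brand market."""
    u_a = const_a + beta * price_a
    u_b = const_b + beta * price_b
    m = max(u_a, u_b)
    return m + math.log(math.exp(u_a - m) + math.exp(u_b - m))

def incidence_prob(intercept, beta_inc, cv):
    """Binary logit probability of a category purchase."""
    return 1.0 / (1.0 + math.exp(-(intercept + beta_inc * cv)))

def calibrate_intercept(beta_inc, cvs, target=10.0):
    """Bisection on the intercept so that expected purchases equal target."""
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        expected = sum(incidence_prob(mid, beta_inc, cv) for cv in cvs)
        if expected < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def promo_share(beta, beta_inc, envs):
    """Expected share of purchases on promoted trips, and expected purchases."""
    cvs = [category_value(beta, pa, pb) for pa, pb, _ in envs]
    intercept = calibrate_intercept(beta_inc, cvs)
    probs = [incidence_prob(intercept, beta_inc, cv) for cv in cvs]
    on_promo = sum(p for p, (_, _, flag) in zip(probs, envs) if flag)
    return on_promo / sum(probs), sum(probs)

# Simplified environments at mean prices: (price_A, price_B, A_on_promotion).
envs = [(3.00, 2.50, False)] * 40 + [(2.40, 2.50, True)] * 20

share_sensitive, n_sensitive = promo_share(-5.0, 0.7, envs)      # base case, high segment
share_insensitive, n_insensitive = promo_share(-1.0, 0.2, envs)  # base case, low segment
print(share_sensitive, share_insensitive)
```

Under these assumptions the price-sensitive segment makes a larger share of its purchases on promoted trips than the insensitive segment, so its last-purchase reference prices for Brand A tend to be lower, which is the regularity that Section 2.2.3 identifies as the source of the bias.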
2.4 Empirical Application

To test our theory further, we conduct an empirical test using scanner panel data from the yogurt and ketchup categories. We develop and estimate a Hierarchical Bayes model of incidence and sticker shock choice (labeled HB-ST) that accounts for heterogeneity in all parameters across consumers. Since the Hierarchical Bayes model eliminates the source of bias, we expect the magnitude of the coefficient for the sticker shock term to be attenuated and to represent the true impact of reference price on brand choice. For comparison purposes, we estimate a standard nested logit model of incidence and choice (Bucklin and Gupta 1992), labeled NL, as well as a nested logit model that includes a sticker shock term in the choice model, labeled NL-ST, that does not account for price response heterogeneity. We expect the sticker shock coefficient from the NL-ST model to be positive and statistically significant.

2.4.1 Data

The data for this study are drawn from A.C. Nielsen scanner panel records in the yogurt and ketchup categories, for households in Sioux Falls, South Dakota, for the period 1986-1988. The last 51 weeks of the data were used for model calibration and the preceding 61 weeks were used for initializing model variables. Households qualified for inclusion in the sample if they made at least one grocery purchase every four weeks over the entire study period and made at least one product purchase in both the initialization and calibration periods. Out of the qualified households, 400 were randomly selected from each product category for model calibration (42954 shopping trips and 3852 purchases in the yogurt category, 44631 shopping trips and 2476 purchases in the ketchup category). The top seven selling brands are included in the study.
Together, these accounted for 84 (85) percent of category sales in dollars, and 89 (87) percent in units for the yogurt and ketchup categories, respectively. These brands, their market shares by units, and average prices per oz are given in Table 2.6 and Table 2.7.

2.4.2 Models Estimated

Nested Logit (NL) Model

This model assumes that, conditional on purchase incidence, consumers choose a brand without relying on internal reference price and that they share the same response parameters. The NL model specifies the probability of brand choice for household h on occasion t as

$$P_t^{h}(i \mid inc) = \frac{\exp(U_{it}^{h})}{\sum_{j}\exp(U_{jt}^{h})}. \tag{2.6}$$

$U_{it}^{h}$ denotes the deterministic component of utility for each alternative i and is specified as a function of both household-specific and marketing mix variables as follows:

$$U_{it}^{h} = \beta_{i0} + \beta_1\,\mathrm{BLOY}_{i}^{h} + \beta_2\,\mathrm{LBP}_{it}^{h} + \beta_3\,\mathrm{PRICE}_{it} + \beta_4\,\mathrm{PROMO}_{it}, \tag{2.7}$$

where
$\mathrm{BLOY}_{i}^{h}$ = within-household market share of brand i in the initialization period,
$\mathrm{LBP}_{it}^{h}$ = 1 if i was the brand purchased on the previous occasion; 0 otherwise,
$\mathrm{PRICE}_{it}$ = actual shelf price of brand i at time t, and
$\mathrm{PROMO}_{it}$ = 1 if brand i is on promotion at time t; 0 otherwise.

The above formulation accommodates cross-sectional heterogeneity in brand preference through the static brand loyalty measure and time-varying heterogeneity through the last purchase variable, $\mathrm{LBP}_{it}^{h}$.

A consumer's category purchase decision is modeled with a binary nested logit. Specifically, the probability of purchase incidence for household h on a store visit at time t is given by (e.g., Bucklin and Gupta 1992):

$$P_t^{h}(inc) = \frac{\exp(V_t^{h})}{1 + \exp(V_t^{h})}. \tag{2.8}$$

$V_t^{h}$ denotes the deterministic component of utility for purchase incidence and is specified as follows:

$$V_t^{h} = \gamma_0 + \gamma_1\,\mathrm{CR}^{h} + \gamma_2\,\mathrm{INV}_t^{h} + \gamma_3\,\mathrm{CV}_t^{h}, \tag{2.9}$$

where
$\mathrm{CR}^{h}$ = average weekly consumption rate for household h,
$\mathrm{INV}_t^{h}$ = inventory estimate for household h at time t, and
$\mathrm{CV}_t^{h}$ = category value at time t for household h.
Consumption rate is computed as the total amount of product purchased by a household h in the initialization period divided by the number of weeks in the period. It remains constant through the calibration period. The inventory variable, $\mathrm{INV}_t^{h}$, is designed to capture time-varying heterogeneity in incidence probabilities. To construct the inventory variable, we assume that households draw down their supply at their rates of consumption, $\mathrm{CR}^{h}$. We initialize our inventory measure at zero at the start of the initialization period. We also mean-center $\mathrm{INV}_t^{h}$ by subtracting each household's average level of inventory during the calibration period. This makes the measure purely longitudinal, so that $\mathrm{INV}_t^{h}$ becomes a measure of relative inventory within a household (Bucklin and Gupta 1992).

In the nested logit incidence model, the attractiveness of the product category (due to the price and promotion of the various brands) is represented by category value, $\mathrm{CV}_t^{h}$ (the log of the denominator of the brand choice model). The coefficient for category value should be positive (reflecting the positive influence of, say, a price promotion on the incidence decision) and lie between zero and one so that it is consistent with stochastic utility maximization theory (Ben-Akiva and Lerman 1985).

Nested Logit Sticker Shock (NL-ST) Model

The NL-ST model calls for augmenting the NL brand choice utility function with the sticker shock term that reflects the difference between a brand's reference price and its shelf price. Following Winer (1986) and Lattin and Bucklin (1989), the sticker shock utility is given by:

$$U_{it}^{h} = \beta_{i0} + \beta_1\,\mathrm{BLOY}_{i}^{h} + \beta_2\,\mathrm{LBP}_{it}^{h} + \beta_3\,\mathrm{PRICE}_{it} + \beta_4\,(\mathrm{RPRICE}_{it}^{h} - \mathrm{PRICE}_{it}) + \beta_5\,\mathrm{PROMO}_{it}, \tag{2.10}$$

where $\mathrm{RPRICE}_{it}^{h}$ is a brand-specific reference price. The reference price of a brand is assumed to be its price on the last occasion on which the household made a purchase in the category.
The brand choice and purchase incidence probabilities are the same as in Equation (2.6) and Equation (2.8), respectively.

Hierarchical Bayes Sticker Shock (HB-ST) Model

The heterogeneity in choice and incidence parameters is modeled via the HB-ST model. The model has three stages. In the first stage, the brand choice and purchase incidence models are written with household-specific parameters. In the second stage, the prior (population) distribution over the household-specific parameter vectors, $\theta_h = (\beta_h, \gamma_h)$, is specified as a multivariate normal, i.e., $\theta_h \sim \mathrm{MVN}(\mu, \Sigma)$. The covariance matrix of household parameter vectors, $\Sigma$, captures the unobserved source of heterogeneity across households. Since the fully Bayesian approach requires priors over unknown quantities in the model, the third stage of the model specifies hyper-priors over the parameters of the population distribution, $\mu$ and $\Sigma$. We assume a multivariate normal hyper-prior, $\mathrm{MVN}(\eta, C)$, over $\mu$ and a Wishart prior, $W[(\rho R)^{-1}, \rho]$, over $\Sigma^{-1}$, following the standard approach suggested in the statistics literature (Gelfand and Smith 1990, Gelfand et al. 1990). Details on each prior are provided in Appendix B.

This hierarchical setup involves evaluating high-order multivariate normal integrals required in formulating the posterior density, making it practically impossible to use conventional maximum likelihood estimation techniques. We circumvent this problem by using sampling-based Bayesian approaches that build upon recent advances in Markov Chain Monte Carlo (MCMC) methods (Gelfand and Smith 1990, Gelfand et al. 1990). Using the sampling-based approach permits one to account properly for uncertainty in inferences and to make exact (as opposed to asymptotic) inference possible with any desired precision (since the number of draws is under an analyst's control).
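The first stage of the model above, in which the choice and incidence probabilities of Equations (2.6), (2.8), (2.9) and (2.10) are evaluated with one household's own parameter vector, can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout for brand covariates, and all numerical values are hypothetical and are not estimates from the study.

```python
import math

def brand_utility(beta, intercept, bloy, lbp, price, rprice, promo):
    """Sticker shock utility of Eq. (2.10); beta = (b1, b2, b3, b4, b5)."""
    b1, b2, b3, b4, b5 = beta
    return (intercept + b1 * bloy + b2 * lbp + b3 * price
            + b4 * (rprice - price) + b5 * promo)

def choice_probs(utils):
    """Conditional brand choice probabilities of Eq. (2.6)."""
    m = max(utils)
    expu = [math.exp(u - m) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]

def category_value(utils):
    """Log of the denominator of the brand choice model."""
    m = max(utils)
    return m + math.log(sum(math.exp(u - m) for u in utils))

def incidence_prob(gamma, cr, inv, cv):
    """Purchase incidence probability of Eqs. (2.8) and (2.9)."""
    g0, g1, g2, g3 = gamma
    v = g0 + g1 * cr + g2 * inv + g3 * cv
    return 1.0 / (1.0 + math.exp(-v))

def trip_log_likelihood(beta, gamma, intercepts, brands, cr, inv, chosen):
    """Log-likelihood of one store visit; chosen is a brand index or None."""
    utils = [brand_utility(beta, a, **b) for a, b in zip(intercepts, brands)]
    p_inc = incidence_prob(gamma, cr, inv, category_value(utils))
    if chosen is None:  # no category purchase on this visit
        return math.log(1.0 - p_inc)
    return math.log(p_inc) + math.log(choice_probs(utils)[chosen])

# Hypothetical household-level parameters and covariates for two brands.
beta_h = (2.0, 0.5, -3.0, 0.4, 0.8)
gamma_h = (-2.0, 0.3, -0.1, 0.5)
intercepts = [0.0, -0.4]
brands = [
    dict(bloy=0.7, lbp=1, price=0.9, rprice=1.0, promo=0),
    dict(bloy=0.3, lbp=0, price=0.8, rprice=0.8, promo=1),
]
outcomes = [None, 0, 1]
total_prob = sum(
    math.exp(trip_log_likelihood(beta_h, gamma_h, intercepts, brands, 1.2, 0.0, c))
    for c in outcomes
)
print(total_prob)
```

The probabilities of the three mutually exclusive outcomes (no purchase, buy brand 0, buy brand 1) sum to one, which is a quick check that the nesting of the choice model inside the incidence model is coded consistently.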
2.4.3 Estimation

NL & NL-ST Models

The NL and NL-ST models have the following log-likelihood function:

$$LL(\theta) = \sum_{h}\sum_{t}\left[\sum_{i}\delta_{it}^{h}\,\ln\!\big(P_t^{h}(inc)\,P_t^{h}(i \mid inc)\big) + (1 - \delta_t^{h})\,\ln\!\big(1 - P_t^{h}(inc)\big)\right], \tag{2.11}$$

where $\delta_{it}^{h} = 1$ if household h bought brand i on a store visit at time t, and zero otherwise, and $\delta_t^{h} = \sum_i \delta_{it}^{h}$. $\theta$ is the choice and incidence parameter vector, which is common across households. Maximization of Equation (2.11) simultaneously yields the choice and incidence parameters.

HB-ST Model

Sampling from the joint posterior distribution is achieved by sampling from the full conditional distributions, known as substitution sampling (details on the full conditional distributions for model parameters are provided in Appendix B). If all the full conditional distributions are known (i.e., known conjugate distributions such as a normal and a Wishart), then substitution sampling reduces to a procedure known as Gibbs Sampling (Geman and Geman 1984, Gelfand and Smith 1990). If the full conditional distributions are not completely known (i.e., known only up to a normalizing constant), then a Metropolis-Hastings step is used (Tierney 1994, Chib and Greenberg 1995). In the context of the HB-ST model developed in the previous section, the (m + 1)th step of the substitution sampling involves generating the following draws:

(a) Generate $\theta_h$ draws from $p(\theta_h^{(m+1)} \mid \Sigma^{(m)}, \mu^{(m)})$ for h = 1 to H, using a Metropolis-Hastings algorithm.
(b) Generate a $\mu$ draw from $p(\mu^{(m+1)} \mid \{\theta_h^{(m)}\}, \Sigma^{(m)}, \eta, C)$, using a Gibbs sampler.
(c) Generate a $\Sigma^{-1}$ draw from $p(\Sigma^{-1(m+1)} \mid \{\theta_h^{(m)}\}, \mu^{(m)}, \rho, R)$, using a Gibbs sampler.

This sequence of draws generates a Markov chain whose stationary distribution is the joint posterior distribution of all unknown parameters. The initial draws (1st to mth) from the chain reflect a transient period in which the chain has not converged to the equilibrium distribution and are therefore discarded.
A sample of draws obtained after convergence is used to make posterior inferences about model parameters. It should be noted that this sampling-based procedure generates the entire posterior distribution rather than just a point estimate. Given that we expect substantial uncertainty in inferring household parameters with only a small number of observations, it is critical that we also measure the uncertainty of the estimates properly.

In our application, the substitution sampler was run for 15000 iterations and convergence was assessed by monitoring the time-series of the draws. We chose a burn-in length of 10000 iterations (after which the time-series plot stabilized) and retained every fifth iteration of the remaining 5000 iterations to reduce the serial correlation of the sampled draws, i.e., we resorted to "thinning the chain" (Geyer 1992, Raftery and Lewis 1995). Therefore, 1000 draws from the posterior distribution of each parameter were used to make inferences.

2.5 Results and Discussion

In this section, we compare the results of the standard (NL and NL-ST) and HB-ST models. Traditional measures of goodness-of-fit such as the Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC), which penalize the likelihood for the number of estimated parameters, are not suitable for assessing and comparing Hierarchical Bayes models against non-Bayesian models because these measures are motivated by asymptotic arguments and are, therefore, inappropriate when the number of model parameters varies with the sample size (Manchanda et al. 1997, Carlin and Louis 1996), as in the HB-ST model. Therefore, we use log-likelihood and in-sample hit-rates⁶ to compare the models.

2.5.1 NL and NL-ST models

Fit statistics and parameter estimates are presented in Table 2.8. The NL-ST model provides a better fit than the NL model in both categories.
The sticker shock coefficient is positive and significant in both categories. Also, as per our prediction, in both categories, the common price parameter, $\beta_c$, in the NL-ST model is larger (that is, less negative) than the corresponding value in the NL model.

⁶ Each household is assumed to pick the brand with the highest predicted choice probability. The percentage of correct choice predictions gives the hit-rate (Kalwani, Meyer and Morrison 1994).

2.5.2 HB-ST Model

Estimates of the population parameters ($\mu$), and fit statistics, are shown in Table 2.9. In both categories, the HB-ST model outperforms the baseline models in terms of both log-likelihood and hit-rates. As is usual for Bayesian inference, we summarize the posterior distribution of the parameters by the posterior mean and 95% probability interval for each parameter. In both categories, this interval for the sticker shock coefficient contains the value zero. Thus, properly accounting for price response heterogeneity mutes the impact of reference price and, in our application, the sticker shock effect disappears completely. For these data sets, therefore, the results do not support the existence of a sticker shock effect. In addition, the magnitude of another state-dependent variable, LBP, is substantially smaller in both categories, supporting the notion that state dependence is attenuated by accounting for heterogeneity.⁷

2.5.3 Posterior Analysis

As discussed previously, a spurious reference price effect requires households that are price-sensitive in the choice decision also to be more responsive to price and promotion in the category purchase decision.
To provide support for this argument, household CV parameters were regressed on the price parameters.[8] The regression results, presented in Table 2.10, show that, in both categories, greater responsiveness to category value is associated with higher price sensitivity in the choice decision, providing strong empirical support for our theory.

[7] We re-estimated the NL-ST model using an exponentially smoothed reference price measure, $RP_{it} = (1-\alpha)\,RP_{i(t-1)} + \alpha\,P_{i(t-1)}$, similar to that used by Briesch et al. (1997). This measure provided a very slight improvement (about one likelihood point) in fit over the current measure, $RP_{it} = P_{i(t-1)}$, and $\alpha$ was estimated to be 0.72 (0.56) for the yogurt (ketchup) category. We then re-estimated the HB-ST model using this definition of reference prices, and found our results to be unchanged, i.e., the 95% probability interval for the sticker shock parameter still contained the value zero.

[8] The substitution sampling yields a sample of 1000 draws for all model parameters. We then obtain point estimates of household parameters ($\theta_h$) by averaging over this distribution. Specifically, we regress $\gamma_3$ on $\beta_3$.

2.5.4 Attenuation of Reference Price Effects

In both categories, we find that the sticker shock effect from the standard (homogeneous) model is overestimated. We are interested in quantifying the extent of attenuation due to heterogeneity correction. One straightforward way to do this is by comparing the elasticity of reference price of the standard model to that of the HB-ST model. For the HB-ST model, we base our inference about elasticities on a sample of 1000 draws from the posterior distribution of population parameters ($\mu$). For each draw, we calculate the reference price elasticities across purchase occasions and average them across brands. Then we repeat the procedure for 1000 draws and average the brand-level elasticities across draws.
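The draw-based summaries used throughout this section (discarding a burn-in, thinning, then reporting a posterior mean and a 95% probability interval) can be sketched as follows. This is a minimal illustration, not the estimation code used in the thesis: the function names are hypothetical, the chain is simulated rather than produced by the substitution sampler, and an equal-tailed interval is assumed because the text does not say how the interval was constructed.

```python
import random
import statistics


def thin_chain(draws, burn_in, step):
    """Discard the first `burn_in` draws, then keep every `step`-th one."""
    return draws[burn_in::step]


def summarize(draws, level=0.95):
    """Return (posterior mean, lower, upper) for an equal-tailed interval.

    Assumption: the interval is read off the sorted draws; the thesis does
    not state which interval construction was used.
    """
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[round(tail * n)]
    upper = ordered[round((1.0 - tail) * n) - 1]
    return statistics.fmean(ordered), lower, upper


# Stand-in chain: 15000 simulated draws, NOT output from the thesis sampler.
rng = random.Random(0)
chain = [rng.gauss(0.05, 0.04) for _ in range(15000)]

# Burn-in of 10000, then every fifth draw of the remaining 5000: 1000 draws.
kept = thin_chain(chain, burn_in=10000, step=5)
mean, lower, upper = summarize(kept)

# A coefficient is treated as "not supported" when the interval covers zero.
interval_contains_zero = lower <= 0.0 <= upper
print(len(kept), round(mean, 3), interval_contains_zero)
```

The same `summarize` step would apply to any per-draw quantity, for example an elasticity computed once per retained draw and then averaged, which is how the comparison in this subsection is described.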
We present the elasticity comparisons between the models for both categories in Table 2.11 and Table 2.12. The results show that the attenuation of the effect size is substantial, with average decreases of 63% and 61% for the yogurt and ketchup categories, respectively. (Note that the value zero is included in the 95% probability intervals of all elasticity estimates of the HB-ST model, indicating an insignificant sticker shock effect.) In contrast, price elasticities increase in absolute magnitude when heterogeneity is accounted for, as presented in Table 2.13 and Table 2.14. Therefore, accounting for heterogeneity appears to sharpen the effect of price while attenuating the sticker shock effect.

2.6 Conclusion

The reference price construct, and its role in the brand choice decision, has been extensively investigated by researchers in marketing using secondary data from scanner panels. Consumer heterogeneity also has a rich modeling tradition in the marketing literature and several papers have highlighted the importance of accounting for it in order to obtain unbiased parameter estimates from choice models. However, most previously reported models of reference price effects have not adequately accounted for heterogeneity.

The current work studies the impact of unaccounted for price response heterogeneity on estimates of the sticker shock effect. The analysis reveals two conditions under which estimates of the sticker shock effect are biased. First, the estimated model must ignore true differences in price sensitivities among consumers. Second, one or more of the other explanatory variables used in the analysis must vary systematically with the underlying price responsiveness. This research highlights the role that heterogeneity in purchase timing can play in this process.
Based on this analysis, several predictions about the nature of parameter bias are made and confirmed via a simulation study. Simulation results also show that, while the existence of extreme (unaccounted for) price response heterogeneity can, in itself, cause biased estimates of the sticker shock effect, the implied range of price response variation is much larger than observed in practice.

A Hierarchical Bayes version of the nested logit model is proposed and estimated on scanner panel data from two product categories. In contrast to a nested logit model with a sticker shock term, which does not account for price response heterogeneity, the 95% probability interval of the sticker shock parameter in the HB-ST model includes the value zero. For both these data sets, therefore, the results do not support the existence of a brand-specific reference price. Household-level category value and price response parameters obtained from the posterior distribution show that greater responsiveness to category value is associated with higher price sensitivity in the choice decision. Overall, these results provide strong support for our theory of the determinants of sticker-shock bias.

Our results complement those of Bell and Lattin (1996), who focus on the reference-dependent formulation of reference price. It is interesting to note that their explanation for bias is consistent with our analysis. Thus, in their work, gains and losses mimic the pattern of price heterogeneity of the underlying segments and play the same role as the sticker shock term does in our model. Our empirical work in two product categories (yogurt and ketchup) shows that accounting for price response heterogeneity results in the sticker shock effect being insignificant. Their results also show that loss aversion is significantly attenuated, though it does not completely disappear.
Posterior analysis of reference price elasticities shows that the attenuation of the effect size after accounting for heterogeneity is quite substantial, with average decreases of 63% and 61% for the yogurt and ketchup categories, respectively. However, given the large uncertainty bounds around the elasticity estimates (i.e., 0 is included in the 95% probability intervals of the estimates), real effect sizes are considered to be negligible once price response heterogeneity is accounted for. In these particular categories, our results, therefore, do not support the existence of a brand-specific reference price as operationalized according to the standard approach in previous research.

Recent work in the marketing literature has used reference price effects to derive optimal promotion and pricing policies (Koppalle, Rao and Assuncao 1996, Greenleaf 1995). Since accurate estimates of the size and prevalence of reference price effects are critical to the development of these strategies, our research highlights the importance of accounting for consumer heterogeneity in obtaining such estimates.

2.7 Limitations and Future Research

Future research could take several directions. Laboratory and survey work could be used to uncover the mechanisms that consumers actually use to form reference prices in different product categories. Besides providing a useful framework to understand the construct, this research may identify better heuristics that could be used to construct reference price measures in studies using secondary data sources. Another fruitful area for research would be to compare the results obtained by different methods of accounting for consumer heterogeneity. In fact, when we employed a latent class approach (Kamakura, Kim and Lee 1996), six and three segments, respectively, emerged in the yogurt and ketchup categories.
Our results regarding the sticker shock effect were the same in the yogurt category, but one of the segments (representing 29% of the sample) in the ketchup category showed a significant sticker shock effect. This implies that residual price response heterogeneity within a segment could potentially result in a spurious sticker shock effect.

The objective of this research was to describe the mechanism by which price response heterogeneity could bias estimates of the sticker shock effects of reference price via a more complete treatment of heterogeneity than carried out in the previous studies. Our work points out some of the limitations of previous studies and shows that considerable additional analysis, perhaps over multiple categories, will be needed to describe accurately the formation and role of reference prices in brand choice.

Table 2.1: Estimation Results on Simulated Data
Category Value Parameters Different Across Segments (0.7 and 0.2)

Variable          MNL               MNL-ST            SEG-ST
PRICE1            -2.89 (-24.29)    -2.61 (-16.25)    -5.18 (-12.33)
RPRICE1-PRICE                        0.29 (2.53)       0.04 (0.17)
PRICE2                                                -1.07 (-3.62)
RPRICE2-PRICE                                          0.17 (0.88)

Asymptotic t-statistics in parentheses.

Table 2.2: Parameter Estimates for MNL-ST
Category Value Parameters Equal Across Segments (0.2 and 0.2)

Variable          Parameter Estimate    t-value
PRICE             -2.95                 -17.47
RPRICE-PRICE       0.06                   0.48

Table 2.3: Sensitivity Analysis on Category Value Parameters
Base Case

Category value    Category value    Difference    Mean $\beta_{rp}$    t-value for
(segment 1)       (segment 2)                                          mean $\beta_{rp}$
0.7               0.2               0.5           0.29                 2.53
0.6               0.2               0.4           0.25                 2.15
0.5               0.2               0.3           0.20                 1.75
0.4               0.2               0.2           0.15                 1.30
0.3               0.2               0.1           0.11                 0.93
0.2               0.2               0.0           0.06                 0.48

Table 2.4: Sensitivity Analysis on Price Parameters
Category Value Parameters Different Across Segments (0.7 and 0.2)

Price parameter   Price parameter   Implied elasticity    Mean $\beta_{rp}$    t-value for
for segment 1     for segment 2     range                                      mean $\beta_{rp}$
-1                  0               -1.67 to 0            0.039                0.33
-1                 -2               -2.94 to -0.96        0.037                0.34
-1                 -3               -4.26 to -0.96        0.091                0.82
-1                 -4               -6.50 to -0.96        0.194                1.72
-1                 -5               -8.94 to -0.96        0.287                2.53
-1                -10               -20.68 to -0.96       0.501                4.14
-1                -20               -40.95 to -0.96       0.552                4.30

Table 2.5: Sensitivity Analysis on Price Parameters
Category Value Parameters Equal Across Segments

Price parameter   Price parameter   Implied elasticity    Mean $\beta_{rp}$    t-value for
for segment 1     for segment 2     range                                      mean $\beta_{rp}$
-1                  0               -1.67 to 0             0.013                0.11
-1                 -2               -2.94 to -0.96        -0.010               -0.09
-1                 -3               -4.26 to -0.96        -0.014               -0.12
-1                 -4               -6.50 to -0.96         0.004                0.04
-1                 -5               -8.94 to -0.96         0.056                0.48
-1                -10               -20.68 to -0.96        0.170                1.40
-1                -20               -40.95 to -0.96        0.280                2.20

Table 2.6: Market Share/Average Price for the Yogurt Category

Brand             Market Share    Average Price per Oz.
Yoplait           0.19            10.1 cents
Weight Watcher    0.12             7.7 cents
Dannon            0.08             8.5 cents
Nordica           0.29             6.7 cents
QC                0.03             5.2 cents
W.B.B             0.17             5.4 cents
Private Label     0.11             4.8 cents

Table 2.7: Market Share/Average Price for the Ketchup Category

Brand               Market Share    Average Price per Oz.
Hunt's 32 Oz.       0.14            3.45 cents
Del Monte 32 Oz.    0.07            3.46 cents
Heinz 28 Oz.        0.23            4.52 cents
Heinz 32 Oz.        0.40            3.51 cents
Heinz 40 Oz.        0.06            4.83 cents
Heinz 64 Oz.        0.04            4.60 cents
Private Label       0.06            2.76 cents
Table 2.8: NL and NL-ST Parameter Estimates and Fit Statistics

                               Yogurt                        Ketchup
Variable                       NL            NL-ST           NL            NL-ST
Brand Choice
BLOY                           1.701         1.694           2.540         2.520
                               (24.898)      (24.608)        (22.804)      (22.654)
LBP                            1.230         1.264           0.776         0.826
                               (31.054)      (30.913)        (14.170)      (14.631)
PRICE                          -0.365        -0.279          -1.266        -1.065
                               (-12.769)     (-8.213)        (-17.941)     (-12.104)
RPRICE-PRICE                                 0.126                         0.299
                                             (4.394)                       (4.101)
PROMO                          1.464         1.447           1.960         1.937
                               (21.706)      (21.337)        (26.510)      (26.349)
Purchase Incidence
CONSUME                        0.068         0.068           0.122         0.122
                               (34.628)      (34.963)        (23.281)      (23.334)
INVENTORY                      -0.073        -0.074          -1.382        -1.383
                               (-2.676)      (-2.675)        (-21.295)     (-21.265)
CV                             0.429         0.418           0.382         0.379
                               (12.456)      (12.245)        (11.680)      (13.511)
Log-likelihood                 -16924.600    -16914.391      -11949.074    -11940.608
BIC                            -16999.275    -16994.400      -12024.017    -12020.904
In-sample Hit-Rate (Choice)    54.82%        55.53%          57.23%        57.23%
Hit-Rate (Incidence)           91.05%        91.05%          94.42%        94.42%

a Brand-specific constants are not presented.
b All parameters are significant at $\alpha = 0.05$.
c Asymptotic t-statistics in parentheses.
d Log-likelihood and BIC are evaluated at the maximum likelihood estimates.
Table 2.9: HB-ST Parameter Estimates and Fit Statistics

Variables                      $\mu$ (Yogurt)        $\mu$ (Ketchup)
Brand Choice
BLOY                           2.476                 3.315
                               (2.214, 2.738)        (2.886, 3.743)
LBP                            0.324                 0.16
                               (0.167, 0.481)        (0.025, 0.296)
PRICE                          -0.482                -1.522
                               (-0.560, -0.404)      (-1.633, -1.410)
RPRICE-PRICE                   0.052                 0.135
                               (-0.024, 0.128)       (-0.038, 0.307)
PROMO                          1.409                 2.163
                               (1.223, 1.595)        (2.022, 2.303)
Purchase Incidence
CONSUME                        0.098                 0.166
                               (0.072, 0.124)        (0.133, 0.200)
INVENTORY                      -0.524                -2.421
                               (-0.643, -0.407)      (-2.693, -2.148)
CV                             0.447                 0.360
                               (0.374, 0.520)        (0.292, 0.428)
Log-likelihood                 -12663.221            -9832.465
In-sample Hit-Rate (Choice)    74.36%                74.35%
Hit-Rate (Incidence)           91.59%                94.50%

a Brand-specific constants are not presented.
b 95% probability interval in parentheses.
c Log-likelihood is evaluated at the posterior estimates of household parameters ($\theta_h$).

Table 2.10: Regression of CV Parameters on Price Parameters

Variables             Parameter Estimates    t-value
Yogurt
  Intercept            0.335                  8.562
  Price Parameters    -0.233                 -3.358
Ketchup
  Intercept            0.184                  3.492
  Price Parameters    -0.117                 -3.472

Table 2.11: Reference Price Elasticity for the Yogurt Category

Brand             NL-ST model    HB-ST model            Percent Decrease
Yoplait           1.02           0.41 (-0.27, 1.10)     60%
Weight Watcher    0.86           0.35 (-0.23, 0.93)     59%
Dannon            0.94           0.37 (-0.24, 0.97)     61%
Nordica           0.63           0.22 (-0.14, 0.58)     65%
QC                0.56           0.21 (-0.14, 0.55)     62%
W.B.B             0.56           0.19 (-0.12, 0.50)     66%
Private Label     0.43           0.14 (-0.09, 0.37)     67%

a 95% probability interval in parentheses.

Table 2.12: Reference Price Elasticity for the Ketchup Category

Brand               NL-ST model    HB-ST model            Percent Decrease
Hunt's 32 Oz.       0.85           0.35 (-0.19, 0.88)     59%
Del Monte 32 Oz.    0.95           0.40 (-0.21, 1.01)     68%
Heinz 28 Oz.        0.98           0.38 (-0.21, 0.98)     61%
Heinz 32 Oz.        0.68           0.25 (-0.13, 0.64)     63%
Heinz 40 Oz.        1.28           0.55 (-0.29, 1.38)     57%
Heinz 64 Oz.        1.28           0.54 (-0.29, 1.37)     58%
Private Label       0.78           0.32 (-0.17, 0.82)     59%

a 95% probability interval in parentheses.

Table 2.13: Price Elasticity for the Yogurt Category

Brand             NL-ST model    HB-ST model              Percent Increase
Yoplait           -3.33          -4.73 (-5.48, -3.98)     142%
Weight Watcher    -2.76          -3.99 (-4.60, -3.38)     145%
Dannon            -3.04          -4.17 (-4.82, -3.52)     137%
Nordica           -2.05          -2.55 (-2.91, -2.19)     124%
QC                -1.80          -2.36 (-2.72, -2.00)     131%
W.B.B             -1.82          -2.18 (-2.50, -1.87)     120%
Private Label     -1.39          -1.61 (-1.86, -1.36)     116%

a 95% probability interval in parentheses.

Table 2.14: Price Elasticity for the Ketchup Category

Brand               NL-ST model    HB-ST model              Percent Increase
Hunt's 32 Oz.       -3.95          -4.76 (-5.34, -4.18)     121%
Del Monte 32 Oz.    -4.38          -5.36 (-5.95, -4.77)     122%
Heinz 28 Oz.        -4.54          -5.29 (-5.89, -4.68)     117%
Heinz 32 Oz.        -3.16          -3.44 (-3.83, -3.05)     109%
Heinz 40 Oz.        -5.90          -7.34 (-8.20, -6.48)     124%
Heinz 64 Oz.        -5.83          -7.18 (-8.02, -6.33)     123%
Private Label       -3.59          -4.35 (-4.83, -3.87)     121%

a 95% probability interval in parentheses.

Chapter 3

Consideration Set Heterogeneity, Advertising, and Price Sensitivity

3.1 Overview

Recently, several models have been introduced in the marketing literature to understand better the role of choice or consideration sets in brand choice behavior. These models view brand choice as a two-step process: a consideration set is formed in the first stage and, in the second stage, consumers evaluate the brands in the consideration set. The use of consideration sets by consumers can be theoretically justified in two ways. First, consumers may choose not to evaluate some brands for which they have a low preference or have high evaluation costs (e.g., Hauser and Wernerfelt 1990, Roberts and Lattin 1991).
Second, even if evaluation costs are low, consumers may not have the ability to process all the information available to them. Consequently, they may simplify the decision process by only evaluating a subset of alternatives (Meyer and Kahn 1991). The resulting consideration set is a product of a simple, non-compensatory process rather than a more elaborate, compensatory process (e.g., Lehmann and Pan 1994, Shocker et al. 1991).

In this chapter, we develop a new consideration set model, which can be estimated with scanner panel data, to account better for heterogeneity in consideration sets. The proposed approach directly models the probability of a brand being included in the consideration set. The resulting inclusion probabilities for brands reflect a "fuzzy" consideration set, i.e., the composition of the consideration set is fuzzy in the sense that a brand belongs to the consideration set only probabilistically. This approach differs from that of most previous research (e.g., Siddarth, Bucklin and Morrison 1995, Andrews and Srinivasan 1995, Chiang, Chib and Narasimhan 1998), in which consideration sets are well-defined and the researcher models the uncertainty about the set from which a brand is chosen. Our approach has a modeling advantage and better represents some findings from the consumer behavior literature (Shocker et al. 1991), as discussed below.

The approach also differs from the fuzzy set model of Bronnenberg and Vanhonacker [B&V] (1996), in which variables have a compensatory effect on both brand evaluation and consideration. In contrast, our model permits a non-compensatory process in consideration set formation while allowing brand evaluation to be compensatory. We apply the proposed fuzzy set model and find that it outperforms several alternative consideration set models (in two product categories) proposed in the literature.
We apply our approach to examine the role of the consideration set in moderating the impact of advertising on price sensitivity. This is a generalization of Mitra and Lynch (1995), using real purchase data as opposed to experimental data, that overcomes several limitations of their experimental approach. We seek to make the following contributions.

• Develop a new fuzzy consideration set model and compare its performance to existing consideration set models.

• Apply the proposed model to investigate the role of the consideration set in moderating the effect of advertising on price sensitivity.

In the remainder of this section, we review existing approaches to incorporating consideration sets in choice models and highlight the methodological differences of the proposed approach. We provide a brief review of previous work on the relationship between advertising and price sensitivity, and discuss the manner in which our approach overcomes some of the limitations of prior research in this area.

The remaining sections of this chapter are organized as follows. Section 3.2 develops the new fuzzy consideration set model and provides mathematical specifications of the model components. Section 3.3 extends the model to include the impact of advertising on price sensitivity through its influence on the consideration set. Section 3.4 presents an empirical application of our model to yogurt and ketchup scanner panel data sets. Section 3.5 discusses the empirical results and Section 3.6 presents the managerial implications of the study. We close with our conclusions and directions for future research.
3.1.1 Approaches to Incorporating Consideration Sets in Choice Models

Non-Fuzzy Set Approaches

One approach to modeling consideration sets is what we term the Universal Set Approach (UNSA) (e.g., Andrews and Srinivasan 1995, Chiang, Chib and Narasimhan 1998), in which all possible consideration sets are enumerated and the associated probability of a consumer using each set is explicitly modeled. As a result, as the number of alternatives ($n$) increases, the number of consideration sets ($2^n - 1$) increases exponentially, as does the difficulty of model estimation.

Another approach, which we term the Restricted Set Approach (RSA), circumvents this problem by imposing a priori restrictions on the number of subsets used by the consumer (e.g., Siddarth, Bucklin and Morrison 1995, Gaudry and Dagenais 1979, Swait and Ben-Akiva 1987). However, the RSA cannot avoid mis-specifying consideration sets. For example, a chosen brand may not be in the consideration set or the actual consideration set may be smaller than the specified consideration set. A feature shared by both approaches is that consideration sets are well-defined (the consideration sets are "crisp", as termed in Elrod et al. 1992) and only the probability associated with the use of each set is explicitly modeled.

Fuzzy Set Approaches

A "fuzzy" set approach directly models the probability of an individual brand being included in the consideration set (Fotheringham 1988, Bronnenberg and Vanhonacker [B&V] 1996). The composition of the consideration set is "fuzzy", as compared to "crisp", in the sense that a brand belongs to the consideration set only probabilistically. This contrasts with the use of the non-fuzzy consideration sets, in which the sets are well-defined but there is uncertainty about the set from which a brand is chosen.
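To make the enumeration burden of the crisp-set (UNSA) formulation concrete, the sketch below lists every non-empty subset of a small brand list and reports how the count grows. It is only an illustration of the $2^n - 1$ count stated above; the function name and the brand labels are hypothetical and do not come from any of the cited models.

```python
from itertools import combinations


def crisp_consideration_sets(brands):
    """Return every non-empty subset of `brands` as a tuple.

    These are the "crisp" sets a universal-set approach would have to
    enumerate; the helper is illustrative, not taken from a cited model.
    """
    return [
        subset
        for size in range(1, len(brands) + 1)
        for subset in combinations(brands, size)
    ]


# Hypothetical three-brand market.
for subset in crisp_consideration_sets(["A", "B", "C"]):
    print(subset)

# Growth of the number of crisp sets versus the number of brands.
for n in (3, 7, 10):
    n_sets = len(crisp_consideration_sets(list(range(n))))
    print(n, "brands:", n_sets, "crisp sets")
```

For a seven-alternative category this already gives 127 candidate sets, each needing its own probability, whereas a brand-level formulation needs only seven inclusion probabilities.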
One distinct advantage of the fuzzy set approach is that it can easily accommodate many possible consideration sets. Consider an $n$-brand market. Let $\delta_i$ indicate whether brand $i$ is, or is not, included in the consideration set. The UNSA permutes the $\delta_i$ while restricting its value to either 1 or 0, and thus the total number of possible consideration sets, excluding the empty set, is $2^n - 1$. On the other hand, the fuzzy set approach allows the value of $\delta_i$ to lie between 0 and 1, defining infinitely many consideration sets in a single fuzzy set. This approach requires the use of $n$ brand-specific inclusion probabilities as opposed to the $2^n - 1$ set-specific probabilities in the UNSA. In addition, the fuzzy set approach is consistent with the behavioral reality that consumers themselves may be unable to disclose the contents of their consideration sets with certainty (Shocker et al. 1991).

B&V (1996) model consideration via a fuzzy set approach. They use a brand-specific binary logit expression (based on a linear additive utility function) to model its inclusion probability, which results in a compensatory model of consideration set formation. In contrast, we hypothesize a non-compensatory consideration process and, therefore, use a Multiplicative Competitive Interaction (MCI) model to specify inclusion probabilities. The two approaches represent two different conceptualizations of the consideration set formation process, which we compare later on both theoretical and empirical criteria.

3.1.2 Advertising, Price Sensitivity and Consideration Sets

There are two conflicting views on the effect of advertising on price sensitivity. The "Market Power" hypothesis contends that advertising decreases price sensitivity by creating brand loyalty, reducing competition and thus building market power (Comanor and Wilson 1979).
The alternative, "Information", hypothesis is that advertising provides information which makes consumers aware of more alternatives and thus become more price-sensitive (Nelson 1974).

Numerous empirical studies that have examined this relationship have produced contradictory findings. Lambin (1976) regressed the absolute value of price elasticities on several measures of advertising intensity and found a significant inverse relationship between ad expenditure and price elasticity, consistent with the market power hypothesis. In contrast, Wittink (1977) found that territories with higher advertising levels were more price-sensitive, consistent with the information hypothesis. Prasad and Ring (1976) investigated the interaction effects of advertising and price on market shares at two controlled levels of aggregate TV advertising exposure and found a positive relationship between advertising and price sensitivity. Krishnamurthi and Raj (1985) compared sales in a test period with high advertising to a pretest period with normal advertising. Both control and experimental panels were classified into high and low price-sensitive groups. The high price-sensitive group in the experimental panel was found to become less price-sensitive, supporting the market power hypothesis. Kanetkar, Weinberg and Weiss (1992), using scanner panel data, found that increased exposure to advertising was associated with an increase in price sensitivity, supporting the information hypothesis. In contrast, Papatla (1995), also using scanner panel data, showed that advertising reduced price sensitivity.

Several studies have attempted to resolve the conflicts in previous empirical findings.
Boulding, Lee and Staelin (1994) speculate that the content of advertising might provide a potential explanation. For example, price-oriented advertising by retailers may increase price elasticity, and image-oriented brand advertising by manufacturers may decrease price elasticity. Gatignon (1984) proposes that competitive reaction to own brand's advertising is an important moderator of its effect on price sensitivity. Therefore, even if the direct effect of brand advertising is to build market power, it may trigger competitive counter-advertising (competitive increase in advertising expenditures) that may make the market become more aware of alternatives and, hence, more price-sensitive.

Mitra and Lynch (1995) highlight the role that consideration sets may play in moderating the impact of advertising on price elasticity. In their experimental study, they hypothesize the following process at work. First, advertising increases price elasticity by expanding consideration set size ("Information"). Second, advertising decreases price elasticity by increasing relative strength of preference ("Market Power"). Third, increased relative strength of preference may also reduce consideration set size, decreasing price elasticity ("Market Power"). The net effect of advertising on price elasticity, therefore, depends on the combined impact of these processes. Two pretests establish the hypothesized relationship between advertising and consideration set size and that between advertising and relative strengths of preference. The main experiment varies the relative strength of each relationship via different choice environments (memory-based and stimulus-based) and advertising conditions (reminder vs. differentiating advertising), and shows that the direction of the effect of advertising on price elasticity depends on the strength of these relationships. Thus, their framework provides a unifying explanation of advertising effects on price elasticity by explicitly accounting for the moderating role of the consideration set.
What follows from their conceptual model is that the net observed effect of advertising may support either the "Market Power" or the "Information" theory, but not both, if the separate effects on consideration set formation are ignored.

In general, previous research in this area is limited in several ways. First, as is often the case in econometric studies, most of the studies have been performed at various levels of aggregation without apparent control for possible aggregation bias (Gatignon 1984, Kanetkar et al. 1992, and Papatla 1995 are exceptions). Second, advertising expenditures were most often used as a proxy measure for advertising exposure (Kanetkar et al. 1992 and Papatla 1995 are exceptions). More importantly, these studies did not account for the consideration set that is hypothesized to moderate the effect of advertising (Mitra and Lynch 1995).

While the results reported in Mitra and Lynch (1995) supported their hypotheses, their experimental study was limited in two important ways. First, to establish the direction of causality, they compared the case in which all brands advertised to the case in which none of the brands engaged in advertising. Specifically, all subjects in the experimental group are assumed to be exposed to the same level of advertising. Thus, household heterogeneity in advertising exposure levels is not controlled for in the analysis. Consequently, as Popkowski-Leszczyc and Rao (1989) suggest, household differences in advertising susceptibility may lead to biased estimates. (This may also be true of other previous studies that have not accounted for heterogeneity in consumer responses to advertising.) Second, to control for prior brand preference (to minimize the effects of prior preference for the brand and to maximize the power to detect the effect of advertising), unfamiliar brands were used in the experiment.
Therefore, increased advertising for all brands could only increase the consideration set size. Our current study seeks to generalize their findings using a more natural, competitive field setting that overcomes the above limitations of their experiment.

3.2 Proposed Fuzzy Set Model

The consideration set evolves from the awareness set and "can be viewed as consisting of alternatives salient or accessible on a particular occasion" (Shocker et al. 1991, p.183). Some brands may be more accessible from memory and, therefore, included in the consumer's consideration set, e.g., brands that the consumer has purchased in the past or brands for which the consumer has recently been exposed to brand advertising. This initial consideration set may change as a result of in-store stimuli such as displays or point-of-purchase (POP) materials. In summary, the consideration set is a dynamic construct that varies across consumers, within consumers across purchase occasions, and even within a given purchase occasion (Nedungadi 1990, Shocker et al. 1991). This aspect guides the choice of variables that influence consideration.

Another important aspect of consideration set formation reflected in our modeling approach is its non-compensatory nature (Lehmann and Pan 1994, Shocker et al. 1991). Consumers are assumed to screen out brands in a non-compensatory manner to form the consideration set, and then to evaluate only a subset of brands in a compensatory manner. We implement a fuzzy version of the non-compensatory rule in which every brand has a non-zero probability of being included in the consideration set.

3.2.1 Modeling Approach

We assume that overall brand utility is affected by variables that influence brand evaluation and the consideration set. Let $X^h_{it}$ and $Y^h_{it}$ be vectors of variables that affect brand evaluation and consideration of household $h$ for brand $i$ on purchase occasion $t$.
The intersection of the two vectors may be non-empty. For example, brand advertising may influence the consideration set, by making the brand more accessible in memory, and also brand evaluation, by distinguishing it from other brands in the consideration set.

We begin by specifying the deterministic component of the overall utility, $OU^h_{it}$, as follows:

$$OU^h_{it} = \sum_l \beta_l f_x(x^h_{ilt}) + \sum_m \gamma_m f_y(y^h_{imt}), \tag{3.1}$$

where $\beta_l$ and $\gamma_m$ are parameters for brand evaluation and consideration variables, respectively, and $f_x$ and $f_y$ denote functional relationships between each set of variables and the overall utility. This is a general formulation and, in order to develop the model further, we must choose appropriate functional forms for $f_x$ and $f_y$.

Impact of $X^h_{it}$

For the moment, let us assume that household $h$ considers all available brands and starts evaluating brands at the point of purchase. Further, ignoring $Y^h_{it}$ and using an identity transformation function for $f_x$, we can specify the utility function as:[1]

$$U^h_{it} = \sum_l \beta_l x^h_{ilt}, \tag{3.2}$$

where $U^h_{it}$ denotes the deterministic component of the conventional utility function. This represents the typical multinomial logit (MNL) brand choice model in which it is assumed that all brands are considered and evaluated in a compensatory manner. Since exact utilities are unknown, we can also add a random error term, $\epsilon_{it}$. Then the probability of brand $i$ being chosen is given by:

$$P^h_t(i) = p\left[U^h_{it} + \epsilon_{it} > U^h_{kt} + \epsilon_{kt}\ (k \in K, k \neq i)\right] = p\left[\epsilon_{kt} < \epsilon_{it} + U^h_{it} - U^h_{kt}\ (k \in K, k \neq i)\right]. \tag{3.3}$$

[1] We may either omit $Y^h_{it}$ from the model or include it in $X^h_{it}$, assuming that all variables share the same functional relationship, $f_x$, to the utility.
Without making any assumption about the distribution of the error terms, McFadden (1974) shows that Equation (3.3) may be written as:

$$P_t^h(i) = \int_{x=-\infty}^{\infty} g(e_{it}=x) \prod_{k \neq i} \int_{y=-\infty}^{x+U_{it}^h-U_{kt}^h} g(e_{kt}=y)\,dy\,dx, \tag{3.4}$$

where $g(\cdot)$ represents the probability density function of the error. However, in general, consumers do not evaluate all K alternatives but instead only evaluate a subset, C, of them where $C \subseteq K$ and, from either an analyst's or a consumer's point of view, there is uncertainty about the composition of C. Therefore, the set C can be considered to be "fuzzy" (Fotheringham 1988). This uncertainty can be incorporated by rewriting Equation (3.3) as:

$$P_t^h(i) = p[pr_t^h(i \in C)(U_{it}^h + e_{it}) > pr_t^h(k \in C)(U_{kt}^h + e_{kt})\ (k \in K, k \neq i)], \tag{3.5}$$

where the utility for brand i is rescaled by $pr_t^h(i \in C)$, the probability of brand i being included in the consideration set C. Assuming that $e_{it}$ is independently and identically distributed with a Type I extreme value distribution but making no assumption about the values of the $pr_t^h(i \in C)$ terms (Fotheringham 1988), Equation (3.5) can be expressed as follows:

$$\begin{aligned}
P_t^h(i) &= pr_t^h(i \in C) \int_{x=-\infty}^{\infty} g(e_{it}=x) \prod_{k \neq i}^{K} pr_t^h(k \in C) \int_{y=-\infty}^{x+U_{it}^h-U_{kt}^h} g(e_{kt}=y)\,dy\,dx \\
&= \int_{x=-\infty}^{\infty} \exp(-x) \prod_{k}^{K} pr_t^h(k \in C) \exp[-\exp(-x - U_{it}^h + U_{kt}^h)]\,dx \\
&= \frac{pr_t^h(i \in C)\exp(U_{it}^h)}{\sum_k pr_t^h(k \in C)\exp(U_{kt}^h)}.
\end{aligned} \tag{3.6}$$

For notational simplicity and ease of understanding, we term $pr_t^h(i \in C)$ the inclusion probability, $IP_{it}^h$. In the subsequent discussion, we provide a specific functional form for $IP_{it}^h$.

Impact of $Y_{it}^h$

We adopt a fuzzy version of the non-compensatory rule in which every brand has a non-zero probability of being included in the consideration set. We specify the inclusion probability, $IP_{it}^h$, as a function of $Y_{it}^h$, using a Multiplicative Competitive Interaction (MCI) functional form.
Thus, we have:

$$IP_{it}^h = \frac{\prod_m (y_{imt}^h)^{\gamma_m}}{\sum_k \prod_m (y_{kmt}^h)^{\gamma_m}}. \tag{3.7}$$

Using an MCI form for $IP_{it}^h$ is equivalent to choosing $f_y$ different from $f_x$ in the overall utility expression in Equation (3.1). The following derivation makes this point clear:

$$\begin{aligned}
P_t^h(i) = \frac{IP_{it}^h \exp(U_{it}^h)}{\sum_k IP_{kt}^h \exp(U_{kt}^h)}
&= \frac{\prod_m (y_{imt}^h)^{\gamma_m} \exp(U_{it}^h)}{\sum_k \prod_m (y_{kmt}^h)^{\gamma_m} \exp(U_{kt}^h)} \\
&= \frac{\exp[\ln(\prod_m (y_{imt}^h)^{\gamma_m})] \exp(U_{it}^h)}{\sum_k \exp[\ln(\prod_m (y_{kmt}^h)^{\gamma_m})] \exp(U_{kt}^h)} \\
&= \frac{\exp[\sum_m \ln (y_{imt}^h)^{\gamma_m}] \exp(U_{it}^h)}{\sum_k \exp[\sum_m \ln (y_{kmt}^h)^{\gamma_m}] \exp(U_{kt}^h)} \\
&= \frac{\exp[U_{it}^h + \sum_m \gamma_m \ln(y_{imt}^h)]}{\sum_k \exp[U_{kt}^h + \sum_m \gamma_m \ln(y_{kmt}^h)]} \\
&= \frac{\exp[\sum_l \beta_l x_{ilt}^h + \sum_m \gamma_m \ln(y_{imt}^h)]}{\sum_k \exp[\sum_l \beta_l x_{klt}^h + \sum_m \gamma_m \ln(y_{kmt}^h)]}.
\end{aligned} \tag{3.8}$$

In other words, specifying an MCI form for $IP_{it}^h$ is, in effect, equivalent to using log-transformed $Y_{it}^h$ in the overall utility function as opposed to the identity transformation used for $X_{it}^h$. By allowing the functional forms to differ (Roberts and Lattin 1997), the analyst can capture the non-compensatory nature of consideration set formation while allowing brand evaluations to be compensatory (e.g., Bettman 1979; Gensch 1987). In addition, this feature of model formulation makes it possible to identify separately[2] the effects of the same variable used in consideration and brand evaluation, i.e., when the intersection of $X_{it}^h$ and $Y_{it}^h$ is non-empty. (This feature distinguishes our model from the B&V model, which we will discuss later.) In effect, the log-transformation helps screen out brands that are unsatisfactory on any one attribute ($y_{imt}^h$) of $Y_{it}^h$. To illustrate this point, we present functional shapes that reflect a compensatory process, a fuzzy version and a crisp version of a non-compensatory process,[3] respectively, in Figure 3.1.

[2] In practice, however, this formulation may suffer from a multicollinearity problem, as reported in Andrews and Srinivasan (1995).
The fuzzy non-compensatory line (log function) shows that the utility begins to fall off rapidly for alternatives with $y_{imt}^h$ close to zero. This happens when, for example, brand preference in $Y_{it}^h$ (conventionally captured via a brand loyalty measure that lies between 0 and 1) comes close to zero, resulting in a predicted choice probability close to zero. The utility for a brand with negligible preference cannot be compensated for by other attributes. In contrast, the compensatory line reflects that the utility can be compensated for by other attributes even if $y_{imt}^h$ comes to zero. As compared to the fuzzy non-compensatory line,[4] the crisp non-compensatory line describes a kinked utility function that allows for an immediate fall-off for $y_{imt}^h$ less than a certain threshold (Elrod et al. 1992).

[3] The non-compensatory process adopted in the current study is based on the conjunctive decision rule described in Elrod et al. (1992). Unlike the lexicographic-type Elimination-By-Aspects model (Tversky 1972) that screens alternatives on an attribute-by-attribute basis in some hierarchical sequence, our conjunctive decision rule does not assume a sequential consideration of attributes but a parallel consideration of attributes along with attribute weights. Therefore, among a predetermined set of consideration attributes, if any one attribute level of a brand comes close to zero, that brand is likely to be screened out.

[4] In effect, uncertainty about the composition of the consideration set arises from fuzziness of the threshold level, and is represented by the smoothness of the utility function implied by a fuzzy version of the non-compensatory process.
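The equivalence in Equation (3.8) can be checked numerically. The following is a minimal Python sketch, not the estimation code used in this study: the function names, the three-brand example, and all parameter values are hypothetical and chosen only for illustration. It computes choice probabilities once from MCI inclusion probabilities (Equations 3.6 and 3.7) and once from a logit on log-transformed consideration variables (last line of Equation 3.8).

```python
import math

def mci_inclusion(y, gamma):
    """Eq. (3.7): MCI inclusion probabilities.
    y[k][m] > 0 is consideration variable m of brand k; gamma[m] is its exponent."""
    scores = [math.prod(v ** g for v, g in zip(row, gamma)) for row in y]
    total = sum(scores)
    return [s / total for s in scores]

def fuzzy_choice(u, ip):
    """Eq. (3.6): choice probabilities from utilities u and inclusion probabilities ip."""
    weights = [p * math.exp(v) for p, v in zip(ip, u)]
    total = sum(weights)
    return [w / total for w in weights]

def log_logit_choice(u, y, gamma):
    """Last line of Eq. (3.8): logit on U + sum_m gamma_m * ln(y_m)."""
    ou = [v + sum(g * math.log(x) for g, x in zip(gamma, row))
          for v, row in zip(u, y)]
    top = max(ou)  # subtract the maximum for numerical stability
    weights = [math.exp(o - top) for o in ou]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical illustration: three brands, two consideration variables
# (a preference share and a display indicator coded 0.999 / 0.001).
u = [0.5, 0.2, 0.9]                      # evaluation utilities (assumed values)
y = [[0.60, 0.999], [0.35, 0.001], [0.05, 0.001]]
gamma = [1.0, 0.3]                       # assumed exponents

p_fuzzy = fuzzy_choice(u, mci_inclusion(y, gamma))
p_logit = log_logit_choice(u, y, gamma)

# Screening: push brand 3's preference toward zero and recompute.
y_screened = [row[:] for row in y]
y_screened[2][0] = 1e-6
p_screened = fuzzy_choice(u, mci_inclusion(y_screened, gamma))

print(p_fuzzy)
print(p_logit)
print(p_screened)
```

Under these assumed values the two routes agree to floating-point precision, and shrinking one consideration attribute of a brand toward zero drives its choice probability toward zero even though that brand has the highest evaluation utility, which is the screening behavior described above.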
Comparison to B&V (1996)

Bronnenberg and Vanhonacker [B&V] (1996) define the inclusion probability for each brand using an independent binary logit expression as follows:

$$IP_{it}^h = \frac{\exp(\gamma Y_{it}^h)}{\exp(\gamma Y_{it}^h) + \exp(\lambda_t)}, \tag{3.9}$$

where $\lambda_t$ is a threshold (to be estimated) against which inclusion utility, $\gamma Y_{it}^h$, is compared. To compare the two approaches, we use Equation (3.9) to substitute for $IP_{it}^h$ in Equation (3.8) as follows:

$$\begin{aligned}
\frac{IP_{it}^h \exp(U_{it}^h)}{\sum_k IP_{kt}^h \exp(U_{kt}^h)}
&= \frac{\exp(\gamma Y_{it}^h)[\exp(\gamma Y_{it}^h) + \exp(\lambda_t)]^{-1} \exp(U_{it}^h)}{\sum_k \exp(\gamma Y_{kt}^h)[\exp(\gamma Y_{kt}^h) + \exp(\lambda_t)]^{-1} \exp(U_{kt}^h)} \\
&= \frac{\exp[U_{it}^h + \gamma Y_{it}^h - \ln[\exp(\gamma Y_{it}^h) + \exp(\lambda_t)]]}{\sum_k \exp[U_{kt}^h + \gamma Y_{kt}^h - \ln[\exp(\gamma Y_{kt}^h) + \exp(\lambda_t)]]} \\
&= \frac{\exp[\sum_l \beta_l x_{ilt}^h + \sum_m \gamma_m y_{imt}^h - \ln[\exp(\sum_m \gamma_m y_{imt}^h) + \exp(\lambda_t)]]}{\sum_k \exp[\sum_l \beta_l x_{klt}^h + \sum_m \gamma_m y_{kmt}^h - \ln[\exp(\sum_m \gamma_m y_{kmt}^h) + \exp(\lambda_t)]]}.
\end{aligned} \tag{3.10}$$

Therefore, in contrast to our model, in B&V (1996), the $Y_{it}^h$ appear in the overall utility function after an identity transformation. This implies that in the B&V model, both brand evaluation and consideration variables have a compensatory influence on overall utility and, unlike in our model, explicit screening based on $\{y_{imt}^h\}$ does not take place. Note that, in the B&V model, due to identification problems, the same variable cannot influence both brand evaluation and consideration. In contrast, our model permits us to separately identify the impact of a variable on both stages.

3.3 Advertising, Price Sensitivity and Consideration Sets

In generalizing Mitra and Lynch (1995) in the current study, we focus on the direct effect[5] of advertising on consideration set size. Unlike their experimental study, which hypothesizes that advertising increases consideration set size (which should be true when all unfamiliar brands advertise) and therefore increases price sensitivity, the current study hypothesizes the following process at work.
Advertising can either increase or decrease price sensitivity through its influence on the consideration set. First, brand advertising can increase the inclusion probability for a brand. Second, consideration set size can either increase or decrease as a result of advertising. For example, if, in response to advertising, the distribution of inclusion probabilities across brands becomes more even (skewed), then consideration set size increases (decreases). Third, as the consideration set size increases (decreases), the consumer perceives brands as closer (less close) substitutes, becoming more (less) price-sensitive. This indirect relationship between advertising and price sensitivity is presented in Figure 3.2.

We examine the first link by specifying the inclusion probability as a function of advertising and the second link by measuring consideration set size via the entropy of inclusion probabilities.

[5] Mitra and Lynch (1995) also investigate the indirect effect of advertising on consideration set size through its influence on the relative strength of brand preference. Since we do not model brand preference as a function of advertising, but infer it from revealed choice history, we focus on the direct effect of advertising on consideration set size.

[6] The information index is defined as $1 - \frac{-\sum_{k=1}^{K} \pi_k \ln \pi_k}{\ln K}$, where $\pi_k$ is the estimated choice probability, $\ln K$ is the maximum entropy and K is the number of brands. The information index is negatively related to set size.

Gensch and Soofi (1994) illustrate a mapping of an information index[6] to consideration set size by using linear interpolation. The logic is that in the absence of attribute information, predicted choice probabilities are uniform across brands (all brands are considered), resulting in maximum entropy (maximum uncertainty), $\ln(K)$, where K is the number of brands. If attribute information is available, then predicted
choice probabilities are pulled away from uniformity, resulting in reduced entropy, which reflects the reduction of the number of choices in a decision problem from K to less than K. Following this logic, we use the ratio of the entropy of inclusion probabilities to the maximum entropy as a measure of consideration set size since it provides more complete, continuous information as opposed to a discrete approximation of set size. In addition, a continuous measure of set size is consistent with our view that the composition of the consideration set is fuzzy rather than crisp. The relative entropy for household h on occasion t is calculated as follows:

$$ENT_t^h = \frac{-\sum_{k=1}^{K} IP_{kt}^h \ln(IP_{kt}^h)}{\ln(K)}, \tag{3.11}$$

where $ENT_t^h$ is the relative entropy for household h on occasion t and K is the number of brands. The relative entropy[7] takes the maximum value of 1 when all inclusion probabilities are equal (consideration set size of K) and the minimum value of 0 when any one brand has an inclusion probability of one (consideration set size of one) and all other brands have inclusion probabilities of zero. Therefore, the relative entropy is positively related to consideration set size.

To investigate the last link, we use a varying parameter approach (Gatignon and Hanssens 1987). This approach allows us to examine the relationship between consideration set size and price sensitivity by reparameterizing the price parameter as a function of the relative entropy. In doing so, it is critical to account for differences in inherent price sensitivity to capture the correct relationship, since consideration set size does not explain inherent price sensitivity. For example, consumers with the same consideration set size may show quite different price sensitivities depending on which brands are usually in the consideration set.
If low price-tier brands are in the set, price sensitivity will be high, and vice versa. Therefore, we assume that price sensitivity has two components, inherent (time-invariant) price sensitivity and inter-temporal price sensitivity. In a regression context, differences in inherent price sensitivity are captured by separate intercepts across segments. Beyond inherent price sensitivity, inter-temporal variations in price sensitivity are captured by the relative entropy (consideration set size).

[7] Alternatively, we could use the entropy of inclusion probabilities to derive a consideration set size by solving for S in the equation $\ln(S) = -\sum_k IP_{kt}^h \ln(IP_{kt}^h)$, where S lies between 1 and K. We found that this alternative measure (S) provided an equivalent fit in the empirical application.

3.4 Empirical Application

3.4.1 Estimation

For notational simplicity, so far, we have developed the fuzzy set model in a single-segment framework in which consumers are assumed to share common response parameters. In estimating the model, we employ a latent class approach to account for response heterogeneity in both stages (Bucklin and Gupta 1992, Kamakura and Russell 1989). Therefore, all parameters are allowed to vary across segments. This is particularly important in the current study since, in the absence of heterogeneity correction, response parameters for inclusion probabilities may be inflated[8] by erroneously capturing unaccounted-for preference and response heterogeneity in brand choice.
In addition, a hypothesized relationship between advertising and price sensitivity, in particular, should be investigated after controlling for price response heterogeneity (Krishnamurthi and Raj 1985, Papatla 1995) and heterogeneity in advertising susceptibility (Popkowski-Leszczyc and Rao 1989).

[8] A homogeneous logit choice model is reported to overestimate the impact of household-specific loyalty measures since they capture unaccounted-for heterogeneity (e.g., Siddarth et al. 1995).

3.4.2 Data

The data for this study are drawn from A.C. Nielsen scanner panel records, in the yogurt and ketchup categories, for households in Sioux Falls, South Dakota, for the period 1986-1988. The last 51 weeks of the data were used for model calibration and the preceding 61 weeks were used for initializing model variables. Households qualified for inclusion in the sample if they made at least one grocery purchase every four weeks over the entire study period and made at least one purchase in both the initialization and calibration periods. 599 and 592 households qualified on this basis for the yogurt and ketchup categories, respectively.

A technical problem that arises with single-source data is missing information in the household commercial exposure data due to breaks in the connection between household TVs and telemeters. Since some of the ads that households are exposed to may go undetected, the observed ad exposure rates in the database may underestimate the true household exposure rates. Following the screening procedure of Pedrick and Zufryden (1991), we included only those purchase records of the qualified households for which the telemeter was working for over 21 out of the 28 days prior to each occasion. Thus, in the yogurt category, out of the total of 5,813 purchase records, 4,446 purchase records were included in the study.
Out of the qualified households, 400 were randomly selected for model calibration (3,053 purchases) and 199 for model validation (1,393 purchases). In the ketchup category, we chose not to include ad exposure data because only two brands out of seven advertised on TV during the study period. Out of the qualified households, 400 were randomly selected for model calibration (2,476 purchases) and 192 for model validation (1,140 purchases).

The top seven selling brands are included in the study. Together, these accounted for 84 (85) percent of category sales in dollars, and 89 (87) percent in units for the yogurt and ketchup categories, respectively.

3.4.3 Model Variables

We now specify the model variables that influence brand evaluation and consideration.

Brand Evaluation

We assume that consumers choose a brand by trading off between price and quality among actively considered brands. We interpret the brand-specific intercepts in the utility function as quality scores that can be decomposed into brand attributes and weights for those attributes (Elrod and Keane 1995). Therefore, the deterministic component of utility for each brand i for household h in segment s is specified as a function of a brand-specific constant and price (e.g., Elrod and Keane 1995, B&V 1996 and Erdem 1996):

$$U_{it}^h = \beta_{s0i} + \beta_{s1} PRICE_{it}, \tag{3.12}$$

where $PRICE_{it}$ = actual shelf price of brand i at time t.

Consideration Set Formation

In the absence of self-reported consideration set information, we may infer the brands which consumers consider based on their brand preference as measured by their prior experience with the brand (Nedungadi 1990, Andrews and Srinivasan 1995, Siddarth et al. 1995, Chiang et al. 1998).
In a similar spirit, B&V (1996) propose that inclusion probabilities are related, either positively (reinforcing) or negatively (variety-seeking), to recency of purchase.[9] This variable assumes the value 1 for the brand last purchased and exponentially slopes downward for other brands.

[9] When choice behavior is of the reinforcing type, recently purchased brands are more likely to be salient to consumers. In contrast, when choice behavior is of the variety-seeking type, recently purchased brands are less likely to be salient to consumers (B&V 1996).

The consideration set is dynamic and can change until a time immediately prior to brand choice (Shocker et al. 1991). Therefore, our model includes variables that capture brand preference, recency, TV ads, feature, and display activity. Marketing communication variables, such as TV ads and feature ads, may provide cues to aid retrieval of brands from memory for inclusion into the consideration set. Moreover, after a consumer is in the store, he/she may consider additional brands based on their display (e.g., Nedungadi 1990, Andrews and Srinivasan 1995, B&V 1996).

In addition, we propose to test whether there is an interaction effect of brand preference and TV advertising on consideration. Tellis (1988b) has shown that the impact of advertising on the probability of brand choice is higher for brands with higher preference. An underlying assumption of his study is that advertising affects brand choice by directly increasing brand utility and, hence, the probability of brand choice. In contrast, the well-known Hierarchy of Effects model (Lavidge and Steiner 1961) predicts that advertising will have a greater impact at a lower level of the hierarchy, i.e., awareness, knowledge, liking, and preference.
In the current study, unlike Tellis (1988b), we investigate the interaction effects of advertising and brand preference on consideration set formation.[10] It may be argued that advertising for the most preferred brand might not increase the inclusion probability since the brand is most likely to be already included in the consideration set, i.e., a ceiling effect. On the other hand, advertising may have a substantial impact on consumers who currently have a low inclusion probability for a brand. Ultimately, whether the impact of advertising on inclusion probability (consideration set formation) is higher for low preference (low market share) brands or for high preference (high market share) brands is an empirical question which can be tested with data.

[10] Even though the consideration stage is not clearly defined in the Hierarchy of Effects model, it can be reasonably assumed that the consideration stage corresponds to a lower level of the hierarchy that precedes the action stage (brand choice).

In summary, we specify the inclusion probability as a function of preference, recency of choice, TV ads, interaction between brand preference and TV ads, feature ads, and display:

$$IP_{it}^h = \frac{(AD_{it}^h)^{\gamma_{s0} + \gamma_{s1}\ln(\pi_{it}^h)} (\pi_{it}^h)^{\gamma_{s2}} (REC_{it}^h)^{\gamma_{s3}} (FEAT_{it})^{\gamma_{s4}} (DISP_{it})^{\gamma_{s5}}}{\sum_k (AD_{kt}^h)^{\gamma_{s0} + \gamma_{s1}\ln(\pi_{kt}^h)} (\pi_{kt}^h)^{\gamma_{s2}} (REC_{kt}^h)^{\gamma_{s3}} (FEAT_{kt})^{\gamma_{s4}} (DISP_{kt})^{\gamma_{s5}}}, \tag{3.13}$$

where
$AD_{it}^h$ = brand i's share of advertising exposures received by household h during the 4-week period prior to occasion t,
$\pi_{it}^h$ = household h's preference for brand i on occasion t,
$REC_{it}^h$ = recency of choice: 1 if i was the last brand purchased; $\alpha REC_{i,t-1}^h$ otherwise, where $\alpha$ is a smoothing constant,
$FEAT_{it}$ = 1 if brand i is featured; 0[11] otherwise, and
$DISP_{it}$ = 1 if brand i is displayed; 0 otherwise.

The effect of achieving a certain share-of-voice is captured by the exponent $(\gamma_{s0} + \gamma_{s1}\ln(\pi_{it}^h)) > 0$.
Since $0 < \pi_{it}^h < 1$, a positive value for $\gamma_{s1}$ implies that a low-preference brand will have a smaller exponent value than a high-preference brand. Therefore, since $AD_{it}^h$ is a share measure, the same level of advertising will have a greater impact on inclusion probabilities for a low-preference brand than for a high-preference one. In other words, a positive estimate of $\gamma_{s1}$ implies a negative interaction between brand advertising and preference.

Inferring brand preference: We infer brand preference from the choice history of households by assuming that the prior distribution of brand preference across households can be described by the Dirichlet distribution. Maximum likelihood estimates of the Dirichlet parameters are obtained from purchases observed in an initialization period. Knowing these parameters allows us to obtain a household's dynamically updated posterior brand preference ($\pi_{it}^h$) in a Bayesian fashion (see Siddarth et al. 1995 for details), incorporating both sample-wide priors (represented by the Dirichlet parameters) and household-specific purchase history (represented by the total number of purchases of each brand from the initialization period up to the last purchase occasion).

[11] One generic limitation of estimating the MCI model is that a variable should not take the value of 0. One way of circumventing this problem, as suggested by Cooper and Nakanishi (1988), is adding a small number (0.001) to the variable across brands and then rescaling the new value by the sum of the new values across brands. We apply this correction to our advertising share measure. For display and feature variables, we use 0.999 and 0.001 instead of 1 and 0. We also used the zeta-scores (Cooper and Nakanishi 1988) and found similar results.

3.4.4 Models Estimated

We estimate the proposed model along with several competing models described below.
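Before turning to the competing models, the inclusion probability of Equation (3.13), the zero correction attributed above to Cooper and Nakanishi (1988), and the relative entropy of Equation (3.11) can be sketched in Python. This is an illustrative sketch only, not the code used for estimation: the function names, the three-brand inputs, the smoothing constant of 0.5, and every parameter value in `g` are hypothetical assumptions.

```python
import math

def smooth_shares(shares, eps=0.001):
    """Zero correction: add eps to each brand's share, then rescale to sum to 1."""
    shifted = [s + eps for s in shares]
    total = sum(shifted)
    return [s / total for s in shifted]

def recode_indicator(flags):
    """Feature/display indicators recoded as 0.999 / 0.001 instead of 1 / 0."""
    return [0.999 if f else 0.001 for f in flags]

def inclusion_probabilities(ad, pref, rec, feat, disp, g):
    """Eq. (3.13) for one household and occasion; g = (g0, ..., g5) for one segment."""
    scores = []
    for a, p, r, f, d in zip(ad, pref, rec, feat, disp):
        ad_exponent = g[0] + g[1] * math.log(p)  # share-of-voice exponent
        scores.append(a ** ad_exponent * p ** g[2] * r ** g[3]
                      * f ** g[4] * d ** g[5])
    total = sum(scores)
    return [s / total for s in scores]

def relative_entropy(ip):
    """Eq. (3.11): entropy of inclusion probabilities divided by ln(K)."""
    h = -sum(p * math.log(p) for p in ip if p > 0)
    return h / math.log(len(ip))

# Hypothetical inputs for three brands.
ad = smooth_shares([2 / 3, 0.0, 1 / 3])       # raw ad exposure shares, one is zero
pref = [0.6, 0.3, 0.1]                        # assumed posterior preferences
rec = [1.0, 0.5, 0.25]                        # assumed recency with alpha = 0.5
feat = recode_indicator([False, False, True])
disp = recode_indicator([False, True, False])
g = (0.4, 0.1, 1.0, 0.5, 0.1, 0.2)            # assumed segment parameters

ip = inclusion_probabilities(ad, pref, rec, feat, disp, g)
ent = relative_entropy(ip)
print(ip, ent)
```

Because the inclusion probabilities in this form sum to one across brands, the relative entropy lies between 0 and 1, which is what allows it to be read as a continuous measure of consideration set size.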
MNL Model

This is the "plain" multi-segment logit choice model that accounts for preference and response heterogeneity but not for consideration sets, and assumes that consumers evaluate available brands in a compensatory fashion. The deterministic component of the MNL utility function for household h in segment s is given by:

[equation missing in source] (3.14)

where all variables are as defined above in Equation (3.13).

Dynamic Bayes (DB) Model

For comparison purposes, we also estimate the Dynamic Bayes (DB) model proposed by Siddarth et al. (1995). The DB model represents the restricted set approach to determining consideration sets, and employs a "crisp" version of a conjunctive rule. Andrews and Manrai (1995) suggest that the DB model performs better than previous consideration set models that have appeared in the marketing literature. The DB model specifies brand choice probability as a mixture of two component probabilities as follows:

$$P_t^h(i) = \lambda\, pc_{it}^h + (1 - \lambda)\, pf_{it}^h, \tag{3.15}$$

where $pf_{it}^h$ is the probability of choosing brand i from all available brands, and $pc_{it}^h$ is the probability of choosing brand i from the restricted choice set. The restricted choice set is determined by comparing dynamically updated brand preference ($\pi_{it}^h$) against a cross-sectional threshold level. The parameter $\lambda$ can be interpreted as the probability that a household chooses from the restricted choice set.

B&V Model

As an alternative fuzzy set approach to modeling consideration sets, we also estimate a model based on B&V (1996). The deterministic component of the utility function is given by Equation (3.12) and is the same as the one used in our proposed model.
However, the inclusion probability for each brand is given by a binary logit expression based on the following deterministic component of the consideration set utility function (termed the "salience function" in the B&V model):

$$\gamma_s Y_{it}^h = \gamma_{s0} AD_{it}^h + \gamma_{s1} AD_{it}^h * \pi_{it}^h + \gamma_{s2} \pi_{it}^h + \gamma_{s3} REC_{it}^h + \gamma_{s4} FEAT_{it} + \gamma_{s5} DISP_{it}. \tag{3.16}$$

Then, $IP_{it}^h$ is given by:

$$IP_{it}^h = \frac{\exp(\gamma_s Y_{it}^h)}{\exp(\gamma_s Y_{it}^h) + \exp(\lambda)}, \tag{3.17}$$

where $\lambda$ is a common threshold across brands, purchase occasions, and households, which is to be estimated. The more a brand's salience exceeds $\lambda$, the more likely the brand is to be included in the consideration set.

Proposed Models

We label the plain consideration set model as FZSET and the consideration set model with an entropy measure (price parameter as a function of relative entropy) as FZENT. The FZENT model subsumes the FZSET model and permits us to investigate the indirect effect of advertising on price sensitivity.

FZSET Model: We specify the overall utility ($OU_{it}^h$) function as follows:

$$\begin{aligned}
OU_{it}^h = {} & \beta_{s0i} + \beta_{s1} PRICE_{it} + \gamma_{s0}\ln(AD_{it}^h) + \gamma_{s1}\ln(AD_{it}^h) * \ln(\pi_{it}^h) + \gamma_{s2}\ln(\pi_{it}^h) \\
& + \gamma_{s3}\ln(REC_{it}^h) + \gamma_{s4}\ln(FEAT_{it}) + \gamma_{s5}\ln(DISP_{it}).
\end{aligned} \tag{3.18}$$

As previously discussed, variables that affect consideration enter the overall utility function after being log-transformed. The final choice probability is then given by the expression $\exp(OU_{it}^h) / \sum_k \exp(OU_{kt}^h)$, which is equivalent to Equation (3.8).

FZENT Model: We reparameterize $\beta_{s1}$, the price response parameter in Equation (3.18), as a function of relative entropy as follows:

$$\beta_{s1} = \theta_{s0} + \theta_1 ENT_t^h, \tag{3.19}$$

where $ENT_t^h$ is defined in Equation (3.11) and $\theta_{s0}$ is segment-specific. This varying parameter specification enables us to study the relationship between consideration set size and price sensitivity after accounting for inherent differences in price sensitivity.
$\theta_{s0}$ is a segment-specific price parameter that reflects inherent (time-invariant) price sensitivity and $\theta_1$ captures the impact of consideration set size on inter-temporal variations in price sensitivity. We expect $\theta_1$ to be negative, i.e., consideration set size to be positively related to price sensitivity. If this hypothesis is supported, it would imply that advertising affects inter-temporal price sensitivity through its influence on consideration set size.

3.5 Results and Discussion

For all the models, the two-segment solution provided the best overall fit based on the Bayesian Information Criterion (BIC). Fit statistics for this version of the models are provided in Table 3.1 (Table 3.2) for the yogurt (ketchup) category. Based on the BIC, the results show that all of the consideration set models outperform the MNL model (except for the B&V model in the ketchup category) and strongly support the notion that consumers simplify their choice decision by restricting consideration to a subset of available brands. Most importantly, the FZSET model outperforms the other consideration set models in both categories.

The superiority of the FZSET model over the DB model can be attributed to its utilization of continuous brand preference information to directly model inclusion probabilities, as opposed to the discretized brand preference information used by the DB model. It is interesting to find that both the FZSET and DB models provide better fit than the B&V model (an exception is the holdout likelihood for the B&V model in the yogurt category).
This supports the notion that consideration set formation is a non-compensatory process as described by the FZSET (a fuzzy version) and DB (a crisp version) models, rather than a compensatory process as represented by the B&V model. (Note that, in Equation (3.10), the B&V model appears to be algebraically equivalent[12] to the MNL model (compensatory process) except for the negative log term.) Based on these results, it seems that the FZSET model provides a better description of the choice processes than do other models.

Parameter estimates for the models are reported in Tables 3.3 and 3.4. In the yogurt category (Table 3.3), price parameters are all significant and correctly signed. Feature advertising is not estimated to have a significant effect in any of the models.[13]

[12] In the ketchup category, the B&V model turns out to be almost equivalent to the MNL model.

[13] Only two brands carried feature ads and the frequency is extremely low (58 times out of 3,053 purchase occasions).

The FZSET model is the only one in which both the main and interaction effects of advertising are significant in Segment 1. As previously discussed, a positive value for $\gamma_{s1}$ in the FZSET model indicates that the effect of advertising on inclusion probabilities depends on brand preferences. More specifically, advertising's impact is found to be higher for less preferred brands. In contrast, the B&V model results show a significant positive interaction effect (but an insignificant main effect of advertising) in Segment 1. Given the superior fit of the FZSET model in both the calibration and holdout samples, we argue that, for the yogurt category, the impact of advertising on inclusion probabilities is likely to be higher for less preferred brands than for more preferred brands. This effect holds for Segment 1, which accounts for more than 60% of the sample households, but not for Segment 2.
In the ketchup category (Table 3.4), we do not investigate the effect of advertising due to data limitations. All estimated parameters are correctly signed and reasonable in terms of face validity. It is noteworthy that the smoothing parameter $\alpha$ in the FZSET model is substantially lower than those in the comparison models, indicating that consumers give more weight to the brand last purchased than the comparison models predict. Due to differences in model formulations, the magnitudes of variables that affect consideration cannot be directly compared. (In the next section, we report an analysis of elasticities for a direct comparison.)

Fit statistics and parameter estimates for the FZENT model, estimated on the yogurt data, are presented in Table 3.5. The results show that the FZENT and FZSET models are basically equivalent, and do not provide evidence for a positive relationship between consideration set size and price sensitivity. In contrast to the experimental findings of Mitra and Lynch (1995), the hypothesized effects do not appear to carry over to this particular data set. Unlike their experimental study, in which unfamiliar brands are used, we study a well-established market (yogurt) which consists of familiar brands (no new brands). In this market, advertising may simply reinforce brand preference rather than change consideration set size, resulting in a smaller variation in set size than we would expect in a market that consists of all unfamiliar brands.

3.6 Managerial Implications

Dickson and Sawyer (1990) have shown that consumers spend only a limited amount of time at the point of purchase and that they may not examine all the prices of available brands in the category. In other words, consumers evaluate the prices of a subset of brands.
In contrast to this empirical finding, previous single-stage choice models have assumed that consumers evaluate the prices of all available brands. A well-known consequence of ignoring consideration set formation in modeling brand choice is that the impact of price may be biased as compared to the "true" impact which would be obtained if the model were specified only for actively considered alternatives (Meyer and Kahn 1991). Since consideration is a necessary condition for choice, it is critical to identify a brand's position in the consideration set in order to decide on the relative emphasis on marketing communication elements that affect consideration set formation vs. pricing. For example, if we can identify brands that have low inclusion probabilities, then, in the presence of price discounts across brands, such brands can benefit more by increasing their communication efforts since price cuts for brands not likely to be considered may now become noticed.

In Table 3.6, we present the average inclusion probabilities (based on the FZSET model) and market shares for the calibration sample. As shown in Siddarth et al. (1995), the rank-order of brands by market share may not be the same as the rank-order by inclusion probability in the current study. For example, Yoplait and Dannon rank second and fourth, respectively, in inclusion probability but rank third and fifth in market share. On the other hand, W.B.B and Weight Watcher rank third and fifth in inclusion probability but rank second and fourth in market share. This suggests a potentially different focus in target consumers and marketing effort across brands. Yoplait and Dannon might want to induce more purchases among current customers by providing financial incentives, while W.B.B and Weight Watcher might want to attract new customers by focusing on marketing communications.
In principle, any brand needs a baseline level of marketing communications, but some brands might need more so that their pricing can have an impact on market share. To illustrate this point, we conducted a simulation based on the parameter estimates obtained from the FZSET model. First, we simulated a 10% price cut and calculated market shares given the current level of advertising and display. Then, with the same price cut, we simulated (i) one additional TV ad exposure [14] prior to each purchase occasion and (ii) display in 50 randomly selected week-stores (out of 663 week-stores), and calculated market shares again in each case. Simulation results are shown in Table 3.7. Additional TV ads and additional displays each appear to amplify the impact of the price cut on share gains, and the share gains vary across brands due to differences in inclusion probabilities. The impact of additional displays on share gains is especially large for the two brands that have the lowest inclusion probabilities, QC and private label. This suggests that brands with low inclusion probabilities need some type of communication, e.g., displays; otherwise their price promotions may go unnoticed.

[14] Four brands are excluded from this simulation because they never or rarely advertised on TV during the study period; the simulated condition is therefore too different from what actually happened to support a reliable prediction based on the estimated parameters.

We repeated the simulation using the parameter estimates obtained from the MNL model. Simulation results are shown in Table 3.8. Compared with the share gains predicted by the FZSET model, the MNL model appears to underpredict share gains from improving pricing and communication mix elements, e.g., displays. [15] This is the expected result, since the MNL model dilutes the effect of the communication mix by assuming that all brands are considered, whereas the FZSET model sharpens the effect of the communication mix by treating consideration as a necessary condition for choice. The simulation results therefore illustrate the importance of incorporating the consideration set into a brand choice model in order to capture accurately the impact of a price discount coupled with an improved communication mix.

[15] Additional ad exposures were not simulated since the advertising parameter in the MNL model is insignificant.

3.7 Conclusion

We develop a new fuzzy consideration set model that incorporates the effects of brand preference, recency of choice, TV ads, feature ads, and display on consideration set formation. As opposed to the universal set approach or the restricted set approach, which model uncertainty about the sets from which a brand is chosen, the proposed fuzzy set approach directly models the probability of a brand being included in the consideration set. The resulting inclusion probabilities for brands reflect a "fuzzy" consideration set, in the sense that a brand belongs to the consideration set only probabilistically. This contrasts with well-defined consideration sets, in which consumers are assumed to know which brands are included in each consideration set but there is uncertainty about the sets from which a brand will be chosen.

As compared to an existing fuzzy set model (B&V 1996), in which both brand evaluation and consideration variables have a compensatory influence, our model permits a non-compensatory process in consideration set formation while allowing brand evaluation to be compensatory. Results from estimating our model on scanner panel data from the yogurt and ketchup categories show that the FZSET model outperforms the DB model.
The superiority of the FZSET model can be attributed to its use of continuous brand preference information, as opposed to the discretized information used in the DB model. In addition, both the FZSET and DB models provide a better fit than the B&V model, implying that consideration set formation is a non-compensatory, rather than compensatory, process.

To investigate the indirect effect of advertising on price sensitivity through its influence on the consideration set, we extend the FZSET model by reparameterizing the price parameter as a function of relative entropy, which measures consideration set size (the FZENT model). We find no evidence for a positive relationship between consideration set size and price sensitivity. The experimental results of Mitra and Lynch (1995) do not seem to carry over to this particular data set. It may be harder to detect variation in consideration set size in an established market (yogurt) that consists of relatively familiar brands (no new brands). Further empirical tests, using other data sets that include both existing and new brands, may be necessary to confirm their hypothesized relationship.

Simulation results based on estimated parameters show that the impact of TV ads and display on share gains varies across brands due to differences in inclusion probabilities. This suggests that marketing communications, e.g., display, can amplify the impact of a price cut on share gains, especially for brands that have low inclusion probabilities. Compared with the share gains predicted by our model, the MNL model appears to underpredict share gains from additional display, that is, to underestimate the impact of marketing communications. This illustrates the importance of incorporating the consideration set into a brand choice model in order to assess accurately the impact of marketing communications.
3.8 Limitations and Future Research

In investigating the effect of advertising on price sensitivity through its influence on consideration set size, we used a share-of-voice measure without taking into account advertising content, i.e., price-oriented vs. image-oriented advertising. As speculated in Boulding et al. (1994), the content of advertising could influence price sensitivity qualitatively, i.e., share of voice and advertising content may jointly affect price sensitivity through their impact on consideration set size. In addition, we studied a well-established market (yogurt) which consisted of relatively familiar brands (no new brands), unlike the experimental study of Mitra and Lynch (1995). It may therefore have been harder to detect variation in consideration set size, and thus a relationship between consideration set size and price sensitivity. Further investigation, using other data sets that contain more accurate and richer information about advertising for both existing and new brands, may be necessary to generalize the findings of Mitra and Lynch (1995).

We specify an overall utility function that incorporates two sets of variables that might affect consideration set formation and brand evaluation, respectively, based on the classification used in previous research (Nedungadi 1990, Andrews and Srinivasan 1995, Siddarth et al. 1995, B&V 1996). We assume that consumers uniformly screen out brands based on these pre-specified variables. However, consumers may differ not only in their consideration sets but also in the variables on which they screen out brands. From the analyst's point of view, there is therefore uncertainty about which variables consumers use in consideration set formation. We could incorporate this uncertainty into a more general model that allows the set of variables that influence the consideration set to vary across consumers.
One additional issue in research on consideration set formation is whether one brand can belong to a consideration set without affecting the probability of another brand belonging to the consideration set. The proposed fuzzy set model assumes "dependence" among inclusion probabilities, that is, an increased inclusion probability for one brand necessarily decreases the sum of inclusion probabilities for the remaining brands. However, it can be argued that any brand can belong to a consideration set independently (Andrews and Srinivasan 1995, B&V 1996). A further generalization may be that one segment of consumers follows the "dependence" assumption and another segment follows the "independence" assumption, based on their cognitive capacity or willingness to carry brands in the consideration set. We could therefore incorporate these competing views into an integrated model to describe consideration set formation across segments more accurately. Essentially, this generalization seeks to account for model uncertainty by allowing structural assumptions to vary across segments. We can deal with this issue by employing a Bayesian model averaging approach, as discussed in the next chapter.

Table 3.1: Fit Statistics for the Yogurt Data

| Choice set | Model | Number of Parameters | Calibration log-likelihood | BIC | Holdout log-likelihood |
|---|---|---|---|---|---|
| Full Set | MNL | 22 | -3137.531 | -3225.794 | -1424.837 |
| Consideration Set | DBM | 24 | -3121.569 | -3217.856 | -1419.600 |
| Consideration Set | B&V | 23 | -3131.227 | -3223.502 | -1418.532 |
| Consideration Set | FZSET | 22 | -3098.374 | -3186.637 | -1383.128 |

The 2-segment solution is the best based on BIC for all models.
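The BIC column of Table 3.1 can be related to the calibration log-likelihood and the parameter count. This section does not state the BIC formula, so the sketch below assumes the log-likelihood-scale form BIC = LL - (k/2) ln N; that form, and the sample size N it implies, are assumptions inferred from the reported numbers rather than values given in the text.

```python
import math

# (number of parameters, calibration log-likelihood, BIC) from Table 3.1.
table_3_1 = {
    "MNL": (22, -3137.531, -3225.794),
    "DBM": (24, -3121.569, -3217.856),
    "B&V": (23, -3131.227, -3223.502),
    "FZSET": (22, -3098.374, -3186.637),
}

# Under the assumed form BIC = LL - (k / 2) * ln(N), the penalty per
# parameter, (LL - BIC) / k, should be the same constant for every model.
penalty_per_parameter = {
    model: (loglik - bic) / k for model, (k, loglik, bic) in table_3_1.items()
}
spread = max(penalty_per_parameter.values()) - min(penalty_per_parameter.values())

# Sample size implied by the assumed formula (not reported in this section).
mean_penalty = sum(penalty_per_parameter.values()) / len(penalty_per_parameter)
implied_n = math.exp(2.0 * mean_penalty)

for model, penalty in penalty_per_parameter.items():
    print(f"{model:6s} penalty per parameter = {penalty:.4f}")
print(f"spread across models = {spread:.6f}")
```

A near-zero spread indicates that the four BIC values are consistent with a single penalty term, so the ranking of models in the table reflects differences in fit and parameter count only.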
Table 3.2: Fit Statistics for the Ketchup Data

| Choice set | Model | Number of Parameters | Calibration log-likelihood | BIC | Holdout log-likelihood |
|---|---|---|---|---|---|
| Full Set | MNL | 18 | -2701.921 | -2772.251 | -1321.615 |
| Consideration Set | DBM | 20 | -2674.621 | -2752.765 | -1306.080 |
| Consideration Set | B&V | 19 | -2701.921 | -2776.158 | -1321.617 |
| Consideration Set | FZSET | 18 | -2665.829 | -2736.159 | -1298.054 |

The 2-segment solution is the best based on BIC for all models.

Table 3.3: Parameter Estimates for the Yogurt Data

| Variable | MNL SEG1 | MNL SEG2 | DBM SEG1 | DBM SEG2 | B&V SEG1 | B&V SEG2 | FZSET SEG1 | FZSET SEG2 |
|---|---|---|---|---|---|---|---|---|
| PRICE | -0.694 | -0.307 | -0.695 | -0.323 | -0.674 | -0.278 | -0.679 | -0.293 |
| AD | 0.018* | 0.519* | 0.076* | 0.573* | -0.070* | 0.419* | 0.063 | 0.028* |
| AD*$\pi$ | 1.774* | -0.706* | 1.378* | -0.794* | 2.991 | 0.522* | 0.014 | -0.001* |
| $\pi$ | 1.190 | 2.232 | 1.023 | 2.011 | 1.447 | 3.499 | 0.379 | 0.422 |
| REC | 0.708 | 2.043 | 0.636 | 2.042 | 0.894 | 2.662 | 0.043 | 0.115 |
| FEAT | 0.318* | -0.306* | 0.398* | -0.417* | 0.443* | -0.279* | 0.035* | -0.043* |
| DISP | 1.584 | 0.686 | 1.644 | 0.653 | 1.913 | 0.975 | 0.233 | 0.089 |
| Smoothing $\alpha$ | 0.836 | | 0.823 | | 0.790 | | 0.291 | |
| Cutoff ($\delta$) | | | 0.052 | | | | | |
| $\lambda$ | | | 0.260 | | | | | |
| Threshold ($\lambda$) | | | | | 3.963 | | | |
| Segment Share | 0.640 | 0.360 | 0.681 | 0.319 | 0.663 | 0.337 | 0.638 | 0.362 |

(*): insignificant at $\alpha = 0.05$; all other parameters are significant at $\alpha = 0.05$.
Table 3.4: Parameter Estimates for the Ketchup Data

| Variable | MNL SEG1 | MNL SEG2 | DBM SEG1 | DBM SEG2 | B&V SEG1 | B&V SEG2 | FZSET SEG1 | FZSET SEG2 |
|---|---|---|---|---|---|---|---|---|
| PRICE | -1.667 | -0.121* | -1.623 | -0.130* | -1.667 | -0.121* | -1.476 | -0.024* |
| $\pi$ | 4.270 | 3.773 | 3.798 | 2.838 | 4.270 | 3.773 | 0.878 | 0.624 |
| REC | 0.607 | 0.741 | 0.415 | 0.752 | 0.607 | 0.741 | 0.044 | 0.052 |
| FEAT | 0.618 | 0.808* | 0.675 | 0.828* | 0.618 | 0.808* | 0.083 | 0.104* |
| DISP | 2.075 | 2.592 | 2.210 | 2.837 | 2.075 | 2.593 | 0.300 | 0.400 |
| Smoothing $\alpha$ | 0.961 | | 0.892 | | 0.961 | | 0.306 | |
| Cutoff ($\delta$) | | | 0.070 | | | | | |
| $\lambda$ | | | 0.408 | | | | | |
| Threshold ($\lambda$) | | | | | 16.673* | | | |
| Segment Share | 0.717 | 0.283 | 0.749 | 0.251 | 0.717 | 0.283 | 0.817 | 0.183 |

(*): insignificant at $\alpha = 0.05$; all other parameters are significant at $\alpha = 0.05$.

Table 3.5: FZENT: Yogurt Data

| Variable | SEG1 | SEG2 |
|---|---|---|
| Brand Evaluation | | |
| PRICE ($\theta_{s0}$) | -0.734 | -0.333 |
| ENT*PRICE ($\theta$) | 0.066* | |
| Consideration Set Formation | | |
| AD ($\gamma_{s0}$) | 0.063 | 0.027* |
| AD*$\pi$ ($\gamma_{s1}$) | 0.014 | -0.001* |
| $\pi$ ($\gamma_{s2}$) | 0.376 | 0.425 |
| REC ($\gamma_{s3}$) | 0.043 | 0.116 |
| FEAT ($\gamma_{s4}$) | 0.036* | -0.044* |
| DISP ($\gamma_{s5}$) | 0.232 | 0.087 |
| Smoothing $\alpha$ | 0.294 | |
| Fit Statistics | | |
| Calibration Log-likelihood | -3098.186 | |
| BIC | -3190.460 | |
| Holdout Log-likelihood | -1384.302 | |

(*): insignificant at $\alpha = 0.05$.

Table 3.6: Inclusion Probability and Market Share

| Brand | Inclusion Probability | Market Share |
|---|---|---|
| Yoplait | 0.172 | 0.160 |
| Weight Watcher | 0.114 | 0.135 |
| Dannon | 0.144 | 0.122 |
| Nordica | 0.305 | 0.268 |
| Quality Control (QC) | 0.024 | 0.035 |
| Well's Blue Bunny (W.B.B) | 0.147 | 0.175 |
| Private Label | 0.094 | 0.105 |
Table 3.7: Share Gains Across Brands: FZSET

| Brand | 10% Price Cut with Current Ad Exposures and Display Level | 10% Price Cut with Additional Ad Exposures | 10% Price Cut with Additional Displays |
|---|---|---|---|
| Yoplait | 4.33% | 4.91% | 4.68% |
| Weight Watcher | 2.34% | 2.41% | 2.59% |
| Dannon | 3.08% | 3.44% | 3.41% |
| Nordica | 3.81% | ** | 4.44% |
| Quality Control | 2.34% | ** | 8.24% |
| Well's Blue Bunny | 2.75% | ** | 3.37% |
| Private Label | 2.92% | ** | 7.73% |

**: These brands are excluded from the simulation.

Table 3.8: Share Gains Across Brands: MNL

| Brand | 10% Price Cut with Current Display Level | 10% Price Cut with Additional Displays |
|---|---|---|
| Yoplait | 3.95% | 4.17% |
| Weight Watcher | 2.60% | 2.84% |
| Dannon | 3.01% | 3.23% |
| Nordica | 3.23% | 3.65% |
| Quality Control | 2.09% | 6.85% |
| Well's Blue Bunny | 2.42% | 2.86% |
| Private Label | 2.50% | 6.23% |

Figure 3.1: Functional Shapes of Compensatory vs. Non-Compensatory Rule

* Compensatory: $U = \gamma y$, where $\gamma$ is set equal to 0.3 and $y$ represents one attribute
* Fuzzy non-compensatory: $U = \gamma \ln(y)$, where $\gamma$ is set equal to 0.1
* Crisp non-compensatory: $U = -\infty$ for $y < \delta$, otherwise 0, where $\delta$ is a threshold set equal to 0.05 (the cutoff value of the Dynamic Bayes Model (Siddarth et al. 1995) estimated in the current study)

[Plot not reproduced: utility (vertical axis) under the three rules.]
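The three functional forms listed under Figure 3.1 can be written out as a small sketch, using the parameter values given in the figure notes. The function names and the evaluation grid are illustrative assumptions.

```python
import math


def compensatory(y, gamma=0.3):
    """Linear rule: utility changes proportionally with the attribute."""
    return gamma * y


def fuzzy_noncompensatory(y, gamma=0.1):
    """Log rule: utility falls steeply as the attribute approaches zero."""
    return gamma * math.log(y)


def crisp_noncompensatory(y, cutoff=0.05):
    """Hard cutoff: the brand is excluded outright below the cutoff."""
    return -math.inf if y < cutoff else 0.0


# Evaluate the three rules on a few attribute levels in (0, 1].
for y in (0.01, 0.05, 0.25, 0.50, 1.00):
    print(f"y={y:4.2f}  compensatory={compensatory(y):6.3f}  "
          f"fuzzy={fuzzy_noncompensatory(y):7.3f}  "
          f"crisp={crisp_noncompensatory(y)}")
```

The log rule sits between the other two: it never excludes a brand with certainty, as the crisp rule does, but it penalizes very low attribute levels far more heavily than the linear rule.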
Figure 3.2: Relationship Between Advertising and Price Sensitivity

[Diagram not reproduced: Brand Advertising, via Link 1, to Inclusion Probabilities; via Link 2, to Entropy: Consideration Set Size; via Link 3, to Price Sensitivity.]

Chapter 4

Decision Rule Heterogeneity

4.1 Overview

Faced with competing hypotheses, we most often develop different models based on their respective structural assumptions, examine the data to identify a single best model, and then proceed as if the model chosen on the basis of fit statistics were known to be true. The chosen model is then used to make inferences and predictions even though the rejected models are not necessarily wrong (Draper 1995). In other words, structural uncertainty about the model itself is generally neglected.

In the statistics literature, the issue of model uncertainty (or structural uncertainty) has been addressed by the Bayesian model averaging approach (Chatfield 1995), which acknowledges the possibility that more than one model may be regarded as "close to true". The notion of having more than one model is a key element of the Bayesian model averaging approach, which avoids selecting a single best model and instead averages over more than one model. The spirit of this approach is well represented by the following quote:

"The main way to avoid noticing after the fact that a set of modeling assumptions, different from those originally assumed, turned out to be correct is for one's model prospectively to have been sufficiently large to encompass the retrospective truth (Draper 1995, p. 55)."

Recently, two studies that account for structural or model uncertainty were introduced in the marketing literature. Kamakura, Kim and Lee (1996) develop a flexible model in which consumers may differ in the choice process (i.e., use either a brand-primary or a form-primary decision rule).
In an analysis based on store-level data, Kalyanam (1996) applies a Bayesian mixture model that accounts for uncertainty about functional forms of demand, recognizing that in practice many different demand functions might be theoretically plausible and fit the data equally well. This type of modeling approach, which incorporates competing yet plausible hypotheses in an integrated model, seeks to build a "big" model that encompasses all plausible hypotheses, rather than using a single best model, which may not represent household-level choice behavior uniformly better than other, rejected, models. In deciding how big is sufficiently big, we may resort to subject-matter considerations such as accepted theories, expert background knowledge, and prior information, including that obtained from previous, similar, data sets.

In marketing, a significant body of research has investigated the reference price construct and modeled its impact on consumer choice behavior. Two major competing models have been proposed: the sticker shock (Winer 1986) and reference-dependent (Hardie, Johnson and Fader 1993) formulations. The sticker shock formulation captures reference price effects via a "sticker shock term", defined as the difference [1] between brand-specific reference prices and the current prices. The reference-dependent formulation, on the other hand, captures asymmetric responses to positive and negative price deviations from a category-specific reference price, which are hypothesized to result in loss aversion (Tversky and Kahneman 1991) when the current price exceeds the reference price.

[1] We may allow for different parameters for the positive and negative differences to capture asymmetric responses to price losses and gains (e.g., Kalwani et al. 1990, Briesch et al. 1997).
A key distinction between the two models is that the sticker shock formulation assumes that consumers compare the current price of each alternative to its brand-specific reference price, while the reference-dependent formulation assumes that consumers compare the current price of each alternative to one, category-specific, reference price and that consumer responses to gains and losses are asymmetric. These two models therefore represent two different decision rules or choice processes underlying consumer brand choice, both of which are empirically supported and theoretically grounded in psychological theory: Adaptation-Level Theory (Helson 1964) and Prospect Theory (Kahneman and Tversky 1979). The questions that naturally arise are "Which model represents the true nature of response to reference price?" and "Which model should we choose?"

In an attempt to answer these questions, Briesch et al. (1997) performed a comparative analysis of brand-specific and category-specific reference price models. [2] Their empirical results from four product categories provided overall support for the memory-based brand-specific reference price model (i.e., the sticker shock model). They assume that all consumers consistently follow either the sticker shock model or the reference-dependent model, and once a single best model is chosen, they base their subsequent inferences on that model. Is this view correct? More importantly, if one model is better supported by the data, does that model represent the whole truth? Is it possible that the other, rejected, models may also partially represent reality? These questions are all related to the issue of model uncertainty.

[2] They evaluated five alternative models of reference price, which are either stimulus-based (i.e., based on information available at the point of purchase) or memory-based (i.e., based on price history). Essentially, however, these models can be regrouped into the sticker shock and reference-dependent models, depending on whether the reference price is brand-specific or category-specific.

In this chapter, we develop a model that accounts for uncertainty about the way reference prices affect brand choice. We avoid selecting a single best model but, rather, incorporate the competing decision rules into a single model to account for model uncertainty. Our approach recognizes the possibility that there may be two consumer segments, one consistently following the sticker shock model while the other consistently follows the reference-dependent model. In addition, as addressed in Briesch et al. (1997), consumers may use either of the decision rules on different purchase occasions, i.e., mix the two decision rules [3] probabilistically. In summary, we develop a general (big) model that incorporates the distinct model structures implied by the three different decision rules.

This research seeks to provide the following contributions:

• Develop a structural uncertainty-based model of reference price effects that allows for different decision rules (decision rule heterogeneity).

• Demonstrate, by way of an analysis of scanner panel data, that properly accounting for structural uncertainty can provide a better description of the way reference prices affect the choice process.

This chapter is organized as follows. In Section 4.2, we present a model that fully accounts for both structural uncertainty and parametric uncertainty (parameter heterogeneity across consumers) in capturing reference price effects. In Section 4.3, we compare our approach to one that ignores model uncertainty. Section 4.4 describes sampling-based methods (Markov Chain Monte Carlo simulation) to estimate the proposed model.
Section 4.5 presents an empirical application to scanner panel data from the ketchup category. Section 4.6 discusses the empirical results. We close with our conclusions and directions for future research.

[3] In fact, this mixed rule could have been incorporated into the study of Kamakura et al. (1996), since a priori it is uncertain whether consumers consistently make either a brand-primary decision or a product form-primary decision, or simply mix the two decision rules probabilistically on each purchase occasion.

4.2 Model Development

We specify three models (structures) that consumers may follow in their brand choice decision. A consumer could be described by a sticker shock model ($M = 1$), a reference-dependent model ($M = 2$), or a mix of the two decision rules ($M = 3$).

Given a particular state $M = s$, we specify the probability of brand choice for household $h$ on occasion $t$ as the Multinomial Logit (MNL) probability:

$$P_{st}^h(i \mid \beta_s^h) = \frac{\exp(U_{sit}^h)}{\sum_k \exp(U_{skt}^h)}, \tag{4.1}$$

where $U_{sit}^h$ denotes the deterministic component of utility for alternative $i$, household $h$, and model structure $s$, and $\beta_s^h$ is a household- and structure-specific parameter vector.

Structure 1: The sticker shock model reflects a decision rule that incorporates responses to the differences between brand-specific reference prices and shelf prices. Following Winer (1986) and Lattin and Bucklin (1989), the sticker shock utility [4] is given by:

$$U_{1it}^h = \beta_{10i}^h + \beta_{11}^h \mathrm{PRICE}_{it} + \beta_{12}^h (\mathrm{RPRICE}_{it}^h - \mathrm{PRICE}_{it}) + \beta_{13}^h \mathrm{PROMO}_{it}, \tag{4.2}$$

where $\mathrm{RPRICE}_{it}^h$ is a brand-specific reference price. The reference price of a brand is assumed to be its price [5] on the last occasion on which the household made a purchase.

Structure 2: The reference-dependent model assumes that consumers compare the current price of each alternative to one, category-specific, reference price and that consumer responses to gains and losses are asymmetric.
Following Hardie, Johnson and Fader (1993), the reference-dependent utility is given by:

$$U_{2it}^h = \beta_{20i}^h + \beta_{21}^h \mathrm{PGAIN}_{it}^h + \beta_{22}^h \mathrm{PLOSS}_{it}^h + \beta_{23}^h \mathrm{PROMO}_{it}, \tag{4.3}$$

where $\mathrm{PGAIN}_{it}^h$ is the difference between the reference price and the actual price of brand $i$ when the actual price is lower than the reference price, and 0 otherwise, and $\mathrm{PLOSS}_{it}^h$ is the difference between the reference price and the actual price of brand $i$ when the actual price is higher than the reference price, and 0 otherwise. The reference price is assumed to be the price last paid [6] for a purchase in the category.

[4] In the empirical application, we include two additional variables, brand loyalty and last brand purchased, that are often used in a standard logit model.

[5] Alternatively, Briesch et al. (1997) used the exponentially smoothed (memory-based) shelf prices observed on purchase occasions. In our application, we choose to use the simplest form of the reference price since we are interested in structural uncertainty rather than in alternative operationalizations.

Structure 3: The mixed decision rule is represented by the following model:

$$P_{3t}^h(i \mid \beta_3^h) = \gamma^h P_{1t}^h(i \mid \beta_1^h) + (1 - \gamma^h) P_{2t}^h(i \mid \beta_2^h), \tag{4.4}$$

where $\gamma^h$ represents the probability that household $h$ follows the sticker shock model on each purchase occasion and $\beta_3^h = (\beta_1^h, \beta_2^h, \gamma^h)$. Whereas Structures 1 and 2 represent either a pure sticker shock or a pure reference-dependent decision rule across all purchase occasions, Structure 3 implies a probabilistic mixture of the two decision rules, with a mixing probability that is constant over time.

4.2.1 Model Uncertainty

It is uncertain which decision rule a particular household $h$ may follow.
To account for this uncertainty, we specify the household-specific probability that household $h$ follows decision rule $s$ as $p^h(M = s) = \lambda_s^h$, where $\sum_{s=1}^{3} \lambda_s^h = 1$.

Priors on Model Probabilities: If the model indicator $M$ for household $h$ is thought of as a multinomial random variable with probability vector $\lambda^h$, then the natural conjugate prior distribution is the Dirichlet distribution, i.e., $\lambda^h \sim D(\alpha_1, \alpha_2, \alpha_3)$. The relative sizes of the Dirichlet parameters $\alpha_s$ describe the mean of the prior distribution for $\lambda^h$, and the sum of the $\alpha_s$'s is a measure of the strength of the prior distribution, i.e., the prior sample size. This specification accounts for structural uncertainty within a household and heterogeneity in decision rules across households.

[6] In Briesch et al. (1997), two alternative operationalizations of the reference price are suggested. One is the current (stimulus-based) shelf price of the reference brand (the chosen brand on the last purchase occasion). The other is the exponentially smoothed shelf prices of the brands chosen on past purchase occasions (memory-based). Following the standard approach in previous research, we choose to use the price last paid (memory-based) as the reference price. As compared to the exponentially smoothed shelf prices, this measure is simpler and avoids confounding prices of different brands and sizes.

4.2.2 Parameter Heterogeneity

Conditional on the model indicator $M = s$, it is straightforward to model heterogeneity in parameters across households. Following the standard approach suggested in the literature (e.g., Gelfand et al. 1990), we specify a prior distribution over household parameters, $\beta_s^h$, given $M = s$.

Priors over Household Parameters: Since the fully Bayesian approach requires priors over unknown quantities in the model, we specify a multivariate normal prior distribution over household parameters, $\beta_s^h$, given $M = s$:
$$\beta_s^h \mid M = s \sim MVN(\mu_s, \Sigma_s). \tag{4.5}$$

We can either estimate $\mu_s$ and $\Sigma_s$ or pre-specify their values as constants. [7] Since we are interested in the population quantities $\mu_s$ and $\Sigma_s$, we estimate them by specifying hyper-priors over them. At this stage, we fix the parameter values of the hyper-priors such that they represent diffuse but proper distributions. Following Gelfand et al. (1990), we assume a multivariate normal prior, $MVN(\eta_s, C_s)$, over $\mu_s$ and a Wishart prior, $W[(\rho_s R_s)^{-1}, \rho_s]$, over $\Sigma_s^{-1}$. Details on each prior are provided in Appendix C.

[7] Pre-specifying the parameter values for a prior distribution is a simplification of the fully Bayesian approach and is similar to the approach suggested in Rossi and Allenby (1993). This approach ignores uncertainty in the parameters of the prior.

4.2.3 Joint Model Specification

Given the model probabilities and priors over household parameters, we can specify the joint distribution of all unknown parameters and data, $y^h$ (the vector of indicators for chosen brands). Corresponding to model structure $s$, let $p(\beta_s^h \mid M = s)$ be a prior over household parameters, and let $f(y^h \mid \beta^h, M = s) = \prod_{t=1}^{T_h} P_{st}^h(c \mid \beta_s^h)$ be the likelihood for household $h$, where $c$ is an indicator for the chosen brand and $T_h$ is the number of purchases made by household $h$. Conditional on $M = s$, only $\beta_s^h$ is relevant to $P_{st}^h(c \mid \beta_s^h)$ (similarly, only $p(\beta_s^h \mid M = s)$ is relevant to $\beta_s^h$). In other words, conditional on $M = s$, $\beta_s^h$ is assumed to describe completely $y^h$, the observed choice history of household $h$. Therefore, we assume complete independence among the various $\beta_s^h$ given $M = s$ (i.e., $y^h$ is independent of $\{\beta_k^h, k \neq s\}$ given $M = s$), and may specify the fully Bayesian model by choosing proper "pseudopriors", $p(\beta_s^h \mid M \neq s)$ (we discuss the role of the pseudoprior and the specifics of pseudoprior selection, as described in Carlin and Chib 1995, in Appendix C).
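To make the household likelihood concrete, the following sketch evaluates Equations (4.1) to (4.4) and the product over purchase occasions for one household. It is a minimal illustration under stated assumptions: the dictionary keys, function names, and all numerical values are hypothetical, the additional loyalty and last-purchase variables of the empirical application are omitted, and PLOSS is coded literally as reference price minus actual price (so it is negative for a loss), which is one reading of the definition above.

```python
import numpy as np


def mnl_probabilities(utilities):
    """Equation (4.1): logit choice probabilities for one occasion."""
    weights = np.exp(utilities - np.max(utilities))  # shift for stability
    return weights / weights.sum()


def sticker_shock_utility(beta, price, brand_ref_price, promo):
    """Equation (4.2): brand-specific reference prices."""
    return (beta["intercept"] + beta["price"] * price
            + beta["shock"] * (brand_ref_price - price)
            + beta["promo"] * promo)


def reference_dependent_utility(beta, price, category_ref_price, promo):
    """Equation (4.3): one category-specific reference price."""
    diff = category_ref_price - price
    pgain = np.where(diff > 0, diff, 0.0)  # actual price below reference
    ploss = np.where(diff < 0, diff, 0.0)  # actual price above reference
    return (beta["intercept"] + beta["gain"] * pgain
            + beta["loss"] * ploss + beta["promo"] * promo)


def occasion_probabilities(structure, beta1, beta2, gamma, occ):
    """Choice probabilities under structure 1, 2, or the mixture (4.4)."""
    p1 = mnl_probabilities(sticker_shock_utility(
        beta1, occ["price"], occ["brand_ref"], occ["promo"]))
    p2 = mnl_probabilities(reference_dependent_utility(
        beta2, occ["price"], occ["category_ref"], occ["promo"]))
    if structure == 1:
        return p1
    if structure == 2:
        return p2
    return gamma * p1 + (1.0 - gamma) * p2


def household_log_likelihood(structure, beta1, beta2, gamma, occasions):
    """log f(y^h | beta^h, M = s): sum of log probabilities of chosen brands."""
    return float(sum(
        np.log(occasion_probabilities(structure, beta1, beta2, gamma, occ)[occ["chosen"]])
        for occ in occasions))


# Hypothetical household with two purchase occasions and three brands.
beta1 = {"intercept": np.array([0.0, 0.5, -0.2]), "price": -1.0, "shock": 0.8, "promo": 0.6}
beta2 = {"intercept": np.array([0.0, 0.5, -0.2]), "gain": 0.5, "loss": 1.5, "promo": 0.6}
occasions = [
    {"price": np.array([1.0, 1.2, 0.9]), "brand_ref": np.array([1.1, 1.2, 1.0]),
     "category_ref": 1.1, "promo": np.array([0.0, 1.0, 0.0]), "chosen": 1},
    {"price": np.array([1.1, 1.0, 0.9]), "brand_ref": np.array([1.0, 1.2, 0.9]),
     "category_ref": 1.2, "promo": np.array([0.0, 0.0, 1.0]), "chosen": 2},
]
log_lik = {s: household_log_likelihood(s, beta1, beta2, 0.4, occasions) for s in (1, 2, 3)}
print(log_lik)
```

Because the mixture in Structure 3 is applied to the choice probabilities on each occasion, its likelihood is not simply a weighted average of the Structure 1 and Structure 2 likelihoods; that difference is what separates occasion-level mixing from household-level switching between rules.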
From the usual conditional independence assumptions,

$$p(y^h \mid M = s) = \int f(y^h \mid \beta^h, M = s)\, p(\beta^h \mid M = s)\, d\beta^h = \int f(y^h \mid \beta_s^h, M = s)\, p(\beta_s^h \mid M = s)\, d\beta_s^h,$$

where $\beta^h = (\beta_1^h, \beta_2^h, \beta_3^h)$, and

$$p(y^h) = \sum_{s=1}^{3} \lambda_s^h \int f(y^h \mid \beta_s^h, M = s)\, p(\beta_s^h \mid M = s)\, d\beta_s^h. \tag{4.6}$$

So the form given to $p(\beta_s^h \mid M \neq s)$ is irrelevant in Equation (4.6). As the name suggests, a pseudoprior is not really a prior but only a conveniently chosen linking density (Carlin and Chib 1995), required to define the following complete joint model specification. Given the model probability $p^h(M = s) = \lambda_s^h$, the joint distribution of $y^h$ and $\beta^h$ when $M = s$ is given by:

$$p(y^h, \beta^h, M = s) = f(y^h \mid \beta_s^h, M = s) \left[ \prod_{k=1}^{3} p(\beta_k^h \mid M = s) \right] \lambda_s^h. \tag{4.7}$$

Unlike in Equation (4.6), pseudopriors, e.g., $p(\beta_2^h \mid M = 1)$ and $p(\beta_3^h \mid M = 1)$, need to be specified in Equation (4.7). Given this joint distribution, we account for parametric and structural uncertainty by integrating over uncertainty about the model parameters $\beta^h$ and $\lambda^h$, respectively.

Given the priors and likelihood, we construct the full conditional distributions for all model parameters and implement the MCMC simulations to estimate the posterior distributions of the model parameters. The exact forms of the full conditional distributions and detailed estimation procedures are discussed in Appendix C.

4.3 Model Features

Our modeling approach explicitly incorporates structural uncertainty into the model, recognizing that any possible structure receiving a prior probability of 0 must also have a posterior probability of 0. Given that both the sticker shock model and the reference-dependent model of reference price are theoretically grounded and empirically supported, it is imperative to put non-zero prior model probabilities on these structures as opposed to setting $p^h(M = s) = 1.0$ and $p^h(M \neq s) = 0.0$, i.e., putting a point mass on only one structure $s$, which is equivalent to estimating each model separately and choosing a single best model in terms of fit statistics.
This "point-mass-on-one-structure-at-a-time" approach may be too concentrated on a single structural assumption to lead to well-calibrated inferences and predictions (Draper 1995).

Behaviorally, there is no reason to believe that every household follows the same decision rule, characterized either by the sticker shock model or by the reference-dependent model. One consumer segment may follow the sticker shock model because its members have sufficient cognitive capacity as well as motivation to recall the shelf prices observed on the last purchase occasion. Another segment may follow the reference-dependent model because its members want to simplify the decision process and reduce their cognitive burden by having only one, category-specific, reference price. A third segment may mix the two decision rules probabilistically. For a particular household $h$, we can infer which decision rule the household may follow by examining the model probability, $p^h(M = s) = \lambda_s^h$.

Corresponding to a model structure, we account for parametric uncertainty by specifying a continuous prior distribution (a multivariate normal) over response parameters (including preference), $\beta_s^h$. This approach may overcome the limitations of a finite mixture approach, which involves a heuristic way of determining the additional number of response segments within a structure (see Kamakura et al. 1996 for details on the heuristic procedure). In addition, our fully Bayesian approach explicitly accounts for heterogeneity in model probabilities across households while Kamakura et al.
(1996) assume them to be common across households (i.e., the probability of a brand-primary decision and that of a form-primary decision are common across consumers), though posterior model probabilities vary across consumers.[8]

[8] Major criticisms of this type of empirical Bayes approach are overconfidence in prior model probabilities and double counting of data in posterior analysis, i.e., use of the same data to estimate priors and posteriors (Kass and Steffey 1989).

4.4 Estimation

Estimating the proposed model involves computing high-order multidimensional integrals, which makes conventional maximum likelihood methods practically impossible to use. To circumvent this problem, we use sampling-based methods, which approximate the joint posterior distribution by obtaining many random draws from the full conditional distributions (rather than directly from the joint posterior distribution) and then base inference on the empirical distribution of this sample of draws (details on the full conditional distributions for model parameters are provided in Appendix C). This method is known as substitution sampling.

If all the full conditional distributions are known (i.e., known conjugate distributions such as a normal, a Dirichlet, and a Wishart), then substitution sampling reduces to a procedure known as Gibbs sampling (Geman and Geman 1984, Gelfand and Smith 1990). If the full conditional distributions are not completely known (i.e., known only up to a normalizing constant), then a Metropolis-Hastings step can be used (Tierney 1994, Chib and Greenberg 1995).
In implementing a sequence of random draws for the proposed model parameters, the $(m+1)$th step of the substitution sampling involves generating the following draws:

(a) Generate the model indicator $M$ from a multinomial distribution with probability vector $z_h$ for $h = 1$ to $H$, where $z_h$ is the posterior model probability vector (defined in Appendix C) of household $h$ following each structure.

(b) Generate a $\lambda_h$ draw from a posterior Dirichlet distribution, $D(\alpha_1, \ldots, \alpha_s + T_h, \ldots, \alpha_S)$, where $M = s$ and $T_h$ is the total number of purchases made by household $h$.

(c) If $M = s$, generate $\beta_{hs}$ draws from $p(\beta_{hs}^{(m+1)} \mid \Sigma_s^{(m)}, \mu_s^{(m)}, y_h, M = s)$ for $h = 1$ to $H$, using a Metropolis-Hastings algorithm. If $M \neq s$, generate draws from a "pseudoprior", $p(\beta_{hs} \mid M \neq s)$, defined in Appendix C; i.e., when $M = s$ we generate from the usual full conditional, and when $M \neq s$ we generate from the pseudoprior, the linking density (Carlin and Chib 1995).

(d) Generate a $\mu_s$ draw from $p(\mu_s^{(m+1)} \mid \{\beta_{hs}^{(m)}\}, \Sigma_s^{(m)}, V_s, C_s)$, for $s = 1, 2, 3$, using a Gibbs sampler.

(e) Generate a $\Sigma_s^{-1}$ draw from $p(\Sigma_s^{-1\,(m+1)} \mid \{\beta_{hs}^{(m)}\}, \mu_s^{(m)}, \rho_s, R_s)$, for $s = 1, 2, 3$, using a Gibbs sampler.

The joint posterior distribution of all unknown parameters is approximated by this sequence of random draws, which forms a Markov chain that becomes stationary after an initial "burn-in" period. Therefore, we discard the initial draws (1st to $m$th) from the chain, since they only reflect a transient period in which the chain has not converged to the stationary distribution. A sample of draws obtained after convergence is used to make posterior inferences about model parameters. It should be noted that this sampling-based procedure generates the entire posterior distribution rather than just a point estimate.
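The interplay of the model indicator, the full conditionals and the pseudopriors can be sketched on a deliberately small toy problem. The sketch below is not the thesis model: it assumes two candidate structures that differ only in a known noise standard deviation, a conjugate normal prior on a single mean parameter (so a direct draw replaces the Metropolis-Hastings step), fixed prior model probabilities (so the Dirichlet draw in step (b) is omitted), and pseudopriors set equal to the within-structure posteriors. All names and default values are hypothetical.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def conjugate_posterior(y, noise_sd, prior_mean, prior_sd):
    """Posterior mean and sd of a normal mean when the noise sd is known."""
    precision = 1.0 / prior_sd ** 2 + len(y) / noise_sd ** 2
    mean = (prior_mean / prior_sd ** 2 + sum(y) / noise_sd ** 2) / precision
    return mean, math.sqrt(1.0 / precision)


def carlin_chib(y, noise_sds, prior_probs, n_iter, burn_in, thin, seed=0):
    """Toy product-space sampler; returns (model, parameters) draws kept after burn-in."""
    rng = random.Random(seed)
    n_models = len(noise_sds)
    prior_mean, prior_sd = 0.0, 10.0  # assumed prior on each structure's mean
    # Within-structure posteriors, reused as pseudopriors (an assumption of this sketch).
    post = [conjugate_posterior(y, sd, prior_mean, prior_sd) for sd in noise_sds]
    theta = [mean for mean, _ in post]
    kept = []
    for it in range(n_iter):
        # Model indicator: weight = prior prob * likelihood * own prior * pseudopriors.
        log_w = []
        for s in range(n_models):
            lw = math.log(prior_probs[s])
            lw += sum(normal_logpdf(v, theta[s], noise_sds[s]) for v in y)
            lw += normal_logpdf(theta[s], prior_mean, prior_sd)
            lw += sum(normal_logpdf(theta[k], post[k][0], post[k][1])
                      for k in range(n_models) if k != s)
            log_w.append(lw)
        top = max(log_w)
        weights = [math.exp(v - top) for v in log_w]
        model = rng.choices(range(n_models), weights=weights)[0]
        # Parameters: full conditional if the structure is active, pseudoprior otherwise.
        # In this sketch the two coincide, so every structure gets the same kind of draw.
        for s in range(n_models):
            theta[s] = rng.gauss(post[s][0], post[s][1])
        # Discard the burn-in and thin the remaining chain.
        if it >= burn_in and (it - burn_in) % thin == 0:
            kept.append((model, list(theta)))
    return kept


# Synthetic data generated with noise sd 1, so the first structure should dominate.
data_rng = random.Random(1)
toy_y = [data_rng.gauss(1.0, 1.0) for _ in range(30)]
toy_draws = carlin_chib(toy_y, [1.0, 3.0], [0.5, 0.5], n_iter=600, burn_in=100, thin=5)
toy_share_model_1 = sum(1 for m, _ in toy_draws if m == 0) / len(toy_draws)
```

The share of retained draws with a given indicator value estimates that structure's posterior probability. The same bookkeeping applied to a 15,000-iteration run with a 10,000-iteration burn-in retains 1,667 draws at a stride of 3 and 1,000 draws at a stride of 5.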
Given that we expect substantial uncertainty in inferring household parameters with only a small number of observations, it is critical that we also properly measure the uncertainty of the estimates, i.e., a probability interval for the parameter estimates of interest.

In our application, the substitution sampler was run for 15000 iterations and convergence was checked by monitoring the time-series of the draws. We chose a burn-in length of 10000 iterations (after which the time-series plot stabilized) and retained every third iteration of the remaining 5000 iterations to reduce the serial correlation of the sampled draws, i.e., we resorted to "thinning the chain" (Geyer 1992, Raftery and Lewis 1995). Therefore, 1000 draws from the posterior distribution of each parameter were used to make inferences.

4.5 Empirical Application

4.5.1 Data

The data for this study are drawn from A. C. Nielsen scanner panel records in the ketchup category for a sample of households in Sioux Falls, South Dakota, for the period 1986-1988. The first 61 weeks were used for initializing model variables, the next 40 weeks of the data (1954 purchases) were used for model calibration, and the last 11 weeks of the data (509 purchases) were used for validation. Households qualified for inclusion in the sample if they made at least one grocery purchase every four weeks over the entire study period and made at least one ketchup purchase in both the calibration and validation periods. Out of 592 qualified households, 400 households were randomly selected.

The top seven selling ketchup brands are included in the study. Together, these accounted for 85 percent of category sales in dollars, and 87 percent in units.

4.5.2 Models Estimated

• Pure Sticker Shock Model: Hierarchical Bayes sticker shock model that accounts for parameter heterogeneity only (labeled as HB-ST).
This model is estimated by setting $p_h(M = 1) = 1$ for $h = 1$ to $H$ and skipping steps (a) and (b) described in Section 4.4.

• Pure Reference-Dependent Model: Hierarchical Bayes model of the reference-dependent effect that accounts for parameter heterogeneity only (labeled as HB-RD). This model is estimated by setting $p_h(M = 2) = 1$ for $h = 1$ to $H$ and skipping steps (a) and (b) described in Section 4.4.

• Proposed Model: Structural uncertainty-based Hierarchical Bayes model of reference price effects that accounts for both structural uncertainty and parametric uncertainty (labeled as HB-STRD).

4.6 Results and Discussion

In this section we discuss results from the comparison models (HB-ST and HB-RD) and the proposed model (HB-STRD), and compare them in terms of fit statistics such as log-likelihood and hit-rate.

Traditional measures of goodness-of-fit such as the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC), which penalize the likelihood for the number of estimated parameters, are not suitable for assessing and comparing Hierarchical Bayes models. These measures are motivated by asymptotic arguments and are, therefore, inappropriate in situations where the number of model parameters varies with the sample size, i.e., when more households are added to the sample (Carlin and Louis 1996; see, e.g., Manchanda et al. 1997). Therefore, we use log-likelihood and hit-rate to compare the models.

Comparison Models: Results of estimating the HB-ST and HB-RD models are presented in Table 4.1. For the calibration period, the HB-RD model outperforms the HB-ST model, while for the hold-out period the results are reversed (except that the hit-rate of the HB-RD model is better). Therefore, it is difficult to favor one model over the other with confidence.
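The two fit statistics used in this comparison can be sketched as follows. The sketch assumes the usual definitions: the log-likelihood sums the log of the predicted probability of the brand actually chosen, and the hit-rate is the share of purchase occasions on which the brand with the highest predicted probability is the one chosen. The function names and the three-occasion example are hypothetical.

```python
import math


def log_likelihood(choice_probs, chosen):
    """Sum over purchase occasions of log P(chosen brand)."""
    return sum(math.log(probs[c]) for probs, c in zip(choice_probs, chosen))


def hit_rate(choice_probs, chosen):
    """Share of occasions where the most probable brand is the chosen brand."""
    hits = 0
    for probs, c in zip(choice_probs, chosen):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted == c:
            hits += 1
    return hits / len(chosen)


# Hypothetical predicted probabilities for three occasions and three brands.
example_probs = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]]
example_chosen = [0, 2, 2]
example_ll = log_likelihood(example_probs, example_chosen)
example_hits = hit_rate(example_probs, example_chosen)
```

Because the hit-rate only looks at the top-ranked brand while the log-likelihood uses the full predicted probability, the two statistics can rank models differently, as they do for the hold-out period here.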
If we look at specific parameter estimates of interest, the posterior mean of the sticker shock effect estimate is insignificant (i.e., 0 is included in the 95% probability interval) while the loss aversion estimate, measured by the posterior mean difference between the coefficients for PGAIN and PLOSS, is significant (i.e., 0 is not included in the 95% probability interval).[9] In summary, ignoring model uncertainty may lead to the conclusion that either there is no evidence to support the existence of a sticker shock effect (if the HB-ST model is chosen) or that consumers exhibit significant loss aversion (if the HB-RD model is chosen).

Proposed Model: Estimation results for the HB-STRD model are shown in Table 4.2. In the early iterations, the simulated model indicator $M$ hardly ever took the value 3, indicating that the mixing rule (within households) is not supported by the data. Therefore, we estimated the proposed model with only two structures, setting $p_h(M = 3) = 0$ for $h = 1$ to $H$. The HB-STRD model outperforms the comparison models for both calibration and hold-out periods. The HB-STRD model results provide no evidence for the existence of the sticker shock effect, while loss aversion[10] still holds. Therefore, it seems that the loss aversion phenomenon is robust to heterogeneity correction.

[9] Posterior mean difference between the coefficients for PGAIN and PLOSS is 0.283 (95% probability interval: 0.038 to 0.528). A paired-comparison t-test further confirmed a significant difference.

[10] Posterior mean difference between the coefficients for PGAIN and PLOSS is 0.365 (95% probability interval: 0.160 to 0.570). A paired-comparison t-test further confirmed a significant difference.
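The significance check reported in the footnotes (whether 0 falls inside the 95% probability interval of the PLOSS and PGAIN coefficient difference) can be sketched from posterior draws. This sketch assumes a percentile-based interval, which may differ from the interval construction used in the thesis, and the draws are synthetic stand-ins whose means and spread are chosen only for illustration.

```python
import random
import statistics


def posterior_difference_summary(loss_draws, gain_draws):
    """Mean and 95% percentile interval of (loss - gain) across paired draws."""
    diffs = [loss - gain for loss, gain in zip(loss_draws, gain_draws)]
    cuts = statistics.quantiles(diffs, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    lower, upper = cuts[0], cuts[-1]
    return {
        "mean": statistics.fmean(diffs),
        "lower": lower,
        "upper": upper,
        "excludes_zero": lower > 0.0 or upper < 0.0,
    }


# Synthetic draws (hypothetical values, not the thesis output).
draw_rng = random.Random(7)
loss_draws = [draw_rng.gauss(1.95, 0.05) for _ in range(1000)]
gain_draws = [draw_rng.gauss(1.59, 0.05) for _ in range(1000)]
summary = posterior_difference_summary(loss_draws, gain_draws)
```

Pairing the draws iteration by iteration keeps any posterior correlation between the two coefficients in the interval for their difference.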
The posterior means of household-level model probabilities, $p(M = 1 \mid \cdot) = 0.517$ and $p(M = 2 \mid \cdot) = 0.483$, give an idea of the proportion of households that follow each decision rule, and the range of $p_h(M = 2 \mid \cdot)$ across households (MIN = 0.151, MAX = 0.835, standard deviation = 0.099) indicates heterogeneity in decision rules across households. Given insignificant sticker shock effects, we consider households with high $p_h(M = 1 \mid \cdot)$ as having no brand-specific reference price. In contrast, households with high $p_h(M = 2 \mid \cdot)$ can be viewed as having a category-specific reference price and responding more strongly to price losses than to price gains. On average, half the sample households appear to show loss aversion and the remaining households do not seem to have reference prices. In other words, no households seem to have brand-specific reference prices (consistent with the results from Chapter 2) while half of them rely on a category-specific reference price.

4.7 Conclusions

A significant body of research in marketing has investigated the reference price construct and modeled its impact on brand choice decisions via two major competing approaches, the sticker shock and reference-dependent formulations. The two different views on how reference prices affect consumer choice behavior are well-grounded in psychological theory. In addition, Kalyanaram and Winer (1995), in their review paper, interpret the previous research findings as strong support for the existence of reference price effects.

The issue of consumer heterogeneity also has a rich modeling tradition in the marketing literature, and several papers have highlighted the importance of accounting for heterogeneity in obtaining unbiased parameter estimates for state-dependent variables such as reference price used in choice models. Most often, modeling consumer heterogeneity has been limited to accounting for preference and response heterogeneity, i.e.,
allowing for intercepts (preference) and response parameters to vary across households or segments in choice models. However, less attention has been paid to heterogeneity in decision rules, i.e., the possibility that different households have different utility functions (e.g., the sticker shock utility and the reference-dependent utility) as well as different response parameters.

Building on previous research on reference price effects and the notion of model uncertainty, in the current work we develop a Hierarchical Bayes model of reference price effects that accounts for both model uncertainty and parametric uncertainty, and estimate the model using MCMC sampling-based methods. As compared to the comparative analysis of reference price models by Briesch et al. (1997), which assumes that all consumers use the same type of reference price, our approach acknowledges that different consumer segments may have different types of reference price, i.e., different decision rules.

Our empirical application of the proposed model to the ketchup category shows that consumers differ not only in their preference and response but also in their decision rules. On average, half the sample households appear to show loss aversion and the remaining households do not seem to respond to brand-specific reference prices. The superiority of the proposed model can be attributed to modeling heterogeneity in decision rules across households. The model has the added advantage of a richer description of consumer choice processes as compared to the comparison models, which allow for only one model structure and ignore model uncertainty.

In a broader sense, all kinds of heterogeneity corrections are reflections of the researcher's uncertainty about the true model structure. Therefore, as long as model complexity is manageable and the theoretical underpinnings are compelling, the researcher should seek to build a bigger model.
Failure to do this may result in wrong inferences and misunderstanding of the nature of the market.

4.8 Limitations and Future Research

One limitation of the current work is that we operationalize reference price as the price of a brand on the last purchase occasion, following the standard approach in previous research. A more general measure, one that includes several past prices, could be used. However, as shown in Chapter 2, such an approach seems to have limited impact on the results, and the last price paid yields the strongest results.

In addition, none of the models fits the holdout period as well as the calibration period, indicating a problem of "overfitting". Chiang et al. (1995) also find that the performance of the Hierarchical Bayes model deteriorates in the holdout period. Thus, while Hierarchical Bayes models may provide a good description of the data on which they are calibrated, they may not have a strong forecasting performance. Our findings show that it may be worth making more effort in validating Hierarchical Bayes models in marketing. Moreover, it may be important to develop techniques that can enhance the predictive validity of Hierarchical Bayes models.[11]

Our empirical analysis showed that the market was described by two distinct segments that follow different decision rules characterized by the sticker shock and reference-dependent models. However, we do not know why each segment uses a different decision rule. A tentative explanation of the difference is that one consumer segment may follow the sticker shock model because its members have sufficient cognitive capacity and motivation to recall the shelf prices observed on the last purchase occasion. In contrast, the other segment may follow the reference-dependent model in order to simplify the decision process and use only one, category-specific, reference price.
We can investigate this hypothesis further by extending our model, i.e., using covariates to explain the variation of model probabilities ($p_h(M = s)$) across households.

[11] An exploratory sensitivity analysis shows that as shrinkage toward the prior increases, the calibration fit deteriorates yet the holdout fit improves, and vice versa.

As shown in our empirical results, on average, half the sample households appear to show loss aversion while the remaining households do not respond to (brand-specific) reference prices. Therefore, it seems that the loss aversion phenomenon is robust to heterogeneity correction, which is consistent with the results from Murthi and Kalyanaram (1997). In contrast, Briesch et al. (1997) find no loss aversion, in any of the four product categories, after accounting for price response heterogeneity. Bell and Lattin (1996) also show a significant attenuation of loss aversion after heterogeneity correction. As discussed in Chapter 2, their explanation of bias (attenuation of loss aversion) is consistent with our explanation of the underlying cause of a spurious sticker shock effect. Thus, in their work, gains and losses mimic the pattern of price heterogeneity of the underlying segments and play the same role as the sticker shock term does. However, we find that loss aversion persists even after continuously accounting for price response heterogeneity. To understand better the underlying mechanism of bias, if any, in the reference-dependent model, it may be worth deriving analytical results and conducting a simulation study of the kind used in Chapter 2.

Analysis of structural uncertainty is certainly not limited to the reference price research area. It could take many other directions, wherever there are competing hypotheses well grounded in theory and supported by data.
For example, one issue in consideration set formation is whether or not one brand can belong to a consideration set without affecting the probability of another brand belonging to the consideration set. "Dependence" among inclusion probabilities means that an increased inclusion probability for one brand necessarily decreases the sum of inclusion probabilities for the remaining brands. However, it can be argued that any brand can belong to a consideration set independently. A further generalization may be that one segment of consumers follows the "dependence" assumption and another segment follows the "independence" assumption, depending on their cognitive capacity or willingness to carry brands in the consideration set. Essentially, this generalization seeks to account for model uncertainty by incorporating competing hypotheses into a single model. We can deal with this by employing the proposed approach.

However, it is not always easy to specify all plausible model structures. Guidance is provided by subject-matter considerations such as accepted theories, expert background knowledge, and examinations of conflicting yet complementary empirical results. All this information essentially constitutes model uncertainty. The methods used in this chapter can then be employed to account for this uncertainty and estimate its occurrence.
Table 4.1: Parameter Estimates and Fit Statistics for HB-ST and HB-RD models

Variables               Posterior Means   95% Probability Interval
HB-ST
  BLOY                   3.602            (3.167, 4.037)
  LBP                    0.281            (0.109, 0.452)
  PRICE                 -1.642            (-1.818, -1.465)
  RPRICE-PRICE           0.030*           (-0.117, 0.178)
  PROMO                  2.429            (2.215, 2.642)
  Calibration LL        -1251.205
  Calibration Hit-rate   76.20%
  Hold-out LL           -571.978
  Hold-out Hit-rate      59.92%
HB-RD
  BLOY                   3.610            (3.112, 4.109)
  LBP                    0.108*           (-0.141, 0.357)
  PGAIN                  1.438            (1.163, 1.712)
  PLOSS                  1.721            (1.520, 1.921)
  PROMO                  2.683            (2.268, 3.099)
  Calibration LL        -1212.397
  Calibration Hit-rate   77.07%
  Hold-out LL           -598.751
  Hold-out Hit-rate      60.51%

*: 0 is included in 95% probability interval.

Table 4.2: Parameter Estimates and Fit Statistics for HB-STRD model

Variables               Posterior Means   95% Probability Interval
Sticker Shock
  BLOY                   3.807            (3.536, 4.077)
  LBP                    0.126            (0.001, 0.251)
  PRICE                 -1.616            (-1.947, -1.286)
  RPRICE-PRICE           0.160*           (-0.004, 0.324)
  PROMO                  2.700            (2.418, 2.982)
  p(M = 1 | •)           0.517            (0.323, 0.711)
Reference-Dependent
  BLOY                   3.730            (3.431, 4.030)
  LBP                   -0.021*           (-0.203, 0.161)
  PGAIN                  1.589            (1.310, 1.868)
  PLOSS                  1.954            (1.673, 2.235)
  PROMO                  2.706            (2.484, 2.928)
  p(M = 2 | •)           0.483            (0.289, 0.677)
Calibration LL          -1171.616
Calibration Hit-rate     78.25%
Hold-out LL             -569.840
Hold-out Hit-rate        60.90%

*: 0 is included in 95% probability interval.

Chapter 5

Conclusions

In this thesis we examined heterogeneity in choice modeling to understand better choice processes and consumer response to the marketing mix.

In the introduction we discussed the kinds of heterogeneity that need to be accounted for in disaggregate choice analysis, i.e., parametric uncertainty, consideration set heterogeneity, and structural uncertainty. In a broader sense, all kinds of heterogeneity correction are reflections of the researcher's uncertainty about the true model structure.
Parametric uncertainty refers to whether or not consumers share common response parameters; consideration set heterogeneity, to whether or not consumers have the same consideration set; and structural uncertainty, to whether or not consumers have the same utility function (or decision rule). Failure to account for this uncertainty may result in wrong inferences and misunderstanding of choice behavior. In each subsequent chapter, we examined the implications of the heterogeneity identified in the introduction and the consequences of ignoring this heterogeneity.

Chapter 2 investigates the impact of unaccounted for price response heterogeneity on estimates of the sticker shock effect. The analysis reveals two conditions under which estimates of the sticker shock effect are biased. First, the estimated model must ignore true differences in price sensitivities among consumers. Second, one or more of the other explanatory variables used in the analysis should systematically vary with the underlying price responsiveness. This research highlights the role that heterogeneity in purchase timing can play in this process. Based on this analysis, several predictions about the nature of parameter bias are made and confirmed via a simulation study. Simulation results also show that, while the existence of extreme price response heterogeneity can, in itself, cause biased estimates of the sticker shock effect, the implied range of price response variation is much larger than observed in practice.

A Hierarchical Bayes version of the nested logit model is proposed and estimated on scanner panel data from two product categories. In contrast to a nested logit model with a sticker shock term, which does not account for price response heterogeneity, the 95% confidence interval of the sticker shock coefficient in the HB-ST model includes the value zero.
For both these data sets, therefore, the results do not support the existence of a brand-specific reference price. Household-level category value and price response parameters obtained from the posterior distribution show that greater responsiveness to category value is associated with higher price sensitivity in the choice decision. Overall, these results provide strong support for our theory of the determinants of sticker-shock bias.

Chapter 3 develops and tests a new "fuzzy" consideration set model that can be estimated with scanner panel data. Unlike previous non-fuzzy (well-defined) set approaches, which model uncertainty about the sets from which a brand is chosen, the proposed fuzzy set approach directly models the probability of a brand being included in the consideration set. The resulting inclusion probabilities for brands reflect a "fuzzy" consideration set, in the sense that a brand belongs to the consideration set only probabilistically, in contrast to the non-fuzzy consideration sets, in which the sets are well-defined but there is uncertainty about the sets from which a brand will be chosen. As compared to an existing fuzzy set model (Bronnenberg and Vanhonacker 1996), in which variables have a compensatory influence on consideration, we model consideration as a non-compensatory process. The empirical application to scanner panel data shows that the proposed fuzzy set model outperforms several previous consideration set models.

We apply our modeling approach to examine the impact of advertising on price sensitivity through its influence on consideration set size. In contrast to the experimental findings of Mitra and Lynch (1995), we find no evidence for a positive relationship between consideration set size and price sensitivity. Their hypothesized effects do not appear to carry over to this particular category. To generalize their findings, further investigation with other data sets may be necessary.
Chapter 4 discusses the general problems of ignoring structural uncertainty (decision rule heterogeneity) and develops a Hierarchical Bayes model of reference price effects, in particular, that accounts for heterogeneity in the way that consumers respond to reference prices in making brand choices. As compared to the comparative analysis of reference price models by Briesch et al. (1997), which assumes that all consumers use the same type of reference price, our approach acknowledges that different consumer segments may have different types of reference price, i.e., brand-specific or category-specific reference prices.

Our empirical application of the proposed model to the ketchup category shows that consumers differ not only in their preference and response but also in their decision rules. On average, half the sample households show loss aversion (have a category-specific reference price) while the remaining households do not respond to brand-specific reference prices. In other words, no sample households seem to use brand-specific reference prices (which is consistent with the results from Chapter 2) while half of them rely on a category-specific reference price. The proposed model provides a richer description of the consumer choice process than the base models, which allow for only one model structure and ignore model uncertainty.

The objective of this thesis was to examine the consequences of ignoring different types of heterogeneity and to develop models that properly account for them to understand better choice processes and consumer response to the marketing mix. Our work points out some of the limitations of previous studies and shows that it will be necessary to build a model that is sufficiently large to take into account various sources of heterogeneity.
While it is generally accepted that scanner panel data analysis should include parametric uncertainty and consideration set heterogeneity, much less attention has been paid to incorporating uncertainty about the model itself. In addition, it is often difficult to pre-specify all plausible model structures. We may then have to resort to subject-matter considerations such as accepted theories, expert background knowledge, and examinations of conflicting yet complementary empirical results. All this information essentially constitutes model uncertainty.

In conclusion, we should seek to build a bigger model as long as model complexity is manageable and the theoretical underpinnings are compelling. Ideally, we should incorporate the kinds of heterogeneity mentioned above in a single model so that it may span the range of plausible hypotheses. Otherwise, we may be overconfident of one hypothesis and accept it without questioning its basic assumptions.

Bibliography

Andrews, Rick L. and Ajay K. Manrai (1995), "A Conceptual and Empirical Comparison of Two-Stage Discrete Choice Models for Single-Source Scanner Data," working paper, Department of Business Administration, University of Delaware.
Andrews, Rick L. and T. C. Srinivasan (1995), "Studying Consideration Effects in Empirical Choice Models Using Scanner Panel Data," Journal of Marketing Research, 32(February), 30-41.
Bell, David R. and James M. Lattin (1996), "Looking for Loss Aversion in Scanner Panel Data: The Confounding Effect of Price-Response Heterogeneity," Research Paper No. 1259, Graduate School of Business, Stanford University.
Bell, David R. and Randolph E. Bucklin (1998), "The Role of Internal Reference Points in the Category Purchase Decision," Working Paper.
Ben-Akiva, Moshe and Steve R. Lerman (1985), Discrete Choice Analysis, Cambridge, MA: MIT Press.
Bettman, James R. (1979), An Information Processing Theory of Consumer Behavior, Reading, MA: Addison Wesley.
Boulding, William, Eunkyu Lee and Richard Staelin (1994), "Mastering the Mix: Do Advertising, Promotion, and Sales Force Activities Lead to Differentiation?," Journal of Marketing Research, 31(May), 159-172.
Briesch, Richard A., Lakshman Krishnamurthi, Tridib Mazumdar, and S. P. Raj (1997), "A Comparative Analysis of Reference Price Models," Journal of Consumer Research, 24, 202-214.
Bronnenberg, Bart J. and Wilfried R. Vanhonacker (1996), "Limited Choice Sets, Local Price Response, and Implied Measures of Price Competition," Journal of Marketing Research, 33(May), 163-173.
Bucklin, Randolph E. and Sunil Gupta (1992), "Brand Choice, Purchase Incidence, and Segmentation: An Integrated Modeling Approach," Journal of Marketing Research, 24 (May), 201-215.
Bucklin, Randolph E., Sunil Gupta and S. Siddarth (1998), "Determining Segmentation in Sales Response Across Consumer Purchase Behaviors," Journal of Marketing Research, 35(May), 189-197.
Carlin, B. P. and S. Chib (1995), "Bayesian Model Choice via Markov Chain Monte Carlo Methods," Journal of the Royal Statistical Society, Series B, 57(3), 473-484.
Carlin, Bradley P. and Thomas A. Louis (1996), Bayes and Empirical Bayes Methods for Data Analysis, Chapman and Hall, London.
Chatfield, Chris (1995), "Model Uncertainty, Data Mining and Statistical Inference," Journal of the Royal Statistical Society, Series A, 58, 419-466.
Chiang, Jeongwen, Siddartha Chib and Chakravarthi Narasimhan (1995), "Estimation of a Heterogeneous Consideration Set Brand Choice Model Using Scanner Panel Data," working paper, Olin School of Business, Washington University.
Chiang, Jeongwen, Siddartha Chib and Chakravarthi Narasimhan (1998), "Markov Chain Monte Carlo Models of Consideration Set and Parameter Heterogeneity," Journal of Econometrics, forthcoming.
Chib, S. and E. Greenberg (1995), "Understanding the Metropolis-Hastings Algorithm," American Statistician, 49, 327-335.
Chintagunta, Pradeep K.
(1994), "Heterogeneous Logit Model Implications for Brand Positioning," Journal of Marketing Research, 31 (May), 304-311.
Comanor, William S. and Thomas A. Wilson (1979), "The Effects of Advertising on Competition: A Survey," Journal of Economic Literature, 17(June), 453-476.
Cooper, Lee G. and Masao Nakanishi (1988), Market Share Analysis: Evaluating Competitive Marketing Effectiveness. New York: Kluwer Academic Publishers.
Dickson, P. R. and A. G. Sawyer (1990), "The Price Knowledge and Search of Supermarket Shoppers," Journal of Marketing, 54 (July), 42-53.
Draper, David (1995), "Assessment and Propagation of Model Uncertainty," Journal of the Royal Statistical Society, Series B, 57(1), 45-97.
Elrod, Terry, Richard D. Johnson, and Joan White (1992), "An Additive Model for Characterizing Conjunctive, Disjunctive, Compensatory, and Hybrid (Two Stage) Evaluation and Choice Behavior," working paper, Faculty of Business, University of Alberta.
Elrod, Terry and Michael P. Keane (1995), "A Factor-Analytic Probit Model for Representing the Market Structure in Panel Data," Journal of Marketing Research, 32 (February), 1-16.
Erdem (1996), "A Dynamic Analysis of Market Structure Based on Panel Data," Marketing Science, 15(4), 359-378.
Fotheringham, A. S. (1988), "Consumer Store Choice and Choice Set Definition," Marketing Science, 7(3), 299-310.
Gatignon, Hubert (1984), "Competition as a Moderator of the Effect of Advertising on Sales," Journal of Marketing Research, 21(November), 387-398.
Gatignon, Hubert and Dominique M. Hanssens (1987), "Modeling Marketing Interactions With Application to Salesforce Effectiveness," Journal of Marketing Research, 24 (August), 247-257.
Gaudry, Marc J. I. and Marcel G. Dagenais (1979), "The Dogit Model," Transportation Research B, 13B(June), 105-111.
Gelfand, A. E. and A. F. M.
Smith (1990), "Sampling Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 398-409.
Gelfand, A. E., S. E. Hills, A. Racine-Poon, and A. F. M. Smith (1990), "Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling," Journal of the American Statistical Association, 90, 972-985.
Geman, S., and D. Geman (1984), "Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
Gelman, A., John B. Carlin, Hal S. Stern and Donald B. Rubin (1995), Bayesian Data Analysis, London: Chapman and Hall.
Gensch, Dennis H. (1987), "A Two Stage Disaggregate Attribute Choice Model," Marketing Science, 14 (3: Part 1), 326-342.
Gensch, Dennis H. and Ehsan S. Soofi (1994), "Information-Theoretic Estimation of Individual Consideration Set," working paper, School of Business Administration, University of Wisconsin-Milwaukee.
Geyer, C. J. (1992), "Practical Markov Chain Monte Carlo," Statistical Science, 7, 473-511.
Gonul, Fusun and Kannan Srinivasan (1993), "Modeling Multiple Sources of Heterogeneity in Multinomial Logit Models: Methodological and Managerial Issues," Marketing Science, 12 (3), 213-229.
Green, Peter J. (1995), "Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination," Biometrika, 82(4), 711-732.
Greenleaf, E. A. (1995), "The Impact of Reference Price Effects on the Profitability of Price Promotions," Marketing Science, 14(1), 82-104.
Guadagni, P. M. and J. D. C. Little (1983), "A Logit Model of Brand Choice Calibrated on Scanner Data," Marketing Science, 2 (3), 203-238.
Gupta, Sunil (1988), "Impact of Sales Promotions on When, What, and How Much to Buy," Journal of Marketing Research, 25 (November), 342-355.
Hardie, B. G. S., E. J. Johnson, and P. S.
Fader (1993), "Modeling Loss Aversion and Reference Dependence Effects on Brand Choice," Marketing Science, 12 (4), 378-394. Hauser, John R. and Birger Wernerfelt (1990), "An Evaluation Cost Model of Consideration Sets," Journal of Consumer Research, 16(March), 393-408. Heckman, James J . (1981), "Heterogeneity and State Dependence," in S. Rosen, (Ed.), Studies in Labor Markets, Chicago: University of Chicago Press, 91-139. Helsen, K . and David Schmittlein (1993), "Analyzing Duration Times in Marketing: Evidence for the Effectiveness of Hazard Rate Models." Marketing Science, 12 (4), 395-414. Helson, H . (1964), Adaptation Level Theory, New York: Harper and Row. Jain, D. C. and N . J . Vilcassim (1991), "Investigating Household Purchase Timing Decisions: A Conditional Hazard Function Approach," Marketing Science, 10 (1), 1-23. Kahneman, Daniel and Amos Tversky (1979), "Prospect Theory: A n Analysis of Decision Under Risk," Econometrica, 47 (March), 263-291. Kalwani, Manohar U . , Chi K i n Y im, Heikki J . Rinne, and Yoshi Sugita (1990), "A Price Expectations Model of Customer Brand Choice," Journal of Marketing Research, 27 (August), 251-262. Kalyanam, Kir thi (1996), "Pricing Decisions Under Demand Uncertainty: A Bayesian Mixture Model Approach," Marketing Science, 15(3), 207-221. Kalyanaram, Gurumurthy and J. D. C. Little (1994), "An Empirical Analysis of Latitude of Price Acceptance in Consumer Package Goods," Journal of Con-sumer Research, 21 (December), 408-418. and Russell S. Winer (1995), "Empirical Generalizations from Reference Price Research," Marketing Science, 14 (3), Part 2 of 2, G161-G169. Kamakura, Wagner A . and Gary J. Russell (1989), "A Probabilistic Choice Model for Market Segmentation and Elasticity Structure," Journal of Marketing Re-search, 26 (November), 379-390. —, Byung-Do K i m and Jonathan Lee (1996), "Modeling Preference and Structural Heterogeneity in Consumer Choice," Marketing Science, 15 (2), 152-172 Kanetkar, Vinay, Charles B. 
Weinberg and Doyle L . Weiss (1992), "Price Sensitivity and Television Advertising Exposures: Some Empirical Findings," Marketing Science, 11(4), 359-371. Bibliography 118 Kass, Robert E . and Duane Steffey (1989), "Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models)," Journal of the American Statistical Association, 84, 717-726. Kopalle, P. K . , Ambar G. Rao and Joao L . Assuncao (1996), "Asymmetric Ref-erence Price Effects and Dynamic Pricing Policies," Marketing Science, 15(1), 60-85. Krishnamurthi, Lakshman and S. P. Raj (1985), "The Effect of Advertising on Consumer Price Sensitivity," Journal of Marketing Research, 22 (May), 119-129. Lattin, J . M . and R.E. Bucklin (1989), "Reference Effects of Price and Promotion on Brand Choice Behavior," Journal of Marketing Research, 26 (August), 299-310. Lavidge, Robert J. and Gary A. Steiner (1961), "A Model for Predictive Measure-ments of Advertising Effectiveness," Journal of Marketing, 25(October), 59-62. Lambin, Jean J . (1976), Advertising, Competition and Market Conduct in Oligopoly Over Time, Amsterdam: North Holland. Lehmann, Donald R. and Yigang Pan (1994), "Context Effects, New Brand Entry and Consideration Sets," Journal of Marketing Research, 31 (August), 364-374. Manchanda, Puneet, Asim Ansari, and Sunil Gupta (1997), "The "Shopping Bas-ket" : A Model for Multi-Category Purchase Incidence Decisions," working pa-per, the Graduate School of Business, Columbia University. Mayhew, G. E . and Russell S. Winer (1992), "An Empirical Analysis of Internal and External Reference Prices using Scanner Data," Journal of Consumer Research, 19 (June), 62-70. McFadden, D. (1974), "Conditional Logit Analysis of Qualitative Choice Behavior," pp. 105-142 in Frontiers in Econometrics, P. Zarembka (ed.), Academic Press: New York. Meyer, Robert J. and Barbara E. Kahn (1991), "Probabilistic Models of Consumer Choice Behavior," in Handbook of Consumer Behavior, T. 
S. Robertson and H . H . Kassarjian, eds., Englewood Cliffs, NJ : Prentice-Hall Inc. 85-123. Mitra, Anusree and John G. Lynch (1995), "Toward a Reconciliation of Market Power and Information Theories of Advertising Effects on Price Elasticity," Journal of Consumer Research, 21(March), 644-659. Murthi, B . P. S. and G. Kalyanaram (1997), "An Empirical Analysis of Asymme-try in Latitude of Price Acceptance," working paper, School of Management, University of Texas at Dallas. Nedungadi, Prakash (1990), "Recall and Consumer Consideration Sets: Influencing Choice Without Altering Brand Evaluations," Journal of Consumer Research, 17(December), 263-276. Bibliography 119 Nelson, Philip (1974), "Advertising as Information," Journal of Political Economy, 81(July-Aug.), 729-754. Papatla, Purushottam (1995), "A Dynamic Model of the Advertising-Price Sen-sitivity Relationship for Heterogeneous Consumers," Journal of Business Re-search, 33, 261-271. Pedrick, James H . and Fred S. Zufryden (1991), "Evaluating the Impact of Adver-tising Media Plans: A Model of Consumer Purchase Dynamics Using Single-Source Data," Marketing Science, lO(Spring), 111-130. Popkowski-Leszczyc, P. T., and Ram C. Rao (1989), "An Empirical Analysis of National and Local Advertising Effects on Price Elasticity," Marketing Letters, 1(2), 149-160. Prasad, V . Kanti and L . Winston Ring (1976), "Measuring Sales Effects of Some Marketing Mix Variables and Their Interactions," Journal of Marketing Re-search, 13 (November), 391-396. Putler, D. (1992), "Incorporating Reference Price Expectations into Theory of Consumer Choice," Marketing Science, 11 (Summer), 287-309. Raftery, A . E . , and S. M . Lewis (1995), "How Many Iterations of the Gibbs Sam-pler," in Bayesian Statistics 4, eds. J . M . Bernardo, J . O. Berger, A . P. Davis and A . F . M . Smith, 641-649, Oxford University Press, Oxford. Rajendran, K . N . and G.J . 
Tellis (1994), "Contextual and Temporal Components of Reference Price," Journal of Marketing, 58 (January), 22-34. Raman, K . and F . M . Bass (1988), "A General Test of Reference Price Theory in the Presence of Threshold Effects," Working Paper, The University of Texas at Dallas. Roberts, John H . and James M . Lattin (1991), "Development and Testing of a Model of Consideration Set Composition," Journal of Marketing Research, 28(November), 429-441. Rossi, Peter E . and Greg M . Allenby (1993), "A Bayesian Approach to Estimating Household Parameters," Journal of Marketing Research, 30(May), 171-182. , Robert E . McCulloch, and Greg M . Allenby (1996), "The Value of Purchase History Data in Target Marketing," Marketing Science, 15(4), 321-340. Shocker, Allan D., Moshe Ben-Akiva, Bruno Boccara, and Prakash Nedungadi (1991), "Consideration Set Influences on Consumer Decision Making and Choice; Issues, Models and Suggestions," Marketing Letters, 2(3), 181-198. Siddarth, S., Randolph E . Bucklin, and Donald G. Morrison (1995), "Making the Cut: Modeling and Analyzing Choice Set Restriction in Scanner Panel Data," Journal of Marketing Research,32(August), 255-266 Bibliography 120 Swait, Joffre and Moshe Ben-Akiva (1987), "Incorporating Random Constraints in Discrete Models of Choice Set Generation," Transportation Research B, 21B (2), 91-102. Tanner, T. and W. Wong (1987), "The Calculation of Posterior Distributions by Data Augmentation," Journal of the American Statistical Association, 82, 528-549. Tellis, Gerard J. (1988a), "The Price Elasticity of Selective Demand: A Meta-Analysis of Econometric Models of Sales," Journal of Marketing Research, 25(November), 331-341. (1988b), "Advertising Exposure, Loyalty and Brand Purchase: A Two-Stage Model of Choice," Journal of Marketing Research, 25(May), 134-144. Tierney, L . (1994), "Markov Chains for Exploring Posterior Distributions," Annals of Statistics, 22, 1701-62. 
Tversky, Amos (1972), "Elimination By Aspects: A Theory of Choice," Psychological Review, 79, 281-299.
Tversky, Amos and Daniel Kahneman (1991), "Loss Aversion and Riskless Choice: A Reference-Dependent Model," The Quarterly Journal of Economics, 106 (November), 1039-61.
Winer, R. S. (1986), "A Reference Price Model of Brand Choice for Frequently Purchased Products," Journal of Consumer Research, 13 (September), 250-256.
Wittink, D. Richard (1977), "Advertising and Competition," Journal of Marketing Research, 10 (November), 428-441.
Yatchew, A., and Z. Griliches (1984), "Specification Error in Probit Models," Review of Economics and Statistics, 66, 134-139.

Appendix A

Derivation of the Conditions Underlying Biased Sticker Shock Effects

Glossary of Symbols

Symbol                     Meaning
$\alpha_1$                 a vector of intercepts in the true model
$\beta_1$                  a vector of intercepts in the misspecified model
$\beta_c$                  a common price parameter in the misspecified model
$\beta_{hi}$               a price parameter for the high sensitive segment in the true model
$\beta_d + \beta_{hi}$     a price parameter for the low sensitive segment in the true model
$\beta_{rp}$               a sticker shock parameter in the misspecified model

A.1 True model specification

We begin by writing the utility specification of the true model (Equation (2.3)), for a market with n brands, in a general form as

$$ y = X\beta + u = (x_1 : x_2 : x_3 : x_4) \begin{pmatrix} \alpha_1 \\ \beta_{hi} \\ 0 \\ \beta_d \end{pmatrix} + u, \tag{A.1} $$

with the usual assumptions, $E(u) = 0$, $E(uu') = \sigma^2 I$ and $E(X'u) = 0$. The design matrix for one purchase occasion can be written as:

$$ X_t = (x_{1t} : x_{2t} : x_{3t} : x_{4t}) = \begin{pmatrix} & P_{1t} & (RP^h_{1t} - P_{1t}) & SEG^h \cdot P_{1t} \\ I_t & \vdots & \vdots & \vdots \\ & P_{nt} & (RP^h_{nt} - P_{nt}) & SEG^h \cdot P_{nt} \end{pmatrix}, \tag{A.2} $$

where $I_t$ represents a matrix of the dummy variables of the n brands¹, $P_{it}$ is the price of brand i on occasion t, $RP^h_{it}$ is the reference price of brand i and $SEG^h$ is a dummy variable that takes the value 0 if household h is price-sensitive, and is 1 otherwise.
Also, as discussed in the paper, $\alpha_1$ is a vector of intercepts, $\beta_d$ is assumed to be positive and, together with $\beta_{hi}$, captures the price response of the insensitive segment. The above formulation thus represents the underlying true model consisting of two segments, each with distinctive price sensitivities, but unaffected by reference price since the parameter for the sticker shock term, $x_3$, is set to zero.

¹ Unlike in the logit model, all n brand-specific intercepts are included in the model.

A.2 Misspecified model

The misspecified model omits $x_4$, i.e., ignores differences in price sensitivity, but includes a sticker shock term, $x_3$, and its associated parameter $\beta_{rp}$. The analyst, therefore, estimates the following model:

$$ y = (x_1 : x_2 : x_3) \begin{pmatrix} \beta_1 \\ \beta_c \\ \beta_{rp} \end{pmatrix} + e, \tag{A.3} $$

where $\beta_1$ is the vector of parameters corresponding to the brand-specific constants and $\beta_c$ is the common price coefficient. Let $X_e = (x_1 : x_2 : x_3)$ denote the matrix of variables used in the estimated model. The estimated parameter vector in an OLS framework is given by

$$ \begin{pmatrix} \hat\beta_1 \\ \hat\beta_c \\ \hat\beta_{rp} \end{pmatrix} = (X_e'X_e)^{-1} X_e' y. \tag{A.4} $$

Substituting for y from Equation (A.1) gives

$$
\begin{aligned}
\begin{pmatrix} \hat\beta_1 \\ \hat\beta_c \\ \hat\beta_{rp} \end{pmatrix}
&= (X_e'X_e)^{-1} X_e' X\beta + (X_e'X_e)^{-1} X_e' u \\
&= (X_e'X_e)^{-1} X_e' (X_e : x_4) \begin{pmatrix} \alpha_1 \\ \beta_{hi} \\ 0 \\ \beta_d \end{pmatrix} + (X_e'X_e)^{-1} X_e' u \\
&= (X_e'X_e)^{-1} X_e' X_e \begin{pmatrix} \alpha_1 \\ \beta_{hi} \\ 0 \end{pmatrix} + (X_e'X_e)^{-1} X_e' x_4 \beta_d + (X_e'X_e)^{-1} X_e' u \\
&= \begin{pmatrix} \alpha_1 \\ \beta_{hi} \\ 0 \end{pmatrix} + (X_e'X_e)^{-1} X_e' x_4 \beta_d + (X_e'X_e)^{-1} X_e' u.
\end{aligned}
\tag{A.5}
$$

The expectations of the estimated parameters can be written as:

$$ E \begin{pmatrix} \hat\beta_1 \\ \hat\beta_c \\ \hat\beta_{rp} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \beta_{hi} \\ 0 \end{pmatrix} + (X_e'X_e)^{-1} X_e' x_4 \beta_d. \tag{A.6} $$

The above equation reveals two sources for the bias in $\hat\beta_{rp}$. The first is unaccounted for price response heterogeneity, i.e., a positive value for $\beta_d$. The second is the term $(X_e'X_e)^{-1} X_e' x_4$, which can be interpreted as the parameter estimates obtained by regressing $x_4$ on $X_e$.
If we stack the observations, so that those from the low price-sensitive segment follow those from the high price-sensitive segment, the regression equation can be written as:

$$ x_4 = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ P_1 \\ P_2 \\ \vdots \end{pmatrix} = x_1\theta_1 + x_2\theta_2 + x_3\theta_3 + u, \tag{A.7} $$

where $P_1, P_2, \ldots$ are the prices of the brands on any purchase occasion. Note that the elements of $x_4$ are a string of zeros in the first rows and a string of positive numbers in the last rows.

Under what conditions will the term $(X_e'X_e)^{-1} X_e' x_4$ lead to significance? From the above equation it is clear that this can happen if any of the explanatory variables $x_1$, $x_2$, or $x_3$ mimic the pattern of the dependent variable $x_4$. The variable $x_3$ (the sticker shock term) is of particular interest to us. Since $x_3 = RP - x_2$, we can rewrite the regression equation as follows:

$$ x_4 = x_1\theta_1 + x_2(\theta_2 - \theta_3) + RP\,\theta_3 + u. \tag{A.8} $$

We see that if the first rows of RP are lower than the last rows in RP then this can cause $\theta_3$ to be positive and significant, and thus result in a biased estimate of $\beta_{rp}$. As we argue in the paper, heterogeneity in purchase timing behavior and its relationship to the underlying price sensitivity introduces precisely this pattern in RP, i.e., collectively lower reference prices in the price-sensitive segment than in the price-insensitive segment.

In summary, our derivation establishes two conditions for a spurious sticker shock coefficient. First, there should be unaccounted for price response heterogeneity. In addition, the explanatory variables in $X_e$ should not be orthogonal to $x_4$, i.e., $X_e' x_4 \neq 0$. The magnitude of bias in the sticker shock coefficient is given by the expression:

$$ E \begin{pmatrix} \hat\beta_1 \\ \hat\beta_c \\ \hat\beta_{rp} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \beta_{hi} \\ 0 \end{pmatrix} + \begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix} \beta_d. \tag{A.9} $$

Although not the focus of the present paper, misspecification can also lead to biased estimates of $\beta_1$ and $\beta_c$.
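The bias expression in (A.6) and (A.9) can be checked numerically. The following is a minimal sketch, not the simulation study reported in the thesis: the two-brand layout, the price and reference-price distributions, and all parameter values are assumptions chosen only for illustration. The error term is set to zero so that the OLS estimate of the misspecified model equals its expectation exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 400  # stacked brand-occasion rows (illustrative size)

# x1: brand dummies for a hypothetical two-brand market
brand = np.arange(n_obs) % 2
x1 = np.column_stack([brand == 0, brand == 1]).astype(float)

# SEG = 0 for price-sensitive rows (stacked first), 1 for insensitive rows
seg = (np.arange(n_obs) >= n_obs // 2).astype(float)

price = 2.0 + 0.3 * rng.standard_normal(n_obs)  # x2
# Assumed pattern: lower reference prices in the price-sensitive segment
ref_price = 1.5 + 1.0 * seg + 0.1 * rng.standard_normal(n_obs)
x3 = ref_price - price  # sticker shock term
x4 = seg * price        # omitted heterogeneity term

X_e = np.column_stack([x1, price, x3])
coef_true = np.array([0.5, -0.5, -2.0, 0.0])  # (alpha_1, beta_hi, 0)
beta_d = 1.0

y = X_e @ coef_true + x4 * beta_d                # true model (A.1) with u = 0
ols = np.linalg.lstsq(X_e, y, rcond=None)[0]     # misspecified OLS fit, (A.4)
theta = np.linalg.lstsq(X_e, x4, rcond=None)[0]  # regression of x4 on X_e, (A.7)
predicted = coef_true + theta * beta_d           # right-hand side of (A.9)

print("estimated sticker shock coefficient:", ols[-1])
print("value implied by (A.9):             ", predicted[-1])
```

In this sketch the true sticker shock coefficient is zero, so any non-zero estimate is due entirely to the omitted term $x_4$ and the reference-price pattern built into the data.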
For a more general specification that includes more than 2 segments, the magnitude of the bias is given by a straightforward extension of Equation (A.6) as follows:

$$ E \begin{pmatrix} \hat\beta_1 \\ \hat\beta_c \\ \hat\beta_{rp} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \beta_{hi} \\ 0 \end{pmatrix} + (X_e'X_e)^{-1} X_e' x_4 \beta_{d1} + (X_e'X_e)^{-1} X_e' x_5 \beta_{d2} + \cdots, \tag{A.10} $$

where $x_4$, $x_5$, and so on are dummy variables corresponding to each price-sensitive segment.

Appendix B

Priors and the Markov Chain Monte Carlo Simulations

B.1 Priors

We specify a multivariate normal prior over household parameters, $\theta^h$, as follows:

$$ \theta^h \sim MVN(\mu, \Sigma). \tag{B.1} $$

$\mu$ and $\Sigma$ are to be estimated. Following Gelfand et al. (1990), we put hyper-priors over the population parameters, $\mu$ and $\Sigma$, as follows:

$$ \mu \sim MVN(\eta, C), \tag{B.2} $$

$$ \Sigma^{-1} \sim W((\rho R)^{-1}, \rho), \tag{B.3} $$

where $\eta = 0$, $C^{-1} = 0$, $R = I$, and $\rho$ is the degrees of freedom of the Wishart distribution, fixed at 16 (the number of parameters plus one). All these parameters are pre-specified to represent a diffuse (non-informative) but proper distribution.

B.2 Full Conditionals and Simulations

B.2.1 Generating $\theta^{h(m+1)}$

Given the simulated $\mu^{(m)}$ and $\Sigma^{-1(m)}$, we sample household-specific $\theta^{h(m+1)}$ from the following full conditional distribution:

$$ p(\theta^{h(m+1)} \mid \mu^{(m)}, \Sigma^{(m)}) \propto L(\theta^{h(m+1)})\, \phi(\theta^{h(m+1)} \mid \mu^{(m)}, \Sigma^{(m)}), \tag{B.4} $$

where

$$ L(\theta^{h(m+1)}) = \prod_{t=1}^{T_h} P_t^h(inc)^{\delta_t^h} \left(1 - P_t^h(inc)\right)^{1 - \delta_t^h} \prod_i \left[ P_t^h(i \mid inc) \right]^{\delta_{it}^h}, $$

$\delta_{it}^h = 1$ if household h bought brand i on a store visit at time t, and zero otherwise, $\delta_t^h = \sum_i \delta_{it}^h$ and $\phi(\cdot \mid \cdot)$ is a multivariate normal density. This posterior distribution is known only up to a normalizing constant since it is a product of a normal prior and logit likelihood. Therefore, we adopt the random walk version of the Metropolis-Hastings algorithm (Tierney 1994, Chib and Greenberg 1995, Green 1995). In this version, given the previous draw $\theta^{h(m)}$ in the chain, a candidate draw $\theta^{h(c)}$ is simulated as:

$$ \theta^{h(c)} = \theta^{h(m)} + u, $$

where $u \sim MVN(0, V)$ and V is a pre-specified matrix. (In our application, following Chiang et al. (1998), we let V be twice the negative inverse of the observed information matrix of the NL-ST likelihood.) Then $\theta^{h(c)}$ is accepted as $\theta^{h(m+1)}$ in the chain with the following probability:

$$ \min \left[ \frac{L(\theta^{h(c)})\, \phi(\theta^{h(c)} \mid \mu^{(m)}, \Sigma^{(m)})}{L(\theta^{h(m)})\, \phi(\theta^{h(m)} \mid \mu^{(m)}, \Sigma^{(m)})},\; 1 \right]. $$

Therefore, we do not need to know the normalizing constant in simulating draws. If the candidate draw is rejected, we set $\theta^{h(m+1)} = \theta^{h(m)}$.

B.2.2 Generating $\mu^{(m+1)}$

Given the simulated $\{\theta^{h(m)}\}$ across households and $\Sigma^{-1(m)}$, $\mu^{(m+1)}$ is drawn from the following full conditional distribution (a posterior multivariate normal):

$$ p(\mu^{(m+1)} \mid \{\theta^{h(m)}\}, \Sigma^{-1(m)}, \eta, C) = MVN\left[ D\left(H \Sigma^{-1(m)} \bar\theta + C^{-1}\eta\right),\; D \right], \tag{B.5} $$

where $D = (H\Sigma^{-1(m)} + C^{-1})^{-1}$, $\bar\theta = \frac{1}{H} \sum_{h=1}^{H} \theta^{h(m)}$ and H is the total number of households.

B.2.3 Generating $\Sigma^{-1(m+1)}$

Given the simulated $\{\theta^{h(m)}\}$ across households and $\mu^{(m)}$, we simulate $\Sigma^{-1(m+1)}$ from the following full conditional distribution (a posterior Wishart):

$$ p(\Sigma^{-1(m+1)} \mid \{\theta^{h(m)}\}, \mu^{(m)}, R, \rho) = W\left( \left[ \sum_{h=1}^{H} (\theta^{h(m)} - \mu^{(m)})(\theta^{h(m)} - \mu^{(m)})' + \rho R \right]^{-1},\; H + \rho \right). \tag{B.6} $$

Appendix C

Priors and the Markov Chain Monte Carlo Simulations

C.1 Priors and Pseudopriors

C.1.1 Priors on Model Probabilities

$$ \lambda^h \sim D(\alpha_1, \alpha_2, \alpha_3). \tag{C.1} $$

We use fairly diffuse priors (i.e., a priori, we do not favor any model) by setting $\alpha_s = \frac{1}{3}$. This is equivalent to a prior sample size of 1.

C.1.2 Priors on $\beta_s^h \mid M = s$

We specify independent normal priors on $\beta_s^h$, for s = 1, 2, 3, as follows:

$$ \beta_s^h \mid M = s \sim MVN(\mu_s, \Sigma_s). \tag{C.2} $$

$\mu_s$ and $\Sigma_s$ are to be estimated. Following Gelfand et al. (1990), we put hyper-priors over the population parameters, $\mu_s$ and $\Sigma_s$, as follows:

$$ \mu_s \sim MVN(\eta_s, C_s), \tag{C.3} $$

$$ \Sigma_s^{-1} \sim W((\rho_s R_s)^{-1}, \rho_s), \tag{C.4} $$

where $\eta_s = 0$, $C_s^{-1} = 0$, $R_s = I$, and $\rho_s$ is the degrees of freedom of the Wishart distribution, equal to the dimension of the parameter vector plus one.¹ All these parameter values are pre-specified to represent a diffuse (non-informative) but proper distribution.
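Because a normal prior of this kind is combined with a logit likelihood, the household-level full conditionals have no closed form, and the appendices rely on a random-walk Metropolis-Hastings update (B.2.1, and again in C.2.3). A minimal sketch of one such update follows. It works on the log scale, which is an implementation choice rather than something stated in the thesis, and the binary logit likelihood, the simulated data and the step matrix are hypothetical stand-ins for the thesis's nested logit likelihood and for the matrix V.

```python
import numpy as np

def log_normal_kernel(theta, mu, sigma_inv):
    """Log of the MVN density phi(theta | mu, Sigma), up to an additive constant."""
    d = theta - mu
    return -0.5 * d @ sigma_inv @ d

def rw_metropolis_step(theta, log_lik, mu, sigma_inv, chol_V, rng):
    """One random-walk Metropolis-Hastings update; returns (draw, accepted)."""
    # Candidate = current draw + u, with u ~ MVN(0, V) and V = chol_V @ chol_V.T
    candidate = theta + chol_V @ rng.standard_normal(theta.size)
    log_ratio = (log_lik(candidate) + log_normal_kernel(candidate, mu, sigma_inv)
                 - log_lik(theta) - log_normal_kernel(theta, mu, sigma_inv))
    # Accept with probability min(ratio, 1); normalizing constants cancel
    accept = np.log(rng.uniform()) < min(log_ratio, 0.0)
    return (candidate, True) if accept else (theta, False)

# Hypothetical stand-in likelihood: a binary logit on simulated data
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
prob = 1.0 / (1.0 + np.exp(-X @ np.array([1.0, -1.0])))
y = (rng.uniform(size=30) < prob).astype(float)

def toy_log_lik(theta):
    xb = X @ theta
    return float(np.sum(y * xb - np.logaddexp(0.0, xb)))

mu, sigma_inv = np.zeros(2), np.eye(2)  # assumed current population draws
chol_V = 0.3 * np.eye(2)                # assumed step-size matrix

theta, n_accept, draws = np.zeros(2), 0, []
for _ in range(500):
    theta, accepted = rw_metropolis_step(theta, toy_log_lik, mu, sigma_inv, chol_V, rng)
    n_accept += accepted
    draws.append(theta)
draws = np.array(draws)
print("acceptance rate:", n_accept / 500)
```

In a full sampler this step would be applied to each household in turn, with `mu` and `sigma_inv` replaced by the current population-level draws.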
¹ The dimension of $\mu_3$ is twice the dimension of $\mu_1$ and $\mu_2$ (since variables and intercepts used in both Structure 1 and 2 are required to represent Structure 3), plus one for the mixing parameter $\gamma^h$.

C.1.3 Pseudopriors on $\beta_s^h \mid M \neq s$

If $M \neq s$ (i.e., if the simulated model indicator points to a structure other than Structure s), then the likelihood for $\beta_s^h$ is undefined in our application and posterior updating cannot be implemented. To circumvent this problem and to maintain the Markov chain, we need to specify the "pseudopriors" or linking densities from which we can generate $\beta_s^h$ (see Carlin and Chib 1995 for implementation details).

Following Carlin and Chib (1995), as the linking densities, we use estimates of the model-specific posterior distributions $p(\beta_s^h \mid M = s, y^h)$ that can be obtained from individual MCMC runs in which the models with one structure (without the structural uncertainty component) are estimated. Matching the linking densities as nearly as possible to the true model-specific posteriors can generate a reasonably well mixing final algorithm, since failure to specify reasonable densities will result in slow convergence or make jumps between $M = s$ and $M \neq s$ extremely unlikely in subsequent iterations. It should be noted that we are not using the data to help to select the prior, but only the pseudoprior.

C.2 Full Conditionals and Simulations

C.2.1 Generating Model Indicators

Draw M from a multinomial distribution with $z^h$, the posterior probability vector of household h. The (m+1)th $z^h$ is updated from $\lambda^{h(m)}$ in the following fashion:

$$ z_s^{h(m+1)} = \frac{\lambda_s^{h(m)} f(y^h \mid \beta_s^h, M = s) \prod_{l} p(\beta_l^h \mid M = s)}{\sum_{k} \left[ \lambda_k^{h(m)} f(y^h \mid \beta_k^h, M = k) \prod_{l} p(\beta_l^h \mid M = k) \right]}, \tag{C.5} $$

where $f(y^h \mid \beta_s^h, M = s)$ is the likelihood for household h, and $p(\beta_l^h \mid M = k)$ denotes the prior when $l = k$ and the pseudoprior of C.1.3 otherwise.

C.2.2 Generating $\lambda^{h(m+1)}$

Generate a $\lambda^h$ draw from the posterior Dirichlet distribution, $D(\alpha_1, \ldots, \alpha_s + T^h, \ldots, \alpha_3)$, where $M = s$ and $T^h$ is the total number of purchases made by household h.

C.2.3 Generating $\beta_s^{h(m+1)}$

If $M = s$, given the simulated $\mu_s^{(m)}$ and $\Sigma_s^{-1(m)}$, we sample household-specific $\beta_s^{h(m+1)}$ from the following full conditional distribution:

$$ p(\beta_s^{h(m+1)} \mid \mu_s^{(m)}, \Sigma_s^{(m)}, M = s) \propto f(y^h \mid \beta_s^{h(m+1)}, M = s)\, \phi(\beta_s^{h(m+1)} \mid \mu_s^{(m)}, \Sigma_s^{(m)}), \tag{C.6} $$

where $\phi(\cdot \mid \cdot)$ is a multivariate normal density. This posterior distribution is known only up to a normalizing constant since it is a product of a normal prior and logit likelihood. Therefore, we adopt the random walk version of the Metropolis-Hastings algorithm (Tierney 1994, Chib and Greenberg 1995, Green 1995). In this version, given the previous draw $\beta_s^{h(m)}$ in the chain, a candidate draw $\beta_s^{h(c)}$ is simulated as:

$$ \beta_s^{h(c)} = \beta_s^{h(m)} + u_s, $$

where $u_s \sim MVN(0, V_s)$ and $V_s$ is a pre-specified matrix.² Then $\beta_s^{h(c)}$ is accepted as $\beta_s^{h(m+1)}$ in the chain with the following probability:

$$ \min \left[ \frac{f(y^h \mid \beta_s^{h(c)}, M = s)\, \phi(\beta_s^{h(c)} \mid \mu_s^{(m)}, \Sigma_s^{(m)})}{f(y^h \mid \beta_s^{h(m)}, M = s)\, \phi(\beta_s^{h(m)} \mid \mu_s^{(m)}, \Sigma_s^{(m)})},\; 1 \right]. $$

Therefore, we do not need to know the normalizing constant in simulating draws. If the candidate draw is rejected, we set $\beta_s^{h(m+1)} = \beta_s^{h(m)}$.

² In our application, following Chiang et al. (1998), we let $V_s$ be twice the negative inverse of the observed information matrix estimated on the single-segment multinomial logit models of sticker shock (M = 1), reference-dependent formulations (M = 2), and mixing rules (M = 3), respectively, using maximum likelihood estimation.

If $M \neq s$, we generate $\beta_s^h$ draws from the pseudoprior, $p(\beta_s^h \mid M \neq s)$, discussed in C.1.3. Therefore, when $M = s$ we generate from the usual full conditional; when $M \neq s$ we generate from a "pseudoprior", the linking density (Carlin and Chib 1995).

C.2.4 Generating $\mu_s^{(m+1)}$

Given the simulated $\{\beta_s^{h(m)}\}$ across households and $\Sigma_s^{-1(m)}$, $\mu_s^{(m+1)}$ is drawn from the following full conditional distribution (a posterior multivariate normal).
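$$ p(\mu_s^{(m+1)} \mid \{\beta_s^{h(m)}\}, \Sigma_s^{-1(m)}, \eta_s, C_s) = MVN\left[ D\left(H \Sigma_s^{-1(m)} \bar\beta_s + C_s^{-1}\eta_s\right),\; D \right], \tag{C.7} $$

where $D = (H\Sigma_s^{-1(m)} + C_s^{-1})^{-1}$, $\bar\beta_s = \frac{1}{H} \sum_{h=1}^{H} \beta_s^{h(m)}$ and H is the total number of households.

C.2.5 Generating $\Sigma_s^{-1(m+1)}$

Given the simulated $\{\beta_s^{h(m)}\}$ across households and $\mu_s^{(m)}$, we simulate $\Sigma_s^{-1(m+1)}$ from the following full conditional distribution (a posterior Wishart):

$$ p(\Sigma_s^{-1(m+1)} \mid \{\beta_s^{h(m)}\}, \mu_s^{(m)}, R_s, \rho_s) = W\left( \left[ \sum_{h=1}^{H} (\beta_s^{h(m)} - \mu_s^{(m)})(\beta_s^{h(m)} - \mu_s^{(m)})' + \rho_s R_s \right]^{-1},\; H + \rho_s \right). \tag{C.8} $$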
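The two population-level updates, (C.7) and (C.8) (and their Appendix B counterparts (B.5) and (B.6)), are conjugate draws and can be sketched with NumPy alone. This is an illustrative sketch rather than the thesis's code: the household parameter matrix is simulated, the function names are hypothetical, and the Wishart draw is built as a sum of outer products of normal vectors, which is valid only because the degrees of freedom $H + \rho_s$ are an integer.

```python
import numpy as np

def draw_mu(betas, sigma_inv, eta, C_inv, rng):
    """Posterior MVN draw for the population mean, as in (C.7)."""
    H = betas.shape[0]
    D = np.linalg.inv(H * sigma_inv + C_inv)
    D = 0.5 * (D + D.T)  # symmetrize against rounding error
    mean = D @ (H * sigma_inv @ betas.mean(axis=0) + C_inv @ eta)
    return rng.multivariate_normal(mean, D)

def draw_sigma_inv(betas, mu, R, rho, rng):
    """Posterior Wishart draw for the population precision, as in (C.8)."""
    H, k = betas.shape
    dev = betas - mu
    scale = np.linalg.inv(dev.T @ dev + rho * R)
    scale = 0.5 * (scale + scale.T)
    L = np.linalg.cholesky(scale)
    # Each row of Z is N(0, scale); Z'Z is Wishart(scale, H + rho) for integer df
    Z = rng.standard_normal((H + rho, k)) @ L.T
    return Z.T @ Z

rng = np.random.default_rng(2)
k, H = 3, 50
betas = rng.standard_normal((H, k))  # stand-in for household-level draws

# Hyper-parameter settings stated in C.1.2: eta = 0, C^{-1} = 0, R = I, rho = k + 1
eta, C_inv, R, rho = np.zeros(k), np.zeros((k, k)), np.eye(k), k + 1

sigma_inv = np.eye(k)
for _ in range(100):
    mu = draw_mu(betas, sigma_inv, eta, C_inv, rng)
    sigma_inv = draw_sigma_inv(betas, mu, R, rho, rng)

print("last draw of mu:", mu)
```

With $C_s^{-1} = 0$ the mean of the normal draw reduces to the average of the household parameters and its covariance to $\Sigma_s / H$, which is a quick sanity check on any implementation.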
Item Metadata
Title | Essays on heterogeneity in choice modeling |
Creator |
Chang, Kwangpil |
Date Issued | 1998 |
Description | This thesis includes three essays which examine the implications of incorporating parameter heterogeneity, consideration set heterogeneity, and decision rule heterogeneity, respectively, in brand choice models. In the first essay, we identify the conditions under which unaccounted for price response heterogeneity results in a spurious sticker shock effect. We show, using an analytical derivation, a simulation study and an empirical application to scanner panel data, that estimates of the sticker shock effect may be biased if households that are price sensitive in their brand choice decision are also more likely to respond to category marketing activity in their purchase timing decision. The empirical results, from two product categories, show that the sticker shock coefficient from a Hierarchical Bayes model (which continuously accounts for price response heterogeneity) is statistically insignificant, providing no evidence of the existence of a sticker shock effect. In contrast, the corresponding coefficient from the standard model, which ignores this heterogeneity, is highly significant and supports the existence of a sticker shock effect. A posterior analysis of household parameters confirms the hypothesized relationship between price sensitivity in brand choice and responsiveness to promotional activity in purchase incidence, and is consistent with our explanation of the underlying cause of the bias in the standard model. The second essay develops a new consideration set model that can be estimated with scanner panel data. In contrast to many previous approaches, which require enumeration of all possible consideration sets, we directly model uncertainty about including a brand in the consideration set. The resulting inclusion probabilities for brands reflect a "fuzzy" consideration set in the sense that a brand belongs to the consideration set only probabilistically. 
The proposed fuzzy set model outperforms several previous consideration set models in two product categories (yogurt and ketchup). We then apply the fuzzy set approach to examine the role of the consideration set in moderating the impact of advertising on price sensitivity. In contrast to the experimental findings of Mitra and Lynch (1995), we find no positive relationship between consideration set size and price sensitivity. Further empirical test may be necessary to confirm the hypothesized relationship. In the third essay, we investigate the role of decision rule heterogeneity in brand choice behavior. We develop a flexible model, which allows for the uncertainty in decision rules used by the consumer. Specifically, we develop a Hierarchical Bayes model of reference price effects that accommodates both the sticker shock and reference-dependent formulations. In addition, we also incorporate the possibility that consumers may mix the two decision rules probabilistically. Therefore, the proposed model allows for three different decision hierarchies which incorporate sticker shock, reference-dependent and mixed rules respectively. The empirical results show that consumers differ not only in their preference and response but also in their decision rules. On average, half the sample households appear to show loss aversion, i.e., follow a reference-dependent decision rule, while the remaining households do not seem to respond to reference prices. The proposed model provides a richer description of consumer choice processes than the comparison models that allow for only one model structure and ignore model uncertainty. |
Extent | 6078336 bytes |
Subject |
Brand choice -- Mathematical models Consumer behavior -- Mathematical models |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-06-19 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0089128 |
URI | http://hdl.handle.net/2429/9492 |
Degree |
Doctor of Philosophy - PhD |
Program |
Business Administration |
Affiliation |
Business, Sauder School of |
Degree Grantor | University of British Columbia |
GraduationDate | 1998-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |