@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix dc: . @prefix skos: . vivo:departmentOrSchool "Business, Sauder School of"@en, "Finance, Division of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Smith, Daniel Robert"@en ; dcterms:issued "2009-10-05T18:42:57Z"@en, "2002"@en ; vivo:relatedDegree "Doctor of Philosophy - PhD"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description """This thesis consists of two essays which contribute to different but related aspects of the empirical asset pricing literature. The common theme is that incorrect restrictions can lead to inaccurate decisions. The first essay demonstrates that failure to account for the Federal Reserve experiment can lead to incorrect assumptions about the explosiveness of short-term interest rate volatility, while the second essay demonstrates that we need to incorporate skewness to develop models that adequately account for the cross-section of equity returns. Essay 1 empirically compares the Markov-switching and stochastic volatility diffusion models of the short rate. The evidence supports the Markov-switching diffusion model. Estimates of the elasticity of volatility parameter for single-regime models unanimously indicate an explosive volatility process, whereas the Markov-switching models estimates are reasonable. We find that either Markov-switching or stochastic volatility, but not both, is needed to adequately fit the data. A robust conclusion is that volatility depends on the level of the short rate. Finally, the Markov-switching model is the best for forecasting. A technical contribution of this paper is a presentation of quasi-maximum likelihood estimation techniques for the Markov-switching stochastic-volatility model. Essay 2 proposes a new approach to estimating and testing nonlinear pricing models using GMM. 
The methodology extends the GMM based conditional mean-variance asset pricing tests of Harvey (1989) and He et al (1996) to include preferences over moments higher than variance. In particular we explore the empirical usefulness of the conditional coskewness of an assets return with the market return in explaining the cross-section of equity returns. The methodology is both flexible and parsimonious. We avoid modelling any asset specific parameters and avoid making restrictive assumptions on the dynamics of co-moments. By using GMM to estimate the models' parameters we also avoid making any assumptions about the distribution of the data. The empirical results indicate that coskewness is useful in explaining the cross-section of equity returns, and that both covariance and coskewness are time varying. We also find that the usefulness of coskewness is robust to the inclusion of Fama and French's (1993) SMB and HML factor returns. There is an interesting debate raging in the empirical asset pricing literature comparing the SDF versus beta methodologies. This paper's technique is a conditional version of the beta methodology, which turns out to be directly comparable with the SDF methodology with only minor modifications. Our SDF version imposes the CAPM's restrictions that the coefficients in the pricing kernel are known functions of the moments of market returns, which are modelled using macro-variables. We find that the SDF implied by the three-moment CAPM provides a better fit in this data set than current practice of parameterizing the coefficients on market returns in the SDF. This has an interesting application to the current SDF versus beta methodology debate."""@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/13589?expand=metadata"@en ; dcterms:extent "6075504 bytes"@en ; dc:format "application/pdf"@en ; skos:note "Essays in Empirical Asset Pricing by Daniel Robert Smith B. Bus. (Hons), Queensland University of Technology, 1996 M . 
Bus., Queensland University of Technology, 1999
A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy in The Faculty of Graduate Studies, Finance Division, Faculty of Commerce and Business Administration
We accept this thesis as conforming to the required standard
The University of British Columbia
July 2002
© Daniel Robert Smith, 2002
In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.
Department of Finance, Faculty of Commerce and Business Administration, The University of British Columbia, Vancouver, Canada
Abstract
This thesis consists of two essays which contribute to different but related aspects of the empirical asset pricing literature. The common theme is that incorrect restrictions can lead to inaccurate decisions. The first essay demonstrates that failure to account for the Federal Reserve experiment can lead to incorrect assumptions about the explosiveness of short-term interest rate volatility, while the second essay demonstrates that we need to incorporate skewness to develop models that adequately account for the cross-section of equity returns.
Essay 1 empirically compares the Markov-switching and stochastic volatility diffusion models of the short rate. The evidence supports the Markov-switching diffusion model. Estimates of the elasticity of volatility parameter for single-regime models unanimously indicate an explosive volatility process, whereas the Markov-switching models' estimates are reasonable.
We find that either Markov-switching or stochastic volatility, but not both, is needed to adequately fit the data. A robust conclusion is that volatility depends on the level of the short rate. Finally, the Markov-switching model is the best for forecasting. A technical contribution of this paper is a presentation of quasi-maximum likelihood estimation techniques for the Markov-switching stochastic-volatility model.
Essay 2 proposes a new approach to estimating and testing nonlinear pricing models using GMM. The methodology extends the GMM-based conditional mean-variance asset pricing tests of Harvey (1989) and He et al. (1996) to include preferences over moments higher than variance. In particular, we explore the empirical usefulness of the conditional coskewness of an asset's return with the market return in explaining the cross-section of equity returns. The methodology is both flexible and parsimonious. We avoid modelling any asset-specific parameters and avoid making restrictive assumptions on the dynamics of co-moments. By using GMM to estimate the models' parameters we also avoid making any assumptions about the distribution of the data. The empirical results indicate that coskewness is useful in explaining the cross-section of equity returns, and that both covariance and coskewness are time varying. We also find that the usefulness of coskewness is robust to the inclusion of Fama and French's (1993) SMB and HML factor returns.
There is an interesting debate raging in the empirical asset pricing literature comparing the SDF versus beta methodologies. This paper's technique is a conditional version of the beta methodology, which turns out to be directly comparable with the SDF methodology with only minor modifications. Our SDF version imposes the CAPM's restrictions that the coefficients in the pricing kernel are known functions of the moments of market returns, which are modelled using macro-variables.
We find that the SDF implied by the three-moment CAPM provides a better fit in this data set than the current practice of parameterizing the coefficients on market returns in the SDF. This has an interesting application to the current SDF versus beta methodology debate.
Table of Contents
Abstract
Table of Contents
List of Tables
List of Figures
Preface
Acknowledgements
Essay 1: MSSV Interest Rate Models
1.1 Introduction
1.2 Models of Interest Rate Volatility Dynamics
1.2.1 Diffusion Models
1.2.2 Generalized ARCH Models
1.2.3 Stochastic Volatility
1.2.4 Markov Switching
1.2.5 Markov-Switching Stochastic Volatility
1.3 Data
1.4 Empirical Results
1.4.1 Basic Diffusion Model
1.4.2 Stochastic Volatility
1.4.3 The Markov-Switching Diffusion Model
1.4.4 A Markov-Switching Stochastic Volatility Diffusion Model
1.5 Comparing the Models
1.6 Conclusions
1.A Appendix: Quasi-Maximum Likelihood Estimation
1.A.1 Stochastic Volatility Model
1.A.2 Markov-Switching Models
1.A.3 Markov-Switching Stochastic Volatility Models
1.A.3.1 Smoothing
References
Essay 2: Conditional Coskewness and Asset Prices
2.1 Introduction
2.2 Model Development and Motivation
2.2.1 The Two-Moment CAPM
2.2.1.1 An Empirical Specification
2.2.1.2 Modelling Mean and Variance, or Mean and Price of Covariance Risk
2.2.2 The Three-Moment CAPM
2.2.2.1 An Empirical Implementation
2.2.2.2 An Alternative Empirical Specification
2.2.3 Multi-Moment Extension
2.3 Data and Estimation Methodology
2.4 Empirical Results
2.5 Multi-Factor Asset Pricing
2.6 Three Factors or Three Moments? or Both??
2.7 Further Specification Tests
2.7.1 Time Varying Alphas
2.7.2 Structural Breaks
2.8 Comparison with Dittmar (2001)
2.9 Conclusions
2.A Appendix: Conditional Asset Pricing
References
List of Tables
1.1 Summary Statistics and Diagnostic Tests on 30-day Treasury Bill yields and Residuals From AR(1) Model
1.2 Parameter Restrictions Imposed by Different Volatility Models
1.3 Parameter Estimates of Mean Equation
1.4 Maximum Likelihood Parameter Estimates of CKLS Model
1.5 Parameter Estimates of CKLS Model
1.6 Parameter Estimates of Stochastic Volatility Model
1.7 Parameter Estimates of Markov-Switching Model
1.8 Parameter Estimates of Markov-Switching Stochastic Volatility Model
1.9 Diagnostic Tests on the Standardized Residuals
1.10 In-Sample and Out-of-Sample Specification Tests
2.1 Portfolio Return Predictability
2.2 Predictability of Conditional Covariance
2.3 Predictability of Conditional Coskewness
2.4 Parameter Estimates of the Two-Moment Model
2.5 Parameter Estimates of Full Three-Moment Model
2.6 Two- and Three-Moment CAPM Parameter Estimates: Fixed Weights
2.7 Parameter Estimates of the Sign-Corrected Three-Moment CAPM
2.8 Parameter Estimates of the Fama-French Three-Factor Asset Pricing Model
2.9 Parameter Estimates of Fama-French's Three-Factor Model with Fixed Weighting Matrix
2.10 Parameter Estimates of the Combined Three-Factor and Three-Moment Asset Pricing Model
2.11 Comparing Various Specifications of the Asset Pricing Models
2.12 Comparing Various Specifications of the Asset Pricing Models: Equally Weighted Moments
2.13 Lagrange Multiplier Tests for Time-Varying Intercepts: Industry Portfolios
2.14 Testing for Structural Breaks
2.15 Stochastic Discount Factor Estimation of Industry Data
List of Figures
1.1 Annualized 30-Day Treasury Bill Yields, July 1962 to December 1996
1.2 Change in Annualized 30-Day Treasury Bill Yields, July 1962 to December 1996
1.3 Conditional Standard Deviation from Diffusion Model, July 1962 to December 1989
1.4 Conditional Standard Deviation from the Stochastic Volatility Diffusion Model, July 1962 to December 1989
1.5 Conditional Standard Deviation from the Markov-Switching Diffusion Model, July 1962 to December 1989
1.6 Plot of Probabilities of Low-Volatility Regime in the Markov-Switching Diffusion Model, July 1962 to December 1989
1.7 Conditional Standard Deviation from the Markov-Switching Stochastic Volatility Diffusion Model, July 1962 to December 1989
1.8 Plot of Probabilities of Low-Volatility Regime in the Markov-Switching Stochastic Volatility Diffusion Model, July 1962 to December 1989
Preface
A paper based on Essay 1 of this thesis, \"Markov-Switching and Stochastic Volatility Diffusion Models of Short-Term Interest Rates\", was published in April 2002 in the Journal of Business and Economic Statistics (Vol. 20, No. 2, pp. 183-197).
Acknowledgements
I would like to acknowledge the love and support of my wife Nicole. I thank my thesis supervisory committee, Murray Carlson, John Cragg, Glen Donaldson, and Allan Kraus, for their excellent advice and guidance throughout the program. I thank Kai Li for providing the data used in Essay 1, and Ken French for providing the data used in Essay 2 online at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. I thank seminar participants at the University of British Columbia and the Queensland University of Technology for comments on Essay 1, and Simon Fraser University for comments on Essay 2. The comments of Jeff Wooldridge and an anonymous referee greatly improved the paper.
This research would not have been possible without the financial support of the University of British Columbia.
Essay 1: Markov-Switching and Stochastic Volatility Diffusion Models of Short-Term Interest Rates¹
1.1 Introduction
A very important research area of asset pricing is modeling the term structure. Most of the recent term-structure models were developed in continuous time. There are two main approaches to modeling the term structure in continuous time: the no-arbitrage approach and the general equilibrium approach. Vasicek (1977) showed how to price zero-coupon default-free bonds of different maturities using the no-arbitrage approach used in the option pricing of Black and Scholes (1973) for a given stochastic process for the spot interest rate. The approach prices all bonds on the basis of a finite number of state variables. By contrast, the general equilibrium approach, by Cox, Ingersoll, and Ross (1985), prices the term structure in equilibrium. These models specify the dynamic behavior of exogenous production factors (in continuous time) and the preferences of a representative consumer, and they derive the interest rate and prices of all assets, including bonds, endogenously. In both these approaches it is critical to the pricing of bonds to specify the stochastic behavior of short-term, or ideally spot, interest rates. Most previous term-structure models used diffusion models of the spot interest rate in pricing bonds. However, some recent research, most notably by Ball and Torous (1999) and Gray (1996), indicates that this basic model may be inappropriate. Ball and Torous argued that the coefficient of the lagged interest rate (raised to some power) should be allowed to be time varying. Furthermore, Gray and others argued that U.S.
short-term interest rates should be characterized by a nonlinear regime-shifting model to account for changes in economic regime brought about by factors such as the Federal Reserve experiment in the early 1980s and the OPEC oil crises in the late 1970s.
¹ This essay has been published in the Journal of Business and Economic Statistics, April 2002, Vol. 20, No. 2, pp. 183-197.
This essay builds on this research in several directions. It presents a new approach to modeling short-term interest rates: a Markov-switching diffusion model. There are several motivations for this. First, different economic regimes appear to govern the level of interest rate volatility, and the Markov-switching diffusion model is able to account for these explicitly. Supporting this view, Duffee (1993) found evidence of a temporary structural break in the interest rate series during the Federal Reserve experiment of 1979-1982. Second, the point estimates of the elasticity of variance with respect to the level of interest rates from single-regime diffusion models indicate an explosive volatility process. Duffee (1993) found that the very high estimate reported by Chan, Karolyi, Longstaff, and Sanders (1992) (about 1.5, when most other models impose elasticities in the range of 0 to 1) is attributable to their failure to account for the structural break. This essay draws a similar conclusion. The single-regime models have point estimates of volatility elasticity that are comparable with the Chan et al. (1992) estimate, while the regime-switching model estimates are more reasonable, at around 1. The Markov-switching diffusion model examined in this essay takes account of both effects: the change in the level of interest rate volatility and the spuriously high elasticity of volatility. An alternative diffusion model that allows the coefficient on lagged interest rates to change through time is the stochastic volatility model of Ball and Torous (1999).
We estimate this model and compare its performance to that of the Markov-switching model. We are unable to reject one model in favor of the other using the nonnested hypothesis testing procedure of Vuong (1989). Similarly, neither model can be rejected in favor of the more general Markov-switching stochastic volatility model that nests them both. Both models also performed similarly in forecasting tests; however, the evidence slightly favored the Markov-switching model over the stochastic volatility model. When the standardized residuals of all models were examined for goodness of fit, the two models performed comparably, but the Markov-switching diffusion model again outperformed all other models. Finally, the results from the Markov-switching and Markov-switching stochastic volatility models indicate that the stochastic volatility model overestimates the size of the elasticity of volatility with respect to the level of interest rates. Given this body of evidence we favor the Markov-switching model over the stochastic volatility model as a parsimonious representation of the dynamic behavior of U.S. short-term interest rates. These results are broadly consistent with Naik and Lee (1997). They developed an analytic expression for the term structure of interest rates with time-varying volatility in two contexts: stochastic volatility and regime switching. Naik and Lee (1997) found that the regime-switching model is better able to reproduce the observed term structure than is the stochastic volatility model. The regime-switching and stochastic volatility models analysed by Naik and Lee (1997) exclude a diffusion term (that is, they set γ = 0). This essay rejects this assumption, and this rejection is robust to the different volatility specifications considered herein.
Our conclusion is that term-structure models that allow volatility to be both regime dependent and a function of the level of the short rate are worth consideration. On a technical note, this essay demonstrates how to estimate Markov-switching stochastic volatility models using quasi-maximum likelihood techniques. The benefits of quasi-maximum likelihood procedures are that they are straightforward to apply and are less computationally intensive than the current Bayesian estimation procedure. However, the estimates may not be as efficient as those from the Bayesian technique, because the quasi-maximum likelihood procedure involves approximating a log-chi-squared random variable with a normal random variable that has the same mean and variance. The Markov-switching stochastic volatility model is used to compare the diffusion models with Markov-switching and stochastic volatility parametrically. It nests both the Markov-switching and the stochastic volatility model. Combining Markov-switching and stochastic volatility adds very little over the simpler models. The remainder of the essay is organized as follows. Section 1.2 discusses model development. Section 1.3 discusses the data used. Section 1.4 presents the empirical results. Section 1.5 compares the Markov-switching and stochastic volatility models, and Section 1.6 concludes.
1.2 Models of Interest Rate Volatility Dynamics
This section discusses the basic types of model that have been used to explain short-term interest rate dynamics. The first type of model is the diffusion model that is predominantly used in building term-structure models. The second type of model is the autoregressive conditional heteroscedasticity (ARCH) model, which has proved useful for modeling the dynamics of the second moment of many financial time series. The next three models are extensions of the basic diffusion model: the first allows for stochastic volatility, the second allows for Markov switching in the behavior of volatility, and the third includes both effects simultaneously.
1.2.1 Diffusion Models
Most term-structure models assume that short-term interest rates evolve over time as some type of diffusion process. The beauty of the diffusion model is that the instantaneous change in the short rate can be characterized as a stochastic differential equation (SDE), and Ito calculus can then be utilized to characterize the term structure. This basic approach is used in both the arbitrage pricing and the general equilibrium approaches to pricing the term structure. Chan et al. (1992) (CKLS) showed that many of the specific SDEs used in the literature can be written as special cases of the following general SDE:
dr_t = (a + b r_t) dt + σ r_t^γ dB_t, (1.1)
where B_t is a standard Brownian motion. CKLS were concerned with calibrating this general SDE econometrically to evaluate the appropriateness of these competing models for the short rate. The exact functional form of the short rate SDE is of critical importance for models of the term structure. For example, Vasicek (1977) used an arbitrage argument to derive a partial differential equation for bond prices. His derivation was sufficiently general to allow for any diffusion type of SDE for the short rate, and he then proceeded to derive closed-form bond prices for the special case of an Ornstein-Uhlenbeck process for the short rate. However, the following quote by Vasicek (1977, p. 185) illustrates the importance of estimating the appropriate SDE for the short rate:
    In the absence of empirical results on the character of the spot rate process, this specification serves only as an example.
However, this empirical work was not pursued until much later. Most theoretical term-structure research imposed ad hoc structures on the SDE, and there was no consistency between the different models. It was not until the work of CKLS that financial economists were concerned with empirically calibrating models of the short rate.
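As a rough illustration of how restrictions on the general SDE (1.1) generate different short-rate models, the following Python sketch simulates (1.1) with a simple Euler scheme. The function name simulate_ckls, the parameter values, the unit time step, and the small positive floor on the rate are all assumptions made for this example; none of them is taken from CKLS or from this essay.

```python
import math
import random


def simulate_ckls(a, b, sigma, gamma, r0, n_steps, dt=1.0, seed=0):
    '''Euler scheme for dr = (a + b r) dt + sigma r^gamma dB (illustrative only).'''
    rng = random.Random(seed)
    path = [r0]
    for _ in range(n_steps):
        r = path[-1]
        # Brownian increment over a step of length dt.
        shock = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        r_next = r + (a + b * r) * dt + sigma * (r ** gamma) * shock
        # Assumption of this sketch: floor the rate so r ** gamma stays defined.
        path.append(max(r_next, 1e-8))
    return path


# Hypothetical parameter values; only the restriction on gamma differs.
vasicek = simulate_ckls(a=0.004, b=-0.05, sigma=0.002, gamma=0.0, r0=0.06, n_steps=120)
cir = simulate_ckls(a=0.004, b=-0.05, sigma=0.01, gamma=0.5, r0=0.06, n_steps=120)
```

Setting gamma = 0 gives the Ornstein-Uhlenbeck case used by Vasicek (1977), while gamma = 0.5 gives the square-root process of Cox, Ingersoll, and Ross (1985); leaving gamma free is what allows the data to choose between them.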
To empirically calibrate the general SDE, CKLS employed the following simple discretization of (1.1):
Δr_t = a + b r_{t-1} + σ r_{t-1}^γ ε_t, (1.2)
where Δr_t = r_t - r_{t-1} and ε_t is a standard normal random variable. They estimated the parameters of this model by using the generalized method of moments (GMM) estimation technique of Hansen (1982). They found that the short rate is mean reverting, and that the elasticity of volatility parameter was 1.4999 (the standard error was 0.2519). The elasticity parameter indicates that the volatility of short-term interest rates is explosive. Other research includes the work of Broze, Scaillet, and Zakoian (1995), who used maximum-likelihood-based procedures and the indirect inference technique of Gourieroux, Monfort, and Renault (1993) to account for the discretization bias, which they found to be very small. Another approach, by Ait-Sahalia (1996), estimates the density of discrete changes in the spot rate implied by various continuous-time models, and compares these with the empirical distribution of the discrete changes in the spot rate.
1.2.2 Generalized ARCH Models
The ARCH model was introduced by Engle (1982), and later extended by Bollerslev (1986), who developed the generalized ARCH, or GARCH, model. In a GARCH(1,1) model, the conditional mean and conditional variance of a time series process are modeled simultaneously,
r_t = a + b r_{t-1} + ε_t,
where the conditional volatility of ε_t is given by E[ε_t^2 | ψ_{t-1}] = h_t:
h_t = ω + α ε_{t-1}^2 + β h_{t-1}.
GARCH models are able to capture the very important volatility clustering phenomena that have been documented in many financial time series, including short-term interest rates (see Bollerslev, Chou, and Kroner, 1992), as well as their leptokurtosis. Note that in GARCH models the volatility is a deterministic function of lagged volatility estimates and lagged squared forecast errors.
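The deterministic nature of the GARCH(1,1) variance recursion can be sketched in a few lines of Python. The helper names garch11_variance and forecast_variance, the starting variance h0, and the numbers in the usage lines are hypothetical choices for this illustration. The second helper iterates the standard multi-step forecast recursion, in which the expected variance grows or decays at a rate governed by alpha + beta.

```python
def garch11_variance(residuals, omega, alpha, beta, h0):
    '''Return h_1..h_T from residuals e_0..e_{T-1} and a starting variance h0.'''
    h = [h0]
    for e in residuals:
        # h_t depends only on the lagged squared residual and the lagged variance.
        h.append(omega + alpha * e * e + beta * h[-1])
    return h[1:]


def forecast_variance(h_next, omega, alpha, beta, horizon):
    '''Iterate E[h_{t+j}] = omega + (alpha + beta) * E[h_{t+j-1}], starting from h_{t+1}.'''
    out = [h_next]
    for _ in range(horizon - 1):
        out.append(omega + (alpha + beta) * out[-1])
    return out


# Hypothetical inputs: two residuals and illustrative coefficients.
h = garch11_variance([1.0, 0.0], omega=0.1, alpha=0.2, beta=0.5, h0=1.0)
path = forecast_variance(1.0, omega=0.0, alpha=0.6, beta=0.5, horizon=3)
```

Given the same residuals and parameters the recursion always returns the same variances, which is the sense in which GARCH volatility is deterministic; no separate shock enters the variance equation.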
One problem with GARCH models of the short rate is that the parameter estimates suggest that the volatility process is explosive. Bollerslev (1986) demonstrates that the variance process is covariance stationary when |Σ_{i=1}^{q} α_i + Σ_{j=1}^{p} β_j| < 1. Note that it is usually assumed that α_i, β_j > 0 for all i, j to ensure that the conditional volatility is nonnegative, so the usual case where Σ_{i=1}^{q} α_i + Σ_{j=1}^{p} β_j < 1 is considered. If this inequality is violated, then shocks to the volatility process are persistent or explosive. If the sum of the coefficients equals 1, then the process is integrated GARCH (IGARCH). If the sum of coefficients is strictly greater than 1, then a shock to volatility is explosive, and lim_{j→∞} E[h_{t+j} | ψ_t] = ∞. Parameter estimates of GARCH(1,1) models fitted to short-term interest rates typically indicate an explosive process for the conditional volatility, or α + β > 1. For example, Gray (1996) reported, using weekly 30-day treasury-bill data, that α + β = 1.0303, and Engle, Ng, and Rothschild (1990) found that α + β = 1.0096 for a portfolio of treasury-bills. It is interesting that Gray (1996) found that the implied persistence of volatility within a regime in his two-state regime-switching GARCH model is much lower than in the single-regime GARCH model. He found that α + β = 0.6586 in the high-volatility regime and α + β = 0.4340 in the low-volatility regime, compared with 1.0303 in the single-regime model. This is because there is high persistence in the state of the economy that the single-regime model interprets as persistence in volatility. This is a common finding in the regime-switching interest rate literature.
1.2.3 Stochastic Volatility
The stochastic volatility model used in this essay allows log-volatility to itself evolve stochastically over time.
This is in direct contrast with the GARCH type models, which model volatility as a deterministic function of lagged squared forecast errors and lagged conditional volatility. The stochastic volatility model is parsimonious and yet flexible, and has been successfully applied to a range of financial time series including short-term interest rates (Ball and Torous, 1999), exchange rates (Taylor, 1986; and Harvey, Ruiz, and Shephard, 1994), and stock prices (Hsieh, 1991; Harvey and Shephard, 1996; Sandmann and Koopman, 1998; and So, Lam, and Li, 1998). Most stochastic volatility models are set in discrete time, and this essay follows this convention. We follow Ball and Torous (1999), who presented their stochastic volatility model as a simple extension of the discrete-time diffusion models of the type presented in (1.2):
Δr_t = a + b r_{t-1} + σ_t r_{t-1}^γ ε_t. (1.3)
As the time subscript on σ in equation (1.3) indicates, the generalization employed allows the volatility to be time varying. The model allows log-volatility to evolve stochastically as a simple first-order autoregressive process:
log σ_t^2 = ω + φ log σ_{t-1}^2 + η_t, (1.4)
where η_t ~ iid N(0, σ_η^2). The disturbance term η_t in (1.4) makes the process stochastic: the variance itself is subject to random shocks. This process is parsimonious and able to capture interesting dynamics. Note that GARCH models can be viewed as discrete-time approximations to a continuous-time stochastic volatility model, but the discrete-time stochastic volatility model considered here is more direct. One of the simplest procedures available for estimating stochastic volatility models of this type is the quasi-maximum likelihood procedure of Harvey et al. (1994). This approach is based on a simple transformation of the residual in (1.3), which allows one to write the system in state-space form and apply the Kalman filter to recursively build up the quasi-likelihood function.
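To make the difference from GARCH concrete, the discrete-time model in (1.3) and (1.4) can be simulated as in the Python sketch below. The function name simulate_sv, the parameter values, the choice to start log-variance at its unconditional mean, and the positive floor on the rate are assumptions of this example rather than part of the model as estimated in this essay.

```python
import math
import random


def simulate_sv(a, b, gamma, omega, phi, sigma_eta, r0, n_steps, seed=0):
    '''Simulate (1.3)-(1.4): AR(1) log-variance driving a level-dependent diffusion.'''
    rng = random.Random(seed)
    # Assumption: start at the unconditional mean omega / (1 - phi), which needs |phi| < 1.
    log_var = omega / (1.0 - phi)
    rates, log_vars = [r0], []
    for _ in range(n_steps):
        # Equation (1.4): the variance receives its own shock eta_t.
        log_var = omega + phi * log_var + rng.gauss(0.0, sigma_eta)
        sigma_t = math.exp(0.5 * log_var)
        r_prev = rates[-1]
        # Equation (1.3): change in the rate given the current volatility.
        dr = a + b * r_prev + sigma_t * (r_prev ** gamma) * rng.gauss(0.0, 1.0)
        # Assumption of this sketch: keep the rate positive so r ** gamma is defined.
        rates.append(max(r_prev + dr, 1e-8))
        log_vars.append(log_var)
    return rates, log_vars


# Hypothetical parameter values, chosen only to give a plausible-looking path.
sv_rates, sv_log_vars = simulate_sv(a=0.004, b=-0.05, gamma=0.5, omega=-1.0,
                                    phi=0.9, sigma_eta=0.3, r0=0.06, n_steps=120)
```

Two independent Gaussian draws enter each step, one for the variance and one for the rate, so the volatility path cannot be recovered exactly from past rate changes; this is why a filtering step is needed in estimation.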
The method involves transforming the residual e_t = Δr_t - a - b r_{t-1}. By taking the log of the squared residual, one obtains
log e_t^2 = log σ_t^2 + 2γ log r_{t-1} + log ε_t^2,
because e_t = σ_t r_{t-1}^γ ε_t. New notation simplifies this expression: y_t = log e_t^2, which is observable given the observed changes in interest rates and the parameters a and b; and x_t = log σ_t^2 is the state variable, log-volatility. Using this notation, we can rewrite the system in state-space form as
y_t = x_t + 2γ log r_{t-1} + log ε_t^2, (1.5)
x_t = ω + φ x_{t-1} + η_t. (1.6)
The Kalman filter is an iterative procedure that forecasts the state variable one period into the future by a linear projection and then updates this forecast when the observation on the variable y_t becomes available. If the disturbance terms are both Gaussian, the linear projection is also the conditional expectation, and the conditional expectation and its mean squared error are all that is required to describe the conditional density. In this case, the Kalman filter enables the construction of the exact likelihood function, and then full maximum likelihood estimation is possible. However, in the present model the disturbance term for the observation equation (1.5) is non-Gaussian. In fact it is distributed as a log-chi-squared random variable with one degree of freedom, which has mean E[log ε_t^2] = -1.2704 and variance Var[log ε_t^2] = π^2/2. The quasi-maximum likelihood procedure of Harvey, Ruiz, and Shephard (1994) uses the quasi-likelihood function, which is the likelihood function of a normal random variable that has the same mean and variance as log ε_t^2:
y_t = x_t + 2γ log r_{t-1} - 1.2704 + ξ_t, (1.7)
where the disturbance ξ_t is treated as normal with mean zero and variance π^2/2. The Kalman filtering equations and likelihood function are presented in Appendix 1.A. This approach provides a criterion that, when maximized, results in consistent and asymptotically normal parameter estimates if the model is correctly specified (White, 1982b; and Bollerslev and Wooldridge, 1992). We can use the central limit theorem of Dunsmuir (1979) to establish the consistency and asymptotic normality of the resulting parameter estimates. A number of alternative estimation techniques to quasi-maximum likelihood exist in the literature. These include the Bayesian technique of Jacquier, Polson, and Rossi (1994), maximum likelihood procedures by Fridman and Harris (1998), the maximum likelihood Monte Carlo method of Sandmann and Koopman (1998), and the efficient method of moments by Andersen, Chung, and Sørensen (1999). However, these techniques are usually very computationally intensive, in contrast to the relatively straightforward quasi-maximum likelihood technique.
1.2.4 Markov Switching
In recent years econometricians have modeled various economic time series as coming from one of a number of regime-switching time series. In these models it is assumed that the distribution of the variable is known, conditional on a particular regime or state occurring. When the economy changes from one regime to another, a substantial change occurs in the series. Hamilton (1989) presented the Markov regime-switching model, in which the unobserved regime evolves over time as a first-order Markov process. This type of modeling makes sense in a time-series context. Markov-switching models have proven to be quite useful in modeling a range of economic time series, including the business cycle (Hamilton, 1989; Goodwin, 1993; Durland and McCurdy, 1994; Filardo, 1994; and Smith, 2001), the stock market (Hamilton and Susmel, 1994; Schaller and van Norden, 1997; and Turner, Startz, and Nelson, 1989), exchange rates (Engel and Hamilton, 1990), and short-term interest rates (Cai, 1994; and Gray, 1996). The empirical usefulness of Markov-switching models of macro-economic variables that are related to interest rates, and the significant evidence for the validity of Markov-switching models of short-term U.S. interest rates, motivate this research.
Cai (1994) fitted a Markov-switching ARCH model to the excess returns of the 3-month treasury-bill over the 30-day treasury-bill and found two periods of excessively high interest rate volatility: the OPEC oil shock in 1974 and the monetarist experiment of the Federal Reserve between 1979 and 1982. Cai (1994) fitted only an ARCH model because regime-switching GARCH models are inherently path dependent. The conditional variance today depends on the conditional variance yesterday, which, in turn, depended on previous conditional variances. Thus the conditional variance today depends explicitly on all previous states. However, both Gray (1996) and Dueker (1997) have since developed approximations that overcome the path-dependence problem. Gray (1996) fitted a generalized regime-switching model that allowed for GARCH effects as well as a diffusion term. He found evidence in his weekly 30-day treasury-bill dataset that two regimes, a high-volatility and a normal-volatility regime, are necessary to adequately characterize the dynamics of short-term interest rates. He also found that he needed both GARCH terms and the diffusion term (which roughly takes the place of the intercept in the conditional variance equation) in his model. The economy was in the high-volatility regime in three periods: the OPEC oil shock, the Federal Reserve experiment, and a brief period in 1987 corresponding to the stock market crash. Duffee (1993) found evidence of a structural break (associated with the monetarist experiment) and concluded that the high elasticity of variance reported by CKLS was due to their failure to account for this break. As reported in the following, even the stochastic volatility model suffers from this problem.
Regime-switching models allow the economy to be in any of a finite number of distinct regimes at any point in time. The regime completely governs the dynamic behavior of the series.
This implies that once we condition on a particular regime occurring, and assume a particular parameterization of the model, we can write down the density of the variable of interest. In Markov-switching models, the regime is strictly unobservable by the econometrician, who must therefore draw statistical inference regarding the likelihood of each regime occurring at any point in time. In this parameterization it is necessary to define the transition probability from regime j to regime i at time t as p_ij = Pr[S_t = i | S_{t-1} = j]. Note that in specifying the parameters of such a model with K regimes, we need to include only K(K - 1) parameters because there are K redundant transition probabilities:
    Pr[S_t = K | S_{t-1} = j] = 1 - Σ_{i=1}^{K-1} Pr[S_t = i | S_{t-1} = j].
It is also possible to allow for time-varying transition probabilities. For example, Durland and McCurdy (1994) allowed the transition probabilities to decline as the economy remained in one regime. In other words, the longer the economy is in one regime, the more likely it is to change out of that regime. Another approach was taken by Filardo (1994) and Gray (1996), who allowed the transition probability to be a function of some other variables. For example, Gray (1996) made the transition probability a function of the log-interest rate. The function is the cumulative distribution function of a standard normal random variable and hence is guaranteed to lie within the unit interval.
This essay presents a Markov-switching diffusion model of the type in (1.2), but allows the unconditional volatility to change between regimes. This is done by allowing σ in the basic discretized diffusion model (1.2) to change between two states:
    Δr_t = a + b r_{t-1} + σ_i r_{t-1}^γ ε_t,   (1.8)
where i ∈ {1, 2} is an index of the regime at time t.
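The parameter count above can be illustrated with a short sketch. It assumes the column convention p_ij = Pr[S_t = i | S_{t-1} = j] used in the text; the function name full_transition_matrix and the numerical values are hypothetical and are not taken from the essay.

```python
def full_transition_matrix(free):
    # free[i][j] holds Pr[S_t = i+1 | S_{t-1} = j+1] for the first K-1
    # destination regimes, so it has K-1 rows and K columns: K(K-1) numbers.
    k = len(free[0])
    if len(free) != k - 1:
        raise ValueError('expected K-1 rows of K free probabilities')
    # The K redundant probabilities follow from each column summing to one.
    last = [1.0 - sum(free[i][j] for i in range(k - 1)) for j in range(k)]
    if any(x < -1e-12 or x > 1.0 + 1e-12 for x in last):
        raise ValueError('free probabilities in a column exceed one')
    return [list(row) for row in free] + [last]


# Two regimes: the single free row is [p, 1 - q] (illustrative values only).
P = full_transition_matrix([[0.99, 0.05]])
```

With K = 2 this reduces to the two free parameters p and q; the second row of P is implied by the first.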
We choose to fit two regimes because we expect to find, as have previous studies, that the Federal Reserve experiment and the oil shock are fundamentally different in their dynamic behavior from the rest of the sample period. This difference in behavior arises because the experiment resulted in a different economic regime, and a similar explanation can be given for the oil shock. Because we are fitting two regimes, we need to fit two transition probabilities: p = Pr[S_t = 1 | S_{t-1} = 1] and q = Pr[S_t = 2 | S_{t-1} = 2].
As noted, in Markov-switching models the regime is unobserved, and the econometrician must infer which regime occurred at each time period. Hamilton (1989) developed a filter that allows the econometrician to infer the probability of the regime at each point in time iteratively. A useful byproduct of this algorithm is the log-likelihood, which can be numerically maximized to obtain the maximum likelihood estimates of the parameters. Appendix 1.A.2 presents Hamilton's filter along with a brief discussion. Kim (1994) presented a simple backward recursive filter that allows inference to be made regarding the regime at each point in time using the entire sample of observations. His smoothing algorithm is also discussed in Appendix 1.A.2.
This model should outperform the basic diffusion model because the parameters of the basic single-regime diffusion model are estimated under the assumption that there is only one regime. If there are two regimes, say, a high-volatility and a low-volatility regime, simply assuming a constant volatility during the sample period will systematically overestimate the volatility in the low-volatility regime and systematically underestimate the volatility in the high-volatility regime. Therefore the Markov-switching diffusion model is expected to outperform the basic single-regime diffusion model of CKLS.
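The filtering recursion can be sketched for the two-regime case as follows. This is a minimal illustration, not the algorithm of Appendix 1.A.2: it assumes a Gaussian density with a regime-dependent mean and a common variance, and it starts the recursion from the stationary regime probabilities implied by p and q. The function and argument names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    # Density of a normal random variable with the given mean and variance.
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def hamilton_filter(y, means, var, p, q):
    # y[t]     : observation at time t
    # means[t] : pair (mean in regime 1, mean in regime 2) at time t
    # p, q     : Pr[S_t = 1 | S_{t-1} = 1] and Pr[S_t = 2 | S_{t-1} = 2]
    # Assumption: start from the stationary probabilities of the chain.
    pi1 = (1.0 - q) / (2.0 - p - q)
    pred = [pi1, 1.0 - pi1]
    loglik = 0.0
    filtered = []
    for t in range(len(y)):
        dens = [normal_pdf(y[t], means[t][i], var) for i in range(2)]
        joint = [pred[i] * dens[i] for i in range(2)]
        lik_t = joint[0] + joint[1]          # density of y[t] given the past
        loglik += math.log(lik_t)            # log-likelihood is a byproduct
        filt = [joint[0] / lik_t, joint[1] / lik_t]
        filtered.append(filt)
        # One-step-ahead regime probabilities for the next observation.
        pred = [p * filt[0] + (1.0 - q) * filt[1],
                (1.0 - p) * filt[0] + q * filt[1]]
    return loglik, filtered
```

The returned log-likelihood is the quantity that would be maximized numerically over the parameters; the filtered probabilities use information only up to each date, whereas a smoother such as Kim's uses the whole sample.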
Because we are interested in directly comparing the Markov-switching model with the stochastic volatility model, instead of estimating (1.8) we employ the same transformation used in the stochastic volatility model:
    y_t = ω_i + 2γ log r_{t-1} + log ε_t^2,   (1.9)
where ω_i = log σ_i^2 and y_t is defined previously.

1.2.5 Markov-Switching Stochastic Volatility
The final model estimated in this essay is the Markov-switching stochastic volatility (MSSV) model. This model generalizes the stochastic volatility model and the Markov-switching model, both of which are special cases of MSSV. This is useful on a number of levels. First, it enables the models to be compared. If the restrictions that reduce the MSSV model to the Markov-switching model are rejected but those that reduce it to the stochastic volatility model are not, then we can conclude that the data support the stochastic volatility model; the converse is also true. Another benefit is that it helps determine which features of the data are specific to which model. Specifically, the Markov-switching and stochastic volatility diffusion models give different estimates of the elasticity of variance γ. The estimate of γ is important because if γ > 1, then the interest rate process is explosive, while if γ < 1, it is not. Our regime-switching models of interest rates find estimates of γ that are in the stable region, while all the single-regime models indicate that interest rates are explosive. To determine which of these is spurious, we can compare the estimate from the MSSV model with the two estimates and draw some conclusions on which model is preferred.
The specific form of the MSSV model estimated is
    y_t = x_t + 2γ log r_{t-1} + log ε_t^2,
    x_t = ω_i + φ x_{t-1} + η_t.   (1.10)
This nests the stochastic volatility model when ω_1 = ω_2, although the transition probabilities p and q (defined as in the Markov-switching model) are then not identified. This model also nests the Markov-switching model when σ_η^2 = 0.
However, the MSSV framework could also allow other parameters (φ and σ_η^2) to be regime dependent. In the model estimated in this essay only the unconditional volatility is modeled as being regime dependent (that is, only ω_i is different in each regime i). This was done to keep the model as parsimonious as possible.

1.3 Data
The data for this study consist of monthly observations on 30-day U.S. treasury bills from June 1964 to December 1996. Duffee (1996) presented an interesting discussion of some of the problems associated with using the treasury-bill to proxy for the risk-free short-term interest rate. Nevertheless, this study uses 30-day treasury-bill data to ensure comparability with most previous studies that attempted to model the short rate (see, for example, CKLS; Ball and Torous, 1999; Gray, 1996). To maintain comparability with CKLS and Ball and Torous (1999), we use data from June 1964 to December 1989 in all in-sample calculations and hold the remaining data (January 1990 to December 1996) for out-of-sample analysis.
The treasury-bill data are plotted in Figure 1.1 and the first difference of the interest rate is plotted in Figure 1.2. Several interesting observations can be made. First, Figure 1.2 shows the well-documented volatility clustering phenomenon in financial time series: large changes, of either sign, are typically followed by large changes, and small changes follow previous small changes. Second, there appear to be two regimes in the volatility. The 1979-1982 period is clearly more volatile than other periods, and the period in late 1974, corresponding to the oil crisis, also seems a little more volatile than other periods. October 1987 also contains a large change in the interest rate, but this is an isolated incident: interest rate changes on either side of it are of normal size. This indicates that it may be useful to allow for a two-regime Markov-switching model that allows the volatility in the Federal Reserve experiment and the oil crisis to be different from the volatility in other periods.
Also, the periods of higher volatility are related to periods of higher interest rates, so it is possible that a diffusion model, in which volatility is a function of the level of interest rates, accounts for this effect.
Table 1.1 contains some summary statistics and diagnostic tests of the treasury-bill data. There is clear evidence of autocorrelation and ARCH effects in both the level of the interest rate series and in the residual of a first-order autoregressive model (these are also the residuals later used in the stochastic volatility modeling). This indicates that we need to take account of time-varying heteroscedasticity. Even accounting for ARCH effects leaves clear evidence of autocorrelation in the residuals (see the Box-Pierce ARCH-adjusted portmanteau statistics).

[Figure 1.2: Change in T-bill yield (%)]

Table 1.1: Summary Statistics and Diagnostic Tests on 30-day Treasury Bill Yields and Residuals From AR(1) Model
Skewness and kurtosis are calculated as the standardized third and fourth central moments, respectively. The standard errors of the skewness and kurtosis statistics are reported in parentheses below the statistic and are calculated as sqrt(6/T) and sqrt(24/T), respectively. The Bera-Jarque statistic tests for normality by checking that the skewness equals 0 and the kurtosis equals 3, and is distributed as a chi-squared variable with two degrees of freedom. BP_k tests for k-th order autocorrelation using the Box-Pierce portmanteau test statistic, BP_k^ARCH tests for k-th order autocorrelation using the Box-Pierce portmanteau test statistic corrected for ARCH following Diebold (1986), and BP_k^2 tests for autocorrelation in the squared deviation from the mean (roughly equivalent to testing for ARCH) out to k lags. All portmanteau test statistics are distributed as chi-squared random variables with k degrees of freedom.
                       r_t          e_t = Δr_t - a - b r_{t-1}
    Mean                6.6587       -0.0015
    Variance            6.9120        1.2533
    Skewness            1.2097       -1.1038
    (Standard error)    0.1398        0.1403
    Kurtosis            4.3588       11.6429
    (Standard error)    0.2796        0.2805
    Bera-Jarque        98.4936     1011.2436
    BP_1              276.5656       60.7781
    BP_12            2221.6082       91.4565
    BP_1^ARCH          72.0637       15.4548
    BP_12^ARCH        786.5749       25.2592
    BP_1^2            264.5250       23.1056
    BP_12^2          1991.4313       98.7721

1.4 Empirical Results
This section reports the results of fitting a number of different empirical models to short-term interest rates. To ensure that the playing field is level, all the models are estimated as special cases of the MSSV model. Therefore, all models are of the basic form
    y_t = x_t + 2γ log r_{t-1} - 1.2704 + ξ_t,
but each imposes different restrictions on the parameters. These restrictions are summarized in Table 1.2.

Table 1.2: Parameter Restrictions Imposed by Different Volatility Models
This table presents the restrictions imposed by the various diffusion models considered in this essay. The restrictions are on the parameters of the model x_t = ω_i + φ x_{t-1} + η_t, where i ∈ {1, 2} is the index of the regime occurring at time t and η_t ~ iid N(0, σ_η^2). Note that NI refers to a parameter that is not identified in this model; these parameters have no effect on the log-likelihood.

    Model    ω_2          φ       σ_η^2
    CKLS     ω_2 = ω_1    NI      0
    SV       ω_2 = ω_1    none    none
    MS       none         NI      0
    MSSV     none         none    none

The reason for this approach is consistency. Our technique for estimating the stochastic volatility and MSSV models requires the model to be written in state-space form, and then the parameters are estimated by quasi-maximum likelihood. The CKLS and Markov-switching models, however, can be estimated by full maximum likelihood. By also writing these two models in state-space form, we use the same approximation as in the stochastic volatility and MSSV models. This guarantees that our results are not biased against either the stochastic volatility or the MSSV model and therefore cannot be attributed to the use of different estimation techniques.
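As a rough illustration of the common state-space form, the following sketch evaluates the Gaussian quasi-log-likelihood with a scalar Kalman filter for the single-regime case. It is a minimal sketch under stated assumptions, not the procedure of Appendix 1.A: the name sv_quasi_loglik, its argument names, and the initialization at the stationary mean and variance of x_t (which requires |φ| < 1) are choices made here for illustration.

```python
import math

H = math.pi ** 2 / 2.0      # variance of a log-chi-squared(1) variable
MEAN_LOG_CHI2 = -1.2704     # its mean, as used in the observation equation


def sv_quasi_loglik(y, log_r_lag, omega, phi, sig2_eta, gamma):
    # Observation: y_t = x_t + 2*gamma*log r_{t-1} - 1.2704 + xi_t
    # State:       x_t = omega + phi*x_{t-1} + eta_t, Var(eta_t) = sig2_eta
    # Assumption: initialize at the stationary moments of x_t (|phi| < 1).
    x_pred = omega / (1.0 - phi)
    p_pred = sig2_eta / (1.0 - phi ** 2)
    loglik = 0.0
    for yt, lr in zip(y, log_r_lag):
        v = yt - (x_pred + 2.0 * gamma * lr + MEAN_LOG_CHI2)  # forecast error
        f = p_pred + H                                        # its variance
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        k = p_pred / f                                        # Kalman gain
        x_filt = x_pred + k * v                               # updated state
        p_filt = p_pred * (1.0 - k)
        x_pred = omega + phi * x_filt                         # next forecast
        p_pred = phi ** 2 * p_filt + sig2_eta
    return loglik
```

Setting sig2_eta to zero removes the stochastic volatility term, so the state stays at its initial value and the function reduces to a constant-intercept Gaussian quasi-likelihood, which mirrors how the CKLS restriction appears in Table 1.2.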
Thus, all models require the residuals from the model Δr_t = a + b r_{t-1} + e_t, because the observable variable used in the estimation of all models is y_t = log e_t^2. Ordinary least squares is used to obtain consistent estimates of the mean parameters a and b. These estimates are reported in Table 1.3. They are broadly consistent with the GMM estimates of CKLS. Both sets of estimates indicate that short-term interest rates are mean reverting (b < 0), although the mean reversion implied by the point estimate of b is much lower than in CKLS. The GMM estimates of CKLS imply an unconditional mean of the spot treasury-bill interest rate of 0.0689, whereas the ordinary least squares estimates imply 6.8436 percent. The difference is simply due to the use of percentages here rather than decimals as in CKLS; therefore, the results are comparable.

Table 1.3: Parameter Estimates of Mean Equation
Ordinary least squares estimates are of the discretized diffusion model Δr_t = a + b r_{t-1} + e_t. The standard errors are estimated using the procedure of White (1980a) and are therefore consistent in the presence of unspecified heteroscedasticity.

    Parameter    Estimate    Standard error
    a             0.3476     (0.1842)
    b            -0.0508     (0.0322)

1.4.1 Basic Diffusion Model
Table 1.4 reports the maximum likelihood estimates of the CKLS model. These results indicate significantly less mean reversion than implied by the GMM estimates of CKLS. The estimate of the elasticity of volatility with respect to the level of interest rates is very similar to the CKLS GMM estimate: well in excess of unity. Finally, note that the coefficient on lagged volatility is lower when estimated by maximum likelihood than when estimated by GMM, although it is comparable with some of the restricted models. However, neither estimation technique can reject the null hypothesis that σ^2 = 0.
A striking result is the comprehensive rejection of the nondiffusion model in favor of the diffusion alternative. The robust Lagrange multiplier (RLM) test statistic for this null hypothesis (γ = 0) is 39.5349, which is distributed as a chi-squared random variable with one degree of freedom. We can thus reject the nondiffusion restriction at any normal significance level.
One of the more popular term structure models is that of Cox et al. (1985) (CIR), who set γ = 0.5, which implies a square-root process for interest rates. Gray (1996), in his generalized regime-switching model, chose to set γ = 0.5 to conform with CIR. To determine how appropriate this restriction is, a restricted form of the diffusion model is estimated in which this restriction is imposed while all other parameters are estimated freely. This restricted specification is estimated for all the other types of model considered in this essay.

Table 1.4: Maximum Likelihood Parameter Estimates of CKLS Model

    Parameter    Estimate    Standard error
    a             0.2216     (0.0909)
    b            -0.0308     (0.0177)
    σ^2           0.0031     (0.0012)
    γ             1.3066     (0.1001)

Table 1.5: Parameter Estimates of CKLS Model
Parameter estimates and robust standard errors are presented for a basic diffusion and a constant variance model of interest rate volatility. Both models are of the general form y_t = ω + 2γ log r_{t-1} - 1.2704 + ξ_t. In the diffusion model, γ is estimated as a free parameter; in the nondiffusion model, γ is restricted to equal 0.

    Parameter         Diffusion model    Nondiffusion model    Restricted model
    ω                   -6.7227            -1.4198               -3.2464
                        (0.6896)           (0.1540)              (0.1470)
    γ                    1.4515             0                     0.5
                        (0.1788)            -                     -
    Log-likelihood    -715.2661          -750.3347             -730.3362
    Robust LM             -                39.5349               21.5914

The RLM test statistic for the CIR restriction takes the value 21.5914, which is highly significant given that it has a one-degree-of-freedom chi-squared distribution.
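The significance statements for these one-degree-of-freedom statistics can be checked with a few lines of standard-library code. This sketch relies only on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable; the function name chi2_sf_1df is hypothetical.

```python
import math


def chi2_sf_1df(stat):
    # Upper-tail probability of a chi-squared(1) variable.
    # If X = Z^2 with Z standard normal, then
    # P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# RLM statistics reported for the nondiffusion and CIR restrictions.
for name, stat in [('gamma = 0', 39.5349), ('gamma = 0.5', 21.5914)]:
    print(name, stat, chi2_sf_1df(stat))
```

Both p-values are far below conventional significance levels, in line with the rejections described in the text.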
Thus the basic model rejects both the no-diffusion and the CIR restrictions on the relationship between the level of interest rates and interest rate volatility.
Figure 1.3 plots the conditional standard deviation from the basic diffusion model against the absolute residual. Although the periods of high volatility associated with the Federal Reserve experiment and the oil crisis are also periods of relatively high interest rates, these are periods when the basic diffusion model underpredicts interest rate volatility. This is likely because there are some periods of high interest rates that are not associated with higher volatility, for example the late 1980s and the early 1990s. The basic diffusion model assumes that the relationship between the level of interest rates and volatility is the same through time, yet this is clearly not the case. For this reason it may be profitable to consider models that allow this relationship to change over time, as do the models considered in the remainder of this section.

1.4.2 Stochastic Volatility
As discussed previously, Ball and Torous (1999) presented an extension of CKLS that allows the log-volatility to evolve stochastically over time. The results of fitting such a stochastic volatility model using the quasi-maximum likelihood method are presented in Table 1.6. The nondiffusion null hypothesis is rejected in the stochastic volatility model. The relevant RLM statistic is 7.9221 (also distributed as chi-squared with one degree of freedom), which indicates a clear rejection of the null hypothesis of no diffusion. The CIR restricted model is also rejected by the data, because the test statistic for this hypothesis is 4.4348, which has a p-value of 0.035.

Table 1.6: Parameter Estimates of Stochastic Volatility Model
Parameter estimates and robust standard errors are presented for three types of stochastic volatility model of interest rate volatility.
All three models are of the general form y_t = x_t + 2γ log r_{t-1} - 1.2704 + ξ_t, where x_t = ω + φ x_{t-1} + η_t. In the diffusion model, γ is estimated as a free parameter; in the nondiffusion model, γ is restricted to equal 0.

    Parameter         Diffusion model    Nondiffusion model    Restricted model
    ω                   -3.3789            -0.0884               -0.2717
                        (3.9599)           (0.0539)              (0.1808)
    φ                    0.4943             0.9407                0.9172
                        (0.5612)           (0.0313)              (0.0536)
    σ_η^2                0.8418             0.1991                0.1765
                        (1.1635)           (0.1265)              (0.1552)
    γ                    1.4407             0                     0.5
                        (0.2153)            -                     -
    Log-likelihood    -710.3453          -719.8013             -715.5016
    Robust LM             -                 7.9221                4.4348

The null hypothesis that is used in comparing the basic diffusion model with the more general stochastic volatility model is that σ_η^2 = 0. However, under this null hypothesis, φ is no longer identified and can take literally any value without changing the likelihood function. For this reason, the information matrix becomes singular and standard asymptotic distribution theory no longer applies. Therefore, we are no longer able to assert that the Wald and likelihood ratio test statistics are asymptotically chi-squared. There are techniques for handling hypothesis testing when some parameters are not identified under the null hypothesis (Davies 1977, 1987), but these are extremely computationally intensive, so we calculate the empirical p-values of the standard likelihood ratio test. The log-likelihood of the stochastic volatility model is a sizable improvement over the basic diffusion model: the likelihood ratio statistic, calculated in the usual fashion, is equal to 9.8416. The asymptotic distribution of this statistic is unknown, but a Monte Carlo study with 1000 replications gave an empirical p-value of 0.043. The basic diffusion model is thus rejected in favor of the stochastic volatility model, which supports the similar conclusion reached in Ball and Torous (1999).
Figure 1.4 plots the conditional standard deviation from the stochastic volatility model against the absolute residual.
When we compare this with the basic diffusion model in Figure 1.3, the plots are very similar. Both models underpredict the size of volatility in both the oil crisis and the Federal Reserve experiment. However, although it is difficult to see with the naked eye, the predicted volatility from the stochastic volatility model seems to fit better than that from the basic diffusion model. This issue is explored in more detail in Section 1.5.

1.4.3 The Markov-Switching Diffusion Model
One of the first issues to be established is whether the parameters describing the conditional mean dynamics, a and b, should be allowed to be regime dependent. To examine this issue, two nested Markov-switching diffusion models were estimated and a likelihood ratio test was performed on the null hypothesis that a and b are the same in both regimes. The more general model allows a and b to depend on which regime the economy is in, whereas the null model restricts a and b to be the same in both regimes. The likelihood ratio statistic for the null hypothesis of mean parameter regime independence was calculated as 0.6843, which is asymptotically distributed as a chi-squared random variable with two degrees of freedom because two parameter restrictions are imposed under the null hypothesis. Thus, the null hypothesis that the mean parameters are regime independent cannot be rejected at acceptable levels, so we can proceed with estimating the regime-independent mean parameters a and b by ordinary least squares.
Table 1.7 presents the parameter estimates of the Markov-switching diffusion model, the Markov-switching nondiffusion model, and the restricted Markov-switching diffusion model. Clearly, the Markov-switching models of short-term interest rates require a diffusion term to be included. The RLM statistic testing the restriction that γ = 0 is 10.9254, so the restriction is comprehensively rejected at any reasonable significance level.
However, the square-root specification of CIR, which sets γ = 0.5, cannot be rejected by the data. The p-value of the RLM test statistic (which is 2.2222) is only 0.1360. The reason for the high p-value is the relative imprecision with which the elasticity parameter γ is estimated. Its standard error is 0.3443, which is very large when compared with the parameter's point estimate of only 0.9200.
To test the Markov-switching model against the simpler basic diffusion model, we construct a simple likelihood ratio test. The value of this likelihood ratio statistic is 4.1648. Unfortunately, the exact distribution of this statistic is unknown because the transition parameters are unidentified under the null hypothesis. We therefore calculate an empirical p-value by running a simple Monte Carlo study. The empirical p-value is 0.095, and we are thus able to reject the null hypothesis of only one regime at the 10% significance level.

Table 1.7: Parameter Estimates of Markov-Switching Model
Parameter estimates and robust standard errors are presented for three types of Markov-switching model of interest rate volatility. All three models are of the general form y_t = x_t + 2γ log r_{t-1} - 1.2704 + ξ_t, where x_t = ω_i and i ∈ {1, 2} refers to the discrete state of the economy at time t. In the diffusion model, γ is estimated as a free parameter; in the nondiffusion model, γ is restricted to equal 0.
    Parameter         Diffusion model    Nondiffusion model    Restricted model
    p                    0.9919             0.9928                0.9929
                        (0.0065)           (0.0051)              (0.0052)
    q                    0.9483             0.9531                0.9537
                        (0.0414)           (0.0270)              (0.0278)
    ω_1                 -5.0855            -1.9056               -3.6348
                        (1.1720)           (0.1787)              (0.1759)
    ω_2                 -2.9990             1.4198               -0.9806
                        (1.7180)           (0.2880)              (0.2887)
    γ                    0.9200             0                     0.5
                        (0.3443)            -                     -
    Log-likelihood    -711.6426          -719.1908             -713.1184
    Robust LM             -                10.9254                2.2222

It is interesting that the estimated elasticity parameter γ is much lower in the Markov-switching model (0.9200) than in either the stochastic volatility model (1.4407) or the basic diffusion model (1.4515). This supports Duffee's (1993) conclusion that failing to account for the structural break (the regime shift in our model) due to the Federal Reserve experiment spuriously inflates the estimated elasticity of volatility.
Figure 1.5 plots the conditional standard deviation from the Markov-switching diffusion model against the absolute residual. This figure is very similar to the figures for the basic diffusion and stochastic volatility models, yet it is apparent that the Markov-switching model outperforms the two previous models, especially in the period of the Federal Reserve experiment. This assertion is further borne out by the comparison of the specification tests in Table 1.9 in Section 1.5.
Figure 1.6 plots the smoothed and forecast probability of the low-volatility regime over the sample period. Note that we identify two periods of high volatility, in late 1974 and the early 1980s. These two periods correspond to the OPEC oil crisis and the Federal Reserve experiment. Thus, our model supports the conclusion of Gray (1996) and Cai (1994), who argued for the need for a second regime to adequately model the dynamics of short-term interest rates.

[Figure 1.5: Conditional standard deviation from the Markov-switching diffusion model and the absolute residual]
[Figure 1.6: Smoothed and forecast probability of the low-volatility regime, Markov-switching diffusion model]

1.4.4 A Markov-Switching Stochastic Volatility Diffusion Model
Table 1.8 contains the parameter estimates of the Markov-switching stochastic volatility model. First, the estimate of the elasticity of volatility, γ = 1.0205, is very similar to that of the Markov-switching model. Second, it is clear that the MSSV model still requires the inclusion of a diffusion term because the RLM statistic testing the no-diffusion null hypothesis is 7.7127. Also, the restriction imposed by CIR is not rejected by the data. The RLM statistic is 2.3354, which has a p-value of 0.1360, and therefore the restriction cannot be rejected at reasonable significance levels. As in the Markov-switching model, the reason for this is the imprecision with which the diffusion parameter is estimated. Thus the MSSV model suggests that when fitting a model of the short-term interest rate, one should allow for a second regime and include a diffusion term. The final implication is that the elasticity of volatility should be set at unity, although we cannot reject the CIR model.
The next issue is whether we need to include both Markov-switching and stochastic volatility effects to model interest rates. The answer to this question is somewhat ambiguous. First, note that traditional hypothesis testing procedures are not applicable because some of the parameters are not identified under the null hypothesis of only Markov-switching or only stochastic volatility. However, the likelihood ratio statistics are not large in either case: 4.1648 and 1.5702, respectively. One could calculate the empirical distribution of these test statistics using Monte Carlo methods, but the likelihood surface in the MSSV model is extremely bumpy with numerous local maxima, and we require many starting values to be sure that we have attained the global maximum.
In a Monte Carlo study, this is not feasible. Second, in estimating the models fitted to this data, the choice of starting value is critical, because many different starting values force the maximization algorithm to look in impossible regions of the parameter space that result in an infinite likelihood, even when restrictions are imposed. For these reasons we choose not to perform such a study. Instead we resort to an informal discussion. For the Markov-switching model the null hypothesis is that σ_η^2 = 0; the corresponding t-statistic is 1.7064, which would be only marginally significant even if there were no unidentified parameters. Another point to note is that the Markov-switching model outperforms the MSSV model both out of sample and in sample. For these reasons we favor the more parsimonious Markov-switching model. Second, the likelihood ratio statistic for the stochastic volatility model results in a p-value of 0.2102, which would not be significant if we resorted to traditional hypothesis testing procedures.

Table 1.8: Parameter Estimates of Markov-Switching Stochastic Volatility Model
Parameter estimates and robust standard errors are presented for three types of Markov-switching stochastic volatility models of interest rate volatility. All three models are of the general form y_t = x_t + 2γ log r_{t-1} - 1.2704 + ξ_t, where x_t = ω_i + φ x_{t-1} + η_t [...]

    Parameter         Diffusion model    Nondiffusion model    Restricted model
    [...]
    ω_2                 -2.7620             0.9193               -0.7785
                        (2.3400)           (0.6366)              (0.4697)
    φ                    0.1971             0.3809                0.1911
                        (0.4496)           (0.4157)              (0.3966)
    σ_η^2                0.8513             0.8901                0.8412
                        (0.7502)           (0.8894)              (0.7459)
    γ                    1.0205             0                     0.5
                        (0.6151)            -                     -
    Log-likelihood    -709.5602          -715.7928             -711.0486
    Robust LM             -                 7.7127                2.3354

Figure 1.7 plots the conditional standard deviation from the Markov-switching stochastic volatility model against the absolute residual. This plot is very similar to those for the stochastic volatility and Markov-switching models. Finally, Figure 1.8 plots the smoothed and forecast probability of the low-volatility regime over the sample period.
Note that, as in the Markov-switching model, we identify two periods of high volatility, in late 1974 and the early 1980s.

1.5 Comparing the Models
The first test we use to compare the Markov-switching and the stochastic volatility models is a likelihood ratio test. As noted, these models are clearly non-nested, and traditional hypothesis testing procedures are inapplicable. Instead, we use the non-nested likelihood ratio test of Vuong (1989). Vuong's non-nested likelihood ratio test is valid in situations in which two models are competing to explain some variable, such as y in this case. Vuong (1989) showed that under certain regularity conditions
    n^{-1/2} LR_n / ω̂_n → N(0, 1) in distribution,
where LR_n = L_n^SV - L_n^MS, L_n^SV is the log-likelihood of the stochastic volatility model, L_n^MS is the log-likelihood of the Markov-switching model, n is the number of observations, and
    ω̂_n^2 = (1/n) Σ_{t=1}^{n} [log(f_SV(y_t) / f_MS(y_t))]^2 - [(1/n) Σ_{t=1}^{n} log(f_SV(y_t) / f_MS(y_t))]^2
is the variance of the likelihood ratio statistic. Thus, the correctly standardized likelihood ratio statistic converges in distribution to a standard normal random variable.
The Vuong likelihood ratio statistic for the stochastic volatility model over the Markov-switching model is 0.2979, which is not significant at traditional significance levels. Thus, although the stochastic volatility model has a slightly higher log-likelihood, the two models are not statistically significantly different.
A second way to compare the models is to compare the standardized residuals from each of the models. Table 1.9 presents a range of diagnostic tests on the standardized residuals from the four models considered. The model that performs the best in all these tests is the Markov-switching model.
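The Vuong statistic defined in this section can be computed from the per-observation log-densities of the two competing models. The following is a minimal sketch under the assumption that those log-densities are available as two equal-length lists; the function name vuong_statistic and the toy numbers in the usage line are hypothetical and are not the essay's data.

```python
import math


def vuong_statistic(logdens_a, logdens_b):
    # logdens_a[t], logdens_b[t]: log f_A(y_t) and log f_B(y_t).
    n = len(logdens_a)
    diffs = [a - b for a, b in zip(logdens_a, logdens_b)]
    lr = sum(diffs)                          # LR_n = L_n^A - L_n^B
    mean = lr / n
    # Variance of the pointwise log-likelihood ratios.
    var = sum(d * d for d in diffs) / n - mean ** 2
    # n^{-1/2} LR_n / omega_n, compared with standard normal critical values.
    return lr / math.sqrt(n * var)


# Toy usage: positive values favor model A, negative values favor model B.
z = vuong_statistic([-1.0, -2.0, -1.5, -0.5], [-1.2, -1.9, -1.7, -0.4])
```

Swapping the two arguments flips the sign of the statistic, which is why a value close to zero indicates that neither model is statistically preferred.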
The stochastic volatility model's standardized residuals are leptokurtic and display positive skewness; in the Markov-switching model, although the residuals are still non-normal (as evidenced by the Bera-Jarque statistic), the deviations from normality are less pronounced.

Table 1.9: Diagnostic Tests on the Standardized Residuals

                                 Stochastic    Markov-      Markov-switching
                    Diffusion    volatility    switching    stochastic volatility
Mean                 -0.0364      -0.0223       -0.0334      -0.0340
Variance              1.5419       1.5653        1.4819       1.4921
Skewness              0.3374       0.6877        0.0939       0.2413
(Standard error)      0.1400       0.1400        0.1400       0.1400
Kurtosis              5.7241       7.7514        5.7578       6.0131
(Standard error)      0.2801       0.2801        0.2801       0.2801
Bera-Jarque         100.4205     311.9639       97.4189     118.7231
BP(1)                 0.8778       0.2289        0.0116       0.0434
BP(12)               16.2448      15.9324       19.0878      18.0802
BP-ARCH(1)            0.6801       0.2142        0.0084       0.0335
BP-ARCH(12)          16.0252      16.0252       16.0252      16.0252
BP^2(1)               1.3669       0.0461        2.0625       1.1890
BP^2(12)             31.7113      22.3279       27.3609      30.8865

Another point to note from the tables is that all models account for very-short-term dynamics well, in that all the portmanteau test statistics at lag one are statistically insignificant, but they do not perform nearly so well with long-term behavior. This is not considered a major issue because we are mainly interested in the short-term behavior.

To determine how well the various models performed in predicting interest rate volatility, a number of specification tests were performed. The results of these tests are presented in Table 1.10. Three metrics were chosen to compare the predictive ability of the various models. The variable being predicted is the absolute value of the forecast error and the predictors are the conditional standard deviations from, respectively, the Markov-switching and stochastic volatility models. The three metrics are:

1. mean absolute error
2. root mean squared error
3. $R^2$

Table 1.10: In-Sample and Out-of-Sample Specification Tests

Sample (No. Obs)                       Statistic   Diffusion   SV       MS       MSSV
In-Sample
July 1964 - December 1989 (307)        MAE         0.3786      0.3729   0.3709   0.3775
                                       RMSE        0.5423      0.5315   0.5443   0.5581
                                       R2          0.5477      0.5656   0.5443   0.5209
July 1964 - June 1974 (120)            MAE         0.2280      0.2340   0.2283   0.2288
                                       RMSE        0.2815      0.2868   0.2803   0.2798
                                       R2          0.5773      0.5613   0.5811   0.5823
July 1974 - June 1984 (120)            MAE         0.5306      0.5118   0.5284   0.5422
                                       RMSE        0.7359      0.7140   0.7495   0.7744
                                       R2          0.5680      0.5933   0.5518   0.5215
July 1984 - December 1989 (66)         MAE         0.3762      0.3728   0.3438   0.3484
                                       RMSE        0.4846      0.4828   0.4578   0.4598
                                       R2          0.4051      0.4095   0.4690   0.4644
Out-of-Sample
January 1990 - December 1996 (84)      MAE         0.2090      0.2090   0.1925   0.1959
                                       RMSE        0.2681      0.2687   0.2517   0.2545
                                       R2          0.4627      0.4603   0.5262   0.5159

In the full sample period, the Markov-switching model outperformed all other models using all three metrics. It is also the best performing model in terms of out-of-sample predictive accuracy. Interestingly, the stochastic volatility model performs the best in the period that includes the two high-volatility periods. The Markov-switching model also outperformed the stochastic-volatility model in the other two subperiods. Thus, the general conclusion of the specification tests is that the Markov-switching model is the best at predicting interest rate volatility.²

1.6 Conclusions

This essay is concerned with modeling short-term interest rates. Given the link between short-term interest rates and the prices of bonds of various maturities, this essay can also be thought of as empirically testing various term structure models. The specific goal of this essay was to examine the effect of allowing the coefficient on lagged interest rates in discrete approximations of the stochastic differential equations of short-term interest rate dynamics to be time varying.
Two models were specifically considered: the Markov-switching model, which allows the economy to switch between periods of high and normal volatility, and the stochastic-volatility model, which allows the volatility parameter ($\sigma_t$) to vary stochastically over time. A second goal of the essay was to present a filtering algorithm that allows Markov-switching stochastic volatility models to be estimated using quasi-maximum likelihood procedures.

The major conclusion of this essay is that the Markov-switching model is preferred to the stochastic volatility model, although this conclusion is not definitive. This conclusion is based on the Markov-switching model's superior ability to predict the volatility of interest rate changes and on the bias that is evident in estimating the elasticity of volatility of interest rate changes with respect to the level of interest rates.

² If returns are normally distributed, then it would be appropriate to compare the conditional standard deviation with $\sqrt{\pi/2}$ times the absolute residual. These calculations yield qualitatively similar results, and do not alter the conclusions reached.

One conclusion of this research is that estimated values of the elasticity of the volatility of changes in the interest rate with respect to the level of interest rates from models that do not allow for regime shifts are spuriously high. CKLS estimated the elasticity parameter $\gamma$ to be 1.4999. Our estimate of $\gamma$, made by using the quasi-maximum likelihood methodology employed in this essay, is similar at 1.4515; the estimate of $\gamma$ in the single-regime stochastic volatility model is 1.4407.
However, the two models estimated in this essay that account for regime shifts, and in particular for the atypical behavior of interest rate volatility in the period of the Federal Reserve experiment, find estimates of $\gamma$ much closer to unity: the Markov-switching model estimates $\gamma = 0.9200$, and the Markov-switching stochastic volatility model estimates $\gamma = 1.0205$. This suggests that term-structure models should allow for regime shifts and that, once they do, an elasticity of approximately one is appropriate. Fortunately, an elasticity of one means that the interest rate process will be positive almost surely.

A subsidiary question asked by the research is whether it is important to include lagged volatility in the volatility function of interest rate models. A unanimous conclusion among all four models considered in this essay is that it is important to allow for a diffusion term: interest rate volatility depends on the level of lagged interest rates. This implies that term-structure models should allow volatility to depend on the level of interest rates if they are to adequately capture the dynamics of interest rates.

Our results are broadly consistent with Naik and Lee's (1997) term-structure model, which allows for regime switching. However, our results suggest that to adequately account for short-rate dynamics a term-structure model must also allow for diffusion-type effects. Future research could attempt to develop a term structure model that models interest rate volatility both as regime dependent and as a function of the level of interest rates.

1.A Appendix: Quasi-Maximum Likelihood Estimation

1.A.1 Stochastic Volatility Model

The Kalman filtering equations are presented for the stochastic volatility model discussed in Section 1.2.3. We also show how to construct the quasi-likelihood function for use in parameter estimation. We reproduce the state and observation equations used in the Kalman filter.
This set-up is general and applies to both the diffusion and the nondiffusion model presented in the empirical results. The nondiffusion model is a special case of the diffusion model with $\gamma$ set equal to 0.

$$ y_t = x_t + 2\gamma \log r_{t-1} - 1.2704 + \xi_t, $$
$$ x_t = \omega + \phi x_{t-1} + \eta_t. $$

The Kalman filter has two basic steps: constructing the linear projection and then updating that projection. Let the information set available to the econometrician at time t be denoted by $\psi_t = \{y_t, \ldots, y_0\}$. The inference regarding log-volatility conditional on information available at date $t-1$ is denoted $x_{t|t-1} = E[x_t|\psi_{t-1}]$, and the mean squared error of this forecast is denoted $p_{t|t-1} = E[(x_t - x_{t|t-1})^2|\psi_{t-1}]$. Now we are in a position to present the Kalman filtering equations. In the derivation of the Kalman filtering equations we assume that the parameters are known with certainty. Issues regarding parameter estimation are deferred until after the filtering equations have been derived. The equations to forecast the state variable and to calculate its mean squared error are as follows.

Step 1. Forecasting log-volatility:

$$ x_{t|t-1} = \omega + \phi x_{t-1|t-1}, \tag{1.11} $$
$$ p_{t|t-1} = \phi^2 p_{t-1|t-1} + \sigma_\eta^2. \tag{1.12} $$

Inference regarding the observed variable $y_t$ can now be made because

$$ E[y_t|\psi_{t-1}] = y_{t|t-1} = x_{t|t-1} + 2\gamma \log r_{t-1} - 1.2704, $$

and

$$ E[(y_t - y_{t|t-1})^2|\psi_{t-1}] = p_{t|t-1} + \frac{\pi^2}{2}. $$

Along with the assumption that $\xi_t$ is conditionally Gaussian, we can calculate the density as

$$ f(y_t|\psi_{t-1}) = \frac{1}{\sqrt{2\pi\left(p_{t|t-1} + \pi^2/2\right)}} \exp\left[-\frac{(y_t - y_{t|t-1})^2}{2\left(p_{t|t-1} + \pi^2/2\right)}\right], $$

and this can be used in quasi-maximum likelihood parameter estimation, where the log-likelihood $\ell(y;\theta)$ can be calculated as

$$ \ell(y;\theta) = \sum_{t=1}^{T} \log f(y_t|\psi_{t-1}), $$

where $\theta$ is the vector containing all relevant parameters. The maximum likelihood estimate of the parameters is then obtained by maximizing the log-likelihood numerically with respect to the parameter vector $\theta$.
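The forecasting step and the quasi-log-likelihood above, combined with the standard Kalman update that forms Step 2 of the filter, can be sketched in a few lines. This is a minimal pure-Python sketch under stated assumptions rather than the estimation code behind the reported results: the function name `sv_quasi_loglik`, the argument names, and the initial values `x0` and `p0` are hypothetical, and the disturbance variance $\pi^2/2$ is the variance of the log of a squared standard normal variate, used here as the quasi-likelihood variance of $\xi_t$.

```python
import math

MEAN_LOG_CHI2 = -1.2704           # mean of log(eps^2) for standard normal eps
VAR_LOG_CHI2 = math.pi ** 2 / 2   # variance of log(eps^2)


def sv_quasi_loglik(y, log_r_lag, omega, phi, sigma2_eta, gamma, x0, p0):
    # y[t]: observed series y_t; log_r_lag[t]: log of the lagged short rate.
    # x0, p0: assumed initial filtered state and its mean squared error.
    x_filt, p_filt = x0, p0
    loglik = 0.0
    for y_t, log_r in zip(y, log_r_lag):
        # Step 1: forecast log-volatility and its mean squared error.
        x_pred = omega + phi * x_filt
        p_pred = phi ** 2 * p_filt + sigma2_eta
        # Forecast of the observable and its forecast-error variance.
        y_pred = x_pred + 2.0 * gamma * log_r + MEAN_LOG_CHI2
        f_var = p_pred + VAR_LOG_CHI2
        err = y_t - y_pred
        # Gaussian quasi-log-density of y_t given past information.
        loglik += -0.5 * (math.log(2.0 * math.pi * f_var) + err ** 2 / f_var)
        # Step 2: update the forecast with the new observation.
        gain = p_pred / f_var
        x_filt = x_pred + gain * err
        p_filt = p_pred - gain * p_pred
    return loglik, x_filt, p_filt
```

In practice the returned log-likelihood would be passed to a numerical optimizer over the parameter vector; that step is omitted here.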
Robust standard errors can then be constructed using the Hessian-outer-product-Hessian (sandwich) estimate, where the Hessian and outer product estimates of the information matrix are calculated numerically.

Step 2. Update the forecast:

$$ x_{t|t} = x_{t|t-1} + p_{t|t-1}\left(p_{t|t-1} + \frac{\pi^2}{2}\right)^{-1}(y_t - y_{t|t-1}), \tag{1.13} $$
$$ p_{t|t} = p_{t|t-1} - p_{t|t-1}\left(p_{t|t-1} + \frac{\pi^2}{2}\right)^{-1} p_{t|t-1}. \tag{1.14} $$

It is also possible to construct a series of smoothed estimates of the state variable ($x_{t|T}$) and associated mean squared errors ($p_{t|T}$). This is done by iterating backward from $x_{T|T}$ and $p_{T|T}$, which are the output of the final iteration of Step 2 of the basic filter, on the following equations:

$$ x_{t|T} = x_{t|t} + J_t\,(x_{t+1|T} - x_{t+1|t}), \tag{1.15} $$
$$ p_{t|T} = p_{t|t} + J_t\,(p_{t+1|T} - p_{t+1|t})\,J_t \tag{1.16} $$

for $t = T-1, T-2, \ldots, 1$, where $J_t = p_{t|t}\,\phi\,p_{t+1|t}^{-1}$.

1.A.2 Markov-Switching Models

Hamilton's (1989) filter is designed to calculate the probability of observing each state at each point in the sample using a very simple recursive filter. The filter requires the probability of observing each state for the first period, and this essay uses the ergodic probability of the two-state model for this purpose. Alternative choices include arbitrarily selecting the probability or estimating it as a free parameter. If we denote the probability of remaining in each regime from one period to the next by $p = \Pr[S_t = 1|S_{t-1} = 1]$ and $q = \Pr[S_t = 2|S_{t-1} = 2]$, then the ergodic probability of state one occurring is

$$ \Pr[S_0 = 1] = \frac{1-q}{2-p-q}, $$

if there are only two states. The ergodic probability of state two occurring is calculated in a similar fashion, or simply as $\Pr[S_0 = 2] = 1 - \Pr[S_0 = 1]$. Hamilton (1994, p. 684) provided the formulas for calculating the ergodic probabilities in the more general K-regime case. Given the probability of the different states occurring at $t = 0$, Hamilton's filter proceeds by iterating forward from $t = 1, \ldots, T$ on the following steps.

Step 1.
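Calculate a forecast of the regime probabilities:

$$ \Pr[S_t = i|\psi_{t-1}] = \sum_{j=1}^{K} \Pr[S_t = i, S_{t-1} = j|\psi_{t-1}], \tag{1.17} $$

where $\Pr[S_t = i, S_{t-1} = j|\psi_{t-1}] = \Pr[S_t = i|S_{t-1} = j] \times \Pr[S_{t-1} = j|\psi_{t-1}]$. Note that $\Pr[S_{t-1} = j|\psi_{t-1}]$ is the output of the filter from the previous iteration.

Step 2. Calculate the joint density:

$$ f(y_t, S_t = i|\psi_{t-1}) = f(y_t|S_t = i, \psi_{t-1}) \times \Pr[S_t = i|\psi_{t-1}]. \tag{1.18} $$

This calculation requires that the density of $y_t$ is known conditional on a particular regime's realization. For example, if $y_t$ is Gaussian conditional on state i occurring, with mean $\mu_{it}$ and variance $\sigma_{it}^2$, where the mean and variance can depend on previous values of y (e.g. a mean that is linear in $y_{t-1}$, as is the case for the AR(1) Markov-switching diffusion model), then the conditional density is given by

$$ f(y_t|S_t = i, \psi_{t-1}) = \frac{1}{\sqrt{2\pi\sigma_{it}^2}} \exp\left[-\frac{(y_t - \mu_{it})^2}{2\sigma_{it}^2}\right]. $$

Step 3. Calculate the unconditional density by integrating out the regimes:

$$ f(y_t|\psi_{t-1}) = \sum_{i=1}^{K} f(y_t, S_t = i|\psi_{t-1}). \tag{1.19} $$

Step 4. Update the probability of observing the regimes:

$$ \Pr[S_t = i|\psi_t] = \frac{f(y_t, S_t = i|\psi_{t-1})}{f(y_t|\psi_{t-1})}. \tag{1.20} $$

This holds because $\psi_t = \{y_t, \psi_{t-1}\}$ and $\Pr[A|B] = \Pr[A, B]/\Pr[B]$.

1.A.3 Markov-Switching Stochastic Volatility Models

In the derivation of the filtering equations presented here, use is made of the following notation: let $x_{t|t-1}^{(i,j)} = E[x_t|S_t = i, S_{t-1} = j, \psi_{t-1}]$, $p_{t|t-1}^{(i,j)} = E[(x_t - x_{t|t-1}^{(i,j)})^2|S_t = i, S_{t-1} = j, \psi_{t-1}]$, $x_{t|t}^{(i)} = E[x_t|S_t = i, \psi_t]$, and $p_{t|t}^{(i)} = E[(x_t - x_{t|t}^{(i)})^2|S_t = i, \psi_t]$.

Step 1. Forecast log-volatility:

$$ x_{t|t-1}^{(i,j)} = \omega_i + \phi\, x_{t-1|t-1}^{(j)}, \tag{1.21} $$
$$ p_{t|t-1}^{(i,j)} = \phi^2 p_{t-1|t-1}^{(j)} + \sigma_\eta^2. \tag{1.22} $$

Step 2. Updating the forecast:

$$ x_{t|t}^{(i,j)} = x_{t|t-1}^{(i,j)} + p_{t|t-1}^{(i,j)}\left(p_{t|t-1}^{(i,j)} + \frac{\pi^2}{2}\right)^{-1}\left(y_t - y_{t|t-1}^{(i,j)}\right), \tag{1.23} $$
$$ p_{t|t}^{(i,j)} = p_{t|t-1}^{(i,j)} - p_{t|t-1}^{(i,j)}\left(p_{t|t-1}^{(i,j)} + \frac{\pi^2}{2}\right)^{-1} p_{t|t-1}^{(i,j)}, \tag{1.24} $$

where $y_{t|t-1}^{(i,j)} = x_{t|t-1}^{(i,j)} + 2\gamma \log r_{t-1} - 1.2704$. The conditional density can be calculated as

$$ f(y_t|S_t = i, S_{t-1} = j, \psi_{t-1}) = \frac{1}{\sqrt{2\pi\left(p_{t|t-1}^{(i,j)} + \pi^2/2\right)}} \exp\left[-\frac{\left(y_t - y_{t|t-1}^{(i,j)}\right)^2}{2\left(p_{t|t-1}^{(i,j)} + \pi^2/2\right)}\right]. \tag{1.25} $$

It is fairly clear that forecasting using this procedure will cause the number of cases to track to blow up, and the process will become entirely path dependent.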
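For reference, the basic Hamilton recursion of Section 1.A.2 (Steps 1 to 4, started from the ergodic probabilities) can be sketched as follows for the two-regime case. This is a simplified pure-Python sketch under stated assumptions, not the essay's estimation code: the function names are hypothetical, and each regime is given a constant Gaussian mean and variance, whereas the models in the essay let the conditional mean and variance depend on lagged data.

```python
import math


def ergodic_probs(p, q):
    # Ergodic probabilities of regimes 1 and 2 in a two-state Markov chain.
    pi1 = (1.0 - q) / (2.0 - p - q)
    return [pi1, 1.0 - pi1]


def normal_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def hamilton_filter(y, means, variances, p, q):
    # trans[j][i] = Pr[S_t = i | S_{t-1} = j], regimes indexed 0 and 1.
    trans = [[p, 1.0 - p], [1.0 - q, q]]
    filt = ergodic_probs(p, q)       # Pr[S_0 = i]
    loglik = 0.0
    history = []
    for y_t in y:
        # Step 1: forecast the regime probabilities.
        pred = [sum(trans[j][i] * filt[j] for j in range(2)) for i in range(2)]
        # Step 2: joint density of y_t and each regime.
        joint = [normal_pdf(y_t, means[i], variances[i]) * pred[i]
                 for i in range(2)]
        # Step 3: sum over regimes to get the density of y_t.
        dens = sum(joint)
        loglik += math.log(dens)
        # Step 4: update the regime probabilities.
        filt = [joint[i] / dens for i in range(2)]
        history.append(filt)
    return loglik, history
```

The modified filter for the Markov-switching stochastic volatility model applies the same four steps to pairs of regimes $(S_t, S_{t-1})$ rather than to single regimes.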
Kim (1994) uses a very simple approach to remove this path dependence, and this approach is followed here. At each forecasting step we use the conditional expectation of log-volatility that depends only on the state in that period. This is obtained by taking a weighted average of the output of the previous iteration:

$$ x_{t|t}^{(i)} = \frac{\sum_{j=1}^{K} \Pr[S_t = i, S_{t-1} = j|\psi_t]\, x_{t|t}^{(i,j)}}{\Pr[S_t = i|\psi_t]}, \tag{1.26} $$
$$ p_{t|t}^{(i)} = \frac{\sum_{j=1}^{K} \Pr[S_t = i, S_{t-1} = j|\psi_t]\left(p_{t|t}^{(i,j)} + \left(x_{t|t}^{(i)} - x_{t|t}^{(i,j)}\right)^2\right)}{\Pr[S_t = i|\psi_t]}, \tag{1.27} $$

where $\Pr[S_t = i|\psi_t] = \sum_{j=1}^{K} \Pr[S_t = i, S_{t-1} = j|\psi_t]$.

A slight modification to Hamilton's filter allows the regime probabilities to be calculated iteratively:

Step 1. Calculate a forecast of the regime probabilities:

$$ \Pr[S_t = i, S_{t-1} = j|\psi_{t-1}] = \Pr[S_t = i|S_{t-1} = j] \times \Pr[S_{t-1} = j|\psi_{t-1}]. \tag{1.28} $$

Note that $\Pr[S_{t-1} = j|\psi_{t-1}]$ is the output of the filter from the previous iteration.

Step 2. Calculate the joint density:

$$ f(y_t, S_t = i, S_{t-1} = j|\psi_{t-1}) = f(y_t|S_t = i, S_{t-1} = j, \psi_{t-1}) \times \Pr[S_t = i, S_{t-1} = j|\psi_{t-1}], \tag{1.29} $$

where $f(y_t|S_t = i, S_{t-1} = j, \psi_{t-1})$ is defined as previously.

Step 3. Calculate the unconditional density by integrating out the regimes:

$$ f(y_t|\psi_{t-1}) = \sum_{i=1}^{K}\sum_{j=1}^{K} f(y_t, S_t = i, S_{t-1} = j|\psi_{t-1}). \tag{1.30} $$

Step 4. Update the probability of observing the regimes:

$$ \Pr[S_t = i, S_{t-1} = j|\psi_t] = \frac{f(y_t, S_t = i, S_{t-1} = j|\psi_{t-1})}{f(y_t|\psi_{t-1})}. \tag{1.31} $$

1.A.3.1 Smoothing

We next turn our attention to issues relating to smoothing. Define $x_{t|T}^{(k,i)} = E[x_t|S_{t+1} = k, S_t = i, \psi_T]$ and $p_{t|T}^{(k,i)} = E[(x_t - x_{t|T}^{(k,i)})^2|S_{t+1} = k, S_t = i, \psi_T]$. Kim (1994) showed how to calculate smoothed regime probabilities using a very straightforward backward-iterative algorithm:

$$ \Pr[S_{t+1} = k, S_t = i|\psi_T] = \frac{\Pr[S_{t+1} = k|\psi_T] \times \Pr[S_t = i|\psi_t] \times \Pr[S_{t+1} = k|S_t = i]}{\Pr[S_{t+1} = k|\psi_t]}, \tag{1.32} $$
$$ x_{t|T}^{(k,i)} = x_{t|t}^{(i)} + J_t^{(k,i)}\left(x_{t+1|T}^{(k)} - x_{t+1|t}^{(k,i)}\right), \tag{1.33} $$

for $t = T-1, T-2, \ldots, 1$, where $J_t^{(k,i)} = p_{t|t}^{(i)}\,\phi\,\left(p_{t+1|t}^{(k,i)}\right)^{-1}$, and

$$ x_{t|T}^{(i)} = \frac{\sum_{k=1}^{K} \Pr[S_{t+1} = k, S_t = i|\psi_T]\, x_{t|T}^{(k,i)}}{\Pr[S_t = i|\psi_T]}, $$
$$ p_{t|T}^{(i)} = \frac{\sum_{k=1}^{K} \Pr[S_{t+1} = k, S_t = i|\psi_T]\, p_{t|T}^{(k,i)}}{\Pr[S_t = i|\psi_T]}. $$

The regime-independent smoothed estimates of log-volatility and the mean squared error, $x_{t|T}$ and $p_{t|T}$, can be calculated by taking a simple weighted average:

$$ x_{t|T} = \sum_{k=1}^{K}\sum_{i=1}^{K} x_{t|T}^{(k,i)} \Pr[S_{t+1} = k, S_t = i|\psi_T], $$
$$ p_{t|T} = \sum_{k=1}^{K}\sum_{i=1}^{K} p_{t|T}^{(k,i)} \Pr[S_{t+1} = k, S_t = i|\psi_T]. $$

References

Aït-Sahalia, Yacine, 1996, Testing Continuous-Time Models of the Spot Interest Rate, Review of Financial Studies 9, 385-426.
Andersen, Torben G., Hyung-Jin Chung, and Bent E. Sørensen, 1999, Efficient Method of Moments Estimation of a Stochastic Volatility Model: A Monte Carlo Study, Journal of Econometrics 91, 61-87.
Ball, Clifford A., and Walter N. Torous, 1999, The Stochastic Volatility of Short-term Interest Rates: Some International Evidence, Journal of Finance forthcoming.
Black, Fisher, and Myron Scholes, 1973, The Pricing of Options and Corporate Liabilities, Journal of Political Economy 81, 637-654.
Bollerslev, Tim, 1986, Generalized Autoregressive Conditional Heteroskedasticity, Journal of Econometrics 31, 307-327.
Bollerslev, Tim, Ray Y. Chou, and Kenneth F. Kroner, 1992, ARCH Modeling in Finance: A Review of the Theory and Empirical Evidence, Journal of Econometrics 52, 5-59.
Bollerslev, Tim, and Jeffrey M. Wooldridge, 1992, Quasi-Maximum Likelihood Estimation and Inference in Dynamic Models with Time Varying Covariances, Econometric Reviews 11, 143-172.
Broze, Laurence, Olivier Scaillet, and Jean-Michel Zakoian, 1995, Testing for Continuous-Time Models of the Short-Term Interest Rate, Journal of Empirical Finance 2, 199-223.
Cai, Jun, 1994, A Markov Model of Switching-Regime ARCH, Journal of Business and Economic Statistics.
Chan, K. C., G.
Andrew Karolyi, Francis Longstaff, and Anthony Sanders, 1992, The Volatility of Short-term Interest Rates: An Empirical Comparison of Alternative Models of the Term Structure of Interest Rates, Journal of Finance 47, 1209-1227.
Cox, John, Jonathan Ingersoll, and Stephen Ross, 1985, A Theory of the Term Structure of Interest Rates, Econometrica 53, 385-408.
Davies, R. B., 1977, Hypothesis Testing When a Nuisance Parameter is Present Only Under the Alternative, Biometrika 64, 247-254.
Davies, R. B., 1987, Hypothesis Testing When a Nuisance Parameter is Present Only Under the Alternative, Biometrika 74, 33-43.
Diebold, F. X., 1986, Testing for Serial Correlation in the Presence of ARCH, Proceedings of the Business and Economic Statistics Section of the American Statistical Association pp. 323-328.
Dueker, Michael J., 1997, Markov Switching in GARCH Processes and Mean-Reverting Stock-Market Volatility, Journal of Business and Economic Statistics 15, 26-35.
Duffee, Gregory, 1993, On the Relation Between the Level and Volatility of Short-Term Interest Rates: A Comment on Chan, Karolyi, Longstaff and Sanders, Working paper, Federal Reserve Board, Washington D.C.
Duffee, Gregory, 1996, Idiosyncratic Variation of Treasury Bill Yields, Journal of Finance 51, 527-551.
Dunsmuir, W., 1979, A Central Limit Theorem for Parameter Estimation in Stationary Vector Time Series and its Application to Models for a Signal Observed with Noise, Annals of Statistics 7, 490-506.
Durland, J. Michael, and Thomas H. McCurdy, 1994, Duration-Dependent Transitions in a Markov Model of U.S. GNP Growth, Journal of Business and Economic Statistics 12, 279-289.
Engel, Charles, and James D. Hamilton, 1990, Long Swings in the Dollar: Are They in the Data and Do Markets Know It?, American Economic Review 80, 689-713.
Engle, Robert F., 1982, Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of U.K. Inflation, Econometrica 50, 987-1008.
Engle, Robert F., Victor Ng, and Michael Rothschild, 1990, Asset Pricing with a Factor ARCH Covariance Structure: Empirical Estimates for Treasury Bills, Journal of Econometrics 45, 213-238.
Filardo, Andrew J., 1994, Business-Cycle Phases and their Transitional Dynamics, Journal of Business and Economic Statistics 12, 299-309.
Fridman, Moshe, and Lawrence Harris, 1998, A Maximum Likelihood Approach for Non-Gaussian Stochastic Volatility Models, Journal of Business and Economic Statistics 16, 284-291.
Goodwin, Thomas H., 1993, Business-Cycle Analysis with a Markov-Switching Model, Journal of Business and Economic Statistics 11, 331-339.
Gourieroux, C., A. Monfort, and E. Renault, 1993, Indirect Inference, Journal of Applied Econometrics 8, S85-S118.
Gray, Stephen F., 1996, Modeling the Conditional Distribution of Interest Rates as a Regime-Switching Process, Journal of Financial Economics 42, 27-62.
Hamilton, James D., 1989, A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle, Econometrica 57, 357-384.
Hamilton, James D., 1994, Time Series Analysis. (Princeton University Press: Princeton, USA).
Hamilton, James D., and Raul Susmel, 1994, Autoregressive Conditional Heteroscedasticity and Changes in Regime, Journal of Econometrics 64, 307-333.
Hansen, Lars Peter, 1982, Large Sample Properties of Generalized Method of Moments Estimators, Econometrica 50, 1029-1054.
Harvey, Andrew, Esther Ruiz, and Neil Shephard, 1994, Multivariate Stochastic Variance Models, Review of Economic Studies 61, 247-264.
Harvey, Andrew, and Neil Shephard, 1996, Estimation of an Asymmetric Stochastic Volatility Model for Asset Returns, Journal of Business and Economic Statistics 14, 429-434.
Hsieh, David A., 1991, Chaos and Nonlinear Dynamics: Application to Financial Markets, Journal of Finance 46, 1839-1877.
Jacquier, Eric, Nicholas G. Polson, and Peter E. Rossi, 1994, Bayesian Analysis of Stochastic Volatility Models, Journal of Business and Economic Statistics 12, 371-389.
Kim, Chang-Jin, 1994, Dynamic Linear Models with Markov-Switching, Journal of Econometrics 60, 1-22.
Layton, Allan P., and Daniel R. Smith, 2000, A Further Note on the Three Phases of the US Business Cycle, Applied Economics 32, 1133-1143.
Naik, Vasant, and Moon Hoe Lee, 1997, Yield Curve Dynamics with Discrete Shifts in Economic Regimes: Theory and Estimation, Working paper, Faculty of Commerce, University of British Columbia, Vancouver, Canada.
Sandmann, Gleb, and Siem Jan Koopman, 1998, Estimation of Stochastic Volatility Models via Monte Carlo Maximum Likelihood, Journal of Econometrics 87, 271-301.
Schaller, Huntley, and Simon van Norden, 1997, Regime Switching in Stock Market Returns, Applied Financial Economics 7, 177-192.
So, Mike K., K. Lam, and W. K. Li, 1998, A Stochastic Volatility Model with Markov-Switching, Journal of Business and Economic Statistics 16, 244-253.
Taylor, S. J., 1986, Modelling Financial Time Series. (John Wiley and Sons: Chichester, UK).
Turner, Christopher M., Richard Startz, and Charles R. Nelson, 1989, A Markov Model of Heteroscedasticity, Risk, and Learning in the Stock Market, Journal of Financial Economics 25, 3-22.
Vasicek, Oldrich, 1977, An Equilibrium Characterization of the Term Structure, Journal of Financial Economics 5, 177-188.
Vuong, Quang H., 1989, Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses, Econometrica 57, 307-333.
White, Halbert, 1980a, A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity, Econometrica 48, 817-838.
White, Halbert, 1980b, Maximum Likelihood Estimation of Misspecified Models, Econometrica 50, 1-26.

Essay 2: Conditional Coskewness and Asset Prices
2.1 Introduction

This essay presents a new approach to empirically estimating and testing nonlinear asset pricing models. The approach is inspired by Harvey's (1989) test of the CAPM and the extension to multi-factor pricing models by He et al (1996). The failure of linear pricing models has encouraged the consideration of non-linear pricing models. An alternative response to the poor performance of the CAPM is to include conditioning information to allow for risk and prices of risk to change over time. Neither of these approaches has been entirely satisfactory in explaining the cross-section of equity returns. This essay combines these two approaches by developing a conditional non-linear pricing model which improves on the performance of both conditional linear and unconditional non-linear pricing models.

The simple, linear single-factor CAPM only holds under very specific conditions. When asset returns are non-normal and investors have non-quadratic preferences, investors will care about all return moments and not only the mean and variance, as the CAPM assumes. There are a number of extensions to the basic two-moment CAPM which predict a CAPM-like linear relationship in which terms like coskewness (Kraus and Litzenberger (1976)) and cokurtosis (Fang and Lai (1997)) risks are priced. To date no true conditional test of such a pricing model has been undertaken. A recent paper by Dittmar (2002), which is related to the three-moment CAPM, considers a pricing kernel, or stochastic discount factor, which is a quadratic function of market returns. However, this line of research does not constitute a test of the three-moment CAPM because the model parameters are not tied down to the distribution of market returns as predicted by the CAPM. This essay attempts to fill this gap in the empirical literature.
We develop an empirical framework in the spirit of the beta method for testing asset pricing models that incorporate moments higher than variance. The approach is parsimonious: we avoid modelling any asset-specific parameters, while imposing the restrictions implied by asset pricing theory. The approach is also flexible: by a judicious parameterization we avoid placing restrictions on the dynamics of conditional covariance and coskewness; and by using generalized method of moments we avoid making restrictions on the exact distribution of returns.

This essay can be viewed as either an alternative to, or an extension of, the SDF method, which has become very popular in estimating and testing nonlinear asset pricing models. In particular we compare the results from modelling the higher moments of market returns with the SDF methodology of Dittmar (2002). The two approaches are approximations to reality, since they are derived using Taylor series expansions of either expected utility (in the three-moment CAPM) or the marginal rate of substitution (in the SDF). Which approximation conforms more closely to reality is fundamentally an empirical question.

Although the methodology developed herein can in principle be applied to an asset pricing model that prices any market return moment, the empirical component of the essay truncates consideration of higher order moments at skewness, the third moment. There are a number of reasons for this limited focus. The first is pragmatic: after mean-variance analysis, mean-variance-skewness seems like a reasonable next step. Second, there is already a reasonably large literature devoted to the role of skewness in asset prices. And lastly, we have theoretical motivations to assert that a representative agent would exhibit a preference for skewness, yet no such motivation exists for higher moments.
In fact, the third moment is the highest moment that aggregates: if each investor has a preference for skewness, then the representative agent in a Pareto-efficient allocation will too; yet no higher moments are known to share this property. Since we typically derive asset pricing models in a representative agent framework, it makes sense to consider only preferences for those moments which are known to aggregate.

We apply the framework to address a number of interesting questions, such as: Are covariances and coskewness time varying? Are covariance and coskewness risks priced? Does the price of these risks vary through time? Do investors have a preference for positive skewness, or do they exhibit aversion to negative skewness? Can preference for skewness explain the usefulness of characteristic-based factors such as size and book-to-price in explaining the cross-section of equity returns? Conversely, does coskewness explain part of the returns that these factors do not? Can the performance of the SDF methodology, as a test of factor pricing, be improved by linking the coefficients on factor returns more closely with theory?

The remainder of the essay proceeds as follows. Section 2.2 discusses the conditional two- and three-moment CAPMs, and discusses an empirical procedure useful in testing these models. Section 2.3 then discusses the data and presents some evidence on the need for conditioning information. The major results are discussed in Section 2.4. Section 2.5 presents the results from estimating a multi-factor asset pricing model that includes the SMB and HML factors. Section 2.6 compares the improvement in performance relative to the basic CAPM induced by considering three factors and three moments, while Section 2.7 conducts specification tests to determine if the models make full use of the conditioning information, and if the parameters are stable.
Section 2.8 compares the results reported here in favour of the three-moment CAPM and the usefulness of coskewness with the contradictory results of a contemporary paper by Dittmar (2002). The essay concludes in Section 2.9 with a summary and a suggestion for future research.

2.2 Model Development and Motivation

Let $\Psi_{t-1}$ denote the information set available to the market at the end of time period $t-1$. An econometrician drawing statistical inference about an asset pricing model is at a disadvantage relative to the market, having available information variables $Z_{t-1}$, an L-vector, which is a strict subset of the information available to the market. For the purposes of the current analysis, we will ignore any complications that may be caused by omitted information variables. Asset pricing models are subject to a joint testing problem, since any rejection of an asset pricing model may be due to the use of only a subset of the market's conditioning information. A second possible interpretation of any rejection is that the assumed functional form translating information into expectations may be incorrect. For simplicity, this study uses a simple linear functional form whereby conditional moments are a linear function of lagged information. Theory gives the econometrician virtually no guidance as to the nature of this unknown true functional form. One could view the chosen linear specification as a first-order approximation to this general form.

2.2.1 The Two-Moment CAPM

The conditional specification of the two-moment capital asset pricing model (CAPM) of Sharpe (1964) and Lintner (1965) predicts a very simple linear relationship between the expected return on any risky asset in excess of the conditionally risk-free return and a measure of that asset's systematic risk. This relationship is defined in terms of expected returns:

$$ E_{t-1}(r_{i,t}) = \beta_{i,t}\, E_{t-1}(r_{m,t}) \quad \text{for } i = 1, \ldots, n, \tag{2.1} $$

where $r_{i,t}$ and $r_{m,t}$ refer to the excess returns (over the return on a conditionally risk-free asset) to asset i and the market respectively, and $E_{t-1}(\cdot) = E(\cdot|\Psi_{t-1})$ is the expectation operator conditional on information available at time $t-1$. The systematic risk, or beta ($\beta$), of security i at time t is defined as:

$$ \beta_{i,t} = \frac{Cov_{t-1}(r_{i,t}, r_{m,t})}{Var_{t-1}(r_{m,t})}, $$

where $Cov_{t-1}(\cdot,\cdot)$ and $Var_{t-1}(\cdot)$ are the conditional covariance and variance operators, $Cov_{t-1}(r_{i,t}, r_{m,t}) = E_{t-1}[(r_{i,t} - E_{t-1}(r_{i,t}))(r_{m,t} - E_{t-1}(r_{m,t}))]$, and $Var_{t-1}(r_{m,t}) = E_{t-1}[(r_{m,t} - E_{t-1}(r_{m,t}))^2]$. This setup allows for an asset's beta to vary over time, as indicated by the time subscript.

One difficulty with the basic two-moment CAPM is its very poor empirical performance. Numerous studies have documented its empirical failure. For recent surveys of the literature on the failure of the CAPM, along with other asset pricing anomalies, see Campbell, Lo, and MacKinlay (1997) and Campbell (2000).

There have been a number of responses to the poor empirical performance of the CAPM. The earliest studies that documented some of the CAPM's shortcomings were unconditional. Unconditional tests of the CAPM, such as the seminal studies of Fama and MacBeth (1973) and Black, Jensen, and Scholes (1973), imposed the restriction that betas were constant over time. However, there is a large literature that documents the time-varying nature of broad-based stock index variance both in the US and internationally. Harvey (1989) and Schwert and Seguin (1990) present evidence that covariances are also time-varying. And finally, Ferson and Harvey (1991, 1993, 1999) demonstrate that betas, both US and international, are also time-varying. If the CAPM only holds conditionally, there is no reason to be surprised when an unconditional test fails.
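To make equation (2.1) and the definition of beta concrete, the sketch below computes a beta as the ratio of a covariance to a variance and then the implied expected excess return. It is an illustrative pure-Python sketch only: the function names `sample_beta` and `capm_expected_excess_return` are hypothetical, and plain sample moments over a window of excess returns are used as a stand-in for the conditional moments, which is an assumption made here for illustration and not the estimation approach of this essay.

```python
def sample_beta(asset_excess, market_excess):
    # Sample analogue of Cov(r_i, r_m) / Var(r_m) over one window of data.
    n = len(asset_excess)
    if n != len(market_excess) or n < 2:
        raise ValueError('need two equal-length series with at least 2 points')
    mean_i = sum(asset_excess) / n
    mean_m = sum(market_excess) / n
    cov = sum((ri - mean_i) * (rm - mean_m)
              for ri, rm in zip(asset_excess, market_excess)) / n
    var = sum((rm - mean_m) ** 2 for rm in market_excess) / n
    return cov / var


def capm_expected_excess_return(beta, expected_market_excess):
    # Two-moment CAPM: E(r_i) = beta_i * E(r_m), both in excess-return form.
    return beta * expected_market_excess
```

For example, an asset whose excess return is always twice the market's has a beta of two, so a 0.5 percent expected market excess return implies a 1 percent expected excess return on the asset.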
One response to the failure of unconditional tests was to test the CAPM using methodologies that relaxed the constant beta assumption. An early set of these conditional studies comprises Harvey (1989), Bodurtha and Mark (1991), Ng (1991), and Bollerslev, Engle, and Wooldridge (1988). Bodurtha and Mark (1991) and Ng (1991) considered testing the conditional CAPM using a GARCH-based specification for the variances and covariances that allowed the asset betas to vary through time. Bodurtha and Mark (1991) used the Generalized Method of Moments (GMM hereafter), while Ng (1991) pursued a maximum likelihood approach. Other studies, such as Schwert and Seguin (1990) and Ferson and Harvey (1991, 1993, 1999), allow for time-varying betas using a number of techniques, including rolling-window estimation, modelling betas as linear functions of a set of conditioning variables, and the method of Davidian and Carroll (1987). Jagannathan and Wang (1996) present yet another approach: they demonstrate that an unconditional version of a conditional CAPM will include variables that predict variation in the price of risk (such as the default spread) as priced \"factors\". Still another literature models beta as time-varying in a basic varying-parameter regression model, as used in Fabozzi and Francis (1978), Bos and Newbold (1984) and Thompson (1986).

Note that tests such as those of Bodurtha and Mark (1991) and of Ng (1991), which explicitly model covariances and prices of risk, are the most restrictive in nature. One advantage of the linear beta specification of Ferson and Harvey (1999) is that it does not impose any restrictions on the price of covariance risk. Yet another technique, which is applied in this essay, explicitly models the prices of risk while leaving the covariance dynamics unspecified. 
Neither of these last two approaches is more general or more restrictive than the other; they are simply different, since we must choose to model either covariances or prices of risk. We favour modelling prices of risk rather than betas because this approach seems to be supported by the data. Ghysels (1998) demonstrates that the linear beta specification fails Andrews' (1993) test for structural breaks, a failure which suggests that there is a misspecification. Our GMM procedure requires specifying certain moments of market returns and permits us to test this specification using a number of specification tests. The results reported below find that the three-moment CAPM specification passes the structural break tests, and that the overidentifying moment conditions associated with stock market dynamics are not rejected.

An important caveat is in order here. The strength of our chosen methodology is also a weakness. Because we avoid making restrictive assumptions about covariance dynamics (arguably the most difficult parameter in the CAPM to specify) we are unable to obtain explicit beta estimates. Without explicit betas we cannot analyze pricing errors in the traditional way: the error terms in our model are doubly ex post, in that they use not only ex-post return realizations but also the ex-post covariance realization (using the term covariance extremely loosely).

Before illustrating the empirical methodology, we should introduce some notation. The notational convention used here is to use capitals to denote actual returns and lower case to refer to excess returns. Recall that we have defined r_{i,t} as the excess return (over the risk-free return R_{f,t}) on the i-th risky asset over the period t - 1 to period t for i = 1, ..., N, and the excess return on the market portfolio is similarly defined as r_{m,t}. Conditioning is done with respect to an L-vector of information variables Z_{t-1}. 
The following notational shorthand is used for the conditional moments of the market returns:

• μ_t = E_{t-1}(r_{m,t}) is the conditional mean market return,
• σ_t^2 = E_{t-1}[(r_{m,t} - E_{t-1}(r_{m,t}))^2] is the conditional variance of market returns,
• κ_t^3 = E_{t-1}[(r_{m,t} - E_{t-1}(r_{m,t}))^3] is the conditional third central moment of market returns, which we will term the conditional skewness [1], and
• λ_t = μ_t / σ_t^2 is the conditional price of covariance risk.

[1] Traditionally, the skewness of a distribution is the third central moment standardized by the cubed standard deviation. However, when we refer to skewness in this essay it should be understood as the unstandardized third central moment. We use this slightly inaccurate simplification because the term skewness is less unwieldy than third central moment.

2.2.1.1 An Empirical Specification

We use the conditional pricing equation implied by the two-moment CAPM in equation (2.1) and the definition of each asset's conditional beta (2.2) to construct a set of testable moment restrictions on asset and portfolio returns. We can use these two equations to rewrite the pricing implications of the two-moment one-factor CAPM in terms of each asset's conditional covariance with the market and the conditional price of covariance risk:

E_{t-1}(r_{i,t}) = Cov_{t-1}(r_{i,t}, r_{m,t}) E_{t-1}(r_{m,t}) / Var_{t-1}(r_{m,t}),

or, using our simplified notation:

E_{t-1}(r_{i,t}) = Cov_{t-1}(r_{i,t}, r_{m,t}) μ_t / σ_t^2. (2.3)

All market-wide variables are now grouped together. The benefit of this seemingly innocuous rearrangement will soon become apparent.

The basic approach we employ to construct our empirical procedure is similar in spirit to that of Harvey (1989) for the basic single-factor two-moment CAPM, and to the extension to multiple-factor two-moment asset pricing models by He et al (1996). 
The approach is to construct moment conditions implied by the CAPM without explicitly modelling the covariances. This is done by defining the moment conditions as the expectation of the conditional pricing equation. The relevant parameters are then estimated by minimizing the distance between the theoretically implied moments and the sample moments. The law of iterated expectations allows us to avoid modelling conditional covariances, since

E[Cov_{t-1}(r_{i,t}, r_{m,t})] = E[E_{t-1}{(r_{i,t} - E_{t-1}(r_{i,t}))(r_{m,t} - E_{t-1}(r_{m,t}))}] = E[(r_{i,t} - E_{t-1}(r_{i,t}))(r_{m,t} - E_{t-1}(r_{m,t}))].

We are able to construct a test of the CAPM by modelling the mean and the variance of the market, leaving the exact dynamics of the conditional covariances unspecified.

One drawback of this procedure, as employed by Harvey (1989) and Ghysels (1998), is that it requires modelling the conditional mean returns on the individual securities. In most empirical studies, including the two previously mentioned, this is modelled as linear in the chosen instrumental variables. Harvey (1989) notes that this linear projection is the conditional expectation if the returns and instruments are jointly spherically invariant. However, this seems fairly restrictive. If the postulates required for the application of this method are met, why do we need the CAPM at all? If the conditions are not met, then we are only using an approximation. Fortunately, we are able to avoid modelling any asset-specific parameters and still construct a conditional test of the CAPM.

The simplification we employ to avoid modelling E_{t-1}(r_{i,t}) uses the well-known representation of the covariance between two random variables:

Cov(A, B) = E[(A - E(A))(B - E(B))] = E[AB] - E(A)E(B),

which suggests using the alternate definition of the covariance operator:

Cov(A, B) = E[A(B - E(B))]. (2.4)
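The alternate covariance definition in equation (2.4) can be checked numerically. The following is a minimal sketch in Python, not part of the original study: the function names and the simulated return series are illustrative assumptions, and unconditional sample moments stand in for the conditional moments in the text.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov_centered(a, b):
    # Textbook definition: E[(A - E(A)) (B - E(B))]
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])

def cov_one_sided(a, b):
    # Alternate definition from equation (2.4): E[A (B - E(B))]
    # Only the mean of B is needed, never the mean of A.
    mb = mean(b)
    return mean([x * (y - mb) for x, y in zip(a, b)])

# Simulated (hypothetical) market and asset excess returns.
rng = random.Random(0)
market = [rng.gauss(0.5, 4.0) for _ in range(1000)]
asset = [0.3 + 1.2 * m + rng.gauss(0.0, 2.0) for m in market]

# The two definitions agree up to floating-point rounding.
gap = abs(cov_centered(asset, market) - cov_one_sided(asset, market))
```

Because the second function never uses the mean of the asset return, the sketch illustrates why the pricing restrictions can be written without any asset-specific parameters.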
We are now able to rewrite the moment restrictions of the conditional CAPM without modelling any asset-specific parameters:

E_{t-1}[r_{i,t} - r_{i,t}(r_{m,t} - μ_t) μ_t / σ_t^2] = 0 for i = 1, ..., N. (2.5)

If the CAPM is valid, then the conditional pricing errors r_{i,t} - r_{i,t}(r_{m,t} - μ_t) μ_t / σ_t^2 will be orthogonal to the information variables used to identify the dynamics of market moments. We can use this orthogonality condition to estimate the parameters. This is done by forming the N × L orthogonality conditions:

E[(r_{i,t} - r_{i,t}(r_{m,t} - μ_t) μ_t / σ_t^2) ⊗ Z_{t-1}] = 0_{L×1} for i = 1, ..., N, (2.6)

where 0_{L×1} is an L-column vector of zeroes.

The simplest case that we can consider imposes a linear relationship between the moments and the instruments. A linear specification for the mean return on the market that is quite common in the literature (see Harvey (1989), He et al (1996), and Whitelaw (1994), among others) is:

μ_t = Z_{t-1}^T μ, (2.7)

where μ is an L-column vector of parameters (note that the instrument set includes a constant, so the first element of μ is the intercept). We can construct moment conditions from the assumption that the forecast error [2]

e_{m,t} = r_{m,t} - μ_t (2.8)

is orthogonal to the instruments that are used to form it; hence

E[(r_{m,t} - μ_t) ⊗ Z_{t-1}] = 0_{L×1}. (2.9)

[2] The error is a forecast error since the conditional mean is formed using the instrument set Z_{t-1}, which is observable in period t - 1.

We also model the standard deviation as linear in the instruments, an approach also used by Whitelaw (1994):

σ_t = Z_{t-1}^T σ, (2.10)

where σ is an L-column vector of relevant parameters. One main advantage of this specification is that the non-negativity of the variance is imposed by construction. To identify the parameters, we require that the deviation of the squared residual from its conditional expectation, the variance, e_{m,t}^2 - σ_t^2, be orthogonal to the conditioning variables. 
We construct enough moment conditions to exactly identify the variance parameters [3]:

E[(e_{m,t}^2 - σ_t^2) ⊗ Z_{t-1}] = 0_{L×1}. (2.11)

These two sets of moment conditions, (2.9) and (2.11), exactly identify the mean and variance parameters for market returns. The (N + 2) error vector

u_t = ( r_{m,t} - Z_{t-1}^T μ,
        e_{m,t}^2 - (Z_{t-1}^T σ)^2,
        r_t - r_t (r_{m,t} - Z_{t-1}^T μ) Z_{t-1}^T μ / (Z_{t-1}^T σ)^2 ), (2.12)

where r_t denotes the N-vector of asset excess returns, can be used to calculate the (N + 2)L moment constraints used in the two-moment model as

g_T = (1/T) Σ_{t=1}^{T} u_t ⊗ Z_{t-1}, (2.13)

which is the sample estimate of the moment constraint that E[u_t ⊗ Z_{t-1}] = 0_{L(N+2)×1}, or alternatively, that E[h_t] = 0_{L(N+2)×1}, where h_t = u_t ⊗ Z_{t-1}. Since there are 2L parameters to be estimated and (N + 2)L moment conditions, the system is overidentified with NL overidentifying restrictions.

[3] Note that although we follow Whitelaw (1994) by modelling the standard deviation as being linear in the instruments, we differ in that we use the variance moment restriction rather than √(π/2) times the absolute value of the residual from the conditional forecast of the market's return. Using the absolute value of the return innovation is equivalent to identifying a moment restriction on the standard deviation if the returns are Gaussian. In the context of this essay it would be inconsistent to assume Gaussian disturbances to identify the standard deviation and simultaneously model time variation in the skewness of returns.

2.2.1.2 Modelling Mean and Variance, or Mean and Price of Covariance Risk

He et al (1996) (HKNZ hereafter) present a similar approach to testing the conditional CAPM, where they explicitly model both the conditional mean of the market and the price of covariance risk (λ_t = μ_t / σ_t^2) as linear functions of the instrumental variables. 
Our approach, in contrast, is to model the mean and the standard deviation themselves, and impose the theoretical constraint that the price of covariance risk be equal to the ratio of the conditional mean to the conditional variance. Both specifications have the same number of parameters, but HKNZ's approach has only (N + 1)L moment conditions, L fewer than our approach. The difference is that our approach imposes further restrictions on the variance as well.

In their specification, HKNZ model the price of covariance risk, λ_t = μ_t / σ_t^2, as

λ_t = Z_{t-1}^T λ. (2.14)

To implement their approach one needs the conditional mean return to the market and the price of covariance risk, giving the pricing restriction on risky asset i as:

E[(r_{i,t} - r_{i,t}(r_{m,t} - μ_t) λ_t) ⊗ Z_{t-1}] = 0_{L×1}. (2.15)

One can rearrange the price of covariance risk to express the conditional variance as a function of the conditional mean and the price of covariance risk,

σ_t^2 = μ_t / λ_t, (2.16)

which has the direct testable implication

E[(e_{m,t}^2 - μ_t / λ_t) ⊗ Z_{t-1}] = 0_{L×1}. (2.17)

2.2.2 The Three-Moment CAPM

A third response to the CAPM's failure is to consider multi-moment extensions of the basic asset pricing model. It is well known that to achieve an asset pricing model that depends only on mean and variance one must impose strict distributional and/or preference restrictions. If returns are normal, or if agents have quadratic utility, then the two-moment CAPM obtains. These form two theoretical objections to using the two-moment CAPM. Firstly, returns are notoriously non-normal: they are skewed and leptokurtic. Secondly, quadratic utility implies that economic agents have increasing absolute risk aversion and will have negative marginal utility at some wealth levels, both of which are unpalatable from a behavioural perspective. For these reasons, extensions to the basic CAPM that define preferences over higher moments are considered.

We might be willing to accept the two-moment CAPM, even though its assumptions are so badly flawed, if the model worked. 
But it does not work. A large literature has developed demonstrating the failure of both unconditional and conditional versions of the basic CAPM.

There are a number of unconditional tests of the three-moment CAPM, including Kraus and Litzenberger (1976), Friend and Westerfield (1980) and Lim (1989). Harvey and Siddique (2000b) present some recent work on testing for time-variation in skewness, and Harvey and Siddique (2000a) test whether, in the context of a three-moment conditional CAPM, the market risk premium changes over time. The results from unconditional tests are mixed at best: Kraus and Litzenberger (1976) and Lim (1989) find evidence supporting the three-moment CAPM, while Friend and Westerfield (1980) find evidence inconsistent with the three-moment CAPM. Harvey and Siddique (1999) present an extensive analysis of the effect of conditional coskewness on asset prices. They find both that coskewness accounts for part of the explanatory power of the SMB and HML factors of Fama and French (1993), and that coskewness can explain part of the returns unexplained by these other factors.

The same objection to unconditional testing of the two-moment CAPM when a conditional specification is valid also holds in the three-moment world. Unfortunately, however, to date there has been no true test of the conditional three-moment CAPM. This essay fills this gap. The closest research on this topic is the non-linear pricing kernel literature, of which Dittmar (2002) is the closest in spirit to our research. Dittmar (2002) presents a test of the stochastic discount factor model, where the discount factor is a cubic function of the return on the NYSE value-weighted stock index and labour growth, à la Jagannathan and Wang (1996). Dittmar (2002) performs conditional tests, allowing the coefficients in the polynomial expansion of the aggregate investor's marginal rate of substitution to be sign-corrected functions of lagged information variables. 
He finds that one needs to include cokurtosis with labour growth rates to produce an admissible pricing kernel.

The conditional version of the three-moment CAPM is

E_{t-1}[r_{i,t}] = β_{i,t} μ_{1,t} + γ_{i,t} μ_{2,t}, (2.18)

where μ_{1,t} is the price of covariance risk, which is measured by an asset's β, and μ_{2,t} is the price of γ risk, which is measured by an asset's γ. Just as the beta of an asset is the ratio of its return covariance with the market to the market return variance, an asset's γ is defined as the ratio of the coskewness of that asset's return with the market's return to the market's skewness:

γ_{i,t} = Coskew_{t-1}(r_{i,t}, r_{m,t}) / Skew_{t-1}(r_{m,t}), (2.19)

where Coskew_{t-1}(·, ·) and Skew_{t-1}(·) are the conditional coskewness and skewness operators respectively: Coskew_{t-1}(r_{i,t}, r_{m,t}) = E_{t-1}[(r_{i,t} - E_{t-1}(r_{i,t}))(r_{m,t} - E_{t-1}(r_{m,t}))^2] and Skew_{t-1}(r_{m,t}) = E_{t-1}[(r_{m,t} - E_{t-1}(r_{m,t}))^3].

This model is a conditional form of Kraus and Litzenberger's (1976) two-period three-moment CAPM. As will be demonstrated below in Appendix 2.A, such a conditional specification is appropriate because it is equivalent to a pricing kernel which is a quadratic function of the market return.

Investors who exhibit non-increasing absolute risk aversion have a preference for positive skewness. This implies that μ_{2,t} should have the opposite sign of the conditional market skewness, κ_t^3, since in equilibrium investors are willing to sacrifice returns when the market is positively skewed, but demand a premium when returns are negatively skewed.

The three-moment CAPM has a direct implication for the market risk premium. Since β_{m,t} = γ_{m,t} = 1, the three-moment CAPM implies that the excess return on the market portfolio is

E_{t-1}(r_{m,t}) = μ_{1,t} + μ_{2,t}. (2.20)

This has some very useful implications. 
One puzzling result in the conditional asset pricing test literature is the tendency of conditional expected market returns to be negative over extended periods (see, e.g., He et al (1996) and Harvey and Siddique (2000b)). This result is inconsistent with investors being risk averse and caring only about mean and variance. However, when investors have a preference for positive skewness, if the market is sufficiently positively skewed the negative price of skewness risk may dominate the positive price of variance risk, resulting in a negative market risk premium. This idea is explored by Harvey and Siddique (2000b).

Fang and Lai (1997) develop a four-moment CAPM and present some empirical tests. Dittmar (2002) tests the empirical performance of a pricing kernel which is a cubic function of the return to a value-weighted stock index and growth in labour income. A cubic pricing kernel is equivalent to a four-moment CAPM.

The four-moment CAPM, like its two- and three-moment cousins, is derived in a representative agent framework. When all agents are risk averse, the representative agent's utility function will also exhibit risk aversion. Kraus and Litzenberger (1983, Theorem 1) prove a similar result: when all individual agents exhibit non-increasing absolute risk aversion, the representative agent in a Pareto-efficient allocation will similarly exhibit non-increasing absolute risk aversion. Thus, preference for positive skewness aggregates.

However, we know of no proof which demonstrates that aversion to kurtosis aggregates. Scott and Horvath (1980) have a proof of the direction of preferences for moments higher than variance that is quite general. Their proof relies on the assumption that agents have consistent preferences, that is, that the direction of preference for a particular moment of the distribution of random wealth does not change for different wealth levels. 
Given that this assumption holds, Scott and Horvath (1980) apply the mean value theorem to show that one can derive the direction of preferences for all higher moments recursively, and in particular that investors with a preference for mean, and an aversion to variance, will have a preference for positive skewness and an aversion to kurtosis. However, this result does not directly transfer to a representative agent framework with a Pareto-efficient allocation. It is not even clear that a representative agent will have consistent preferences over skewness.

To derive a higher-moment CAPM we must work in a representative agent framework. Therefore, even though we have strong reasons to believe that individual investors are averse to kurtosis, and returns clearly have fat tails, it is not clear that a four-moment CAPM is appropriate, since we do not know whether this preference will aggregate. For this reason the empirical focus of this essay is on the conditional three-moment CAPM. We do, however, show how our approach can be extended to still higher moment asset pricing frameworks in a fairly trivial fashion.

2.2.2.1 An Empirical Implementation

To implement the three-moment CAPM empirically, we must model the prices of covariance and coskewness risk, along with the conditional variance and skewness of the market. We continue with our linear standard deviation specification, and proceed with a similar specification for the conditional skewness of market returns: that it is linear in our instrument set [4],

κ_t^3 = Z_{t-1}^T κ. (2.21)

We also model the price of covariance risk μ_{1,t} as linear in the instrument set:

μ_{1,t} = Z_{t-1}^T μ_1. (2.22)

Investors who exhibit non-increasing absolute risk aversion will have a preference for positive skewness and an aversion to negative skewness. 
The price of γ risk, μ_{2,t}, will therefore have the opposite sign of the conditional skewness of the market's return: when the market is negatively skewed, μ_{2,t} is positive, and when the market is positively skewed, μ_{2,t} is negative. For this reason we choose to model μ_{2,t} as

μ_{2,t} = Z_{t-1}^T μ_2 + 1{Z_{t-1}^T κ > 0} δ, (2.23)

where 1{Z_{t-1}^T κ > 0} is an indicator variable taking the value one when the conditional skewness of market returns is positive, and zero otherwise. Because we include a constant in the information set, the effect of the indicator coefficient δ is to allow the constant to change depending on the sign of the conditional skewness.

We can now redefine the conditional mean of the market as μ_t = μ_{1,t} + μ_{2,t}, which, along with the redefined market return innovation e_{m,t} = r_{m,t} - μ_{1,t} - μ_{2,t}, enables us to form the mean and variance restrictions given in (2.9) and (2.11), noting the alternative form for the conditional mean:

E[(r_{m,t} - Z_{t-1}^T μ_1 - Z_{t-1}^T μ_2 - 1{Z_{t-1}^T κ > 0} δ) ⊗ Z_{t-1}] = 0. (2.24)

This system can be augmented by the similarly defined moment restrictions on the market's conditional skewness, i.e.,

E[(e_{m,t}^3 - κ_t^3) ⊗ Z_{t-1}] = 0_{L×1}. (2.25)

[4] An alternative to our choice of the linear specification is the autoregressive conditional skewness model of Harvey and Siddique (1999).

We now need some way to simplify the conditional coskewness to avoid modelling the conditional mean of the risky asset's returns, analogous to our simplification of the conditional covariance. This is achieved by expanding the conditional coskewness:

Coskew_{t-1}(r_{i,t}, r_{m,t}) = E_{t-1}[(r_{i,t} - E_{t-1} r_{i,t})(r_{m,t} - E_{t-1} r_{m,t})^2]
= E_{t-1}[r_{i,t}(r_{m,t} - E_{t-1} r_{m,t})^2] - E_{t-1}[r_{i,t}] E_{t-1}[(r_{m,t} - E_{t-1} r_{m,t})^2]
= E_{t-1}[r_{i,t}{(r_{m,t} - μ_t)^2 - σ_t^2}],

which conveniently avoids the need to model any asset-specific moments in constructing a test of the three-moment CAPM. 
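The coskewness expansion above can also be verified on simulated data. The sketch below is an illustration only: the function names and the simulated series are assumptions made for this example, and unconditional sample moments are used in place of the conditional moments of the text.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def coskew_centered(r_i, r_m):
    # Definition: E[(r_i - E r_i) (r_m - E r_m)^2]
    mi, mm = mean(r_i), mean(r_m)
    return mean([(x - mi) * (y - mm) ** 2 for x, y in zip(r_i, r_m)])

def coskew_simplified(r_i, r_m):
    # Simplified form: E[r_i {(r_m - mu)^2 - sigma^2}]
    # Only market moments (mu, sigma^2) are required.
    mu = mean(r_m)
    sigma2 = mean([(y - mu) ** 2 for y in r_m])
    return mean([x * ((y - mu) ** 2 - sigma2) for x, y in zip(r_i, r_m)])

# Simulated (hypothetical) skewed market return and a correlated asset return.
rng = random.Random(0)
market = [rng.expovariate(1.0) - 1.0 for _ in range(1000)]
asset = [0.8 * m + rng.gauss(0.0, 0.5) for m in market]

# The two forms agree up to floating-point rounding.
gap = abs(coskew_centered(asset, market) - coskew_simplified(asset, market))
```

As with the covariance case, the simplified form uses only the mean and variance of the market return, so no asset-specific moment has to be specified.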
This, along with the specification of the conditional mean, variance and skewness, is sufficient to construct the moment restrictions on the cross-sectional returns implied by the three-moment CAPM, which are

E[(r_{i,t} - r_{i,t} e_{m,t} μ_{1,t} / σ_t^2 - r_{i,t}(e_{m,t}^2 - σ_t^2) μ_{2,t} / κ_t^3) ⊗ Z_{t-1}] = 0_{L×1}, for i = 1, ..., N. (2.26)

We can combine this moment restriction with the identifying restrictions on the conditional mean (2.24), variance (2.11), and skewness (2.25) of market returns to redefine the error vector

u_t = ( r_t - r_t e_{m,t} Z_{t-1}^T μ_1 / (Z_{t-1}^T σ)^2 - r_t (e_{m,t}^2 - (Z_{t-1}^T σ)^2)(Z_{t-1}^T μ_2 + 1{Z_{t-1}^T κ > 0} δ) / (Z_{t-1}^T κ),
        e_{m,t},
        e_{m,t}^2 - (Z_{t-1}^T σ)^2,
        e_{m,t}^3 - Z_{t-1}^T κ ). (2.27)

The error vector, which now has N + 3 elements, can be used to calculate the (N + 3) × L moment conditions h_t = u_t ⊗ Z_{t-1} implied by the three-moment CAPM. The sample moments are estimated just as in the two-moment case, using equation (2.13). There are (N + 3)L moment conditions and 4L + 1 parameters to be estimated (the four L-vectors μ_1, μ_2, σ and κ, plus δ), which leaves (N - 1)L - 1 overidentifying moment restrictions that can be used in forming model specification tests.

2.2.2.2 An Alternative Empirical Specification

Just as we can specify the two-moment CAPM by modelling the conditional mean and the price of covariance risk as discussed in section 2.2.1.2, so too we can specify an empirical version of the three-moment CAPM which requires modelling the prices of covariance and coskewness risk:

λ_{1,t} = μ_{1,t} / σ_t^2 (2.28)

and

λ_{2,t} = μ_{2,t} / κ_t^3. (2.29)

We need the conditional mean of the market returns to use our modelling of the conditional covariance, and the conditional mean and variance of market returns to use the simplified representation of the conditional coskewness of each asset. We thus have four quantities to model separately. We employ a linear specification of each of these four terms (recall that it is the standard deviation and not the variance that is modelled as a linear function of the instruments), which requires 4L parameters. 
The two new parameter vectors are λ_1 and λ_2, such that

λ_{1,t} = Z_{t-1}^T λ_1 (2.30)

and

λ_{2,t} = Z_{t-1}^T λ_2. (2.31)

The previous specification required 4L + 1 parameters, so this specification is more parsimonious. The set of moment restrictions needed to estimate the parameters in this model are as in equations (2.9) and (2.11), along with the following:

E[(r_t - r_t e_{m,t} λ_{1,t} - r_t (e_{m,t}^2 - σ_t^2) λ_{2,t}) ⊗ Z_{t-1}] = 0_{NL×1}. (2.32)

A second advantage of this specification is that it is straightforward to apply the restrictions on the two co-moment risk prices. Both μ_{1,t} and the conditional variance of the market are positive, so the market's price of covariance risk must be positive: λ_{1,t} > 0. Investors have a preference for positive skewness and an aversion to negative skewness, implying that the price of γ risk will have the opposite sign of the conditional skewness of market returns, so the market's price of coskewness risk must be negative, i.e., λ_{2,t} < 0. The way this essay sign-corrects the prices of co-moment risk is:

λ_{1,t} = (Z_{t-1}^T λ_1)^2 (2.33)

λ_{2,t} = -(Z_{t-1}^T λ_2)^2. (2.34)

One difference between the market moment approach discussed earlier and the current price of co-moment approach is that the former has a moment restriction on the conditional skewness of market returns while the latter does not. The reason for this is that we need the conditional skewness of market returns to specify the price of γ risk in the market moment specification, while we explicitly model the price of coskewness risk in the current approach. However, because the price of coskewness risk is defined as the ratio of the price of γ risk to the market's conditional skewness, we can augment the current moment restrictions using the following [5]:

E[(e_{m,t}^3 - (μ_t - λ_{1,t} σ_t^2) / λ_{2,t}) ⊗ Z_{t-1}] = 0_{L×1}. (2.35)

This expanded set of moment restrictions has the same dimension as the market moment specification of the skewness CAPM test. 
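The sign corrections in equations (2.33) and (2.34) can be illustrated with a short sketch. The instrument values and parameter vectors below are hypothetical numbers chosen only for the example, and the helper names are likewise assumptions; note that squaring guarantees the weak inequalities λ_{1,t} ≥ 0 and λ_{2,t} ≤ 0.

```python
def dot(z, coef):
    # Inner product Z' * coefficient vector
    return sum(zi * ci for zi, ci in zip(z, coef))

def price_of_covariance_risk(z, lam1):
    # Equation (2.33): squaring the linear form keeps the price non-negative.
    return dot(z, lam1) ** 2

def price_of_coskewness_risk(z, lam2):
    # Equation (2.34): the negated square keeps the price non-positive.
    return -(dot(z, lam2) ** 2)

# Hypothetical lagged instruments (constant first) and parameter vectors.
z_prev = [1.0, -0.4, 0.7]
lam1 = [0.2, -0.5, 0.1]
lam2 = [-0.3, 0.1, 0.4]

p1 = price_of_covariance_risk(z_prev, lam1)
p2 = price_of_coskewness_risk(z_prev, lam2)
```

Whatever values the unrestricted parameter vectors take during estimation, the implied prices of risk keep the signs required by theory, which is why no explicit inequality constraints are needed in the optimization.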
[5] These moment restrictions follow from rewriting the price of covariance risk as μ_{1,t} = λ_{1,t} σ_t^2, using this in connection with the fact that in the three-moment CAPM μ_t = μ_{1,t} + μ_{2,t} to obtain μ_{2,t} = μ_t - λ_{1,t} σ_t^2, and using this to rewrite equation (2.29), giving the conditional skewness of market returns implied by the other parameters: κ_t^3 = (μ_t - λ_{1,t} σ_t^2) / λ_{2,t}.

2.2.3 Multi-Moment Extension

Consider a general multi-moment asset pricing model which is a direct generalization of the two- and three-moment CAPMs. Assume that the general form of our conditional multi-moment asset pricing model posits that conditional expected asset returns take the following form:

E_{t-1}(r_{i,t}) = Σ_{k=2}^{K} β^k_{i,t} b_{k,t}, (2.36)

where

β^k_{i,t} = E_{t-1}[(r_{i,t} - E_{t-1} r_{i,t})(r_{m,t} - E_{t-1} r_{m,t})^{k-1}] / E_{t-1}[(r_{m,t} - E_{t-1} r_{m,t})^k] (2.37)

is the standardized measure of risk, which is the contribution of holding each asset i to the k-th moment of a diversified portfolio. These βs have a number of well-known special cases:

• β^2_{i,t}, which is the traditional two-moment CAPM's beta;
• β^3_{i,t}, which is γ in the Kraus and Litzenberger (1976) three-moment CAPM, is the ratio of coskewness with market returns to market return skewness;
• β^4_{i,t}, which is the ratio of cokurtosis to kurtosis, is a term in the four-moment CAPM of Fang and Lai (1997).

Such a pricing equation is consistent with some asset pricing model in which a representative agent cares about the first K moments of the distribution of their portfolio. Let b_{k,t} be the time t price of the k-th moment risk. Further, denote by μ_t the expected return on the market portfolio:

E_{t-1}(r_{m,t}) = μ_t. (2.38)

Note that β^k_{m,t} = 1 for all k holds trivially, and we then have the following representation of the market return:

μ_t = Σ_{k=2}^{K} b_{k,t}. (2.39)

Note that the summation starts at k = 2, corresponding to the first priced moment risk being variance. 
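A sample analogue of the general co-moment risk measure in equation (2.37) can be sketched as follows. The function name and the simulated data are assumptions made purely for illustration, and unconditional sample moments replace the conditional ones; the sketch confirms that the market's own measure equals one for each moment order, as noted in the text.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def comoment_beta(r_i, r_m, k):
    # Sample analogue of equation (2.37) for moment order k >= 2:
    # E[(r_i - E r_i)(r_m - E r_m)^(k-1)] / E[(r_m - E r_m)^k]
    mi, mm = mean(r_i), mean(r_m)
    num = mean([(x - mi) * (y - mm) ** (k - 1) for x, y in zip(r_i, r_m)])
    den = mean([(y - mm) ** k for y in r_m])
    return num / den

# Simulated (hypothetical) skewed market return and a correlated asset return.
rng = random.Random(1)
market = [rng.expovariate(1.0) - 1.0 for _ in range(2000)]
asset = [0.5 * m + rng.gauss(0.0, 1.0) for m in market]

# k = 2, 3, 4 correspond to beta, gamma, and the cokurtosis measure.
asset_betas = [comoment_beta(asset, market, k) for k in (2, 3, 4)]
market_betas = [comoment_beta(market, market, k) for k in (2, 3, 4)]
```

Each entry of `market_betas` equals one up to rounding, which is the property that turns the pricing equation (2.36) into the market return representation (2.39).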
We will apply the same approach to forming testable moment restrictions as in the two- and three-moment CAPMs. Take the k-th summand and rewrite it as

E_{t-1}[(r_{i,t} - E_{t-1} r_{i,t})(r_{m,t} - E_{t-1} r_{m,t})^{k-1}] b_{k,t} / μ_{k,t}, (2.40)

where

μ_{k,t} = E_{t-1}[(r_{m,t} - E_{t-1}(r_{m,t}))^k], (2.41)

which is useful in rewriting (2.40) as

E_{t-1}[r_{i,t}{(r_{m,t} - μ_t)^{k-1} - μ_{k-1,t}} b_{k,t} / μ_{k,t}]. (2.42)

Taking the unconditional expectation and simplifying gives the following moment restrictions:

E[r_{i,t}(1 - Σ_{k=2}^{K} {(r_{m,t} - μ_t)^{k-1} - μ_{k-1,t}} b_{k,t} / μ_{k,t})] = 0, (2.43)

where μ_{1,t} = 0, since μ_{k,t} refers to the k-th central moment and the first central moment is zero, i.e.

E[r_{m,t} - μ_t] = 0. (2.44)

Theory permits us to sign the prices of the first four moment risks. We can augment the analysis with the instrument set Z_{t-1}, an L-dimensional vector of information variables available at time t - 1, since the pricing errors in (2.43) should be orthogonal to all such information:

E[(r_{i,t}(1 - Σ_{k=2}^{K} {(r_{m,t} - μ_t)^{k-1} - μ_{k-1,t}} b_{k,t} / μ_{k,t})) ⊗ Z_{t-1}] = 0_{L×1}. (2.45)

We also use the K × L moment restrictions implied by our linear conditional moment parameterization:

E[(r_{m,t} - Σ_{k=2}^{K} b_{k,t}) ⊗ Z_{t-1}] = 0_{L×1}, (2.46)

E[((r_{m,t} - μ_t)^k - μ_{k,t}) ⊗ Z_{t-1}] = 0 for all k. (2.47)

We therefore have a set of (N + K) × L moment conditions.

2.3 Data and Estimation Methodology

To test the empirical performance of the asset pricing models discussed above, we follow the traditional approach of using portfolios rather than individual assets, which dates back to the seminal work of Black, Jensen, and Scholes (1973) and Fama and MacBeth (1973). The principal advantage of using portfolios rather than individual assets is the reduction in measurement error associated with individual assets. We consider two portfolio formation approaches. 
The first data source consists of seventeen portfolios formed by sorting the stocks on the NYSE, AMEX and NASDAQ by industry. The second sorting procedure forms twenty-five portfolios (herein FF25), which are allocated on the basis of their size and the ratio of their book equity to their market equity. This data set has been particularly troublesome to price. It has been studied conditionally by, among others, He et al (1996), who found that a conditional two-moment CAPM and the Fama and French (1993) three- and five-factor models failed to price these portfolios, and Ferson and Harvey (1999), who used a linear beta specification in testing the performance of, again, a two-moment CAPM and the FF three-factor model.

The next step in our empirical study is to define what information variables we will consider. We use a set of six information variables which are fairly standard in the literature; in fact they are taken directly from He et al (1996), but many other studies use similar information variables. We include a constant, the S&P500 index return (S&P), the dividend price ratio on the S&P500 index, defined as the cumulative 12-month dividends divided by the current price level (Div), the term spread, measured by the difference between the yields on a three-month Treasury bill and a one-month Treasury bill (Term), the junk bond yield spread, measured as the difference in yields on Baa-rated bonds over Aaa investment grade bonds (Junk), and finally the one-month Treasury bill rate (Tbill). Each variable is standardized to have zero sample mean and unit sample variance to simplify interpretation of the parameter estimates. To ensure that the inference is conducted using only publicly available information, we use only lagged values of the information variables; hence the use of the t subscript to refer to asset and portfolio returns, e.g. 
r_t, and the use of t - 1 as the time subscript on the information variables, Z_{t-1}.

To illustrate that these variables have power to explain the industry portfolios, consider Table 2.1, where we report the results of a very simple linear regression that uses the lagged economic information variables to predict individual portfolio returns, as in

E_{t-1}(r_{i,t}) = Z_{t-1}^T θ_i for i = 1, ..., n. (2.48)

The parameters on all 17 and then 25 portfolios are estimated jointly, and the heteroscedasticity-consistent standard errors are reported in parentheses. We also report the results of a Wald test along with its asymptotic p-value. The results reported in this table are consistent with the vast majority of empirical studies, which show that economic variables are quite powerful in explaining the variation in individual portfolio returns.

The main focus of this study is on analyzing the effect of conditional covariance and coskewness on asset prices. Before proceeding with further analysis it would be useful to determine the extent to which the portfolio co-moments vary over time. To approach this question a very simple testing procedure is applied. Each co-moment is explicitly modelled as a linear function of the lagged information variables. We use the definition of the conditional covariance and coskewness operators to form the following moment conditions for covariance:

E[(ε_{i,t} ε_{m,t} - σ_{im,t}) ⊗ Z_{t-1}] = 0_{L×1} (2.49)

Table 2.1: Portfolio Return Predictability
This table presents the parameter estimates and GMM standard errors from projecting excess returns on industry and size and book-to-market sorted portfolios onto a set of six instrumental variables: E[(r_{i,t} - Z_{t-1}^T θ_i) ⊗ Z_{t-1}] = 0_{L×1} for i = 1, ..., N over the period July 1963 to December 1997.
Standard errors are in parentheses beside each estimate; the asymptotic p-value is in parentheses beside each Wald statistic.

Parameter    Constant          S&P                Div               Term               Junk              Tbill              Wald
Industry 1   0.7248 (0.2178)   -0.3013 (0.3142)   0.0903 (0.3830)   -0.6750 (0.3077)   0.9747 (0.3499)   -0.7976 (0.3483)   25.8316 (0.0002)
Industry 2   0.4771 (0.2989)   0.2886 (0.3789)    1.2207 (0.5215)   -1.1040 (0.4513)   0.4354 (0.5148)   -1.7703 (0.5253)   21.0752 (0.0018)
Industry 3   0.6338 (0.2458)   0.0283 (0.3470)    0.6303 (0.4385)   -0.4616 (0.3687)   0.2771 (0.4070)   -1.0617 (0.4781)   20.4857 (0.0023)
Industry 4   0.5645 (0.2992)   0.3296 (0.3712)    1.2589 (0.5220)   -0.9194 (0.3705)   1.2587 (0.4768)   -2.0053 (0.4502)   41.3497 (0.0000)
Industry 5   0.5641 (0.2573)   0.1118 (0.3511)    0.3971 (0.5096)   -1.1035 (0.3516)   1.0408 (0.4400)   -1.6670 (0.4143)   31.6829 (0.0000)
Industry 6   0.5131 (0.2463)   -0.1269 (0.3605)   0.2686 (0.4619)   -0.7539 (0.3487)   1.1003 (0.4236)   -1.3621 (0.4073)   22.9550 (0.0008)
Industry 7   0.7511 (0.2331)   -0.0458 (0.3289)   0.0802 (0.4355)   -0.7335 (0.3394)   0.4884 (0.3726)   -0.6089 (0.3622)   18.5021 (0.0051)
Industry 8   0.5539 (0.2808)   0.0034 (0.3785)    0.9781 (0.5104)   -0.9066 (0.4079)   0.9186 (0.4809)   -1.8459 (0.4645)   27.5108 (0.0001)
Industry 9   0.2647 (0.2851)   0.0336 (0.3803)    0.2554 (0.5159)   -0.5607 (0.4047)   0.8893 (0.4926)   -1.2372 (0.5236)   10.6757 (0.0989)
Industry 10  0.5501 (0.2481)   -0.0045 (0.3346)   0.8257 (0.4334)   -0.9154 (0.3456)   0.7092 (0.4213)   -1.6214 (0.3850)   30.9391 (0.0000)
Industry 11  0.4782 (0.2658)   0.1104 (0.3526)    0.9240 (0.5005)   -1.0816 (0.3583)   1.0130 (0.4289)   -2.1982 (0.4074)   41.4714 (0.0000)
Industry 12  0.4522 (0.2656)   0.1660 (0.3229)    0.6288 (0.4766)   -0.8898 (0.3394)   1.6179 (0.4038)   -2.1989 (0.4256)   57.4810 (0.0000)
Industry 13  0.5836 (0.2865)   0.2049 (0.3891)    1.2267 (0.5199)   -0.9892 (0.3713)   0.7772 (0.4596)   -1.9077 (0.4555)   36.4546 (0.0000)
Industry 14  0.3492 (0.1876)   -0.5594 (0.2629)   0.1582 (0.2926)   -0.6461 (0.2256)   0.7628 (0.2750)   -0.7014 (0.3238)   16.1490 (0.0130)
Industry 15  0.6180 (0.2755)   0.1161 (0.3718)    0.4485 (0.5012)   -0.7395 (0.3792)   1.3945 (0.4524)   -1.4252 (0.4488)   27.6342 (0.0001)
Industry 16  0.6192 (0.2526)   -0.1072 (0.3483)   0.4869 (0.4386)   -0.5320 (0.3268)   0.7794 (0.4210)   -1.1685 (0.3839)   20.6450 (0.0021)
Industry 17  0.4961 (0.2169)   -0.1107 (0.3089)   0.5429 (0.3968)   -0.7213 (0.2893)   0.8903 (0.3610)   -1.2989 (0.3424)   29.7533 (0.0000)
S1-BM1       0.2037 (0.3702)   0.6460 (0.4479)    2.4506 (0.6178)   -1.3905 (0.4388)   0.5331 (0.5549)   -2.8881 (0.5821)   40.9238 (0.0000)
S1-BM2       0.7141 (0.3242)   0.4656 (0.4000)    1.7798 (0.5491)   -1.1741 (0.4012)   0.7904 (0.4922)   -2.3763 (0.5183)   44.5149 (0.0000)
S1-BM3       0.8198 (0.2953)   0.3787 (0.3679)    1.4482 (0.5135)   -1.1260 (0.3622)   0.9740 (0.4727)   -2.2543 (0.4788)   46.9675 (0.0000)
S1-BM4       0.9948 (0.2771)   0.3548 (0.3612)    1.2032 (0.4778)   -1.0710 (0.3697)   0.8614 (0.4595)   -2.0297 (0.4653)   47.3004 (0.0000)
S1-BM5       1.1271 (0.2951)   0.4425 (0.3799)    1.4269 (0.5102)   -1.2424 (0.3971)   0.8220 (0.5074)   -2.2928 (0.5023)   49.8160 (0.0000)
S2-BM1       0.3821 (0.3484)   0.0992 (0.4309)    1.7609 (0.6079)   -1.2868 (0.4276)   0.8400 (0.5215)   -2.5149 (0.5306)   34.9017 (0.0000)
S2-BM2       0.6772 (0.2947)   0.1332 (0.3852)    1.5589 (0.5337)   -1.0412 (0.3782)   0.9467 (0.4598)   -2.2047 (0.4693)   45.4474 (0.0000)
S2-BM3       0.9374 (0.2682)   0.0378 (0.3522)    1.0865 (0.4808)   -1.0347 (0.3462)   1.0407 (0.4482)   -2.0019 (0.4302)   50.8823 (0.0000)
S2-BM4       1.0147 (0.2500)   0.0429 (0.3366)    0.9763 (0.4407)   -0.9638 (0.3360)   0.9018 (0.4227)   -1.7180 (0.4116)   47.8487 (0.0000)
S2-BM5       1.0777 (0.2839)   0.0743 (0.3623)    1.2290 (0.4921)   -1.0013 (0.3690)   0.7734 (0.5001)   -1.8455 (0.4853)   39.7271 (0.0000)
S3-BM1       0.4410 (0.3159)   0.0196 (0.4146)    1.4662 (0.5523)   -1.0420 (0.3990)   0.9730 (0.4842)   -2.2726 (0.4864)   33.6775 (0.0000)
S3-BM2       0.7267 (0.2646)   0.0014 (0.3518)    1.1293 (0.4807)   -0.9286 (0.3458)   1.0642 (0.4363)   -1.9549 (0.4179)   45.0165 (0.0000)
S3-BM3       0.7697 (0.2408)   0.0117 (0.3225)    0.9423 (0.4307)   -0.8856 (0.3227)   0.8289 (0.4104)   -1.7341 (0.4015)   44.0586 (0.0000)
S3-BM4       0.9006 (0.2302)   -0.0101 (0.3136)   0.7537 (0.3966)   -0.8543 (0.3152)   0.8228 (0.3938)   -1.5396 (0.3844)   43.6049 (0.0000)
S3-BM5       1.0251 (0.2655)   -0.0048 (0.3664)   0.9827 (0.4738)   -0.8958 (0.3721)   0.7449 (0.4795)   -1.5320 (0.4523)   35.8926 (0.0000)
S4-BM1       0.4768 (0.2780)   -0.1081 (0.3696)   1.0633 (0.4801)   -1.0029 (0.3818)   0.8569 (0.4408)   -1.8173 (0.4326)   29.0162 (0.0001)
S4-BM2       0.5049 (0.2503)   -0.0263 (0.3489)   0.9644 (0.4580)   -0.8431 (0.3398)   0.9447 (0.4145)   -1.7503 (0.3962)   36.9114 (0.0000)
S4-BM3       0.7573 (0.2327)   -0.1347 (0.3398)   0.7021 (0.4187)   -0.7972 (0.3298)   0.9602 (0.3945)   -1.5363 (0.3913)   38.5737 (0.0000)
S4-BM4       0.8656 (0.2219)   -0.2094 (0.3008)   0.6032 (0.3829)   -0.8170 (0.2942)   0.9314 (0.3775)   -1.5744 (0.3710)   39.9430 (0.0000)
S4-BM5       0.9581 (0.2569)   0.0296 (0.3531)    0.8627 (0.4297)   -0.9143 (0.3561)   0.9158 (0.4419)   -1.6194 (0.4386)   38.0299 (0.0000)
S5-BM1       0.4743 (0.2252)   0.0315 (0.3224)    0.2809 (0.4243)   -0.8766 (0.3251)   0.8125 (0.3655)   -1.2478 (0.3502)   26.7015 (0.0002)
S5-BM2       0.4814 (0.2154)   -0.1517 (0.2987)   0.3667 (0.4090)   -0.7327 (0.2889)   0.9192 (0.3551)   -1.2356 (0.3569)   29.2772 (0.0001)
S5-BM3       0.4691 (0.1928)   -0.2268 (0.2807)   0.3811 (0.3567)   -0.4935 (0.2612)   0.6590 (0.3151)   -1.1396 (0.3339)   25.2222 (0.0003)
S5-BM4       0.6492 (0.1905)   -0.1758 (0.2867)   0.2113 (0.3364)   -0.4726 (0.2436)   0.7884 (0.3174)   -1.0011 (0.3174)   29.4554 (0.0000)
S5-BM5       0.7637 (0.2219)   -0.1006 (0.3116)   0.2218 (0.3685)   -0.7564 (0.2861)   0.9230 (0.3527)   -1.0899 (0.3389)   32.2912 (0.0000)
and for coskewness:

E[(ε_{i,t} ε_{m,t}^2 - κ_{im,t}) ⊗ Z_{t-1}] = 0_{L×1}. (2.50)

To implement this test we need the market and portfolio return residuals ε_{m,t} = r_{m,t} [...]

Null Hypothesis        σ_i = 0    σ_i^(2-6) = 0   σ'_i = 0   σ'_i^(2-6) = 0   σ_i = σ'_i
S4-BM3                 141.9376   25.7475         142.8737   28.9491          3.6864
S4-BM4                 147.3045   21.6469         148.5868   24.1601          3.4748
S4-BM5                 126.6146   18.2108         129.0533   21.9987          3.2772
S5-BM1                 143.8768   18.0848         139.9010   19.4355          2.4857
S5-BM2                 157.0880   25.2237         157.0778   27.8773          3.7308
S5-BM3                 145.0289   20.4316         146.9290   22.3938          2.8766
S5-BM4                 152.8569   18.5939         151.8551   20.3178          2.4922
S5-BM5                 135.1814   17.4915         135.8254   20.6972          3.5634
10% critical value     10.6446    9.2364          10.6446    9.2364           10.6446
5% critical value      12.5916    11.0705         12.5916    11.0705          12.5916
1% critical value      16.8119    15.0863         16.8119    15.0863          16.8119

Table 2.3: Predictability of Conditional Coskewness
This table presents Wald statistics for the null hypothesis of zero and constant conditional coskewness for two alternate specifications (resp. cols 2 and 4, and 3 and 5); and a test for the equivalence of the specifications (in col. 6) for both the industry and FF25 data sets. The test statistics in columns 1, 3 and 5 are distributed as chi-squared with 6 degrees of freedom, and the statistics in columns 2 and 4 have 5 degrees of freedom.
Null Hypothesis        κ_i = 0    κ_i^(2-6) = 0   κ'_i = 0   κ'_i^(2-6) = 0   κ_i = κ'_i
Industry 1             11.4221    9.8132          12.2664    11.0306          2.3845
Industry 2             16.7075    15.4475         15.3192    14.3621          4.6562
Industry 3             11.1959    10.9013         10.5747    10.4208          4.7754
Industry 4             9.6256     9.2014          9.9925     9.5982           1.2912
Industry 5             11.2472    10.1716         10.2665    9.6924           4.5751
Industry 6             10.6372    10.4857         10.6374    10.5516          4.3061
Industry 7             6.3022     4.7167          5.3623     4.3641           2.9054
Industry 8             13.8846    13.6258         14.7880    14.7487          2.5705
Industry 9             13.2540    13.0011         13.1776    12.8005          2.6184
Industry 10            13.3198    12.7113         13.5346    13.2649          4.4177
Industry 11            10.4639    10.2284         9.4872     9.3827           3.7191
Industry 12            13.1802    11.6146         14.6456    14.3102          4.8075
Industry 13            13.7167    13.3943         13.1584    13.1061          3.9287
Industry 14            15.5776    12.9536         13.7123    11.8435          4.3022
Industry 15            6.4961     5.8630          6.8720     6.2955           4.0636
Industry 16            10.5049    9.4303          11.3799    10.7066          3.9468
Industry 17            11.8031    10.8348         12.1275    11.4979          4.2151
S1-BM1                 12.8052    12.4478         12.3800    12.2712          4.0884
S1-BM2                 11.5990    11.2952         11.6601    11.2917          3.4446
S1-BM3                 11.4788    11.0150         12.1422    11.6983          3.2478
S1-BM4                 12.0609    11.3369         12.6105    11.7389          3.0514
S1-BM5                 12.4003    11.4241         13.2081    11.9092          3.5175
S2-BM1                 12.4372    12.2779         12.1349    12.1246          4.0445
S2-BM2                 11.3686    10.8347         11.9725    11.5884          2.0027
S2-BM3                 12.0991    11.5821         13.5147    13.2464          3.4557
S2-BM4                 12.7791    11.9154         13.6639    12.8019          2.9953
S2-BM5                 12.1559    11.0156         13.3770    12.0952          2.5235
S3-BM1                 12.2432    11.9765         12.4375    12.3508          3.5380
S3-BM2                 12.6984    12.2001         13.7657    13.5045          2.6660
S3-BM3                 12.3032    11.2531         13.6486    12.9916          2.6154
S3-BM4                 12.2776    11.4620         13.6275    12.9415          3.1284
S3-BM5                 12.5209    11.9842         14.0354    13.5444          2.0566
S4-BM1                 11.2680    10.5854         10.7989    10.5656          4.1717
S4-BM2                 12.3831    11.8895         13.1282    12.9326          2.3273
S4-BM3                 11.4907    10.6329         12.6646    12.1686          1.3378
S4-BM4                 12.9956    11.5639         13.7875    12.7208          4.0326
S4-BM5                 11.7909    10.7642         13.6117    12.7047          1.9881
S5-BM1                 8.6456     8.0538          7.7884     7.5392           5.1517
S5-BM2                 10.4718    9.6527          10.3957    10.0210          4.9463
S5-BM3                 14.9645    14.5276         14.9099    14.7728          7.0990
S5-BM4                 11.1459    10.2037         11.7892    11.1310          4.9697
S5-BM5                 14.1540    13.1814         14.8738    14.3921          4.0084
10% critical value     10.6446    9.2364          10.6446    9.2364           10.6446
5% critical value      12.5916    11.0705         12.5916    11.0705          12.5916
1% critical value      16.8119    15.0863         16.8119    15.0863          16.8119

2.4 Empirical Results

Table 2.4 presents the parameter estimates of the two-moment CAPM where the price of covariance risk and the variance of market returns are modelled explicitly. The parameters are estimated using Hansen's (1982) generalized method of moments (GMM) with the efficient weighting matrix iterated until convergence. Except where explicitly stated to the contrary, all models are estimated using this procedure.

The positive coefficient on μ in both data sets indicates that the unconditional price of covariance risk is positive as expected, although the coefficient is not significant in the industry data. It is interesting that the vast majority of the coefficients in the mean equation have the opposite sign to those in the standard deviation equation
Table 2.4: Parameter Estimates of the Two-Moment Model
Parameter estimates and standard errors are reported for the test of the conditional two-moment CAPM with linear means and variances. Data covers two data sets: the FF25 size and book-to-market sorted stocks, and the 17 industry sorted assets. Data spans the period July 1962 to December 1997. The moment restrictions implied by the model are:

E[(r_{m,t} - μ_{1,t} - μ_{2,t}) ⊗ Z_{t-1}] = 0_{L×1}
E[(ε_t^2 - σ_t^2) ⊗ Z_{t-1}] = 0_{L×1}
E[(ε_t^3 - κ_t) ⊗ Z_{t-1}] = 0_{L×1}
E[(r_{i,t} - r_{i,t} ε_t μ_{1,t}/σ_t^2 - r_{i,t} (ε_t^2 - σ_t^2) μ_{2,t}/κ_t) ⊗ Z_{t-1}] = 0_{NL×1}

where ε_t = r_{m,t} - μ_{1,t} - μ_{2,t}, μ_{1,t} = Z_{t-1}^T μ_1, μ_{2,t} = Z_{t-1}^T μ_2 + 1{Z_{t-1}^T κ > 0} δ, σ_t = Z_{t-1}^T σ, and κ_t = Z_{t-1}^T κ.

Standard errors are in parentheses beside each estimate; p-values are in parentheses beside each test statistic.

Parameter   Constant             S&P                 Div                  Term                  Junk                 Tbill
FF25
μ_1         0.6987 (0.1797)      -0.2058 (0.1723)    -0.3927 (0.2151)     -0.5039 (0.2130)      1.2634 (0.2295)      -1.1117 (0.2442)
μ_2         0.1096 (0.0786)      0.2990 (0.0768)     0.5471 (0.1658)      -0.4573 (0.1212)      -0.2991 (0.0798)     0.0818 (0.1032)
σ           4.1132 (0.1226)      -0.4654 (0.1119)    0.2079 (0.1488)      0.4969 (0.1101)       0.0660 (0.1488)      0.7187 (0.1313)
κ           -29.4132 (5.6301)    22.5666 (4.3089)    91.3647 (17.3957)    -66.0606 (12.5884)    -21.5275 (4.0897)    -58.9323 (11.2583)
δ           -0.8712 (0.1857)
J-statistic          172.5394 (0.0467)   143 d.f.
Zero λ_{1,t}         66.5482 (0.0000)    6 d.f.
Zero λ_{2,t}         28.5560 (0.0002)    7 d.f.
Constant μ_{1,t}     37.5735 (0.0000)    5 d.f.
Constant μ_{2,t}     27.8783 (0.0001)    6 d.f.
Constant σ_t^2       207.3796 (0.0000)   5 d.f.
Constant κ_t         28.2343 (0.0000)    5 d.f.

Table 2.5 cont'd.

Parameter   Constant             S&P                 Div                  Term                  Junk                 Tbill
Industry
μ_1         0.5847 (0.1866)      -0.2002 (0.2000)    -0.2235 (0.2433)     -0.6745 (0.2296)      1.2032 (0.2475)      -1.5881 (0.2654)
μ_2         0.1512 (0.0931)      0.3480 (0.1172)     0.5666 (0.2071)      -0.4486 (0.1513)      -0.3436 (0.0938)     0.1466 (0.1495)
σ           4.0842 (0.1042)      -0.5849 (0.1017)    0.2064 (0.1733)      0.3591 (0.1132)       0.0611 (0.1559)      0.5877 (0.1507)
κ           -29.3989 (0.6674)    22.6047 (0.5037)    91.3191 (1.4363)     -66.1290 (1.9825)     -21.4061 (0.6676)    -58.8891 (1.5366)
δ           -0.8019 (0.1612)
J-statistic          118.2748 (0.0532)     95 d.f.
Zero λ_{1,t}         67.5960 (0.0000)      6 d.f.
Zero λ_{2,t}         33.9650 (0.0000)      7 d.f.
Constant μ_{1,t}     49.3241 (0.0000)      5 d.f.
Constant μ_{2,t}     33.9512 (0.0000)      6 d.f.
Constant σ_t^2       155.1923 (0.0000)     5 d.f.
Constant κ_t         24336.1363 (0.0000)   5 d.f.

It is evident that just as conditioning information is useful in the two-moment CAPM, so too conditioning information is very important to the three-moment CAPM, since we clearly reject the null hypotheses that the prices of both covariance and coskewness risk are constant. We also find strong evidence that both the market's variance and skewness are time varying. The dynamics of the mean, variance and skewness of market returns were also estimated using only market data, but to save space are not reported. The variance and skewness parameters were much less significant. This interesting observation demonstrates that information provided by the cross-section of equity returns is useful in pinning down the time series dynamics of market returns.

Theory predicts that the price of covariance risk be positive and the price of γ risk have the opposite sign. Incorporating coskewness in the pricing model increases the constant term in the price of covariance risk in both data sets. This increase is sufficient that the industry data now support the conclusion that the unconditional price of covariance risk is positive and statistically significant, while it was insignificantly positive in the two-moment CAPM for industry portfolios. The constant in μ_{2,t}, which is the unconditional price of γ risk when the market is negatively skewed, is positive in both data sets, but not statistically significant. However, the parameter δ, which captures a level shift in the intercept when the market is positively skewed, is statistically significant and negative, as theory suggests it should be.
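
As a purely illustrative sketch (not the estimation code used in this study), the regime-dependent price of coskewness risk, μ_{2,t} = Z_{t-1}^T μ_2 + δ 1{Z_{t-1}^T κ > 0}, can be evaluated at the FF25 point estimates reported above. The function name and the two instrument vectors are hypothetical; because the instruments are standardized, a vector with a one in the constant position and zeros elsewhere is assumed to represent all information variables at their sample means.

```python
# Illustrative only: FF25 point estimates copied from the table above.
# Order of coefficients: Constant, S&P, Div, Term, Junk, Tbill.
MU2 = [0.1096, 0.2990, 0.5471, -0.4573, -0.2991, 0.0818]
KAPPA = [-29.4132, 22.5666, 91.3647, -66.0606, -21.5275, -58.9323]
DELTA = -0.8712


def dot(a, b):
    # Inner product of two equal-length sequences.
    return sum(x * y for x, y in zip(a, b))


def coskewness_price(z_lag):
    # mu_{2,t} = Z'mu_2 + delta * 1{Z'kappa > 0}
    fitted_skewness = dot(z_lag, KAPPA)
    shift = DELTA if fitted_skewness > 0 else 0.0
    return dot(z_lag, MU2) + shift


# Hypothetical instrument vectors (assumptions, not data from the study):
z_at_means = [1, 0, 0, 0, 0, 0]   # every variable at its sample mean
z_high_div = [1, 0, 1, 0, 0, 0]   # Div one standard deviation above its mean

price_at_means = coskewness_price(z_at_means)
price_high_div = coskewness_price(z_high_div)
```

At the sample means the fitted market skewness is negative, so only the small positive intercept applies; in the second hypothetical state the fitted skewness is positive, the shift δ is switched on, and the price becomes negative.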
These parameter estimates convey an interesting story: they indicate that investors are somewhat ambivalent towards negative skewness, since they do not demand a premium for bearing coskewness risk when their diversified portfolio is negatively skewed, but have a strong preference for positive skewness, since they are willing to sacrifice returns for bearing coskewness risk when returns are positively skewed.

One is always concerned, when estimating models using GMM, about focusing too closely on the J_T statistic, since this favours models that produce volatile pricing errors. One robustness check is to compare the statistical significance of the parameters. When the variance-covariance matrix of the moment conditions is large, we have large standard errors in the tests for the significance of both the pricing errors and the parameters. A spurious J_T statistic should therefore be accompanied by insignificant parameter estimates. Fortunately, we observe that both the J_T statistic and the parameters are significant, somewhat alleviating our concerns.

To further assuage these worries, we also estimate the models using two types of fixed weighting matrices: the identity matrix, which weights all moments as being equally important, and the matrix W, chosen to recognize the wisdom in focusing more attention on portfolios whose returns are less noisy, and which accounts for the correlation between portfolios. It is critically important that the weighting matrix W be independent of any parameter estimates. The matrix is formed by taking the dependent variable in each of the moment conditions, for example r_{m,t} in the moments identifying the mean and the corresponding dependent variable in the moments identifying the conditional standard deviation, and stacking them in a matrix y_t. The conditioning information is incorporated in a simple empirical fashion.
With the exception of the market's variance, the fitted values of each element of y_t are estimated using a linear relationship estimated by least squares, and the conditional variance is fitted by modelling the conditional standard deviation. Let v_t = y_t - ŷ_t(Z_{t-1}) be the residual from this fitting; the weight is then defined as the inverse of the linear residual covariance matrix. This matrix is motivated in spirit by the efficient weighting matrix, and would be the efficient weighting matrix if the conditional moments were the fitted y_t's and the moments were not autocorrelated.

Both advantages of the covariance-matrix based weights are important, but we would argue that accounting for correlations is more critical. The pricing error on each portfolio conveys information regarding the adequacy of asset pricing models beyond its value. The direction of the correlation between portfolio returns, along with the relative sign of the pricing errors, conveys useful information in testing the degree of fit of an asset pricing model. If the returns are positively correlated and the pricing errors are of opposite signs, then we are more confident that the pricing errors are evidence of model misspecification than we would be if the pricing errors had the same sign. A moderately large pricing error can arise by chance, and if the other portfolio's returns are positively correlated, then we should expect a pricing error of the same sign. If a model produces pricing errors of opposite signs in positively correlated portfolios, then the model is more likely to be misspecified than an alternative which produces pricing errors of equal magnitude but of the same sign. Although portfolio correlations are incorporated when calculating Zhou's H_T specification test, it is more reasonable to use a weighting matrix that accounts for the correlation in returns up front.
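
The construction of W described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the reported results: every stacked variable is fitted by ordinary least squares on the lagged instruments (the special treatment of the market variance through its conditional standard deviation is omitted), the helper names are hypothetical, and the toy data at the end are invented purely to exercise the function.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    # Gauss-Jordan inverse with partial pivoting; intended for small,
    # well-conditioned matrices (a singular input raises ZeroDivisionError).
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def fixed_weight_matrix(y, z_lag):
    # y: T x M stacked dependent variables; z_lag: T x L lagged instruments.
    zt = transpose(z_lag)
    beta = matmul(inverse(matmul(zt, z_lag)), matmul(zt, y))  # L x M OLS fit
    fitted = matmul(z_lag, beta)
    resid = [[a - b for a, b in zip(ry, rf)] for ry, rf in zip(y, fitted)]
    t = len(resid)
    cov = [[v / t for v in row] for row in matmul(transpose(resid), resid)]
    return inverse(cov)  # W = inverse of the residual covariance matrix


# Invented toy data: a constant plus one instrument, two stacked variables.
z_toy = [[1, -1], [1, 1], [1, -1], [1, 1]]
y_toy = [[1, 2], [3, -2], [3, -2], [5, 2]]
w_toy = fixed_weight_matrix(y_toy, z_toy)
```

Because W is computed from least-squares residuals only, it does not depend on any of the asset pricing parameters, which is the property the text requires of a fixed weighting matrix.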
The parameter estimates of the two- and three-moment CAPM are reported in Table 2.6 for both W (panel A) and the identity matrix (panel B). The economic story told by the point estimates is the same as for the efficient weights: the price of covariance risk is positive in both models but increases as one accounts for asymmetry in returns; and investors demand a small but insignificant premium for bearing coskewness risk when returns are negatively skewed but are willing to make quite large sacrifices when returns are positively skewed. These conclusions are robust to different portfolio formation strategies.

The minimized value of the quadratic form, along with the H_T test and its p-value, are reported in the first row of each panel in Tables 2.11 and 2.12 for both portfolio formation rules and, respectively, the W and identity matrices. The minimized criteria improve quite substantially as one incorporates coskewness. The p-values of the H_T statistic indicate that the three-moment CAPM provides a good fit to the data. Although the improvement when using the identity matrix is good, the p-value is only around 2.5 percent in both data sets; yet when using the W matrix, the p-values are much closer to their J_T counterparts. In fact, the p-value for the industry data jumps from less than one quarter of one percent to over 8 percent. For the reasons elaborated on above we place more weight on the W matrix results.

Table 2.6: Two- and Three-Moment CAPM Parameter Estimates: Fixed Weights
This table presents the parameter estimates and standard errors from estimating the two- and three-moment CAPM models that rely on explicit modelling of market moments. Panel A uses the linear projection based covariance matrix, while panel B uses an identity weighting matrix. Data spans the period July 1963 to December 1997. Standard errors are in parentheses beside each estimate.

Panel A: Model Independent Covariance Based Weight.

Parameter   Constant              S&P                  Div                  Term                  Junk                  Tbill
FF25, Two-Moment
μ           0.6767 (0.2460)       0.1207 (0.2807)      0.6708 (0.3163)      -0.7884 (0.2415)      0.3947 (0.3011)       -1.2858 (0.3980)
σ           4.0620 (0.1828)       -0.7833 (0.2857)     -0.2385 (0.5144)     0.0941 (0.2777)       0.3252 (0.4014)       0.4798 (0.2739)
FF25, Three-Moment
μ_1         0.5418 (0.3371)       0.1199 (0.5449)      0.9443 (0.6207)      -1.0715 (0.4042)      0.7292 (0.9200)       -2.1434 (0.6324)
μ_2         0.0022 (0.6811)       -0.1057 (0.2976)     -0.2086 (0.6435)     0.1516 (0.4978)       0.0036 (0.4121)       0.6621 (0.6265)
σ           4.0822 (0.2062)       -0.7810 (0.2181)     -0.2633 (0.4910)     0.0979 (0.3227)       0.3755 (0.3361)       0.4486 (0.3104)
κ           -27.9843 (5.1513)     22.1151 (13.4067)    84.3381 (33.4642)    -62.1143 (21.5699)    -19.7429 (8.6061)     -53.5546 (12.2080)
δ           -0.2689 (1.0782)
Industry, Two-Moment
μ           0.5603 (0.2251)       0.0632 (0.2507)      0.4510 (0.2604)      -0.7307 (0.2282)      0.5061 (0.2676)       -1.2098 (0.3707)
σ           4.0788 (0.1863)       -0.7914 (0.2857)     -0.1845 (0.5133)     0.0736 (0.2870)       0.3612 (0.3988)       0.4106 (0.2725)
Industry, Three-Moment
μ_1         0.5897 (0.3304)       -0.1124 (0.5870)     0.2501 (0.8354)      -0.6040 (0.6098)      0.8609 (0.5818)       -1.4474 (0.6440)
μ_2         0.0510 (0.2194)       0.1728 (0.2485)      0.4239 (0.8180)      -0.2556 (0.6670)      -0.0716 (0.3003)      0.0340 (0.7360)
σ           4.0756 (0.1907)       -0.7939 (0.2452)     -0.2533 (0.4498)     0.0879 (0.3008)       0.3651 (0.3316)       0.4522 (0.2890)
κ           -30.1553 (20.3378)    18.5196 (13.5652)    94.1880 (66.4816)    -71.0711 (49.1049)    -25.5027 (17.7766)    -60.4636 (42.2516)
δ           -0.4295 (0.4755)

Panel B: Identity Weighting Matrix.

Parameter   Constant              S&P                  Div                  Term                  Junk                  Tbill
FF25, Two-Moment
μ           0.5542 (0.1743)       0.2770 (0.1051)      0.2707 (0.1541)      -0.1795 (0.1656)      0.2848 (0.1906)       -0.3445 (0.1851)
σ           3.7552 (0.1388)       -0.3665 (0.1886)     0.2076 (0.2820)      0.0102 (0.2065)       0.3916 (0.2612)       0.3523 (0.2230)
FF25, Three-Moment
μ_1         0.5981 (0.2892)       -0.3333 (0.4878)     -0.3002 (0.8931)     -0.5221 (0.5810)      0.9890 (0.5411)       -1.0407 (0.6558)
μ_2         0.1266 (0.2545)       0.2271 (0.2226)      0.9206 (0.3608)      -0.5832 (0.3067)      -0.4412 (0.2887)      0.0305 (0.2320)
σ           4.1146 (0.2067)       -0.5503 (0.2377)     0.0804 (0.5518)      0.2112 (0.3296)       0.0361 (0.4381)       0.6621 (0.3060)
κ           -29.0636 (1.1488)     22.7603 (2.7714)     92.3605 (8.4392)     -66.2930 (4.7438)     -21.8154 (2.0016)     -58.7942 (3.4522)
δ           -0.8115 (0.7295)
Industry, Two-Moment
μ           0.1013 (0.2691)       0.3241 (0.2838)      -0.1171 (0.2380)     -0.4811 (0.3030)      0.7810 (0.3594)       -0.6120 (0.2659)
σ           4.2481 (0.1955)       -0.4839 (0.2156)     0.6010 (0.3589)      0.2695 (0.2246)       -0.0375 (0.3067)      0.3075 (0.2597)
Industry, Three-Moment
μ_1         0.7644 (0.2792)       -0.1271 (0.3699)     -0.4541 (0.9270)     -0.4086 (0.5339)      1.1927 (0.5502)       -1.1633 (0.4692)
μ_2         0.2168 (0.1888)       0.3051 (0.2208)      1.0370 (0.6677)      -0.6889 (0.4332)      -0.5821 (0.3334)      0.0512 (0.3699)
σ           4.0716 (0.2719)       -0.5544 (0.3606)     0.4129 (0.2852)      0.4046 (0.1888)       0.0293 (0.3254)       0.5151 (0.3333)
κ           -30.3128 (17.2528)    21.0458 (11.8012)    91.2514 (51.5537)    -67.4382 (38.0491)    -22.4619 (12.7004)    -59.4835 (33.8820)
δ           -1.1254 (0.5550)

We also consider a specification in which we explicitly model the price of covariance and coskewness risk as functions of the information variables and constrain
them to have the theoretical sign⁷, as in the restrictions in equations (2.15) for the two-moment model and (2.32) for the three-moment model. This empirical specification requires modelling the mean market return and the price of covariance risk in the two-moment CAPM, requiring 2L parameters. In addition to these requirements, the three-moment CAPM requires modelling the conditional variance of the market and the price of γ risk, or 4L parameters. Table 2.7 reports estimates of the parameters for the price of covariance and coskewness risk, all of which, we can conclude to a high degree of confidence, are non-zero and time varying. This specification provides an impressive fit to the industry data but a more modest fit to the FF25 data set.

One major limitation of this approach is that it appears to be misspecified, since it implies inordinate conditional skewness. The average difference between the cubed market residual and the conditional skewness implied by the model is -8.5 × 10^5, while the sample skewness is only -30, the lowest cubed residual is -1.6 × 10^4, and the maximum is 2.9 × 10^3. The fact that the average error in forecasting the skewness is an order of magnitude larger than the largest negative residual indicates how poorly this specification approximates market skewness. The corresponding value when explicitly modelling the skewness is 0.0026.

2.5 Multi-Factor Asset Pricing

An alternative response to the CAPM's failure is to consider multi-factor alternatives. Examples of this line of research include Chen, Roll, and Ross (1986), Fama and French (1993), and Jagannathan and Wang (1996)⁸. These models can be theoretically motivated following Merton (1973) and Breeden (1979) as intertemporal asset pricing models, where the other factors arise because of price distortions induced by hedging demand against changes in the investment opportunity set.
⁷ Recall that the standardized price of covariance risk is unambiguously positive and, because we standardize by the skewness, the price of γ risk is unambiguously negative.
⁸ It may be a little uncharitable to call the conditional CAPM of Jagannathan and Wang (1996) a multi-factor model, since labour growth is included to account for omitted components of the market return and the term spread is included to account for time-variation in the pricing equation. However, operationally this model is a three-factor model.

[Table printed sideways in the original; its contents are not recoverable from this extraction.]

An alternative motivation is the arbitrage pricing theory of Ross (1977). Fama and French (1993) has become by far the most popular multi-factor pricing model applied over the past decade. This model uses three factors, HML (high minus low), SMB (small minus big) and the excess market return, to price equity. SMB and HML are used in response to the authors' finding in an earlier paper (Fama and French, 1992) that sorting stocks based on their size and book-to-market ratio helps explain cross-sectional variation in equity returns.
HML is a zero investment portfolio formed by taking a long position in assets with a high book equity to market equity ratio and an equivalent short position in low book-to-market stocks. SMB is similarly formed by taking a long position in small stocks and a short position in big stocks.⁹ This model has received very wide application in the literature, even being applied by Ibbotson research associates in calculating costs of capital.

⁹ The returns to these portfolios and more specific details on their formation are available on the internet at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.

Suppose there are K factors that are relevant to price individual risky assets. Denote by f_{j,t} the return on the factor mimicking portfolio of the j-th risk factor. The return on any risky asset can be represented as

r_{i,t} = Σ_{j=1}^{K} β_{ij} f_{j,t} + ε_{i,t}, (2.55)

where ε_{i,t} is an idiosyncratic error term uncorrelated with the K factor returns. The most common model of this type is Fama and French (1993). Ferson and Harvey (1999, Eq. 2) estimate a conditional model of this type, where the pricing equation is given by

E_{t-1}[r_{i,t}] = Σ_{j=1}^{K} β_{ij,t} μ_{j,t}, (2.56)

where

β_{i,t} = Cov_{t-1}(r_{i,t}, f_t) (Var_{t-1}(f_t))^{-1} (2.57)

is the K-vector of asset i's conditional betas with respect to each of the factors, Var_{t-1}(f_t) denotes the conditional K × K variance-covariance matrix of the K-vector of time t factor returns, and μ_{j,t} is the conditional expected return to the j-th risk factor. The Fama and French (1993) specification of this model uses three factors: the return on some value weighted stock market index in excess of the risk-free return, the excess return to a portfolio of small stocks over the return to a portfolio of large stocks (SMB), and the excess return to high book-to-market stocks over the return to a portfolio of low book-to-market stocks (HML), or j ∈ {MKT, SMB, HML}.
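
To make the beta representation in (2.56) and (2.57) concrete, the following sketch computes an unconditional sample analogue: sample covariances stand in for the conditional moments, which is a simplifying assumption and not the estimation approach taken in this essay. The function names and the toy return series are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    # Sample covariance (divisor T) of two equal-length series.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def factor_betas(asset, factors):
    # asset: T returns; factors: K series of T factor returns.
    # Var(f) is symmetric, so Cov(r, f) Var(f)^{-1} solves Var(f) b = Cov(f, r).
    var_f = [[cov(fj, fk) for fk in factors] for fj in factors]
    cov_rf = [cov(asset, fj) for fj in factors]
    return solve(var_f, cov_rf)


def expected_return(betas, factor_means):
    # Sample analogue of (2.56): sum over factors of beta_j * mu_j.
    return sum(b * m for b, m in zip(betas, factor_means))


# Invented toy data: two uncorrelated factors, asset = 2*f1 + 0.5*f2.
f1 = [1.0, -1.0, 1.0, -1.0]
f2 = [1.0, 1.0, -1.0, -1.0]
asset = [2.5, -1.5, 1.5, -2.5]
betas = factor_betas(asset, [f1, f2])
premium = expected_return(betas, [0.6, 0.4])  # hypothetical factor means
```

Estimating betas in this way introduces N × K asset-specific parameters, which is the burden that the covariance-based moment conditions discussed next are designed to avoid.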
He et al (1996) present an extension of Harvey's (1989) conditional test. Their approach involves rewriting the moment restrictions of each asset in terms of K covariances and the price of each factor's covariance risk. They use the same \"trick\" discussed above to simplify the covariances, which completely eliminates the need to explicitly model any asset specific parameters. The model is estimated using the implied moment conditions on the risky assets:

E[(r_{i,t} - r_{i,t} (f_t - μ_t)^T λ_t) ⊗ Z_{t-1}] = 0_{L×1} for i = 1, ..., N, (2.58)

along with moment conditions to identify the conditional means of the K factors:

E[(f_t - μ_t) ⊗ Z_{t-1}] = 0_{LK×1}. (2.59)

This specification requires explicit modelling of the conditional means and conditional prices of risk for each of the K factors only. There are no asset specific parameters to be estimated, and there are no explicit restrictions placed on the covariance dynamics other than stationarity. He et al (1996) find that the multi-factor beta pricing model performs significantly better than the conditional two-moment CAPM on the size and book-to-market portfolios, but they can still reject the null hypothesis that the model is correctly specified. We reach a similar conclusion using industry portfolios.

The results of fitting the FF three-factor model are reported in Table 2.8 and, for the fixed weighting matrix, Table 2.9. Note that the definition of λ_MKT in this setting is not sign corrected and can take on negative values. We observe an improvement when including the SMB and HML factors, but both p-values are still less than two percent. We must conclude that the model is still misspecified. There is, however, strong evidence again supporting a conditional analysis, since all three factors estimated using both data sets show strong evidence of time variation.
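
The sample counterpart of the moment conditions (2.58) and (2.59) can be sketched as below. The sketch assumes that the conditional factor means and prices of risk are linear in the instruments, μ_{j,t} = Z_{t-1}^T μ_j and λ_{j,t} = Z_{t-1}^T λ_j; the function name, argument layout and toy inputs are hypothetical, and the GMM minimization over the parameters is deliberately left out.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_moments(returns, factors, z_lag, mu, lam):
    # returns: T x N asset returns; factors: T x K factor returns;
    # z_lag: T x L lagged instruments; mu, lam: K coefficient vectors of
    # length L for the factor means and the prices of covariance risk.
    # Output: the (N + K) * L sample averages of the moment conditions,
    # asset pricing errors first, then the factor mean errors.
    t_obs = len(returns)
    n_assets, n_factors = len(returns[0]), len(factors[0])
    n_inst = len(z_lag[0])
    g = [0.0] * ((n_assets + n_factors) * n_inst)
    for r_t, f_t, z in zip(returns, factors, z_lag):
        mu_t = [dot(z, m) for m in mu]
        lam_t = [dot(z, coef) for coef in lam]
        e_t = [f - m for f, m in zip(f_t, mu_t)]
        scale = 1.0 - dot(e_t, lam_t)              # 1 - (f_t - mu_t)'lambda_t
        errors = [r * scale for r in r_t] + e_t    # (2.58) then (2.59)
        idx = 0
        for err in errors:
            for z_l in z:                          # Kronecker product with Z
                g[idx] += err * z_l / t_obs
                idx += 1
    return g


# Invented toy inputs: one asset, one factor, a constant instrument.
g_toy = sample_moments(
    returns=[[2.0], [4.0]],
    factors=[[1.0], [3.0]],
    z_lag=[[1.0], [1.0]],
    mu=[[2.0]],
    lam=[[0.5]],
)
```

A GMM estimator would choose the coefficients in mu and lam to make a quadratic form in this vector as small as possible; note that no asset-specific parameters enter the function.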
Another interesting result is that adding the Fama and French (1993) factors does not adversely affect the price of covariance risk. The standardized price of beta risk λ_MKT is unconditionally statistically significantly positive. Adding the extra factors actually serves to moderately increase the intercept in the price of market covariance risk. This suggests that the findings of previous studies that reject the significance of market returns in explaining the cross-section of equity returns, such as Fama and French (1993) and Ferson and Harvey (1999), are due to their restrictive assumptions on the dynamics of covariances. Once these restrictions are relaxed, covariance with the market is still significant in pricing the cross-section of equity returns. In spite of the importance of market returns, the other two factors are important to pricing returns in both the industry and the size and book-to-market portfolios.

[Rotated table on page 83 of the original, apparently Table 2.8; its contents are not recoverable from the extracted text.]

Table 2.9: Parameter Estimates of Fama-French's Three-Factor Model with Fixed Weighting Matrix
Parameter estimates and standard errors are reported for the test of the conditional three-moment CAPM where the risk-free rate is not the zero-beta rate. The moment restrictions are:

    E[(f_{j,t} - μ_{j,t}) ⊗ Z_{t-1}] = 0_{L×1}
    E[(r_{i,t} - Σ_{j ∈ F} r_{i,t} e_{j,t} λ_{j,t}) ⊗ Z_{t-1}] = 0_{NL×1}

where e_{j,t} = f_{j,t} - μ_{j,t}, μ_{j,t} = Z_{t-1}^T μ_j, and λ_{j,t} = Z_{t-1}^T λ_j, for factors j ∈ F, where F is the set of factors, in particular F = {MKT, SMB, HML}. Standard errors are in parentheses.

Parameter | Constant | S&P | Div | Term | Junk | Tbill
FF25
λ_MKT | 0.0453 (0.0158) | 0.0220 (0.0144) | 0.0005 (0.0227) | -0.0087 (0.0149) | 0.0296 (0.0213) | -0.0288 (0.0259)
λ_SMB | 0.0298 (0.0218) | 0.0051 (0.0221) | 0.0523 (0.0366) | -0.0222 (0.0275) | 0.0097 (0.0369) | -0.0338 (0.0375)
λ_HML | 0.0676 (0.0237) | 0.0419 (0.0231) | -0.0364 (0.0416) | 0.0325 (0.0229) | -0.0143 (0.0335) | 0.0550 (0.0320)
J-statistic | 170.6055 (0.0133)
Q | 0.9662
Industry
λ_MKT | 0.0266 (0.0167) | -0.0030 (0.0171) | -0.0233 (0.0263) | -0.0070 (0.0153) | 0.0491 (0.0237) | -0.0160 (0.0262)
λ_SMB | 0.0313 (0.0269) | 0.0299 (0.0300) | 0.1135 (0.0431) | 0.0050 (0.0301) | -0.0120 (0.0423) | -0.0798 (0.0443)
λ_HML | 0.0162 (0.0283) | -0.0118 (0.0317) | -0.0827 (0.0514) | 0.0275 (0.0238) | 0.0776 (0.0465) | 0.0519 (0.0397)
J-statistic | 118.6759 (0.0076)
Q | 0.5412

2.6 Three Factors or Three Moments? or Both??

What is the effect of combining a multi-factor and multi-moment asset pricing model? Do we need both the SMB and HML factors and coskewness? To address this issue we consider a pricing model which includes premia for covariance and coskewness risk:

    E_{t-1}[r_{i,t}] = Cov_{t-1}(r_{i,t}, r_{m,t}) λ_{MKT,t} + Coskew_{t-1}(r_{i,t}, r_{m,t}) λ_{SKEW,t} + Σ_{j ∈ F_} Cov_{t-1}(r_{i,t}, f_{j,t}) λ_{j,t},    (2.60)

where F_ denotes the set of all priced factors excluding the market return. To identify the moments we need the conditional mean of all the factors, including the market, and the conditional variance of the market, along with the following:

    E[(r_{i,t} - r_{i,t}(r_{m,t} - Z_{t-1}^T μ_MKT) Z_{t-1}^T λ_MKT + r_{i,t}(e²_{MKT,t} - σ²_{MKT,t}) λ_{SKEW,t} - Σ_{j ∈ F_} r_{i,t}(f_{j,t} - Z_{t-1}^T μ_j) Z_{t-1}^T λ_j) ⊗ Z_{t-1}] = 0_{L×1} for i = 1,..., n,    (2.61)

where e_{MKT,t} = r_{m,t} - Z_{t-1}^T μ_MKT is the error term on the market, and σ²_{MKT,t} and λ_{SKEW,t} denote the instrument-based specifications of the conditional market variance and the sign-corrected price of coskewness risk. Note the plus sign in front of the coskewness term. This is due to the sign correction on the market moments. We do not adjust the signs of the FF factors since they are not theoretically motivated, and we therefore do not have any clear behavioural constraints on the sign of the risk price.

The specification of the model where the conditional moments are modelled and the prices of covariance risk and coskewness risk are defined explicitly is inappropriate when there are other priced factors. The reason is analogous to the fact that the price of market covariance is not the ratio of the market's mean to variance, but rather depends also on the covariances between the factors. The proposed methodology avoids this issue by modelling the prices of covariance and coskewness risk as functions of the instruments.
Parameter estimates are reported in Table 2.10 and demonstrate the impressive performance of the model. The benefit of adding coskewness over the three-factor model is quite significant, with the most striking results being in the size and book-to-market portfolios. A version of the FF three-factor model where the price of market risk is constrained to be nonnegative has a J_T statistic with a p-value of 0.0396 on the FF25 data set. Adding conditional coskewness improves the p-value of the J_T test to 0.1240. The industry data also experiences a dramatic improvement, but caution should be exercised in interpreting these parameter estimates because the estimates had trouble converging and ended up cycling repeatedly from one region of the parameter space to another. There is evidence that all factors are priced: the weakest result is for the HML portfolio in the industry data, which is only just significant at the 5 percent level. This is interesting because the SMB portfolio appears to be the stronger of the two FF factors, while previous studies seem to imply that HML is the dominant factor.

One possible interpretation of the empirical success of the FF three-factor model is that the factors are proxying for coskewness. The evidence partially supports this conjecture, but not all of the explanatory power of the SMB and HML portfolios is due to coskewness. To see the motivation for this, consider the pricing errors when the parameters of various models are estimated using common weighting matrices, as in Table 2.11 for the inverse residual covariance matrix weights and Table 2.12 for the identity matrix.[10] We can informally assess the pricing improvement induced by the inclusion of the FF factors after accounting for coskewness risk by comparing the improvement in the Without Skewness column of each table going from CAPM to FF3 with the corresponding improvement in the With Skewness column. When the residual covariance matrix weights are used, the improvements in pricing errors are much smaller once coskewness is priced. Note the industry data, in which ignoring skewness results in a 75 percent improvement from the inclusion of the FF3 factors, while there is only a 20 percent improvement after accounting for coskewness. (In the FF25 data this effect is even more pronounced: a measly 15 percent improvement after coskewness against a whopping 92 percent improvement without coskewness.) When we consider the identity weights the difference is much less pronounced, but is qualitatively similar.

The improvement from incorporating the SMB and HML factors is more pronounced when using the inverse of the residual covariance matrix than the identity matrix; in fact, when the covariance weights are used, adding the third moment appears to result in a greater improvement than when the other factors are added. This suggests that the three-factor model may be better at pricing more volatile assets, since using the inverse of the covariance matrix results in these assets being given relatively little weight. This is borne out by direct calculation of the pricing errors. The small-value FF portfolio, which is by far the most volatile, has an average pricing error from the FF three-factor model (unconstrained, using the identity weighting matrix) of -0.4021, while the three-moment CAPM produces a pricing error of -0.7896. The error when both SMB and HML are included as factors and we use the constrained specification of the market price of covariance and coskewness risk is only -0.3511. When the inverse of the residual covariance matrix is used as the weighting matrix, the pricing errors are -0.5678, -0.4166 and -0.4702.

[10] The table reports the sum of the terms for different sets of moment restrictions and we do not adjust to a common base. Each set of moment conditions includes terms corresponding to the portfolio and managed portfolio returns, and moments used to identify the extra moment conditions. One estimation approach, the two-stage estimation strategy of Ogaki (1993), would be to exactly fit the non-asset specific moments, which is sufficient to identify the parameters (except in the three-moment CAPM with explicit moments), and then compare the pricing errors. Results are very similar when simply using the weighted sum of the portfolio pricing errors.

[Rotated table on page 86 of the original, apparently Table 2.10; its contents are not recoverable from the extracted text.]

Table 2.11: Comparing Various Specifications of the Asset Pricing Models
We compare the performance of the two-moment and three-factor asset pricing models both without (first column) and with skewness priced (last column). Results are reported that initially do not constrain the price of market risk and then constrain the prices of market covariance and coskewness risk to take their theoretical signs. We report the minimum quadratic form using the sample covariance matrix as weights. For each model we report the value of the criterion function, the H-statistic (in square brackets) and the p-value of the H-statistic (in parentheses).

Model | Without Skewness | With Skewness
FF25
Modelling Moments | 1.1745 [199.1673] (0.0045) | 0.9436 [172.6520] (0.0461)
Unconstrained CAPM | 1.1415 [194.1116] (0.0034) | 0.9591 [176.8996] (0.0142)
Unconstrained FF3 | 0.9662 [169.9643] (0.0145) | 0.8042 [151.6040] (0.0598)
Constrained CAPM | 50.8843 [180.9782] (0.0200) | 0.9892 [190.7102] (0.0020)
Constrained FF3 | 4.2297 [157.0165] (0.0678) | 0.8297 [160.8504] (0.0196)
Industry
Modelling Moments | 0.7106 [147.7852] (0.0021) | 0.4914 [114.9054] (0.0805)
Unconstrained CAPM | 0.6587 [141.9254] (0.0016) | 0.5087 [119.2065] (0.0214)
Unconstrained FF3 | 0.5412 [118.8967] (0.0073) | 0.4256 [101.5544] (0.0379)
Constrained CAPM | 3.0059 [128.1441] (0.0158) | 0.4961 [128.8336] (0.0046)
Constrained FF3 | 0.6958 [119.5170] (0.0066) | 0.3910 [106.7707] (0.0170)
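The informal comparison of improvements described in the text can be reproduced from the criterion values in Table 2.11. The sketch below assumes that the percentages quoted refer to the proportional fall in the criterion function when moving from the constrained CAPM to the constrained FF3 row; that reading is an inference from the numbers, and the helper name is hypothetical.

```python
# Proportional reduction in the GMM criterion when the FF factors are added,
# computed from the constrained rows of Table 2.11.
# Assumption: the percentages in the text are (CAPM - FF3) / CAPM.

def relative_improvement(before, after):
    return (before - after) / before

# (criterion for CAPM, criterion for FF3), inverse residual covariance weights.
constrained = {
    'FF25': {
        'without_skewness': (50.8843, 4.2297),
        'with_skewness': (0.9892, 0.8297),
    },
    'Industry': {
        'without_skewness': (3.0059, 0.6958),
        'with_skewness': (0.4961, 0.3910),
    },
}

for data_set, columns in constrained.items():
    for column, (capm, ff3) in columns.items():
        pct = 100.0 * relative_improvement(capm, ff3)
        print(data_set, column, round(pct, 1))
```

Under this reading the reductions come out at roughly 92 and 16 percent for FF25 and 77 and 21 percent for the industry portfolios, in line with the rounded figures quoted in the text.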
Table 2.12: Comparing Various Specifications of the Asset Pricing Models: Equally Weighted Moments
We compare the performance of the two-moment and three-factor asset pricing models both without (first column) and with skewness priced (last column). Results are reported that initially do not constrain the price of market risk and then constrain the prices of market covariance and coskewness risk to take their theoretical signs. We report the minimum quadratic form using the identity matrix for weights. For each model we report the value of the criterion function, the H-statistic (in square brackets) and the p-value of the H-statistic (in parentheses).

Model | Without Skewness | With Skewness
FF25
Modelling Moments | 5.9310 [197.8664] (0.0054) | 3.6926 [178.6753] (0.0230)
Unconstrained CAPM | 3.7823 [206.2016] (0.0005) | 3.0434 [219.2010] (0.0000)
Unconstrained FF3 | 1.1154 [180.3696] (0.0033) | 0.9961 [164.1818] (0.0126)
Constrained CAPM | 8.2715 [199.3245] (0.0016) | 4.5353 [202.0209] (0.0003)
Constrained FF3 | 1.2221 [170.9144] (0.0128) | 1.0024 [166.0521] (0.0097)
Industry
Modelling Moments | 3.2959 [145.0307] (0.0033) | 2.5254 [123.7446] (0.0254)
Unconstrained CAPM | 2.9232 [145.6653] (0.0008) | 1.8271 [117.6317] (0.0269)
Unconstrained FF3 | 1.7591 [141.3509] (0.0001) | 1.3668 [133.0451] (0.0001)
Constrained CAPM | 5.7860 [138.1839] (0.0031) | 3.1402 [127.1349] (0.0061)
Constrained FF3 | 1.4199 [117.3259] (0.0096) | 1.3665 [105.1355] (0.0220)

2.7 Further Specification Tests

The J_T and H_T tests are useful for testing for general model misspecification, but they have very low power to detect other forms of misspecification. In this section the asset pricing models are exposed to two specification tests: testing for time varying intercepts and testing for parameter stability.
2.7.1 Time Varying Alphas

A well specified asset pricing model will have a zero intercept. Because we specify a set of moment conditions that exclude an intercept, the null hypothesis that the average pricing errors are zero (another way of stating a hypothesized zero intercept) is incorporated in the J_T and H_T tests. However, Ferson and Harvey (1999) present evidence that both the basic CAPM and the Fama and French (1993) three-factor asset pricing model have a non-zero time varying alpha.[11] This finding that the conditioning information is important for the intercept is robust to allowing the betas to be time varying as a linear function of a set of information variables. We test for a time-varying intercept by augmenting each of the pricing equations by the term -Z_{t-1}^T α_i; for example, the two-moment CAPM in equation (2.12) becomes

    E[(r_{i,t} - Z_{t-1}^T α_i - r_{i,t}(r_{m,t} - Z_{t-1}^T μ) Z_{t-1}^T λ) ⊗ Z_{t-1}] = 0_{L×1}

for each asset i. To keep the number of parameters to a minimum we test each asset individually, holding all other assets' alphas at the hypothesized value of zero.

An alternative, but very closely related, interpretation of this test is to see if the functional form of the conditional asset pricing model is sufficiently flexible to capture all the predictive ability of the information variables. If the model is misspecified then there will be residual predictive ability left in the information variables even after accounting for the time variation in prices of risk. The very simple linear format of the time varying intercept acts as a first-order approximation to detect this residual explanatory power. The results for this test are presented in Table 2.13.

[11] To conform with historical tradition we will preserve the custom in finance of referring to the intercept as alpha, by analogy with the factor betas.
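To make the augmented moment condition for the two-moment CAPM concrete, the following sketch computes its sample analogue for a single asset. It assumes the market mean and the price of covariance risk are linear in the instruments, as elsewhere in this essay; the function and argument names are hypothetical.

```python
# Sample analogue of the alpha-augmented two-moment CAPM moment condition
# for one asset. Setting alpha to zeros recovers the restricted model.
# Names are illustrative only.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def augmented_moments(r_i, r_m, instruments, alpha, mu, lam):
    '''Average of (pricing error with time-varying intercept) x instruments.

    r_i[t], r_m[t]  : excess returns on the asset and the market
    instruments[t]  : lagged instrument vector Z_{t-1} (length L)
    alpha, mu, lam  : length-L coefficient vectors
    '''
    T = len(r_i)
    L = len(instruments[0])
    g = [0.0] * L
    for t in range(T):
        z = instruments[t]
        e_m = r_m[t] - dot(z, mu)              # market innovation
        u = r_i[t] - dot(z, alpha) - r_i[t] * e_m * dot(z, lam)
        for k in range(L):
            g[k] += u * z[k] / T
    return g
```

The test in the text asks whether freeing alpha for one asset, with every other asset's alpha held at zero, leads to a significant improvement in fit.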
These tables confirm the findings of Ferson and Harvey (1999) that in both the basic two-moment CAPM and the three-factor model there is strong evidence of a time varying intercept. We present individual Wald tests and the Hochberg (1988) lower bound on the p-value for the hypothesis that at least one of the alphas is non-zero. This is constructed as

    p_Hochberg = min_{j ∈ {1,...,N}} (p_(j) N / j),

where j is the index of the sorted portfolio p-values. The Hochberg lower bounds are less than 0.05 for both the CAPM and three-factor models in both data sets. These results extend the results in Ferson and Harvey since this methodology relaxes the restrictions on conditional covariances implied by the linear beta assumption. Our setup allows the covariances to follow some arbitrary process. However, we must model the price of covariance risk as a linear function, while Ferson and Harvey's approach makes no restrictive assumptions. Which approach is better is an empirical question, though they are certainly complementary.

An interesting result, however, is that including the effect of conditional coskewness on asset prices in the three-moment and three-factor-three-moment models increases the Hochberg bound dramatically; in fact only the three-moment model on the industry portfolios is lower than 0.25, and this is still greater than 0.075. This evidence indicates the need to include coskewness in both single factor and multi-factor asset pricing models.

2.7.2 Structural Breaks

A second type of misspecification that is not readily detected using the goodness-of-fit statistics is parameter instability. Andrews (1993) and Andrews and Ploberger (1994) present Lagrange multiplier (LM) based tests, which are in the spirit of the famous Chow test, for a structural break in a data set.
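The Hochberg bound used in Section 2.7.1 is simple to compute from the individual p-values. The sketch below implements the minimum over the sorted p-values of N p_(j) / j; the function name is hypothetical, and the example p-values are the two-moment column for the seventeen industry portfolios in Table 2.13.

```python
# Bound on the joint p-value from N individual p-values:
# min over j of N * p_(j) / j, with p_(1) <= ... <= p_(N).
# The function name is illustrative.

def hochberg_bound(p_values):
    n = len(p_values)
    ordered = sorted(p_values)
    return min(n * p / rank for rank, p in enumerate(ordered, start=1))

# Two-moment model, industry portfolios 1 to 17 (Table 2.13).
two_moment_industry = [
    0.1014, 0.0412, 0.0019, 0.2403, 0.5214, 0.1742, 0.0407, 0.3817, 0.0059,
    0.4826, 0.1117, 0.5742, 0.2039, 0.0061, 0.3055, 0.0211, 0.3787,
]

print(round(hochberg_bound(two_moment_industry), 4))
```

With the p-values rounded to four decimals the bound comes out at about 0.032, close to the 0.0320 reported in the table; the small gap is consistent with the table having been computed from unrounded p-values.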
Table 2.13: Lagrange Multiplier Tests for Time-Varying Intercepts: Industry Portfolios
This table reports the Lagrange multiplier statistics for testing the null hypothesis that there is no intercept against the alternative that the intercept is a linear function of the information variables. The LM statistics are distributed as chi-squared random variables with 6 degrees of freedom and their p-values are reported in parentheses after each statistic. Also reported is the infimum Hochberg bound of the individual p-values. Results are reported for the two-moment, three-moment, three-factor and three-factor-three-moment asset pricing models using the Industry portfolios in the period 1963-1997.

Portfolio | Two-Moment | Three-Moment | Three-Factor | Three-Factor-Moment
Industry 1 | 10.6036 (0.1014) | 10.0800 (0.1213) | 11.0048 (0.0882) | 9.7014 (0.1378)
Industry 2 | 13.1150 (0.0412) | 6.9136 (0.3289) | 7.5776 (0.2707) | 5.3945 (0.4943)
Industry 3 | 20.9412 (0.0019) | 16.0088 (0.0137) | 19.5119 (0.0034) | 13.6062 (0.0344)
Industry 4 | 7.9695 (0.2403) | 10.3293 (0.1115) | 6.0188 (0.4211) | 2.3467 (0.8852)
Industry 5 | 5.1760 (0.5214) | 5.1306 (0.5272) | 2.6540 (0.8508) | 2.7170 (0.8434)
Industry 6 | 8.9895 (0.1742) | 9.6399 (0.1407) | 6.5468 (0.3648) | 7.7110 (0.2601)
Industry 7 | 13.1521 (0.0407) | 9.0886 (0.1687) | 9.5094 (0.1469) | 10.0333 (0.1233)
Industry 8 | 6.3826 (0.3817) | 12.0407 (0.0611) | 4.8706 (0.5605) | 4.8417 (0.5643)
Industry 9 | 18.1409 (0.0059) | 13.2747 (0.0389) | 21.7929 (0.0013) | 10.1937 (0.1167)
Industry 10 | 5.4902 (0.4826) | 4.1805 (0.6523) | 2.7807 (0.8358) | 3.8075 (0.7027)
Industry 11 | 10.3235 (0.1117) | 7.2256 (0.3005) | 11.4084 (0.0765) | 6.8049 (0.3393)
Industry 12 | 4.7659 (0.5742) | 8.1100 (0.2302) | 1.0273 (0.9846) | 5.4209 (0.4911)
Industry 13 | 8.4972 (0.2039) | 10.0979 (0.1206) | 9.9110 (0.1285) | 6.8785 (0.3322)
Industry 14 | 18.0361 (0.0061) | 18.8412 (0.0044) | 23.4807 (0.0007) | 12.1303 (0.0591)
Industry 15 | 7.1690 (0.3055) | 10.6791 (0.0988) | 9.5901 (0.1430) | 4.9944 (0.5445)
Industry 16 | 14.8915 (0.0211) | 16.5567 (0.0111) | 17.2720 (0.0083) | 5.1643 (0.5229)
Industry 17 | 6.4113 (0.3787) | 6.3007 (0.3904) | 3.9249 (0.6868) | 2.3481 (0.8851)
Hochberg p-value | 0.0320 | 0.0755 | 0.0111 | 0.4685
S1-BM1 | 22.1406 (0.0011) | 16.4952 (0.0113) | 22.3551 (0.0010) | 13.1402 (0.0409)
S1-BM2 | 6.7638 (0.3432) | 2.9703 (0.8126) | 6.1124 (0.4107) | 5.2063 (0.5176)
S1-BM3 | 6.1130 (0.4107) | 2.9887 (0.8103) | 3.4420 (0.7517) | 5.6158 (0.4676)
S1-BM4 | 6.3021 (0.3902) | 3.2207 (0.7807) | 3.4195 (0.7546) | 1.3756 (0.9673)
S1-BM5 | 14.6252 (0.0234) | 9.7587 (0.1352) | 11.6413 (0.0705) | 12.1495 (0.0587)
S2-BM1 | 7.9239 (0.2437) | 4.5331 (0.6049) | 4.5531 (0.6023) | 5.4648 (0.4857)
S2-BM2 | 8.3558 (0.2132) | 4.2245 (0.6463) | 4.4663 (0.6138) | 7.9057 (0.2451)
S2-BM3 | 13.5548 (0.0350) | 8.5206 (0.2024) | 11.0264 (0.0876) | 5.5380 (0.4769)
S2-BM4 | 8.5892 (0.1980) | 7.4460 (0.2816) | 6.7147 (0.3480) | 8.2211 (0.2223)
S2-BM5 | 6.0321 (0.4196) | 4.1734 (0.6532) | 6.5455 (0.3649) | 5.9447 (0.4294)
S3-BM1 | 6.5145 (0.3681) | 4.9357 (0.5521) | 2.7984 (0.8337) | 2.2234 (0.8980)
S3-BM2 | 11.2162 (0.0819) | 7.4614 (0.2803) | 5.9052 (0.4339) | 10.6691 (0.0992)
S3-BM3 | 8.1630 (0.2264) | 4.1404 (0.6577) | 8.7748 (0.1866) | 3.3674 (0.7615)
S3-BM4 | 9.1814 (0.1636) | 9.7761 (0.1344) | 8.5816 (0.1985) | 6.9000 (0.3302)
S3-BM5 | 6.6663 (0.3528) | 4.3108 (0.6347) | 4.2334 (0.6451) | 3.6476 (0.7242)
S4-BM1 | 11.6177 (0.0711) | 9.4498 (0.1498) | 5.9722 (0.4263) | 10.2882 (0.1130)
S4-BM2 | 8.8617 (0.1815) | 6.7627 (0.3433) | 3.6363 (0.7258) | 2.1194 (0.9084)
S4-BM3 | 7.5861 (0.2700) | 4.3838 (0.6249) | 4.6804 (0.5854) | 5.5486 (0.4756)
S4-BM4 | 10.6667 (0.0992) | 12.9748 (0.0434) | 7.1115 (0.3107) | 8.1142 (0.2299)
S4-BM5 | 9.0132 (0.1728) | 7.2476 (0.2986) | 8.8775 (0.1806) | 6.8056 (0.3392)
S5-BM1 | 8.7597 (0.1875) | 6.8500 (0.3349) | 7.7423 (0.2576) | 8.6823 (0.1922)
S5-BM2 | 5.1046 (0.5305) | 5.0303 (0.5399) | 3.4422 (0.7516) | 6.8210 (0.3377)
S5-BM3 | 7.5406 (0.2737) | 8.1133 (0.2299) | 4.2437 (0.6437) | 4.3839 (0.6249)
S5-BM4 | 5.7872 (0.4474) | 6.4360 (0.3762) | 5.4386 (0.4889) | 2.2115 (0.8993)
S5-BM5 | 8.2949 (0.2173) | 5.7985 (0.4461) | 6.0917 (0.4130) | 4.0124 (0.6750)
Hochberg p-value | 0.0285 | 0.2832 | 0.0261 | 0.7064

With an unknown change point, the LM statistic does not have a standard distribution: under the null hypothesis of no structural break (that is, that the parameters before and after the break are equal) the break point is an unidentified parameter. Andrews (1993) presents a test based on the supremum of the LM statistics over all possible break points (known as the sup LM test) and derives the limiting asymptotic distribution, which is a function of integrals of Brownian motions. Because this distribution has no simple closed form expression, Andrews tabulates the critical values for different numbers of parameters. This statistic is based on only one break point and therefore misses some important information contained in the values of the LM statistic at all other break points. Andrews and Ploberger (1994) provide an extension to the sup LM test by deriving the asymptotic distribution of an LM statistic which is calculated as an average over all break points. Two special cases are tabulated in Andrews and Ploberger: the avg LM and the exp LM statistics, which are a simple average and an exponentially weighted average over the LM statistics for each given break point. In Table 2.14 we present the values of these three LM tests for a range of asset pricing models.
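Given the LM statistic evaluated at each candidate break point, the three summary statistics are straightforward to compute. The sketch below assumes equal weight on every candidate break point and takes the exp LM statistic to be the logarithm of the average of exp(LM/2), which is the form in which the Andrews and Ploberger statistic is usually stated; the function name is hypothetical, and any trimming of the sample ends is assumed to have been done by the caller.

```python
import math

# sup LM, avg LM and exp LM from a sequence of LM statistics, one per
# candidate break point. Assumption: equal weights on all break points.

def break_test_statistics(lm_values):
    n = len(lm_values)
    sup_lm = max(lm_values)
    avg_lm = sum(lm_values) / n
    # log of the mean of exp(LM / 2), computed in a numerically stable way
    half = [0.5 * v for v in lm_values]
    top = max(half)
    exp_lm = top + math.log(sum(math.exp(h - top) for h in half) / n)
    return sup_lm, avg_lm, exp_lm
```

By construction the exp LM value lies between half the avg LM and half the sup LM, which is a quick sanity check when reading a table of these statistics. Critical values must still be taken from the tabulations in Andrews (1993) and Andrews and Ploberger (1994).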
The results suggest weak evidence that the two-moment CAPM contains a structural break, and no evidence that models which include coskewness and the SMB and HML factors have a structural break. This weak evidence is at odds with Ghysels (1998), who finds that even models with implicit betas (or models such as ours that do not require explicit modelling of betas) exhibit structural breaks. One reason for these inconsistent findings is that our approach does not require any asset specific parameters to be estimated, while the approach of Ghysels (1998) does.

Table 2.14: Testing for Structural Breaks
Andrews (1993) and Andrews and Ploberger (1994) sup LM, exp LM and avg LM tests for structural breaks are reported for each model estimated. The naming convention refers to two- and three-moment CAPM when modelling market moments, and con. (for constrained) three-moment CAPM when using the sign corrected price of covariance; three-moment-three-factor includes the FF factors and coskewness with explicitly modelled prices of risk. * indicates significance at the 0.10 level, ** significance at the 0.05 level, and *** significance at the 0.01 level.

Model | sup LM | exp LM | avg LM
FF25
Two-Moment CAPM | 28.9948* | 12.8685 | 10.6899*
Three-Moment CAPM | 39.9903 | 21.5566 | 16.7389
Three-Factor | 48.3197 | 27.0480 | 20.7299
Con. Three-Moment CAPM | 40.7286 | 23.2528 | 17.4653
Three-Moment-Three-Factor | 66.4202 | 41.1771 | 30.4702
Industry
Two-Moment | 33.8269** | 14.2092 | 12.6778**
Three-Moment | 32.8856 | 19.0434 | 13.8143
Three-Factor | 46.5592 | 24.5720 | 19.6037
Con. Three-Moment CAPM | 40.9074 | 23.9948 | 17.2881
Three-Moment-Three-Factor | 68.6896 | 40.9831 | 30.9462

2.8 Comparison with Dittmar (2002)

Of the existing literature on nonlinear asset pricing, the closest in spirit to the current research is Dittmar (2002). However, our finding that the three-moment CAPM is broadly an adequate characterization of the cross-section of equity returns is at odds with Dittmar's (2002) rejection of the three-moment CAPM.

Dittmar (2002) empirically tested a pricing kernel that is a polynomial in market returns. This polynomial is motivated as an approximation to the true pricing kernel, which in a representative agent economy is that agent's intertemporal marginal rate of substitution. This polynomial approximation is closely linked with nonlinear asset pricing models: a first-order polynomial is equivalent to the mean-variance CAPM, the second-order polynomial is equivalent to a three-moment asset pricing model, while adding the cubed market return incorporates preference for the fourth moment of asset returns. The highest polynomial considered was cubic since we can sign preferences of \"reasonable\" agents out to the fourth moment. To proxy for the return to wealth, he includes as factors the return on the CRSP value-weighted stock market index and, following Jagannathan and Wang (1996), the lagged moving average smoothed monthly growth rate of labour income. The linear pricing kernel is related to the basic CAPM, the quadratic pricing kernel to the three-moment CAPM, and the cubic pricing kernel to a four-moment CAPM. Under certain assumptions about the aggregate investor's preferences we can impose constraints on the sign of the coefficients on the returns to wealth raised to different powers in the stochastic discount factor (or SDF).[12] Dittmar (2002) finds a number of interesting results.
When only the return on the stock index is included in the return to wealth, all asset pricing models up to and including the fourth moment are rejected by the data. However, by adding labour growth rates, the empirical performance can be dramatically improved, such that the p-value for the Hansen-Jagannathan distance statistic increases (to approximately 23 percent). In the current essay, by contrast, we find that the three-moment CAPM adequately prices the cross-section of equity returns. Why do these two approaches yield such contradictory results?

Virtually all asset pricing theories, whether statements of general equilibrium or the law of one price, can be represented as a stochastic discount factor (see Cochrane (2001)). A stochastic discount factor is a random variable ξ_t such that all asset prices satisfy

    E_{t-1}[(1 + R_t) ξ_t] = 1    (2.62)

and that

    E_{t-1}[ξ_t] = (1 + R_{f,t})^{-1}    (2.63)

for the time t conditionally risk-free asset R_f. The GMM technique is an extremely powerful tool for estimating stochastic discount factors. See Cochrane (1996, 2001) and Jagannathan and Wang (1996, 2001) for discussions and examples of this methodology. Just as the two-moment CAPM has an SDF representation, so too does the three-moment CAPM, as demonstrated in Harvey and Siddique (1999) and the appendix.

[12] Note that Dittmar (2002) uses the aggregate investor's aversion to kurtosis to sign the coefficient on cubed market returns and labour growth rate. As previously argued, although we have sound economic reason to think that investors are averse to kurtosis, to our knowledge there are no results proving that aversion to the fourth moment aggregates.
Consider the following random variable, which we will proceed to demonstrate is a stochastic discount factor equivalent to the empirical version of the three-moment CAPM discussed above:

    ξ_t = ζ_{0,t} + ζ_{1,t}(r_{m,t} - μ_t) + ζ_{2,t}(e²_{m,t} - σ²_t),    (2.64)

where ζ_{0,t} = (1 + R_{f,t})^{-1}, and ζ_{1,t} and ζ_{2,t} are ratios that also carry (1 + R_{f,t}) in the denominator [their exact expressions are illegible in the source]. This demonstrates the very close link between the co-moment methodology used in this essay and the SDF methodology. This variable is derived by taking the pricing constraint implied by the three-moment CAPM (2.26) and dividing through by the gross risk-free return (1 + R_{f,t}), which gives

    E_{t-1}[r_{i,t} ξ_t] = 0.    (2.65)

Dividing through by the gross conditional risk-free return does not affect the conditional expectation of the pricing error, and so long as the mean and variance are unbiased the expectation of the pricing kernel in equation (2.64) is

    E_{t-1}[ξ_t] = ζ_{0,t} + ζ_{1,t} E_{t-1}(r_{m,t} - μ_t) + ζ_{2,t} E_{t-1}(e²_{m,t} - σ²_t) = (1 + R_{f,t})^{-1},    (2.66)

which is a necessary condition of the pricing kernel. Taken together, equations (2.65) and (2.66) verify that ξ_t, as defined in equation (2.64), is indeed a stochastic discount factor. Finally, taking the unconditional expectation of both sides of (2.65) and applying the law of iterated expectations gives the unconditional excess return stochastic discount factor representation: E[r_{i,t} ξ_t] = 0.

Consider now the calculation of E_{t-1}[(1 + R_{f,t}) ξ_t]. Since the gross risk-free return appears in the denominator of every term of ξ_t, we have

    E_{t-1}[(1 + R_{f,t}) ξ_t] = E_{t-1}[1 + (1 + R_{f,t}) ζ_{1,t}(r_{m,t} - μ_t) + (1 + R_{f,t}) ζ_{2,t}(e²_{m,t} - σ²_t)] = 1    (2.67)

since E_{t-1}(r_{m,t} - μ_t) = E_{t-1}(e²_{m,t} - σ²_t) = 0. Adding each side of (2.65) to this gives

    E_{t-1}[(1 + R_{i,t}) ξ_t] = 1,    (2.68)

which also holds unconditionally by the law of iterated expectations, and which enables direct comparison with Dittmar (2002), who considers the gross return specification of a pricing kernel which (among other things) is a quadratic function of gross market returns.
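The role of the centring in (2.64) can be checked numerically: whatever values the coefficients on the centred market return and the centred squared innovation take, the expectation of the kernel equals (1 + R_f)^{-1} as long as the mean and variance used for centring are the true ones, which is the content of (2.66) and (2.67). The sketch below uses a small discrete distribution for the market return; the states, probabilities and coefficient values are arbitrary illustrations rather than estimates from the thesis, and the market innovation is taken to be the return less its mean.

```python
# Numerical check of (2.66)-(2.67) on a discrete market-return distribution.
# zeta1 and zeta2 are arbitrary illustrative numbers.

def expectation(values, probs):
    return sum(p * v for p, v in zip(probs, values))

def sdf_values(states, probs, rf, zeta1, zeta2):
    mu = expectation(states, probs)
    var = expectation([(r - mu) ** 2 for r in states], probs)
    zeta0 = 1.0 / (1.0 + rf)
    # xi = zeta0 + zeta1 * (r - mu) + zeta2 * ((r - mu)^2 - var), as in (2.64)
    return [zeta0 + zeta1 * (r - mu) + zeta2 * ((r - mu) ** 2 - var)
            for r in states]

states = [-0.10, 0.00, 0.20]
probs = [0.3, 0.4, 0.3]
rf = 0.01
xi = sdf_values(states, probs, rf, zeta1=-2.0, zeta2=5.0)
print((1.0 + rf) * expectation(xi, probs))
```

The printed value equals one up to floating-point rounding for any choice of zeta1 and zeta2, which is why including the risk-free asset ties down the conditional mean and variance rather than the prices of risk.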
One can verify by simple algebra that ξ_t is similarly a quadratic function of market returns, but that the coefficients have very specific values related to the conditional moments of market returns. In this representation of the three-moment CAPM's pricing kernel we must model four terms: the prices of covariance and coskewness risk, and the mean and variance of excess market returns. In Dittmar (2002) there are only three terms to be modelled: the coefficients on each of the terms in the quadratic equation. Even though the conditional beta method discussed previously pins down the coefficients of the pricing kernel, the added flexibility granted by the extra parameter dramatically improves the model's performance, more than making up for the decreased degrees of freedom in the J_T statistic. Furthermore, even though we no longer impose the moment conditions on the mean, variance and skewness, as will be shown below, by including the risk-free asset as a security to be priced we effectively impose the mean and variance pricing restrictions.

Dittmar (2002) takes a different route to testing an SDF which incorporates preference for skewness (and also kurtosis) by defining a p-th order polynomial in the market return:

    ξ_t = η_{0,t} + Σ_{i=1}^{p} η_{i,t} r^i_{m,t},    (2.69)

where η_{i,t} = S_i (Z_{t-1}^T η_i)^2 are the time varying coefficients, which incorporate conditional pricing, and S_i is an indicator taking the value one for even i and negative one for odd i, accounting for the preference for mean and skewness and aversion to variance and kurtosis.

The first reason that this study supports the three-moment CAPM while Dittmar (2002) does not is that the SDF representation of the three-moment CAPM, which involves explicit modelling of market moments, provides a better fit to the data than the polynomial approximation.
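The algebra behind the statement that ξ_t in (2.64) is a quadratic in the market return can be sketched as follows, assuming the market innovation is e_{m,t} = r_{m,t} - μ_t. Expanding and collecting powers of the market return gives a constant, a linear and a quadratic coefficient, each a function of the kernel coefficients and the conditional moments. The function names are hypothetical.

```python
# Expand zeta0 + zeta1*(r - mu) + zeta2*((r - mu)^2 - var) in powers of r.
# Assumption: the market innovation is r - mu.

def quadratic_coefficients(zeta0, zeta1, zeta2, mu, var):
    c0 = zeta0 - zeta1 * mu + zeta2 * (mu * mu - var)   # constant term
    c1 = zeta1 - 2.0 * zeta2 * mu                       # coefficient on r
    c2 = zeta2                                          # coefficient on r^2
    return c0, c1, c2

def kernel_centred(r, zeta0, zeta1, zeta2, mu, var):
    return zeta0 + zeta1 * (r - mu) + zeta2 * ((r - mu) ** 2 - var)

def kernel_polynomial(r, c0, c1, c2):
    return c0 + c1 * r + c2 * r * r
```

In this form the constant and linear coefficients depend on the conditional mean and variance as well as on the kernel coefficients, whereas in a freely parameterized polynomial each coefficient has its own parameters.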
We demonstrate the better fit of the moment-based SDF by fitting the two- and three-moment CAPM SDF approximations that explicitly model the moments of market returns, together with the linear, quadratic and cubic polynomial approximations. The parameters are estimated using GMM with two different model-independent weighting matrices to ensure that direct comparisons can be made. The first is an identity matrix and the second is the inverse of the second moment matrix, as advocated by Hansen and Jagannathan (1991). Table 2.15 reports the value of the objective function, along with the p-value of Zhou's (1994) H_T specification test. We follow Dittmar and consider the industry portfolios [13], augmenting them with the conditionally risk-free asset. The estimation uses the same information variables as in previous sections of this essay, which include the default spread, which is not one of Dittmar's variables. As can be seen from equation (2.67), the effect of including the gross risk-free return is to tie down the conditional mean and variance to the market returns, so we will continue to interpret these parameters in this fashion.

[13] Although Dittmar used 20 portfolios, our results use only 17 industry portfolios.

Table 2.15: Stochastic Discount Factor Estimation of Industry Data. The value of the minimized criterion function for the multi-moment asset pricing models in SDF form is presented for two arbitrary weighting matrices: the identity matrix and the inverse of the second-moment matrix. The table also reports the p-values of the H_T test of Zhou (1994). We report the SDF version of the two- and three-moment CAPM models and the first- through third-order Taylor series expansion of Dittmar (2002).
Model           Identity Matrix     Second Moment Matrix
Two-Moment      0.3501 (0.0026)     0.5369 (0.0053)
Three-Moment    0.2026 (0.0091)     0.4855 (0.0076)
First-Order     0.3095 (0.0077)     0.5662 (0.0091)
Second-Order    0.2651 (0.0060)     0.5587 (0.0083)
Third-Order     0.2314 (0.0077)     0.5390 (0.0114)

The two most important points to take from this table are that the three-moment CAPM approach provides a much better pricing fit than even the cubic pricing kernel, which is consistent with a four-moment CAPM; and that, despite this superior pricing performance, the p-values of the specification tests still indicate that the model is rejected by the data. The two most important differences between the modelling approaches are the number of parameters and the shape of the mapping from information to coefficients. The moment modelling approach has 4L + 1 parameters, while the quadratic polynomial specification has only 3L parameters. The second difference is the relationship between the parameters and the coefficients in the polynomial. By expanding the moment modelling SDF and collecting terms in r_{m,t} and r_{m,t}^2, we observe that all the coefficients in the polynomial depend on all parameters, while the polynomial approach uses only L parameters on each term. These two effects, the extra L + 1 parameters and the explicit dependence of each coefficient on all parameters, drive the improved fit.

2.9 Conclusions

This essay develops a methodology for testing nonlinear asset pricing models in the spirit of the conditional tests of the CAPM and multi-factor pricing models of Harvey (1989) and He et al. (1996). To keep the number of parameters to a bare minimum we use a transformation of the conditional co-moments to avoid the need to model any asset-specific parameters.
The model is very parsimonious: we are able to test the restrictions of the theory while estimating the bare minimum number of parameters, all of which are common to all assets in the economy, and we avoid making any restrictive assumptions on the dynamics of co-moments, the joint distribution of returns and factors, or the exact form of heteroskedasticity. These latter two benefits derive from our use of GMM. The resulting moment conditions are very close in form and spirit to the SDF methodology. The reason for this is that we write all terms in the pricing equation as a product between the relevant co-moment and the price of that co-moment's risk. The co-moment is then rewritten as the product between the return on the asset and a term which includes the return on the factor and various conditional moments of the factor. When we combine this setup with generalized method of moments estimation we end up with a representation that looks very much like an SDF. In fact we can turn the co-moment restrictions into an SDF moment restriction by simply dividing through by the gross conditional risk-free return. The key difference between the two approaches is that the co-moment approach directly links the price of risk to the moments of market returns as implied by the asset pricing theory, while the SDF method does not. The essay documents a number of interesting and important empirical observations. First, we show that standard information variables predict portfolio returns, and explain time variation in covariance and coskewness. To verify our simplification of the conditional covariance and coskewness, we compare the rewritten form of the co-moments (i.e. the specifications that avoid modelling the conditional mean return on the individual assets) with their traditional specification, and we cannot reject the hypothesis that they are equivalent.
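The idea behind the rewritten co-moments can be illustrated with unconditional sample moments. The Python sketch below is not the conditional test carried out in the essay. It only shows the algebraic identity that makes it unnecessary to model the asset's own mean. The simulated data and variable names are hypothetical, and coskewness is taken here to be the unnormalized co-moment of the asset return with the squared market shock.

```python
import numpy as np

rng = np.random.default_rng(1)
r_m = rng.normal(0.006, 0.045, size=500)                    # simulated market returns
r_i = 0.002 + 0.9 * r_m + rng.normal(0.0, 0.03, size=500)   # simulated asset returns

eps_m = r_m - r_m.mean()          # market shock
sigma2_m = (eps_m ** 2).mean()    # market variance

# Traditional forms: the asset return is demeaned.
cov_traditional = ((r_i - r_i.mean()) * eps_m).mean()
cosk_traditional = ((r_i - r_i.mean()) * eps_m ** 2).mean()

# Rewritten forms: raw asset return times a function of the market only.
cov_rewritten = (r_i * eps_m).mean()
cosk_rewritten = (r_i * (eps_m ** 2 - sigma2_m)).mean()

print(cov_traditional - cov_rewritten, cosk_traditional - cosk_rewritten)
```

Both differences are zero up to floating-point error, because the market terms multiplying the asset return have zero mean by construction.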
This equivalence lends credence to the validity of our pragmatic modelling choice. We also find that the three-moment CAPM is broadly consistent with the data. Although the data soundly reject the two-moment CAPM, the three-moment CAPM has goodness-of-fit statistics that are not rejected at conventional levels (about the five percent level), and the prices of risk, in addition to the co-moments, are time-varying. The parameter estimates indicate that investors care about covariance risk and also coskewness risk when the market is positively skewed, but are only marginally averse to negative skewness. We also find that including a cross-section of equity returns contributes to the precision with which we can estimate the time-series dynamics of the market's variance and skewness. We also fit the three-factor asset pricing model due to Fama and French (1993), which has become an industry standard, and find that, depending on the choice of weighting matrix, the improvement due to adding the extra two factors SMB and HML is similar to the improvement that results from including skewness, and that the increased explanatory power due to the FF factors is attenuated after accounting for conditional coskewness. Interestingly, the three-factor model does not appear to capture all the predictive ability of the information variables, while the three-moment model does. The hypothesis of stable parameters is rejected for the two-moment CAPM, while both the three-factor and three-moment models have stable parameters. The FF factors do appear to retain some usefulness in explaining returns over and above coskewness, and coskewness likewise retains its explanatory power over the FF factors. These results are at odds with Dittmar's (2001) rejection of the three-moment CAPM using an SDF framework. The essay demonstrates that there are two basic reasons for this difference.
First, the three-moment specification employed has more parameters than the comparable SDF setup, and the functional form mapping the information variables to polynomial coefficients is significantly more flexible. Second, there appear to be gains from including the moment restrictions linking the prices of risk to market moments, since the co-moment specification is not rejected by the industry data, but the SDF method is. The fundamental conclusion reached by this research is twofold:
• That coskewness is important for pricing the cross-section of equity returns.
• That both risk, including covariance and coskewness risk, and the prices of risk are strongly time varying.
The great flexibility our chosen specification and econometric methodology have afforded us in drawing these robust conclusions comes at a price. Because we don't have an explicit representation for the dynamics of conditional covariance and coskewness, we are unable to make predictions about returns in the normal sense. Future research should attempt to specify the dynamics of covariances and coskewness so we can use this model for estimating costs of capital.

2.A Appendix: Conditional Asset Pricing

The three-moment CAPM, as developed by Kraus and Litzenberger (1976), is a static, one-period representative agent model, just as is the basic two-moment CAPM. However, it is relatively straightforward to reconcile a conditional two-moment CAPM by specifying the stochastic discount factor in the economy. It is well known that in an economy that does not permit arbitrage, there will exist a random variable m_t such that

E((1 + R_{i,t}) m_t) = 1, (2.70)

where R_{i,t} is the return (and not excess return) on any asset i. The pricing kernel m_t is the same across all assets in the economy. The conditional two-moment CAPM is equivalent to a pricing kernel that is linear in the market return,

m_t = δ_{0,t} + δ_{1,t} R_{m,t}, (2.71)

where