Filtering in Asset Pricing

by

Alberto Romero

B.Sc., ITAM, 2002
M.Math., University of Waterloo, 2006

A thesis submitted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate and Postdoctoral Studies
(Business Administration)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

September 2013

© Alberto Romero, 2013

Abstract

In the first chapter of this thesis, I propose a nonlinear filtering method to estimate latent processes based on Taylor series approximations. The filter extends conventional methods such as the extended Kalman filter or the unscented Kalman filter and provides a tractable way to estimate filters of any order. I apply the filter to different models and demonstrate that this method is a good approach for the estimation of unobservable states as well as for parameter inference. I also find that filters with Taylor approximations can be as accurate as conventional Monte Carlo filters and computationally more efficient. Through this chapter I show that filters with Taylor approximations are a good approach for a number of problems in finance and economics that involve nonlinear dynamic modeling.

In the second chapter, I investigate the recently documented, large time-series variation in the empirical market Sharpe ratio. I revisit the empirical evidence and ask whether estimates of Sharpe ratio volatility may be biased due to the limitations of the standard ordinary least squares (OLS) methods used in estimation. Based on simulated data from a standard calibration of the long-run risks model, I find that OLS methods used in prior literature produce Sharpe ratio volatility five times larger than its true variability. The difference arises due to measurement error. To address this issue, I propose the use of filtering techniques that account for the Sharpe ratio's time variation. I find that these techniques produce Sharpe ratio volatility estimates of less than 15% on a quarterly basis, which match more closely the predictions of standard asset pricing models.

Preface

This dissertation is original, unpublished, independent work by the author, Alberto Romero.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
2 Filtering via Taylor Series Approximations
  2.1 Introduction
    2.1.1 Related Literature
  2.2 Nonlinear Filtering
  2.3 Filtering Based on Taylor Series Expansions
    2.3.1 Gaussian Densities for Filtering
    2.3.2 Taylor Series Approximations
    2.3.3 Calculation of the Moments of a Multivariate Normal Distribution
    2.3.4 Rao-Blackwellisation for Filtering with Taylor Series Approximations
  2.4 Quasi-Maximum Likelihood Parameter Estimation
    2.4.1 Quasi-Likelihood Function
  2.5 Applications
    2.5.1 Stochastic Volatility Models
    2.5.2 Risk and Return Model
    2.5.3 A Dynamic Stochastic General Equilibrium Model
  2.6 Robustness Checks
    2.6.1 Highly Nonlinear Systems
    2.6.2 Multivariate Stochastic Volatility Models
  2.7 Concluding Remarks
  2.8 Figures and Tables
3 On the Volatility of the Market Sharpe Ratio
  3.1 Introduction
    3.1.1 Related Literature
  3.2 Sharpe Ratios in Asset Pricing
  3.3 The Long-Run Risks Model
    3.3.1 Implications for Expected Returns, Volatilities and Conditional Sharpe Ratios
  3.4 Sharpe Ratios Simulated from Structural Models
    3.4.1 Predictive Regressions
    3.4.2 Filtering and Estimation
    3.4.3 Other Models
    3.4.4 External Habit Formation Model
  3.5 Sharpe Ratios Estimated from Reduced Form Models
    3.5.1 Brandt and Kang (2004)
    3.5.2 Implied Sharpe Ratio
    3.5.3 The Data
    3.5.4 Parameter Estimates
    3.5.5 Expected Returns, Volatilities and Sharpe Ratios
  3.6 Implications for Portfolio Choice
    3.6.1 Portfolio Optimization: One Risky Asset
  3.7 Concluding Remarks
  3.8 Figures and Tables
4 Conclusion
  4.1 Limitations
  4.2 Future Work
Bibliography
A Appendix to Chapter 2
  A.1 Efficient Calculation of Derivatives of Composite Functions
  A.2 Proofs
  A.3 Standard Kalman Filter
  A.4 The Extended Kalman Filter
  A.5 The Unscented Kalman Filter
  A.6 The Particle Filter
    A.6.1 Implementation
    A.6.2 Sampling Importance Resampling
  A.7 Quasi-Maximum Likelihood Standard Errors
  A.8 Calculation of Moments
B Appendix to Chapter 3
  B.1 Sharpe Ratios in Asset Pricing
  B.2 The Solution to the Long-Run Risks Model
    B.2.1 Consumption Claim
    B.2.2 Dividend Claim
    B.2.3 Risk-Free Interest Rate
    B.2.4 Return on the Market Portfolio
    B.2.5 Linearization Parameters
  B.3 Excess Returns Conditional Moments Implied by the Long-Run Risks Model
    B.3.1 Expected Returns
    B.3.2 Variance of Excess Returns
    B.3.3 Aggregate Excess Returns
    B.3.4 Variance of Aggregate Excess Returns
  B.4 Quasi-Maximum Likelihood Estimation
  B.5 External Habit Formation Model
    B.5.1 Stochastic Discount Factor
    B.5.2 Risk-Free Rate and Maximum Sharpe Ratio
    B.5.3 Price-Dividend Ratio

List of Tables

Table 2.1 Stochastic Volatility Model (One Observation Equation): Simulation Results
Table 2.2 Stochastic Volatility Model (Two Observation Equations): Simulation Results
Table 2.3 Stochastic Volatility Model: CPU Time
Table 2.4 Stochastic Volatility Model: Quasi-Maximum Likelihood Estimation Results
Table 2.5 Stochastic Volatility Model: Descriptive Statistics
Table 2.6 Stochastic Volatility Model: Parameter Estimates
Table 2.7 Risk-Return Model: Quasi-Maximum Likelihood Estimation Results
Table 2.8 Risk-Return Model: Descriptive Statistics
Table 2.9 Risk-Return Model: Parameter Estimates
Table 2.10 DSGE Model: Parameter Values
Table 2.11 DSGE Model (One Equation): Simulation Results
Table 2.12 DSGE Model (Two Equations): Simulation Results
Table 2.13 DSGE Model: Estimation Results
Table 2.14 Multivariate Stochastic Volatility Model: Parameter Values
Table 3.1 Long-Run Risks Parameters
Table 3.2 Long-Run Risks Moment Comparison: OLS
Table 3.3 Long-Run Risks Moment Comparison: Filtering
Table 3.4 Asset Pricing Models
Table 3.5 Quasi-Maximum Likelihood Parameter Estimates
Table 3.6 Regressions on Quarterly Data: Expected Returns
Table 3.7 Regressions on Quarterly Data: Volatility
Table 3.8 Summary Statistics of Expected Returns, Volatilities and Sharpe Ratio Estimates
Table 3.9 Quasi-Maximum Likelihood Parameter Estimates
Table 3.10 Quasi-Maximum Likelihood Parameter Estimates: Model with Predictors
Table 3.11 Summary Statistics of Sharpe Ratio Estimates

List of Figures

Figure 2.1 Stochastic Volatility Model: Filter Performance
Figure 2.2 Stochastic Volatility Model: Quasi-Likelihood Contours
Figure 2.3 Risk-Return Model: Filter Performance
Figure 2.4 Risk-Return Model: Order of Approximation
Figure 2.5 Risk-Return Model: Quasi-Likelihood Contours
Figure 2.6 Risk-Return Model: Quasi-Likelihood Contours (Cont.)
Figure 2.7 Risk-Return Model: Data
Figure 2.8 DSGE Model: Filter Performance
Figure 2.9 DSGE Model: Quasi-Likelihood Contours
Figure 2.10 Nonlinear Model: Filter Performance
Figure 2.11 Multivariate Stochastic Volatility Model: Filter Performance
Figure 3.1 Comparison OLS Estimates versus Simulated Values: Long-Run Risks Model
Figure 3.2 Comparison Simulations versus Filtered Values: Long-Run Risks Model
Figure 3.3 Comparison OLS versus Filtered Estimates
Figure 3.4 Comparison OLS Estimates versus Simulated Values: Habit Formation Model
Figure 3.5 Expected Returns, Volatility and Sharpe Ratio Estimates
Figure 3.6 Portfolio Weights

Acknowledgments

Completing my degree at the Sauder School of Business would not have been possible without the continued support from a great number of faculty, colleagues, friends, and family during these six years.

I owe an infinite amount of gratitude to my Ph.D. supervisors Adlai Fisher and Lorenzo Garlappi, who provided guidance and support on so many levels, but foremost as great mentors. I want to thank Harjoat Bhamra, Murray Carlson, Nando de Freitas, Howard Kung and Jason Chen for their help, encouragement and guidance on my research endeavors. Jan Bena, Ron Giammarino, Alan Kraus, Kai Li, Hernan Ortiz-Molina and Tan Wang provided much welcome advice throughout my studies.

My colleagues and friends Milka Dimitrova, Vincent Gregoire, Alejandra Medina and Gonzalo Morales have been helpful every step of the way.

My parents, Maria del Carmen and Jose Humberto, never gave up on me.
My parents-in-law, Luz Maria and Modesto, have been a source of support and encouragement.

Special thanks go to Zara Contractor and Daniella Santos Coy for their help and support during the last couple of years of the Ph.D. program.

Finally, it was my wife Ana that made my time in Vancouver truly special. Without her constant support I could not have completed my thesis. No words could describe how grateful I am to her for always believing in me, and keeping up with the ups and downs of this journey. Lastly, I want to thank my son David, whose birth gave me the motivation I needed to complete my Ph.D.

Thanks to all of you and to all that I have forgotten to mention by name.

Dedication

To Ana and David.

Chapter 1
Introduction

One of the most important research topics in financial economics is the impact of noisy information on investment decisions. As a result, the use of latent variables in economic models has become crucial, since those variables capture unobserved changes in the economic environment. Therefore, developing statistical methods for estimating latent variables from information observed with noise is of utmost importance. A standard approach for estimating these variables is the use of filtering methods. Filtering, in general, refers to an extraction process, and statistical filtering refers to an algorithm for extracting a latent state variable from noisy measurements. In this thesis I develop a statistical technique for estimating unobserved state variables and explore the use of filtering methods in asset pricing.

In the first chapter of this thesis, "Filtering via Taylor Series Approximations," I propose a nonlinear filtering method to estimate latent processes based on Taylor series approximations. The method can be applied to both state and parameter inference and generalizes conventional methods such as the extended Kalman filter (EKF) or the unscented Kalman filter (UKF). My findings show that filters with Taylor approximations can be as accurate as standard particle filters for state estimation. Based on different empirical applications, I provide evidence that my filtering method is a good approach for econometric inference of dynamic models. The estimation technique I propose can be applied to a number of empirical and theoretical problems that involve calculating conditional expectations based on noisy information.

The second chapter of this thesis, "On the Volatility of the Market Sharpe Ratio," is based on recent literature that empirically documents large time-series variation in the market Sharpe ratio, which has spurred theoretical explanations for this phenomenon. I revisit the empirical evidence and ask whether estimates of Sharpe ratio volatility might be biased due to limitations of the standard ordinary least squares (OLS) methods used in estimation. Based on simulated data from a standard calibration of the long-run risks model, I find that OLS methods used in the literature produce Sharpe ratio volatility five times larger than its true variability. The difference arises due to measurement error. To address this issue I propose the use of filtering techniques that account for the Sharpe ratio's time variation. I find that these techniques produce Sharpe ratio volatility estimates that match more closely the predictions of standard asset pricing models.
Additionally, my results have practical implications for portfolio allocation, where upward-biased estimates of Sharpe ratio volatility imply excessive portfolio rebalancing.

Chapter 2
Filtering via Taylor Series Approximations

2.1 Introduction

Filtering is a statistical tool that recovers unobservable state variables using measurements that are observed with noise.[1] Kalman (1960) proposed a well-known solution to the linear filtering problem, the Kalman filter, which computes the estimates of the state of a system given the set of observations available. It has been applied to problems in economics and finance in which agents make decisions based on noisy information. Generalizations of the Kalman filter, commonly referred to as nonlinear filters, allow state variables to have a nonlinear relation with measurements or previous states. The problem is that the solution to the filtering problem is known analytically only in a limited number of settings, such as the linear case, and alternative solutions are required.

[1] This technique has been the subject of considerable research during the past decades due to its numerous applications in science and engineering, such as satellite navigation systems, tumor identification and weather forecasting.

In this chapter, I propose a nonlinear filtering method to estimate unobserved state variables which is based on an efficient calculation of Taylor approximations. The method can be applied to both state and parameter inference and generalizes conventional methods such as the EKF or the UKF. My findings show that filters with Taylor approximations can be as accurate as standard particle filters for state estimation. The importance of the filter with Taylor series is that it overcomes a number of difficulties previously documented in the filtering literature (Fernández-Villaverde and Rubio-Ramírez, 2007; Fernández-Villaverde, Rubio-Ramírez, and Santos, 2006). First, it allows for arbitrary nonlinearities in the data-generating process. Second, the filtering calculations are as efficient as the standard Kalman filter because only function evaluations are required to calculate the recursions. Third, the order of approximation of the filter can be chosen exogenously by the researcher. Fourth, in addition to state estimation, the filter with Taylor series can be applied for inference purposes, since a quasi-likelihood function is obtained in the filtering recursions and quasi-likelihood methods can be applied (Bollerslev and Wooldridge, 1992; White, 1982). In addition, the quasi-likelihood functions of the filter with Taylor series are continuous with respect to the parameter values, and conventional methods for numerical optimization can be applied to conduct statistical inference. Lastly, the filter is flexible enough to include several nonlinear observation equations to improve the state and parameter estimation.

The filter approximates all the densities involved in the state estimation process with Gaussian distributions. The theoretical foundation for these approximations is that any probability density function can be approximated as closely as desired by a sum of Gaussian density functions (Ito and Xiong, 2000; Maz'ya and Schmidt, 1996), where the first and second moments are necessary to characterize the whole distribution.
In addition, the filter can be combined with Monte Carlo simulations to handle a more general class of models that involve discrete state variables via the Rao-Blackwellised particle filter (Doucet, De Freitas, Murphy, and Russell, 2000).

I test the proposed filter in a number of nonlinear models that involve latent variables previously studied in the finance, economics and filtering literature. The first application is the stochastic volatility model (Andersen, Bollerslev, Diebold, and Ebens, 2001; Andersen, Bollerslev, Diebold, and Labys, 2003; Andersen and Sørensen, 1996; Broto and Ruiz, 2004). I start by studying the performance of the filter with simulated data. The filter with Taylor series generates volatility estimates as accurate as those of the particle filters and at least four times faster. I also find that quasi-maximum likelihood methods are a good approach for parameter estimation. My simulation exercises suggest that the filtering method with Taylor series approximations is an alternative approach for both state and parameter inference. Finally, I estimate the parameters, for different orders of approximation, of an endowment process with stochastic volatility using a series of US data for monthly consumption growth. For higher approximation orders, I find evidence of stochastic volatility comparable with the recent findings by Bidder and Smith (2011) and Ludvigson (2012), which suggest that the stochastic volatility model is a good representation for consumption growth, as posited in the long-run risks literature (Bansal and Yaron, 2004).

In the second application, I analyze a nonlinear latent vector autoregressive (VAR) process studied in Brandt and Kang (2004) and recently used by Boguth, Carlson, Fisher, and Simutin (2011) in the conditional asset pricing literature. In this setup, the conditional mean and volatility of stock returns are modeled as a two-dimensional latent VAR process. This approach has several advantages: It guarantees positive risk premia and volatilities, eliminates the reliance on arbitrary conditioning variables for the construction of conditional moments, and allows the study of the contemporaneous and intertemporal relationships between expected returns and risk. In this case, by including an additional observation equation I find that the filters based on Taylor series generate estimates of expected returns and volatilities as accurately as the particle filters do.

The last application is a stochastic general equilibrium model, which is particularly interesting as it shows that perturbation techniques that have been previously used to solve general equilibrium models (Judd, 1998; Schmitt-Grohe and Uribe, 2004) can be directly combined with nonlinear filtering for state and parameter estimation. Moreover, the filter with Taylor approximations may be another feasible approach for parameter inference of these models, since a quasi-likelihood function can be constructed instead of relying on Monte Carlo simulation methods.

As a robustness check, I test the filter in a high-dimensional multivariate stochastic volatility model and a standard highly nonlinear model from the filtering literature. I find that the filter with Taylor series provides accurate state estimates that are comparable with those obtained with particle filters. More importantly, I confirm that the filter with Taylor series is computationally more efficient than standard particle filters.

2.1.1 Related Literature

A number of applications involve nonlinear VAR processes.
These nonlinearities complicate the filtering process as well as the parameter inference procedures because the Kalman filter is no longer an optimal solution. To resolve this issue, different lines of research have emerged. One strand of research is based on deterministic filtering and uses deterministic recursions to compute the mean and variance of the state variables given the observed information. Two widely used algorithms have been successfully applied: the EKF (Jazwinski, 1970) and the UKF (Julier and Uhlmann, 1997). These approaches rely on first- and second-order approximations of the functions that characterize the nonlinear data-generating process. However, if nonlinearities are significant enough, these filters do not provide efficient estimates, and a number of biases arise.[2] A recent extension of this approach is the Smolyak Kalman filter, proposed by Winschel and Krätzig (2010), which extends the UKF by applying Smolyak quadratures to construct the filtering recursions. This chapter extends these approaches by allowing an arbitrary order of approximation. The filter is based on the efficient Taylor series expansions recently used in Savits (2006) and Garlappi and Skoulakis (2010) in two ways: first, for the computation of higher-order derivatives of functions, and second, for the computation of higher-order moments of normally distributed random variables. The filters with Taylor series approximations proposed in this paper fall into the deterministic filtering literature and extend the current techniques for nonlinear filtering.

[2] Fernández-Villaverde and Rubio-Ramírez (2007) uncover significant biases that arise from first- and second-order approximations to the functions that characterize the system.

The second line of research for nonlinear filtering is based on Monte Carlo techniques. These filtering techniques, also called particle filtering techniques, are based on Monte Carlo simulation with sequential importance sampling. The overall goal is to directly implement optimal Bayesian estimation by recursively approximating the complete posterior state density through Monte Carlo methods (Gordon, Salmond, and Smith, 1993; Pitt and Shephard, 1999). Different extensions of the particle filter have been proposed in the filtering literature, such as the Rao-Blackwellised particle filter (Doucet, De Freitas, Murphy, and Russell, 2000) and the unscented particle filter (Van Der Merwe, Doucet, De Freitas, and Wan, 2001). The current approach to evaluating the likelihood of a nonlinear state-space model is dominated by particle filters, whose extensions are described in Doucet, de Freitas, Gordon, and Smith (2001). Particle filters are often an alternative to the EKF or UKF and have the advantage that, with sufficient samples, they approach the Bayesian optimal estimate, improving on the accuracy of the EKF or UKF. However, when the simulated sample is not sufficiently large, particle filters might suffer from sample impoverishment. In this paper, I show that the filter with Taylor series can provide state estimates as accurate as those obtained by standard particle filters by including additional observation equations, such as the squared or cubed observation equations of a standard model. By adding observation equations to the model, we can also achieve better parameter identification.

Tanizaki and Mariano (1996) are the first to suggest using Taylor series approximations to resolve the filtering problem as well as the biases that arise while taking first- and second-order approximations.
However, they propose the use of Monte Carlo simulations to avoid numerical integration. Instead, I apply Taylor series approximations in filtering and use the efficient recursions of Savits (2006) to estimate the derivatives of a function of several variables.

The rest of this chapter is structured as follows: Section 2.2 outlines the general filtering problem, Section 2.3 presents the filtering techniques with Taylor series, and Section 2.4 describes the quasi-maximum likelihood approach for parameter estimation in filtering. Section 2.5 presents three different applications and describes the data, results and empirical findings; Section 2.6 provides a set of robustness checks and Section 2.7 concludes the chapter.

2.2 Nonlinear Filtering

State-space models are mathematical tools commonly used to represent dynamic systems that involve unobserved state variables.[3] A state-space representation is characterized by a set of measurements and a state transition, usually obtained from a theoretical model. The state transition reflects the time evolution of the state variables, whereas the state measurement relates the unobserved state vector and the observed variables. Let $x_t$ denote an $N$-dimensional vector that represents the state of the system at time $t$ and $y_t$ be a $p$-dimensional vector of observables. It is generally assumed that the states of the system follow a first-order Markov process, and the observations are assumed to be conditionally independent given the states. The state-space model is characterized by the state transition and state measurement densities, denoted by $p(x_t|x_{t-1})$ and $p(y_t|x_t)$, respectively.

[3] See Hamilton (1994) and Kim and Nelson (1999) for a standard introduction to state-space models.

A number of applications characterize the state transition and measurement densities through the transition and measurement equations, which are expressed as follows:
\[
\text{Observation Equation:} \quad y_t = h(x_t) + v_t, \tag{2.1}
\]
\[
\text{Transition Equation:} \quad x_{t+1} = g(x_t) + \eta_{t+1}, \tag{2.2}
\]
where $v_t$ and $\eta_t$ are $p$-dimensional and $N$-dimensional distributed noise vectors with variance–covariance matrices $R$ and $Q$, respectively. In this case, Eq. (2.1) represents the observation equation, while Eq. (2.2) represents the transition equation. Intuitively, the function $h$ defines the measurement based on the current state and the function $g$ characterizes the current state from the previous state. The mappings $h: \mathbb{R}^N \longrightarrow \mathbb{R}^p$ and $g: \mathbb{R}^N \longrightarrow \mathbb{R}^N$ are assumed to be continuous and smooth.

To complete the specification of the model, it is assumed that the initial state of the system, $x_0$, has a known prior distribution, denoted by $p(x_0)$. The filtering problem is to find the distribution of the state vector, $x_t$, given the set of observations available, $y_1, \dots, y_t$. The posterior density of the states conditional on the history of observations, denoted by $p(x_0, x_1, \dots, x_t\,|\,y_1, \dots, y_t)$, constitutes the complete solution to the filtering problem. For tractability purposes, the mathematical object that is usually analyzed is the marginal distribution, or marginal density of the state conditional on the set of observations available, which is denoted by $p(x_t\,|\,y_1, \dots, y_t)$.
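To make the objects in Eqs. (2.1)–(2.2) concrete, the following minimal Python sketch simulates a univariate nonlinear state-space model; the particular choices of $h$, $g$ and the noise variances here are hypothetical and serve only to illustrate the setup. The filtering problem is then to recover the simulated path of $x_t$ from the observations $y_t$ alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance of Eqs. (2.1)-(2.2): scalar state and measurement.
def g(x):            # transition function: next state from current state
    return 0.9 * x + 0.1 * np.tanh(x)

def h(x):            # observation function: measurement from current state
    return np.exp(x / 2)

Q, R = 0.04, 0.01    # variances of the transition and measurement noise
T = 200

x = np.zeros(T + 1)  # latent states x_0, ..., x_T
y = np.zeros(T)      # observations y_1, ..., y_T
for t in range(T):
    x[t + 1] = g(x[t]) + rng.normal(scale=np.sqrt(Q))   # Eq. (2.2)
    y[t] = h(x[t + 1]) + rng.normal(scale=np.sqrt(R))   # Eq. (2.1)
```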
If $h$ and $g$ are linear, then Eqs. (2.1) and (2.2) define a linear filtering problem; moreover, if $v_t$ and $\eta_t$ are normally distributed, then the filtering problem has a well-known solution given by the Kalman filter (Kalman, 1960). In the linear case, the conditional density, $p(x_t\,|\,y_1, \dots, y_t)$, is Gaussian with mean and variance constructed recursively based on the set of observations and the state-space representation. If either of the mappings $h$ or $g$ is nonlinear, then the filtering problem is nonlinear and no standard solution exists.

2.3 Filtering Based on Taylor Series Expansions

A number of solutions have been proposed to solve the nonlinear filtering problem.[4] If the nonlinear models can be expressed in a state-space setting, then the Kalman filter may be useful by calculating linearizations at each time step, so that the standard filter recursions can be applied. This approach is known as the extended Kalman filter (EKF).[5] The EKF reverts to the optimal Kalman filter when the problem becomes linear. As a result, the EKF can yield approximate minimum-variance estimates, although these fall into a "try at your own risk" category. Indeed, Anderson and Moore (1979) caution that the EKF "can be satisfactory on occasions." Moreover, Julier and Uhlmann (1997) document a number of biases generated by the EKF, and as a result propose an improvement, the unscented Kalman filter (UKF).[6]

[4] An extensive review of nonlinear filtering from a theoretical and empirical perspective is provided by Crisan and Rozovskii (2011).
[5] See Appendix A.4 for a detailed explanation of the EKF.
[6] See Appendix A.5 for a detailed explanation of the UKF.

The UKF relies on the idea that approximating the moments of a transformed random variable is simpler than approximating the density function itself. The unscented filter approximates the first two moments needed for the Kalman update. The approximation is based on quadrature techniques where the number of grid points is taken to be $2d+1$, where $d$ is the dimension of the integrands to be analyzed. As shown by Julier and Uhlmann (1997), this approximation is comparable to a second-order Taylor approximation of the state and observation equations. Winschel and Krätzig (2010) find that the UKF is an attempt to solve the curse of dimensionality generated by the number of integrands in the filtering recursions; however, the filter generates another curse in terms of approximation errors. As the dimension of the problem increases, the number of points used by the UKF rises linearly. Unfortunately, the accuracy of the numerical integration decreases with the dimensionality and nonlinearity of the integrands. Therefore, this curse of approximation errors has an effect on state and parameter estimation. The UKF is therefore restricted to tractable nonlinearities (such as a low-order polynomial) and a small number of states.
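For concreteness, the following sketch illustrates the $2d+1$ sigma-point construction behind the UKF for a scalar-valued transformation. The weights follow the common Julier–Uhlmann parameterization (implementations vary in this choice), and the test function is a hypothetical example.

```python
import numpy as np

def unscented_moments(f, mu, Sigma, kappa=1.0):
    """Approximate E[f(x)] and Var[f(x)] for x ~ N(mu, Sigma) with
    2d+1 sigma points (Julier-Uhlmann style weights)."""
    d = len(mu)
    L = np.linalg.cholesky((d + kappa) * Sigma)   # matrix square root
    # Sigma points: the mean plus/minus the columns of the square root.
    points = [mu] + [mu + L[:, i] for i in range(d)] + [mu - L[:, i] for i in range(d)]
    w = np.full(2 * d + 1, 1.0 / (2 * (d + kappa)))
    w[0] = kappa / (d + kappa)
    fx = np.array([f(p) for p in points])
    mean = w @ fx
    return mean, w @ (fx - mean) ** 2

# Example: f(x) = exp(x1), x ~ N(0, I2); exact mean is exp(1/2) ~ 1.6487.
m, v = unscented_moments(lambda x: np.exp(x[0]), np.zeros(2), np.eye(2))
```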
To overcome this issue, I propose the use of higher-order Taylor series expansions for nonlinear filtering. This technique assumes that the nonlinear functions that define the state-space model have a Taylor series expansion. In order to calculate the moments involved in the filtering recursions, the filter uses the moment calculations of a Taylor series with a level of approximation previously chosen by the researcher. The moments of the Taylor series expansion are then used in the standard Kalman filter recursions. This approach extends the EKF and UKF to any order of approximation, and its computational efficiency is comparable to that of the standard Kalman filter. The use of Gaussian distributions to approximate filters is the basis of deterministic filtering algorithms. These techniques have been analyzed by Ito and Xiong (2000) in the filtering literature. Additionally, the filters with Taylor series can be applied for parameter inference via quasi-maximum likelihood methods, first introduced by White (1982) and analyzed in Bollerslev and Wooldridge (1992).

The following sections present the filtering method based on Taylor approximations. I first introduce the use of Gaussian densities in nonlinear filtering and explain how the standard Kalman filter is applied to estimate the first two moments of the unobserved state variables. The Kalman filter makes use of means, variances and covariances of nonlinear transformations of normally distributed random vectors. I show how to estimate these moments with Taylor series approximations and present theoretical results that help estimate them efficiently. At the end of the section, I summarize the results in a general algorithm, and, finally, I discuss how to apply the filters with Taylor approximations to a more general class of state variables via standard Rao-Blackwellisation methods.

2.3.1 Gaussian Densities for Filtering

This section introduces Gaussian densities for nonlinear filtering and describes how the mean and variance of the unobserved state variables are approximated with the standard Kalman filter. This approximation requires calculating expected values of nonlinear transformations of normally distributed random vectors, which may not have a closed form. I describe how the Taylor series can be applied to estimate these expected values, and finally, I discuss how the EKF and the UKF are particular cases of these approximations.

The notation $\mathcal{N}(z; \mu, \Sigma)$ is shorthand for the density of a multivariate normal distribution with argument $z$, mean $\mu$, and covariance $\Sigma$. Let $x_{t|t} \equiv E[x_t\,|\,y_1, \dots, y_t]$ and $P_{t|t} \equiv \mathrm{Var}[x_t\,|\,y_1, \dots, y_t]$. I assume that the initial state density is normal with mean $x_0$ and covariance matrix $P_0$. I also assume that the densities involved in each of the filtering steps are normal. In this case, the conditional density of the state variable $x_t$, denoted by $p(x_t\,|\,y_1, \dots, y_t)$, is characterized by its first and second conditional moments; that is,
\[
p(x_t\,|\,y_1, \dots, y_t) \approx \mathcal{N}\left(x_t; x_{t|t}, P_{t|t}\right).
\]
Moreover, the conditional density of $x_{t+1}$ is also Gaussian,
\[
p(x_{t+1}\,|\,y_1, \dots, y_t) \approx \mathcal{N}\left(x_{t+1}; x_{t+1|t}, P_{t+1|t}\right),
\]
with conditional moments obtained from the transition equation represented by Eq. (2.2),
\[
x_{t+1|t} = E[g(x_t)\,|\,y_1, \dots, y_t], \tag{2.3}
\]
\[
P_{t+1|t} = \mathrm{Var}[g(x_t)\,|\,y_1, \dots, y_t] + Q. \tag{2.4}
\]
Similarly, the measurement density, defined by the observation equation in (2.1), is Gaussian,
\[
p(y_{t+1}\,|\,y_1, \dots, y_t) \approx \mathcal{N}\left(y_{t+1}; y_{t+1|t}, P^{yy}_{t+1|t}\right),
\]
with mean
\[
y_{t+1|t} = E[h(x_{t+1})\,|\,y_1, \dots, y_t] \tag{2.5}
\]
and variance–covariance matrix
\[
P^{yy}_{t+1|t} = \mathrm{Var}[h(x_{t+1})\,|\,y_1, \dots, y_t] + R, \tag{2.6}
\]
where $R$ is the covariance matrix of the measurement shocks. Moreover, the conditional covariance between the measurements and states is represented by
\[
P^{xy}_{t+1|t} = \mathrm{Cov}[x_{t+1}, h(x_{t+1})\,|\,y_1, \dots, y_t]. \tag{2.7}
\]
By assuming that the conditional densities are Gaussian, the conditional moments can be obtained recursively by applying the standard Kalman filter,[7] represented by the following set of equations:[8]
\[
p(x_{t+1}\,|\,y_1, \dots, y_{t+1}) = \mathcal{N}\left(x_{t+1}; x_{t+1|t+1}, P_{t+1|t+1}\right), \tag{2.8}
\]
\[
K_{t+1} = P^{xy}_{t+1|t}\left(P^{yy}_{t+1|t}\right)^{-1},
\]
\[
x_{t+1|t+1} = x_{t+1|t} + K_{t+1}\left(y_{t+1} - y_{t+1|t}\right),
\]
\[
P_{t+1|t+1} = P_{t+1|t} - K_{t+1} P^{yy}_{t+1|t} K_{t+1}^{\top}.
\]

[7] A formal proof of this result can be found in Theorem 2 of Kalman (1960).
[8] For a detailed description of the standard Kalman filter, please see Appendix A.3.

Equations (2.8) are based on the calculations of the moments in Eqs. (2.3)–(2.7), which are expected values of nonlinear transformations of random variables and may not have a closed form.
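The following sketch isolates one measurement update of Eqs. (2.8). The moment inputs are placeholders: how $y_{t+1|t}$, $P^{yy}_{t+1|t}$ and $P^{xy}_{t+1|t}$ are computed (exactly, by linearization, by sigma points, or by the Taylor series developed below) is precisely what distinguishes the different Gaussian filters.

```python
import numpy as np

def gaussian_filter_update(y, x_pred, P_pred, y_pred, Pyy, Pxy):
    """Kalman update of Eqs. (2.8), given the predicted state moments
    (x_pred, P_pred) and the measurement moments of Eqs. (2.5)-(2.7)."""
    K = Pxy @ np.linalg.inv(Pyy)           # Kalman gain
    x_filt = x_pred + K @ (y - y_pred)     # updated conditional mean
    P_filt = P_pred - K @ Pyy @ K.T        # updated conditional covariance
    return x_filt, P_filt

# Example with a scalar state and observation (1x1 matrices):
x, P = gaussian_filter_update(np.array([1.2]),
                              np.array([1.0]), np.eye(1) * 0.5,
                              np.array([0.9]), np.eye(1) * 0.6, np.eye(1) * 0.4)
```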
A natural approach consists of replacing the observation and transition equations with their Taylor series expansions, using the mean vector as the center of expansion of the series. Consequently, the moments of the observation and transition equations involved are calculated with the expected values of the Taylor approximations. In this setup, the EKF corresponds to a first-order approximation, while the UKF coincides with the second-order approximation of the functions that define the state-space model. As a result, the numerical integration problem is solved by calculating the derivatives of the observation and transition equations as well as the cross moments of normally distributed random vectors.

The next section provides the basic setup for estimating the moments of Gaussian random vectors using Taylor approximations. A brief overview of the multivariate version of the Taylor series is presented first, followed by an explanation of how these approximations are used to estimate the moments of possibly nonlinear transformations of normally distributed random vectors.

2.3.2 Taylor Series Approximations

Let $y = f(x)$ denote a smooth function, where $f : \mathbb{R}^N \longrightarrow \mathbb{R}$, and let $\mu$ denote an $N$-dimensional constant vector.[9] Let $q = (q_1, \dots, q_N)$ denote a vector of nonnegative integers, $|q| = \sum_{n=1}^{N} q_n$, $q! = \prod_{n=1}^{N} (q_n!)$, and $f_q(\mu)$ denote the partial derivative of order $q$ of the function $f$ evaluated at $\mu$; i.e.,
\[
f_q(\mu) = \frac{\partial^{q_1 + \dots + q_N} f}{\partial x_1^{q_1} \cdots \partial x_N^{q_N}}(\mu). \tag{2.9}
\]

[9] I will follow the convenient tensor notation from Savits (2006) and Garlappi and Skoulakis (2010).

Theorem 2.3.1 Let $U \subseteq \mathbb{R}^N$ be an open subset, $x \in U$, $\mu \in \mathbb{R}^N$, so that $tx + (1-t)\mu \in U$ for all $t \in [0, 1]$. Assume $f : U \longrightarrow \mathbb{R}$ is $(M+1)$ times continuously differentiable. Then, there is a $\theta \in [0, 1]$, so that
\[
f(x) = \sum_{\{q : |q| \leq M\}} \frac{1}{q!} f_q(\mu) \prod_{n=1}^{N} (x_n - \mu_n)^{q_n} + \sum_{|q| = M+1} \frac{1}{q!} f_q(\xi) \prod_{n=1}^{N} (x_n - \mu_n)^{q_n}, \tag{2.10}
\]
where $\xi = \theta x + (1 - \theta)\mu$.

Theorem 2.3.1 is the preamble to the Taylor series approximations. It shows that $f(x)$ can be rewritten as the sum of a polynomial, where the coefficients are determined by the derivatives of the function evaluated at the point $\mu$, and a term that includes its derivatives of order $M+1$ evaluated at a point $\xi$. The polynomial is defined as the $M$-th order Taylor approximation to the function $f(x)$, and the second term is known as the remainder.

Definition The generic $M$-th order Taylor approximation of $f$ centered at $\mu$, denoted by $\hat{f}$, is defined as
\[
\hat{f}(x) = \sum_{\{q : |q| \leq M\}} \frac{1}{q!} f_q(\mu) \prod_{n=1}^{N} (x_n - \mu_n)^{q_n}, \tag{2.11}
\]
for $x \in U$.

Now, suppose that $x \sim \mathcal{N}(\mu, \Sigma)$ and that we are interested in calculating the expected value of $f(x)$. A natural approach is to replace the function $f$ with its $M$-th order Taylor approximation and estimate the expected value of this approximation. Thus, from Eq. (2.11) we have
\[
E[f(x)] \approx \sum_{\{q : |q| \leq M\}} \frac{1}{q!} f_q(\mu)\, E\!\left[\prod_{n=1}^{N} (x_n - \mu_n)^{q_n}\right]. \tag{2.12}
\]
Intuitively, Eq. (2.12) provides an approximation for the expected value of a transformation of a normally distributed random vector which is based on two separate elements: the derivatives of the function $f$ evaluated at $\mu$ and the cross moments of a normally distributed random vector.[10] In most applications, the derivatives of the function $f$ have an analytical expression and can be calculated explicitly.

[10] Conditions under which $\sum_{|q| = M+1} \frac{1}{q!} E\!\left[f_q(\xi) \prod_{n=1}^{N} (x_n - \mu_n)^{q_n}\right] \longrightarrow 0$ as $M \rightarrow \infty$ can be found in Jiming (2010) and Garlappi and Skoulakis (2011).
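As a simple check of Eq. (2.12), the following scalar sketch approximates $E[f(x)]$ for the hypothetical choice $f(x) = e^x$ with $x \sim \mathcal{N}(0, \sigma^2)$, for which the exact value $e^{\sigma^2/2}$ is available. It uses the standard fact that the central moments of a normal are $E[(x-\mu)^k] = \sigma^k (k-1)!!$ for even $k$ and zero for odd $k$.

```python
import math

def taylor_expectation(derivs_at_mu, sigma2):
    """Scalar version of Eq. (2.12): E[f(x)] for x ~ N(mu, sigma2),
    given derivs_at_mu[k] = k-th derivative of f at mu."""
    total = 0.0
    for k, fk in enumerate(derivs_at_mu):
        if k % 2 == 0:   # odd central moments of a normal are zero
            # E[(x - mu)^k] = sigma^k (k-1)!! for even k
            moment = sigma2 ** (k // 2) * math.prod(range(1, k, 2)) if k else 1.0
            total += fk / math.factorial(k) * moment
    return total

# f(x) = exp(x) and mu = 0, so every derivative at mu equals 1.
M, sigma2 = 10, 0.25
approx = taylor_expectation([1.0] * (M + 1), sigma2)
exact = math.exp(sigma2 / 2)    # closed form for E[exp(x)], x ~ N(0, sigma2)
```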
However, the filtering recursions in (2.8) not only involve calculating the expected value of a transformation; they also require the calculation of variances and covariances of this transformation with the state variables.

The following sections describe how to calculate the cross moments in Eq. (2.12) based on the results of Savits (2006). It is also shown how to apply the Taylor approximations for the estimation of variances and covariances. Propositions 2.3.3 and 2.3.4 as well as Lemma 2.3.5 provide the basis for calculating variance–covariance matrices efficiently using Taylor approximations. Finally, all the results are summarized in the nonlinear filter based on Taylor approximations, which is presented in Algorithm 2.3.6.

2.3.3 Calculation of the Moments of a Multivariate Normal Distribution

Let $Z = (z_1, z_2, \dots, z_N)$ denote a multivariate normal random vector with zero mean vector and covariance matrix $\Sigma$, where the component $i, j$ denotes the covariance between the random variables $z_i$ and $z_j$. Let $\mathcal{M}^{\Sigma}_q$ be the $q \equiv (q_1, \dots, q_N)$ moment, where $q_1, \dots, q_N$ are nonnegative integers; i.e., $\mathcal{M}^{\Sigma}_q = E[z_1^{q_1} \cdots z_N^{q_N}]$. Then, from Theorem 5.1 in Savits (2006), we have the following recursive relation between the multivariate moments of $Z$.

Proposition 2.3.2 Set $\mathcal{M}^{\Sigma}_{(0, \dots, 0)} = 1$; then, for all $q = (q_1, \dots, q_N) \geq 0_N$ and $1 \leq j \leq N$, we have
\[
\mathcal{M}^{\Sigma}_{q + e_j} \equiv E\!\left[z_1^{q_1} \cdots z_j^{q_j + 1} \cdots z_N^{q_N}\right] = \sum_{k=1}^{N} \Sigma_{jk}\, q_k\, \mathcal{M}^{\Sigma}_{q - e_k}, \tag{2.13}
\]
where $e_j$ is the $N$-dimensional unit vector with $j$-th component equal to 1 and all the other components equal to zero.

Proof See Savits (2006).

Proposition 2.3.2 provides a recursive method to estimate the cross moments of a normally distributed random vector based on the variance–covariance matrix only. This result is the basis for the filtering methods via Taylor approximations, because the calculation of moments and derivatives can be separated in a tractable form.
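A minimal implementation of the recursion in Eq. (2.13) is sketched below. The memoized function returns $\mathcal{M}^{\Sigma}_q$ for any multi-index $q$, and the example values can be checked against Isserlis'-type formulas for Gaussian moments.

```python
import numpy as np
from functools import lru_cache

def normal_cross_moments(Sigma):
    """Returns M(q) = E[z1^q1 ... zN^qN] for z ~ N(0, Sigma),
    computed with the recursion of Proposition 2.3.2 / Eq. (2.13)."""
    N = Sigma.shape[0]

    @lru_cache(maxsize=None)
    def M(q):
        if any(qi < 0 for qi in q):
            return 0.0                  # out-of-range moments vanish
        if sum(q) == 0:
            return 1.0                  # M_{(0,...,0)} = 1
        # Write the target as M_{q' + e_j} with q' = q - e_j, q_j >= 1.
        j = next(i for i, qi in enumerate(q) if qi > 0)
        qp = tuple(qi - (i == j) for i, qi in enumerate(q))
        return sum(Sigma[j, k] * qp[k]
                   * M(tuple(qi - (i == k) for i, qi in enumerate(qp)))
                   for k in range(N) if qp[k] > 0)
    return M

Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
M = normal_cross_moments(Sigma)
# E[z1^2] = 1.0, E[z1 z2] = 0.5,
# E[z1^2 z2^2] = Sigma11*Sigma22 + 2*Sigma12^2 = 2.5
vals = M((2, 0)), M((1, 1)), M((2, 2))
```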
To calculate the second moments involved in the Kalman filter recursions in Eqs. (2.4) and (2.6), I approximate the product of a transformed random variable with its Taylor series around the mean vector $\mu$. The choice of the mean vector, $\mu$, as the center of expansion of the Taylor series is convenient for three reasons: First, all the calculations that involve derivatives are independent of the expectation operator; second, the cross moments are independent of the mean vector; and third, $E[\prod_{n=1}^{N} (x_n - \mu_n)^{q_n}] = 0$ for all vectors $q$ such that $\sum_{n=1}^{N} q_n$ is an odd number. In any case, the results will be valid if the center of expansion is any constant vector. If the problem involves the calculation of conditional expectations, then the results will still be valid. The only requirement is for the center of expansion to be measurable with respect to the current information set.

We know that the variance and covariance of any set of random variables involve calculating expectations of products of random variables. For example, the variance of $f(x)$ requires the calculation of $E[f^2(x)]$ and $E[f(x)]$. As described previously, the expectation of $f^2(x)$ can be approximated with its Taylor series; i.e.,
\[
E[f^2(x)] \approx \sum_{\{q : |q| \leq M\}} \frac{1}{q!} (f^2)_q(\mu)\, \mathcal{M}^{\Sigma}_q, \tag{2.14}
\]
where $(f^2)_q(\mu)$ denotes the partial derivative of order $q$ of the function $f^2$ evaluated at $\mu$. Finally, the variance is obtained as the difference between the estimate of the second moment of the function and the squared value of the estimate of the first moment; that is,
\[
\mathrm{Var}[f(x)] = E[f^2(x)] - E^2[f(x)].
\]

The same method can be applied to estimate the covariance of two transformed random vectors. The idea is to approximate the expected value of the product of two functions with their derivatives evaluated at the vector of means and the cross moments of a normally distributed random vector. Following Eq. (2.12), we have
\[
E[f_1(x) \cdot f_2(x)] \approx \sum_{\{q : |q| \leq M\}} \frac{1}{q!} (f_1 \cdot f_2)_q(\mu)\, \mathcal{M}^{\Sigma}_q, \tag{2.15}
\]
and the covariances involved in the calculation of the covariance matrix of the observation vector can be obtained as
\[
\mathrm{cov}[f_1(x), f_2(x)] = E[f_1(x) \cdot f_2(x)] - E[f_1(x)] \cdot E[f_2(x)].
\]

Clearly, from Eqs. (2.14) and (2.15), we learn that variances and covariances could be estimated with the derivatives of the square and the product of functions, which may result in cumbersome calculations. However, Propositions 2.3.3 and 2.3.4 provide a tractable recursive scheme to compute the derivatives of these functions based on the derivatives of the functions $f$, $f_1$ and $f_2$. This is obtained via the Faà di Bruno formula for the derivative of a composite function and its extensions to the multivariate case recently proposed in Savits (2006) and applied in Garlappi and Skoulakis (2010, 2011).[11]

[11] A brief overview of the fundamentals of these recursions and the main results of Savits (2006) are presented in Appendix A.1.

Proposition 2.3.3 Let $f : \mathbb{R}^N \longrightarrow \mathbb{R}$ be an $(M+1)$-times continuously differentiable function. Then the derivatives of $\phi(x) = f(x)^2$ can be obtained from the following vector recursion:
\[
\phi_0(x) = f(x)^2, \qquad
\phi_{q + e_j}(x) = \sum_{\{\ell \in \mathbb{N}_0^N : 0_N \leq \ell \leq q\}} 2 \binom{q}{\ell} f_{q + e_j - \ell}(x)\, f_{\ell}(x). \tag{2.16}
\]

Proof See Appendix A.2.

Proposition 2.3.4 Let $f_1, f_2 : \mathbb{R}^N \longrightarrow \mathbb{R}$ be $(M+1)$-times continuously differentiable functions. Let $\phi(x) = f_1(x) \cdot f_2(x)$; then the derivatives of $\phi(x)$ are given by
\[
\phi_0(x) = f_1(x) f_2(x), \qquad
\phi_{q + e_j}(x) = \sum_{\{\ell \in \mathbb{N}_0^N : 0_N \leq \ell \leq q\}} \binom{q}{\ell} f_{1, q + e_j - \ell}(x)\, f_{2, \ell}(x) + \sum_{\{\ell \in \mathbb{N}_0^N : 0_N \leq \ell \leq q\}} \binom{q}{\ell} f_{2, q + e_j - \ell}(x)\, f_{1, \ell}(x). \tag{2.17}
\]

Proof See Appendix A.2.

Although the use of Taylor series for the calculation of covariances is quite convenient, if either $f_1(x)$ or $f_2(x)$ is linear then we can estimate these covariances in a simpler way. Stein's Lemma provides an expression for the covariance between a normally distributed random vector and its nonlinear transformation. Indeed, Stein's Lemma can be applied to calculate the covariance matrix involved in the Kalman filter step of Eq. (2.7).

Lemma 2.3.5 (Stein's Lemma) Suppose that $X \equiv (x_1, \dots, x_N) \sim \mathcal{N}(\mu, \Sigma)$. For any function $f(x_1, \dots, x_N)$ such that $\partial f / \partial x_i$ exists almost everywhere and $E\left|\frac{\partial}{\partial x_i} f(X)\right| < \infty$, $i = 1, \dots, N$, let $\nabla f(X) = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_N}\right)^{\top}$. Then the following identity holds:
\[
\mathrm{cov}(X, f(X)) = \Sigma \cdot E[\nabla f(X)]. \tag{2.18}
\]
Specifically,
\[
\mathrm{cov}(x_1, f(x_1, \dots, x_N)) = \sum_{i=1}^{N} \mathrm{cov}(x_1, x_i) \cdot E\!\left[\frac{\partial}{\partial x_i} f(x_1, \dots, x_N)\right]. \tag{2.19}
\]
If $f$ is a vector function, then $\nabla f(X)$ is replaced by the transpose of the Jacobian matrix of $f$, $J_f$, defined by
\[
J_f = \frac{\partial (f_1, f_2, \dots, f_p)}{\partial (x_1, x_2, \dots, x_N)} =
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_N} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_p}{\partial x_1} & \cdots & \frac{\partial f_p}{\partial x_N}
\end{bmatrix}.
\]

Proof See Appendix A.2.
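The identity in Eq. (2.18) is easy to verify by simulation. The sketch below does so for the hypothetical function $f(x) = e^{x_1} \sin(x_2)$, comparing the sample covariance $\mathrm{cov}(X, f(X))$ with $\Sigma \cdot E[\nabla f(X)]$; the two agree up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of Stein's Lemma, Eq. (2.18), for f(x) = exp(x1)*sin(x2).
mu = np.array([0.2, -0.1])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
X = rng.multivariate_normal(mu, Sigma, size=1_000_000)

f = np.exp(X[:, 0]) * np.sin(X[:, 1])
grad = np.column_stack([np.exp(X[:, 0]) * np.sin(X[:, 1]),    # df/dx1
                        np.exp(X[:, 0]) * np.cos(X[:, 1])])   # df/dx2

lhs = np.array([np.cov(X[:, i], f)[0, 1] for i in range(2)])  # cov(X, f(X))
rhs = Sigma @ grad.mean(axis=0)                               # Sigma * E[grad f]
```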
Stein's Lemma involves calculating the expectation of a vector of partial derivatives. If the function $f$ is a polynomial of order less than or equal to three, then this expectation is a linear function of the mean vector $\mu$ and the variance–covariance matrix $\Sigma$. If the function $f$ is not a polynomial of order less than or equal to three, it is necessary to obtain the Taylor series of each of the components of the vector of partial derivatives. Fortunately, the Taylor approximations of these components are obtained directly from the Taylor series of the function $f$; that is,
\[
\frac{\partial f}{\partial x_i} \approx \sum_{\{q : 0 < |q| \leq M\}} \frac{q_i}{q!} f_q(\mu)\, (x_i - \mu_i)^{q_i - 1} \prod_{\substack{n=1 \\ n \neq i}}^{N} (x_n - \mu_n)^{q_n}. \tag{2.20}
\]
Now, taking the expectation of Eq. (2.20) yields
\[
E\!\left[\frac{\partial f}{\partial x_i}\right] \approx \sum_{\{q : 0 < |q| \leq M\}} \frac{q_i}{q!} f_q(\mu)\, E\!\left[(x_i - \mu_i)^{q_i - 1} \prod_{\substack{n=1 \\ n \neq i}}^{N} (x_n - \mu_n)^{q_n}\right].
\]
As in Eq. (2.12), calculating the covariance in Eq. (2.18) can be separated into two steps: first, the calculation of the derivatives of the vector $\frac{\partial f}{\partial x}$; and, second, estimating the cross moments of a normally distributed random vector. Following the notation of Proposition 2.3.2, the moments required to calculate this covariance are written as
\[
\mathcal{M}^{\Sigma}_{q - e_i} = E\!\left[(x_i - \mu_i)^{q_i - 1} \prod_{\substack{n=1 \\ n \neq i}}^{N} (x_n - \mu_n)^{q_n}\right].
\]
Therefore, the expectation of each of the partial derivatives is written as
\[
E\!\left[\frac{\partial f}{\partial x_i}\right] \approx \sum_{\{q : 0 < |q| \leq M\}} \frac{q_i}{q!} f_q(\mu)\, \mathcal{M}^{\Sigma}_{q - e_i}. \tag{2.21}
\]

Algorithm 2.3.6 summarizes the filtering method with Taylor series approximations. The main inputs to perform the filtering recursions are the derivatives of the functions that define the observation and transition equations of a state-space model and the cross moments of a multivariate normal distribution. Moreover, the algorithm coincides with the EKF and the UKF when the first- and second-order approximations are considered, respectively.

Algorithm 2.3.6 The process is initialized with the unconditional moments of $x_t$:
\[
x_{0|0} = E[x_0], \qquad P_{0|0} = \mathrm{Var}[x_0].
\]
For $t = 0, \dots, T$:

- Calculate the derivatives of orders less than or equal to $M$ of each of the components of the vector function $g(x)$, $g_i(x)$, $i = 1, \dots, n$, and evaluate them at the point $x_{t|t}$.
- Based on Propositions 2.3.3 and 2.3.4, calculate the derivatives of orders less than or equal to $M$ of functions of the form $g_i(x) g_j(x)$, $i, j = 1, \dots, n$, and evaluate them at the point $x_{t|t}$.
- Based on Proposition 2.3.2, calculate all the cross moments, $\mathcal{M}^{P_{t|t}}_q$, with $|q| \leq M$ of a normally distributed random vector with a mean vector of zero and variance–covariance matrix $P_{t|t}$.
- Calculate $E[g(x_t)\,|\,y_1, \dots, y_t]$ and $\mathrm{Var}[g(x_t)\,|\,y_1, \dots, y_t]$ according to Eqs. (2.14) and (2.15).

Time Update
- Estimate $x_{t+1|t}$ and $P_{t+1|t}$ as
\[
x_{t+1|t} = E[g(x_t)\,|\,y_1, \dots, y_t], \qquad P_{t+1|t} = \mathrm{Var}[g(x_t)\,|\,y_1, \dots, y_t] + Q.
\]
- Calculate the derivatives of orders less than or equal to $M$ of each of the components of the vector function $h(x)$, $h_i(x)$, $i = 1, \dots, p$, and evaluate them at the point $x_{t+1|t}$.
- Based on Propositions 2.3.3 and 2.3.4, calculate the derivatives of orders less than or equal to $M$ of functions of the form $h_i(x) h_j(x)$, $i, j = 1, \dots, p$, and evaluate them at the point $x_{t+1|t}$.
- Calculate all the cross moments, $\mathcal{M}^{P_{t+1|t}}_q$, with $|q| \leq M$ of a normally distributed random vector with a mean vector of zero and variance–covariance matrix $P_{t+1|t}$ according to Proposition 2.3.2.
- Calculate $E[h(x_{t+1})\,|\,y_1, \dots, y_t]$, $E[\nabla h(x_{t+1})\,|\,y_1, \dots, y_t]$ and $\mathrm{Var}[h(x_{t+1})\,|\,y_1, \dots, y_t]$ according to Eqs. (2.14), (2.15) and (2.21).

Measurement Update
- Estimate $y_{t+1|t}$ and $P^{yy}_{t+1|t}$ as
\[
y_{t+1|t} = E[h(x_{t+1})\,|\,y_1, \dots, y_t], \qquad
P^{yy}_{t+1|t} = \mathrm{Var}[h(x_{t+1})\,|\,y_1, \dots, y_t] + R, \qquad
P^{xy}_{t+1|t} = P_{t+1|t}\, E[\nabla h(x_{t+1})\,|\,y_1, \dots, y_t].
\]

Kalman Filter Update
- Estimate $x_{t+1|t+1}$ and $P_{t+1|t+1}$ according to the Kalman update:
\[
K_{t+1} = P^{xy}_{t+1|t}\left(P^{yy}_{t+1|t}\right)^{-1}, \qquad
x_{t+1|t+1} = x_{t+1|t} + K_{t+1}\left(y_{t+1} - y_{t+1|t}\right), \qquad
P_{t+1|t+1} = P_{t+1|t} - K_{t+1} P^{yy}_{t+1|t} K_{t+1}^{\top}.
\]

A compact illustration of one recursion of this algorithm, for a scalar state and observation, is given in the sketch below.
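In the sketch, the functions `g_derivs` and `h_derivs`, which return the derivatives of $g$ and $h$ up to order $M$ at a point, are hypothetical stand-ins for the derivative recursions of Appendix A.1; the model itself is chosen purely for illustration.

```python
import math
import numpy as np

def central_moments(P, M):
    """mu[k] = E[z^k] for z ~ N(0, P), k = 0..M; scalar case of Prop. 2.3.2."""
    mu = np.zeros(M + 1)
    mu[0] = 1.0
    for k in range(2, M + 1, 2):
        mu[k] = P * (k - 1) * mu[k - 2]        # Eq. (2.13) with N = 1
    return mu

def taylor_mean_var(derivs, mu_k):
    """E[f(x)] and Var[f(x)] from f's derivatives at the mean (Eqs. 2.12, 2.14);
    derivatives of f^2 come from the scalar Leibniz rule (Prop. 2.3.3)."""
    M = len(derivs) - 1
    mean = sum(derivs[k] / math.factorial(k) * mu_k[k] for k in range(M + 1))
    sq = [sum(math.comb(k, l) * derivs[l] * derivs[k - l] for l in range(k + 1))
          for k in range(M + 1)]
    second = sum(sq[k] / math.factorial(k) * mu_k[k] for k in range(M + 1))
    return mean, second - mean ** 2

def taylor_filter_step(y, m, P, g_derivs, h_derivs, Q, R):
    """One recursion of Algorithm 2.3.6 for a scalar state and observation."""
    gd = g_derivs(m)
    M = len(gd) - 1
    x_pred, var_g = taylor_mean_var(gd, central_moments(P, M))  # time update
    P_pred = var_g + Q
    hd = h_derivs(x_pred)
    mu_k = central_moments(P_pred, M)
    y_pred, var_h = taylor_mean_var(hd, mu_k)    # measurement update
    Pyy = var_h + R
    # E[h'(x)] via Eq. (2.21); then Pxy = P_pred * E[h'] by Stein's Lemma.
    Eh1 = sum(hd[k] / math.factorial(k - 1) * mu_k[k - 1] for k in range(1, M + 1))
    Pxy = P_pred * Eh1
    K = Pxy / Pyy                                # Kalman filter update
    return x_pred + K * (y - y_pred), P_pred - K * Pyy * K

# Hypothetical scalar model with derivatives supplied to order M = 4:
g_derivs = lambda x: np.array([0.9 * x, 0.9, 0.0, 0.0, 0.0])        # g(x) = 0.9 x
h_derivs = lambda x: np.exp(x / 2) * np.array([1, .5, .25, .125, .0625])  # h = e^{x/2}
m_f, P_f = taylor_filter_step(1.1, 0.0, 0.2, g_derivs, h_derivs, Q=0.05, R=0.01)
```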
From the previous sections, we learned that the filter with Taylor approximations can be applied to a general class of models. However, all these models assume that the state variables are continuous. If a subset of the state variables is discrete, such as the unobserved state variables in a discrete Markov switching model, then the filter with Taylor series approximations cannot be applied directly.

The next section discusses how to apply standard Rao-Blackwellisation methods[12] to the filtering techniques with Taylor approximations. These methods allow including a subset of potentially discrete state variables in a general state-space representation. The approach is done via the Rao-Blackwellised particle filter (Doucet, De Freitas, Murphy, and Russell, 2000). As a result, hybrid filters are obtained where one part of the calculations is performed with the filters based on Taylor approximations and the other part relies on Monte Carlo simulation methods.

[12] See Casella and Robert (1996) for a general reference on this topic.

2.3.4 Rao-Blackwellisation for Filtering with Taylor Series Approximations

Rao-Blackwellised particle filtering was introduced by Doucet, De Freitas, Murphy, and Russell (2000); the basic idea is to reduce the number of variables that must be sampled by identifying variables that have an analytical expression for their density function or can be analyzed in a tractable way. A general overview of the Rao-Blackwellised particle filter can be found in Crisan and Rozovskii (2011).

Suppose that we are interested in analyzing the conditional density of a random vector $x$ that can be written as two different vectors, $z$ and $u$; i.e., $x = [z, u]^{\top}$. We can think of $z$ as the continuous random variable and $u$ as a discrete random vector. The conditional density of $x$ can be written as
\[
p(x) = p(z, u) = p(z\,|\,u)\, p(u). \tag{2.22}
\]
If $p(z\,|\,u)$ admits a closed-form expression, then to approximate the probability density function of $p(x)$ we need to approximate only the unknown density, $p(u)$. In this case, we reduce the number of variables that need to be analyzed. In general, Rao-Blackwellised particle filtering allows us to jointly analyze discrete and continuous state variables and reduces the number of variables that must be sampled by taking advantage of the analytical structure of the problem.

Suppose that the state vector $x_t$ can be written as $x_t = (z_t, u_t)^{\top}$, where $z_t \in \mathbb{R}^N$ and $u_t$ is an unobserved Markov process with known transition probability density, with the following state-space representation:
\[
y_t = h(x_t, u_t) + v_t, \quad v_t \sim \mathcal{N}(0, R(u_t)), \tag{2.23}
\]
\[
x_{t+1} = g(x_t, u_{t+1}) + \eta_{t+1}, \quad \eta_t \sim \mathcal{N}(0, Q(u_t)), \tag{2.24}
\]
where $R(u_t)$ and $Q(u_t)$ have the appropriate dimensions. In this case, we can think of $u_t$ as a discrete or continuous state variable that determines a switching state-space model. To solve the filtering problem, we need to estimate a conditional density of the form
\[
p(z_1, \dots, z_t, u_1, \dots, u_t\,|\,y_1, \dots, y_t) = p(u_1, \dots, u_t\,|\,y_1, \dots, y_t)\, p(z_1, \dots, z_t\,|\,y_1, \dots, y_t, u_1, \dots, u_t). \tag{2.25}
\]
Conditional upon $(u_1, \dots, u_t)$, we have a nonlinear state-space model of the form of Eqs. (2.1) and (2.2). Therefore, to obtain filtered estimates of $z_t$, we first condition upon $(u_1, \dots, u_t)$ and obtain $E(z_t\,|\,y_1, \dots, y_t, u_1, \dots, u_t)$ with the filters based on Taylor approximations as presented in the previous sections. It follows that to fully estimate the unobserved state variables, it remains to approximate the conditional density of $u_1, \dots, u_t$ with other filtering approximations such as the particle filter.[13] As a result, we obtain hybrid filters where one part of the calculations is performed analytically and the other part uses Monte Carlo methods or another filter.

[13] Appendix A.6 provides a general introduction to particle filters.
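The following sketch outlines the structure of such a hybrid filter for a hypothetical two-regime example. Conditional on the discrete regime path, the continuous state here admits an analytical linear-Gaussian Kalman update, which plays the role that the filter with Taylor approximations would play in the general nonlinear case; the parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)

Pi = np.array([[0.95, 0.05], [0.10, 0.90]])  # regime transition probabilities
Q_regime = np.array([0.01, 0.25])            # state-noise variance per regime
phi, R = 0.9, 0.1                            # z' = phi z + eta,  y = z + v

def kalman_step(y, m, P, Q):
    """Analytic update of z given the regime path; stands in for the
    filter with Taylor approximations. Returns updated moments and the
    Gaussian likelihood of the observation for each particle."""
    m_pred, P_pred = phi * m, phi ** 2 * P + Q
    S = P_pred + R                           # innovation variance
    lik = np.exp(-0.5 * (y - m_pred) ** 2 / S) / np.sqrt(2 * np.pi * S)
    K = P_pred / S
    return m_pred + K * (y - m_pred), (1 - K) * P_pred, lik

def rb_particle_filter(ys, n_part=500):
    u = rng.integers(0, 2, n_part)               # particles for u_t
    m, P = np.zeros(n_part), np.ones(n_part)     # analytic moments of z_t
    for y in ys:
        u = np.array([rng.choice(2, p=Pi[ui]) for ui in u])  # propagate regimes
        m, P, w = kalman_step(y, m, P, Q_regime[u])          # analytic part
        idx = rng.choice(n_part, n_part, p=w / w.sum())      # resample
        u, m, P = u[idx], m[idx], P[idx]
    return m.mean(), u.mean()                    # E[z_t | y's], Pr[u_t = 1 | y's]

z_hat, prob_high = rb_particle_filter(rng.normal(size=50))
```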
The state-space model in Eqs. (2.23) and (2.24) is general enough and can be applied to a number of models, such as partially observed Gaussian models and Markov switching stochastic volatility models. However, all of these models rely on a set of fixed parameter values that need to be estimated. In Section 2.4, I present the quasi-maximum likelihood method for parameter inference in the state-space models defined in Section 2.2.

2.4 Quasi-Maximum Likelihood Parameter Estimation

Although the focus of the previous sections has been on state estimation, filters based on Gaussian densities, as described in Section 2.3.1, can also be applied for parameter inference in state-space models. White (1982) introduced econometric methods for misspecified models, known as quasi-maximum likelihood (QML) methods. The general idea is to replace the true likelihood function with a Gaussian density and obtain parameter estimates as if the true likelihood function were Gaussian. The parameter estimates are known as QML estimates and can be obtained via standard numerical optimization methods. Moreover, White (1982) shows that QML parameter estimates are statistically consistent. In addition, Gallant and White (1988) provide regularity conditions under which robust standard errors exist.

For dynamic models, such as the state-space models defined in Section 2.2, Bollerslev and Wooldridge (1992) show that QML parameter estimates can be obtained by replacing the true likelihood function with the likelihood function constructed with means and covariances obtained from the filtering Eqs. (2.8).[14]

[14] For recent applications of these methods in nonlinear state-space representations, see Christoffersen, Jacobs, Karoui, and Mimouni (2012), van Binsbergen and Koijen (2011), Campbell, Sunderam, and Viceira (2011) and Calvet, Fisher, and Wu (2013).

2.4.1 Quasi-Likelihood Function

Let $\mathcal{L}(\theta)$ denote the quasi log-likelihood function of a dynamic model evaluated at the vector of parameter values $\theta$; then the function $\mathcal{L}$ is constructed as follows. For each $t = 1, \dots, T$, the conditional mean, $y_{t+1|t}$, and conditional covariance, $P^{yy}_{t+1|t}$, are calculated recursively through Eqs. (2.8). The quasi log-likelihood function is calculated by assuming that $y_{t+1}$ is normally distributed with mean $y_{t+1|t}$ and covariance matrix $P^{yy}_{t+1|t}$; that is,
\[
l_t(\theta) = -\frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln\left|P^{yy}_{t+1|t}\right| - \frac{1}{2}\left(y_{t+1} - y_{t+1|t}\right)^{\top}\left(P^{yy}_{t+1|t}\right)^{-1}\left(y_{t+1} - y_{t+1|t}\right). \tag{2.26}
\]
Finally, the QML parameter estimates, denoted by $\hat{\theta}_{QML}$, are obtained by choosing the vector of parameters $\theta$ that maximizes the quasi-likelihood function; that is,
\[
\hat{\theta}_{QML} \equiv \arg\max_{\theta} \mathcal{L}(\theta),
\]
where
\[
\mathcal{L}(\theta) = \sum_{t=1}^{T} l_t(\theta). \tag{2.27}
\]
Bollerslev and Wooldridge (1992) show that the QML function (2.27) is well defined. Moreover, they show that the true but unknown vector of parameters is the global maximizer of (2.27) if the following conditions hold:
\[
E[v_{t+1}\,|\,y_1, \dots, y_t] = 0, \qquad \mathrm{Var}[v_{t+1}\,|\,y_1, \dots, y_t] = R,
\]
which means that if the first and second moments are well specified, then the global maximizer of the QML function will be the true but unknown vector of parameter values. Finally, asymptotic standard errors can be estimated as in Gallant and White (1988). They show that under certain regularity conditions, the covariance matrix of the QML estimator has a closed-form expression, as shown in Appendix A.7.
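The sketch below shows how the per-period contributions in Eq. (2.26) accumulate into the objective (2.27). The innovations and covariances are assumed to come from a filter run at a candidate parameter vector $\theta$; the optimizer mentioned in the comment is one common choice, not a prescription.

```python
import numpy as np

def quasi_log_likelihood(innovations, Pyy_seq):
    """Eqs. (2.26)-(2.27): sum of Gaussian log densities of the one-step-ahead
    prediction errors e_t = y_{t+1} - y_{t+1|t} with covariances P^yy_{t+1|t}."""
    total = 0.0
    for e, Pyy in zip(innovations, Pyy_seq):
        p = len(e)
        _, logdet = np.linalg.slogdet(Pyy)
        total += -0.5 * (p * np.log(2 * np.pi) + logdet
                         + e @ np.linalg.solve(Pyy, e))
    return total

# Toy example with two scalar innovations:
ll = quasi_log_likelihood([np.array([0.1]), np.array([-0.2])],
                          [np.eye(1) * 0.5, np.eye(1) * 0.5])
# In practice, the innovations and covariances are produced by running the
# filter at each candidate theta, and a numerical optimizer (for example,
# scipy.optimize.minimize on the negative quasi log-likelihood) searches
# for the QML estimate.
```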
2.5 Applications

In this section, I test the filter with Taylor series approximations on three different nonlinear models. The first model is the standard stochastic volatility model, as in Andersen and Sørensen (1996). The second model is the risk-return representation analyzed by Brandt and Kang (2004) in the predictability literature, and the third is a simple version of the dynamic stochastic general equilibrium model studied by Schmitt-Grohe and Uribe (2004). I implement the filter with Taylor series for different approximation orders on simulated and real data. I conduct two sets of exercises. The first set consists of pure filtering exercises that test the accuracy of the Taylor series for state estimation using different approximation orders. The second set analyzes the precision of the parameter estimates by comparing the true values with their sample counterparts. Finally, I compare the results with those obtained from standard methodologies such as the EKF, the UKF and the particle filter.

2.5.1 Stochastic Volatility Models

The standard stationary stochastic volatility[15] model in discrete time is represented by

    y_t = σ_t ε_t    (2.28)
    ln σ_t² = d + φ ln σ_{t−1}² + η_t,   η_t ∼ N(0, σ_η²),

where y_t is the value of a time series observation at time t, σ_t is the corresponding volatility, d is a scale parameter for the volatility process and ε_t is a white noise process with unit variance that represents the innovations in the level, or returns. The disturbance of the volatility equation, η_t, is assumed to be a Gaussian white noise process; in addition, |φ| is considered a measure of the persistence of shocks to the volatility. The variance of the log-volatility process, σ_η², measures the uncertainty of future volatility.

[15] See Ghysels, Harvey, and Renault (1996) and Shephard (2005) for a comprehensive review.

The log-normality specification for the volatility is consistent with Andersen, Bollerslev, Diebold, and Ebens (2001) and Andersen, Bollerslev, Diebold, and Labys (2003), who show that the log-volatility process can be well approximated by a normal distribution, and with Taylor (2008), who proposes modeling the logarithm of volatility as an AR(1) process. When φ is close to one and σ_η² is close to zero, the evolution of volatility over time is very smooth; in the limit, if φ = 1 and σ_η² = 0, the volatility is constant over time and, consequently, the returns are homoscedastic. As noted by Broto and Ruiz (2004), if σ_η² = 0 the model cannot be identified.

State-Space Representation and Implementation

An alternative representation of Eq. (2.28) is obtained by using the demeaned log-volatility, s_t ≡ ln σ_t² − ln σ², and ε_t as state variables. In this case, the re-parameterized stochastic volatility process is

    y_t = σ exp(s_t/2) ε_t    (2.29)
    s_t = φ s_{t−1} + η_t,   η_t ∼ N(0, σ_η²).    (2.30)

In terms of the state-space representation in (2.1) and (2.2), the state variables are given by the vector x_t = [s_t, ε_t]^⊤; Eq. (2.29) is the observation equation, where h(x_t) ≡ h(s_t, ε_t) = σ exp(s_t/2) ε_t, and the random noise, v_t, as well as its variance, R, are equal to zero.[16] Finally, the law of motion is given by two equations. Eq. (2.30) represents the first equation, while the second consists of only one element, a random shock. Overall, the transition equations are represented by

    (s_t, ε_t)^⊤ = Φ (s_{t−1}, ε_{t−1})^⊤ + (η_t, ε_t)^⊤,   (η_t, ε_t)^⊤ ∼ N(0, Q),    (2.31)

where

    Φ = [φ, 0; 0, 0]   and   Q = [σ_η², ρσ_η; ρσ_η, 1],

with ρ ≡ corr(η_t, ε_t). This representation captures the correlation coefficient in a tractable way, since standard state-space representations assume that the shocks of the observation and transition equations are uncorrelated.

[16] This is a unique feature of the Gaussian filters; most Monte Carlo filters, such as the particle filter, require that all the variance-covariance matrices involved in the transition and measurement equations be positive definite.
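For reference, a minimal sketch of a simulator for Eqs. (2.29)-(2.31) follows; the default parameter values match the Monte Carlo design used later in this section, and the function name is illustrative.

    import numpy as np

    def simulate_sv(T, phi=0.98, sig_eta=0.1414, sigma=1.0, rho=-0.5, seed=0):
        """Simulate returns y_t and demeaned log-volatilities s_t from
        Eqs. (2.29)-(2.31); (eta_t, eps_t) are jointly normal with
        covariance Q, so corr(eta_t, eps_t) = rho."""
        rng = np.random.default_rng(seed)
        Q = np.array([[sig_eta ** 2, rho * sig_eta],
                      [rho * sig_eta, 1.0]])
        shocks = rng.multivariate_normal(np.zeros(2), Q, size=T)
        s = np.zeros(T)
        y = np.zeros(T)
        s_prev = 0.0  # s_t starts at its unconditional mean of zero
        for t in range(T):
            s[t] = phi * s_prev + shocks[t, 0]                 # Eq. (2.30)
            y[t] = sigma * np.exp(s[t] / 2.0) * shocks[t, 1]   # Eq. (2.29)
            s_prev = s[t]
        return y, s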
Moreover, all the moments involved in the filtering recursions exist and have closed-form expressions, which allows testing the accuracy of the Taylor series for both filtering and parameter estimation.

Lastly, from Eq. (2.31) we learn that the function characterizing the transition equation is linear in the state vector; i.e., g(x_t) ≡ Φ x_t, where

    Φ = [φ, 0; 0, 0].

According to Section 2.3.2 and Eqs. (2.5)-(2.7), the moments that need to be approximated by the Taylor series are

    y_{t+1|t} = E[y_{t+1} | Y_t] = E[σ exp(s_{t+1}/2) ε_{t+1} | Y_t]

and

    P^yy_{t+1|t} = Var[y_{t+1} | Y_t] = E[σ² exp(s_{t+1}) ε_{t+1}² | Y_t] − y_{t+1|t}².

Finally, for the Kalman update, we need to estimate the covariance between the observation and transition equations, given by

    P^xy_{t+1|t} = P_{t+1|t} · E[((σ/2) exp(s_{t+1}/2) ε_{t+1}, σ exp(s_{t+1}/2))^⊤ | Y_t],

where the last equality comes from applying Lemma 2.3.5 and the vector collects the expected partial derivatives of h with respect to s_{t+1} and ε_{t+1}.[17]

[17] Lemma A.8.1 in Appendix A.8 provides closed-form expressions for the expected values, variances and covariances involved in the Kalman filter recursions of the stochastic volatility model.

Improved State Identification

A conventional approach to improving state and parameter identification in the stochastic volatility literature[18] is to model the squared value of the observation equation in addition to the observation equation (2.29).[19] As a result, the stochastic volatility model can be represented with the following vector of observables:

    y_t = (σ exp(s_t/2) ε_t, σ² exp(s_t) ε_t²)^⊤,    (2.32)

with Eq. (2.31) as the transition equation. The nonlinear function is given by h(x_t) ≡ h(s_t, ε_t) = [σ exp(s_t/2) ε_t, σ² exp(s_t) ε_t²]^⊤. A standard approach consists of log-linearizing the second component of Eq. (2.32)[20] and performing state and parameter inference on that version of the model. The main advantage is that the model becomes linear and the standard Kalman filter can be applied. However, the major disadvantage of log-linearizing the squared equation is that the correlation coefficient, ρ, cannot be identified.[21] The advantage of including the squared value of the original observation equation as an observable in Eq. (2.32) is that the information provided by the dynamics of the squared observation can be incorporated directly in the estimation process. Moreover, there is no need for log-linearizations, and any information about the correlation coefficient can be incorporated directly in the estimation process.

[18] See Koopman and Sandmann (1998) and Broto and Ruiz (2004) for details.
[19] I use the square of the observation equation as a new observable. However, the cubed or the fourth power could be included as well.
[20] See Ruiz (1994) for details.
[21] Harvey and Shephard (1996) show that the information about the correlation can be recovered from the signs of the innovations.

The first and second moments of Eq. (2.32) can be calculated using Propositions 2.3.3 and 2.3.4, as well as the covariance obtained from Stein's lemma. Finally, I test the accuracy of the Taylor series approximations by comparing the filtered series of different approximation orders with the one obtained from the closed-form expressions of the moments.[22]

[22] See Appendix A.8 for details.
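The augmented measurement function of Eq. (2.32) is simple to write down explicitly; a minimal sketch follows, with an illustrative function name.

    import numpy as np

    def h_augmented(x, sigma=1.0):
        """Observation vector of Eq. (2.32): the return and its square.
        x = (s_t, eps_t). Stacking y_t^2 injects the information carried
        by squared returns without log-linearizing the model."""
        s, eps = x
        y = sigma * np.exp(s / 2.0) * eps
        return np.array([y, y ** 2])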
Monte Carlo Simulation Results

In this section I conduct a Monte Carlo study to test the accuracy of the filters with Taylor approximations for both filtering and parameter inference. For the pure filtering exercise, I first simulate a time series of 500 observations of returns (y_t) and log-volatilities (s_t) according to Eqs. (2.29) and (2.31), assuming parameter values φ = 0.98, σ_η = 0.1414, σ = 1 and ρ = −0.5. These parameter values have been used in Broto and Ruiz (2004), as well as in empirical applications with daily returns.

For each simulated series, I calculate the filtered values of the level, y_t, as well as of the log-volatility, s_t, with the filtering techniques based on Taylor approximations up to a twelfth approximation order, using the true parameter values. I denote by TKF-M the filter constructed with an M-th order Taylor series approximation, M = 2, 3, ..., 12. For comparison purposes, I also filter the same simulated series with the EKF, the UKF,[23] the particle filter with 1000 particles and the filter that uses the closed-form expressions of the moments. I refer to this last filter as the Gaussian filter.[24] Finally, I calculate the mean squared error (MSE) between each simulated series and its filtered counterpart. The experiment is repeated 500 times with a random re-initialization for each run.

[23] The UKF parameters were set to α = 1, β = 0 and κ = 2, which are optimal for the scalar case.
[24] Ito and Xiong (2000) propose the name Gaussian filter for filtering methods that use Gaussian densities to approximate the probability density functions involved in the filtering recursions in Eq. (2.8).

Table 2.1 reports means and standard deviations of the MSE of the log-volatility process as well as of the MSE of the level of the series. For this model specification, the minimum MSE for both the log-volatility and the observable is obtained, on average, with the particle filter, followed by the filters with Taylor approximations of fourth and fifth order. This result is not surprising, since the particle filter provides unbiased estimates of both the unobserved states and their joint densities. It is worth noting that the MSE statistics obtained with the filters of order eight and higher converge to those of the Gaussian filter. Although the UKF is commonly known as a second-order filter, my simulation results show that the MSE of the second-order Taylor filter is slightly smaller than that of the UKF.

As noted from Table 2.1, the particle filter provides the most accurate state estimates. To improve the state estimation, I filter the simulated series including the squared value of the first observation equation as a second observation equation, using the true parameter values. The results are shown in Table 2.2. In this case, the filters of order eight or higher provide more accurate state estimates than the particle filter, on average. However, there is not much improvement in the MSE of the observable, y_t, which is only slightly smaller than the one calculated from the model specification with one equation.

Figures 2.1a-2.1d compare the state estimates of the log-volatility process generated from a single run using different filters. Although the tests are conducted for all orders of approximation, I report only the eighth order. Figure 2.1a compares the true state with the state estimate and a 95% confidence interval. The true state is represented by a dot, the mean of the filter obtained with Taylor series is given as a dashed line, and the solid lines give the 95% confidence interval, constructed as the interval between the 2.5 and 97.5 percentile points.
The actual value of the state is within the 95% probability region on approximately 94% of occasions. Figure 2.1b compares the true state with the filter that applies two observation equations. In this case, the actual state is usually very close to the state estimate; on a couple of occasions the actual state falls just outside these percentile estimates, and the performance is clearly superior to that of the filter that uses one equation only.

Figure 2.1c compares the actual states with the filtered estimates using the true parameter values. Clearly, the filter that uses two observation equations outperforms the filter that uses only one. Finally, Figure 2.1d shows the filtered series along with the results of the particle filter. Both filtered series are very close to the actual state. However, the CPU time used by the particle filter is more than ten times that needed to estimate the series with Taylor approximations. To test the efficiency of the filtering algorithm, I recorded the CPU time needed to estimate the log-volatility filtered series for each order of approximation, along with the EKF, the UKF and the particle filter. The results are displayed in Table 2.3. Model 1 denotes the model with one observation equation and Model 2 the model with two observation equations. In this case, the filtered series based on Taylor approximations are calculated at least four times faster than those obtained with the particle filter. The filters that use the exact moments take on average 0.03 and 0.04 seconds to calculate the filtered series, compared with the 17.87 seconds the particle filter takes to estimate the same series. The improvement in CPU time, together with the precision of the filtered estimates, is a notable strength of the filters based on Taylor series relative to standard particle filters.

As with most nonlinear models, it is difficult, if not impossible, to prove that the parameters of a state-space model are uniquely identified. To analyze the uniqueness of the QML estimates, I implement the following procedure. For a fixed vector of parameter values, a path of noisy returns is simulated and a quasi-likelihood function is constructed from the simulated path. An initial identification exercise is performed by evaluating the quasi-likelihood function of a fifth order of approximation at the set of parameters used for the simulation and varying each of the parameters φ, σ_η, σ and ρ independently. These plots are known as quasi-likelihood contours and are shown in Figure 2.2. The dashed lines represent the parameter values used to simulate the data. If a parameter is well identified, the quasi-likelihood function should achieve its maximum at a parameter value close to the one used to simulate the data. For comparison, I show the quasi-likelihood functions of the EKF, the UKF and the Gaussian filter. The concavity of the quasi-likelihood function with respect to each parameter is evidence that the parameters are well identified. For this specific exercise, φ is well identified by all the filters. However, σ_η, σ and ρ are identified properly only by the fifth-order and Gaussian filters. For these parameters, the quasi-likelihood functions of the EKF and UKF yield biased parameter values, since their maxima are far from the true values. Moreover, Figure 2.2c shows a constant quasi-likelihood function for σ, showing that the EKF is not able to identify it.
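The contour exercise itself is mechanical and can be sketched in a few lines; here `quasi_loglik_fn` stands in for whichever quasi-likelihood evaluator is used (an assumption, not a routine defined in the thesis).

    import numpy as np

    def likelihood_contour(quasi_loglik_fn, theta_true, index, grid):
        """Evaluate the quasi log-likelihood along one parameter, holding
        the remaining parameters at the values used to simulate the data."""
        values = []
        for v in grid:
            theta = np.array(theta_true, dtype=float)
            theta[index] = v   # vary only the parameter being inspected
            values.append(quasi_loglik_fn(theta))
        return np.array(values)

Plotting the returned values against the grid reproduces a contour of the kind shown in Figure 2.2: a concave shape peaking near the true value indicates identification.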
A second way to analyze the finite-sample properties of the QML estimator is via Monte Carlo simulation. In particular, I estimate the model on 250 independent samples of T = 500 using QML estimation methods. For each simulation, I construct QML functions of orders three, five, seven and nine. I optimize each function numerically and obtain parameter estimates. As a starting point for the optimization, I choose the true parameter values used to simulate the data.[25] I conduct the same exercise with the EKF, the UKF and the Gaussian filter and present the results in Table 2.4. The table shows the true parameter values in the first row, as well as the sample mean and standard deviation (in parentheses) of the corresponding parameter estimates. The average estimates are all close to the true parameter values, suggesting that the QML estimates via Taylor series are relatively unbiased, while the average estimates of ρ under the EKF, the UKF and the third-order approximation are biased. Moreover, the standard deviations of the QML estimates of the ninth-order approximation are high and coincide with the standard errors of the third-order approximation. This may be caused by small-sample bias as well as by the effect of numerical errors.

[25] This point is used merely for convenience. As a robustness check, I conducted a similar estimation exercise using random vectors as starting points for the numerical optimization procedure. In all cases, the vector of parameter values is fully identified and the average estimates are similar to the ones obtained with the true parameter values as the initial point.

Consumption Growth Model: Estimation

From the Monte Carlo simulation exercises, we learned that the filter with Taylor approximations is a good approach for parameter estimation of the standard stochastic volatility model. However, the results may differ in historical data, in which there is substantial evidence of stochastic volatility. As a robustness check, I perform a parameter estimation exercise with real consumption growth data and estimate a stochastic volatility model with different approximation orders.

A number of models, including Tallarini (2000) and Barillas, Hansen, and Sargent (2009), assume that log-consumption growth, denoted by Δln(C_{t+1}), follows a random walk with drift μ_c and standard deviation σ:

    Δln(C_{t+1}) − μ_c = σ ε_{t+1},   ε_{t+1} ∼ N(0, 1).

However, there is substantial evidence of time variation in the conditional standard deviation of many macroeconomic series (Bloom, Floetotto, Jaimovich, Saporta-Eksten, and Terry, 2012; Clark, 2009; Fernández-Villaverde and Rubio-Ramírez, 2007; Justiniano and Primiceri, 2008; McConnell and Perez-Quiros, 2000; Stock and Watson, 2002).

As a result, Bidder and Smith (2011) propose an alternative endowment process that features stochastic volatility in log-consumption growth and is consistent with (2.29). The model is represented as follows:

    Δln(C_{t+1}) − μ_c = σ exp(s_{t+1}/2) ε_{t+1},   ε_t ∼ N(0, 1),
    s_{t+1} = φ s_t + η_{t+1},   η_{t+1} ∼ N(0, σ_η²),

where s_t represents the log-volatility process.

I analyze the performance of the filtering technique by implementing the stochastic volatility model in (2.29) and (2.30) with consumption growth data.
I use the monthly series from the Federal Reserve Bank of Philadelphia to construct real consumption per capita from January 1959 to March 2012. I construct the monthly log-consumption growth data using the real-time data on real personal consumption expenditures in nondurables and services from the Real-Time Data Set for Macroeconomists of the Federal Reserve Bank of Philadelphia. This real-time data set of macroeconomic variables was created to update and verify the accuracy of forecasting models of macro variables, and it provides snapshots of the macroeconomic data available at any given date in the past.[26] Summary statistics of the time series of monthly consumption growth are shown in Table 2.5.

[26] See Croushore and Stark (2001) for details.

Quasi-likelihood parameter estimation is applied to the data with the filters of orders three, five, seven and nine, along with the EKF, the UKF and the Gaussian filter. The starting point for the numerical optimization is chosen as follows. First, I simulate 100 random vectors from the parameter space. Each simulated vector is used as an initial point for the numerical maximization of the quasi-likelihood function constructed with the monthly consumption growth data. The initial point used to estimate the parameter values is then the average of the parameter estimates obtained from this procedure.

The results are shown in Table 2.6. In this case, all the parameters are identified. The parameter estimates using quasi-likelihood functions of order three or higher have similar values; however, the adjusted standard errors change with the order of approximation, mainly because of numerical errors in the second derivative of the quasi-likelihood function. In general, the magnitude of the standard errors decreases as the order of approximation increases. Overall, the parameter estimates are consistent with Bansal and Yaron (2004), Bidder and Smith (2011) and Ludvigson (2012), with a slightly lower growth rate and higher variance, most likely due to the longer data series, which includes the recession starting in the last quarter of 2007.

2.5.2 Risk and Return Model

Brandt and Kang (2004) introduce a nonlinear representation for the return dynamics that allows for a positive risk premium in the context of a latent vector autoregressive (VAR) process. Let y_t be the continuously compounded excess return, with time-series dynamics represented by

    y_t = μ_{t−1} + σ_{t−1} ε_t,   ε_t ∼ N(0, 1),    (2.33)

where μ_{t−1} and σ_{t−1} represent the conditional mean and the conditional volatility of the excess returns. It is assumed that the conditional mean and volatility are unobservable and follow a first-order VAR process in logs:

    (ln μ_t, ln σ_t)^⊤ = d + A (ln μ_{t−1}, ln σ_{t−1})^⊤ + η_t,   η_t ∼ N(0, Σ),    (2.34)

where

    d = (d_1, d_2)^⊤,   A = [a_11, a_12; a_21, a_22]   and   Σ = [b_11, b_12; b_21, b_22],   with b_12 = b_21 = ρ√(b_11 b_22).    (2.35)

The first equation of the VAR in (2.34) describes the dynamics of the logarithm of the conditional mean; it captures the permanent and temporary components as in Fama and French (1988b) and Lamoureux and Zhou (1996), in which stock prices are governed by a random walk and a stationary random process, respectively. The second equation of the VAR describes the dynamics of the logarithm of the conditional volatility and nests the standard stochastic volatility model.
Setting a_21 = 0 yields the stochastic volatility model estimated by Andersen and Sørensen (1996), Kim, Shephard, and Chib (1998), Jacquier, Polson, and Rossi (2004) and Jacquier, Johannes, and Polson (2007). The latent VAR approach in Eqs. (2.34)-(2.35) allows us to study the contemporaneous and intertemporal relationships between expected returns and risk without relying on predictors. The contemporaneous relationship between the conditional mean and volatility is captured by the correlation coefficient ρ, while the intertemporal relationships between expected returns and volatilities are captured by the coefficient matrix A.

Following Hamilton (1994), if the VAR is stationary, the unconditional moments of the mean and volatility are given by

    E[(ln μ_t, ln σ_t)^⊤] = (I − A)^{−1} d

and

    vec(cov[(ln μ_t, ln σ_t)^⊤]) = (I − (A ⊗ A))^{−1} vec(Σ),

where ⊗ denotes the Kronecker product.

The return dynamics in Eq. (2.34) have two key elements: the transition matrix A and the correlation coefficient ρ. The diagonal elements of A capture the persistence of the conditional moments, and the off-diagonal elements reflect the intertemporal feedback between the conditional volatility and the conditional mean. A general correlation structure can be incorporated by allowing the conditional mean and volatility to be correlated with the return innovations, denoted by Corr[ε_t, η_t] = [ρ_μ, ρ_σ]^⊤. Different models can be specified by setting these correlations to zero. The assumptions about the correlation between the return innovations and the conditional moment innovations serve to capture, and potentially distinguish, two popular explanations of asymmetric volatility. Asymmetric volatility refers to the empirical finding that increases in volatility tend to be associated more often with large negative returns than with equally large positive returns. The two popular explanations are the leverage effect and the volatility feedback effect. The leverage effect states that when the value of a firm drops after a large negative return, the leverage of the firm and the associated probability of bankruptcy increase, causing the equity claims to become riskier. The volatility feedback effect attributes asymmetric volatility to the equilibrium response of the conditional mean to changes in volatility.
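The unconditional moments above are straightforward to compute numerically; a minimal sketch, assuming only the Hamilton (1994) formulas just stated, follows.

    import numpy as np

    def var1_unconditional(d, A, Sigma):
        """Unconditional mean and covariance of the VAR(1) in Eq. (2.34):
        mu = (I - A)^{-1} d and vec(V) = (I - A kron A)^{-1} vec(Sigma)."""
        n = A.shape[0]
        mu = np.linalg.solve(np.eye(n) - A, d)
        vecV = np.linalg.solve(np.eye(n ** 2) - np.kron(A, A),
                               Sigma.reshape(-1))
        return mu, vecV.reshape(n, n)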
State-Space Representation and Implementation

The representation in Eqs. (2.33) and (2.34) defines a state-space model in which the first equation is the nonlinear measurement equation and the second is a linear transition equation. To make inferences about expected returns, volatilities and the parameters of the VAR based on observed returns, a nonlinear filtering problem needs to be solved. The solution generates estimates of expected log-returns and log-volatilities, E[ln μ_t, ln σ_t | y_1, ..., y_t], as well as variances, Var[ln μ_t, ln σ_t | y_1, ..., y_t]. As in the stochastic volatility model, parameter inference can be performed with QML methods.

A simpler representation of the state-space model is obtained by redefining the state variables in demeaned terms; that is, m_t = ln μ_t − ln μ̄ and v_t = ln σ_t − ln σ̄, so that μ_t = μ̄ exp(m_t) and σ_t = σ̄ exp(v_t), where μ̄ = exp(E[ln μ_t]) and σ̄ = exp(E[ln σ_t]). Finally, by rewriting Eqs. (2.33) and (2.34) in terms of the new state variables, the state-space becomes standard, in the sense that all the state variables are observed at time t. Let x_t = [x_1t, x_2t, x_3t, x_4t, x_5t]^⊤ = [m_{t−1}, v_{t−1}, ε_t, m_t, v_t]^⊤; then Eqs. (2.33) and (2.34) can be rewritten as

    y_t = μ̄ exp(x_1t) + σ̄ exp(x_2t) x_3t    (2.36)

and

    x_t = Ã x_{t−1} + Γ w_t,   w_t ∼ N(0, Σ_w),    (2.37)

where

    Ã = [0, 0, 0, 1, 0;
         0, 0, 0, 0, 1;
         0, 0, 0, 0, 0;
         0, 0, 0, a_11, a_12;
         0, 0, 0, a_21, a_22],

    Γ = [0, 0, 0;
         0, 0, 0;
         1, 0, 0;
         0, 1, 0;
         0, 0, 1]

and

    Σ_w = [1, ρ_μ√b_11, ρ_σ√b_22;
           ρ_μ√b_11, b_11, ρ√(b_11 b_22);
           ρ_σ√b_22, ρ√(b_11 b_22), b_22],

where Corr[ε_t, η_t] = [ρ_μ, ρ_σ]^⊤. Clearly, h(x_t) = μ̄ exp(x_1t) + σ̄ exp(x_2t) x_3t and, as in the stochastic volatility model, the random noise of the observation equation, v_t, is identically zero, so the variance of the observation equation is zero (R ≡ 0).[27] The transition equation is linear; i.e., g(x_t) = Ã x_t.

[27] The main purpose of this representation is to allow for a general correlation structure between the shocks of the observation and transition equations, which in standard state-space representations are assumed to be zero.

The filtering and parameter estimation problem can be solved using the results of Section 2.3.2. The first and second moments that need to be calculated are

    y_{t+1|t} = E[y_{t+1} | Y_t] = μ̄ E[exp(x_{1,t+1}) | Y_t] + σ̄ E[exp(x_{2,t+1}) x_{3,t+1} | Y_t]    (2.38)

and

    P^yy_{t+1|t} = Var[y_{t+1} | Y_t] = E[y_{t+1}² | Y_t] − y_{t+1|t}².    (2.39)

The covariance term involved in the Kalman gain is calculated, by applying Lemma 2.3.5, as

    P^xy_{t+1|t} = (cov[x_1t, y_{t+1} | Y_t], cov[x_2t, y_{t+1} | Y_t], cov[x_3t, y_{t+1} | Y_t])^⊤
                 = P_{t+1|t} · E[(μ̄ exp(x_1t), σ̄ exp(x_2t) x_3t, σ̄ exp(x_2t))^⊤ | Y_t].    (2.40)

Closed-form expressions for the moments in Eqs. (2.38)-(2.40) can be obtained from Proposition A.8.1 in Appendix A.8.

Improved State Identification

As in the stochastic volatility example, and following Brandt and Kang (2004), state identification can be improved by adding observation equations to the filtering problem. One alternative is to include the squared value of Eq. (2.36), so that the vector of observables becomes

    y_t = (μ̄ exp(x_1t) + σ̄ exp(x_2t) x_3t, (μ̄ exp(x_1t) + σ̄ exp(x_2t) x_3t)²)^⊤    (2.41)

with Eq. (2.37) as the transition equation. Other equations that can be included are a third observation equation given by the cube of Eq. (2.36), or the product of the current observed return with its lagged value. By including an extra observation equation in Eq. (2.41), we need to estimate its mean and all its covariances to obtain estimates of y_{t+1|t}, P^yy_{t+1|t} and P^xy_{t+1|t}, which can be done via the Taylor series.
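A minimal sketch of the measurement function in Eqs. (2.36) and (2.41) follows; the function name and the stacking flag are illustrative.

    import numpy as np

    def h_risk_return(x, mu_bar, sig_bar, add_square=False):
        """Measurement function of Eq. (2.36); optionally stack its square
        as a second observable, as in Eq. (2.41).
        x = (m_{t-1}, v_{t-1}, eps_t, m_t, v_t)."""
        y = mu_bar * np.exp(x[0]) + sig_bar * np.exp(x[1]) * x[2]
        return np.array([y, y ** 2]) if add_square else np.array([y])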
The average correlation coefficient between the true states and the filtered series isabove 90%. As a robustness check, I include the filtered series from a model that has the cubedobservation equation. As shown in Figures 2.3a and 2.3b, the difference between the model withtwo observation equations and the model with three observation equations is indistinguishable.To test the accuracy of the filter with respect to the particle filter, based on a simulatedseries, I estimate the filtered series of the model with two observation equations using the trueparameter values and contrast the estimates with the ones obtained with the particle filter. Asshown in Figures 2.3a and 2.3b, both methods provide similar values for the filtered log-expectedreturns and log-volatilities.To get a sense of the accuracy of the Taylor approximations, I simulated a random path ofT = 792 using the parameter estimates by Brandt and Kang (2004). Using this simulation, Icalculated the value of the quasi-log likelihood function of the model in Eqs. (2.36) and (2.37)in the true vector of parameter values for up to fifteen orders of approximation. For comparisonpurposes, I include the quasi-likelihood function obtained with the UKF and the one constructed30with the exact moments. Figure 2.4 shows the results. The figure contains in the x-axis thedifferent orders of approximation, while the y-axis contains the values of the quasi-likelihoodfunction. In this case, a Taylor series approximation of order six, is necessary to get an estimateof the likelihood function close enough to the one estimated with the exact moments.As with the stochastic volatility model, it is a challenge to prove that the parameters of themodel are well identified. As an identification exercise and assuming ?? = ?? = 0, I simulate asample path of stock returns with T = 5000 using the parameter estimates from Brandt andKang (2004). The quasi-likelihood function is constructed with a degree of approximation ofM = 6. The function is evaluated numerically at the true parameter values, by varying oneparameter at a time and keeping the remaining fixed. The plot of the parameter value versusthe likelihood function is known as the likelihood contour. If the parameter is well identified,then the plot will have a concave shape and the maximum is achieved at the true parametervalue. If the shape of this function is constant or a straight line, then there is evidence ofmisidentification. The results are shown in Figure 2.5; the dashed lines represent the unknownparameter values used to generate the data. The concavity of the quasi-likelihood contoursis evidence that all the parameters are well identified and the maxima are achieved at valuesclose enough to the ones used to simulate the data. It is worth mentioning that the numberof observations, T , is important. By using a smaller number of observations, the correlationcoefficients may not be identified.To provide more evidence of the accuracy of QML estimates, I study four different assumptionsabout the correlation structure between return innovations and the conditional mean andvolatility innovations. Model A assumes that these innovations are uncorrelated; in Models Band C, the return innovations are allowed to be correlated either with the conditional mean orthe conditional volatility innovations, respectively; and lastly, Model D is unrestricted. 
For each model, I simulate 500 independent samples of T = 792 monthly returns using the parameter values obtained by Brandt and Kang (2004), and I obtain the QML parameter estimates for different orders of approximation.[28] I report the results only for M = 6, since Figure 2.4 shows that an order of M = 6 is necessary to obtain an accurate likelihood.

[28] As in the stochastic volatility model, the starting point for the estimation exercise was the true vector of parameter values. As a robustness check, I estimated the models numerically using random vectors as starting points and obtained similar results.

Table 2.7 presents the results for the four models. For each model, I show the parameter values used to simulate the data as well as the sample mean and standard deviation of the QML estimates. Overall, the parameter estimates show evidence of consistency, although the small number of simulations does not provide an accurate assessment of asymptotic unbiasedness. In general, the standard deviations for b_11, b_22, μ̄ and σ̄ are relatively small compared to those of the other parameters. The correlation coefficients, ρ and ρ_μ, are in general not identified.[29] A common approach to correcting this issue is to add another observation equation, such as the squared value of returns, or to include a set of predictors in the dynamics of the mean and volatility of returns (Brandt and Kang, 2004; Lundblad, 2007).

[29] The lack of identification of correlation coefficients is a common problem in standard filtering applications; see Hamilton (1994) for details.

Market Excess Returns: Estimation

I study monthly returns on the value-weighted CRSP index in excess of the one-month Treasury bill rate from January 1946 through December 2011 (792 observations). The short rate is the yield on a one-month Treasury bill. Table 2.8 presents summary statistics of the data, and Figure 2.7 plots the time series of the market portfolio (top) and the short rate (bottom).

Parameter Estimates

The model in Eqs. (2.36)-(2.37) is estimated with QML methods, using the parameter estimates of Brandt and Kang (2004) as the starting point of the numerical optimization. Table 2.9 displays the results for the four models. The contemporaneous correlation between the conditional mean and volatility, ρ, is negative and statistically significant in all the models. As a result, I strongly reject the hypothesis of no contemporaneous relationship between the conditional mean and the conditional volatility. In addition, the correlations between the return innovations and the mean and volatility innovations (ρ_μ and ρ_σ) are negative and significant. These results are consistent with French, Schwert, and Stambaugh (1987), Campbell and Hentschel (1992a) and Brandt and Kang (2004). The parameter estimates for the transition matrix (a_11, a_12, a_21, a_22) are robust across the four specifications. However, the estimates of the standard deviations of the conditional mean and volatility (b_11 and b_22) differ between the specifications of Models A and B and those of Models C and D.

2.5.3 A Dynamic Stochastic General Equilibrium Model

In this section I show how filtering with Taylor series facilitates quasi-likelihood-based inference in dynamic equilibrium models. I describe how to use the filter with Taylor series to estimate the structural parameters of the model, those characterizing preferences and technology, based on macroeconomic variables measured with noise. Flury and Shephard (2011) suggest that particle filters are the only feasible approach to estimating parameters of dynamic stochastic general equilibrium (DSGE) models.
I show that filtering techniques based on Taylor series are an alternative approach for state and parameter estimation in a DSGE model that does not rely on Monte Carlo filters. I illustrate the technique with a very simple real business cycle model.

Likelihood-based inference is a useful tool for taking dynamic equilibrium models to the data (An and Schorfheide, 2007). However, most dynamic equilibrium models do not imply a likelihood function that can be evaluated analytically or numerically. To circumvent this problem, the literature has used the approximated likelihood derived from a linearized version of the model instead of the exact likelihood. But linearization depends on the solution of the model being accurately approximated by a linear relation, and this assumption is arguable. The impact of linearization is more problematic than it appears: Fernández-Villaverde, Rubio-Ramírez, and Santos (2006) prove that second-order approximation errors in the solution of the model have first-order effects on the likelihood function. Moreover, the error in the approximated likelihood gets compounded with the size of the sample: period by period, small errors in the policy function accumulate at the same rate at which the sample size grows. Therefore, the likelihood implied by the linearized model diverges from the likelihood implied by the exact model. Fernández-Villaverde and Rubio-Ramírez (2007) document that these insights are quantitatively relevant in real-life applications.

Filters based on Taylor approximations are an alternative approach for filtering and parameter estimation of DSGE models, as they integrate higher-order approximations of the solutions of DSGE models, such as perturbations, with QML methods. In this section I introduce a simple model that illustrates the method. Based on Monte Carlo simulations, I illustrate the accuracy of the filter for both state and parameter estimation.

The Model

In this setup, it is assumed that there is a representative household maximizing its lifetime utility, given by

    E_0 [ Σ_{t=0}^∞ β^t C_t^{1−γ}/(1−γ) ],   β ∈ (0, 1),  γ > 0,    (2.42)

where C_t is consumption at time t, β is the subjective discount factor and γ is the risk aversion parameter.

In this economy, there is a production sector in which the date-t output flow, Y_t, is related to the date-t level of the capital stock, K_t, via

    Y_t = A_t K_t^α,    (2.43)

where A_t is an exogenous technology shock given by a first-order autoregressive process; that is,

    ln A_{t+1} = ρ ln A_t + ε_{t+1},   ε_t ∼ N(0, σ_A²).    (2.44)

The capital stock at date t+1 is related to the date-t investment flow, I_t, and the capital depreciation rate, δ, via the standard capital accumulation equation,

    K_{t+1} = (1 − δ) K_t + I_t.

The aggregate resource constraint is

    Y_t = C_t + I_t.

The central planner's problem consists of choosing C_t and K_{t+1} to maximize expected utility:

    max_{{C_t, K_{t+1}}_{t=0}^∞} E_0 [ Σ_{t=0}^∞ β^t C_t^{1−γ}/(1−γ) ],   β ∈ (0, 1),  γ > 0,    (2.45)

subject to

    K_{t+1} + C_t ≤ A_t K_t^α + (1 − δ) K_t

and (2.44), for t = 0, 1, ...; K_0, A_0 given.

Characterizing the Solution

The first-order conditions implied by (2.43), (2.44) and (2.45) yield the following Euler equation,

    C_t^{−γ} = β E_t [ C_{t+1}^{−γ} (1 − δ + α A_{t+1} K_{t+1}^{α−1}) ],    (2.46)

while

    K_{t+1} = A_t K_t^α + (1 − δ) K_t − C_t,    (2.47)
    ln A_{t+1} = ρ ln A_t + ε_{t+1},    (2.48)

are the constraints implied by the model.
Equations (2.46)-(2.48) fully characterize the solution of the optimization problem faced by the central planner.

The solution to the system in (2.46) consists of finding policy functions π and φ such that

    C_t = π(K_t, A_t, σ),
    (K_{t+1}, ln A_{t+1})^⊤ = (φ(K_t, A_t, σ), ρ ln A_t)^⊤ + σ (0, σ_A)^⊤ ε_{t+1},

where σ is a perturbation parameter. As σ → 0, the dynamic system in (2.46)-(2.48) converges to a point known as the non-stochastic steady state. However, the system of functional equations implied by the equilibrium conditions does not, in general, have an analytic solution. An approximate solution for the policy functions can be obtained via perturbation methods.[30] The main objective of these methods is to estimate the values of the derivatives of π and φ at the non-stochastic steady state. For analytical convenience, the system is written in terms of log-deviations from the non-stochastic steady state. Let

    ĉ_t = ln(C_t/C_ss),   k̂_t = ln(K_t/K_ss),   â_t = ln(A_t),

where C_ss and K_ss are the non-stochastic steady-state values of C_t and K_t, given by

    C_ss = (K_ss)^α − δ K_ss,    (2.49)
    K_ss = [αβ / (1 − β(1 − δ))]^{1/(1−α)}.    (2.50)

[30] See Judd (1998) and DeJong and Dave (2011) for a detailed explanation of perturbation methods in economics.

The general idea of perturbation methods is to provide a Taylor expansion of the policy functions that characterize the equilibrium of the economy in terms of the state variables of the model and a perturbation parameter, σ. In this case, I construct second-order approximations for k̂_{t+1} and ĉ_t of the form

    k̂_{t+1} = φ_k k̂_t + φ_a â_t + (1/2)(φ_kk k̂_t² + 2 φ_ak â_t k̂_t + φ_aa â_t²) + (1/2) φ_σσ σ²,    (2.51)
    ĉ_t = π_k k̂_t + π_a â_t + (1/2)(π_kk k̂_t² + 2 π_ak â_t k̂_t + π_aa â_t²) + (1/2) π_σσ σ²,

and

    â_{t+1} = ρ â_t + ε_{t+1},    (2.52)

where φ_k, φ_a, φ_kk, φ_ak, φ_aa, φ_σσ, π_k, π_a, π_kk, π_ak, π_aa and π_σσ denote the first- and second-order derivatives of the functions φ and π, respectively.[31] These derivatives are calculated numerically as in Schmitt-Grohe and Uribe (2004).[32]

[31] For simplicity, I apply second-order approximations to solve the model. However, the following results can be generalized to perturbations of any order.
[32] I thank Stephanie Schmitt-Grohe and Martin Uribe for making their code available.
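As a small illustration of the mapping from structural parameters into the approximation above, the following sketch computes the non-stochastic steady state of Eqs. (2.49)-(2.50); the function name is illustrative.

    def steady_state(alpha, beta, delta):
        """Non-stochastic steady state of the RBC model, Eqs. (2.49)-(2.50)."""
        Kss = (alpha * beta / (1.0 - beta * (1.0 - delta))) ** (1.0 / (1.0 - alpha))
        Css = Kss ** alpha - delta * Kss
        return Css, Kss

For example, steady_state(0.3, 0.99, 0.025) returns the consumption and capital levels around which the second-order expansion in (2.51) is taken (the parameter values here are merely illustrative inputs).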
State-Space Representation and Implementation

Following Flury and Shephard (2011), I assume that the econometrician observes only the detrended real gross domestic product per capita, GDP̂_t,

    GDP̂_t = ŷ_t + v_{y,t},   v_{y,t} ∼ N(0, σ_y²),    (2.53)

where v_{y,t} represents the measurement error. From Eq. (2.43), the log-GDP implied by the model is

    ŷ_t = α k̂_t + â_t.

Additionally, I assume that k̂_t and â_t are the unobservable state variables and that the econometrician wants to make inference about the state variables as well as the parameters based solely on the available GDP̂_t observations. Consequently, a nonlinear filtering problem has to be solved. In this case, Eqs. (2.51)-(2.53) define a state-space model of the form

    GDP̂_t = α k̂_t + â_t + v_{y,t},   v_{y,t} ∼ N(0, σ_y²),    (2.54)
    k̂_{t+1} = φ_k k̂_t + φ_a â_t + (1/2)(φ_kk k̂_t² + 2 φ_ak â_t k̂_t + φ_aa â_t²) + (1/2) φ_σσ σ²,
    â_{t+1} = ρ â_t + ε_{t+1},   ε_t ∼ N(0, σ_A²).

The state variables of the filtering problem are represented by the vector x_t = [k̂_t, â_t]^⊤. The observation equation is linear; i.e., h(x_t) ≡ h(k̂_t, â_t) = α k̂_t + â_t, with random noise v_{y,t} of variance R ≡ σ_y². The transition equation is characterized by the nonlinear mapping g(x_t), represented by the vector

    g(x_t) ≡ g(k̂_t, â_t) = (φ_k k̂_t + φ_a â_t + (1/2)(φ_kk k̂_t² + 2 φ_ak â_t k̂_t + φ_aa â_t²) + (1/2) φ_σσ σ², ρ â_t)^⊤.    (2.55)

The first component of the vector in Eq. (2.55) is a quadratic function of the state variables, while the second component of g(x_t) is a linear function of â_t only. Finally, the variance that characterizes the shock of the transition equation is defined by σ_A²; i.e., Q ≡ σ_A².

As the observation and transition equations of the model are polynomials, their expected values coincide with the expected values calculated from the Taylor series expansions, as long as the order of the polynomial is smaller than the order of the Taylor approximation. According to Section 2.3.2 and Eqs. (2.5)-(2.7), the mean vector of the state variables,

    x_{t+1|t} = (E[k̂_{t+1} | Y_t], E[â_{t+1} | Y_t])^⊤,    (2.56)

is computed by applying second-order Taylor series approximations to Eq. (2.55). Since the transition equation is quadratic, its variance involves fourth-order polynomials; as a result, the variance of the transition equation is calculated exactly with a fourth-order Taylor series as

    P_{t+1|t} = [Var[k̂_{t+1} | Y_t], cov[k̂_{t+1}, â_{t+1} | Y_t]; cov[k̂_{t+1}, â_{t+1} | Y_t], Var[â_{t+1} | Y_t] + σ_A²],    (2.57)

where each component can be calculated using the results of Section 2.3.2. The observation equation is linear in the state variables; therefore, Eqs. (2.5)-(2.6) are given by

    y_{t+1|t} = E[y_{t+1} | Y_t] = [α, 1] · x_{t+1|t}    (2.58)

and

    P^yy_{t+1|t} = [α, 1] · P_{t+1|t} · [α, 1]^⊤ + σ_y².    (2.59)

From Lemma 2.3.5, the covariance between the observation and transition equations is

    P^xy_{t+1|t} = P_{t+1|t} · [α, 1]^⊤.    (2.60)

Improved State and Parameter Identification

For inference purposes, a second observation equation can be included. A natural observable to include is consumption per capita, which is measured with noise,

    Ĉ_t = ĉ_t + v_{c,t},   v_{c,t} ∼ N(0, σ_c²),    (2.61)

where

    ĉ_t = π_k k̂_t + π_a â_t + (1/2)(π_kk k̂_t² + 2 π_ak â_t k̂_t + π_aa â_t²) + (1/2) π_σσ σ².

In this case, the state-space model becomes

    GDP̂_t = α k̂_t + â_t + v_{y,t},   v_{y,t} ∼ N(0, σ_y²),    (2.62)
    Ĉ_t = π_k k̂_t + π_a â_t + (1/2)(π_kk k̂_t² + 2 π_ak â_t k̂_t + π_aa â_t²) + (1/2) π_σσ σ² + v_{c,t},   v_{c,t} ∼ N(0, σ_c²),
    k̂_{t+1} = φ_k k̂_t + φ_a â_t + (1/2)(φ_kk k̂_t² + 2 φ_ak â_t k̂_t + φ_aa â_t²) + (1/2) φ_σσ σ²,
    â_{t+1} = ρ â_t + ε_{t+1},   ε_t ∼ N(0, σ_A²).

Clearly, the state-space in Eq. (2.62) is nonlinear in both the state and observation equations, and the filters with Taylor expansions can be applied for both state and parameter estimation. For parameter estimation, the second observation equation provides more information and achieves better identification, since the parameters β and γ affect the consumption decision explicitly and can be inferred from the new observable.

Estimation

For parameter estimation, the task of the econometrician is to construct a quasi-likelihood function and carry out inference on θ ≡ (α, β, γ, δ, ρ, σ_A, σ_y, σ_c). The construction of the quasi-likelihood requires slightly more numerical work than in the previous exercises. The quasi-likelihood function L(θ) is constructed as follows. Given a vector of parameter values θ, I calculate the non-stochastic steady state of the system, C_ss and K_ss, from Eqs. (2.49)-(2.50). Given these values, I solve the model in (2.51) using perturbation methods and obtain the values of the derivatives of the policy functions φ and π. Finally, I calculate the moments (2.56)-(2.60) and obtain the value of the quasi-likelihood function at θ using Algorithm (2.3.6).
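The three-step construction of L(θ) just described can be sketched as follows; the helper names `solve_policy` (the perturbation step) and `filter_pass` (the Taylor-filter recursions) are placeholders for routines described in the text, not functions defined in the thesis.

    import numpy as np

    def dsge_quasi_loglik(theta, gdp_obs, solve_policy, filter_pass):
        """Sketch of L(theta) for the DSGE model with one observable."""
        alpha, beta, gamma, delta, rho, sig_A, sig_y, sig_c = theta
        # Step 1: non-stochastic steady state, Eqs. (2.49)-(2.50)
        Kss = (alpha * beta / (1.0 - beta * (1.0 - delta))) ** (1.0 / (1.0 - alpha))
        Css = Kss ** alpha - delta * Kss
        # Step 2: second-order policy-function coefficients at the steady state
        coeffs = solve_policy(theta, Css, Kss)
        # Step 3: one filtering pass yields innovations and predicted variances,
        # scored with the scalar Gaussian quasi-likelihood of Eq. (2.26)
        innovations, pred_vars = filter_pass(coeffs, theta, gdp_obs)
        total = 0.0
        for e, P in zip(innovations, pred_vars):
            total += -0.5 * (np.log(2 * np.pi * P) + e ** 2 / P)
        return total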
Monte Carlo Simulation Results

In this section, I test the filter using Monte Carlo methods. First, I simulate a sample path of size T = 500 using the parameter values of Table 2.10. Then, using an order of M = 4,[33] I estimate the filtered series of k̂_t and â_t for two state-space models, using the true parameter values. The first representation is that of Eqs. (2.54) and the second that of Eqs. (2.62). The results are shown in Figure 2.8. The filtered series under both models provide similar results; indeed, the correlation coefficient between the two series is above 90%. To assess the accuracy of both filters, I repeat the same exercise 500 times and keep track of the MSEs between the state variables and their simulated counterparts, as well as for the simulated GDP and consumption series. The results are displayed in Tables 2.11 and 2.12. In this example, the addition of a second observation equation does not improve the identification of the state variables, since the average MSEs are higher in the second state-space representation.

[33] The choice of M = 4 is due to the second-order approximation used to solve the policy function and the fact that the filtering recursions involve first and second moments.

For parameter estimation the results are different. As in the previous models, basic identification exercises are performed. Sample paths of log-capital deviations (k̂_t) and random shocks (â_t) are simulated with T = 500 using the parameters of Table 2.10. As in the filtering exercise, the degree of approximation of the quasi-likelihood function is M = 4. For each simulation, the likelihood function of each state-space representation is evaluated numerically at the true vector of parameter values, except for the parameter shown on the x-axis. The results are shown in Figure 2.9. The two panels on the left represent the quasi-likelihood function of the first state-space representation, while the other panels were calculated with the second.

We learn from these exercises that the quasi-likelihood functions are continuous with respect to the parameter values. This is an important advantage over particle filters, since standard numerical optimization methods can be applied for statistical inference. In addition, I analyze the identification of the subjective discount factor, β, and the capital share, α. Flury and Shephard (2011) present evidence of misidentification of these parameter values. By adding a second observation equation, however, both parameters can be identified, as the quasi-likelihood function has a concave shape. Moreover, the maximum is achieved at a point very close to the true parameter value, which is represented by the vertical dotted lines.

Finally, to provide more evidence on the accuracy of the QML estimates, I simulate 100 independent samples of T = 100 observations with the parameters of Table 2.10. Based on the second-order approximations of Schmitt-Grohe and Uribe (2004), I construct a quasi-likelihood function for each of the two state-space representations with an order of approximation of M = 4. For each simulated series, I maximize these quasi-likelihood functions using the true vector of parameter values as the starting point and obtain parameter estimates.[34] The results are shown in Table 2.13, which reports the average estimate and the standard deviation of the estimates constructed from the simulated series. In general, the standard deviations of the parameter estimates of the second model are smaller than those of the first, confirming that including consumption as a second observation equation helps to better identify the set of parameter values.

[34] As a robustness check, the quasi-likelihood function was maximized numerically using different initial points randomly chosen from the parameter space. Although the numerical results are similar to the ones obtained using the true value as the initial point, the correlation coefficient (ρ) is not well identified in a few cases, as the estimate is a corner solution. As documented by Rytchkov (2012), this issue can be corrected by including additional observation equations or increasing the sample size.
Although the number of observations is relatively small, in almost all cases the true parameter value differs from the average estimate by less than two sample standard deviations. In the second model, the standard deviation of the correlation coefficient estimates is relatively high. This finding is consistent with the results of Flury and Shephard (2011), who show that the correlation coefficient estimates are not well identified. The DSGE model is estimated here in the spirit of demonstrating the workings and capabilities of the Kalman filter with Taylor series rather than of gaining any new insight into the model parameters.

2.6 Robustness Checks

In this section I discuss robustness checks for the filtering method with Taylor series approximations. In the first check, I test the filter on a state-space representation commonly used in the nonlinear filtering literature to illustrate the operation of the filter when the system is highly nonlinear. In the second check, I study the effect of dimensionality on the filter through a multivariate stochastic volatility model recently studied in Chib, Nardari, and Shephard (2006) and Chib, Omori, and Asai (2009).

2.6.1 Highly Nonlinear Systems

The first example is a benchmark commonly used in the nonlinear filtering literature and corresponds to the univariate nonstationary growth model of Gordon, Salmond, and Smith (1993). The following nonlinear model is considered:

    x_t = 0.5 x_{t−1} + 25 x_{t−1}/(1 + x_{t−1}²) + 8 cos(1.2(t−1)) + w_t    (2.63)

and

    y_t = x_t²/20 + v_t,    (2.64)

where w_t and v_t are zero-mean Gaussian white noise processes with variances 10 and 1, respectively.

State-Space Representation and Implementation

Eqs. (2.63) and (2.64) define a state-space model with state variable x_t. Both the observation and transition equations are clearly nonlinear. The observation equation is quadratic in x_t; i.e., h(x_t) = x_t²/20, with random noise v_t of variance R ≡ 1. The transition equation is nonlinear as well, defined by the function

    g(x) = 0.5x + 25x/(1 + x²),    (2.65)

with variance Q ≡ 10. The deterministic term in Eq. (2.63) is measurable at time t and is incorporated in the filtering exercise as a known constant at the beginning of each time step.

To apply the Taylor series approximation, it is necessary to obtain the Taylor series expansions of the functions h and g. However, a major complication arises with the Taylor series of g, since it does not converge uniformly on the entire domain; as a result, Eq. (2.12) does not hold. An alternative approach consists of applying Lemma 2.6.1 to obtain power series for the functions x/(1 + x²), x²/(1 + x²) and x²/(1 + x²)² in terms of (ξ − 1 − x²)/ξ. With this new representation, we can estimate the mean, variance and covariance necessary for the filtering recursions.

Lemma 2.6.1 Let x ∈ R with |x| < (2ξ − 1)^{1/2}; then

    x/(1 + x²) = Σ_{j=0}^∞ (x/ξ) ((ξ − 1 − x²)/ξ)^j,    (2.66)

    x²/(1 + x²) = Σ_{j=0}^∞ (x²/ξ) ((ξ − 1 − x²)/ξ)^j    (2.67)

and

    x²/(1 + x²)² = Σ_{j=0}^∞ j (x/ξ)² ((ξ − 1 − x²)/ξ)^{j−1}.    (2.68)

Proof. See Appendix A.2.
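The expansion in Eq. (2.66) is easy to verify numerically; the following snippet compares a truncated sum against the exact value (the choices of x, ξ and the truncation order are illustrative).

    import numpy as np

    def series_lhs(x, xi, M):
        """Partial sum of Eq. (2.66):
        x/(1+x^2) = sum_j (x/xi) * ((xi - 1 - x^2)/xi)^j."""
        r = (xi - 1.0 - x ** 2) / xi
        return (x / xi) * sum(r ** j for j in range(M + 1))

    x = 0.7
    print(series_lhs(x, xi=4.0, M=50))   # truncated series
    print(x / (1.0 + x ** 2))            # exact value; the two agree closely

The ratio (ξ − 1 − x²)/ξ has absolute value below one precisely when |x| < (2ξ − 1)^{1/2}, so a larger ξ widens the region of convergence, consistent with the requirement below that ξ be large enough.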
From Lemma 2.6.1, we learn that to estimate the mean and variance of Eq. (2.65) we need to estimate moments of the form

    E[(x/ξ) ((ξ − 1 − x²)/ξ)^j],   j = 0, 1, 2, ..., M,
    E[(x²/ξ) ((ξ − 1 − x²)/ξ)^j],   j = 0, 1, 2, ..., M,

and

    E[(x/ξ)² ((ξ − 1 − x²)/ξ)^{j−1}],   j = 1, 2, ..., M.

These moments can be calculated directly from the results of the first section for an order of M = 20. Once the moments are calculated, ξ has to be large enough to guarantee that the estimation results converge.

Monte Carlo Simulation Results

Following Gordon, Salmond, and Smith (1993), I assume that the initial state is x_0 = 0.1 and simulate a realization of Eq. (2.63) of 50 time steps. The filters are initialized with a prior distribution N(0, 2). Figure 2.10a compares the filter with Taylor series approximations with the true states. The true state is represented by a star, the mean of the filter with Taylor series is given as a solid line and the dashed lines give the 95% probability interval.[35] In this case, none of the simulations exceeded the 95% intervals. Figure 2.10b compares the same simulated series with the results obtained from the filter with Taylor series as well as with those obtained from a particle filter constructed with 1000 particles. The solid line gives the filtered series based on the Taylor approximations, while the dashed line shows the estimate of the particle filter. For this specific case, the correlation coefficient between the two filtered series is above 90%.

[35] This interval is constructed using the 2.5 and 97.5 percentiles of the normal distribution.

2.6.2 Multivariate Stochastic Volatility Models

As a robustness check in high-dimensional systems, I consider a multivariate stochastic volatility model.[36] Consider a p × 1 vector of asset log-returns y_t = (y_1t, ..., y_pt)^⊤ with a constant mean vector μ = (μ_1, ..., μ_p)^⊤ and a stochastic, time-varying variance-covariance matrix V_t. More specifically, the vector y_t is specified as

    y_t | h_t ∼ N(μ, V_t),   t = 1, ..., n,    (2.69)

where h_t is a scalar or vector stochastic process and the variance matrix V_t is a function of h_t. The latent variables h_t are modeled as an autoregressive process of order one. It is also assumed that the variance-covariance matrix can be decomposed as V_t = D Λ_t² D^⊤, where D is a lower unity triangular matrix. This is the Cholesky decomposition, with a diagonal matrix Λ_t² that is time-varying according to the stochastic process h_t. In this case, both the variances and the correlations implied by V_t are time-varying. The resulting multivariate SV model based on this decomposition is given by

    y_t = μ + D Λ_t ε_t,   ε_t ∼ N(0, I),    (2.70)
    h_{t+1} = γ + Φ h_t + η_t,   η_t ∼ N(0, Σ_η),

where Λ_t = exp((1/2) diag(h_t)) and D can be regarded as a nonsingular loading matrix. The model is a special case of the class of multivariate SV models originally proposed by Harvey, Ruiz, and Shephard (1994) and Shephard (1996) and further extended by Chib, Nardari, and Shephard (2006).

[36] See Chib, Omori, and Asai (2009) and Asai, McAleer, and Yu (2006) for comprehensive reviews of multivariate stochastic volatility models.

State-Space Representation and Implementation

For ease of exposition, I assume that y_t is a two-dimensional vector of demeaned returns.[37] As a result, the representation is of the form

    (y_1t, y_2t)^⊤ = [D_11, 0; D_21, D_22] (exp(h_1t/2) ε_1t, exp(h_2t/2) ε_2t)^⊤,   (ε_1t, ε_2t)^⊤ ∼ N(0, I_2),    (2.71)

where

    (h_1,t+1, h_2,t+1)^⊤ = [φ_11, 0; 0, φ_22] (h_1t, h_2t)^⊤ + (η_1t, η_2t)^⊤,   (η_1t, η_2t)^⊤ ∼ N(0, Σ_η).    (2.72)

[37] An observationally equivalent model is obtained by setting γ = 0 and D as a lower triangular matrix with nonzero values on its leading diagonal.

For simplification purposes, I represent the system in Eqs. (2.71) and (2.72) in terms of the vector of state variables x_t = [x_1t, x_2t, x_3t, x_4t, x_5t, x_6t]^⊤ = [h_1t, η_1t, ε_1t, h_2t, η_2t, ε_2t]^⊤.
Therefore, the observation equations become

    (y_1t, y_2t)^⊤ = [D_11, 0; D_21, D_22] (exp(x_1t/2) x_3t, exp(x_4t/2) x_6t)^⊤.    (2.73)

The six-dimensional vector of state variables evolves according to

    x_t = Φ̃ x_{t−1} + Γ (η_1t, ε_1t, η_2t, ε_2t)^⊤,   (η_1t, ε_1t, η_2t, ε_2t)^⊤ ∼ N(0, Q),    (2.74)

where

    Φ̃ = [φ_11, 1, 0, 0, 0, 0;
          0, 0, 0, 0, 0, 0;
          0, 0, 0, 0, 0, 0;
          0, 0, 0, φ_22, 1, 0;
          0, 0, 0, 0, 0, 0;
          0, 0, 0, 0, 0, 0],

    Q = [σ_11, 0, σ_12, 0;
         0, 1, 0, 0;
         σ_12, 0, σ_22, 0;
         0, 0, 0, 1]

and

    Γ^⊤ = [0, 1, 0, 0, 0, 0;
           0, 0, 1, 0, 0, 0;
           0, 0, 0, 0, 1, 0;
           0, 0, 0, 0, 0, 1].

From Eq. (2.74), the system has a linear transition equation; i.e., g(x_t) = Φ̃ x_t, with covariance matrix Q. In this case, the noise vector of the observation equation is assumed to be identically zero; therefore, the covariance of the observation equation is zero (R ≡ 0). However, this representation is flexible enough to be extended to a more general correlation structure between the innovations of the observation and transition equations. The vector of observation equations, h(x_t) = [h_1(x_t), h_2(x_t)]^⊤, is given by Eqs. (2.73).

Although the moments involved in the filtering recursions have closed-form expressions, the Taylor approximations can be implemented. As in the one-dimensional case, moment identification can be improved by modeling products of the observation Eqs. (2.73). I therefore add three more observation equations to the vector of observables:

    y_3t = y_1t² = D_11² exp(x_1t) x_3t²,    (2.75)
    y_4t = y_2t² = (D_21 exp(x_1t/2) x_3t + D_22 exp(x_4t/2) x_6t)²,
    y_5t = y_1t y_2t = D_11 (D_21 exp(x_1t) x_3t² + D_22 exp((x_1t + x_4t)/2) x_3t x_6t).

The first two equations in (2.75) follow from the one-dimensional case; I include the third observation equation for robustness. The overall system has five nonlinear observation equations, given by Eqs. (2.73) and (2.75), and six state variables that satisfy the law of motion in Eq. (2.74). The filtering exercise is performed with an order of approximation of M = 8, although closed-form expressions for the means and covariances can be calculated directly by applying the results shown in Section 2.3.2.
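A minimal sketch of the five-equation observation vector built from Eqs. (2.73) and (2.75) follows; the function name is illustrative.

    import numpy as np

    def h_msv(x, D11, D21, D22):
        """Five observation equations of Eqs. (2.73) and (2.75) for the
        bivariate SV model; x = (h1, eta1, eps1, h2, eta2, eps2)."""
        y1 = D11 * np.exp(x[0] / 2.0) * x[2]
        y2 = D21 * np.exp(x[0] / 2.0) * x[2] + D22 * np.exp(x[3] / 2.0) * x[5]
        # Stack the levels with their squares and cross-product
        return np.array([y1, y2, y1 ** 2, y2 ** 2, y1 * y2])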
2.7 Concluding Remarks

In this chapter, I propose a nonlinear filter based on Taylor series approximations, which relies on the efficient calculation of derivatives and can be applied to state and parameter estimation. My results suggest that filtering methods via the Taylor series filter are superior to conventional methods such as the EKF or the UKF. I also find that the filter with Taylor approximations is as accurate as the standard particle filter and at least twenty times faster than conventional Monte Carlo filters.

I test the filter in a number of models, including univariate and multivariate stochastic volatility models, a risk-return model, a dynamic stochastic general equilibrium model and a highly nonlinear system. My findings suggest that the filter with Taylor approximations provides accurate results for both state and parameter estimation, and that it remains accurate in high-dimensional and nonlinear systems. Its time efficiency is comparable to the low-dimensional case; in some applications it is at least twenty times faster than standard particle filters.

I also find that adding observation equations to the state-space model, such as the squared value of the current observation equation, improves the identification of the unobserved states. With this augmented state-space model, the filtered estimates are comparable with those obtained via standard particle filters. The filter with Taylor series can also be applied for inference purposes, since the filtering recursions deliver a quasi-likelihood function. In all the examples in this chapter, I find quasi-likelihood functions that are continuous in the parameter values, so conventional methods for numerical optimization can be applied for estimation. The efficiency of the state estimation thus carries over to parameter estimation.

Finally, the filter with Taylor series approximations can be applied to a more general class of models by combining its analytical tractability with Monte Carlo methods, via Rao-Blackwellised filters. By combining these techniques, the density of the filter with Taylor approximations can provide accurate estimates of the true density of the filter, up to a normalization constant.

Although the filters provide accurate results for a number of problems, some care should be taken in the modeling and implementation. Limitations arise when the function to be approximated is non-differentiable or the Taylor series approximation is not uniformly convergent. One way to circumvent this issue is to take the Taylor series around a different center of expansion or to change the scale of the state variables. The results can be extended to non-differentiable functions, as long as the function can be approximated with a power series. Another shortcoming is that significant work has to be done to include the correct observation equations within the estimation process.

2.8 Figures and Tables

[Figure 2.1: four panels plotting the true log-volatility s_t over 500 periods against filtered estimates: (a) Stochastic Volatility Model 1, Taylor series filter with one equation and 95% confidence intervals; (b) Stochastic Volatility Model 2, Taylor series filter with two equations and 95% confidence intervals; (c) Models 1 and 2, Taylor series filters with one and two equations; (d) Model 2, Taylor series filter with two equations compared with the particle filter.]

Figure 2.1. Stochastic Volatility Model: Filter Performance. This figure compares a simulated time series of 500 observations for the standard stochastic volatility model with its filtered values.
The filtered estimates in the two figures on top were calculated using a fifth order of approximation and an infinite order of approximation (Gaussian filters). The parameter values used for the simulation as well as for the filtered estimates are φ = 0.98, σ_η = 0.1414, σ = 1 and ρ = −0.5.

[Figure 2.2: four panels of quasi-likelihood contours computed with the EKF, UKF, fifth-order and Gaussian filters, as functions of φ, σ_η, σ and ρ.]

Figure 2.2. Stochastic Volatility Model: Quasi-Likelihood Contours. This figure plots the quasi-likelihood function of a standard stochastic volatility model for different sets of parameter vectors. The plots show the quasi-log-likelihood function of the data for different values of φ (top left), σ_η (top right), σ (bottom right) and ρ (bottom left). The vertical dashed lines represent the parameter values that were used to simulate the data.

[Figure 2.3: four panels, (a)-(b) filtered log-expected returns x_{1t} and log-volatility x_{2t} under one, two and three observation equations; (c)-(d) filtered log-expected returns and log-volatility from the Taylor series filter and the particle filter.]

Figure 2.3. Risk-Return Model: Filter Performance. This figure plots the filtered series of a random draw of T = 100 returns simulated from the model by Brandt and Kang (2004). The parameters used for the simulation are a11 = 0.8589, a21 = −0.0531, a12 = 0.1081, a22 = 0.9237, b11 = 0.0076, b22 = 0.0554 and ρ = −0.6345.

[Figure 2.4: quasi-log-likelihood values plotted against the degree of approximation M = 1, …, 15 for the Taylor series filter, with reference lines for the Gaussian filter and the UKF.]

Figure 2.4. Risk-Return Model: Order of Approximation. This figure plots the quasi-likelihood function of a random simulation of the model by Brandt and Kang (2004) evaluated with the parameter values a11 = 0.8589, a21 = −0.0531, a12 = 0.1081, a22 = 0.9237, b11 = 0.0076, b22 = 0.0554 and ρ = −0.6345. The plot displays the values of the quasi-likelihood function for different orders of approximation, M = 1, 2, …, 15 (asterisks). The continuous lines represent the quasi-likelihood functions constructed with the UKF and the Gaussian filters.

[Figure 2.5: four panels of quasi-likelihood contours for a11, a12, a21 and a22, with the true parameter values marked.]
Figure 2.5. Risk-Return Model: Quasi-Likelihood Contours. This figure plots the quasi-likelihood function of a random draw of T = 5000 returns simulated from the model by Brandt and Kang (2004). The parameters used for the simulation are a11 = 0.8589, a21 = −0.0531, a12 = 0.1081, a22 = 0.9237, b11 = 0.0076, b22 = 0.0554 and ρ = −0.6345.

[Figure 2.6: three panels of quasi-likelihood contours for b11, b22 and ρ, with the true parameter values marked.]

Figure 2.6. Risk-Return Model: Quasi-Likelihood Contours (Cont.). This figure plots the quasi-likelihood function of a random draw of T = 5000 returns simulated from the model by Brandt and Kang (2004). The parameters used for the simulation are a11 = 0.8589, a21 = −0.0531, a12 = 0.1081, a22 = 0.9237, b11 = 0.0076, b22 = 0.0554 and ρ = −0.6345.

[Figure 2.7: two panels, (a) market returns and (b) the short rate, plotted over the sample period.]

Figure 2.7. Risk-Return Model: Data. This figure plots the monthly returns on the value-weighted CRSP index as well as the short rate from January 1946 through December 2011.

[Figure 2.8: two panels, (a) filtered log-investment k̃_t = ln(K_t/K_ss) and (b) filtered log-TFP shock ã_t = ln(A_t), each comparing the Taylor series filters with one and two equations against the true series.]

Figure 2.8. DSGE Model: Filter Performance. This figure plots the filtered estimates of the state variables evaluated in a simulated sample path of size T = 500 with the parameter values β = 0.95, δ = 0.15, α = 0.30, ρ = 0.90, γ = 3, σ_y = 0.30 and σ_ε = 0.2.

[Figure 2.9: four panels of quasi-likelihood contours, for β under Models 1 and 2 and for α under Models 1 and 2, with the true values marked.]

Figure 2.9. DSGE Model: Quasi-Likelihood Contours. This figure plots the quasi-likelihood function evaluated at a sample path of size T = 500 with parameter values β = 0.95, δ = 0.15, α = 0.30, ρ = 0.90, γ = 3, σ_y = 0.30 and σ_ε = 0.2.

[Figure 2.10: two panels for the state x_t over 50 time steps, (a) filter performance with 95% confidence intervals and (b) comparison of the Taylor series filter with the particle filter.]

Figure 2.10. Nonlinear Model: Filter Performance. This figure plots the filtered series of the nonlinear model by Gordon, Salmond, and Smith (1993).

[Figure 2.11: two panels showing the true and filtered first and second log-volatility processes, h_{1,t} and h_{2,t}, from the Taylor series filter and the particle filter.]

Figure 2.11. Multivariate Stochastic Volatility Model: Filter Performance. This figure plots the filtered series of the multivariate version of the stochastic volatility model by Chib, Nardari, and Shephard (2006).
Table 2.1. Stochastic Volatility Model (One Observation Equation): Simulation Results

                        MSE(log σ_t²)          MSE(y_t)
  Filtering Method      Average   Std. Dev.    Average   Std. Dev.
  EKF                   0.3450    0.1529       1.3100    0.4688
  UKF                   0.3430    0.1406       1.3122    0.4626
  TKF (2)               0.3321    0.1289       1.3108    0.4630
  TKF (3)               0.3321    0.1289       1.3108    0.4630
  TKF (4)               0.3366    0.1400       1.3111    0.4630
  TKF (5)               0.3366    0.1400       1.3111    0.4630
  TKF (6)               0.3412    0.1436       1.3111    0.4630
  TKF (7)               0.3412    0.1436       1.3111    0.4630
  TKF (8)               0.3424    0.1444       1.3111    0.4630
  TKF (9)               0.3424    0.1444       1.3111    0.4630
  TKF (10)              0.3426    0.1445       1.3111    0.4630
  TKF (11)              0.3426    0.1445       1.3111    0.4630
  TKF (12)              0.3426    0.1445       1.3111    0.4630
  Gaussian Filter       0.3426    0.1445       1.3111    0.4630
  Particle Filter       0.1691    0.0416       0.7439    0.2597

Monte Carlo simulation results. This table presents the mean and variance of the MSEs between the simulated and the filtered values of the log-volatility (log σ_t²) process and the observable (y_t) of the standard stochastic volatility model:

y_t = σ_t ε_t,
log σ_t² = (1 − φ) log σ² + φ log σ_{t−1}² + η_t,  η_t ∼ N(0, σ_η²).

The results are based on 500 independent samples of T = 500 simulated from the model with the parameters φ = 0.98, σ_η = 0.1414, σ = 1 and ρ = −0.5.

Table 2.2. Stochastic Volatility Model (Two Observation Equations): Simulation Results

                        MSE(log σ_t²)          MSE(y_t)
  Filtering Method      Average   Std. Dev.    Average   Std. Dev.
  EKF                   0.3649    0.1654       1.2693    0.4077
  UKF                   0.2675    0.1022       1.3073    0.4651
  TKF (2)               0.6084    0.4301       1.3314    0.5509
  TKF (3)               0.6084    0.4301       1.3314    0.5509
  TKF (4)               0.4047    0.3116       1.3120    0.4718
  TKF (5)               0.4047    0.3116       1.3120    0.4718
  TKF (6)               0.1805    0.0450       1.3073    0.4647
  TKF (7)               0.1805    0.0450       1.3073    0.4647
  TKF (8)               0.1658    0.0376       1.3073    0.4647
  TKF (9)               0.1658    0.0376       1.3073    0.4647
  TKF (10)              0.1636    0.0367       1.3073    0.4647
  TKF (11)              0.1636    0.0367       1.3073    0.4647
  TKF (12)              0.1632    0.0365       1.3073    0.4647
  Gaussian Filter       0.1632    0.0365       1.3073    0.4647
  Particle Filter       0.1691    0.0416       0.7439    0.2597

Monte Carlo simulation results. This table presents the mean and variance of the MSEs between the simulated and the filtered values of the log-volatility (log σ_t²) process and the observables (y_{1,t}, y_{2,t}) of the standard stochastic volatility model:

y_{1,t} = σ_t ε_t,
y_{2,t} = σ_t² ε_t²,
log σ_t² = (1 − φ) log σ² + φ log σ_{t−1}² + η_t,  η_t ∼ N(0, σ_η²).

The results are based on 500 independent samples of T = 500 simulated from the model with the parameters φ = 0.98, σ_η = 0.1414, σ = 1 and ρ = −0.5.

Table 2.3. Stochastic Volatility Model: CPU Time

                        Model 1                Model 2
  Filtering Method      Average   Std. Dev.    Average   Std. Dev.
  EKF                   0.0292    0.0049       0.0827    0.0126
  UKF                   0.0848    0.0076       0.1110    0.0075
  TKF (2)               0.1664    0.0152       0.3234    0.0221
  TKF (3)               0.2577    0.0246       0.4306    0.0342
  TKF (4)               0.3804    0.0209       0.5483    0.0292
  TKF (5)               0.6027    0.0785       0.7828    0.1000
  TKF (6)               0.8498    0.0650       1.0252    0.0650
  TKF (7)               1.1805    0.0519       1.3624    0.0534
  TKF (8)               1.6703    0.1242       1.8646    0.1449
  TKF (9)               2.6574    0.2697       2.9169    0.3152
  TKF (10)              3.1050    0.2253       3.3022    0.2256
  TKF (11)              3.5978    0.0939       3.7845    0.0983
  TKF (12)              4.4738    0.0819       4.6712    0.0888
  Gaussian Filter       0.0318    0.0028       0.0412    0.0037
  Particle Filter       17.8776   0.2131       17.8776   0.2131

CPU time. This table presents the CPU time, in seconds, that a standard computer takes to compute the filtered values of the log-volatility (log σ_t²) process. The second and third columns contain the mean and standard deviation of the CPU time for a model with one observable. The fourth and fifth columns contain the mean and standard deviation of the CPU time for a model with two observables. The results are based on 500 independent samples of T = 500 simulated from the model with the parameters φ = 0.98, σ_η = 0.1414, σ = 1 and ρ = −0.5.
Table 2.4. Stochastic Volatility Model: Quasi-Maximum Likelihood Estimation Results

                        φ         σ_η       σ         ρ
  True value            0.9800    0.1414    1.0000   -0.5000
  EKF                   0.9348    0.1491    1.2402   -0.3598
                       (0.1454)  (0.0701)  (0.4162)  (0.4479)
  UKF                   0.9287    0.1614    1.2926   -0.4383
                       (0.1258)  (0.1050)  (0.4539)  (0.4656)
  TKF (3)               0.9241    0.1601    1.2926   -0.4805
                       (0.1698)  (0.1012)  (0.4277)  (0.4663)
  TKF (5)               0.9292    0.1753    0.9660   -0.4663
                       (0.2085)  (0.1030)  (0.3975)  (0.4353)
  TKF (7)               0.9564    0.1781    0.8502   -0.4852
                       (0.0693)  (0.1035)  (0.4103)  (0.4091)
  TKF (9)               0.9405    0.1907    0.8358   -0.4418
                       (0.1629)  (0.1013)  (0.5483)  (0.3999)
  Gaussian Filter       0.9550    0.1542    0.9147   -0.5393
                       (0.0820)  (0.0899)  (0.4263)  (0.3840)

Finite sample properties of the QML estimator. This table presents the sample mean and standard deviation (in parentheses) of the QML estimates of the model:

y_t = σ_t ε_t,
log σ_t² = (1 − φ) log σ² + φ log σ_{t−1}² + η_t,  η_t ∼ N(0, σ_η²).

The results are based on 250 independent samples of T = 500 simulated from the model with the parameters in the first row.

Table 2.5. Stochastic Volatility Model: Descriptive Statistics

  Monthly Consumption Growth
  Mean               0.0013
  Std. Dev.          0.0036
  Max                0.0140
  Min               -0.0191
  Median             0.0014
  Skewness          -0.4148
  Kurtosis           5.6120
  Autocorrelation
    1-month         -0.1869
    6-month          0.0688
    12-month        -0.0242
    24-month        -0.0955

Descriptive statistics. This table presents descriptive statistics of monthly log-consumption growth based on the monthly real per capita consumption series for nondurables and services from January 1959 to March 2012. The series was obtained from the Real-Time Data Set for Macroeconomists from the Federal Reserve Bank of Philadelphia.

Table 2.6. Stochastic Volatility Model: Parameter Estimates

                        φ         σ_η       σ         ρ         μ_c
  EKF                   0.9575    0.2997    0.1591    0.9991    0.0014
                       (0.0862)  (0.8654)  (0.2196)  (2.5046)  (0.0002)
  UKF                   0.9623    0.0198    0.0035    0.9983    0.0012
                       (0.0616)  (0.0508)  (0.0012)  (0.0004)  (0.0027)
  TKF (3)               0.9623    0.0221    0.0035    0.8923    0.0012
                       (0.0287)  (0.0218)  (0.0007)  (0.0005)  (0.0012)
  TKF (5)               0.9626    0.0266    0.0035    0.7436    0.0012
                       (0.0724)  (0.0465)  (0.0006)  (0.0007)  (0.0014)
  TKF (7)               0.9627    0.0216    0.0035    0.9179    0.0012
                       (0.0203)  (0.0332)  (0.0006)  (0.0045)  (0.0018)
  TKF (9)               0.9627    0.0228    0.0035    0.8683    0.0012
                       (0.0446)  (0.0297)  (0.0005)  (0.0003)  (0.0011)
  TKF (11)              0.9626    0.0336    0.0035    0.5867    0.0012
                       (0.0204)  (0.0146)  (0.0002)  (0.0008)  (0.0002)
  Gaussian Filter       0.9626    0.0353    0.0035    0.5574    0.0012
                       (0.0205)  (0.0160)  (0.0001)  (0.0006)  (0.0002)

Estimation results. This table presents the QML estimates of the model

Δln(C_{t+1}) − μ_c = σ exp(s_t/2) ε_{t+1},
s_{t+1} = φ s_t + η_{t+1},  η_t ∼ N(0, σ_η²).

The estimates are for monthly real consumption growth on the monthly series from the Federal Reserve Bank of Philadelphia from January 1959 to March 2012. Each row contains the estimates under the filtering techniques based on different orders of approximation. Standard errors are reported in parentheses.

Table 2.7. Risk-Return Model: Quasi-Maximum Likelihood Estimation Results

                 Model A                        Model B
  Parameter      True      Average   Std. Dev.  True      Average   Std. Dev.
  a11             0.8589    0.9111    0.1158     0.8313    0.8277    0.2100
  a21            -0.0529   -0.0099    0.0983    -0.0211   -0.0377    0.2350
  a12             0.1084    0.3474    0.3015     0.1168    0.3771    0.3617
  a22             0.9226    0.8792    0.1273     0.9110    0.8181    0.2125
  b11             0.0076    0.0033    0.0070     0.0064    0.0031    0.0081
  b22             0.0553    0.0347    0.0375     0.0561    0.0812    0.0112
  ρ              -0.6336    0.1687    0.8037    -0.4577   -0.0018    0.5760
  α               0.0067    0.0067    0.0016     0.0065    0.0067    0.0015
  β               0.0418    0.0523    0.0046     0.0385    0.0524    0.0045
  ρ_μ             -         -         -         -0.0866    0.1154    0.7475
  ρ_σ             -         -         -          -         -         -

                 Model C                        Model D
  Parameter      True      Average   Std. Dev.  True      Average   Std. Dev.
  a11             0.8658    0.7841    0.2581     0.8677    0.8037    0.2501
  a21            -0.0885   -0.0259    0.2425    -0.1292   -0.1065    0.2374
  a12             0.0861    0.3229    0.2872     0.0947    0.3121    0.3052
  a22             0.8973    0.8727    0.1687     0.9086    0.8540    0.1880
  b11             0.0060    0.0065    0.0078     0.0047    0.0044    0.0069
  b22             0.0614    0.0104    0.0268     0.0591    0.0338    0.0755
  ρ              -0.5584   -0.0211    0.5872    -0.5621    0.0036    0.6259
  α               0.0062    0.0063    0.0014     0.0062    0.0063    0.0014
  β               0.0382    0.0506    0.0035     0.0382    0.0508    0.0034
  ρ_μ             -         -         -         -0.0517    0.1438    0.7477
  ρ_σ            -0.2541   -0.6629    0.3716    -0.2430   -0.5841    0.4675

Estimation results. This table describes the sampling distribution of the QML estimates of the model:

y_t = α exp(x_{1t}) + β exp(x_{2t}) x_{3t},
x_t = Ã x_{t−1} + Γ w_t,  with w_t ∼ N(0, Σ),

where

      [ 0 0 0  1   0  ]        [ 0 0 0 ]
      [ 0 0 0  0   1  ]        [ 0 0 0 ]
Ã =   [ 0 0 0  0   0  ],  Γ =  [ 1 0 0 ]
      [ 0 0 0 a11 a12 ]        [ 0 1 0 ]
      [ 0 0 0 a21 a22 ]        [ 0 0 1 ]

and

      [    1       ρ_μ √b11      ρ_σ √b22   ]
Σ =   [ ρ_μ √b11      b11      ρ √(b11 b22) ]
      [ ρ_σ √b22  ρ √(b11 b22)     b22      ].

The results are based on 500 independent samples of T = 792 returns simulated from the model with the parameters displayed in the "True" columns.
Table 2.8. Risk-Return Model: Descriptive Statistics

                    Market Index   Short rate
  Mean               0.0083         0.0036
  Std. Dev.          0.0435         0.0025
  Max                0.1532         0.0134
  Min               -0.2554         0.0000
  Median             0.0127         0.0034
  Skewness          -0.7680         0.9463
  Kurtosis           5.6443         4.2273
  Autocorrelation
    1-month          0.0908         0.9684
    6-month         -0.0556         0.8907
    12-month         0.0348         0.8080
    24-month        -0.0008         0.6327

Descriptive statistics. This table presents descriptive statistics of monthly log-returns on the value-weighted CRSP index and the short rate from January 1946 to December 2011. The short rate is the yield on a one-month Treasury bill.

Table 2.9. Risk-Return Model: Parameter Estimates

                 Model A               Model B
  Parameter      Estimate  Std. Error  Estimate  Std. Error
  a11             0.9436    0.0662      0.9586    0.0161
  a21            -0.0778    0.1146     -0.0285    0.0065
  a12             0.3799    0.1171      0.3054    0.0253
  a22             0.7317    0.1298      0.8750    0.0001
  b11             0.1684    0.0296      0.1739    0.0015
  b22             0.0002    0.0316      0.0024    0.0015
  ρ              -0.1306    0.0091     -0.9000    0.0014
  α               0.0047    0.0165      0.0047    0.0012
  β               0.0434    0.0070      0.0437    0.0002
  ρ_μ             -         -          -0.6642    0.0170
  ρ_σ             -         -           -         -
  L               1366.23               1373.35

                 Model C               Model D
  Parameter      Estimate  Std. Error  Estimate  Std. Error
  a11             0.9999    0.0355      0.9927    0.0859
  a21            -0.0202    0.0232     -0.0135    0.0000
  a12             0.2873    0.0261      0.6779    0.0067
  a22             0.8576    0.0495      0.8523    0.0464
  b11             0.0474    0.0762      0.0615    0.0074
  b22             0.0054    0.0019      0.0053    0.0277
  ρ              -0.8999    0.0133     -0.8897    0.0034
  α               0.0046    0.0011      0.0047    0.0035
  β               0.0440    0.0211      0.0436    0.0002
  ρ_μ             -         -          -0.1902    0.0135
  ρ_σ            -0.9000    0.0002     -0.8942    0.0203
  L               1388.43               1395.5

Estimation results. This table presents the QML estimates of the state-space model described in the notes to Table 2.7. The estimates are for quarterly returns on the value-weighted CRSP index in excess of the one-month Treasury bill from January 1953 to December 2011.

Table 2.10. DSGE Model: Parameter Values

  Parameter   β     δ     α     ρ     γ     σ_y   σ_A   σ_c
  Value       0.95  0.15  0.30  0.90  3.00  0.10  0.10  0.15

Parameter values. This table contains the parameter values used to analyze the dynamic stochastic general equilibrium model. The parameters are indicated in the first row. The values, used for simulating the series, are given in the second row.

Table 2.11. DSGE Model (One Equation): Simulation Results

  State-Space Model 1   MSE(k̃_t)   MSE(ã_t)   MSE(G̃DP_t)
  Average                0.0855      0.0133      0.0582
  Std. Dev.              0.0284      0.0026      0.0039

State estimation results. This table presents the mean and variance of the MSEs between the simulated and the filtered values of the log-investment (k̃_t), the random shocks (ã_t) and the observable detrended gross domestic product per capita (G̃DP_t). The state-space representation implied by the DSGE model is

G̃DP_t = α k̃_t + ã_t + v_{y,t},  v_{y,t} ∼ N(0, σ_y²),
k̃_{t+1} = ψ_k k̃_t + ψ_a ã_t + ½ (ψ_{kk} k̃_t² + 2 ψ_{ak} ã_t k̃_t + ψ_{aa} ã_t²) + ½ ψ_{σσ} σ̄²,
ã_{t+1} = ρ ã_t + ε_{t+1}.

The results are based on 500 independent samples of T = 500 simulated from the model with the parameters shown in Table 2.10 and a perturbation parameter σ̄ = 0.10.
Table 2.12. DSGE Model (Two Equations): Simulation Results

  State-Space Model 2   MSE(k̃_t)   MSE(ã_t)   MSE(G̃DP_t)   MSE(C̃_t)
  Average                0.1108      0.0185      0.0722        0.0433
  Std. Dev.              0.1471      0.0159      0.0291        0.0423

State estimation results. This table presents the mean and variance of the MSEs between the simulated and the filtered values of the log-investment (k̃_t) and the random shocks (ã_t); the observables are the detrended gross domestic product per capita (G̃DP_t) and consumption per capita (C̃_t). The state-space representation implied by the DSGE model is

G̃DP_t = α k̃_t + ã_t + v_{y,t},  v_{y,t} ∼ N(0, σ_y²),
C̃_t = π_k k̃_t + π_a ã_t + ½ (π_{kk} k̃_t² + 2 π_{ak} ã_t k̃_t + π_{aa} ã_t²) + ½ π_{σσ} σ̄² + v_{c,t},  v_{c,t} ∼ N(0, σ_c²),
k̃_{t+1} = ψ_k k̃_t + ψ_a ã_t + ½ (ψ_{kk} k̃_t² + 2 ψ_{ak} ã_t k̃_t + ψ_{aa} ã_t²) + ½ ψ_{σσ} σ̄²,
ã_{t+1} = ρ ã_t + ε_{t+1}.

The results are based on 500 independent samples of T = 500 simulated from the model with the parameters shown in Table 2.10 and a perturbation parameter σ̄ = 0.10.

Table 2.13. DSGE Model: Estimation Results

                            Model 1               Model 2
  Parameter   True Value    Average   Std. Dev.   Average   Std. Dev.
  β            0.9500        0.8946    0.0937      0.9835    0.0207
  δ            0.1500        0.1965    0.0803      0.1146    0.0307
  α            0.3000        0.3990    0.2475      0.4529    0.1203
  ρ            0.9000        0.7319    0.1207      0.6687    0.1070
  γ            3.0000        5.1245    3.1691      2.3750    1.2033
  σ_y          0.1000        0.0686    0.0410      0.0904    0.0376
  σ_A          0.1000        0.1018    0.0289      0.1226    0.0298
  σ_c          0.1500        -         -           0.1631    0.0210

Estimation results. This table describes the sampling distribution of the quasi-maximum likelihood estimates of the state-space system shown in the notes to Tables 2.11 and 2.12. Model 1 corresponds to the state-space model with the first observation equation only, while Model 2 is the model with two observation equations. The parameters β, δ, α, ρ, γ, σ_y, σ_A and σ_c are used as inputs to solve a DSGE model with second-order approximations as in Schmitt-Grohe and Uribe (2004). The coefficients π_k, π_a, π_{kk}, π_{ak}, π_{aa}, π_{σσ}, ψ_k, ψ_a, ψ_{kk}, ψ_{ak}, ψ_{aa} and ψ_{σσ} are the numerical values of these approximations. Finally, the perturbation parameter is σ̄ = 0.10.

Table 2.14. Multivariate Stochastic Volatility Model: Parameter Values

  Parameter   φ_11    φ_22    Σ_{η,11}  Σ_{η,21}  Σ_{η,22}  D_11    D_21     D_22
  Value       0.9639  0.9128  0.0415    0.0349    0.1165    0.6077  -0.5095  0.3694

Parameter values. This table displays the parameter values of the multivariate stochastic volatility model with time-varying variances and correlations. The coefficients are indicated in the first row. The true parameter values, used for simulating the series, are given in the second row.

Chapter 3

On the Volatility of the Market Sharpe Ratio

3.1 Introduction

The Sharpe ratio measures the excess return of an investment relative to its standard deviation. Most leading consumption-based asset pricing theories imply a relatively stable market Sharpe ratio. However, empirical evidence suggests that there is more variability in the Sharpe ratio than standard models account for.
Recently, Lettau and Ludvigson (2010) suggest that the finance literature should address this "Sharpe ratio variability puzzle." They document that the empirical standard deviation of the estimated Sharpe ratio is about 47% per quarter. In contrast, a quarterly calibration of the standard Campbell and Cochrane (1999) model produces a substantially lower volatility of 9%. In turn, Chien, Cole, and Lustig (2012) suggest that passive investors' infrequent rebalancing explains the high variability of market Sharpe ratios.

In this chapter, I examine whether estimates of the variability of the Sharpe ratio might be biased due to limitations of the empirical methodology used in estimation. In particular, I show that measurement error in estimated Sharpe ratios may help to explain the Sharpe ratio volatility puzzle. To do this, I simulate data from a standard calibration of the Bansal and Yaron (2004) long-run risks (LRR) model. Following common practice in the literature, I then estimate Sharpe ratios using ordinary least squares (OLS) methods to infer the variability of the model-generated Sharpe ratios. OLS methods lead to estimates of Sharpe ratio volatility of approximately 18%, even though the true variability of the model-implied Sharpe ratio is only 3%. The difference in estimates is due to measurement error induced by the standard Sharpe ratio estimation methodology.

Once I have documented the difference between the Sharpe ratio's estimated and true volatility, I study whether improved empirical methodologies might better account for the true variability of the Sharpe ratio. In particular, I implement filtering methods, which are statistical tools that recover unobservable state variables using measurements that are observed with noise. These techniques are flexible enough to allow the econometrician to perform statistical inference based on time-varying information observed with measurement error. Moreover, the modeling representation is general enough to include time-varying information as well as flexible correlation structures and errors in variables.[38]

I use two different exercises to show the limitations of OLS methods. First, I run a controlled experiment in which I have full information about the data-generating process of stock returns, the state variable dynamics and the parameter values. I use simulated data from the LRR model as calibrated by Bansal and Yaron (2004) to estimate conditional means, variances and Sharpe ratios. The use of artificial data from a fully specified economy is important because it allows the economic reasons that drive the variation in Sharpe ratios to be isolated. Moreover, information about model specification and state-variable dynamics is incorporated within the filtering estimation procedure. Furthermore, the tractability of the LRR model allows data to be simulated with relative ease.

I then implement two econometric techniques: I run standard OLS regressions and then I apply filtering techniques. I compare both sets of results with the closed-form expressions implied by the LRR model as a benchmark. My results show that the Sharpe ratios based on standard OLS methods are more volatile than the estimates obtained with filtering techniques. Moreover, the volatility estimates obtained via filtering differ from the true value by less than 1%, which is a significant improvement over OLS estimates.

[38] See Hamilton (1994), Kim and Nelson (1999) and Doucet, de Freitas, Gordon, and Smith (2001) for an introduction to filtering methods. Crisan and Rozovskii (2011) provide a more recent review of the nonlinear filtering literature.
The main driver of this result is the use of conditioning information within the estimation process.

There are a number of reasons why a filtering approach can improve upon predictive regressions to estimate expected returns and conditional volatilities. First, filtering explicitly acknowledges that both expected returns and volatilities are time varying. Filtering techniques aggregate the entire history of realized returns parsimoniously; in contrast, predictive regressions use lagged predictors to form estimates of expected returns and volatilities. Instead of adding lags to a vector autoregressive (VAR) model, which would increase the number of parameters to be estimated, a latent variable approach such as filtering incorporates the information contained in the history of observed returns. Moreover, filtering techniques are flexible enough to be used with large information sets without relying on additional instruments that may be misspecified (Ferson, Sarkissian, and Simin, 2003). Finally, filtering is more robust to structural breaks than are OLS techniques (Rytchkov, 2012), since it is less sensitive to persistent shifts in long-run relations. For example, in the predictability literature, a substantial shift in the dividend-price ratio destroys its forecasting power.[39] Also, robustness to structural breaks makes the filtering approach more valuable from an ex-ante point of view, when it is unclear whether structural breaks will occur.

The standard method used in the literature to estimate Sharpe ratios is to use fitted moments from first-stage predictive regressions as proxies for the unobserved conditional mean and volatility. Such a technique has some important drawbacks. First, the dynamics of the conditional mean and volatility are determined by the joint conditional distribution of the first-stage predictors. Thus, with any model misspecification, such as omitted variables, the dynamics of the fitted moments would not necessarily correspond to the dynamics of the true moments. In addition, even if the predictive models for the conditional mean and volatility are well specified, the effect of errors in variables, which are induced by the first-stage regressions, is not trivial to quantify in a VAR model.

Simulating data on stock returns by means of theoretical models is a powerful tool because the economic reasons that drive the simulated time-series variation are fully identified. However, theoretical models are abstractions and, by definition, misspecified. An alternative way of analyzing stock returns is via reduced form models, which are statistical representations that do not impose any economic structure and thus aim to better describe historical data. To infer Sharpe ratios and their variability from the data, I conduct a second exercise based on the reduced form model by Brandt and Kang (2004). In this model, expected returns and volatilities are estimated as latent variables and identified from the history of returns. The main advantage of this approach is that it does not rely on prespecified predictors and is not subject to errors in variables or model misspecification.

[39] See Lettau and Van Nieuwerburgh (2008) for a detailed explanation.
I apply the filtering techniques described in Chapter 2 to estimate the parameters of the model and to extract estimates of the conditional moments of returns as well as conditional Sharpe ratios. As a result, my estimate of quarterly Sharpe ratio volatility using the reduced form model is on the order of 5% to 10%, whereas my estimate of the quarterly Sharpe ratio volatility using OLS methods is 42%.

Consistent with the results of the simulation exercise, I find that conditioning information drives the results above. Reduced form models do not rely on predetermined conditioning variables to estimate conditional moments: the state variables are identified from the history of returns. Standard OLS techniques generate fitted moments from a set of predictive regressions as proxies for the unobservable conditional mean and volatility. The fitted moments depend on the joint distribution of these predictors. Consequently, any model misspecification would generate fitted moments that do not correspond to the true dynamics of the conditional mean and volatility, and thus of the Sharpe ratio.

My findings have important implications in an asset management context, since the Sharpe ratio is a commonly used measure of performance evaluation. For investors willing to allocate their wealth between the market portfolio and the risk-free instrument, the market Sharpe ratio becomes a natural benchmark for their investments. If this ratio is highly volatile, the variation needs to be taken into account for hedging and rebalancing purposes. Indeed, Lustig and Verdelhan (2012) report that accounting for time variation in Sharpe ratios may lead to optimal trading strategies that differ markedly from buy-and-hold strategies.

Furthermore, a mean-variance investor would have an obvious interest in understanding the volatility of Sharpe ratios. For example, in a partial equilibrium setting,[40] the Sharpe ratio determines the fraction of wealth that an agent invests in the market portfolio. I show that if an investor uses OLS methods to determine this fraction, then the portfolio weights exhibit extremely volatile behavior over time, which may result in high rebalancing costs. I also show that if an investor applies filtering techniques to estimate the fraction of wealth invested in the market portfolio, then these costs will be substantially lower. Further, for a representative agent with habit formation preferences, the Sharpe ratio indicates the timing and magnitude of fluctuations in risk aversion (Campbell and Cochrane, 1999; Constantinides, 1990). Thus, the time variation in the market Sharpe ratio may provide information about the fundamental economics underlying asset prices.

3.1.1 Related Literature

A number of studies analyze the predictable variation of the mean and volatility of stock returns from an empirical point of view.[41] However, only a few papers have investigated the time variation observed in equity Sharpe ratios. Lettau and Ludvigson (2010) measure the conditional Sharpe ratio of U.S. equities by forecasting stock market returns and realized volatility using different predictors. They obtain highly counter-cyclical and volatile Sharpe ratios and show that neither the external habit model of Campbell and Cochrane (1999) nor the LRR model of Bansal and Yaron (2004) delivers Sharpe ratios volatile enough to match the data.
Using a latent VAR process, Brandt and Kang (2004) also find a highly counter-cyclical Sharpe ratio. Ludvigson and Ng (2007) document the same result using a large number of predictors in a dynamic factor analysis.

Tang and Whitelaw (2011) document predictable variation in stock market Sharpe ratios. Based on a predetermined set of financial variables, the conditional mean and volatility of equity returns are constructed and combined to estimate the conditional Sharpe ratios. Tang and Whitelaw (2011) find that conditional Sharpe ratios show substantial time variation that coincides with the phases of the business cycle. Lustig and Verdelhan (2012) provide evidence that Sharpe ratios are higher in recessions than in expansions in the United States and other OECD countries. They also find that the changes in expected returns during business-cycle expansions and contractions are not explained by changes in near-term dividend growth rates. These papers focus on the counter-cyclical behavior of Sharpe ratios. My paper focuses on the conditional volatility of market Sharpe ratios and finds that the volatility estimates are substantially smaller than the evidence previously documented.

My paper is also related to Brandt and Kang (2004), Pástor and Stambaugh (2009), van Binsbergen and Koijen (2010) and Rytchkov (2012), who analyze return predictability using state-space models.[42] I contribute to the literature by focusing on the dynamic behavior of the market Sharpe ratio and by showing that standard OLS methods as applied in the literature generate measurement error, which impacts estimates of Sharpe ratio volatility. Moreover, I show that filtering techniques are a good approach for estimating the ratio's true volatility and are better able to capture the dynamic behavior of market Sharpe ratios.

The remainder of this chapter is organized as follows. Section 3.2 provides a theoretical framework to interpret Sharpe ratios. Section 3.3 introduces the LRR model and its implications for empirical moments. Section 3.4 describes the simulation exercise as well as the estimation methodologies for expected returns, volatilities and Sharpe ratios. In Section 3.5, an analysis of historical Sharpe ratios based on reduced form models is described and the empirical results are shown. Section 3.6 presents asset allocation implications. Finally, concluding remarks are presented in Section 3.7.

[40] Some examples are Merton (1969, 1971).
[41] See Lettau and Ludvigson (2010) for a comprehensive survey.

3.2 Sharpe Ratios in Asset Pricing

The conditional Sharpe ratio of any asset at time t, denoted by SR_t, is defined as the ratio of the conditional mean excess return to its conditional standard deviation; that is,

SR_t = E_t[R_{t+1} − R_{f,t+1}] / σ_t[R_{t+1} − R_{f,t+1}],  (3.1)

where R_t and R_{f,t} denote the gross return of an asset and the one-period risk-free interest rate, respectively, and the conditional expectations are based on the information available at time t.

Harrison and Kreps (1979) show that the absence of arbitrage implies the existence of a stochastic discount factor (SDF) or pricing kernel, denoted by M_t, that prices all assets in the economy.[43] An implication of no arbitrage is that the expectation of the product of the stochastic discount factor and the gross return of any asset must be equal to one; that is,

E_t[M_{t+1} R_{t+1}] = 1.  (3.2)

[42] In early work in this body of literature, Conrad and Kaul (1988) use the Kalman filter to extract expected returns, but only from the history of realized returns.
Other studies that relate latent variables to predictability include Ang and Piazzesi (2003), Lamoureux and Zhou (1996) and Dangl and Halling (2012).
[43] A detailed explanation is given in Appendix B.1.

An implication of (3.2) is that the conditional Sharpe ratio is proportional to the risk-free rate, the volatility of the pricing kernel and the correlation between the pricing kernel and the return; that is,

SR_t = −R_{f,t+1} σ_t[M_{t+1}] Corr_t[R_{t+1}, M_{t+1}],  (3.3)

where σ_t and Corr_t are the standard deviation and correlation conditional on information at time t, respectively. The conditional Sharpe ratio of any asset in the economy is time varying as long as the risk-free rate varies, the pricing kernel is conditionally heteroskedastic (that is, σ_t[M_{t+1}] changes over time), or the correlation between the stock market return and the SDF is time varying. In this paper, I focus on the conditional Sharpe ratio of the aggregate stock market, which is defined as the instrument that pays the aggregate dividend every period. However, the analysis can be extended to the Sharpe ratio of any traded asset.

The next section presents the LRR model of Bansal and Yaron (2004), with a particular focus on the implications for expected returns, volatilities and Sharpe ratios of the aggregate stock market. This model explains stock price variation as a response to persistent fluctuations in the mean and volatility of aggregate consumption growth by a representative agent with a high elasticity of intertemporal substitution. The tractability of the LRR model allows data to be simulated with relative ease. It provides analytical expressions for expected returns, volatilities and Sharpe ratios for the market portfolio, conditional on the Campbell and Shiller (1988) log-linearizations. Later in the paper I briefly present other asset pricing models and their implications for market Sharpe ratios.

3.3 The Long-Run Risks Model

Bansal and Yaron (2004) and Bansal, Kiku, and Yaron (2012a) (BY and BKY hereafter) propose the following stochastic processes for log-consumption and log-dividend growth, denoted by Δc_{t+1} and Δd_{t+1}, respectively:

Δc_{t+1} = μ_c + x_t + σ_t η_{t+1},
x_{t+1} = ρ x_t + φ_e σ_t e_{t+1},
σ²_{t+1} = σ² + ν (σ²_t − σ²) + σ_w w_{t+1},  (3.4)
Δd_{t+1} = μ_d + φ x_t + φ_d σ_t u_{t+1} + π σ_t η_{t+1},
w_{t+1}, e_{t+1}, u_{t+1}, η_{t+1} ∼ i.i.d. N(0, 1),

where x_t is a persistently varying component of the expected consumption growth rate and σ²_t is the conditional variance of consumption growth, which is time varying and highly persistent, with unconditional mean σ². The variance process can take negative values, but this happens with small probability if its conditional mean is high enough relative to its variance. Dividends are correlated with consumption, since the dividend growth rate Δd_{t+1} shares the same persistent predictable component, scaled by a parameter φ, and the conditional volatility of dividend growth is proportional to the conditional volatility of consumption growth.

BY solve the LRR model using analytical approximations. They assume a representative agent with Epstein-Zin utility with time discount factor δ, coefficient of relative risk aversion γ, and elasticity of intertemporal substitution ψ. The log of the stochastic discount factor, m_{t+1}, for this economy is given by

m_{t+1} = θ ln δ − (θ/ψ) Δc_{t+1} + (θ − 1) r_{a,t+1},  (3.5)

where θ = (1 − γ)/(1 − 1/ψ) and r_{a,t+1} is the return on the consumption claim or, equivalently, the return on aggregate wealth.
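As an illustration of the dynamics in Eq. (3.4), the following Python sketch simulates monthly consumption and dividend growth. The default parameter values are the commonly cited BY (2004) monthly calibration and should be treated as assumptions here; setting π = 0 recovers the original BY specification, while BKY allow π > 0. Negative variance draws are censored at a small positive number, anticipating the simulation design of Section 3.4.

```python
import numpy as np

def simulate_lrr(T, rng, mu_c=0.0015, mu_d=0.0015, rho=0.979, phi_e=0.044,
                 sigma2_bar=0.0078 ** 2, nu=0.987, sigma_w=0.23e-5,
                 phi=3.0, phi_d=4.5, pi=0.0):
    """Simulate monthly consumption growth dc, dividend growth dd and the
    consumption-growth variance path sig2 from the LRR system, Eq. (3.4)."""
    x, s2 = 0.0, sigma2_bar
    dc, dd, sig2 = np.empty(T), np.empty(T), np.empty(T)
    for t in range(T):
        sig2[t] = s2
        sigma_t = np.sqrt(max(s2, 1e-14))          # censor negative variances
        eta, e, u, w = rng.standard_normal(4)      # four i.i.d. N(0,1) shocks
        dc[t] = mu_c + x + sigma_t * eta
        dd[t] = mu_d + phi * x + phi_d * sigma_t * u + pi * sigma_t * eta
        x = rho * x + phi_e * sigma_t * e          # long-run risk component
        s2 = sigma2_bar + nu * (s2 - sigma2_bar) + sigma_w * w
    return dc, dd, sig2

rng = np.random.default_rng(1)
dc, dd, sig2 = simulate_lrr(12 * 100, rng)         # 100 years of monthly data
```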
BY use the Campbell and Shiller (1988) log-linearizations to obtain analytical approximations for the returns on the consumption and dividend claims. Further details on the model and derivations are explained in Appendix B.2.

3.3.1 Implications for Expected Returns, Volatilities and Conditional Sharpe Ratios

Under the long-run risks framework, the equity premium is an affine function of the volatility of consumption growth alone:

E_t[r_{m,t+1} − r_{f,t+1}] = E_0 + E_1 σ²_t.  (3.6)

The model also implies that the conditional variance of the market return is an affine function of σ²_t:

Var_t(r_{m,t+1}) = D_0 + D_1 σ²_t.  (3.7)

The coefficients E_0, E_1, D_0 and D_1 are known functions of the underlying time-series and preference parameters. The general expressions and details about their derivation are shown in Appendix B.2.4.

The covariance between the observed market excess return, r_{m,t+1}, and the innovation to the volatility process, w_{t+1}, is given by

cov_t(r_{m,t+1}, w_{t+1}) = κ_{1,m} A_{2,m} σ_w.  (3.8)

One of the appealing properties of the long-run risks model is that A_{2,m} < 0 for standard calibrations, implying that the LRR model is able to reproduce the negative feedback effect.[44] Another implication of Eq. (3.8) is that the conditional correlation between excess returns and the innovations to consumption risk is time varying, because the conditional variance of stock returns is time varying.

[44] Campbell and Hentschel (1992b), Glosten, Jagannathan, and Runkle (1993), and Brandt and Kang (2004), among others, document the volatility feedback effect; that is, return innovations are negatively correlated with innovations in market volatility.

Following Bansal, Kiku, and Yaron (2012b), the cumulative log-return over K time periods is simply the sum of K one-period returns,[45]

Σ_{k=1}^{K} (r_{m,t+k} − r_{f,t+k}).

The conditional moments are given by

E_t[ Σ_{k=1}^{K} (r_{m,t+k} − r_{f,t+k}) ] = E_{0,K} + E_{1,K} σ²_t,  (3.9)

and

Var_t[ Σ_{k=1}^{K} (r_{m,t+k} − r_{f,t+k}) ] = D_{0,K} + D_{1,K} σ²_t,  (3.10)

where E_{0,K}, E_{1,K}, D_{0,K} and D_{1,K} are known functions of the preference parameters and the number of periods, K, used for time aggregation. If the time unit is the month, then evaluating Eqs. (3.9) and (3.10) provides expressions for annual estimates.

The conditional Sharpe ratio of an investment over K time periods is given by the ratio of the conditional mean return to its conditional standard deviation, and is represented by

SR_{t,t+K} = [ E_{0,K} + D_{0,K}/2 + (E_{1,K} + D_{1,K}/2) σ²_t ] / √( D_{0,K} + D_{1,K} σ²_t ).  (3.11)

Eq. (3.11) implies that the only source of variation in the conditional Sharpe ratio under the LRR framework is the volatility of consumption growth. Moreover, the conditional Sharpe ratio is stochastic unless σ²_t is deterministic. Furthermore, under the standard calibrations by BY, the conditional Sharpe ratio is strictly increasing in the volatility of consumption growth. This implies that the long-run risks framework predicts counter-cyclical Sharpe ratios: in bad times (high volatility of consumption growth) Sharpe ratios are high, and in good times (low volatility of consumption growth) conditional Sharpe ratios are low, consistent with the habit formation model of Campbell and Cochrane (1999).

Moreover, Eqs. (3.9), (3.10) and (3.11) characterize the expected return, volatility and Sharpe ratio of a buy-and-hold strategy over K time units. These equations define the term structure of risk premia, volatility, and Sharpe ratios of the market portfolio.

[45] Time aggregation is an important mechanism for parameter and state inference. Bansal, Kiku, and Yaron (2012b) explicitly consider time aggregation of variables. They find that time aggregation can affect parameter values, and they provide evidence that ignoring time aggregation leads to false rejection of the LRR model. Earlier papers that account for time aggregation in estimation in an asset pricing context include Hansen and Sargent (1983) and Heaton (1995).
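Eq. (3.11) is straightforward to evaluate once the affine coefficients are known. The sketch below is a direct transcription of the formula; the coefficients E_{0,K}, E_{1,K}, D_{0,K} and D_{1,K} must come from the model solution (Appendix B.3) and are left as inputs rather than computed here.

```python
import numpy as np

def sharpe_ratio_K(sig2_c, E0K, E1K, D0K, D1K):
    """Conditional K-period Sharpe ratio of Eq. (3.11), evaluated at a
    (scalar or array) consumption-growth variance sig2_c; the affine
    coefficients are model-solution inputs."""
    numer = E0K + 0.5 * D0K + (E1K + 0.5 * D1K) * sig2_c
    return numer / np.sqrt(D0K + D1K * sig2_c)
```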
Moreover, by evaluating Eqs. (3.9) and (3.10) at the unconditional value of the volatility of consumption growth, σ², we obtain expressions for the unconditional moments of cumulative returns. Similar expressions can be obtained for the cumulative return moments of the risk-free instrument and the market portfolio. Details about the derivations are described in Appendix B.3.

3.4 Sharpe Ratios Simulated from Structural Models

In this section, I conduct a simulation study in the spirit of Beeler and Campbell (2012). The objective is to simulate equity returns from the LRR model at a monthly frequency, and then time aggregate them to obtain annual estimates of returns, volatilities and Sharpe ratios. The simulation exercise proceeds as follows.

First, I generate four sets of independent standard normal random variables and use them to construct monthly series for consumption, dividends and state variables using the state-space model in Eq. (3.4).[46] Next, I construct annual consumption and dividend growth by adding twelve monthly consumption and dividend levels, respectively, and then taking the growth rate of the sum. The log market returns and risk-free rates are sums of monthly values, while the log price-dividend ratios use prices measured in the last period of the year. As the price-dividend ratio in the data is divided by the previous year's dividends, the price-dividend ratio in the model is multiplied by the dividend in that month and divided by the dividends over the previous year.

As in BY, BKY and Beeler and Campbell (2012), negative realizations of the conditional variance are censored and replaced with a small positive number.[47] I also retain sample paths along which the volatility process goes negative and is censored.[48] Since the volatility is highly persistent, negative values of the conditional variance are quite likely; indeed, Beeler and Campbell (2012) report that under the BKY calibration less than 1% of the volatility simulations are negative for a sample of 100,000 simulations. Each simulation is initialized from the steady-state values and run for a "burn-in" period of ten years.

[46] The frequency is consistent with the parameters calibrated by BY and BKY, which are provided in monthly terms.
[47] The number is 10^{−14} and is consistent with the simulation exercise of Beeler and Campbell (2012).
[48] An alternative approach is to replace negative realizations with their absolute values, as in Johnson and Lee (2012).
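The time-aggregation step can be summarized in a few lines. The sketch below, with hypothetical helper names, builds annual consumption growth as the growth rate of the sum of twelve monthly consumption levels and annual log returns as sums of monthly log returns, as described above.

```python
import numpy as np

def annualize(dc_monthly, r_monthly):
    """Aggregate monthly simulated series to annual values: annual log
    consumption growth is the growth rate of the sum of twelve monthly
    consumption levels; annual log returns are sums of monthly log returns."""
    T = len(dc_monthly) // 12
    c_level = np.exp(np.cumsum(dc_monthly[: 12 * T]))     # monthly consumption levels
    c_year = c_level.reshape(T, 12).sum(axis=1)           # sum of twelve monthly levels
    dc_annual = np.diff(np.log(c_year))                   # growth rate of the sum (drops year 1)
    r_annual = r_monthly[: 12 * T].reshape(T, 12).sum(axis=1)  # sums of monthly log returns
    return dc_annual, r_annual
```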
3.4.1 Predictive Regressions

The conditional moments of market returns, as well as the Sharpe ratio, are unobservable. A common approach applied in the empirical literature to circumvent this issue is to project excess stock return series on a predetermined set of conditioning variables, such as economic or financial indicators observed by the econometrician.

Empirical studies differ in the conditioning information used in the projection of excess returns. Among the most commonly used predictor variables are price-dividend ratios (Campbell, 1991; Fama and French, 1988a; Hodrick, 1992), short-term interest rates (Ang and Bekaert, 2007; Campbell, 1991; Fama and Schwert, 1977; Hodrick, 1992), term spreads and default spreads (Fama and French, 1988a), book-to-market ratios (Lewellen, 1999; Vuolteenaho, 2000), proxies for the consumption-wealth ratio (Lettau and Ludvigson, 2001a,b), and latent factors obtained from large data sets (Ludvigson and Ng, 2007).[49] Expected returns are calculated by regressing realized returns on the set of predictors and taking the fitted values as estimates.

Conditional volatility may also be measured by a projection onto predetermined conditioning variables, taking the fitted value from this projection as a measure of conditional variance or conditional standard deviation. This type of modeling is commonly used; for example, French, Schwert, and Stambaugh (1987) use a time-series model of realized variance to model the conditional variance.

Within the set of techniques that measure conditional volatility by a projection onto predetermined conditioning variables, three approaches are common. One is to take the squared residuals from a regression of excess returns on a predetermined set of conditioning variables and regress them on the same set of conditioning variables, using the fitted values from this regression as a measure of conditional variance.[50] Alternatively, volatility can be estimated using high-frequency return data, commonly referred to as realized volatility. This is an ex-post measure that consists of adding up the squared high-frequency returns over the period of interest. The realized volatility is then projected onto time-t information variables to obtain a consistent estimate of the conditional variance of returns.[51] The third approach estimates the conditional volatility of excess stock market returns by specifying a parametric form for the conditional volatility, such as GARCH-type models or stochastic volatility.[52] The volatility estimates are then obtained from the history of observed returns. In this part of the paper, I focus on the second type of methodology, calculating conditional volatilities of stock returns by projecting the sum of squared monthly returns on a set of predictors.

As for the conditional Sharpe ratio, a standard measure is the ratio of the estimated expected excess return to the estimated volatility, both obtained from separate projections.

[49] Goyal and Welch (2008) and Lettau and Ludvigson (2010) provide comprehensive reviews of predictive variables commonly used in the literature.
[50] Campbell (1987) and Breen, Glosten, and Jagannathan (1989) apply these methods in the predictability literature.
[51] This approach is taken by French, Schwert, and Stambaugh (1987), Schwert (1989), Whitelaw (1994), Ghysels, Santa-Clara, and Valkanov (2006), Ludvigson and Ng (2007), Lettau and Ludvigson (2010) and Tang and Whitelaw (2011).
[52] French, Schwert, and Stambaugh (1987), Bollerslev, Engle, and Wooldridge (1988) and Glosten, Jagannathan, and Runkle (1993) have applied this approach in the predictability literature.
This approach has been taken by Kandel and Stambaugh (1990), Tang and Whitelaw (2011) and Lettau and Ludvigson (2010), among others.

I model the conditional moments of annual returns as follows:

E_t[R_{t+1} − R_{f,t+1}] = X_t β_μ,  (3.12)
Var_t[R_{t+1} − R_{f,t+1}] = X_t β_σ,  (3.13)

where X_t is the set of predictor variables observed at time t and R_{t+1} − R_{f,t+1} is the annual excess return on the market. I assume that the predictor variables available at time t are the price-dividend ratio, the current excess return and the risk-free rate, constructed on an annual basis.

The regression equations that correspond to (3.12) and (3.13) are

R_{t+1} − R_{f,t+1} = X_t β_μ + ε_{μ,t+1},  (3.14)
v_{t+1} = X_t β_σ + ε_{σ,t+1},  (3.15)

where R_t − R_{f,t} is the annual excess return on the market portfolio and v_t is the realized variance for year t. The annual excess return is calculated as the sum of the monthly excess log-returns, while the realized variance is the sum of the squared monthly excess log-returns; both sums are calculated within the same year.

Based on the information available at time t and the parameter estimates from (3.12) and (3.13), the conditional Sharpe ratio is calculated as follows:

SR̂_t = ( X_t β̂_μ + X_t β̂_σ / 2 ) / √( X_t β̂_σ ),  (3.16)

where β̂_μ and β̂_σ denote the OLS estimates of β_μ and β_σ, respectively.[53]

[53] This definition of the Sharpe ratio includes the Jensen's adjustment due to log-returns. However, my results are robust if the Sharpe ratio is defined as the ratio of expected returns to conditional volatility.

Figure 3.1 shows the results for a simulated path of annual returns. Each simulation has 100 annual observations of returns. Panel A shows the time series of expected returns calculated from an OLS regression, Panel B shows the conditional variance estimated from an OLS regression, and Panel C contains the conditional Sharpe ratio estimates using the fitted values for the conditional mean and conditional volatility from panels A and B. Finally, Panel D displays the time series of annual Sharpe ratios implied by the BY model, obtained by evaluating Eq. (3.11) at K = 12.

For this specific simulation, the standard deviation of the Sharpe ratio estimates is 17%, while the standard deviation of the model Sharpe ratio is 3%. Moreover, the correlation coefficient between the Sharpe ratio estimates based on OLS methods and the Sharpe ratio implied by the model is 7.9%.

The use of artificial data from a fully specified economy is important because it allows the economic reasons that drive the variation in Sharpe ratios to be isolated. In the first case, the variation in the Sharpe ratio is driven by the volatility of consumption growth. In the second case, the volatility of the Sharpe ratio is driven by consumption risk and by measurement error caused by the OLS estimation method.
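For concreteness, the two-stage OLS procedure of Eqs. (3.14)-(3.16) can be sketched as follows. The guard against negative fitted variances is my own addition, since a linear projection does not restrict the sign of the fitted values.

```python
import numpy as np

def ols_sharpe(X, excess_ret, realized_var):
    """Two-stage OLS estimate of the conditional Sharpe ratio.
    X holds the lagged predictors (with a constant column), aligned so that
    row t predicts the outcomes dated t+1, as in Eqs. (3.14)-(3.15)."""
    beta_mu, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)    # Eq. (3.14)
    beta_sig, *_ = np.linalg.lstsq(X, realized_var, rcond=None) # Eq. (3.15)
    mu_hat = X @ beta_mu                                        # fitted mean
    var_hat = np.maximum(X @ beta_sig, 1e-12)                   # fitted variance
    return (mu_hat + 0.5 * var_hat) / np.sqrt(var_hat)          # Eq. (3.16)
```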
In order to verify the robustness of my results, I repeat the previous exercise 100,000 times via Monte Carlo simulations, with sample periods of length 100 years. Table 3.2 reports the median moments implied by the simulations of the BY calibration of the LRR model. I look at the empirical first and second moments and at the empirical Sharpe ratios constructed via OLS methods and compare them with the median first and second moments as well as the Sharpe ratios implied by the model.

From Table 3.2, we learn that the levels of expected returns and volatilities implied by the LRR model are well captured by OLS techniques. Indeed, the difference between the expected return estimates and their LRR model counterparts is almost indistinguishable. As for the volatility estimates, OLS techniques do a good job of matching the mean level as well as the standard deviation. However, there are some differences worth noting. The standard deviation of the risk premium calculated with OLS techniques is 3.01%, while the standard deviation implied by the model is 0.87%. That is, the standard deviation estimated via OLS methods is more than three times the true standard deviation. Moreover, the median correlation coefficient between the risk premium and its OLS estimate is 0.52%. A more serious discrepancy is observed in the estimates of the conditional Sharpe ratio. The model implies a median annual Sharpe ratio of 33.33%, while the estimate obtained with projection techniques is 26.45%; the standard deviation of the Sharpe ratio calculated with OLS regressions is 15.82%, while the value implied by the model is 3.53%. The correlation between the true Sharpe ratios implied by the model and their OLS estimates is 0.39%.

We learn from this simulation exercise that the use of fitted moments as proxies for the unobserved conditional mean and volatility of stock returns has some obvious drawbacks. First, the dynamics of the conditional mean and volatility are determined by the joint conditional distribution of the first-stage predictors. Thus, with any model misspecification, the dynamics of the fitted moments need not correspond to the dynamics of the true moments. Even when the predictive models for the conditional mean and volatility are well specified, the effect of errors in variables, which are induced by the first-stage regressions, is not trivial to quantify and has an important effect on the Sharpe ratio volatility estimates. Moreover, OLS methods do not account for time-varying observations or time-varying information sets; therefore, OLS methods are not robust to structural changes. In that sense, an econometric technique that accounts for these deficiencies may be a better approach for estimating the Sharpe ratio and its dynamic behavior. Filtering techniques are able to overcome these issues.

3.4.2 Filtering and Estimation

Let y_{t+1} = r_{m,t+1} − r_{f,t+1} be the continuously compounded monthly excess return. The time-series dynamics of y_{t+1} are represented by

y_{t+1} = μ_t + σ_{y,t} ε_{t+1},  with ε_{t+1} ∼ N(0, 1),  (3.17)

where μ_t and σ_{y,t} represent the expected return and conditional volatility. Under the LRR model, these are given by

μ_t = E_0 + E_1 σ²_t,  (3.18)

and

σ²_{y,t} = D_0 + D_1 σ²_t.  (3.19)

According to Eq. (3.4), the evolution of σ²_t is represented by

σ²_{t+1} = σ² + ν (σ²_t − σ²) + σ_w w_{t+1},  w_{t+1} ∼ i.i.d. N(0, 1),  (3.20)

and the covariance between the observed market excess return, y_{t+1}, and the innovation to the volatility process, w_{t+1}, is given in Eq. (3.8).
In the terminology of state-space models, Eq. (3.17) is the measurement or observation equation and Eq. (3.20) is the transition or state equation. I assume that $\sigma_t^2$ is a latent variable; therefore, both the conditional mean and volatility of market returns are unobservable. I also assume that I am able to observe the full history of realized returns. To draw inferences about the dynamic behavior of $\sigma_t^2$ as well as the conditional distribution of excess returns, we need to solve a filtering problem.

The solution to the filtering problem is the distribution of the latent variable $\sigma_t^2$ conditional on the history of observed returns. From Eqs. (3.9) through (3.11), we learn that expected returns, volatilities and conditional Sharpe ratios can be estimated from this conditional distribution for any holding period. Unfortunately, the filtering problem generated by the LRR model is not standard because of the nonlinearities in the measurement equation as well as the non-zero covariance between the observation and transition equations. As a result, the standard Kalman filter (designed for linear Gaussian state-space models) cannot be used directly in the estimation of the model. I instead rely on nonlinear filtering methods to estimate the distribution of $\sigma_t^2$, the conditional moments of market excess returns and market Sharpe ratios.

Particle Filters

I estimate the latent process $\sigma_t^2$, conditional moments and Sharpe ratios via particle filters. The particle filter is a nonlinear filter that works through Monte Carlo methods: the conditional distribution of the state variables is replaced by an empirical distribution drawn by simulation. This method does not require the explicit computation of Jacobians and Hessians, and it captures the conditional distribution of the state variable up to a prespecified accuracy level that depends on the number of simulations chosen by the researcher. To implement the particle filter, it is necessary to specify the state-space model.[54] A brief description of the particle filter and its implementation is given in Appendix A.6.

[54] Doucet, de Freitas, Gordon, and Smith (2001) and Crisan and Rozovskii (2011) describe in detail the properties of the filter and its practical implementation, and van Binsbergen, Fernandez-Villaverde, Koijen, and Rubio-Ramirez (2012) apply the method to estimate a dynamic stochastic general equilibrium model with a particular focus on the term structure of interest rates.
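As a concrete illustration of the method, a bootstrap particle filter for this model fits in a few lines. This is a simplified sketch rather than the implementation of Appendix A.6: it omits the return-volatility correlation of Eq. (3.8) and uses plain multinomial resampling at every step.

```python
import numpy as np

def particle_filter(y, E0, E1, D0, D1, sig2_bar, nu, sig_w, N=10_000, seed=0):
    """Bootstrap particle filter for the latent state sigma_t^2 in
    Eqs. (3.17)-(3.20); returns filtered means and the log-likelihood."""
    rng = np.random.default_rng(seed)
    part = np.full(N, sig2_bar)                 # particles for sigma_t^2
    filt, loglik = np.empty(len(y)), 0.0
    for t, yt in enumerate(y):
        # Propagate particles through the transition equation (3.20)
        part = sig2_bar + nu * (part - sig2_bar) + sig_w * rng.standard_normal(N)
        mu = E0 + E1 * part                               # Eq. (3.18)
        s2 = np.maximum(D0 + D1 * part, 1e-12)            # Eq. (3.19)
        # Importance weights from the measurement density of Eq. (3.17)
        logw = -0.5 * (np.log(2 * np.pi * s2) + (yt - mu) ** 2 / s2)
        w = np.exp(logw - logw.max())
        loglik += logw.max() + np.log(w.mean())   # likelihood contribution
        w /= w.sum()
        filt[t] = w @ part                        # filtered mean of the state
        part = part[rng.choice(N, N, p=w)]        # multinomial resampling
    return filt, loglik
```

Given the filtered distribution of $\sigma_t^2$, conditional moments and Sharpe ratios follow by pushing the particles through Eqs. (3.9) to (3.11).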
I test the accuracy of the filtered estimates as follows. First, I simulate a path of annual excess returns according to the calibration by BY. Given the simulated excess returns, I numerically construct the conditional distribution of the volatility of consumption growth, $\sigma_t^2$, using Eqs. (3.17) to (3.20) as well as the original calibration by BY. Once the conditional distribution of the volatility of consumption growth is obtained, I estimate risk premia, conditional variances and Sharpe ratios according to Eqs. (3.9) to (3.11). Figure 3.2 shows a sample simulation of the volatility of consumption growth, conditional moments and Sharpe ratios along with their filtered counterparts. In Panel A, I show a path for the volatility of consumption growth; Panels B and C show the simulated expected returns and their volatility; Panel D shows the simulated annual Sharpe ratio with its filtered estimates. For this specific simulation, the correlation coefficient between the simulated volatility of consumption growth and its filtered value is 60%. As for the expected returns, volatilities and Sharpe ratios, the simulated values have correlation coefficients above 64% with their filtered counterparts.

A common concern is that filtering is thought of as a smoothing technique, so that if the state variable to be filtered is very volatile, the filter will dampen that volatility and the unconditional moments of interest may not reflect the true state-variable dynamics. However, filtering techniques are robust enough to provide accurate estimates even if the true state variable is volatile. The reason is that filtered estimates are conditional expectations of the state variables. To evaluate the unconditional moments, it is necessary to account for this fact; thus, I calculate the unconditional mean and variance of the state variables according to the law of iterated expectations.[55]

[55] The unconditional variance estimate comes from the following identity, which relates conditional and unconditional variances: $Var[X] = Var[E[X|Y]] + E[Var[X|Y]]$.

To verify the robustness of my results, the simulation exercise was repeated 1,500 times. For each simulation, I obtain time series of expected returns, volatilities and Sharpe ratios as well as their filtered counterparts. I then calculate the unconditional means, variances and correlations between the simulated and filtered series. The results are reported in Table 3.3. In general, the filters do a good job of capturing the unconditional moments of expected returns, volatilities and Sharpe ratios. Overall, the moments estimated via filtering methods match the true values to at least two decimal places.

We learn from these simulation exercises that filtering techniques are better able than OLS methods to capture the dynamic behavior of the conditional moments and Sharpe ratios. Nonetheless, these filtered estimates rely on a number of assumptions: the state-space model is well specified; realized returns are a noisy measure of expected returns; and the volatility of consumption growth is the only unobservable state variable of the system, with the researcher having full knowledge of its dynamics as well as the functional forms of expected returns and variances. Finally, I assume that the econometrician has full knowledge of the parameter values, so that the only problem she faces is the estimation of conditional moments based on the time series of observed returns. By using OLS methods, we approximate expected returns and variances with a linear projection on a set of exogenous predictors and can potentially face a number of well-known econometric problems, such as omitted variables or misspecification. For clarity of exposition, I collect all parameters that define the state-space model into a single parameter vector $\theta$. Each parameter vector characterizes a model and, hence, conditional distributions and filtered state variables. As a result, an estimation problem needs to be solved, which I explain next.

Estimation

The previous results were obtained by assuming that the set of parameter values is known. This assumption is quite unrealistic, because in practice the researcher is uncertain about the true parameter values. A natural way to address this issue is to estimate the vector of parameters from the observed data. A common technique for nonlinear dynamic models is QML estimation,[56] as described in Chapter 2. Details about its implementation are given in Appendix B.4.

[56] Some examples are Campbell, Sunderam, and Viceira (2012); van Binsbergen and Koijen (2011) and Calvet, Fisher, and Wu (2013).
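To make the estimation step concrete, a minimal sketch is shown below. The chapter uses QML as developed in Chapter 2; as a simple stand-in, this sketch maximizes the simulated log-likelihood produced by the `particle_filter` function sketched above, holding the random seed fixed so that the objective is well behaved in $\theta$.

```python
from scipy.optimize import minimize

def estimate_parameters(y, theta0):
    """Likelihood-based estimation of theta = (E0, E1, D0, D1,
    sig2_bar, nu, sig_w) from observed excess returns y; a stand-in
    for the QML procedure of Appendix B.4."""
    def neg_loglik(theta):
        _, ll = particle_filter(y, *theta, seed=42)  # fixed seed: stable objective
        return -ll
    # Derivative-free search; in practice parameter bounds and
    # transformations (e.g., log-variances) would be added.
    return minimize(neg_loglik, theta0, method="Nelder-Mead")
```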
I conduct a simulation exercise to better identify the effect of parameter estimation within the filtering exercise. First, I simulate a time series of excess returns from the LRR model, and then I estimate the parameter values via QML methods using the state-space representation implied by the LRR model. The parameter estimates are then used in the filtering procedure.

A sample simulation is illustrated in Figure 3.3. Panels 3.3a to 3.3c compare conditional Sharpe ratio estimates with their true values. Panel 3.3a shows the empirical estimate obtained via OLS methods, Panel 3.3b displays Sharpe ratio estimates calculated with filtering methods under the assumption that the true parameter values are known, and Panel 3.3c contains the filtered Sharpe ratios using the parameter estimates obtained via QML methods.

For this specific simulation, the time-series average of the model-implied Sharpe ratio is 33%, while the average Sharpe ratio estimates obtained with filtering techniques are 34% and 36%, where the first is obtained by assuming that the true parameter values are known and the second is obtained with the parameter estimates from observed returns. Finally, the average Sharpe ratio obtained with OLS methods is 25%. An explanation for this difference is the misspecification generated by running OLS regressions for expected returns and volatility on a set of predetermined variables. The Sharpe ratio volatility estimates obtained via filtering methods are 5% and 6%, close to the value computed from the true simulated series, which is 4%. In contrast, OLS methods deliver a Sharpe ratio volatility estimate of 15%. This exercise illustrates the effect of parameter estimation on the volatility of Sharpe ratios: filtering methods deliver Sharpe ratio volatility estimates consistent with the true model-implied values even when parameter values have to be estimated.

3.4.3 Other Models

Recent consumption-based asset pricing models have made substantial progress in explaining many asset pricing puzzles across various markets. Even though such models are not often used to study Sharpe ratios or their volatility, they do make theoretical predictions about their values. In standard asset pricing models, the market Sharpe ratio is constant (Breeden, 1979; Lintner, 1965; Lucas, 1978; Sharpe, 1964) or has negligible variation (Mehra and Prescott, 1985; Weil, 1989). Habit formation preferences can help to capture the counter-cyclicality of the risk premium (Abel, 1990; Campbell and Cochrane, 1999; Constantinides, 1990) as well as other features of macroeconomic outcomes over the business cycle (Jermann, 2010). Bansal and Yaron (2004) combine the preferences of Epstein and Zin (1989) with stochastic volatility of consumption growth and generate time variation in the conditional volatility of the SDF.

Other studies have found different channels for time variation in risk premia, such as differences in risk aversion (Bhamra and Uppal, 2010; Chan and Kogan, 2002; Gomes and Michaelides, 2008); rare disasters (Barro, 2006, 2009; Rietz, 1988; Wachter, 2012); incomplete markets (Constantinides and Duffie, 1996; Gârleanu and Panageas, 2011); participation constraints (Basak and Cuoco, 1998; Chien, Cole, and Lustig, 2012; Guvenen, 2009); investment shocks (Papanikolaou, 2011) and heterogeneity in the frequency of shocks to fundamentals (Calvet and Fisher, 2007).
A brief summary of the aforementioned models and their asset pricing implications is shown in Table 3.4.

The asset pricing implications of the models shown in Table 3.4 provide a general idea of the model-implied variability of Sharpe ratios. Indeed, this variability could be used as a metric to better assess the performance of a model. For example, an asset pricing model with constant Sharpe ratios would fail to explain the observed variation in empirical Sharpe ratios. On the other hand, a model that predicts highly volatile Sharpe ratios may exceed the true variability observed in the data. Therefore, the variance of Sharpe ratios can be used as a metric to better assess theoretical asset pricing models. This metric would be in the spirit of the entropy measure recently proposed by Backus, Chernov, and Zin (2012) and studied in Martin (2012).

As a robustness check of the variability generated by OLS methods to calculate Sharpe ratios, I performed a second simulation exercise based on the external habit formation model of Campbell and Cochrane (1999). A brief description of the model and an overview of the results are presented below.

3.4.4 External Habit Formation Model

In the external habit formation model of Campbell and Cochrane (1999), the consumption dynamics are the same as in the standard Lucas model; that is, consumption growth rates are assumed to be independent and identically distributed. Furthermore, the agent is assumed to have external habit formation preferences. The habit level is assumed to be a slow-moving and heteroscedastic process. The heteroscedasticity of the habit process, governed by the sensitivity function, can be chosen so that the real interest rate in the model is constant or linear in the habit. Further details can be found in Appendix B.5.

I use the same calibrated monthly parameters as those in Campbell and Cochrane (1999) to simulate returns from the model and compute annual expected returns, volatilities and Sharpe ratios using standard OLS techniques. I compare these results with the numerical values implied by the model. Consistent with my previous results, I find that the Sharpe ratios based on standard OLS methods are at least twice as volatile as the model-implied values. The results are plotted in Figure 3.4. Panel A displays the Sharpe ratio estimates based on OLS methods, while Panel B displays the values of the true Sharpe ratios. Clearly, the Sharpe ratio estimates based on OLS methods are more volatile than the values implied by the habit formation model.

3.5 Sharpe Ratios Estimated from Reduced Form Models

The use of data simulated from theoretical models helps to identify the economic forces that drive the time-series variation. An alternative way of analyzing returns is via reduced form models, which are statistical models that do not impose any economic structure. These models aim to describe historical data better. Moreover, they do not rely on arbitrary predictors and are not subject to the effects of errors in variables or misspecification.

In this section, I introduce the nonlinear latent VAR representation proposed in Brandt and Kang (2004), in which the first and second conditional moments are treated as latent variables identified from the history of returns. In this setup, the Sharpe ratio and its dynamics are obtained endogenously as the ratio of the conditional moments of excess returns.
The framework is general and can be extended to a setup that includes flexible correlation structures and exogenous predictors.

3.5.1 Brandt and Kang (2004)

Let $y_t$ be the continuously compounded excess return, with time-series dynamics represented by

$y_t = \mu_{t-1} + \sigma_{t-1}\epsilon_t$ with $\epsilon_t \sim N(0,1)$, (3.21)

where $\mu_{t-1}$ and $\sigma_{t-1}$ represent the conditional mean and the conditional volatility of the excess returns. In addition, it is assumed that the conditional mean and volatility are unobservable and that they follow a first-order VAR process in logs:

$\begin{pmatrix} \ln\mu_t \\ \ln\sigma_t \end{pmatrix} = d + A \begin{pmatrix} \ln\mu_{t-1} \\ \ln\sigma_{t-1} \end{pmatrix} + \eta_t$ with $\eta_t = \begin{pmatrix} \eta_{1t} \\ \eta_{2t} \end{pmatrix} \sim N(0, \Sigma)$, (3.22)

where

$d = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}$, $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ and $\Sigma = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$ with $b_{12} = b_{21} = \rho\sqrt{b_{11} b_{22}}$. (3.23)

Following Hamilton (1994), if the VAR is stationary, the unconditional moments of the mean and volatility are given by

$E\begin{pmatrix} \ln\mu_t \\ \ln\sigma_t \end{pmatrix} = (I - A)^{-1} d$ (3.24)

and

$\mathrm{vec}\left(\mathrm{cov}\begin{pmatrix} \ln\mu_t \\ \ln\sigma_t \end{pmatrix}\right) = (I - (A \otimes A))^{-1} \mathrm{vec}(\Sigma)$, (3.25)

where $\otimes$ represents the Kronecker product.
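Eqs. (3.24) and (3.25) translate directly into code. The sketch below computes the unconditional moments of any stationary bivariate VAR(1); the function name is my own.

```python
import numpy as np

def var1_unconditional_moments(d, A, Sigma):
    """Unconditional mean and covariance of s_t = d + A s_{t-1} + eta_t,
    eta_t ~ N(0, Sigma), per Eqs. (3.24) and (3.25)."""
    n = A.shape[0]
    mean = np.linalg.solve(np.eye(n) - A, d)                  # Eq. (3.24)
    # vec(cov) = (I - A kron A)^{-1} vec(Sigma), column-major vec
    vec_cov = np.linalg.solve(np.eye(n * n) - np.kron(A, A),
                              Sigma.reshape(-1, order="F"))   # Eq. (3.25)
    return mean, vec_cov.reshape(n, n, order="F")
```

Stationarity (all eigenvalues of $A$ inside the unit circle) guarantees that both linear systems are invertible.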
The key elements of the return dynamics presented in Eq. (3.22) are the transition matrix $A$ and the correlation coefficient $\rho$. The diagonal elements of $A$ capture the persistence of the conditional moments, and the off-diagonal elements reflect the intertemporal feedback between the conditional volatility and the conditional mean. The correlation coefficient $\rho$ captures the contemporaneous correlation between the innovations to the conditional moments. This parameter is of considerable importance since it captures the risk-return trade-off.[57]

[57] Most asset pricing models predict a positive relationship between the market's risk premium and conditional volatility (Merton, 1973). However, the empirical evidence on the sign of the risk-return relation is inconclusive. Some studies find a positive relation (Ghysels, Santa-Clara, and Valkanov, 2005; Ludvigson and Ng, 2007; Lundblad, 2007; Pastor, Sinha, and Swaminathan, 2008; Scruggs, 1998), but others find a negative relation (Brandt and Kang, 2004; Campbell, 1987; Glosten, Jagannathan, and Runkle, 1993; Harvey, 2001; Lettau and Ludvigson, 2010). Theoretical studies have also shown that the intertemporal mean-variance relationship need not be positive or negative (Ang and Liu, 2007; Whitelaw, 2000).

The model in Eq. (3.22) generalizes the permanent and temporary components model of Fama and French (1988b) and the standard stochastic volatility model. The equation for the conditional mean is

$\ln\mu_t = d_1 + a_{11}\ln\mu_{t-1} + a_{12}\ln\sigma_{t-1} + \eta_{1t}$, where $\eta_{1t} \sim N(0, b_{11})$. (3.26)

If $a_{12} = 0$, the dynamics of the conditional mean are similar to the temporary component in Lamoureux and Zhou (1996). The equation that describes the conditional volatility is

$\ln\sigma_t = d_2 + a_{21}\ln\mu_{t-1} + a_{22}\ln\sigma_{t-1} + \eta_{2t}$, where $\eta_{2t} \sim N(0, b_{22})$, (3.27)

and corresponds to the standard stochastic volatility model; in particular, if $a_{21} = 0$, Eq. (3.27) is the standard stochastic volatility model as in Andersen and Sørensen (1996) and Kim, Shephard, and Chib (1998). Finally, we learn from Eq. (3.25) that the unconditional variance is determined by the variance-covariance matrix $\Sigma$ and the matrix $A$. For identification purposes, I consider four different specifications of the transition matrix $A$. In model A, the conditional mean and volatility evolve as in Eqs. (3.26) and (3.27). Models B and C impose $a_{12} = 0$ and $a_{21} = 0$, respectively, allowing for the model of permanent and temporary components in the first case and the standard stochastic volatility model in the second case. Finally, model D imposes $a_{12} = a_{21} = 0$.

An interesting property of this representation is the nonnegativity of expected returns and volatilities. The nonnegativity of the conditional mean guarantees a positive risk premium, as suggested in Merton (1980), and has been used by Bekaert and Harvey (1995) and Jacquier, Johannes, and Polson (2007), among others. The log-normality specification for the volatility is consistent with Andersen, Bollerslev, Diebold, and Ebens (2001) and Andersen, Bollerslev, Diebold, and Labys (2003), who show that the log-volatility process can be well approximated by a normal distribution.

3.5.2 Implied Sharpe Ratio

The latent VAR implies a conditional Sharpe ratio of the form

$SR_t = \dfrac{\mu_t + \sigma_t^2/2}{\sigma_t}$, (3.28)

where $\mu_t$ and $\sigma_t$ are the conditional mean and volatility of stock returns.[58] It follows that the Sharpe ratio is stochastic if the innovations that affect the numerator and the denominator in Eq. (3.28) are stochastic and do not cancel each other out. Moreover, the Sharpe ratio is time-varying due to the mean reversion of the two conditional moments. The distribution of the Sharpe ratio corresponds to the sum of two correlated log-normal distributions, which is not standard.

[58] The squared term in the numerator comes from a Jensen's adjustment for log-returns.
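Given filtered paths of the two conditional moments, Eq. (3.28) is a one-line computation; the helper below is illustrative.

```python
import numpy as np

def implied_sharpe(mu, sigma):
    """Conditional Sharpe ratio of Eq. (3.28), with the Jensen
    adjustment of footnote 58 for log-returns."""
    mu, sigma = np.asarray(mu), np.asarray(sigma)
    return (mu + sigma ** 2 / 2) / sigma
```

In practice $\mu_t$ and $\sigma_t$ are latent, so their filtered estimates are plugged into this expression.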
3.5.3 The Data

I study quarterly returns on the value-weighted market index portfolio from CRSP. Excess returns are calculated by subtracting the quarterly yield on a three-month T-bill from the corresponding stock return. I use this yield instead of the monthly yield due to the idiosyncratic variation documented in Duffee (1996). The predictors are the CRSP dividend-price ratio ($d-p$), calculated as the log-ratio of CRSP dividends to the price level of the CRSP value-weighted stock index; the relative bill rate ($RREL$), the difference between the three-month Treasury bill and its four-quarter moving average; the term spread ($TRM$), the difference between the ten-year Treasury bond yield and the three-month Treasury bill; the default spread ($DEF$), the difference between the BAA corporate bond rate and the AAA corporate bond rate; and the consumption-wealth ratio proxy ($cay$).[59] $RREL$, $TRM$ and $DEF$ are obtained from the Federal Reserve statistical release. Data on the dividend-price ratio are taken from CRSP, and the time series of $cay$ is taken from Sidney Ludvigson's website.[60] All data are quarterly, from April 1953 to December 2011.

[59] These predictors are commonly used in the predictability literature. See Goyal and Welch (2008) and Lettau and Ludvigson (2010) for details.
[60] I thank Sidney Ludvigson for making the time-series data for cay available. This variable is calculated on a quarterly basis.

3.5.4 Parameter Estimates

The model in Eqs. (3.21) and (3.22) is nonlinear, since the first equation is nonlinear in the state variables. The parameters are estimated via QML methods and are shown in Table 3.5. The first column corresponds to the estimates of model A, the second column shows the estimates for model B, and the third and fourth columns contain the parameter estimates for models C and D, respectively. Given the frequency of returns, expected returns are persistent, since the estimates of $a_{11}$ range from 0.6727 to 0.7204.[61] The conditional volatility is more persistent than the conditional mean for all model specifications.

[61] These values correspond to a monthly persistence of roughly 0.87 to 0.89.

The parameter estimates of models A through D show evidence of a strong and negative risk-return trade-off, measured by the correlation between the innovations to the conditional mean and the volatility of excess returns. The estimates range from -0.1760 to -0.7995 for both the constrained and unconstrained representations, and they are statistically significant. This finding is consistent with the negative risk-return relationship found in Brandt and Kang (2004), Campbell and Hentschel (1992b) and Campbell (1987). A negative correlation between the conditional mean and the volatility of returns amplifies the variability of the Sharpe ratio, whereas a positive correlation makes Sharpe ratios less variable than the mean, or even constant.

The estimates show that there is more variation in the mean than in the conditional volatility, since the conditional variance of the innovations to the conditional mean, $b_{11}$, differs substantially from that of the innovations to the conditional volatility, $b_{22}$. The off-diagonal elements of the transition matrix $A$ are significant. However, the values of $a_{21}$ are similar across models, while the values of $a_{12}$ differ. The differences in the signs of $a_{12}$ and $a_{21}$ are consistent with the results of Whitelaw (1994) and Brandt and Kang (2004), which indicate that the cross-autocorrelations between the conditional mean and volatility offset each other through time.

3.5.5 Expected Returns, Volatilities and Sharpe Ratios

Given the parameter estimates in Table 3.5, I estimate expected returns, volatilities and Sharpe ratios via particle filtering. The left column of Figure 3.5 presents the filtered estimates of quarterly expected returns (first row), volatility (second row) and Sharpe ratios (third row). Each plot also shows the NBER recession dates as vertical bars. It is clear that the conditional mean, volatility and Sharpe ratio are time-varying. The quarterly mean has a standard deviation of less than 1% and varies from 1% in the third quarter of 1974 to 3% in the last quarter of 2003. The quarterly volatility has a standard deviation of 2% and ranges from 7.3% to 11.6%. Expected returns revert more quickly to their unconditional mean than do conditional volatilities, consistent with the estimates of $a_{11}$ and $a_{22}$.

Quarterly Sharpe ratios are displayed in the last row of the first column. The Sharpe ratio rises from the peak to the trough of the recession dates in the sample, consistent with the empirical results documented by Lustig and Verdelhan (2012), Tang and Whitelaw (2011) and Lettau and Ludvigson (2010). This countercyclical variation of the Sharpe ratio is also consistent with habit formation models (Campbell and Cochrane, 1999; Constantinides, 1990). Intuitively, at the peak of the business cycle, consumers enjoy consumption levels far above their habits. As a result, a low Sharpe ratio, or low reward per unit of risk, is required for a consumer to invest in the stock index at the peak of the cycle. In contrast, at the trough of a cycle consumption levels are closer to the habit, which makes consumers more risk averse in relative terms.
For an investor willing to invest at the trough of the cycle, the reward per unit of risk, or Sharpe ratio, must therefore be substantially higher.

OLS estimates

For comparison purposes, I calculate expected returns, volatilities and Sharpe ratios based on OLS techniques. Table 3.6 presents the estimates from OLS regressions of quarterly realized excess returns and excess log-returns from the first quarter of 1953 to the last quarter of 2011. The results are generally consistent with those reported in the predictability literature. There is no substantial difference between the regression estimates obtained using excess returns and excess log-returns. At a one-quarter horizon, $cay$ and $RREL$ show consistent predictive power for excess returns. Indeed, $cay$ alone explains 3% of next quarter's total variability. Adding the lagged value of excess returns, $cay$, $d-p$, $RREL$ and $TRM$ together explain 8% of the variation in the next quarter's excess return. The R-squared of 8% for log-returns is lower than the values reported in previous studies, but the sample, which includes the 2007-2008 financial crisis, may account for this result. The results for the volatility equation are presented in Table 3.7. In this specification, the lagged volatility, $d-p$, $TRM$ and $DEF$ are significant. The positive serial correlation in realized volatility reflects the autoregressive conditional heteroskedasticity of quarterly returns. The lagged value of volatility alone explains 37% of the next quarter's excess return volatility. Lagged volatility, $cay$, $d-p$, $RREL$ and $TRM$ together explain 41%. Finally, the high R-squared value of 43% in the full volatility equation reflects that realized volatility is much more predictable than excess returns.

Empirical moments of expected returns, volatilities and Sharpe ratios are displayed in Table 3.8. The first set of estimates is calculated from OLS regressions of quarterly realized log-returns on the CRSP value-weighted index on lagged explanatory variables. The second set of estimates is based on the reduced form model of Brandt and Kang (2004), in which the conditional mean and volatility of stock returns are treated as latent variables; this representation guarantees positive values for expected returns and volatilities. As in the simulation exercises described in Section 3.4, I find differences worth noting among the estimates. First, expected returns and volatilities calculated via OLS have a quarterly standard deviation of 2%, while the standard deviation of the filtered estimates is 1%. Filtered volatilities are higher, on average, than those obtained with OLS methods, and more autocorrelated. The autocorrelation of expected returns obtained with OLS methods is 81%, in contrast with the filtered series, for which it is less than 59%. This is not surprising, since the regressors used in the OLS estimation are highly persistent. The autocorrelation of the filtered estimates is consistent with the estimated value of $a_{11}$.

As for the Sharpe ratio estimates, there are major differences worth noting. First, the average quarterly Sharpe ratio estimated via filtering is 26% while the OLS estimate is 30%. For the standard deviation estimates, the difference is quite substantial: the standard deviation of the OLS estimates is 42%, similar to the 45% reported by Lettau and Ludvigson (2010), while the standard deviation of the filtered Sharpe ratio is 5%. An explanation of this difference is the use of standard OLS techniques in estimation.
Reduced form representations do not rely on predetermined conditioning variables to estimate conditional moments; the state variables are identified from the history of returns. Standard OLS methods, in contrast, generate fitted moments from a set of predictive regressions as proxies for the unobservable conditional mean and volatility. The fitted moments depend on the joint distribution of the predictors; therefore, any model misspecification generates fitted moments that do not correspond to the true dynamics of the conditional mean and volatility and, as a result, distorts the Sharpe ratio dynamics. Another important issue is that the ratio of the fitted moments does not adjust for the correlation between expected returns and the volatility of stock returns, whereas filtering techniques do.

Alternative Reduced Forms

For comparison purposes, I also analyze an unconstrained version of the representation of Brandt and Kang (2004). The excess returns have time-series dynamics of the form

$y_t = \mu_{t-1} + \sigma_{t-1}\epsilon_t$ with $\epsilon_t \sim N(0,1)$, (3.29)

where $\mu_{t-1}$ and $\sigma_{t-1}$ represent the conditional mean and the conditional volatility of the excess returns. In addition, it is assumed that the conditional mean and the log-volatility are unobservable and that they follow a first-order VAR process of the form

$\begin{pmatrix} \mu_t \\ \ln\sigma_t \end{pmatrix} = d + A\begin{pmatrix} \mu_{t-1} \\ \ln\sigma_{t-1} \end{pmatrix} + \eta_t$ with $\eta_t \sim N(0,\Sigma)$, (3.30)

where $d$, $A$ and $\Sigma$ are defined as in Eq. (3.23). The main difference between the model representation of Brandt and Kang (2004) and Eqs. (3.29) and (3.30) is that expected returns can potentially be negative, as in Lamoureux and Zhou (1996). As in the previous model, I consider four specifications of the matrix $A$. The covariance matrix $\Sigma$ has the same structure as in Eq. (3.23). The correlation between the conditional mean and the volatility of excess returns has the same sign as the correlation between the conditional mean and the log-volatility.[62]

[62] From Stein's lemma, the conditional covariance between the conditional mean and the conditional volatility is $\mathrm{cov}_{t-1}(\mu_t, \sigma_t) = E_t[\sigma_t] \cdot \mathrm{cov}_{t-1}(\mu_t, \ln\sigma_t)$. Thus, the sign of the correlation coefficient between the conditional mean and the volatility of stock returns is the same as that of the conditional correlation between the conditional mean and the log-volatility of returns.

QML estimates of the model with an unconstrained risk premium are shown in Table 3.9. Under all model specifications, the parameter estimates are similar to those of the first model. An important difference is that the estimates of the off-diagonal elements $a_{12}$ and $a_{21}$ are negative, although $a_{12}$ is not statistically significant. The right column of Figure 3.5 displays the filtered estimates of conditional moments and Sharpe ratios for the model with an unconstrained risk premium. The main difference between the constrained and unconstrained representations is that expected returns can take negative values; indeed, the expected return estimates were negative for six quarters of the whole sample. Qualitatively, both latent VAR models show similar dynamic behavior; in fact, the correlation coefficient between the implied Sharpe ratio estimates is 70%.

Exogenous Predictors

The main advantage of the latent VAR approach of Brandt and Kang (2004) is that it allows the study of the dynamics of the conditional mean, volatility and Sharpe ratios without relying on exogenous predictors. At the same time, potentially useful information is discarded, since any correlation structure between predictors and conditional moments is ignored. As a robustness check, I estimate an extended version of the model in which each moment is a function of the same exogenous predictors used in the predictive regressions ($cay$, $d-p$, $RREL$ and $TRM$).
The model specification is given by

$y_t = \mu_{t-1} + \sigma_{t-1}\epsilon_t$ with $\epsilon_t \sim N(0,1)$, (3.31)

where

$\begin{pmatrix} \ln\mu_t \\ \ln\sigma_t \end{pmatrix} = d + A\begin{pmatrix} \ln\mu_{t-1} \\ \ln\sigma_{t-1} \end{pmatrix} + C x_{t-1} + \eta_t$, with $\eta_t \sim N(0,\Sigma)$, (3.32)

and $x_t$ denotes the de-meaned vector of predictors observed at date $t$.
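The only change relative to the base model is the predictor term in the transition equation. A hypothetical one-step transition, as it would appear inside the filter, is sketched below; the function name and argument layout are my own.

```python
import numpy as np

def extended_var_step(S_prev, x_prev, d, A, C, Sigma, rng):
    """One draw from the transition equation (3.32): the latent
    log-moments follow a VAR(1) shifted by the de-meaned predictors."""
    eta = rng.multivariate_normal(np.zeros(len(d)), Sigma)  # correlated shocks
    return d + A @ S_prev + C @ x_prev + eta
```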
Table 3.10 reports the parameter estimates of the extended model D and, for comparison, replicates the results of model D. The estimates of $A$ and $\Sigma$ are similar across the two models. When I add the exogenous predictors, all the parameter estimates of the base model decrease in magnitude, which means that the exogenous predictors help explain some of the variation in the moments that was left unexplained. Finally, the correlation between the innovations to the mean and volatility decreases in magnitude from -0.7995 to -0.4523, with both estimates significant.

In the mean equation of the extended model, the coefficients on $cay$, $d-p$ and $TRM$ ($c_{11}$, $c_{12}$ and $c_{14}$) are positive and the coefficients on $RREL$ and $DEF$ ($c_{13}$ and $c_{15}$) are negative. In the volatility equation, all coefficients are negative except the one on $DEF$. The signs of the coefficients are all consistent with the results of the predictive regressions. However, these results are not directly comparable to standard predictive regressions, since the coefficients correspond to regressions with the conditional moments as dependent variables.

Comparison

Empirical moments of the different Sharpe ratio estimates are displayed in Table 3.11. The first, second and third sets of Sharpe ratio estimates are based on the latent VAR approach of Brandt and Kang (2004). The first representation is based on Eqs. (3.21) and (3.22); the second guarantees a positive volatility only and is based on Eqs. (3.29) and (3.30); the third is an extended version of the first model in which the conditional moments are functions of exogenous predictors, as in Eqs. (3.31) and (3.32). Finally, the last set of Sharpe ratio estimates is based on the conditional moments calculated from OLS regressions of log-returns on lagged explanatory variables.

The results in Table 3.11 show that the average quarterly Sharpe ratios under the first two models are 25% and 26%, respectively. The third model implies a quarterly Sharpe ratio of 31%, while the estimates obtained from OLS methods have a quarterly Sharpe ratio of 30%. The difference is caused by the set of exogenous predictors included in the estimation procedure. The first set of results represents the Sharpe ratio estimates based on the set of observed returns, while the third and fourth correspond to Sharpe ratio estimates using the history of returns and the set of exogenous predictors. The parameter estimates used in the filtering calculations depend on the data used in the estimation process: in the first two models, the parameter and filtered estimates depend on the time series of excess returns, while the last two models depend on the same series of returns as well as the set of exogenous predictors.

As for the Sharpe ratio volatility implied by the models, there are some differences worth noting. The first two models imply a volatility of 5% and 10%, respectively. The difference is due to the model representation: the first model imposes a positive risk premium and the second does not, and since the second model allows for negative Sharpe ratios, there is more variability. For the third representation, the variability is 25%, which is mainly driven by the inclusion of the set of exogenous predictors that affect the conditional mean and volatility of excess returns. None of these representations delivers a Sharpe ratio volatility of 42%, as OLS methods do. The main driver of this difference is the use of conditioning information within the estimation process. In the first two cases, the model representations together with the history of returns determine the variability of the Sharpe ratio. In contrast, the set of exogenous predictors included in the estimation of the third and fourth models produces higher variability in the Sharpe ratio estimates.

3.6 Implications for Portfolio Choice

In this section, I discuss a standard model from the portfolio-choice literature and its relation to the market Sharpe ratio.

3.6.1 Portfolio Optimization: One Risky Asset

I consider an investor with mean-variance preferences who optimizes the trade-off between the mean and the variance of portfolio returns. Two assets are available to the investor at time $t$. One is risk free, with return $R_{f,t+1}$ from time $t$ to time $t+1$; the other is risky. The risky asset has simple return $R_{t+1}$ from time $t$ to time $t+1$ with conditional mean $E_t[R_{t+1}]$ and conditional variance $\sigma_t^2$. The investor allocates a share $\alpha_t$ of her portfolio to the risky asset. The portfolio return is then

$R_{p,t+1} = R_{f,t+1} + \alpha_t (R_{t+1} - R_{f,t+1})$.

The mean portfolio return is $E_t[R_{p,t+1}] = R_{f,t+1} + \alpha_t (E_t[R_{t+1}] - R_{f,t+1})$, while the variance of the portfolio is $\sigma_{pt}^2 = \alpha_t^2 \sigma_t^2$. An investor with mean-variance preferences trades off mean against variance in a linear fashion; that is, she maximizes a linear combination of mean and variance with a positive weight on the mean and a negative weight on the variance:

$\max_{\alpha_t} \left( E_t[R_{p,t+1}] - \dfrac{\gamma}{2}\sigma_{pt}^2 \right)$.

The solution to this optimization problem is

$\alpha_t = \dfrac{E_t[R_{t+1}] - R_{f,t+1}}{\gamma \sigma_t^2}$. (3.33)

The optimal weight on the stock index coincides with the myopic demand and can be interpreted as the product of the relative risk tolerance[63] and the market Sharpe ratio normalized by the volatility of market returns; that is,

$\alpha_t = \dfrac{SR_t}{\gamma \sigma_t}$. (3.34)

[63] This term is the inverse of the relative risk aversion.

We learn from Eq. (3.34) that for investors with mean-variance preferences, the optimal allocation to the market portfolio is determined by three elements: the Sharpe ratio of the market portfolio, the conditional volatility of the market portfolio and the risk aversion parameter. Moreover, the variability of portfolio weights is determined by the variability of Sharpe ratios and the standard deviation of the market portfolio.

Campbell and Viceira (2002) derive a similar expression by assuming an investor with power utility and a lognormal portfolio return, with the slight difference that the optimal weight in Eq. (3.33) is adjusted by half the variance of the risky asset; that is,

$\alpha_t = \dfrac{E_t[r_{t+1}] - r_{f,t+1} + \sigma_t^2/2}{\gamma \sigma_t^2}$. (3.35)

I implement the model following the standard plug-in approach; that is, I solve the optimization problem assuming that the mean and variance of returns are known and then replace the moments with their estimates obtained via regression or filtering techniques. For simplicity, I assume that the investor ignores estimation risk when making an investment decision.
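The plug-in weight of Eq. (3.35) is a direct computation; the sketch below takes conditional moment estimates produced by either method.

```python
import numpy as np

def optimal_weight(mu, sigma, gamma=5.0):
    """Mean-variance weight on the risky asset, Eq. (3.35), given
    estimates of the conditional mean and volatility of excess
    log-returns and risk aversion gamma."""
    mu, sigma = np.asarray(mu), np.asarray(sigma)
    return (mu + sigma ** 2 / 2) / (gamma * sigma ** 2)
```

This is the formula behind the weights plotted in Figure 3.6.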
Figure 3.6 shows the optimal allocations from Eq. (3.35) using OLS and filtering methods to estimate conditional moments, assuming a risk aversion parameter $\gamma = 5$. Clearly, the portfolio weights constructed via OLS methods are more volatile than those obtained with the filtered moments. Indeed, the average portfolio weight under the OLS specification is 1.27 with a standard deviation of 2.13, in contrast to the portfolio weight computed with filtering methods, which is on average 56% with a standard deviation of 12%. Finally, the correlation between the two weights is 15%. These results have practical implications for portfolio allocation, especially for an investor who faces proportional costs when trading the optimal mean-variance portfolio.[64] As the optimal weight is proportional to the market Sharpe ratio, the percentage of wealth traded in each period depends on the volatility of the market Sharpe ratio. It is clear that upward-biased estimates of the Sharpe ratio volatility would imply excessive portfolio rebalancing, and therefore higher transaction costs.

[64] This fact was noted by De Miguel, Garlappi, and Uppal (2009) for performance evaluation.

3.7 Concluding Remarks

In this chapter I examine whether estimates of the variability of the Sharpe ratio may be biased due to limitations of the empirical methodology used in estimation. I provide evidence that measurement error in estimated Sharpe ratios helps to explain the Sharpe ratio volatility puzzle, and I show that this measurement error is caused by the use of standard OLS methods to estimate the ratio. The empirical question I address is important because many studies have used the results implied by OLS methods to calibrate the volatility of the market Sharpe ratio. Based on simulated data from standard asset pricing models, I document that OLS methods produce Sharpe ratio volatility that is larger than the ratio's true variability. The OLS approach may also deliver conditional moment estimates that do not correspond to their true values.

Having documented the upward bias in the Sharpe ratio variability generated by OLS methods, I consider whether improved empirical methodologies may better reflect the ratio's true variability. To accomplish this goal, I propose filtering methods as a way to better assess this variation. These techniques explicitly allow for the estimation of time-varying moments by aggregating the entire history of realized returns in a parsimonious way. Moreover, filtering is flexible enough to be used with large information sets without relying on exogenous predictors, while being robust to structural breaks. I also show that filtering techniques better reflect the true variation of Sharpe ratios even when parameter values need to be estimated.

Motivated by the simulation results, I use real data on excess stock returns to compare the Sharpe ratio volatility estimates produced by OLS and filtering methods. I find that filtering methods deliver Sharpe ratio variability estimates that are much smaller than those implied by OLS methods. The difference in results between the two methodologies arises from the use of conditioning information within the filtering estimation process.

My findings have significant implications for asset pricing.
For example, in a portfolio allocation setting, the optimal portfolio weight is proportional to the market Sharpe ratio. Thus, upward-biased estimates of the Sharpe ratio volatility would imply excessive portfolio rebalancing, and therefore higher transaction costs.

3.8 Figures and Tables

[Figure 3.1 panels: (a) Expected Returns (OLS); (b) Conditional Volatility (OLS); (c) Conditional Sharpe Ratios (OLS); (d) Conditional Sharpe Ratios (BY).]

Figure 3.1. Comparison of OLS Estimates versus Simulated Values: Long-Run Risks Model. This figure shows the results of a simulated path of annual returns using the calibration by Bansal and Yaron (2004). Each simulation has 100 annual observations of returns. Fitted values for the conditional mean and variance were constructed using predictor variables. Panel A shows a random path of annual returns with the fitted OLS values. Panel B shows the realized variance constructed with realized returns along with its OLS fitted values in dotted lines. Panel C contains the conditional Sharpe ratio estimates based on the OLS fitted values of the conditional mean and conditional volatility; Panel D contains the Sharpe ratios implied by the model.

[Figure 3.2 panels: (a) Consumption Growth; (b) Risk Premia; (c) Conditional Volatility; (d) Conditional Sharpe Ratio.]

Figure 3.2. Comparison of Simulations versus Filtered Values: Long-Run Risks Model. This figure shows the results of a simulated path of the volatility of consumption growth using the calibration by Bansal and Yaron (2004). Each simulation has 100 annual return observations. Panel A shows a random path of the volatility of consumption growth; the dotted line represents the filtered values of $\sigma_t^2$. Panel B shows the simulated risk premia along with their filtered values in dotted lines. Panel C contains the simulated standard deviation of the risk premia as well as its filtered values. Panel D contains the simulated conditional Sharpe ratio along with its filtered values. The dashed lines are assumed to be unobservable to the econometrician, while the continuous lines are the filtered values.

[Figure 3.3 panels: (a) OLS Fitted Values; (b) Filtered Values (known parameter values); (c) Filtered Values (unknown parameter values).]

Figure 3.3. Comparison of OLS versus Filtered Estimates. This figure shows the results of a simulated path of the volatility of consumption growth using the calibration by Bansal and Yaron (2004).
Each simulation has 100 annual return observations. Panel A contains the conditional Sharpe ratio estimates based on the OLS fitted values. Panel B contains the filtered Sharpe ratio estimates implied by the long-run risks model; the dotted lines represent the annual Sharpe ratios implied by the model, which are assumed to be unobservable to the econometrician. Panel C contains the filtered Sharpe ratio estimates implied by the long-run risks model based on parameter estimates obtained via QML; again, the dotted lines represent the model-implied annual Sharpe ratios. The simulations were performed with the calibrated parameter values from Bansal and Yaron (2004).

[Figure 3.4 panels: (a) OLS; (b) External Habit Formation.]

Figure 3.4. Comparison of OLS Estimates versus Simulated Values: Habit Formation Model. This figure shows the results of a simulated path of the volatility of consumption growth using the calibration by Campbell and Cochrane (1999). Each simulation has 100 annual return observations. Panel A contains the conditional Sharpe ratio estimates based on the OLS fitted values; Panel B contains the filtered Sharpe ratio estimates implied by the external habit formation model. The simulations were performed with the calibrated parameter values from Campbell and Cochrane (1999).

[Figure 3.5: six panels of quarterly filtered estimates; rows: expected return, volatility, Sharpe ratio; left column: constrained model; right column: unconstrained model.]

Figure 3.5. Expected Returns, Volatility and Sharpe Ratio Estimates. This figure shows the conditional mean, volatility and Sharpe ratio estimates. The plots show the quarterly estimates of the conditional mean, $\mu_t$, conditional volatility, $\sigma_t$, and Sharpe ratio, $SR_t$, obtained via filtering techniques. The left column corresponds to the model with a positive risk premium and the right column contains the filtered estimates of the model with an unconstrained risk premium. The vertical bars represent the NBER recession dates.

[Figure 3.6: time series of optimal portfolio weights.]

Figure 3.6. Portfolio Weights. This figure shows the portfolio weight estimates based on the conditional mean, volatility and Sharpe ratio. The figure shows the time series of optimal weights, $w_t = (\mu_t + \sigma_t^2/2)/(\gamma\sigma_t^2)$, where $\gamma$ represents the risk aversion parameter, and $\mu_t$ and $\sigma_t$ are the quarterly estimates of the conditional mean and conditional volatility, respectively. The figure shows the optimal weights based on OLS techniques (blue) and on the model with nonlinear latent variables, assuming a positive risk premium (red) and $\gamma = 5$. The vertical bars represent the NBER recession dates.

Table 3.1. Long-Run Risks Parameters

Endowment Process Parameters      Symbol        BY Calibration
Mean Consumption Growth           $\mu_c$       0.0015
LRR Persistence                   $\rho$        0.979
LRR Volatility Multiple           $\phi_e$      0.044
Mean Dividend Growth              $\mu_d$       0.0015
Dividend Leverage                 $\phi$        3
Dividend Volatility Multiple      $\varphi$     4.5
Dividend Consumption Exposure     $\pi$         0
Baseline Volatility               $\sigma$      0.0078
Volatility of Volatility          $\sigma_w$    0.0000023
Persistence of Volatility         $\nu$         0.987

Preference Parameters             Symbol        BY Calibration
Risk Aversion                     $\gamma$      10
EIS                               $\psi$        1.5
Time Discount Factor              $\delta$      0.998
Endowment process:

$\Delta c_{t+1} = \mu_c + x_t + \sigma_t \eta_{t+1}$
$x_{t+1} = \rho x_t + \phi_e \sigma_t e_{t+1}$
$\sigma_{t+1}^2 = \sigma^2 + \nu(\sigma_t^2 - \sigma^2) + \sigma_w w_{t+1}$
$\Delta d_{t+1} = \mu_d + \phi x_t + \varphi \sigma_t u_{t+1} + \pi \sigma_t \eta_{t+1}$
$w_{t+1}, e_{t+1}, u_{t+1}, \eta_{t+1} \sim$ i.i.d. $N(0,1)$.

Parameter values. This table displays the model parameters for Bansal and Yaron (2004) (BY). The endowment process is described above. All parameters are given in monthly terms. The standard deviation of the long-run innovations is equal to the volatility of consumption growth times the long-run volatility multiple, and the standard deviation of dividend growth innovations is equal to the volatility of consumption growth times the volatility multiple for dividend growth. Dividend consumption exposure is the magnitude of the impact of the one-period consumption shock on dividend growth. Dividend leverage is the exposure of dividend growth to long-run risks.

Table 3.2. Long-Run Risks Moment Comparison: OLS

Moment                      OLS Regressions    Model
Expected Returns            0.0417             0.0417
  Standard Deviation        0.0301             0.0087
  Correlation               0.0052
Volatility                  0.1653             0.1641
  Standard Deviation        0.0092             0.0167
  Correlation               0.0434
Conditional Sharpe Ratio    0.2645             0.3333
  Standard Deviation        0.1582             0.0353
  Correlation               0.0039

Simulation results. This table displays moments calculated for the Bansal and Yaron (2004) model from annual data sets. Columns 1 and 2 display the results using years as the time interval. Each moment displayed is the median from 100,000 finite-sample simulations of length 100 years. The returns on equity and the risk-free rate are aggregated to a yearly level by adding the log-returns within the year.

Table 3.3. Long-Run Risks Moment Comparison: Filtering

Moment                      Filtering    Model
Expected Returns            0.0418       0.0417
  Standard Deviation        0.0080       0.0087
  Correlation               0.5721
Volatility                  0.1645       0.1650
  Standard Deviation        0.0168       0.0167
  Correlation               0.5651
Conditional Sharpe Ratio    0.3341       0.3333
  Standard Deviation        0.0322       0.0353
  Correlation               0.5694

Simulation results. This table displays moments calculated for the Bansal and Yaron (2004) model, using years as the time interval. Each moment displayed is the median from 1,500 finite-sample simulations of length 100 years. The returns on equity and the risk-free rate are aggregated to a yearly level by adding the log-returns within the year.

Table 3.4. Asset Pricing Models

[Check-mark matrix relating each model to its preference specification (CRRA, recursive, habit) and to whether it generates a time-varying equity premium, time-varying volatility and time-varying Sharpe ratios. Models covered: Lucas (1978), Breeden (1979), Mehra and Prescott (1985), Rietz (1988), Weil (1989), Constantinides (1990), Abel (1990), Campbell and Cochrane (1999), Chan and Kogan (2002), Menzly, Santos, and Veronesi (2004), Bansal and Yaron (2004), Barro (2006), Calvet and Fisher (2007), Barro (2009), Jermann (2010), Papanikolaou (2011), Wachter (2012), Chien, Cole, and Lustig (2012).]

Asset pricing models. This table compares features of asset pricing models that have been used to price the aggregate stock market. The comparison is divided into two panels: the first focuses on the features of the model (preferences, endowment and technology), while the second focuses on the pricing implications of the various models.
Table 3.5. Quasi-Maximum Likelihood Parameter Estimates

Positive Risk Premium

               Model A              Model B              Model C              Model D
Parameter      Estimate   S.E.      Estimate   S.E.      Estimate   S.E.      Estimate   S.E.
$a_{11}$       0.6727     0.0066    0.7029     0.0041    0.7204     0.0834    0.7079     0.1211
$a_{21}$       -0.0894    0.0279    -0.1521    0.0184    -          -         -          -
$a_{12}$       0.3215     0.0011    -          -         -0.4948    0.0938    -          -
$a_{22}$       0.8310     0.0025    0.7400     0.0114    0.9182     0.1798    0.8730     0.1142
$b_{11}$       0.2897     0.0063    0.1350     0.0194    0.0944     0.5390    0.1924     0.1674
$b_{22}$       0.0020     0.0070    0.0001     0.0054    0.0055     0.1111    0.0072     0.8430
$\rho$         -0.3073    0.0009    -0.1760    0.0004    -0.7989    0.2773    -0.7995    0.0029
$\bar{\mu}$    0.0131     0.0000    0.0131     0.0159    0.0131     0.0675    0.0131     0.1065
$\bar{\sigma}$ 0.0857     0.0000    0.0857     0.0005    0.0857     0.0166    0.0857     0.5172
$L$            245.37               245.32               244.93               244.69

Estimation results. This table presents the QML estimates of models of the form

$y_t = \mu(S_{t-1}) + \sigma(S_{t-1})\epsilon_t$,

and

$S_t = A S_{t-1} + \eta_t$ with $\eta_t \sim N(0,\Sigma)$,

where

$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, $\Sigma = \begin{pmatrix} b_{11} & \rho\sqrt{b_{11}b_{22}} \\ \rho\sqrt{b_{11}b_{22}} & b_{22} \end{pmatrix}$,

$\mu(S_t) = \bar{\mu}\exp(S_{1t})$ and $\sigma(S_t) = \bar{\sigma}\exp(S_{2t})$. The estimates are for quarterly returns on the value-weighted CRSP index in excess of the three-month Treasury bill from the second quarter of 1953 to the fourth quarter of 2011. Standard errors are reported in the column next to each parameter estimate.

Table 3.6. Regressions on Quarterly Data

Panel A: Excess Returns, 1953:2-2011:4 (t-statistics in parentheses)

No.  Constant   Lag      cay      d-p      RREL     TRM      DEF      R²
1    0.01       0.07                                                   0.01
     (2.76)     (1.18)
2    0.02                0.79                                          0.03
     (2.92)              (2.57)
3    0.01       0.08     0.80                                          0.03
     (2.81)     (1.28)   (2.71)
4    0.14                0.76     0.02                                 0.04
     (1.84)              (2.30)   (1.61)
5    0.13       0.08     0.67     0.02     -1.46                       0.07
     (1.86)     (1.31)   (2.16)   (1.64)   (-2.64)
6    0.16       0.06     0.61     0.03     -1.29    0.84               0.08
     (2.12)     (0.99)   (1.99)   (1.96)   (-2.20)  (1.59)
7    0.16       0.06     0.61     0.03     -1.30    0.84     -0.07     0.08
     (1.87)     (0.99)   (1.99)   (1.83)   (-2.30)  (1.59)   (-0.05)

Panel B: Log Excess Returns, 1953:2-2011:4

No.  Constant   Lag      cay      d-p      RREL     TRM      DEF      R²
1    0.01       0.08                                                   0.01
     (2.14)     (1.31)
2    0.01                0.82                                          0.03
     (2.34)              (2.65)
3    0.01       0.09     0.83                                          0.03
     (2.18)     (1.41)   (2.81)
4    0.15                0.78     0.03                                 0.04
     (1.96)              (2.35)   (1.77)
5    0.15       0.09     0.70     0.03     -1.37                       0.07
     (2.00)     (1.43)   (2.23)   (1.81)   (-2.38)
6    0.17       0.07     0.65     0.03     -1.21    0.79               0.08
     (2.25)     (1.14)   (2.07)   (2.11)   (-1.99)  (1.49)
7    0.18       0.07     0.63     0.03     -1.26    0.83     -0.38     0.08
     (2.07)     (1.14)   (2.01)   (2.04)   (-2.17)  (1.56)   (-0.28)

OLS estimation results. This table reports estimates from OLS regressions of quarterly realized returns and log-returns for the CRSP VW index on lagged explanatory variables for the second quarter of 1953 to the fourth quarter of 2011. The conditioning variables are the lagged dependent variable (Lag); the consumption-wealth-income ratio ($cay$); the log dividend-price ratio ($d-p$); the relative bill rate ($RREL$); the term spread, the difference between the ten-year Treasury bond yield and the three-month Treasury bond yield ($TRM$); and the Baa-Aaa default spread ($DEF$). The t-statistics were constructed with heteroscedasticity-consistent standard errors.
Table 3.7. Regressions on Quarterly Data (continued)

Panel C: Realized Volatility of Excess Returns, 1953:2-2011:4 (t-statistics in parentheses)

No.  Constant   Lag      cay      d-p      RREL     TRM      DEF      R²
1    0.03       0.61                                                   0.37
     (5.16)     (7.39)
2    0.07                -0.32                                         0.02
     (16.76)             (-2.02)
3    0.03       0.60     -0.23                                         0.38
     (5.35)     (7.54)   (-2.76)
4    -0.08               -0.28    -0.03                                0.12
     (-2.16)             (-1.69)  (-3.90)
5    -0.05      0.54     -0.23    -0.02    -0.27                       0.41
     (-2.74)    (6.84)   (-2.58)  (-3.97)  (-1.00)
6    -0.06      0.54     -0.22    -0.02    -0.28    -0.07              0.41
     (-3.02)    (6.84)   (-2.48)  (-4.39)  (-1.06)  (-0.51)
7    -0.10      0.46     -0.16    -0.02    -0.18    -0.19    1.34      0.43
     (-3.64)    (5.33)   (-1.71)  (-4.57)  (-0.75)  (-1.28)  (2.83)

Panel D: Realized Volatility of Log Excess Returns, 1953:2-2011:4

No.  Constant   Lag      cay      d-p      RREL     TRM      DEF      R²
1    0.03       0.61                                                   0.36
     (5.15)     (7.38)
2    0.07                -0.32                                         0.02
     (16.7)              (-2.01)
3    0.03       0.60     -0.23                                         0.37
     (5.34)     (7.53)   (-2.75)
4    -0.08               -0.28    -0.03                                0.12
     (-2.18)             (-1.68)  (-3.92)
5    -0.05      0.54     -0.23    -0.02    -0.27                       0.41
     (-2.76)    (6.83)   (-2.57)  (-3.99)  (-0.99)
6    -0.06      0.54     -0.22    -0.02    -0.28    -0.07              0.41
     (-3.04)    (6.83)   (-2.46)  (-4.41)  (-1.05)  (-0.48)
7    -0.10      0.46     -0.16    -0.02    -0.17    -0.19    1.34      0.42
     (-3.65)    (5.32)   (-1.71)  (-4.58)  (-0.73)  (-1.24)  (2.81)

OLS estimation results. This table reports estimates from OLS regressions of the quarterly realized volatility of returns and log-returns for the CRSP VW index on lagged explanatory variables for the second quarter of 1953 to the fourth quarter of 2011. The conditioning variables are lagged realized volatility (Lag); the consumption-wealth-income ratio ($cay$); the log dividend-price ratio ($d-p$); the relative bill rate ($RREL$); the term spread, the difference between the ten-year Treasury bond yield and the three-month Treasury bond yield ($TRM$); and the Baa-Aaa default spread ($DEF$). The t-statistics were constructed with heteroscedasticity-consistent standard errors.

Table 3.8. Summary Statistics of Expected Returns, Volatilities and Sharpe Ratio Estimates

                          Mean    Std. Dev.   Min.     Max.    A.C.(1)
OLS Methods
  $\mu_t$                 0.01    0.02        -0.04    0.07    0.80
  $\sigma_t$              0.07    0.02        0.02     0.22    0.79
  $SR_t$                  0.30    0.42        -0.61    1.84    0.81
Brandt and Kang (2004)
  $\mu_t$                 0.02    0.01        0.01     0.03    0.59
  $\sigma_t$              0.09    0.01        0.07     0.12    0.85
  $SR_t$                  0.25    0.05        0.14     0.41    0.61

Moment comparison. This table reports descriptive statistics of the estimates of expected returns, volatilities and Sharpe ratios. The first set of conditional moments is estimated from OLS regressions of quarterly realized log-returns for the CRSP VW index on lagged explanatory variables for the first quarter of 1953 to the last quarter of 2011. The second set of estimates is based on the reduced form model of Brandt and Kang (2004), in which the conditional mean and volatility of stock returns are treated as latent variables.

Table 3.9. Quasi-Maximum Likelihood Parameter Estimates

Unconstrained Risk Premium

               Model A              Model B              Model C              Model D
Parameter      Estimate   S.E.      Estimate   S.E.      Estimate   S.E.      Estimate   S.E.
$a_{11}$       0.5276     0.0498    0.5532     0.0002    0.5090     0.0440    0.5282     0.0026
$a_{21}$       -0.4967    0.0206    -0.4154    0.0001    -          -         -          -
$a_{12}$       -0.0165    0.1388    -          -         -0.0247    0.3278    -          -
$a_{22}$       0.8426     0.1521    0.8551     0.0000    0.8859     0.9465    0.8221     0.0068
$b_{11}$       0.0002     0.0014    0.0004     0.0012    0.0001     0.0029    0.0004     0.0020
$b_{22}$       0.0088     0.0091    0.0048     0.0013    0.0091     0.1097    0.0132     0.1982
$\rho$         -0.7994    0.0409    -0.7491    0.0002    -0.7999    0.2009    -0.7678    0.0012
$\bar{\mu}$    0.0131     0.0259    0.0131     0.0300    0.0131     0.0013    0.0131     0.1253
$\bar{\sigma}$ 0.0857     0.0045    0.0857     0.0036    0.0857     0.0001    0.0857     0.0106
$L$            246.12               245.72               246.01               245.52

Estimation results. This table presents the QML estimates of models of the form

$y_t = \mu(S_{t-1}) + \sigma(S_{t-1})\epsilon_t$,

and

$S_t = A S_{t-1} + \eta_t$ with $\eta_t \sim N(0,\Sigma)$,

where

$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, $\Sigma = \begin{pmatrix} b_{11} & \rho\sqrt{b_{11}b_{22}} \\ \rho\sqrt{b_{11}b_{22}} & b_{22} \end{pmatrix}$,

$\mu(S_t) = \bar{\mu} + S_{1t}$ and $\sigma(S_t) = \bar{\sigma}$
The estimates are for quarterly returns on the value-weighted CRSP index in excess of the three-month Treasury bill from the second quarter of 1953 to the fourth quarter of 2011. Standard errors are reported in the columns next to the parameter estimates.

Table 3.10. Quasi-Maximum Likelihood Parameter Estimates: Model with Predictors

Parameters   Model D                 Extended Model D
             Estimate   Std. Error   Estimate   Std. Error
a11           0.7079     0.1211       0.5135     0.3421
a21             -          -            -          -
a12             -          -            -          -
a22           0.8730     0.1142       0.7649     0.1381
b11           0.1924     0.1674       0.0049     0.5434
b22           0.0072     0.8430       0.0006     0.0690
ρ            -0.7995     0.0029      -0.4523     0.0882
μ             0.0131     0.1065       0.0131     0.0021
σ             0.0857     0.5172       0.0857     0.0009
c11             -          -          7.7812     2.6462
c12             -          -          1.0911     0.8892
c13             -          -        -38.8899     1.5632
c14             -          -          0.4021     0.7437
c15             -          -        -39.1056     0.1624
c21             -          -         -1.3562     1.7051
c22             -          -         -0.1460     0.0644
c23             -          -         -0.1245     5.6614
c24             -          -         -5.4407     1.4016
c25             -          -         10.7989     0.4681
L           244.69                  263.46

Estimation results. This table presents the QML estimates of the model of the form

    y_t = \mu(S_{t-1}) + \sigma(S_{t-1})\,\varepsilon_t,

and

    S_t = C x_t + A S_{t-1} + \eta_t, with \eta_t \sim N(0, \Omega),

where

    C = \begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{14} & c_{15} \\ c_{21} & c_{22} & c_{23} & c_{24} & c_{25} \end{bmatrix}, \quad
    A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
    \Omega = \begin{bmatrix} b_{11} & \rho\sqrt{b_{11}b_{22}} \\ \rho\sqrt{b_{11}b_{22}} & b_{22} \end{bmatrix},

with \mu(S_t) = \mu\exp(S_{1t}) and \sigma(S_t) = \sigma\exp(S_{2t}). The vector of conditioning variables x_t contains the de-meaned consumption-wealth-income ratio (cay); the log dividend-price ratio (d-p); the relative bill rate (RREL); the term spread, defined as the difference between the ten-year and three-month Treasury yields (TRM); and the Baa-Aaa default spread (DEF). Heteroscedasticity-consistent standard errors are reported. The estimates are for quarterly returns on the value-weighted CRSP index in excess of the three-month Treasury bill from the second quarter of 1953 to the fourth quarter of 2011.

Table 3.11. Summary Statistics of Sharpe Ratio Estimates

                     Mean    Std. Dev.    Min.    Max.    A.C.(1)
BK                   0.25      0.05       0.14    0.41     0.61
BK (Unconstrained)   0.26      0.10      -0.05    0.49     0.71
BK (Extended)        0.31      0.25       0.07    1.58     0.88
OLS Methods          0.30      0.42      -0.61    1.84     0.81

Moment comparison. This table reports descriptive statistics of the estimates of Sharpe ratios based on quarterly realized log-returns for the CRSP VW index from the first quarter of 1953 to the last quarter of 2011. The first, second and third sets of Sharpe ratio estimates are based on the reduced-form model of Brandt and Kang (2004) (BK), in which the conditional mean and volatility of stock returns are treated as latent variables. The first representation guarantees positive values for the conditional mean and volatility, while the second guarantees a positive volatility only. The third is an extended version in which the conditional moments are positive functions of exogenous predictors. Finally, the last set of Sharpe ratio estimates is based on the conditional moments estimated from OLS regressions of log-returns on lagged explanatory variables.

Chapter 4
Conclusion

In this thesis, I investigate filtering methods and their applications in asset pricing. Chapter 2 extends the nonlinear filtering literature by proposing a new filtering method based on efficient Taylor series approximations. The filter can be used both to estimate latent variables and to conduct parameter inference. I find that filtering methods based on Taylor approximations generate state estimates that are as accurate as those obtained with Monte Carlo filters, while being computationally more efficient.

The filter can be applied throughout finance and economics, where stochastic volatility has become the standard paradigm.
I test the filter in three settings: the standard stochastic volatility model (Ghysels, Harvey, and Renault, 1996), a model of risk and return (Brandt and Kang, 2004), and a dynamic stochastic general equilibrium model (Flury and Shephard, 2011). In all these applications, I find that the filter generates accurate state estimates at least five times faster than standard particle filters. I also show how these filters, combined with perturbation methods, can be applied to estimate dynamic stochastic general equilibrium models. Finally, through a set of robustness checks, I find that the Taylor series filter remains accurate in highly nonlinear and high-dimensional state-space models.

Filtering methods arise naturally in finance, where investors and managers make important decisions based on noisy information. These findings are significant: a more efficient use of filtering methods in finance can help agents make more informed decisions and identify underlying risks in the economy.

Chapter 3 investigates the dynamic behavior of the market Sharpe ratio. I examine estimates of Sharpe ratio volatility and question whether they are biased due to limitations of the empirical methodology used to construct them. I show that measurement error in estimated Sharpe ratios explains the high time-series variation documented in the empirical literature (Lettau and Ludvigson, 2010). I also show that filtering methods are a better approach for assessing the Sharpe ratio's time variation.

My findings have significant implications for portfolio allocation, especially for investors who face proportional costs when trading the mean-variance optimal portfolio. Since the optimal weight is proportional to the market Sharpe ratio, the fraction of wealth traded in each period depends on the volatility of the market Sharpe ratio. Upward-biased estimates of Sharpe ratio volatility would therefore imply excessive portfolio rebalancing and, in turn, higher transaction costs.

4.1 Limitations

Both chapters have room for improvement. In the first essay, we learned that although nonlinear filters with Taylor approximations provide accurate results for a number of problems, some care must be taken in their modeling and implementation. On the modeling side, a significant amount of work is required to select observation equations that generate accurate state and parameter estimates. Moreover, the filter may diverge when the functions that define the observation or transition equation of the state-space model are not differentiable, or when the Taylor series approximations are not uniformly convergent.

Likewise, in the essay on the volatility of the market Sharpe ratio, filtering is only one technique for obtaining an accurate estimate of that volatility. Other techniques could also provide accurate measures of the market Sharpe ratio's variability, such as mixed data sampling (MIDAS) regression models (Ghysels, Santa-Clara, and Valkanov, 2005, 2006) and related econometric methods based on data sampled at different frequencies.

Lastly, the CRSP index is a proxy for a market portfolio that is unobservable. The measurement error induced by this proxy introduces additional variation that I am unable to explicitly account for in the current setup.
By identifying this source of variation, we would be able to determine to what extent the Sharpe ratio's variability is generated by the estimation methodology, by the measurement error in the market portfolio proxy, or by genuine sources of risk.

4.2 Future Work

The filtering techniques developed in Chapter 2 can be extended to non-differentiable functions and non-Gaussian errors. This can be done by approximating the nonlinear functions with a basis of orthogonal polynomials (such as the Chebyshev, Hermite, Legendre or Laguerre polynomials) instead of a Taylor series. As a result, more general state-space models can be analyzed within this filtering setup.

In addition to the study of non-differentiable functions, there are a number of extensions that I plan to pursue based on this setup: the optimal selection of observation equations; the addition of extra centers of expansion to improve state and parameter estimation; the use of Bayesian techniques for parameter estimation; and the application of these techniques to Markov switching models.

As for Chapter 3, a number of extensions are possible. The most natural application consists of formally studying the implications of filtering for optimal portfolio allocation and predictability, as in Section 3.6. A large body of empirical work has found evidence of return predictability in equity and other financial markets.65 Such predictability has important implications for investors' asset allocation. The empirical evidence suggests that long-term investors should take predictability into account in strategic asset allocation (Campbell and Viceira, 2002). In particular, one strand of the literature advocates, in principle, countercyclical market timing of the portfolio equity share, using predictive variables for expected returns. In the standard portfolio allocation setup, rebalancing to a fixed strategic equity share ignores the time variation in the equity risk premium. If such variation is real and persists in a manner similar to that observed in historical asset return data, a fixed strategic equity weight would not be optimal. The rebalancing of the equity share should therefore be modified to incorporate predictability within the asset allocation setup. Filtering techniques are the natural approach for incorporating such predictability.

Another natural extension of Chapter 3 is the analysis of the cross-section of equity Sharpe ratios, which can vary significantly with the characteristics of the firm or the portfolio. We learn from Chapter 3 that the Sharpe ratio of any asset is unobservable, and filtering techniques are the natural econometric tools for inference in this setting. A further extension relates to data-based performance measures for asset pricing models. Backus, Chernov, and Zin (2012) introduce entropy as a measure for capturing the dispersion and time-series dependence of a model-implied pricing kernel. This measure is linearly related to the volatility of the market Sharpe ratio. In this extension I intend to quantify empirically the entropy of asset pricing models via filtering methods and to relate it to model-implied measures, such as the market Sharpe ratio and its volatility.

Even decades after the discovery of filtering techniques, there is still ample room for theoretical and empirical applications.
Given that, in most cases, investors and econometricians have only partial information, I posit that filtering techniques are a powerful tool for the analysis of dynamic models.

65 More recently, the evidence on predictability has grown but remains inconclusive (Boudoukh, Richardson, and Whitelaw, 2008; Campbell and Shiller, 1988; Fama and French, 1989; Goyal and Welch, 2008; Hodrick, 1992; Keim and Stambaugh, 1986; Lewellen, 2004; Stambaugh, 1999; Valkanov, 2003).

Bibliography

Abel, A. B., 1990. Asset Prices under Habit Formation and Catching up with the Joneses. The American Economic Review 80(2), pp. 38–42.
An, S., F. Schorfheide, 2007. Bayesian Analysis of DSGE Models. Econometric Reviews 26(2-4), 113–172.
Andersen, T. G., T. Bollerslev, F. X. Diebold, H. Ebens, 2001. The distribution of realized stock return volatility. Journal of Financial Economics 61(1), 43–76.
Andersen, T. G., T. Bollerslev, F. X. Diebold, P. Labys, 2003. Modeling and Forecasting Realized Volatility. Econometrica 71(2), 579.
Andersen, T. G., B. E. Sørensen, 1996. GMM Estimation of a Stochastic Volatility Model: A Monte Carlo Study. Journal of Business and Economic Statistics 14(3), pp. 328–352.
Anderson, B., J. B. Moore, 1979. Optimal Filtering, vol. 1.
Ang, A., G. Bekaert, 2007. Stock Return Predictability: Is it There?. The Review of Financial Studies 20(3), pp. 651–707.
Ang, A., J. Liu, 2007. Risk, return, and dividends. Journal of Financial Economics 85(1), 1–38.
Ang, A., M. Piazzesi, 2003. A no-arbitrage vector autoregression of term structure dynamics with macroeconomic and latent variables. Journal of Monetary Economics 50(4), 745–787.
Asai, M., M. McAleer, J. Yu, 2006. Multivariate Stochastic Volatility: a Review. Econometric Reviews 25(2-3), 145–175.
Back, K., 2010. Asset Pricing and Portfolio Choice Theory. Oxford University Press.
Backus, D., M. Chernov, S. E. Zin, 2012. Sources of Entropy in Representative Agent Models. Working Paper, New York University.
Bansal, R., D. Kiku, A. Yaron, 2012a. An Empirical Evaluation of the Long-Run Risks Model for Asset Prices. Critical Finance Review 1, 183–221.
Bansal, R., D. Kiku, A. Yaron, 2012b. Risks For the Long Run: Estimation with Time Aggregation. Working Paper, Duke University.
Bansal, R., A. Yaron, 2004. Risks for the Long Run: A Potential Resolution of Asset Pricing Puzzles. The Journal of Finance 59(4), 1481–1509.
Barillas, F., L. P. Hansen, T. J. Sargent, 2009. Doubts or variability?. Journal of Economic Theory 144(6), 2388–2418.
Barro, R. J., 2006. Rare Disasters and Asset Markets in the Twentieth Century. The Quarterly Journal of Economics 121(3), pp. 823–866.
Barro, R. J., 2009. Rare Disasters, Asset Prices, and Welfare Costs. The American Economic Review 99(1), pp. 243–264.
Basak, S., D. Cuoco, 1998. An Equilibrium Model with Restricted Stock Market Participation. The Review of Financial Studies 11(2), pp. 309–341.
Beeler, J., J. Y. Campbell, 2012. The Long-Run Risks Model and Aggregate Asset Prices: An Empirical Assessment. Critical Finance Review 1, 141–182.
Bekaert, G., C. R. Harvey, 1995. Time-Varying World Market Integration. The Journal of Finance 50(2), pp. 403–444.
Bhamra, H. S., R. Uppal, 2010. Asset Prices with Heterogeneity in Preferences and Beliefs. Working Paper, Edhec Business School and University of British Columbia.
Bidder, R., M. Smith, 2011. Doubts and Variability. Working Paper, New York University.
Bloom, N., M. Floetotto, N. Jaimovich, I. Saporta-Eksten, S. Terry, 2012. Really Uncertain Business Cycles. Working Paper, Stanford University.
Boguth, O., M. Carlson, A. Fisher, M. Simutin, 2011. Conditional risk and performance evaluation: Volatility timing, overconditioning, and new estimates of momentum alphas. Journal of Financial Economics 102(2), 363–389.
Bollerslev, T., R. Engle, J. Wooldridge, 1988. A capital asset pricing model with time-varying covariances. The Journal of Political Economy pp. 116–131.
Bollerslev, T., J. M. Wooldridge, 1992. Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances. Econometric Reviews 11(2), 143–172.
Boudoukh, J., M. Richardson, R. F. Whitelaw, 2008. The Myth of Long-Horizon Predictability. Review of Financial Studies 21(4), 1577–1605.
Brandt, M. W., Q. Kang, 2004. On the relationship between the conditional mean and volatility of stock returns: A latent VAR approach. Journal of Financial Economics 72(2), 217–257.
Breeden, D. T., 1979. An intertemporal asset pricing model with stochastic consumption and investment opportunities. Journal of Financial Economics 7(3), 265–296.
Breen, W., L. R. Glosten, R. Jagannathan, 1989. Economic Significance of Predictable Variations in Stock Index Returns. The Journal of Finance 44(5), pp. 1177–1189.
Broto, C., E. Ruiz, 2004. Estimation Methods for Stochastic Volatility Models: a Survey. Journal of Economic Surveys 18(5), 613–649.
Calvet, L., A. J. Fisher, L. Wu, 2013. Staying on Top of the Curve: A Cascade Model of Term Structure Dynamics. Working Paper, University of British Columbia.
Calvet, L. E., A. J. Fisher, 2007. Multifrequency News and Stock Returns. Journal of Financial Economics 86(1), 178–212.
Campbell, J., R. Shiller, 1988. The dividend-price ratio and expectations of future dividends and discount factors. Review of Financial Studies 1(3), 195–228.
Campbell, J., A. Sunderam, L. Viceira, 2011. Inflation Bets or Deflation Hedges? The Changing Risks of Nominal Bonds. Working Paper, Harvard University.
Campbell, J., A. Sunderam, L. Viceira, 2012. Inflation Bets or Deflation Hedges? The Changing Risks of Nominal Bonds. Working Paper, Harvard University.
Campbell, J., L. Viceira, 2002. Strategic Asset Allocation: Portfolio Choice for Long-Term Investors. Oxford University Press, USA.
Campbell, J. Y., 1987. Stock returns and the term structure. Journal of Financial Economics 18(2), 373–399.
Campbell, J. Y., 1991. A Variance Decomposition for Stock Returns. The Economic Journal 101(405), pp. 157–179.
Campbell, J. Y., J. H. Cochrane, 1999. By Force of Habit: A Consumption Based Explanation of Aggregate Stock Market Behavior. Journal of Political Economy 107(2), pp. 205–251.
Campbell, J. Y., L. Hentschel, 1992a. No news is good news: An asymmetric model of changing volatility in stock returns. Journal of Financial Economics 31(3), 281–318.
Campbell, J. Y., L. Hentschel, 1992b. No news is good news: An asymmetric model of changing volatility in stock returns. Journal of Financial Economics 31(3), 281–318.
Casella, G., C. P. Robert, 1996. Rao-Blackwellisation of Sampling Schemes. Biometrika 83(1), 81–94.
Casella, G., C. P. Robert, 2004. Monte Carlo Statistical Methods, vol. 319. Citeseer.
Chan, Y. L., L. Kogan, 2002. Catching Up with the Joneses: Heterogeneous Preferences and the Dynamics of Asset Prices. Journal of Political Economy 110(6), 1255–1285.
Chib, S., F. Nardari, N. Shephard, 2006. Analysis of High Dimensional Multivariate Stochastic Volatility Models. Journal of Econometrics 134(2), 341–371.
Chib, S., Y. Omori, M. Asai, 2009. Multivariate Stochastic Volatility. pp. 365–400.
Chien, Y., H. Cole, H. Lustig, 2012. Is the Volatility of the Market Price of Risk Due to Intermittent Portfolio Rebalancing?. American Economic Review 102(6), 2859–96.
Christoffersen, P., K. Jacobs, L. Karoui, K. Mimouni, 2012. Nonlinear Kalman Filtering in Affine Term Structure Models. Working Paper, University of Toronto.
Clark, T., 2009. Is the Great Moderation over? An Empirical Analysis. Economic Review 4, 5–42.
Conrad, J., G. Kaul, 1988. Time-Variation in Expected Returns. The Journal of Business 61(4), pp. 409–425.
Constantinides, G. M., 1990. Habit Formation: A Resolution of the Equity Premium Puzzle. Journal of Political Economy pp. 519–543.
Constantinides, G. M., D. Duffie, 1996. Asset Pricing with Heterogeneous Consumers. Journal of Political Economy 104(2), pp. 219–240.
Crisan, D., B. Rozovskii, 2011. The Oxford Handbook of Nonlinear Filtering. Oxford University Press.
Croushore, D., T. Stark, 2001. A real-time data set for macroeconomists. Journal of Econometrics 105(1), 111–130.
Dangl, T., M. Halling, 2012. Predictive regressions with time-varying coefficients. Journal of Financial Economics 106(1), 157–181.
De Miguel, V., L. Garlappi, R. Uppal, 2009. Optimal versus Naive Diversification: How Inefficient Is the 1/N Portfolio Strategy?. The Review of Financial Studies 22(5), pp. 1915–1953.
DeJong, D. N., C. Dave, 2011. Structural Macroeconometrics. Princeton University Press.
Doucet, A., N. de Freitas, N. Gordon, A. Smith, 2001. Sequential Monte Carlo Methods in Practice. Springer.
Doucet, A., N. De Freitas, K. Murphy, S. Russell, 2000. Rao-Blackwellised particle filtering for dynamic Bayesian networks. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence pp. 176–183.
Duffee, G. R., 1996. Idiosyncratic Variation of Treasury Bill Yields. The Journal of Finance 51(2), pp. 527–551.
Einicke, G. A., 2012. Smoothing, Filtering and Prediction: Estimating the Past, Present and Future. Rijeka, Croatia: Intech.
Epstein, L. G., S. E. Zin, 1989. Substitution, Risk Aversion, and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework. Econometrica 57(4), pp. 937–969.
Fama, E. F., K. R. French, 1988a. Dividend yields and expected stock returns. Journal of Financial Economics 22(1), 3–25.
Fama, E. F., K. R. French, 1988b. Permanent and Temporary Components of Stock Prices. Journal of Political Economy 96(2), pp. 246–273.
Fama, E. F., K. R. French, 1989. Business conditions and Expected Returns on Stocks and Bonds. Journal of Financial Economics 25(1), 23–49.
Fama, E. F., G. Schwert, 1977. Asset returns and inflation. Journal of Financial Economics 5(2), 115–146.
Fernández-Villaverde, J., J. F. Rubio-Ramírez, 2007. Estimating Macroeconomic Models: A Likelihood Approach. Review of Economic Studies 74(4), 1059–1087.
Fernández-Villaverde, J., J. F. Rubio-Ramírez, M. S. Santos, 2006. Convergence Properties of the Likelihood of Computed Dynamic Models. Econometrica 74(1), 93–119.
Ferson, W. E., S. Sarkissian, T. T. Simin, 2003. Spurious Regressions in Financial Economics?. The Journal of Finance 58(4), pp. 1393–1413.
Flury, T., N. Shephard, 2011. Bayesian Inference based only on Simulated Likelihood: Particle Filter Analysis of Dynamic Economic Models. Econometric Theory 27(Special Issue 05), 933–956.
French, K. R., G. Schwert, R. F. Stambaugh, 1987. Expected stock returns and volatility. Journal of Financial Economics 19(1), 3–29.
Gallant, A., H. White, 1988. A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York.
Garlappi, L., G. Skoulakis, 2010. Solving Consumption and Portfolio Choice Problems: The State Variable Decomposition Method. Review of Financial Studies 23(9), 3346–3400.
Garlappi, L., G. Skoulakis, 2011. Taylor series approximations to expected utility and optimal portfolio choice. Mathematics and Financial Economics 5, 121–156.
Gârleanu, N., S. Panageas, 2011. Young, Old, Conservative and Bold: The Implications of Heterogeneity and Finite Lives for Asset Pricing. Working Paper, University of Chicago.
Ghysels, E., A. C. Harvey, E. Renault, 1996. Stochastic volatility, vol. 14. Elsevier North-Holland.
Ghysels, E., P. Santa-Clara, R. Valkanov, 2005. There is a Risk-Return Trade-Off After All. Journal of Financial Economics 76(3), 509–548.
Ghysels, E., P. Santa-Clara, R. Valkanov, 2006. Predicting Volatility: Getting the Most Out of Return Data Sampled at Different Frequencies. Journal of Econometrics 131, 59–95.
Glosten, L. R., R. Jagannathan, D. E. Runkle, 1993. On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. The Journal of Finance 48(5), pp. 1779–1801.
Gomes, F., A. Michaelides, 2008. Asset Pricing with Limited Risk Sharing and Heterogeneous Agents. Review of Financial Studies 21(1), 415–448.
Gordon, N. J., D. J. Salmond, A. F. Smith, 1993. Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation. IEE Proceedings F (Radar and Signal Processing) 140(2), 107–113.
Goyal, A., I. Welch, 2008. A Comprehensive Look at the Empirical Performance of Equity Premium Prediction. The Review of Financial Studies 21(4), pp. 1455–1508.
Guvenen, F., 2009. A Parsimonious Macroeconomic Model for Asset Pricing. Econometrica 77(6), pp. 1711–1750.
Hamilton, J. D., 1994. Time Series Analysis. Princeton University Press.
Hansen, L. P., R. Jagannathan, 1991. Implications of Security Market Data for Models of Dynamic Economies. Journal of Political Economy 99(2), pp. 225–262.
Hansen, L. P., T. J. Sargent, 1983. Aggregation Over Time and the Inverse Optimal Predictor Problem for Adaptive Expectations in Continuous Time. International Economic Review 24(1), pp. 1–20.
Harrison, J., D. Kreps, 1979. Martingales and arbitrage in multiperiod securities markets. Journal of Economic Theory 20(3), 381–408.
Harvey, A., E. Ruiz, N. Shephard, 1994. Multivariate Stochastic Variance Models. The Review of Economic Studies 61(2), pp. 247–264.
Harvey, A. C., N. Shephard, 1996. Estimation of an Asymmetric Stochastic Volatility Model for Asset Returns. Journal of Business & Economic Statistics 14(4), pp. 429–434.
Harvey, C. R., 2001. The Specification of Conditional Expectations. Journal of Empirical Finance 8(5), 573–637.
Heaton, J., 1995. An Empirical Investigation of Asset Pricing with Temporally Dependent Preference Specifications. Econometrica 63(3), pp. 681–717.
Hodrick, R. J., 1992. Dividend Yields and Expected Stock Returns: Alternative Procedures for Inference and Measurement. The Review of Financial Studies 5(3), pp. 357–386.
Ito, K., K. Xiong, 2000. Gaussian Filters for Nonlinear Filtering Problems. IEEE Transactions on Automatic Control 45(5), 910–927.
Jacquier, E., M. Johannes, N. Polson, 2007. MCMC maximum likelihood for latent state models. Journal of Econometrics 137(2), 615–640.
Jacquier, E., N. G. Polson, P. E. Rossi, 2004. Bayesian analysis of stochastic volatility models with fat-tails and correlated errors. Journal of Econometrics 122(1), 185–212.
Jazwinski, A. H., 1970. Stochastic Processes and Filtering Theory. Academic Press.
Jermann, U. J., 2010. The equity premium implied by production. Journal of Financial Economics 98(2), 279–296.
Jiming, J., 2010. Large Sample Techniques for Statistics. Springer.
Johannes, M., N. Polson, 2009. Particle Filtering. pp. 1015–1029.
Johnson, T., J. Lee, 2012. Systematic Volatility of Unpriced Earnings Shocks. Working Paper, University of Illinois, Urbana-Champaign.
Judd, K. L., 1998. Numerical Methods in Economics. MIT Press, Cambridge, Mass.
Julier, S. J., J. K. Uhlmann, 1997. A New Extension of the Kalman Filter to Nonlinear Systems. Proceedings of AeroSense: 11th International Symposium on Aerospace/Defense Sensing, Simulation and Controls, Orlando, FL.
Julier, S. J., J. K. Uhlmann, 2004. Unscented Filtering and Nonlinear Estimation. Proceedings of the IEEE 92(3), 401–422.
Jungbacker, B., S. J. Koopman, 2006. Monte Carlo Likelihood Estimation for Three Multivariate Stochastic Volatility Models. Econometric Reviews 25(2-3), 385–408.
Justiniano, A., G. E. Primiceri, 2008. The Time-Varying Volatility of Macroeconomic Fluctuations. American Economic Review 98(3), 604–41.
Kalman, R. E., 1960. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering 82(Series D), 35–45.
Kandel, S., R. F. Stambaugh, 1990. Expectations and Volatility of Consumption and Asset Returns. The Review of Financial Studies 3(2).
Keim, D. B., R. F. Stambaugh, 1986. Predicting returns in the stock and bond markets. Journal of Financial Economics 17(2), 357–390.
Kim, C.-J., C. R. Nelson, 1999. State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications, vol. 1. The MIT Press.
Kim, S., N. Shephard, S. Chib, 1998. Stochastic Volatility: Likelihood Inference and Comparison with ARCH Models. The Review of Economic Studies 65(3), 361–393.
Koopman, S. J., G. Sandmann, 1998. Estimation of Stochastic Volatility Models via Monte Carlo Maximum Likelihood. Journal of Econometrics 87(2), 271–301.
Lamoureux, C., G. Zhou, 1996. Temporary components of stock returns: What do the data tell us?. Review of Financial Studies 9(4), 1033–1059.
Lettau, M., S. Ludvigson, 2001a. Consumption, Aggregate Wealth, and Expected Stock Returns. The Journal of Finance 56(3), 815–849.
Lettau, M., S. Ludvigson, 2001b. Resurrecting the (C)CAPM: A Cross-Sectional Test When Risk Premia Are Time Varying. Journal of Political Economy 109(6), pp. 1238–1287.
Lettau, M., S. Ludvigson, 2010. Measuring and Modeling Variation in the Risk-Return Trade-Off. Handbook of Financial Econometrics.
Lettau, M., H. Uhlig, 2002. The Sharpe Ratio and preferences: A Parametric Approach. Macroeconomic Dynamics 6(02), 242–265.
Lettau, M., S. Van Nieuwerburgh, 2008. Reconciling the Return Predictability Evidence. Review of Financial Studies 21(4), 1607–1652.
Lewellen, J., 1999. The time-series relations among expected return, risk, and book-to-market. Journal of Financial Economics 54(1), 5–43.
Lewellen, J., 2004. Predicting returns with financial ratios. Journal of Financial Economics 74(2), 209–235.
Lintner, J., 1965. Security Prices, Risk, and Maximal Gains From Diversification. The Journal of Finance 20(4), pp. 587–615.
Lucas, R., 1978. Asset Prices in an Exchange Economy. Econometrica pp. 1429–1445.
Ludvigson, S., 2012. Advances in Consumption-Based Asset Pricing. Working Paper, New York University.
Ludvigson, S. C., S. Ng, 2007. The Empirical Risk-Return Relation: A Factor Analysis Approach. Journal of Financial Economics 83(1), 171–222.
Lundblad, C., 2007. The Risk-return Trade-off in the long run: 1836-2003. Journal of Financial Economics 85(1), 123–150.
Lustig, H. N., A. Verdelhan, 2012. Business Cycle Variation in the Risk-Return Trade-Off. Journal of Monetary Economics (Forthcoming).
Martin, I., 2012. Simple Variance Swaps. Working Paper, Stanford University.
Maz'ya, V., G. Schmidt, 1996. On Approximate Approximations Using Gaussian Kernels. IMA Journal of Numerical Analysis 16(1), 13–29.
McConnell, M. M., G. Perez-Quiros, 2000. Output Fluctuations in the United States: What Has Changed Since the Early 1980's?. The American Economic Review 90(5).
Mehra, R., E. C. Prescott, 1985. The equity premium: A puzzle. Journal of Monetary Economics 15(2), 145–161.
Melsa, J. L., A. P. Sage, 1971. Estimation Theory with Applications to Communications and Control. McGraw-Hill Book Company.
Menzly, L., T. Santos, P. Veronesi, 2004. Understanding Predictability. Journal of Political Economy 112(1), pp. 1–47.
Merton, R. C., 1969. Lifetime Portfolio Selection under Uncertainty: The Continuous-Time Case. The Review of Economics and Statistics 51(3), pp. 247–257.
Merton, R. C., 1971. Optimum consumption and portfolio rules in a continuous-time model. Journal of Economic Theory 3(4), 373–413.
Merton, R. C., 1973. An Intertemporal Capital Asset Pricing Model. Econometrica 41(5), pp. 867–887.
Merton, R. C., 1980. On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics 8(4), 323–361.
Newey, W., K. West, 1987. A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica 55(3), 703–708.
Newey, W. K., K. D. West, 1994. Automatic Lag Selection in Covariance Matrix Estimation. The Review of Economic Studies 61(4), 631–653.
Nielsen, L. T., M. Vassalou, 2004. Sharpe Ratios and Alphas in Continuous Time. The Journal of Financial and Quantitative Analysis 39(1), pp. 103–114.
Papanikolaou, D., 2011. Investment Shocks and Asset Prices. Journal of Political Economy 119(4), pp. 639–685.
Pastor, L., M. Sinha, B. Swaminathan, 2008. Estimating the Intertemporal Risk-Return Tradeoff Using the Implied Cost of Capital. The Journal of Finance 63(6), pp. 2859–2897.
Pitt, M. K., N. Shephard, 1999. Filtering via Simulation: Auxiliary Particle Filters. Journal of the American Statistical Association 94(446), 590–599.
Pástor, L., R. F. Stambaugh, 2009. Predictive Systems: Living with Imperfect Predictors. Journal of Finance 64(4), 1583–1628.
Rietz, T. A., 1988. The equity risk premium: a solution. Journal of Monetary Economics 22(1), 117–131.
Romero, A., 2012. Filtering via Taylor Series Approximations. Working Paper, University of British Columbia.
Ruiz, E., 1994. Quasi-maximum likelihood estimation of stochastic volatility models. Journal of Econometrics 63(1), 289–306.
Rytchkov, O., 2012. Filtering out Expected Dividends and Expected Returns. Quarterly Journal of Finance (Forthcoming).
Savits, T. H., 2006. Some statistical applications of Faa di Bruno. Journal of Multivariate Analysis 97(10), 2131–2140.
Schmitt-Grohe, S., M. Uribe, 2004. Solving dynamic general equilibrium models using a second-order approximation to the policy function. Journal of Economic Dynamics and Control 28(4), 755–775.
Schwert, G. W., 1989. Why Does Stock Market Volatility Change Over Time?. The Journal of Finance 44(5), pp. 1115–1153.
Scruggs, J. T., 1998. Resolving the Puzzling Intertemporal Relation between the Market Risk Premium and Conditional Market Variance: A Two-Factor Approach. The Journal of Finance 53(2), pp. 575–603.
Sharpe, W. F., 1964. Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk. The Journal of Finance 19(3), pp. 425–442.
Shephard, N., 1996. Statistical Aspects of ARCH and Stochastic Volatility. Monographs on Statistics and Applied Probability 65, 1–68.
Shephard, N. E., 2005. Stochastic Volatility: Selected Readings. Oxford University Press, USA.
Simon, D., 2006. Optimal State Estimation: Kalman, H-Infinity, and Nonlinear Approaches. Wiley-Interscience.
Stambaugh, R. F., 1999. Predictive regressions. Journal of Financial Economics 54(3), 375–421.
Stein, C. M., 1981. Estimation of the Mean of a Multivariate Normal Distribution. The Annals of Statistics 9(6), 1135–1151.
Stock, J., M. Watson, 2002. Has the Business Cycle Changed and Why?. NBER Working Papers.
Tallarini, T. D. J., 2000. Risk-sensitive real business cycles. Journal of Monetary Economics 45(3), 507–532.
Tang, Y., R. F. Whitelaw, 2011. Time-Varying Sharpe Ratios and Market Timing. Quarterly Journal of Finance 1(3), 465–493.
Tanizaki, H., R. Mariano, 1996. Nonlinear filters based on Taylor series expansions. Communications in Statistics - Theory and Methods 25(6), 1261–1282.
Taylor, S., 2008. Modelling Financial Time Series. World Scientific.
Valkanov, R., 2003. Long-horizon regressions: theoretical results and applications. Journal of Financial Economics 68(2), 201–232.
van Binsbergen, J., J. Fernandez-Villaverde, R. Koijen, J. F. Rubio-Ramirez, 2012. The Term Structure of Interest Rates in a DSGE Model with Recursive Preferences. Journal of Monetary Economics (Forthcoming).
van Binsbergen, J., R. Koijen, 2011. Likelihood-based Estimation of Exactly-Solved Present-Value Models. Working Paper, University of Chicago.
van Binsbergen, J. H., R. S. J. Koijen, 2010. Predictive Regressions: A Present-Value Approach. Journal of Finance 65(4), 1439–1471.
Van Der Merwe, R., A. Doucet, N. De Freitas, E. Wan, 2001. The Unscented Particle Filter. Advances in Neural Information Processing Systems pp. 584–590.
Vuolteenaho, T., 2000. Understanding the Aggregate Book-to-Market Ratio. Unpublished paper, Harvard University.
Wachter, J. A., 2005. Solving models with external habit. Finance Research Letters 2(4), 210–226.
Wachter, J. A., 2012. Can time-varying risk of rare disasters explain aggregate stock market volatility?. Journal of Finance, Forthcoming.
Weil, P., 1989. The equity premium puzzle and the risk-free rate puzzle. Journal of Monetary Economics 24(3), 401–421.
White, H., 1982. Maximum Likelihood Estimation of Misspecified Models. Econometrica 50(1), pp. 1–25.
Whitelaw, R., 2000. Stock market risk and return: An equilibrium approach. Review of Financial Studies 13(3), 521–547.
Whitelaw, R. F., 1994. Time Variations and Covariations in the Expectation and Volatility of Stock Market Returns. The Journal of Finance 49(2), pp. 515–541.
Winschel, V., M. Krätzig, 2010. Solving, Estimating, and Selecting Nonlinear Dynamic Models Without the Curse of Dimensionality. Econometrica 78(2), 803–821.

Appendix A
Appendix to Chapter 2

A.1 Efficient Calculation of Derivatives of Composite Functions

The efficient calculation of partial derivatives described in Chapter 2 relies on the Taylor expansion of a function of the form f(x) = h(g(x)), where h : R → R, g : R^N → R, and x = (x_1, x_2, ..., x_N) denotes an N-dimensional vector.66 The generic M-th order Taylor expansion of f centered at a constant point 0_N is defined as

    \hat{f}(x) = \sum_{\{q : |q| \le M\}} \frac{1}{q!} f_q(0_N) \prod_{n=1}^{N} x_n^{q_n},   (A.1)

where q = (q_1, ..., q_N) is a vector of nonnegative integers, |q| = \sum_{n=1}^{N} q_n, q! = \prod_{n=1}^{N} (q_n!), and f_q(0_N) denotes the partial derivative of order q of the function f(x) evaluated at 0_N; i.e.,

    f_q(0_N) = \frac{\partial^{q_1 + \cdots + q_N} f}{\partial x_1^{q_1} \cdots \partial x_N^{q_N}}(0_N).   (A.2)

To compute such derivatives, Savits (2006) relies on the recursive formula of Faà di Bruno (1855, 1857). To present the formula, I introduce some notation. Let N_0 denote the set of nonnegative integers and let q = (q_1, ..., q_N), where q_n ∈ N_0, n = 1, ..., N. We write ℓ ≤ q if ℓ_n ≤ q_n for n = 1, ..., N, and denote

    \binom{q}{\ell} = \frac{q!}{\ell!\,(q-\ell)!}.

Let g_q(x) denote the partial derivative of order q of the function g(x), and h_n(w) denote the n-th derivative of the function h(w) with respect to the one-dimensional variable w. According to the multivariate version of Faà di Bruno's formula, the partial derivative of order q of the composite function f(x) = h(g(x)), i.e., f_q(x), can be expressed as

    f_q(x) = \sum_{n=1}^{|q|} h_n(g(x))\,\beta_{q,n}(x),   (A.3)

where the \beta_{q,n}(x) are homogeneous polynomials of degree n in the partial derivatives g_\ell(x), ℓ ≤ q, of g.

66 For simplicity I consider the case in which f(x) is one-dimensional; however, the formulas extend directly to the multi-dimensional case by applying the one-dimensional results to each component.
To compute the generic derivative of f, it is therefore sufficient to determine the polynomials \beta_{q,n}(x). These can be computed efficiently by relying on the recursive relationship proved in Theorem 3.1 of Savits (2006).

Theorem A.1.1 For q ≥ 0_N, 1 ≤ j ≤ N, and 1 ≤ n ≤ |q| + 1, we have

    \beta_{q+e_j,\,n}(x) = \sum_{\{\ell \in N_0^N :\, 0_N \le \ell \le q;\, |\ell| \ge n-1\}} \binom{q}{\ell} g_{q+e_j-\ell}(x)\,\beta_{\ell,\,n-1}(x),   (A.4)

where e_j is the unit vector with j-th component equal to 1 and we set

    \beta_{\ell,0}(x) = 1 if \ell = 0, and \beta_{\ell,0}(x) = 0 if \ell \ne 0.

Proof. See Savits (2006).
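To make the objects above concrete, the following is a minimal sketch (not part of the original thesis) that uses the SymPy library to compute the mixed partial derivatives f_q(0_N) of a composite function by symbolic differentiation, which can serve as a benchmark for an implementation of the Savits recursion; the choices h(w) = exp(w) and g(x1, x2) = x1 + x1*x2^2 are purely illustrative assumptions.

    import sympy as sp

    # Illustrative composite function f(x) = h(g(x)) with h = exp and
    # g(x1, x2) = x1 + x1*x2**2; both choices are assumptions for the example.
    x1, x2 = sp.symbols('x1 x2')
    g = x1 + x1 * x2**2
    f = sp.exp(g)            # f = h(g(x)) with h(w) = exp(w)

    def f_q(q):
        """Mixed partial derivative f_q evaluated at the expansion point 0_N."""
        d = sp.diff(f, x1, q[0], x2, q[1])
        return d.subs({x1: 0, x2: 0})

    # Second-order Taylor coefficients (1/q!) f_q(0_N) for all |q| <= 2,
    # matching the expansion in Eq. (A.1).
    for q in [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]:
        coeff = f_q(q) / (sp.factorial(q[0]) * sp.factorial(q[1]))
        print(q, coeff)

An implementation of Theorem A.1.1 would replace the symbolic differentiation inside f_q with the recursion over the polynomials \beta_{q,n}, which avoids re-expanding the composite expression at every order.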
A.2 Proofs

Lemma A.2.1 Let Z ∼ N(μ, σ²) and let f be any continuously differentiable function such that f′ exists almost everywhere and E|f′(Z)| < ∞; then

    E[f(Z)(Z - \mu)] = \sigma^2 E[f'(Z)].

The proofs also make use of the geometric series expansion

    \frac{1}{1+x^2} = \frac{1/\gamma}{\frac{1}{\gamma} + \frac{x^2}{\gamma}} = \frac{1/\gamma}{1 + \left(\frac{1}{\gamma} + \frac{x^2}{\gamma} - 1\right)} = \frac{1}{\gamma}\sum_{j=0}^{\infty}\left(\frac{\gamma - 1 - x^2}{\gamma}\right)^j,   (A.7)

and this equality holds as long as

    \left|\frac{\gamma - 1 - x^2}{\gamma}\right| < 1,

or, equivalently, |x| < \sqrt{2\gamma - 1}.

A.3 Standard Kalman Filter

For the linear Gaussian state-space model y_t = H x_t + v_t, x_{t+1} = F x_t + \eta_{t+1}, the prediction step is

    \hat{x}_{t+1|t} = F\hat{x}_{t|t}, \qquad P_{t+1|t} = F P_{t|t} F^\top + Q.

The update rule is given by

    K_{t+1} = P_{t+1|t} H^\top [P^{yy}_{t+1|t}]^{-1},   (A.10)
    P^{yy}_{t+1|t} = H P_{t+1|t} H^\top + R,
    \hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}(y_{t+1} - H\hat{x}_{t+1|t}),
    P_{t+1|t+1} = (I - K_{t+1}H) P_{t+1|t}.

A.4 The Extended Kalman Filter

A well-known approximation to nonlinear filtering is the extended Kalman filter (EKF), which relies on a first-order Taylor expansion of the measurement and transition equations around the predicted value of the state variable, \hat{x}_{t+1|t}. The measurement equation is written as

    y_{t+1} = h(\hat{x}_{t+1|t}) + H_{t+1}(x_{t+1} - \hat{x}_{t+1|t}) + v_{t+1},   (A.11)

where

    H_{t+1} = \frac{\partial h}{\partial x_{t+1}}\Big|_{x_{t+1} = \hat{x}_{t+1|t}}   (A.12)

denotes the Jacobian matrix of the nonlinear function h computed at \hat{x}_{t+1|t}. The transition equation is linearized in the same fashion and is written as

    x_{t+1} = g(\hat{x}_{t|t}) + G_t(x_t - \hat{x}_{t|t}) + \eta_{t+1},   (A.13)

where

    G_t = \frac{\partial g}{\partial x_t}\Big|_{x_t = \hat{x}_{t|t}}.

The covariance matrices P^{xy}_{t+1|t} and P^{yy}_{t+1|t} are then computed as

    P^{xy}_{t+1|t} = P^{xx}_{t+1|t} H_{t+1}^\top,   (A.14)
    P^{yy}_{t+1|t} = H_{t+1} P^{xx}_{t+1|t} H_{t+1}^\top + R,   (A.15)

and

    P^{xx}_{t+1|t} = G_t P^{xx}_{t|t} G_t^\top + Q.

The estimate of the state vector is then updated using the standard Kalman filter recursions.

A.5 The Unscented Kalman Filter

The unscented Kalman filter (UKF hereafter) uses the exact nonlinear functions in the observation and transition equations to approximate the moments of the state variables. Unlike the EKF, the UKF does not rely on linearizations. It approximates the conditional distribution of the state variables using the unscented transformation (Julier and Uhlmann, 1997), a method for computing statistics of nonlinear transformations of random variables. Julier and Uhlmann (2004) prove that this approximation is accurate to third order for Gaussian random variables and to second order for non-Gaussian states. Moreover, the UKF does not require the calculation of Jacobian or Hessian matrices, and its efficiency is comparable to that of the EKF, as noted by van Binsbergen and Koijen (2011) and Christoffersen, Jacobs, Karoui, and Mimouni (2012).

Let x denote a random vector with mean \bar{x} and covariance matrix P^{xx}, and consider a nonlinear transformation y = h(x). The basic idea behind the scaled transformation is to generate a set of points, called sigma points, whose first and second moments match \bar{x} and P^{xx}, and to apply the nonlinear transformation to each sigma point. More precisely, the N-dimensional random vector is approximated by a set of 2N + 1 weighted points given by

    X_0 = \bar{x},   (A.16)
    X_i = \bar{x} + \left(\sqrt{(N+\lambda)P^{xx}}\right)_i, for i = 1, ..., N,   (A.17)
    X_i = \bar{x} - \left(\sqrt{(N+\lambda)P^{xx}}\right)_{i-N}, for i = N+1, ..., 2N,   (A.18)

with weights

    W^m_0 = \frac{\lambda}{N+\lambda}, \qquad
    W^c_0 = \frac{\lambda}{N+\lambda} + (1 - \alpha^2 + \beta), \qquad
    W^m_i = W^c_i = \frac{1}{2(N+\lambda)}, for i = 1, ..., 2N,

where \lambda = \alpha^2(N+\kappa) - N, and where (\sqrt{(N+\lambda)P^{xx}})_i is the i-th column of the matrix square root of (N+\lambda)P^{xx}. Here \alpha is a positive scaling parameter that minimizes higher-order effects and can be chosen to be arbitrarily small, \kappa is a positive parameter that guarantees positive-definiteness of the covariance matrix, and \beta is a nonnegative parameter that can be used to capture higher-order moments of the distribution of the state variable. Julier and Uhlmann (1997) propose \beta = 2 for Gaussian distributions. Once the sigma points are computed, the nonlinear transformation is applied to each of the sigma points defined in (A.16)-(A.18):

    Y_i = h(X_i), for i = 0, ..., 2N.

The UKF relies on the unscented transformation to approximate the covariance matrices P_{t+1|t}, P^{xy}_{t+1|t} and P^{yy}_{t+1|t}. An augmented state vector is defined by including the state and the process and measurement noises. This yields an N_a = 2p + N dimensional vector

    X^a_t = \begin{bmatrix} x_t \\ \eta_t \\ v_t \end{bmatrix},

and the unscented transformation is applied to X^a_t. The process for computing the UKF is summarized as follows:

1. Compute the 2N_a + 1 sigma points of the augmented state-space:

    X^a_{0,t|t} = \hat{x}^a_{t|t},   (A.19)
    X^a_{i,t|t} = \hat{x}^a_{t|t} + \left(\sqrt{(N_a+\lambda)P^a_{t|t}}\right)_i, for i = 1, ..., N_a,
    X^a_{i,t|t} = \hat{x}^a_{t|t} - \left(\sqrt{(N_a+\lambda)P^a_{t|t}}\right)_{i-N_a}, for i = N_a+1, ..., 2N_a.

2. Prediction step:

    X^x_{t+1|t} = g(X^x_{t|t}) + X^\eta_{t|t},
    \hat{x}_{t+1|t} = \sum_{i=1}^{2N_a+1} W^m_i X^x_{i,t+1|t},
    P_{t+1|t} = \sum_{i=1}^{2N_a+1} W^c_i [X^x_{i,t+1|t} - \hat{x}_{t+1|t}][X^x_{i,t+1|t} - \hat{x}_{t+1|t}]^\top,
    Y_{i,t+1|t} = h(X^x_{i,t+1|t}) + X^v_{i,t|t},
    \hat{y}_{t+1|t} = \sum_{i=1}^{2N_a+1} W^m_i Y_{i,t+1|t}.

3. Measurement update:

    P^{xy}_{t+1|t} = \sum_{i=1}^{2N_a+1} W^c_i [X^x_{i,t+1|t} - \hat{x}_{t+1|t}][Y_{i,t+1|t} - \hat{y}_{t+1|t}]^\top,
    P^{yy}_{t+1|t} = \sum_{i=1}^{2N_a+1} W^c_i [Y_{i,t+1|t} - \hat{y}_{t+1|t}][Y_{i,t+1|t} - \hat{y}_{t+1|t}]^\top.

The estimate of the state vector is updated through the standard Kalman filter recursions. The algorithm is initialized by setting the initial value to the unconditional mean and variance of the state vector:

    \hat{x}_{0|0} = E[x_t], \qquad P_{0|0} = var[x_t], \qquad \hat{x}^a_{0|0} = [\hat{x}_{0|0}^\top, 0, 0]^\top,

and

    P^a_{0|0} = \begin{bmatrix} P_{0|0} & 0 & 0 \\ 0 & Q & 0 \\ 0 & 0 & R \end{bmatrix}.
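As a concrete illustration, the following is a minimal sketch (under stated assumptions, not the thesis implementation) of the unscented transformation in Eqs. (A.16)-(A.18): it builds the sigma points and weights, propagates them through a nonlinear map h, and recovers the transformed mean and covariance. The Cholesky factor is one valid choice of matrix square root, and the map h and parameter values in the usage line are illustrative.

    import numpy as np

    def unscented_transform(h, xbar, Pxx, alpha=1e-3, beta=2.0, kappa=0.0):
        """Propagate mean xbar and covariance Pxx through a nonlinear map h
        using the 2N+1 sigma points of Eqs. (A.16)-(A.18)."""
        N = len(xbar)
        lam = alpha**2 * (N + kappa) - N
        S = np.linalg.cholesky((N + lam) * Pxx)   # one valid matrix square root
        sigma = np.vstack([xbar, xbar + S.T, xbar - S.T])   # (2N+1, N)

        Wm = np.full(2 * N + 1, 1.0 / (2 * (N + lam)))
        Wc = Wm.copy()
        Wm[0] = lam / (N + lam)
        Wc[0] = lam / (N + lam) + (1 - alpha**2 + beta)

        Y = np.array([h(s) for s in sigma])       # transformed sigma points
        ybar = Wm @ Y                             # approximate mean of h(x)
        dev = Y - ybar
        Pyy = (Wc[:, None] * dev).T @ dev         # approximate covariance
        return ybar, Pyy

    # Illustrative usage: a scalar exponential measurement, as in the
    # stochastic volatility applications discussed in Chapter 2.
    ybar, Pyy = unscented_transform(lambda x: np.exp(x),
                                    np.array([0.0]), np.eye(1))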
The genericparticle approximation is given bypN (xt |y1, ..., yt ) =N?i=1pi(i)t ?(xt ? x(i)t?1), (A.22)where ? (?) denotes the Dirac delta function.Intuitively, a particle filter is a discrete approximation to p (xt |y1, ..., yt ) that consists ofstates or particles, denoted by{x(i)t}Ni=1, and weights associated to those particles{pi(i)t}Ni=1.This approximation can be thought as a random histogram, where the states define the supportand the weights are the ?probabilities?associated with each state.136Let pN (xt+1 |y1, ..., yt ) and pN (xt+1 |y1, ..., yt+1 ) denote the discrete approximation to theconditional densities, p (xt+1 |y1, ..., yt ) and p (xt+1 |y1, ..., yt+1 ) , respectively. Suppose that theconditional densities, p (xt+1 |xt ) and p (yt+1 |xt+1 ) are known, then by substituting (A.22) in(A.20), and due to the properties of the Dirac delta function, we have that the integral becomesa sum, that ispN (xt+1 |y1, ..., yt ) =?p (xt+1 |xt ) pN (xt |y1, ..., yt ) dxt (A.23)=N?i=1p(xt+1???x(i)t)pi(i)t .Finally, substituting (A.23) in Eq.(A.21) yields topN (xt+1 |y1, ..., yt+1 ) ? p (yt+1 |xt+1 )N?i=1p(xt+1???x(i)t)pi(i)t . (A.24)Given the discrete approximation to the conditional density function, pN (xt |y1, ..., yt ) , themain challenge of particle filtering is to generate a sample from pN (xt+1 |y1, ..., yt+1 ) recursively,after receiving a new observation, yt+1.The particle approximation can be transformed into an equally weighted random samplefrom the density, pN (xt |y1, ..., yt ) , by sampling with replacement from the discrete distribution,{pi(i)t , x(i)t}Ni=1. This procedure is known as resampling and produces a new sample with uniformlydistributed weights, i.e. pi(i)t = 1/N. Resampling can be done in different ways, but the simplestis the multinomial sampling (Casella and Robert, 2004).A.6.2 Sampling Importance ResamplingOne of the most popular and most general particle filtering algorithm is known as the samplingimportance resampling (SIR) algorithm. The algorithm relies on two steps:Algorithm A.6.1 Given samples from p (xt |y1, ..., yt ) ,1. Draw x(i)t+1 ? p(xt+1???x(i)t)for i = 1, ..., N,2. Draw z(i) ?MultN({w(i)t+1}Ni=1)for i = 1, ..., N and set x(i)t+1 = xz(i)t+1where MultN denotes an N?component multinomial distribution and the importanceweights are given byw(i)t+1 =p(yt+1???x(i)t+1)N?i=1p(yt+1???x(i)t+1) .137Prior to resampling, each particle had weight w(i)t+1. After resampling, the weights are equal,by the definition of resampling. The intuition of the algorithm is as follows: in the first step, thealgorithm simulates new particles from the distribution, p(xt+1???x(i)t). Upon observing yt+1, theresampling step selects the particles that were most likely, in terms of the conditional likelihood,p(yt+1???x(i)t+1), to have generated yt+1.From Chapter 2, we learn that a number of applications characterize the state transitionand measurement densities through a state-space model representation of the form:yt = h (xt) + vt, vt ? N (0, R) ,xt+1 = g (xt) + ?t+1, ?t+1 ? N (0, Q) .In this case, SIR algorithm becomes:Algorithm A.6.2 Given samples from p (xt |y1, ..., yt ) , denoted by x(i)t for i = 1, ..., N,1. Draw x(i)t+1 ?? N(g(x(i)t), Q)for i = 1, ..., N,2. Draw z(i) ?MultN({w(i)t+1}Ni=1)for i = 1, ..., N and set x(i)t+1 = xz(i)t+1where MultN denotes an N?component multinomial distribution and the importanceweights are given byw(i)t+1 =exp(?12(yt+1 ? h(x(i)t+1))>R?1(yt+1 ? h(x(i)t+1)))N?i=1exp(?12(yt+1 ? h(x(i)t+1))>R?1(yt+1 ? 
The theoretical justification for these algorithms is the weighted bootstrap (or SIR) algorithm, which was designed to simulate from posterior distributions of the form L(x)p(x), where L(\cdot) denotes the likelihood function and p(\cdot) the prior. The algorithm first draws an independent sample x^{(i)} \sim p(x) for i = 1, ..., N, and then computes the normalized importance weights

    w^{(i)} = \frac{L(x^{(i)})}{\sum_{i=1}^N L(x^{(i)})}.

The sample drawn from the discrete distribution \{x^{(i)}, w^{(i)}\}_{i=1}^N tends in distribution to a sample from the product density L(x)p(x) as N increases.67

67 See Casella and Robert (2004) for a detailed explanation.

A.7 Quasi-Maximum Likelihood Standard Errors

Gallant and White (1988) show that, under certain regularity conditions, a heteroskedasticity and autocorrelation consistent covariance matrix of the quasi-maximum likelihood (QML) estimator \hat{\theta}_{QML} can be obtained using the formula

    Cov(\hat{\theta}_{QML}) = A_T^{-1}(\hat{\theta}_{QML})\, B_T\, A_T^{-1}(\hat{\theta}_{QML}),

where A_T(\hat{\theta}_{QML}) is the Hessian of the log-likelihood function,

    A_T(\hat{\theta}_{QML}) = \frac{\partial^2}{\partial\theta\,\partial\theta^\top} L(\theta),

and B_T is a consistent estimator of the covariance matrix of the first derivative of the QML function (2.26). Newey and West (1987) propose the estimator

    B_T = \sum_{t=1}^T s_t s_t^\top + \sum_{r=1}^L \sum_{t=r+1}^T \left(1 - \frac{r}{L+1}\right)\left[s_t s_{t-r}^\top + s_{t-r} s_t^\top\right],

where

    s_t = \frac{\partial}{\partial\theta}\, l_t(\theta),

and L represents the number of sample autocovariances included in the estimation of the variance-covariance matrix. Newey and West (1994) provide a nonparametric method for automatically selecting L as a function of the number of observations.

A.8 Calculation of Moments

Definition. Let X = (x_1, x_2, x_3, ..., x_N)^\top be a normally distributed vector with mean vector \mu and variance-covariance matrix \Sigma; then the moment-generating function of X, denoted M_X(t), is given by

    M_X(t) = E[\exp(X^\top t)] = \exp\left(\mu^\top t + \frac{t^\top \Sigma t}{2}\right),

where t is an N-dimensional real vector.

Lemma A.8.1 Let X = (x_1, x_2, x_3, ..., x_N)^\top be normally distributed with moment-generating function M_X(t); then

    \frac{\partial^{q_1+\cdots+q_N} M_X(t)}{\partial t_1^{q_1} \cdots \partial t_N^{q_N}} = E\left[x_1^{q_1} \cdots x_N^{q_N} \exp(X^\top t)\right].

Proposition A.8.2 Let X = (x_1, x_2, x_3)^\top be a normally distributed random vector with mean vector \mu and variance-covariance matrix \Sigma; then

    E[\exp(x_1)\, x_2] = \exp(\mu_1 + \sigma_1^2/2)(\sigma_{1,2} + \mu_2),
    cov(\exp(x_1), x_2) = \exp(\mu_1 + \sigma_1^2/2)\,\sigma_{1,2},
    cov(\exp(x_1), \exp(x_2)) = \exp\left(\mu_1 + \mu_2 + \frac{\sigma_1^2 + \sigma_2^2}{2}\right)(\exp(\sigma_{1,2}) - 1),
    E[\exp(x_1)\, x_2\, x_3] = \exp(\mu_1 + \sigma_1^2/2)\left[(\mu_2 + \sigma_{1,2})(\mu_3 + \sigma_{1,3}) + \sigma_{2,3}\right],
    cov(\exp(x_1)\, x_2, x_3) = \exp(\mu_1 + \sigma_1^2/2)\left[\sigma_{1,3}(\mu_2 + \sigma_{1,2}) + \sigma_{2,3}\right].

Proof. The proof follows directly from applying Lemma A.8.1.
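The closed forms in Proposition A.8.2 are easy to verify numerically. The following is a minimal simulation check (an illustration, not part of the thesis) of the first formula, E[exp(x1) x2] = exp(mu1 + sigma1^2/2)(sigma12 + mu2); the mean vector and covariance matrix are arbitrary illustrative values.

    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative mean vector and covariance for (x1, x2).
    mu = np.array([0.1, 0.2])
    Sigma = np.array([[0.30, 0.05],
                      [0.05, 0.20]])

    x = rng.multivariate_normal(mu, Sigma, size=2_000_000)
    mc = np.mean(np.exp(x[:, 0]) * x[:, 1])                       # Monte Carlo
    cf = np.exp(mu[0] + Sigma[0, 0] / 2) * (Sigma[0, 1] + mu[1])  # closed form
    print(mc, cf)   # the two values should agree to roughly three decimals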
Appendix B
Appendix to Chapter 3

B.1 Sharpe Ratios in Asset Pricing

Harrison and Kreps (1979) show that the absence of arbitrage implies the existence of a stochastic discount factor (SDF), or pricing kernel, denoted M_t, that prices all assets in the economy.68 More specifically, the conditional expectation of the product of the stochastic discount factor and the gross asset return R_t must equal one; that is,

    E_t[M_{t+1} R_{t+1}] = 1,   (B.1)

where the conditional expectation is based on the information available at time t. Since Eq. (B.1) holds for any asset in the economy, it must hold for the one-period risk-free interest rate R_{f,t+1}; consequently, the risk-free rate can be written as the inverse of the conditional expectation of the stochastic discount factor,

    R_{f,t+1} = \frac{1}{E_t[M_{t+1}]}.   (B.2)

Another implication of Eq. (B.1) is that the expected risk premium on any asset is the negative of the product of the risk-free rate and the conditional covariance of the stochastic discount factor with the gross return; that is,

    E_t[R_{t+1} - R_{f,t+1}] = -R_{f,t+1}\, Cov_t(R_{t+1}, M_{t+1}).   (B.3)

The conditional Sharpe ratio of an asset at time t, denoted SR_t, is defined as the ratio of the conditional mean excess return to the conditional standard deviation of the return; that is,

    SR_t = \frac{E_t[R_{t+1} - R_{f,t+1}]}{\sigma_t[R_{t+1} - R_{f,t+1}]}.   (B.4)

The conditional Sharpe ratio is then proportional to the risk-free rate, the volatility of the pricing kernel and the correlation between the pricing kernel and the return; that is,

    \frac{E_t[R_{t+1} - R_{f,t+1}]}{\sigma_t[R_{t+1} - R_{f,t+1}]} = -R_{f,t+1}\,\sigma_t[M_{t+1}]\,Corr_t[R_{t+1}, M_{t+1}],   (B.5)

where \sigma_t and Corr_t are the standard deviation and correlation, respectively, both conditional on information at time t. The conditional Sharpe ratio of any asset in the economy is time varying as long as the risk-free rate varies or the pricing kernel is conditionally heteroskedastic; that is, if \sigma_t[M_{t+1}] changes over time or if the correlation between the stock market return and the stochastic discount factor is time varying.

The maximum of the right-hand side of Eq. (B.5) over all returns defines a lower bound for the standard deviation of any stochastic discount factor, given the risk-free rate. Since the correlation coefficient lies between -1 and 1, we have

    \frac{E_t[R_{t+1}] - R_{f,t+1}}{\sigma_t[R_{t+1} - R_{f,t+1}]} \le R_{f,t+1}\,\sigma_t[M_{t+1}] \equiv SR^{max}_t, for all assets.   (B.6)

Eq. (B.6) implies the Hansen and Jagannathan (1991) bound, an upper bound on the absolute value of the conditional Sharpe ratio of any asset in the economy for a given discount factor. The maximum Sharpe ratio, SR^{max}_t, is achieved if there exists an asset in the economy that is perfectly negatively correlated with M_{t+1}. In general, the Sharpe ratios of all assets in the economy are bounded by the right-hand side of Eq. (B.6), but when markets are complete there exists an asset that achieves the upper bound, and the inequality becomes an equality.69 Moreover, a very volatile SDF is necessary to understand high Sharpe ratios. The conditional variance of the SDF can be thought of as the variance of the investor's marginal utility of consumption in the next period.70 Therefore, from Eq. (B.5) we learn that each model has an implication for the dynamic behavior of the market Sharpe ratio, since each model implies a functional form for the SDF.

The use of log-returns is common practice in the empirical literature. A standard approximation of the Sharpe ratio based on continuously compounded returns is given by

    SR_t = \frac{E_t[r_{t+1}] - r_{f,t+1} + \sigma_t^2[r_{t+1}]/2}{\sigma_t[r_{t+1}]},   (B.7)

where r_{t+1} denotes the continuously compounded return of an asset, r_{f,t+1} the continuously compounded risk-free rate and \sigma_t[r_{t+1}] the standard deviation of the return of the asset. The numerator in Eq. (B.7) includes the Jensen adjustment for log-returns.71

68 See Back (2010) for a detailed and concise explanation.
69 A detailed discussion of this result appears in Lettau and Uhlig (2002).
70 Hansen and Jagannathan (1991) provide a comprehensive analysis of this bound, allowing for many risky assets and no risk-free asset, and derive implications of the positivity of the stochastic discount factor.
71 The difference between Eqs. (B.6) and (B.7) is almost negligible for short return horizons, as reported by Brandt and Kang (2004). Nielsen and Vassalou (2004) analyze the difference between discrete and continuously compounded versions of Sharpe ratios and propose this adjustment for performance evaluation. Campbell and Viceira (2002) discuss this approximation in detail in a portfolio optimization framework.

B.2 The Solution to the Long-Run Risks Model

This section provides solutions for the consumption and dividend claims for the Bansal, Kiku, and Yaron (2012a) endowment process,

    \Delta c_{t+1} = \mu_c + x_t + \sigma_t\,\eta_{t+1},
    x_{t+1} = \rho x_t + \varphi_e\,\sigma_t\, e_{t+1},
    \sigma^2_{t+1} = \sigma^2 + \nu(\sigma^2_t - \sigma^2) + \sigma_w\, w_{t+1},   (B.8)
    \Delta d_{t+1} = \mu_d + \varphi x_t + \varphi_d\,\sigma_t\, u_{t+1} + \pi\,\sigma_t\,\eta_{t+1},
    w_{t+1}, e_{t+1}, u_{t+1}, \eta_{t+1} \sim i.i.d.\ N(0, 1).
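The dynamics in Eq. (B.8) are straightforward to simulate. The following is a minimal sketch of such a simulation; the parameter values are illustrative round numbers in the spirit of a monthly long-run risks calibration, not the calibration used in this thesis, and the positivity guard on the conditional variance is a practical assumption.

    import numpy as np

    rng = np.random.default_rng(2)

    # Illustrative (not thesis) parameter values, monthly frequency.
    mu_c, mu_d = 0.0015, 0.0015
    rho, phi_e = 0.975, 0.04           # persistence and scale of x_t
    sigma2bar, nu, sigma_w = 0.0078**2, 0.99, 2.3e-6
    phi, phi_d, pi = 3.0, 4.5, 0.0     # dividend loadings

    T = 12 * 100
    x, s2 = 0.0, sigma2bar
    dc = np.empty(T)
    dd = np.empty(T)
    for t in range(T):
        eta, e, w, u = rng.standard_normal(4)
        s = np.sqrt(max(s2, 1e-12))    # guard: keep conditional variance positive
        dc[t] = mu_c + x + s * eta
        dd[t] = mu_d + phi * x + phi_d * s * u + pi * s * eta
        x = rho * x + phi_e * s * e
        s2 = sigma2bar + nu * (s2 - sigma2bar) + sigma_w * w

    print(dc.std() * np.sqrt(12), dd.std() * np.sqrt(12))  # annualized volatilities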
The Euler equation for this economy is

    E_t\left[\exp\left(\theta\ln\delta - \frac{\theta}{\psi}\Delta c_{t+1} + (\theta - 1)\, r_{a,t+1} + r_{i,t+1}\right)\right] = 1,   (B.9)

where r_{a,t+1} is the log-return on the consumption claim and r_{i,t+1} is the log-return on any asset. All returns are given by the approximation from Campbell and Shiller (1988), r_{i,t+1} = \kappa_{0,i} + \kappa_{1,i} z_{i,t+1} - z_{i,t} + \Delta d_{i,t+1}.

Let Y_t^\top = [1, x_t, \sigma_t^2] denote a vector of state variables and let the log price-consumption ratio be given by z_t = A^\top Y_t, where A^\top = [A_0, A_1, A_2] is a vector of coefficients. In general, for any other asset i, define the coefficients in the same manner: A_i^\top = [A_{0,i}, A_{1,i}, A_{2,i}]. This section calculates the price of the consumption claim as well as the dividend claim, z_{t,m} = A_m^\top Y_t. The coefficients that characterize z_t and z_{t,m} are obtained by the method of undetermined coefficients and the fact that the Euler equation must hold for all values of Y_t.

The risk premium on any asset is

    E_t[r_{i,t+1} - r_{f,t}] + \frac{1}{2} Var_t[r_{i,t+1}] = -Cov_t(m_{t+1}, r_{i,t+1}) = \sum_{j=\eta,e,w} \lambda_j\,\beta_{i,j}\,\sigma^2_{j,t},   (B.10)

where \beta_{i,j} is the beta and \sigma^2_{j,t} the volatility of the j-th risk source, and \lambda_j is the price of each risk source.

B.2.1 Consumption Claim

The risk premium for the consumption claim is

    E_t[r_{a,t+1} - r_{f,t}] + \frac{1}{2} Var_t[r_{a,t+1}] = \lambda_\eta\,\beta_{a,\eta}\,\sigma_t^2 + \lambda_e\,\beta_{a,e}\,\sigma_t^2 + \lambda_w\,\beta_{a,w}\,\sigma_w^2,   (B.11)

where \beta_{a,\eta} = 1, \beta_{a,e} = \kappa_1 A_1 \varphi_e and \beta_{a,w} = \kappa_1 A_2. The conditional variance of the consumption claim is

    Var_t[r_{a,t+1}] = (\beta_{a,\eta}^2 + \beta_{a,e}^2)\,\sigma_t^2 + \beta_{a,w}^2\,\sigma_w^2.   (B.12)

The coefficients for the log price-consumption ratio z_t are

    A_0 = \frac{\ln\delta + \mu_c(1 - 1/\psi) + \kappa_0 + \beta_{a,w}\sigma^2(1-\nu) + \frac{1}{2}\theta\beta_{a,w}^2\sigma_w^2}{1 - \kappa_1},
    A_1 = \frac{1 - 1/\psi}{1 - \kappa_1\rho},   (B.13)
    A_2 = \frac{\theta}{2}\cdot\frac{(1 - 1/\psi)^2 + \beta_{a,e}^2}{1 - \kappa_1\nu}.

B.2.2 Dividend Claim

The innovation to the market return, r_{m,t+1} - E_t(r_{m,t+1}), is

    r_{m,t+1} - E_t(r_{m,t+1}) = \varphi_d\,\sigma_t\, u_{t+1} + \beta_{m,\eta}\,\sigma_t\,\eta_{t+1} + \beta_{m,e}\,\sigma_t\, e_{t+1} + \beta_{m,w}\,\sigma_w\, w_{t+1},   (B.14)

where \beta_{m,\eta} = \pi, \beta_{m,e} = \kappa_{1,m} A_{1,m}\varphi_e and \beta_{m,w} = \kappa_{1,m} A_{2,m}, which implies that the risk premium on the dividend claim is

    E_t[r_{m,t+1} - r_{f,t}] + \frac{1}{2} Var_t[r_{m,t+1}] = \lambda_\eta\,\beta_{m,\eta}\,\sigma_t^2 + \lambda_e\,\beta_{m,e}\,\sigma_t^2 + \lambda_w\,\beta_{m,w}\,\sigma_w^2.   (B.15)

Finally, the coefficients for the log price-dividend ratio are as follows:

    A_{0,m} = \frac{\theta\ln\delta + \mu_c(\theta - \theta/\psi - 1) - \lambda_w\sigma^2(1-\nu) + (\theta-1)\left[\kappa_0 + A_0(\kappa_1 - 1)\right] + \kappa_{0,m} + \beta_{m,w}\sigma^2(1-\nu) + \mu_d + \frac{1}{2}\left[\beta_{m,w} - \lambda_w\right]^2\sigma_w^2}{1 - \kappa_{1,m}},
    A_{1,m} = \frac{\varphi - 1/\psi}{1 - \kappa_{1,m}\rho},   (B.16)
    A_{2,m} = \frac{(1-\theta)\, A_2\, (1 - \kappa_1\nu) + \frac{1}{2}\left[(\pi - \lambda_\eta)^2 + (\beta_{m,e} - \lambda_e)^2 + \varphi_d^2\right]}{1 - \kappa_{1,m}\nu}.

B.2.3 Risk-Free Interest Rate

The risk-free rate is derived from the Euler equation applied to the riskless asset:

    r_{f,t+1} = -\log E_t[\exp(m_{t+1})]
              = -\theta\ln\delta + \frac{\theta}{\psi} E_t[\Delta c_{t+1}] + (1-\theta)\, E_t[r_{a,t+1}] - \frac{1}{2} Var_t\left[-\frac{\theta}{\psi}\Delta c_{t+1} + (1-\theta)\, r_{a,t+1}\right].   (B.17)

Subtracting (1-\theta)\, r_{f,t+1} from both sides of Eq. (B.17) and, for \theta \ne 0, dividing by \theta yields an expression for the risk-free rate:

    r_{f,t+1} = -\ln\delta + \frac{1}{\psi} E_t[\Delta c_{t+1}] + \frac{1-\theta}{\theta} E_t[r_{a,t+1} - r_{f,t+1}] - \frac{1}{2\theta} Var_t(m_{t+1}),   (B.18)

where

    E_t[\Delta c_{t+1}] = \mu_c + x_t,
    E_t[r_{a,t+1} - r_{f,t+1}] = \left[\lambda_\eta\beta_{a,\eta} + \lambda_e\beta_{a,e} - \frac{\beta_{a,\eta}^2 + \beta_{a,e}^2}{2}\right]\sigma_t^2 + \left(\lambda_w\beta_{a,w} - \frac{\beta_{a,w}^2}{2}\right)\sigma_w^2,
    Var_t(m_{t+1}) = (\lambda_\eta^2 + \lambda_e^2)\,\sigma_t^2 + \lambda_w^2\,\sigma_w^2.

As a result,

    r_{f,t+1} = A_{0,f} + A_{1,f}\, x_t + A_{2,f}\,\sigma_t^2,   (B.19)

where

    A_{0,f} = -\ln\delta + \frac{\mu_c}{\psi} + \frac{1-\theta}{\theta}\left(\lambda_w\beta_{a,w} - \frac{\beta_{a,w}^2}{2}\right)\sigma_w^2 - \frac{\lambda_w^2\sigma_w^2}{2\theta},
    A_{1,f} = \frac{1}{\psi},
    A_{2,f} = \frac{1-\theta}{\theta}\left[\lambda_\eta\beta_{a,\eta} + \lambda_e\beta_{a,e} - \frac{\beta_{a,\eta}^2 + \beta_{a,e}^2}{2}\right] - \frac{\lambda_\eta^2 + \lambda_e^2}{2\theta}.
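A small numerical illustration of the loadings in Eqs. (B.13) and (B.16) may be useful: because the linearization constants multiply the persistence parameter, A_1 and A_{1,m} become large as \kappa_1\rho approaches one. The sketch below treats \kappa_1 and \kappa_{1,m} as given and uses illustrative parameter values in the spirit of the long-run risks literature, not the calibration used in the thesis.

    # Illustrative preference and dynamics parameters (not the thesis calibration).
    psi, rho, phi = 1.5, 0.975, 3.0
    kappa1, kappa1m = 0.997, 0.997    # linearization constants, taken as given

    A1 = (1 - 1 / psi) / (1 - kappa1 * rho)        # Eq. (B.13)
    A1m = (phi - 1 / psi) / (1 - kappa1m * rho)    # Eq. (B.16)
    print(A1, A1m)  # both are large because kappa1 * rho is close to one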
B.2.4 Return on the Market Portfolio

Recall that the rate of return on the market portfolio is
\[
r_{m,t+1} = \kappa_{0,m} + \kappa_{1,m}\, z_{m,t+1} - z_{m,t} + \Delta d_{t+1}, \tag{B.20}
\]
where the dynamics are characterized by the following equations:
\[
\begin{aligned}
z_{m,t} &= A_{0,m} + A_{1,m}\, x_t + A_{2,m}\,\sigma^2_t, \\
x_{t+1} &= \rho x_t + \varphi_e \sigma_t e_{t+1}, \\
\sigma^2_{t+1} &= \bar{\sigma}^2 + \nu\left(\sigma^2_t - \bar{\sigma}^2\right) + \sigma_w w_{t+1}, \\
\Delta d_{t+1} &= \mu_d + \phi x_t + \varphi_d \sigma_t u_{t+1} + \pi\sigma_t\eta_{t+1}.
\end{aligned} \tag{B.21}
\]
Now, since each of the components of the market return follows a normal distribution, the market return has a normal distribution with conditional mean
\[
\begin{aligned}
E_t\left[r_{m,t+1}\right] &= \kappa_{0,m} + \kappa_{1,m}\, E_t\left[z_{m,t+1}\right] - z_{m,t} + E_t\left[\Delta d_{t+1}\right] \\
&= \kappa_{0,m} + \kappa_{1,m}\left(A_{0,m} + A_{1,m}\rho x_t + A_{2,m}\left(\bar{\sigma}^2 + \nu\left(\sigma^2_t - \bar{\sigma}^2\right)\right)\right) - A_{0,m} - A_{1,m} x_t - A_{2,m}\sigma^2_t + \mu_d + \phi x_t \\
&= \kappa_{0,m} + \left(\kappa_{1,m} - 1\right) A_{0,m} + \kappa_{1,m} A_{2,m}(1 - \nu)\bar{\sigma}^2 + \mu_d + \left[A_{1,m}\left(\kappa_{1,m}\rho - 1\right) + \phi\right] x_t + A_{2,m}\left(\kappa_{1,m}\nu - 1\right)\sigma^2_t \\
&= B_0 + B_1 x_t + B_2 \sigma^2_t,
\end{aligned} \tag{B.22}
\]
where
\[
\begin{aligned}
B_0 &= \kappa_{0,m} + \left(\kappa_{1,m} - 1\right) A_{0,m} + \kappa_{1,m} A_{2,m}(1 - \nu)\bar{\sigma}^2 + \mu_d, \\
B_1 &= \phi - A_{1,m}\left(1 - \kappa_{1,m}\rho\right) = \frac{1}{\psi}, \\
B_2 &= A_{2,m}\left(\kappa_{1,m}\nu - 1\right).
\end{aligned}
\]
Now, the variance of the market portfolio is given by
\[
\begin{aligned}
\mathrm{Var}_t\left[r_{m,t+1}\right] &= \kappa^2_{1,m}\mathrm{Var}_t\left[z_{m,t+1}\right] + \mathrm{Var}_t\left[\Delta d_{t+1}\right] \\
&= \kappa^2_{1,m}\left(A^2_{1,m}\varphi^2_e\sigma^2_t + A^2_{2,m}\sigma^2_w\right) + \left(\varphi^2_d + \pi^2\right)\sigma^2_t \\
&= D_0 + D_1 \sigma^2_t,
\end{aligned}
\]
where $D_0 = \left(\kappa_{1,m} A_{2,m}\sigma_w\right)^2$ and $D_1 = \kappa^2_{1,m} A^2_{1,m}\varphi^2_e + \varphi^2_d + \pi^2$.

B.2.5 Linearization Parameters

For any asset, the linearization parameters are determined endogenously by the following system of equations, as discussed in Bansal, Kiku, and Yaron (2012a) and Beeler and Campbell (2012):
\[
\begin{aligned}
\bar{z}_i &= A_{0,i}\left(\bar{z}_i\right) + A_{2,i}\left(\bar{z}_i\right)\bar{\sigma}^2, \\
\kappa_{1,i} &= \frac{\exp\left(\bar{z}_i\right)}{1 + \exp\left(\bar{z}_i\right)}, \\
\kappa_{0,i} &= \ln\left(1 + \exp\left(\bar{z}_i\right)\right) - \kappa_{1,i}\,\bar{z}_i.
\end{aligned} \tag{B.23}
\]
The solution is determined numerically by iterating until reaching a fixed point of $\bar{z}_i$. The dependence of $A_{0,i}$ and $A_{2,i}$ on the linearization parameters has been discussed in the previous sections.

B.3 Excess Returns Conditional Moments Implied by the Long-Run Risks Model

B.3.1 Expected Returns

The expected excess returns for period $k$ are defined as
\[
r_{m,t+k+1} - r_{f,t+k+1}, \quad k = 0, 1, 2, \ldots
\]
The conditional excess risk premium for any period has a closed-form expression given by
\[
E_t\left[r_{m,t+k+1} - r_{f,t+k+1}\right] = E_{0,k+1} + E_{1,k+1}\,\sigma^2_t, \tag{B.24}
\]
where
\[
\begin{aligned}
E_{0,k+1} &= E_0 + E_1\left(1 - \nu^k\right)\bar{\sigma}^2, \\
E_{1,k+1} &= E_1 \nu^k, \quad k = 0, 1, 2, \ldots, \\
E_0 &= B_0 - A_{0,f}, \\
E_1 &= B_2 - A_{2,f}.
\end{aligned}
\]

B.3.2 Variance of Excess Returns

For any time period $k$, the conditional variance of the future excess returns is given by
\[
\mathrm{Var}_t\left[r_{m,t+k+1} - r_{f,t+k+1}\right], \quad k = 0, 1, 2, \ldots
\]
Its closed-form expression is
\[
\mathrm{Var}_t\left[r_{m,t+k+1} - r_{f,t+k+1}\right] = D_{0,k+1} + D_{1,k+1}\,\sigma^2_t, \tag{B.25}
\]
where
\[
\begin{aligned}
D_{0,k+1} &= D_0 + D_1\left(1 - \nu^k\right)\bar{\sigma}^2 + E^2_1\sigma^2_w\,\frac{1 - \nu^{2k}}{1 - \nu^2}, \\
D_{1,k+1} &= \nu^k D_1, \\
D_0 &= \left(\kappa_{1,m} A_{2,m}\sigma_w\right)^2, \\
D_1 &= \left(\kappa_{1,m} A_{1,m}\varphi_e\right)^2 + \varphi^2_d + \pi^2.
\end{aligned}
\]

Autocovariance of Excess Returns

Now, let $0 \le k < p$. Then the autocovariance of excess returns is
\[
\mathrm{cov}_t\left(r_{m,t+k+1} - r_{f,t+k+1},\; r_{m,t+p+1} - r_{f,t+p+1}\right) = E^2_1\sigma^2_w\,\nu^{p-k}\left(\frac{1 - \nu^{2k}}{1 - \nu^2}\right) + E_1\kappa_{1,m} A_{2,m}\sigma^2_w\,\nu^{p-k-1}.
\]

B.3.3 Aggregate Excess Returns

The expected excess returns over $K$ periods are given by the sum of the one-period excess returns,
\[
\sum_{k=1}^{K}\left(r_{m,t+k} - r_{f,t+k}\right).
\]
Its conditional mean is
\[
E_t\left[\sum_{k=1}^{K} r_{m,t+k} - r_{f,t+k}\right] = \mathcal{E}_{0,K} + \mathcal{E}_{1,K}\,\sigma^2_t,
\]
where
\[
\begin{aligned}
\mathcal{E}_{0,K} &= K E_0 + E_1\bar{\sigma}^2\left[K - \frac{1 - \nu^K}{1 - \nu}\right], \\
\mathcal{E}_{1,K} &= E_1\,\frac{1 - \nu^K}{1 - \nu}.
\end{aligned}
\]

B.3.4 Variance of Aggregate Excess Returns

The conditional variance is
\[
\mathrm{Var}_t\left[\sum_{k=1}^{K} r_{m,t+k} - r_{f,t+k}\right] = \mathcal{D}_{0,K} + \mathcal{D}_{1,K}\,\sigma^2_t,
\]
where
\[
\begin{aligned}
\mathcal{D}_{0,K} ={}& K D_0 + K D_1\bar{\sigma}^2 - D_1\bar{\sigma}^2\,\frac{1 - \nu^K}{1 - \nu} + \frac{E^2_1\sigma^2_w}{1 - \nu^2}\left[K - \frac{1 - \nu^{2K}}{1 - \nu^2}\right] \\
&+ 2\left[K - \frac{1 - \nu^K}{1 - \nu}\right]\left\{\frac{E^2_1\sigma^2_w}{1 - \nu^2}\left[\frac{\nu}{1 - \nu}\right] + E_1\kappa_{1,m} A_{2,m}\sigma^2_w\left[\frac{1}{1 - \nu}\right]\right\} \\
&- \frac{2 E^2_1\sigma^2_w}{1 - \nu^2}\cdot\frac{\nu\left(1 - \nu^{K-1}\right)\left(1 - \nu^K\right)}{\left(1 - \nu\right)^2\left(1 + \nu\right)}, \\
\mathcal{D}_{1,K} ={}& \frac{1 - \nu^K}{1 - \nu}\, D_1.
\end{aligned}
\]
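Rather than rederiving the lengthy $\mathcal{D}_{0,K}$ expression, the $K$-period moments can be built up numerically by summing the one-period means, variances and autocovariances of Sections B.3.1 and B.3.2, which reproduces the closed forms above. The sketch below does exactly that; all inputs are the coefficients defined in this appendix, and the $O(K^2)$ double loop is harmless at the horizons considered here.

```python
import numpy as np

def k_period_excess_moments(K, E0, E1, D0, D1, sig_bar2, sig2_t,
                            nu, sig_w, kappa1m, A2m):
    """Conditional mean and variance of the K-period excess return.

    Sums the per-horizon moments of Eqs. (B.24)-(B.25) and the
    autocovariance formula; E0, E1, D0, D1, kappa1m, A2m are the
    coefficients defined in Sections B.2-B.3, sig2_t the current state.
    """
    ks = np.arange(K)                                 # horizons k = 0, ..., K-1
    # conditional means E_t[r_{m,t+k+1} - r_{f,t+k+1}], Eq. (B.24)
    mean = np.sum(E0 + E1 * (1 - nu**ks) * sig_bar2 + E1 * nu**ks * sig2_t)
    # one-period conditional variances, Eq. (B.25)
    var = np.sum(D0 + D1 * (1 - nu**ks) * sig_bar2
                 + E1**2 * sig_w**2 * (1 - nu**(2 * ks)) / (1 - nu**2)
                 + nu**ks * D1 * sig2_t)
    # autocovariances for 0 <= k < p <= K-1, each counted twice
    for k in range(K):
        for p in range(k + 1, K):
            var += 2 * (E1**2 * sig_w**2 * nu**(p - k)
                        * (1 - nu**(2 * k)) / (1 - nu**2)
                        + E1 * kappa1m * A2m * sig_w**2 * nu**(p - k - 1))
    return mean, var
```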
B.4 Quasi-Maximum Likelihood Estimation

Since the measurement equation considered in each of the models is nonlinear, one possibility is to rely on Taylor series approximations to obtain extended forms of the Kalman filter. The transition and measurement equations analyzed in the previous section are expressed as follows:
\[
y_t = \mu\left(x_{t-1}\right) + \sigma\left(x_{t-1}\right)\varepsilon_t, \tag{B.26}
\]
\[
x_t = A x_{t-1} + \eta_t, \tag{B.27}
\]
where $\varepsilon_t$ follows a standard normal distribution and $\eta_t$ is a $d$-dimensional noise vector with variance-covariance matrix $\Sigma$. The deterministic functions $\mu\left(x_t\right)$ and $\sigma\left(x_t\right)$ define the conditional mean and volatility of excess returns and are characterized by each of the models.

I use Gaussian approximations to filter the mean and covariance of the states and measurement series. More specifically, the linearity of the state equation implies that the first and second conditional moments of the state vector are
\[
x_{t+1|t} = A x_{t|t}, \tag{B.28}
\]
\[
P_{t+1|t} = A P_{t|t} A^{\top} + \Sigma, \tag{B.29}
\]
where $x_{t+1|t}$ and $P_{t+1|t}$ are the time $t$ predicted values of the conditional mean and covariance matrix of the state vector, respectively. These moments allow us to generate a predicted mean $y_{t+1|t}$ and covariance matrix $P^{yy}_{t+1|t}$ of the measurement series, given by
\[
\begin{aligned}
y_{t+1|t} &= E\left[\mu\left(x_t\right) + \sigma\left(x_t\right)\varepsilon_{t+1}\,\middle|\, y_t, y_{t-1}, \ldots, y_0\right], \\
P^{yy}_{t+1|t} &= \mathrm{Var}\left[\mu\left(x_t\right) + \sigma\left(x_t\right)\varepsilon_{t+1}\,\middle|\, y_t, y_{t-1}, \ldots, y_0\right].
\end{aligned} \tag{B.30}
\]
Finally, the covariance between the observed and unobserved variables, $P^{xy}_{t+1|t}$, is
\[
P^{xy}_{t+1|t} = \mathrm{Cov}\left[x_{t+1},\, \mu\left(x_t\right) + \sigma\left(x_t\right)\varepsilon_{t+1}\,\middle|\, y_t, y_{t-1}, \ldots, y_0\right]. \tag{B.31}
\]
Using these conditional moments, we apply the Kalman update, represented by the following set of recursive equations, to obtain values for the conditional mean $x_{t+1|t+1}$ and covariance $P_{t+1|t+1}$:
\[
\begin{aligned}
K_{t+1} &= P^{xy}_{t+1|t}\left(P^{yy}_{t+1|t}\right)^{-1}, \\
x_{t+1|t+1} &= x_{t+1|t} + K_{t+1}\left(y_{t+1} - y_{t+1|t}\right), \\
P_{t+1|t+1} &= P_{t+1|t} - K_{t+1} P^{yy}_{t+1|t} K^{\top}_{t+1}.
\end{aligned} \tag{B.32}
\]
The first approach to evaluating the moments in Eqs. (B.30) through (B.32) uses closed-form expressions, when available. An alternative is to use Taylor series expansions of $\mu\left(x_t\right)$ and $\sigma\left(x_t\right)$ around $x_{t+1|t}$, for an arbitrary number of terms. Properties of this method, as well as a detailed explanation, can be found in Chapter 2 of this work.
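For concreteness, a single prediction and update cycle of Eqs. (B.28) through (B.32) can be sketched with a first-order Taylor expansion of $\mu$ and $\sigma$, which reduces to an extended-Kalman-filter step; higher-order filters replace the Jacobian-based measurement moments in the middle block with the higher-order formulas of Chapter 2. This minimal sketch expands around the filtered mean $x_{t|t}$, since $y_{t+1}$ loads on $x_t$; the function signatures are illustrative, not the thesis's actual code.

```python
import numpy as np

def gaussian_filter_step(x_tt, P_tt, y_next, A, Sigma, mu, sigma, jac_mu):
    """One cycle of the Gaussian filter, Eqs. (B.28)-(B.32), first order.

    x_tt, P_tt : filtered mean/covariance of x_t given y_0, ..., y_t
    y_next     : the new observation y_{t+1}
    mu, sigma  : measurement mean and volatility functions
    jac_mu     : Jacobian of mu, used for the first-order moments
    """
    # State prediction, Eqs. (B.28)-(B.29)
    x_pred = A @ x_tt
    P_pred = A @ P_tt @ A.T + Sigma
    # Measurement prediction, Eq. (B.30): y_{t+1} = mu(x_t) + sigma(x_t) eps
    J = jac_mu(x_tt)
    y_pred = mu(x_tt)
    S = sigma(x_tt)
    P_yy = J @ P_tt @ J.T + S @ S.T
    # Cross-covariance, Eq. (B.31), using x_{t+1} = A x_t + eta_{t+1}
    P_xy = A @ P_tt @ J.T
    # Kalman update, Eq. (B.32)
    K = P_xy @ np.linalg.inv(P_yy)
    x_filt = x_pred + K @ (y_next - y_pred)
    P_filt = P_pred - K @ P_yy @ K.T
    return x_filt, P_filt
```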
B.5 External Habit Formation Model

This section presents the model by Campbell and Cochrane (1999) in discrete time and its extension in Wachter (2005). A representative investor is assumed to have state-dependent preferences. More specifically, the investor has utility over consumption relative to a reference point $X_t$ and maximizes
\[
E\left[\sum_{t=0}^{\infty}\delta^t\,\frac{\left(C_t - X_t\right)^{1-\gamma} - 1}{1 - \gamma}\right], \tag{B.33}
\]
where $\delta > 0$ is the time preference parameter and $\gamma > 0$ is the curvature parameter.

Each investor is concerned with her consumption relative to that of others. Habit $X_t$ is defined through surplus consumption $S_t$, where
\[
S_t \equiv \frac{C_t - X_t}{C_t}. \tag{B.34}
\]
One can interpret $S_t$ as a business cycle indicator. In economic booms, consumption substantially exceeds the external habit and the surplus $S_t$ is large; in recessions, consumption barely exceeds the external habit and the surplus is relatively small.

It is assumed that $s_t = \log S_t$ follows the process
\[
s_{t+1} = (1 - \phi)\,\bar{s} + \phi s_t + \lambda\left(s_t\right)\left(\Delta c_{t+1} - E_t\left[\Delta c_{t+1}\right]\right), \tag{B.35}
\]
where $\bar{s}$ is the unconditional mean of $s_t$, $\phi$ is the persistence and $\lambda\left(s_t\right)$ is the sensitivity to changes in consumption. The unconditional mean and the sensitivity function are defined in terms of primitive parameters. It is assumed that aggregate consumption growth is log-normal with independent and identically distributed innovations; that is,
\[
\Delta c_{t+1} = g + v_{t+1}, \tag{B.36}
\]
where $c_t = \log C_t$ and $v_{t+1} \sim N\left(0, \sigma^2_v\right)$ is an i.i.d. sequence. The process for $s_t$ is heteroskedastic and perfectly conditionally correlated with innovations in consumption growth. The sensitivity function $\lambda\left(s_t\right)$ is specified so that the real risk-free rate is linear and so that, for $s_t \le \bar{s}$, $X_t$ is a deterministic function of past consumption. Consequently, we have
\[
\lambda\left(s_t\right) = \begin{cases}\dfrac{1}{\bar{S}}\sqrt{1 - 2\left(s_t - \bar{s}\right)} - 1, & \text{if } s_t \le s_{\max}, \\[4pt] 0, & \text{otherwise,}\end{cases} \tag{B.37}
\]
\[
\bar{S} = \sigma_v\sqrt{\frac{\gamma}{1 - \phi - b/\gamma}}, \tag{B.38}
\]
where $b$ is a preference parameter that determines the behavior of the risk-free rate and $s_{\max} = \bar{s} + \frac{1}{2}\left(1 - \bar{S}^2\right)$. In Campbell and Cochrane (1999), $b$ is chosen to be zero, producing a constant real risk-free rate, while Wachter (2005) shows that values of $b > 0$ imply a risk-free rate that is linear in $s_t$.

B.5.1 Stochastic Discount Factor

Since the habit is external, the investor's intertemporal marginal rate of substitution is given by
\[
M_{t+1} = \delta\left(\frac{S_{t+1}}{S_t}\right)^{-\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}. \tag{B.39}
\]
Moreover, any asset return $R_{t+1}$ must satisfy
\[
E_t\left[M_{t+1} R_{t+1}\right] = 1. \tag{B.40}
\]

B.5.2 Risk-Free Rate and Maximum Sharpe Ratio

Let $R_{f,t+1}$ denote the one-period risk-free return between $t$ and $t+1$, and let $r_{f,t+1} = \ln\left(R_{f,t+1}\right)$. Then Eqs. (B.39) and (B.40) imply that
\[
\begin{aligned}
r_{f,t+1} &= -\ln\left(E_t\left[M_{t+1}\right]\right) \\
&= -\ln\delta + \gamma g + \gamma(1 - \phi)\left(\bar{s} - s_t\right) - \frac{\gamma^2\sigma^2_v}{2}\left(1 + \lambda\left(s_t\right)\right)^2 \\
&= -\ln\delta + \gamma g - \frac{\gamma(1 - \phi) - b}{2} + b\left(\bar{s} - s_t\right),
\end{aligned} \tag{B.41}
\]
where the last equality comes from substituting the definition of $\lambda\left(s_t\right)$. This definition implies a risk-free rate linear in $s_t$.

Conditional on the information at time $t$, the one-period stochastic discount factor defined in Eq. (B.39) is the exponential of a normally distributed random variable with variance $\gamma^2\left[1 + \lambda\left(s_t\right)\right]^2\sigma^2_v$. As a result, the Hansen-Jagannathan bound implies that
\[
\sqrt{\exp\left(\gamma^2\left[1 + \lambda\left(s_t\right)\right]^2\sigma^2_v\right) - 1}
\]
is an upper bound on the Sharpe ratio of any portfolio. If $\lambda$ is a decreasing function of $s_t$, then the upper bound on Sharpe ratios will be counter-cyclical: higher in recessions than in booms.

B.5.3 Price-Dividend Ratio

The aggregate market is represented as the claim to the future consumption stream. If $P_t$ denotes the ex-dividend price of this claim, then Eq. (B.40) implies that in equilibrium $P_t$ satisfies
\[
E_t\left[M_{t+1}\left(\frac{P_{t+1} + C_{t+1}}{P_t}\right)\right] = 1, \tag{B.42}
\]
which can be rewritten as
\[
E_t\left[M_{t+1}\left(1 + \frac{P_{t+1}}{C_{t+1}}\right)\frac{C_{t+1}}{C_t}\right] = \frac{P_t}{C_t}.
\]
Since $C_t$ is the dividend paid by the aggregate market, $P_t/C_t$ is the price-dividend ratio. The price-dividend ratio must be computed numerically; Wachter (2005) provides an efficient method for its computation.

Returns on the aggregate market are defined as
\[
R^m_{t+1} = \left(\frac{P_{t+1}/C_{t+1} + 1}{P_t/C_t}\right)\frac{C_{t+1}}{C_t}.
\]
The main difficulty lies in solving the model (B.42) for the price-dividend ratio as a function of $s_t$. Once the price-dividend ratio is calculated numerically, Monte Carlo simulations can be performed to obtain accurate estimates of expected returns, volatilities and Sharpe ratios for different holding periods. Details about the simulations are explained in Wachter (2005).
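As one illustration of such a simulation, the sketch below draws a path of the log surplus $s_t$ from Eqs. (B.35) through (B.38); the maximum Sharpe ratio bound of Section B.5.2 can then be evaluated along the path. Parameter values are left to the caller, with $b = 0$ recovering the Campbell and Cochrane (1999) constant risk-free rate. This is a minimal sketch of the state dynamics only; pricing the consumption claim still requires the numerical price-dividend solution discussed above.

```python
import numpy as np

def simulate_surplus(T, gamma, phi, sig_v, b=0.0, seed=0):
    """Simulate log surplus consumption s_t, Eqs. (B.35)-(B.38).

    b = 0 gives the Campbell-Cochrane constant risk-free rate;
    b > 0 gives the Wachter (2005) risk-free rate linear in s_t.
    """
    rng = np.random.default_rng(seed)
    S_bar = sig_v * np.sqrt(gamma / (1 - phi - b / gamma))   # Eq. (B.38)
    s_bar = np.log(S_bar)
    s_max = s_bar + 0.5 * (1 - S_bar ** 2)

    def lam(s):                                              # Eq. (B.37)
        return np.sqrt(1 - 2 * (s - s_bar)) / S_bar - 1 if s <= s_max else 0.0

    s = np.empty(T)
    s[0] = s_bar                                             # start at the mean
    for t in range(T - 1):
        v = sig_v * rng.standard_normal()                    # consumption shock
        s[t + 1] = (1 - phi) * s_bar + phi * s[t] + lam(s[t]) * v
    return s

# The Section B.5.2 bound along the path would then be, for each s_t:
# sr_max = np.sqrt(np.exp(gamma**2 * (1 + lam(s_t))**2 * sig_v**2) - 1)
```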