Essays on Forecast Evaluation and Model Estimation in Financial Markets

by

Guoshi Tong

B.Sc., Fudan University, 2006
M.Sc., The University of British Columbia, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Economics)

The University of British Columbia
(Vancouver)

January 2015

© Guoshi Tong, 2015

Abstract

This thesis comprises three essays. In the first and second essays, I examine the welfare value of return predictors in financial markets when investors possess only limited historical data. The first essay focuses on the US Treasury bond market, where time series variation in the expected return is forecastable by yield curve and macroeconomic variables. The second essay shifts attention to the US stock market, where cross-sectional variation in the expected return is predictable by the underlying firms' characteristics. Using monthly US data, I estimate the utility benefit of various return predictors in either the bond or stock market through a structural approach of forecast evaluation. I consider both parametric and non-parametric portfolio policies and conduct both unconditional and conditional evaluations. I find that return predictors are generally hard to exploit with limited data. Incorporating return predictors renders the portfolio strategy more sensitive to estimation errors and instability in forecast relations. The resultant negative effect on portfolio returns and welfare is not dominated by the information value of predictors. The third essay discusses the estimation of the Cox-Ingersoll-Ross interest rate model. I propose a new likelihood-based methodology that uses a marginal Metropolis-Hastings algorithm with a particle-filter based simulated likelihood placed in each of the iterations.
The benefit of this Bayesian approach is that it bypasses the need to compute exact likelihood functions, and its validity rests upon a recent development in Bayesian statistical theory. To mitigate the inefficiency in standard bootstrap filters due to the peaky measurement density of the CIR model, I design an approximated conditional optimal filter to account for the informativeness of current yields and reduce the variance of particle weights. For typical parameter values, performance is shown to be satisfactory.

Preface

This thesis is single authored, independent, unpublished and original work by Guoshi Tong.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Return Predictors and Asset Allocation: Should Treasury Bond Investors Time the Market?
  1.1 Introduction
  1.2 Investment Framework
    1.2.1 Bond allocation rule
    1.2.2 Measurement of performance
    1.2.3 Estimation of welfare metric
    1.2.4 Inference on utility benefit
  1.3 Empirical Analysis
    1.3.1 Data and utility assumption
    1.3.2 Implementing allocation rules
    1.3.3 Empirical findings
    1.3.4 Other strategies
  1.4 Robustness Check
  1.5 Concluding Remarks
2 Equity Asset Allocation: Can Investors Exploit Cross-Sectional Return Predictability by a Parametric Strategy?
  2.1 Introduction
  2.2 Investment Framework
    2.2.1 Equity allocation rule
    2.2.2 Measurement of performance
    2.2.3 Estimation of welfare metric
    2.2.4 Inference on utility benefit
  2.3 Empirical Analysis
    2.3.1 Data and utility assumption
    2.3.2 Implementing a parametric policy
    2.3.3 Empirical findings
  2.4 Concluding Remarks
3 Bayesian Estimation of Cox-Ingersoll-Ross Interest Rate Model with Particle-Filter based Simulated-Likelihood
  3.1 Introduction
  3.2 The CIR Model in a State Space Form
    3.2.1 The CIR model
    3.2.2 State space representation and identification
    3.2.3 The econometric challenge
  3.3 Bayesian Estimation with Simulated Likelihood
    3.3.1 The marginal Metropolis-Hastings algorithm
    3.3.2 Metropolis-Hastings with simulated likelihood
    3.3.3 Adaptive MCMC
  3.4 Particle-Filter based Likelihood Estimation
    3.4.1 Particle filter for CIR
    3.4.2 Approximated conditional optimal importance distribution
  3.5 Performance with Simulated Data
  3.6 Concluding Remarks
Bibliography
A Systematic Resampling
B Approximated Conditional Optimal Filter

List of Tables

Table 1.1 Statistical Accuracy of Bond Return Forecast in Rolling Window Scheme
Table 1.2 Unconditional Evaluation of Bond Allocation Rules
Table 1.3 Unconditional Evaluation of Bond Allocation Rules (continued)
Table 1.4 Conditional Evaluation of Bond Allocation Rules: Unemployment rate
Table 1.5 Conditional Evaluation of Bond Allocation Rules: Realized Volatility
Table 1.6 Conditional Evaluation of Bond Allocation Rules: Lagged Utility Difference
Table 1.7 Frequency of Forecast Breaks by Different Predictors
Table 1.8 Predictability of Forecast Breaks
Table 1.9 Unconditional Evaluation of Shrinkage Strategies
Table 1.10 Unconditional Evaluation of Shrinkage Strategies (Pre-crisis data)
Table 1.11 Volatility Timing
Table 1.12 Multiple Factors Prediction
Table 1.13 Multiple Factors Prediction (continued)
Table 1.14 Fixed Weight Strategies
Table 1.15 Conditional Fixed Weight Strategies
Table 1.16 Estimation Window Averaging
Table 1.17 Parametric and Non-parametric Strategies at Different Risk Aversion
Table 1.18 Shrinkage Strategies at Different Risk Aversion
Table 1.19 A Longer Length of Information Set / Limited Data
Table 2.1 Unconditional Evaluation of Parametric Policy with Univariate Character
Table 2.2 Unconditional Evaluation of Parametric Policy with Multivariate Characters
Table 2.3 Conditional Evaluation of Parametric Policy with Univariate Character
Table 2.4 Conditional Evaluation of Parametric Policy with Univariate Industry Standardized Character
Table 2.5 Conditional Evaluation of Parametric Policy with Multivariate Characters
Table 2.6 Conditional Evaluation of Parametric Policy with Multivariate Industry Standardized Characters
Table 2.7 Short Sale Constraint: Unconditional Evaluation of Univariate Character
Table 2.8 Short Sale Constraint: Unconditional Evaluation of Multivariate Characters
Table 2.9 Conditional Evaluation of Parametric Policy with Univariate Character and Short Sale Constraint
Table 2.10 Conditional Evaluation of Parametric Policy with Multivariate Characters and Short Sale Constraint
Table 2.11 Conditional Evaluation of Parametric Policy with Univariate Industry Standardized Character and Short Sale Constraint
Table 2.12 Conditional Evaluation of Parametric Policy with Multivariate Industry Standardized Characters and Short Sale Constraint
Table 2.13 Alternative Investable Universe: Unconditional Evaluation of Univariate Character
Table 2.14 Alternative Investable Universe: Unconditional Evaluation of Multivariate Characters
Table 2.15 Different Size of Information Set: Unconditional Evaluation of Univariate Character
Table 2.16 Different Size of Information Set: Unconditional Evaluation of Multivariate Characters
Table 2.17 Different Risk Aversion: Unconditional Evaluation of Univariate Character
Table 2.18 Different Risk Aversion: Unconditional Evaluation of Multivariate Characters
Table 3.1 Likelihood Estimation with Different Particle Size
Table 3.2 Comparing Bootstrap Filter with Approximated Conditional Optimal Filter
Table 3.3 Statistics on the MCMC draws of Parameters and State

List of Figures

Figure 1.1 Out-of-Sample Bond Return Forecasts with Different Predictors
Figure 1.2 Out-of-Sample Bond Portfolio Weights under Different Rules
Figure 1.3 Out-of-Sample Bond Portfolio Returns under Different Rules
Figure 2.1 Out-of-Sample Estimates of Parametric Policy with Univariate Character
Figure 2.2 Out-of-Sample Estimates of Parametric Policy with Univariate Industry Standardized Character
Figure 2.3 Out-of-Sample Returns by Parametric Policy with Univariate Character
Figure 2.4 Out-of-Sample Returns by Parametric Policy with Industry Standardized Univariate Character
Figure 2.5 Out-of-Sample Returns by Parametric Policy with Multivariate Characters
Figure 2.6 Out-of-Sample Returns by Parametric Policy with Multivariate Industry Standardized Characters
Figure 3.1 Simulated Yields based on Cox Ingersoll Ross Model
Figure 3.2 Particle Filtering on CIR Yields
Figure 3.3 Empirical Filtering Distribution with Different Particle Size
Figure 3.4 The Markov chain Monte Carlo Samples
Figure 3.5 Empirical Posterior Distribution of Parameters and State

Acknowledgments

Foremost, I am deeply indebted to my supervisor, Prof. Vadim Marmer, for his patience, enthusiasm, encouragement and guidance. His continuous support helped me at every stage of the PhD program. Without him, this thesis would not have been finished. Besides my supervisor, I would also like to thank my committee members, Prof. Kevin Song and Prof. Adlai Fisher, for enlightening me, scrutinizing my work and providing very helpful comments. My sincere thanks also go to Prof. Paul Beaudry and Prof. Jason Chen for hiring me as an RA and supporting me financially. I am also very grateful to our school director Prof. Thomas Lemieux and graduate director Prof. Amartya Lahiri for helping me deal with unusual issues in my job hunting process. I am deeply touched by their help. Last but not least, I owe my parents, Dajian Tong and Yongfang Zhang, for their unconditional support throughout my life.

Chapter 1

Return Predictors and Asset Allocation: Should Treasury Bond Investors Time the Market?

1.1 Introduction

It has been widely documented in the literature that expected returns in the US Treasury bond market vary over time and are predictable by the shape of the yield curve and macroeconomic fundamentals (e.g., Fama and Bliss (1987), Cochrane and Piazzesi (2005), Ludvigson and Ng (2009)).
The variation in the expected return is also economically large: taking the Cochrane-Piazzesi factor, the measured conditional expected annual excess return of a 5-year bond varies over time with a standard deviation of around 2.5%, while its unconditional mean is less than 1%.

This chapter examines whether a bond investor could exploit such return predictability to make a welfare gain. In particular, I ask whether bond return forecasts can be used to improve trading decisions. From a theoretical perspective, the high magnitude of forecastability calls for aggressive market timing, that is, the optimal portfolio weight should vary according to the level of the return predictor.[1] However, in practice, additional obstacles exist. First, investors only have limited information on the relevant predictive relation. For example, although they understand that the return predictor at hand is correlated with the future return, the true conditional distribution of the future return given the current predictor value is unknown. Therefore, the return forecasting process typically relies on either a parametric or non-parametric return model, which is inevitably mis-specified.[2] On top of that, for a parametric return model, the values of the parameters involved are also unknown and have to be estimated. As investors always face a limited data history, estimation uncertainty becomes a concern. Secondly, given an estimated and mis-specified return forecasting model, there is an extra portfolio decision based on it. So, the errors in estimation or model specification are further transformed by the portfolio optimization process, and their ultimate impact on investor welfare is unknown.

Given the above-mentioned concerns, the objective of this chapter is to quantify the welfare value of bond return forecasts in the presence of those estimation and model mis-specification risks. To this end, I consider a limited data bond trading scenario.
I assume that, under this scenario, a CRRA bond investor has access to a finite history of data on bond return predictor values and subsequent realized bond excess returns. These historical data are used by the investor to infer the relevant predictive relation. I then adopt a decision theory approach to view each allocation strategy that exploits bond return predictability as a function of historical data, or in other words, an estimator. That portfolio estimator hence maps any observed data in the investor's information set to a bond portfolio weight. Finally, I measure the welfare value of any bond return predictor through the level of expected utility achieved by the associated portfolio estimator. It is worth noting that the utility expectation is taken over the joint distribution of historical data, the current value of the predictor and the next period return.

[1] More precisely, if those variations are driven by time varying aggregate risk aversion, then a constant risk aversion investor should be able to collect the market risk premium by timing. In contrast, average or representative investors always hold the market as they determine the equilibrium price.

[2] This can be due to inadequately modeled dynamics, incorrect functional form or any combination of these.

Methodologically, the contribution of my approach is to propose a conceptually new means of assessing return predictors. Traditional works focus on raw predictability, i.e., whether the return predictor helps to reduce mean squared forecast error (MSFE). In contrast, I shift attention to the welfare value of each predictor, an object of ultimate interest to investors. I argue that although the traditional approach helps to understand the underlying data generating process, it does not provide sufficient information to guide trading decisions. For instance, one predictor could forecast the correct sign of excess return in each period while completely missing the magnitude.
In that case, the traditional criterion of MSFE would not favor that signal despite the fact that it is obviously useful for trading. Unlike MSFE, the utility metric accounts for the consequences any forecast error would have on the portfolio. Under this metric, a small forecast error is more valuable when the sign forecast is correct and the magnitude of the forecast is large. Secondly, traditional works test the population level of mean squared forecast error, assuming the true values of the slope coefficients in the predictive regression are known. Yet, my work recognizes the fact that return forecasts and the subsequent portfolio decisions are both conducted with limited data. Hence, the focus here is on the forecast errors when relevant parameters are estimated and how they translate into portfolio risks. Note that the predictive relation, even if it does exist at the population level, needs to be specified and estimated precisely enough within a finite sample in order to achieve a welfare gain.

Using monthly US data, I estimate and test the significance of the welfare gain from a list of bond return predictors drawn from the yield curve, macro fundamentals or technical analysis in the aforementioned limited data framework. First, I find that linear parametric strategies, driven by any of the predictors considered above, create large losses occasionally, despite positive gains in most states. As a result, the corresponding welfare estimates are inferior to a simple benchmark rule which ignores predictability. Second, non-parametric policies, advocated by Brandt (1999) to account for potential nonlinearity in the predictive relation, exhibit similar performance instability and fail to beat the benchmark. Third, such performance instability is not specific to any business cycle or market volatility regime and hence, it is hard to forecast in advance.
Fourth, shrinkage strategies, suggested in Connor (1997) and Brandt (2009) and implemented through Bayesian predictive regressions with a no-predictability prior, lead to significant welfare improvement when the degree of prior confidence is high.

The above findings suggest that errors in forecasting model estimation and specification can indeed create large welfare losses. Specifically, when predictive relations are subject to un-modeled and hence unexpected instability, forecast errors are magnified under a market timing decision that is made based on the historical data observed. In addition, the market timing policies, either parametric or non-parametric, are more complex than the benchmark rule, which ignores predictors. Their estimates are hence more sensitive to the realization of the historical data observed. This additional estimation uncertainty also translates into extra volatility in realized returns and hence lowers welfare. The shrinkage policy represents our attempt to reduce the portfolio risks due to forecasting model estimation and mis-specification. By taming down the estimated return forecast, bond predictability is only partially exploited under this policy. Our results indicate that at a high degree of shrinkage, the information value gained from incorporating predictors outweighs the welfare loss due to mis-specification and estimation.

In assessing strategy performance, my estimation of the utility expectation, given a single path of realized US data, rests upon a series of "pseudo" repeated experiments generated by an out-of-sample portfolio construction exercise.[3] In particular, at the end of each month, our investor is asked to make an allocation decision based only on the most recent data. Meanwhile, a rolling window scheme is imposed, so that the same size of historical data or information set is available at each portfolio decision experiment.
Thus, the resulting average realized utility serves as a consistent estimator of the (unconditional) welfare measure averaged over time.[4]

[3] While investors only have access to limited data, econometricians could use the whole time series (future data) in performance evaluation.

[4] The conditional performance is estimated in a similar way, except that averaging is now done over utility realizations generated under the same economic regime.

In testing the significance of the expected-utility difference, I build the inference procedures upon those established in the forecast evaluation literature (e.g., Diebold and Mariano (1995); West (1996); Giacomini and White (2006)). This stream of works has traditionally evaluated point forecasts by statistical measures of accuracy, such as the mean squared forecast error or predictive likelihood. However, in our context, the evaluation object is a portfolio estimator. Thus, forecast evaluation is addressed in a structural way with a portfolio optimization process embedded and the evaluation metric modified to be the expected utility.

My work complements the analysis of out-of-sample bond return predictability by Thornton and Valente (2012) in two dimensions. First, in examining the economic significance of predictability, Thornton and Valente (2012) fix a mean variance rule and then evaluate the information value of the yield curve in shaping bond portfolios. In contrast, I argue that the portfolio value of a return predictor also depends on the way it is exploited. Hence, my work focuses on a joint evaluation, where for each return predictor, a range of policy functions such as parametric, non-parametric or Bayes rules are included in measuring welfare value. The return predictors examined also go beyond those based on the yield curve and contain macro and technical analysis driven factors.
Secondly, while out-of-sample analysis is conducted in Thornton and Valente (2012), inference there is still on a population level statement of the forecast errors, assuming the values of the parameters are known. In contrast, my work relies on a formal estimation and inference procedure that validates the finite sample properties of return predictors against estimation and mis-specification risks. Since investors only face limited data in reality, we believe that this finite sample approach is more relevant for portfolio management.

My analysis also adds to the broad literature on asset allocation under return predictability, e.g., Kandel and Stambaugh (1996); Aït-Sahalia and Brandt (2001); Pástor (2000); Barberis (2000); Brennan and Xia (2001); Campbell and Viceira (2002); Avramov (2004), etc. The majority of this literature examined optimal allocation in the stock market, while less attention has been placed on the portfolio impact of bond return predictability.[5] My results provide some new evidence on the benefit of bond market timing with an emphasis on the constraint of limited data. In addition, existing studies focused almost exclusively on the unconditional or average performance of return predictors. Yet, I recognize the potential heterogeneity in strategy performance and use conditional evaluations to judge the economic value of predictors over different economic episodes.

[5] Indeed, bond markets are known to possess higher levels of predictability than stock markets and hence, the role of bond predictors could be more critical.

The rest of the chapter is structured as follows. Section 1.2 lays out the limited data bond investment framework. There, I also describe the utility metric along with the associated estimation and inference procedure. Section 1.3 conducts empirical analyses in the US Treasury bond market and discusses the results.
Robustness checks are illustrated in Section 1.4, and the last section concludes.

1.2 Investment Framework

This section lays out the investment decision framework. I consider a single period bond allocation problem in which the excess return of the long term bond is predictable. However, the true state/forecasting variable as well as its joint distribution with the return are unknown and have to be estimated with a finite history of data. I describe in turn the allocation rule, its performance measure, the estimation of performance and the relevant inference procedure.

1.2.1 Bond allocation rule

Consider a Treasury bond investor who allocates his current wealth $W_t$ between a short term 1-year discount bond and a longer term n-year one. The investment horizon is $\tau$, so the position is held until $t+\tau$ and then liquidated. With $\tau$ equal to a year, the 1-year bond matures at face value and is risk-less in nominal terms. The n-year bond, by contrast, will be sold as an (n-1)-year bond whose price is unknown beforehand. The long-term bond's log return $r^{(n)}_{t+\tau}$ is therefore random. Its expected value in excess of the log risk-less rate $r^f_t = r^{(1)}_{t+\tau}$ is referred to as the bond risk premium $E_t[r^{(n)}_{t+\tau} - r^f_t]$.

The investor's preference admits an expected utility representation with a CRRA function defined over the terminal wealth $W_{t+\tau}$. At time t, the investor puts a fraction $\alpha_t$ of wealth into the n-year bond based on the conditional density of the return and his risk tolerance $1/\gamma$. The allocation decision is thus a directional / market-timing bet that collects risk premium and does not involve any cross-sectional arbitrage.

[Figure: timeline of the investment problem. Estimation window from 0 to t (information set $\phi_t$); portfolio decision $\alpha(\phi_t)$ at t; forecasting window from t to $t+\tau$.]

In making the allocation decision, the true conditional density is unknown, but a finite history (sample realization) of returns $\vec{r}_t = \{r^{(n)}_{\tau}, \dots, r^{(n)}_t\}$ and state variables $\vec{z}_t = \{z_0, \dots, z_t\}$ is available.
Following the bond literature, $z_t$ includes both yield curve and macroeconomic fundamental data, which form the basis of various return forecasting factors. This historical data, of length t and denoted as $\phi_t = \{\vec{r}_t, \vec{z}_t\}$, may be used to estimate the desirable forecasting model and the corresponding portfolio policy. Thus, the allocation choice $\alpha_t$ in this finite history setting is data dependent, and the rule $\alpha(\cdot)$ can be viewed as a generic estimator formally defined as follows:

Definition 1. An allocation rule, or portfolio strategy, $\alpha(\cdot)$ is a mapping from realizations of historical data in the estimation window to the set of allocation positions,

\[ \alpha(\phi_t) : \Phi_t \to \mathcal{A}, \]

where $\Phi_t$ is the range of historical (sample) data $\phi_t$ and $\mathcal{A} = (-\infty, +\infty)$ is the admissible set of portfolio weights on the long term bond.

1.2.2 Measurement of performance

The performance of each allocation rule $\alpha(\cdot)$ is assessed based on the expected utility it generates. Given a history $\phi_t$, the realized utility is derived as

\[ U(\alpha(\phi_t), r^{(n)}_{t+\tau}) = \frac{\left(\alpha(\phi_t)\, e^{r^{(n)}_{t+\tau}} + (1 - \alpha(\phi_t))\, e^{r^f_t}\right)^{1-\gamma}}{1-\gamma}, \]

where dependence on $r^f_t$ is suppressed since it is observed at t and thus is an element of $\phi_t$. This realized utility is a random variable as both $\phi_t$ and $r^{(n)}_{t+\tau}$ are random. Accordingly, it should be averaged across realizations of both historical data and future return, as suggested in the following (unconditional) notion of performance measure:

Definition 2. An unconditional welfare measure of the allocation rule $\alpha(\cdot)$ is the unconditional expectation of realized utility:

\[ EU[\alpha(\cdot)] = E_{f_{\phi_t, r^{(n)}_{t+\tau}}}\left[U(\alpha(\phi_t), r^{(n)}_{t+\tau})\right], \]

where $f_{\phi_t, r^{(n)}_{t+\tau}}$ is the joint density of historical data and forecasting period return.

By integrating over historical data, the above metric explicitly accounts for the effect of estimation uncertainty on portfolio performance. Meanwhile, it reflects mis-specification risk as modeled forecasting relations do not necessarily coincide with the true one.
These two ingredients help us to focus on the practical, or limited data, usefulness of any portfolio strategy. Besides, this unconditional welfare measure can be further modified to examine potential heterogeneity in allocation rule performance. In particular, we will consider utility expectations that are conditional on a certain regime of the business cycle or market volatility measured by an economic state variable $s_t$, i.e.,

\[ EU[\alpha(\cdot) \mid s_t = s] = E_{f_{\phi_t, r^{(n)}_{t+\tau} \mid s_t = s}}\left[U(\alpha(\phi_t), r^{(n)}_{t+\tau})\right], \]

where $f_{\phi_t, r^{(n)}_{t+\tau} \mid s_t = s}$ is the joint density of historical data and future return conditional on the current economic state being s. The regime s, for example, can be a boom / bust episode, $s_{boom}$ / $s_{bust}$, if the contemporary unemployment rate is below / above its average level, or a high / low turbulence episode, $s_{hvol}$ / $s_{lvol}$, if the past year's realized bond market volatility is greater / less than its mean.

1.2.3 Estimation of welfare metric

Our welfare measure, either unconditional or conditional, is a frequentist notion of average realized utility achieved over repeated samples drawn from the true distribution. However, in estimating this quantity, only one single path of data is available. To overcome this issue, I rely on a sequence of "pseudo" repeated experiments generated by an out-of-sample portfolio construction exercise. Specifically, let T denote the total number of observations available to the econometrician and t be the number of observations accessible to the investor (in his portfolio estimation window). Thus, $m = T - t - \tau + 1$ represents the number of out-of-sample periods. At each time j, $t \le j < t + m$, our investor is asked to make an allocation decision based only on the historical data within $[j - t + 1, j]$, i.e., a rolling window scheme. The rationale for using a rolling window rather than an expanding one is that, in quantifying limited data value, our allocation rule $\alpha(\phi_t)$ and performance measure are set to be history size specific.
Accordingly, the length of data available to the investor at each portfolio decision experiment needs to stay the same.

Based on the above argument, I propose an estimator of the (unconditional) welfare to be the out-of-sample average of realized utility, expressed as:

\[ \widehat{EU}[\alpha(\cdot)] = \frac{1}{m} \sum_{j=t}^{T-\tau} U(\alpha(\phi_j), r^{(n)}_{j+\tau}), \]

where $\phi_j$ stands for the sample data within $[j - t + 1, j]$. Assume that the whole time series satisfies a certain mixing property, and denote the population level of utility expectation at period j by $EU_j[\alpha(\cdot)]$. Then $\widehat{EU}[\alpha(\cdot)] - \frac{1}{m} \sum_{j=t}^{T-\tau} EU_j[\alpha(\cdot)]$ converges almost surely to zero as m goes to infinity (see the strong law of large numbers for mixing processes in White (1984), Corollary 3.48, p. 49).[6] Here, I do not require stationarity but instead allow the data to possess considerable heterogeneity. In particular, I allow the whole time series to be characterized by structural shifts at unknown dates. This assumption of data heterogeneity is more realistic than the assumption of stationarity. It further justifies the use of the rolling window scheme, since local approximation may be less biased in cases of instability.

The conditional notion of welfare can be estimated in a similar way, except that averaging is now only over portfolio exercises with the same economic regime. Denoting by $s_j$ the level / regime of a certain economic state at allocation time j, the performance of $\alpha(\cdot)$ conditional on regime s is then estimated as:

\[ \widehat{EU}[\alpha(\cdot) \mid s] = \frac{1}{m_s} \sum_{j=t}^{T-\tau} U(\alpha(\phi_j), r^{(n)}_{j+\tau})\, I(s_j = s), \]

with $I(\cdot)$ being the indicator function and $m_s$ being the number of observations with regime s. Finally, all welfare estimates, either unconditional or conditional, are translated into certainty equivalent returns, $\widehat{CE}[\alpha(\cdot)] = U^{-1}(\widehat{EU}[\alpha(\cdot)])$, for ease of exposition.

1.2.4 Inference on utility benefit

Our performance estimate of any allocation rule $\alpha(\cdot)$ is benchmarked against that of a simple strategy which ignores predictability.
In particular, the benchmark strategy, denoted by $\alpha_0(\cdot)$, does not have access to past values of the return predictors $\vec{z}_t$ and can only map past realizations of returns into a portfolio position: $\alpha_0(\phi_t \setminus \vec{z}_t) = \alpha_0(\vec{r}_t)$. The difference in estimated welfare between an allocation rule that uses the predictors and a benchmark that discards them, $\widehat{EU}[\alpha(\cdot)] - \widehat{EU}[\alpha_0(\cdot)]$, thus reflects the portfolio value of the relevant return predictors. However, $\widehat{EU}$ is only a point estimate of the true expected utility. Hence, to account for the sampling variability in $\widehat{EU}$, a formal inference procedure is needed.

⁶ When data is stationary, the unconditional expected utility $EU_j[\alpha(\cdot)] = EU[\alpha(\cdot)]$ is the same across time, so $\widehat{EU}[\alpha(\cdot)]$ will converge to the constant $EU[\alpha(\cdot)]$.

Unconditional inference

For unconditional inference, the null hypothesis we are interested in is that, on average, market timing does not generate any expected utility difference relative to the benchmark:

$$H_0: E\left[U(\alpha(\phi_j), r^{(n)}_{j+\tau}) - U(\alpha_0(\vec{r}_j), r^{(n)}_{j+\tau})\right] = 0, \quad \forall j = t, \ldots, T - \tau.$$

Note that the expectation here is taken with respect to all possible sample paths of the entire stochastic process $\{r_{t+\tau}, z_t\}_{t=0}^{T-\tau}$.

The alternative to $H_0$ is specified in a global way, as the distributions of sample data and returns are non-identical over time. Denote $\Delta U_{j,j+\tau} = U(\alpha(\phi_j), r^{(n)}_{j+\tau}) - U(\alpha_0(\vec{r}_j), r^{(n)}_{j+\tau})$ and let $\overline{\Delta U}_{t,m} = \frac{1}{m}\sum_{j=t}^{T-\tau} \Delta U_{j,j+\tau}$. Then

$$H_A: E\left[|\overline{\Delta U}_{t,m}|\right] \ge \delta > 0, \quad \text{for small } \delta \text{ and all } m \text{ sufficiently large}.$$

The testing procedure borrows from those developed in the forecast evaluation literature (e.g., Diebold and Mariano (1995), West (1996), Clark and McCracken (2001), Giacomini and White (2006)). This stream of research has traditionally focused on equal forecast accuracy between two competing forecasts, in which the objects of interest are typically quadratic loss (squared error), directional accuracy, or predictive log-likelihood. However, in our framework, the primary purpose of a return forecast is to make an allocation decision.
Accordingly, forecast evaluation is addressed in a structural way that integrates the portfolio optimization process. The relevant loss is the negative of realized utility and can no longer be expressed as a function of forecast errors.

While conceptually distinct, the asymptotic results established in the existing literature can still be applied. In particular, I employ the testing framework (for predictive superiority) in Giacomini and White (2006), which allows for a mixing data environment. The test is based on the following Wald-type statistic:

$$T_{t,m} = m\,(\overline{\Delta U}_{t,m})\,\hat{\Omega}_m^{-1}\,(\overline{\Delta U}_{t,m}),$$

where $\hat{\Omega}_m$ is a suitable HAC estimator of the asymptotic variance $\Omega_m = \mathrm{var}[\sqrt{m}\,\overline{\Delta U}_{t,m}]$.⁷ A level $\alpha$ test rejects the null of equal performance whenever $T_{t,m} > \chi^2_{1,1-\alpha}$, where $\chi^2_{1,1-\alpha}$ is the $1 - \alpha$ quantile of the $\chi^2_1$ distribution. The underlying justification of such a test follows from the central limit theorem for mixing processes stated in Wooldridge and White (1988) and other standard asymptotic arguments in White (1984) and Giacomini and White (2006).

Conditional inference

Whereas the above analysis focused on the unconditional value of market timing, conditional inference tests for an expected utility difference conditional on a particular economic regime. The null hypothesis considered now is:

$$H_0^c: E[\Delta U_{j,j+\tau} \mid s_j = s] = 0, \quad \forall j = t, \ldots, T - \tau,$$

with $s$ being a certain business cycle or market volatility regime. As mentioned above, I consider the current economy to be in a boom / bust state, $s_{boom}$ / $s_{bust}$, if the contemporary unemployment rate is below / above its average level, and in a high / low turbulence regime, $s_{hvol}$ / $s_{lvol}$, if the past year's realized bond market volatility is greater / less than its mean. In addition, I also condition on the lagged relative performance, $\Delta U_{j-\tau,j}$, to let the economy be in a positive / negative past performance state, $s_{plag}$ / $s_{nlag}$, if $\Delta U_{j-\tau,j} > 0$ / $\Delta U_{j-\tau,j} < 0$.
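A minimal sketch of the Wald-type statistic for a scalar series of utility differences is given below. It is an illustration under stated assumptions rather than the exact procedure used here: the long-run variance uses Bartlett (Newey-West) weights, which is one common HAC choice, the truncation lag is left as an argument, and the names `newey_west_variance` and `equal_performance_test` are hypothetical. The $\chi^2_1$ p-value is obtained from the complementary error function, since $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$.

```python
import numpy as np
from math import erfc, sqrt


def newey_west_variance(x, lags):
    """Bartlett-kernel (Newey-West) long-run variance of a scalar series."""
    x = np.asarray(x, dtype=float)
    m = x.size
    d = x - x.mean()
    omega = d @ d / m  # lag-0 autocovariance
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)  # Bartlett weight (assumed kernel)
        omega += 2.0 * weight * (d[k:] @ d[:-k]) / m
    return omega


def equal_performance_test(delta_u, lags):
    """Wald-type statistic m * mean^2 / omega and its chi-square(1) p-value.

    delta_u: realized utility differences between a timing rule and the
             benchmark, one per out-of-sample date.
    """
    x = np.asarray(delta_u, dtype=float)
    m = x.size
    omega = newey_west_variance(x, lags)
    stat = m * x.mean() ** 2 / omega
    p_value = erfc(sqrt(stat / 2.0))  # survival function of chi-square(1)
    return stat, p_value
```

For a conditional test, the same function could be applied to the subsample of dates with $s_j = s$, in line with the description of the conditional procedure.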
Those conditioning instruments will help us examine whether the relative performance of market timing is uniform across the business cycle, turbulence dependent, or characterized by persistence. As a result, they could potentially guide us to fine-tune the portfolio advice based on the current state if performance heterogeneity is detected.⁸

⁷ We truncate the covariance estimator at a lag of 18 since returns have a 12-month overlap. The construction of the HAC estimator is illustrated in Andrews (1991).

⁸ For instance, if relative performance is significantly positive in boom and negative in recession, this would imply that we should avoid market timing during economic recessions.

Fixing a regime $s$, the testing procedure relies on the same Wald-type statistic as in the unconditional test, except that it uses only samples with $s_j = s$. As before, under certain regularity conditions on the mixing coefficients (cf. White (1984); Giacomini and White (2006)), such a test has correct size and is consistent against the alternative $H_A^c: E[\,|\overline{\Delta U}_{t,m_c}| \mid s\,] \ge \delta > 0$, where $\overline{\Delta U}_{t,m_c}$ is now the average of the realized utility differences conditional on the state $s$.

1.3 Empirical Analysis

Using the framework developed in the previous section, I now look into the performance of bond market timing empirically. I first describe the data used, the return predictors considered and the types of portfolio policies entertained. I then present the empirical findings and discuss their implications.

1.3.1 Data and utility assumption

I use monthly data on US Treasury bonds and macroeconomic fundamentals. Bond prices are obtained from the Fama-Bliss data set in the Center for Research in Security Prices (CRSP) and contain observations of zero coupon (discount) bonds with maturities of one to five years. The sample period starts in Jan 1964 and ends in Dec 2011. Macro fundamental data consist of a balanced panel of 131 economic series. This data set was originally collected in Stock and Watson (2002) and Stock and Watson (2005), later expanded by Ludvigson and Ng (2009) and Ludvigson and Ng (2011), and is available on Sydney C. Ludvigson's website. This panel starts in Jan 1964, but lasts only until Dec 2007.

Regarding the primitives of the CRRA preference, it is common practice in the portfolio allocation literature to consider relative risk aversion $\gamma$ ranging from 5 to 10, but a higher value of $\gamma = 20$ is also entertained when gauging the effect of varying $\gamma$ (see, for instance, Barberis (2000)).⁹ Following this tradition, we pick $\gamma = 10$ for most of our portfolio allocation exercises and then vary the risk aversion level to 5, 15 and 20 as robustness checks. While the main conclusions are robust to each $\gamma$, my analysis shows that a low risk aversion of 5 would induce highly levered positions for certain predictors considered and lead to ex-post bankruptcy in some states.

⁹ The decision theory literature and experimental economists have shown some evidence that an individual's risk aversion level when making lottery choices should not exceed 5. We point out here that a portfolio manager operating in the financial market may have a different risk appetite. In fact, according to Figure 1 in van Binsbergen et al. (2012), which estimates the cross-sectional distribution of US mutual fund managers' risk appetite, the density of risk aversion peaks at 10 to 25 and is skewed to the right.

1.3.2 Implementing allocation rules

Recall that, since an allocation rule is defined as a function of sample data $\phi_t$, the size of the investor's information set needs to be pre-specified. I assume that the investor always faces historical data of 15 years in length, i.e., $\phi_t$ includes 180 monthly observations.
Then, based on this 15-year rolling window scheme, I describe how to construct various return forecasting factors and how to estimate the associated policies using the available sample of bond prices and macroeconomic series.

Return predictors construction

I construct the major return predictors identified in the bond forecasting literature. Those factors are either directly observed or themselves estimated from historical data. I classify them into three categories: those based on the yield/forward curve, on macro fundamentals, or on bond market technical analysis.

Yield/forward curve driven factors:

The first factor I consider is the Fama-Bliss (FB) forward spread. I calculate the log forward rate at time $t$ for loans between $t + n - 1$ and $t + n$ as $f^{(n)}_t = p^{(n-1)}_t - p^{(n)}_t$, $n = 1, \ldots, 5$, where $p^{(n)}_t$ is the log price at time $t$ of the $n$-year discount bond. I then record the $n$-year forward spread as $FB^{(n)}_t = f^{(n)}_t - f^{(1)}_t$. As documented in Fama and Bliss (1987), this factor forecasts the annual excess return of the $n$-year bond, which we label as $rx^{(n)}_{t+1} = r^{(n)}_{t+1} - r^{f}_t = p^{(n-1)}_{t+1} - p^{(n)}_t + p^{(1)}_t$.

The second predictor I study is the Cochrane-Piazzesi (CP) factor. While the $FB^{(n)}_t$ predictor is maturity dependent, Cochrane and Piazzesi (2005) suggest that a single factor summarizes bond premia across maturities. This single return-forecasting factor is estimated through a (first stage) predictive regression of the average excess return on the whole forward curve. Specifically, let $\overline{rx}_{t+1} = \frac{1}{4}\sum_{n=2}^{5} rx^{(n)}_{t+1}$ be the average (across maturity) annual excess return. The CP factor is formed as the fitted value from:

$$\overline{rx}_{t+1} = \gamma_0 + \gamma_1 f^{(1)}_t + \gamma_2 f^{(2)}_t + \ldots + \gamma_5 f^{(5)}_t + \varepsilon_{t+1}.$$

The regression uses only data on bond prices within the information set $\phi_t$,¹⁰ and the (estimated) CP factor is denoted by $\widehat{CP}_t = \hat{\gamma}_0 + \hat{\gamma}_1 f^{(1)}_t + \hat{\gamma}_2 f^{(2)}_t + \ldots + \hat{\gamma}_5 f^{(5)}_t$.

The third predictor I account for is the cycle factor (cf) proposed in Cieslak and Povala (2012).
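The forward rate, forward spread and excess return definitions given earlier in this subsection can be sketched as follows. This is an illustrative sketch only: it assumes a monthly array of log prices with one column per maturity (1 to 5 years), takes $p^{(0)}_t = 0$, and treats one year as 12 rows; the function names are hypothetical.

```python
import numpy as np


def forward_rates(log_prices):
    """f_t^(n) = p_t^(n-1) - p_t^(n) for n = 1..5, with p_t^(0) = 0.

    log_prices: array of shape (T, 5); column n-1 holds p_t^(n).
    """
    p = np.asarray(log_prices, dtype=float)
    p_with_zero = np.hstack([np.zeros((p.shape[0], 1)), p])
    return p_with_zero[:, :-1] - p_with_zero[:, 1:]


def fama_bliss_spread(log_prices):
    """FB_t^(n) = f_t^(n) - f_t^(1) for n = 2..5 (four columns)."""
    f = forward_rates(log_prices)
    return f[:, 1:] - f[:, [0]]


def annual_excess_returns(log_prices, horizon=12):
    """rx_{t+1}^(n) = p_{t+1}^(n-1) - p_t^(n) + p_t^(1) for n = 2..5.

    Rows are months, so "t+1" (one year ahead) is `horizon` rows later.
    """
    p = np.asarray(log_prices, dtype=float)
    later_shorter = p[horizon:, :-1]   # p_{t+1}^(n-1), n = 2..5
    now_longer = p[:-horizon, 1:]      # p_t^(n),       n = 2..5
    now_one_year = p[:-horizon, [0]]   # p_t^(1)
    return later_shorter - now_longer + now_one_year
```

Under a flat and constant yield curve all forward spreads and excess returns are zero, which gives a quick sanity check on the indexing; with a 180-month window the last function returns 168 rows, matching the number of realized excess returns available as regressands.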
The construction of this factor rests on a decomposition of log yields, $y^{(n)}_t = -\frac{1}{n} p^{(n)}_t$, into a persistent component $\eta_t$ and shorter-lived fluctuations $c^{(n)}_t$ (cycles). $\eta_t$ relates to the long-run inflation expectation and is proxied by a discounted moving average of realized core CPI, while $c^{(n)}_t$, the transitory part, is captured by the residual. Following the authors' suggestion, I regress log yields of different maturities on the contemporaneous level of the long-run inflation expectation proxy, $y^{(n)}_t = b^{(n)}_0 + b^{(n)}_{\eta} \eta_t + \varepsilon_{\eta}$, and obtain the cycle as the fitted residual $c^{(n)}_t = y^{(n)}_t - \hat{b}^{(n)}_0 - \hat{b}^{(n)}_{\eta} \eta_t$. I then project the average excess return onto the cross-sectional composition of these cycles to form a (single) return-forecasting factor. In particular, I estimate

$$\overline{rx}_{t+1} = \theta_0 + \theta_1 c^{(1)}_t + \theta_2 \bar{c}_t + \varepsilon_{t+1}, \quad \text{where } \bar{c}_t = \frac{1}{4}\sum_{n=2}^{5} c^{(n)}_t,$$

and record the fitted linear combination as the cycle factor $\widehat{cf}_t = \hat{\theta}_0 + \hat{\theta}_1 c^{(1)}_t + \hat{\theta}_2 \bar{c}_t$.

Macro and technical analysis driven factor:

Macroeconomic fundamentals also predict bond excess returns. I follow Ludvigson and Ng (2009) and Ludvigson and Ng (2011) to estimate one such factor. I first extract $J$ principal components, $\hat{f}_t = (\hat{f}_{t,1}, \ldots, \hat{f}_{t,J})$, from the set of 131 macroeconomic series, where $J \ll 131$. Extraction relies on asymptotic PCA, and the number $J$ is determined by the information criteria developed in Bai and Ng (2002).
I then perform best subset selection among different subsets of $\{\hat{f}^3_{t,1}, \{\hat{f}_{t,j}, \hat{f}^2_{t,j};\ j = 1, \ldots, J\}\}$ using the BIC criterion.¹¹ Given a preferred subset $\hat{F}_t$, I estimate the Ludvigson and Ng factor by running

$$\overline{rx}_{t+1} = \delta_0 + \delta_1' \hat{F}_t + \varepsilon_{t+1},$$

and it follows that $\widehat{LN}_t = \hat{\delta}_0 + \hat{\delta}_1' \hat{F}_t$.

¹⁰ It is worth noting that, in implementing the regression with the most recent 180 observations on prices, only $180 - 12 = 168$ excess returns have actually been realized and can thus serve as regressands.

¹¹ The motivation is that the pervasive components in $\hat{f}$ (those with large eigenvalues) are not necessarily the ones most relevant for prediction.

The technical analysis driven factor I consider is implemented in a similar way, except that principal components are extracted from a set of technical indicators. Following Goh et al. (2012), I build those indicators by comparing two (short and long) moving averages of the forward spread. Let $MA^{n,j}_t = \frac{1}{j}\sum_{k=0}^{j-1} f^{(n)}_{t-k/12}$, $j \in \{s, l\}$, be the $s$ (short) or $l$ (long) month moving average of the $n$-year forward spread; $I(MA^{n,s}_t > MA^{n,l}_t)$ then defines one such signal. Combining $n = 2, 3, 4, 5$, $s = 3, 6, 9$, and $l = 18, 24, 30, 36$ gives us a total of 48 signals. The return-forecasting factor $\widehat{TA}$ is the fitted value of a predictive regression on the selected subset of the principal components extracted from the 48 signals.

As a final remark, all the estimations use only the past 15 years of data. Hence, both $J$ and the composition of the subset $\hat{F}_t$ in $\widehat{LN}$ and $\widehat{TA}$ may vary over time as $\phi_t$ gets updated monthly.

Policy function estimation

Given return predictors $z \in \{FB, \widehat{CP}, \widehat{cf}, \widehat{LN}, \widehat{TA}\}$, I now turn to the question of how to transform the information they contain into portfolio decisions. I lay out two empirical procedures to estimate the portfolio weights.
The first one assumes and estimates a conditional log-normal distribution for the return generating process, and then solves for an optimal policy function under the estimated distribution. The second one, based on Brandt (1999), bypasses the need to specify a statistical model for returns and directly estimates the portfolio weight in a non-parametric GMM framework.

Linear parametric rules:

This is the standard plug-in strategy, with the portfolio weight determined by the estimated mean excess return divided by its estimated variance, scaled down by the risk aversion level. I assume that bond returns are log-normally distributed conditional on $z$, so that the log excess return $rx^{(n)}_{t+1}$ is conditionally normal, and model the return generating process through a predictive regression expressed as $rx^{(n)}_{t+1} = \beta^{(n)} z_t + u_{t+1}$, with homoskedastic error term $u_{t+1} \sim N(0, \sigma^2)$. Using the conditional distribution estimated from data $\phi_t$, I analytically solve for the approximate (up to a log linearization) optimal portfolio weight under this estimated distribution:¹²

$$\alpha(\phi_t) = \frac{\hat{\beta}^{(n)} z_t + \hat{\sigma}^2 / 2}{\gamma \hat{\sigma}^2}.$$

Note that this allocation rule is linear in the current value of the return predictor, a consequence of the log-normal assumption. The benchmark strategy, which ignores predictors, is a special case of the parametric strategy with $z_t \equiv 1$.

Non-linear non-parametric rules:

This strategy allows for a non-linear response to the value of the predictor. Following Brandt (1999), the optimal portfolio weights given a return predictor are now estimated directly through the investor's conditional Euler equations. In particular, denote by $\alpha_t$ the choice variable for the portfolio weight at time $t$; the first order conditions that characterize the portfolio optimization problem can be expressed as:

$$E_t\left[\left(\alpha_t e^{rx^{(n)}_{t+1} + r_{f,t}} + (1 - \alpha_t) e^{r_{f,t}}\right)^{-\gamma}\left(e^{rx^{(n)}_{t+1} + r_{f,t}} - e^{r_{f,t}}\right) \,\Big|\, z_t = z\right] = 0.$$

These FOCs serve as a set of moment conditions (one for each $z$), and the method of moments estimator is then applied separately at each value of the predictor.
Collectively, this yields a point-wise, or non-parametric, estimate of the allocation rule $\alpha(z)$.

Operationally, to replace the conditional expectation (point-wise in $z$) with a proper empirical counterpart, I use the sample analog with each observation weighted according to the similarity of its predictor level to the current value $z$. I adopt a normal kernel density $g\left(\frac{z_j - z}{h_t}\right)$ on each observation $\{rx^{(n)}_{j+1}, z_j\}$, where $h_t$ is a data dependent bandwidth. By standard practice, I set $h_t = 1.06\,\hat{\sigma}_z\, t^{-0.2}$, with $\hat{\sigma}_z$ being the standard deviation estimate of $\{z_j\}_{j=1}^{t}$ within the estimation window and $t$ the window length. The empirical moment condition at time $t$, denoted by $Q_t(\alpha_t)$, with $z_t = z$ and given choice variable $\alpha_t$, is expressed as

$$Q_t(\alpha_t) = \frac{\sum_{j=1}^{t-\tau}\left(\alpha_t e^{rx^{(n)}_{j+1} + r_{f,t}} + (1 - \alpha_t) e^{r_{f,t}}\right)^{-\gamma}\left(e^{rx^{(n)}_{j+1} + r_{f,t}} - e^{r_{f,t}}\right)\exp\left(-\frac{(z_j - z)^2}{2 h_t^2}\right)}{\sum_{j=1}^{t-\tau}\exp\left(-\frac{(z_j - z)^2}{2 h_t^2}\right)},$$

with the denominator normalizing the weights to sum to one. The optimal portfolio weight at time $t$, conditioning on $z_t = z$, is estimated through

$$\alpha(\phi_t) = \arg\min_{\alpha_t}\,(Q_t(\alpha_t))^2.$$

¹² The derivation follows Campbell and Viceira (2002) and uses a Taylor expansion of the log function to connect the log portfolio return with the log individual asset returns. Based on a log linear approximation, the log portfolio return is still normal and the corresponding moments can be computed analytically.

Note that the above procedure does not rely on any statistical model of the return process or any (parametric) functional form of the portfolio policy. Thus, the resulting estimator is less biased and robust to policy function mis-specification. However, non-parametric estimation comes at the cost of losing observations.
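The kernel-weighted moment condition and the minimization of $Q_t(\alpha_t)^2$ can be sketched as below. This is a simplified illustration and not the estimation code behind the reported results: it assumes a scalar predictor, a single risk-free rate for the whole window, and a plain grid search over candidate weights instead of a numerical optimizer. The function names and the default grid are hypothetical.

```python
import numpy as np


def kernel_weights(z_hist, z_now):
    """Normal-kernel weights with bandwidth h = 1.06 * sd(z) * n^(-0.2), normalized to sum to one."""
    z_hist = np.asarray(z_hist, dtype=float)
    h = 1.06 * z_hist.std(ddof=1) * z_hist.size ** (-0.2)
    w = np.exp(-((z_hist - z_now) ** 2) / (2.0 * h ** 2))
    return w / w.sum()


def euler_moment(alpha, rx, rf, weights, gamma):
    """Kernel-weighted sample analog of the CRRA first order condition."""
    risky = np.exp(rx + rf)
    safe = np.exp(rf)
    wealth = alpha * risky + (1.0 - alpha) * safe
    if np.any(wealth <= 0.0):
        return np.inf  # infeasible weight: wealth would be non-positive
    return np.sum(weights * wealth ** (-gamma) * (risky - safe))


def nonparametric_weight(rx, z_hist, z_now, rf, gamma, grid=None):
    """Grid-search estimate of the weight minimizing the squared moment."""
    if grid is None:
        grid = np.linspace(-10.0, 10.0, 2001)  # hypothetical search range
    rx = np.asarray(rx, dtype=float)
    w = kernel_weights(z_hist, z_now)
    q2 = np.array([euler_moment(a, rx, rf, w, gamma) ** 2 for a in grid])
    return grid[np.argmin(q2)]
```

A grid search is used here only because it is transparent and handles infeasible weights easily; the accuracy of the estimate is then limited by the grid spacing.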
As the effective sample size is decreased by kernel weighting, the variance of the estimated portfolio weight is increased (relative to a correctly specified parametric estimator).¹³ Such efficiency loss is particularly severe when the current value of the predictor, $z_t$, falls into a sparse region of $\{z_j\}_{j=1}^{t-\tau}$, and thus lacks similar observations. Technically, the denominator in the moment estimate, $\sum_{j=1}^{t-\tau}\exp\left(-\frac{(z_j - z_t)^2}{2 h_t^2}\right)$, would be too close to zero at a sparse state, and the portfolio weight estimate at this point/state would be poor. To partially address this concern, I consider trimming $z_t$ when $\sum_{j=1}^{t-\tau}\exp\left(-\frac{(z_j - z_t)^2}{2 h_t^2}\right)$ is below a certain threshold. As an example, we use the 10% quantile of the density estimates computed at all other observations, $\left\{\sum_{j=1}^{t-\tau}\exp\left(-\frac{(z_j - z_i)^2}{2 h_t^2}\right)\right\}_{i=1}^{t-\tau}$. When triggered, trimming switches the portfolio advice to the benchmark one. In this way, estimation errors at sparse states are controlled. Yet, the remaining additional risk at other states would still be disliked by a risk averse investor. Hence, in a limited data environment, it is not clear a priori whether non-parametric strategies will dominate the parametric ones.

¹³ See Brandt (1999) for the expression of the standard error of the estimated portfolio weights and the relevant discussion of the estimator's asymptotic properties.

1.3.3 Empirical findings

Statistical accuracy

I start by looking at the statistical accuracy of the above mentioned predictors in the out-of-sample environment. As a benchmark, I first use the historical mean, which complies with a no-predictability belief. I then switch to the constructed factors $z \in \{FB, \widehat{CP}, \widehat{cf}, \widehat{LN}, \widehat{TA}\}$. I plot in Figure 1.1 the 15-year rolling window return forecasts (green), averaged across maturities, $\frac{1}{4}\sum_{n=2}^{5} \widehat{rx}^{(n)}_{t+12}$, in comparison to the actual values (red). Each panel corresponds to a particular predictor.
As shown in the graph, the benchmark forecast based on the historical mean (top left) turns out too flat relative to the realized returns, which fluctuate heavily. In contrast, the forward spread $FB$ (top right) picks up some of the variation in realized returns. The forward curve factor $\widehat{CP}$ (middle left) further improves the forecast precision in many episodes. Yet, it breaks down in certain periods such as the early 80s and the 07-10 crisis. Similarly, the cycle factor $\widehat{cf}$ (middle right) captures many of the return spikes, especially from the mid 80s to the early 00s. But, like $\widehat{CP}$, it fails severely during the early 80s and the crisis period.¹⁴ The macro based predictor, $\widehat{LN}$ (bottom left), is relatively more persistent, but it catches a lower frequency return trend. Finally, the technical factor, $\widehat{TA}$ (bottom right), forecasts the correct sign in most instances and, like $FB$, does not perform too badly during the crisis.

¹⁴ Note that both $FB$ and $\widehat{CP}$ come back after the crisis.

Figure 1.1: Out-of-Sample Bond Return Forecasts with Different Predictors
Notes: This figure illustrates the 15-year rolling-window out-of-sample forecasts of annual bond excess return based on alternative predictors. Red lines represent the realized annual bond excess return (averaged over 2 to 5-year maturity bonds) and green lines denote, respectively, the predicted value based on the historical mean (Hist, top left); the Fama-Bliss (1987) forward spread (FB, top right); the Cochrane and Piazzesi (2005) forward factor (CP, middle left); the Cieslak and Povala (2012) cycle factor (Cycle, middle right); the Ludvigson and Ng (2009) macro factor (LN, bottom left) and the Goh et al (2011) technical analysis factor (TA, bottom right). The data sample ranges from Jan 1964 to Dec 2011 with bond prices observed at monthly frequency. Most constructed predictors span the same period, while the macro factor LN is an exception since the panel of macro series is available only until Dec 2007.
[Figure 1.1 panels: Realized Excess Return plotted against Historical Mean Values, FB Predicted Values, CP Predicted Values, Cycle Predicted Values, Macro (LN) Predicted Values and Technical Indicator Predicted Values; horizontal axis 1980 to 2010 (1980 to 2005 for LN), vertical axis -10 to 10.]

To provide a more quantitative assessment, I resort to the metric of out-of-sample $R^2_{OS}$, which measures the percentage reduction in mean squared forecast error (MSFE) by each return predictor $z$ relative to the random walk benchmark:

$$R^2_{OS} = 1 - \frac{\sum_{j=t}^{T-12}\left(r^{(n)}_{j+12} - \hat{r}^{(n),z}_{j+12}\right)^2}{\sum_{j=t}^{T-12}\left(r^{(n)}_{j+12} - \bar{r}^{(n)}_{j+12}\right)^2}.$$

Within this expression, $\hat{r}^{(n),z}_{j+12}$ and $\bar{r}^{(n)}_{j+12}$ are, respectively, the predictor driven and historical mean based annual excess return forecasts. Hence a positive $R^2_{OS}$ favors that predictor under mean squared loss. I employ the Clark and West (2007) MSFE-adjusted test to gauge the significance of $R^2_{OS}$. The null hypothesis is that the predictor does not reduce the expected squared forecast error, i.e., $H_0: E[R^2_{OS}] \le 0$, and it is tested against the one-sided alternative that it does, i.e., $H_A: E[R^2_{OS}] > 0$. I examine both the whole sample period (ending Dec 2011) and the pre-crisis one (ending Dec 2007). According to the p-values reported in Table 1.1, we find that, except for $FB$ at the 5-year maturity and $\widehat{LN}$ for the 2-year bond, all nulls are rejected at the 10% significance level.
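The out-of-sample $R^2_{OS}$ metric can be sketched as follows. This is an illustration under simplifying assumptions, not the code behind Table 1.1: the helper `rolling_mean_forecast` builds a one-step-ahead historical-mean benchmark and ignores the 12-month forecast horizon and return overlap, and both function names are hypothetical. The Clark and West adjustment is not implemented here.

```python
import numpy as np


def rolling_mean_forecast(series, window):
    """Historical-mean forecast of series[i + window] from series[i : i + window]."""
    series = np.asarray(series, dtype=float)
    means = np.convolve(series, np.ones(window) / window, mode="valid")
    return means[:-1]  # the last window has no realized target to forecast


def out_of_sample_r2(realized, forecast_model, forecast_benchmark):
    """1 - (sum of squared model errors) / (sum of squared benchmark errors)."""
    realized = np.asarray(realized, dtype=float)
    err_model = realized - np.asarray(forecast_model, dtype=float)
    err_bench = realized - np.asarray(forecast_benchmark, dtype=float)
    return 1.0 - np.sum(err_model ** 2) / np.sum(err_bench ** 2)
```

A value of zero means the predictor-based forecast does no better than the historical mean, and a negative value means it does worse in squared-error terms.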
We therefore conclude: (1) bond excess returns are not characterized by a random walk and (2) the predictors considered, $z \in \{FB, \widehat{CP}, \widehat{cf}, \widehat{LN}, \widehat{TA}\}$, are generally still valid in terms of statistical accuracy in this out-of-sample environment.¹⁵

¹⁵ We are aware of the multiple and simultaneous hypothesis testing issue: with a total of $5 \times 4 = 20$ hypotheses tested at the same time, the likelihood of witnessing a rare event, and therefore the family-wise type I error rate, increases (see Dunn (1961) and Holm (1979)). We apply the Bonferroni correction and the Holm-Bonferroni method as conservative ways to control the family-wise error rate. With whole sample data, we can no longer reject the null hypothesis of no predictability. But with pre-crisis data, the cycle factor survives these corrections since the associated p-values are lower than the significance level $\alpha = 0.05$ divided by 20.

Table 1.1: Statistical Accuracy of Bond Return Forecast in Rolling Window Scheme
Notes: This table documents the out-of-sample $R^2_{OS}$ in predicting the annual excess return of the n-year government bond, computed from a 15-year rolling window scheme. Entries in parentheses are the p-values of the Clark and West test, which assesses the significance of $R^2_{OS}$, i.e., $H_0: E[R^2_{OS}] \le 0$ against $H_A: E[R^2_{OS}] > 0$. Each column corresponds to a particular factor: Fama-Bliss (1987) (FB); Cochrane-Piazzesi (2005) (CP); Cieslak and Povala (2012) (cf); Ludvigson and Ng (2009) (LN) and Goh et al (2012)'s technical analysis (TA) factor, in that order; each pair of adjacent rows documents the evaluation results for a particular bond maturity, from 2 years to 5 years. The whole sample results are reported in Panel A and the pre-crisis (until Dec 2007) ones in Panel B.

A. Whole sample: 1964/01 : 2011/12
Maturity    FB         CP         cf         LN         TA
2-years     0.0686    -0.0710     0.1654    -0.0580     0.0219
           (0.0244)   (0.0240)   (0.0052)   (0.1309)   (0.0072)
3-years     0.0832    -0.0438     0.1405    -0.0265     0.0853
           (0.0134)   (0.0244)   (0.0050)   (0.0887)   (0.0071)
4-years     0.0910    -0.0222     0.1583    -0.0124     0.1233
           (0.0155)   (0.0207)   (0.0039)   (0.0741)   (0.0059)
5-years    -0.0001    -0.0142     0.1530    -0.0091     0.1556
           (0.1687)   (0.0257)   (0.0041)   (0.0734)   (0.0046)

B. Pre-crisis period: 1964/01 : 2007/12
Maturity    FB         CP         cf         LN         TA
2-years     0.0830     0.0394     0.2987    -0.0580     0.0403
           (0.0157)   (0.0061)   (0.0019)   (0.1309)   (0.0057)
3-years     0.0984     0.0752     0.2894    -0.0265     0.1061
           (0.0087)   (0.0057)   (0.0015)   (0.0887)   (0.0060)
4-years     0.0992     0.1002     0.3084    -0.0124     0.1455
           (0.0136)   (0.0041)   (0.0011)   (0.0741)   (0.0051)
5-years    -0.0009     0.1028     0.3077    -0.0091     0.1760
           (0.1819)   (0.0053)   (0.0011)   (0.0734)   (0.0044)

Portfolio (welfare) value – unconditional evaluation

I now turn to the welfare value of the above predictors in making bond allocation / market timing decisions. I depict in Figure 1.2 the rolling window portfolio choices of a CRRA investor using either parametric or non-parametric rules with 15 years of data. Within each panel, a particular predictor is considered and the resulting portfolio weights on the long term bond are plotted against those of the benchmark strategy. For illustration, I visualize only the case of risk aversion $\gamma = 10$ and long term bond maturity $n = 5$. We see that parametric rule weights based on alternative predictors are generally quite similar to the non-parametric ones, suggesting that a linear policy is a reasonable approximation. However, when the value of the predictor falls into a sparse region, non-parametric estimates get too noisy and are thus trimmed. Moreover, the magnitudes of the parametric and non-parametric (trimmed) weights range from about -300% to 400% (on the 5-year risky bond) for FB; -500% to 500% for CP; -800% to 800% for cf; -600% to 400% for LN; and -800% to 1000% for TA, which looks quite extreme. But this is not too surprising. Given that bond risk premiums are small and predictive regression $R^2$s are high (at least relative to the case of equity return prediction), it is tempting to take on some leverage in collecting those premiums.

Figure 1.2: Out-of-Sample Bond Portfolio Weights under Different Rules
Notes: This figure depicts 15-year rolling-window estimates of the portfolio weights based on alternative predictors interacted with different portfolio policy functions. For illustration purposes, only the weights of the allocation between the one-year risk free bond and the 5-year maturity bond are plotted. Different panels correspond to different bond predictors: the Fama-Bliss (1987) forward spread (FB, top left); the Cochrane and Piazzesi (2005) forward factor (CP, top right); the Cieslak and Povala (2012) cycle factor (Cycle, middle left); the Ludvigson and Ng (2009) macro factor (LN, middle right) and the Goh et al (2011) technical analysis factor (TA, bottom). Within each panel, the estimated portfolio weight based on predominant mean is plotted as a red line and serves as the benchmark. Blue and green curves denote the portfolio weights generated from, respectively, the parametric and non-parametric rules in conjunction with the corresponding predictor. Risk aversion is set at 10.
[Figure 1.2 panels: benchmark weight on the 5-year bond plotted against the parametric and trimmed non-parametric weights for FB, CP, Cycle, LN and TA; horizontal axis 1980 to 2010 (1980 to 2005 for LN).]

I then measure the performance of these allocation rules through the unconditional expected utilities they achieve. I report in Table 1.2, for each predictor in turn, the point estimates of the certainty equivalent gross returns (CERs) of the parametric and non-parametric strategies, as well as the unconditional inference results on their welfare benefits relative to the benchmark.¹⁶ From the rows Para CER and Non-para CER, we observe that the estimated CERs of predictor-based timing strategies are frequently lower than that of the no-predictor benchmark (row Bench CER). Taking the CP factor as an example, the resulting CER estimates for either policy under different bond maturities $n$ range from about 0.996 to 1.039, while those of the benchmark strategy are no lower than 1.066. Likewise, cf, LN and TA based market timing all deliver below-benchmark welfare estimates. In fact, the cycle factor, cf, which has the highest out-of-sample $R^2$, generates the lowest estimated CERs for each maturity (column) $n$. The forward spread FB, combined with the linear parametric policy, stands out as an exception, for which the CER estimates reach 1.0667 and 1.0675 when $n = 2, 3$.
However, according to the p-values of the unconditional tests (row p-value (P) in panel FB), those welfare improvements are not statistically significant.¹⁷ These findings tell us that, despite the evidence of non-random walk behavior / return predictability, it is indeed difficult to transform the information contained in the identified predictors into expected utility gains, at least with the above policies.

As an additional check, I contrast the performance of each parametric timing strategy against the corresponding non-parametric one with the same underlying predictor. In terms of CER estimates, we find the evidence is mixed, favoring non-parametric (and trimmed) policies when equipped with CP, LN, or TA, but leaning toward parametric ones when using FB or cf. In terms of equal performance tests, we gauge the significance of the estimated welfare differences and find that none of the p-values (rows p-value (P vs NP)) are below the conventional threshold.¹⁸ This indicates that correcting for potential non-linearity or using a less mis-specified policy does not significantly change strategy performance in this limited data bond allocation experiment.

Finally, I repeat all the unconditional performance evaluation analysis using only pre-crisis data (ending Dec 2007). I document the results in Table 1.3. We notice that all the conclusions are qualitatively the same, and hence our findings are not purely driven by the financial crisis of 2008.

¹⁶ We keep $\gamma = 10$ but include the whole range of long term bond maturities $n = 2, 3, 4, 5$. Analysis of other risk aversion levels is postponed to the robustness check section.

¹⁷ On the other hand, the under-performance of other strategies is not statistically significant either, as p-values are seldom smaller than 10%.

¹⁸ In our structural test, the utility difference is now between the parametric and non-parametric policies rather than against the no-predictor benchmark. The rest of the inference procedure remains the same.

Table 1.2: Unconditional Evaluation of Bond Allocation Rules
Notes: This table reports, for each predictor in turn, the unconditional evaluation results of alternative bond market timing strategies. The rows labeled "Para CER" and "Non-para CER" denote, respectively, the point estimates of the certainty equivalent gross return of the parametric and non-parametric allocation rules based on a particular predictor. The rows labeled "p-value (P)" and "p-value (NP)" report the p-values of an unconditional test of the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding timing rule. Rows labeled "p-value (P vs NP)" record testing results that compare parametric against non-parametric rules. Each column corresponds to a different maturity of the long term bond considered. All portfolios are constructed from 15-year rolling window estimates and the investor's risk aversion is set at 10.

Whole sample: 1964/01 : 2011/12
                       Maturity
                       2-year     3-year     4-year     5-year
Bench CER              1.0661     1.0669     1.0678     1.0666
FB
  Para CER             1.0675     1.0697     1.0683     1.0502
  p-value (P)         (0.7204)   (0.5423)   (0.9462)   (0.3758)
  Non-para CER         1.0660     1.0604     1.0503     1.0552
  p-value (NP)        (0.9682)   (0.4627)   (0.1847)   (0.0759)
  p-value (P vs NP)   (0.6957)   (0.3660)   (0.1335)   (0.7475)
CP
  Para CER             0.9680     0.9645     0.9470     0.9116
  p-value (P)         (0.1697)   (0.1825)   (0.1928)   (0.2375)
  Non-para CER         1.0032     0.9982     1.0260     1.0358
  p-value (NP)        (0.2419)   (0.2243)   (0.1471)   (0.1183)
  p-value (P vs NP)   (0.5257)   (0.5831)   (0.3043)   (0.2807)
cf
  Para CER             0.7223     0.4232     0.3181     0.2537
  p-value (P)         (0.2951)   (0.3040)   (0.3009)   (0.3030)
  Non-para CER         0.2935     0.0179     0.0194     0.0390
  p-value (NP)        (0.3054)   (0.2984)   (0.3055)   (0.3055)
  p-value (P vs NP)   (0.3054)   (0.2984)   (0.3055)   (0.3055)
LN
  Para CER             0.7765     0.8203     0.8525     0.8264
  p-value (P)         (0.3011)   (0.2960)   (0.2894)   (0.2970)
  Non-para CER         1.0519     1.0393     1.0255     1.0118
  p-value (NP)        (0.2628)   (0.1476)   (0.1991)   (0.2478)
  p-value (P vs NP)   (0.3045)   (0.3079)   (0.3217)   (0.3319)
TA
  Para CER             0.8808     0.7611     0.6951     0.6057
  p-value (P)         (0.2916)   (0.3017)   (0.3024)   (0.3012)
  Non-para CER         1.0360     0.9425     0.9303     0.9542
  p-value (NP)        (0.2133)   (0.2313)   (0.1916)   (0.1374)
  p-value (P vs NP)   (0.2988)   (0.3099)   (0.3089)   (0.3036)

Portfolio (welfare) value – conditional evaluation

Thus far, the focus has been on the unconditional notion of the welfare measure. But as mentioned above, the relative performance of market timing can be heterogeneous across different economic regimes. Accordingly, I conduct welfare estimates and utility benefit inferences conditional on each particular regime. In particular, I let the economy be in a boom / bust state if the contemporary unemployment rate is below / above its average (which amounts to 6.2% for our whole sample), and in a high / low turbulence regime when the past year's realized bond market volatility is greater / less than its mean.¹⁹

I present in Table 1.4 the estimated CERs of predictor-based timing strategies, along with the associated inference results, conditioning on the boom regime (left panel) and the bust regime (right panel). We notice that, while the forward spread FB and CP factors appear to generate higher CERs in recession than in boom, the rest of the factors create higher expected utility estimates in boom than in recession. However, conditioning on either regime, the relative performance of timing against the benchmark is generally not significant. Exceptions are the parametric policies driven by the technical analysis factor TA when conditioning on the boom regime. The associated p-values of the structural tests on the welfare difference (row p-value (P) in left panel TA) are below 5% when $n = 2, 3$, and the rejections favor the benchmark.
This suggests that one should avoid TA-based parametric timing with the 2-year or 3-year bond when the current economic state is a boom.

[19] In principle, we could have a more refined definition of the economic regime, such as three-stage or four-stage regimes, but this would reduce the number of observations available for the conditional tests and hence decrease power.

Table 1.3: Unconditional Evaluation of Bond Allocation Rules (continued)
Notes: This table repeats the analysis in Table 1.2 using pre-crisis data. The rows labeled "Para CER" and "Non-para CER" denote, respectively, the point estimates of the certainty equivalent gross return of the parametric and non-parametric allocation rules based on a particular predictor. The rows labeled "p-value (P)" and "p-value (NP)" report the p-values of an unconditional test of the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding timing rule. Rows labeled "p-value (P vs NP)" record testing results that compare parametric against non-parametric rules. Each column corresponds to a different maturity of the long-term bond considered. All portfolios are constructed from 15-year rolling window estimates and the investor's risk aversion is set at 10.

Pre-crisis sample: 1964/01 : 2007/12
                    Maturity
                    2-year    3-year    4-year    5-year
Bench CER           1.0665    1.0663    1.0667    1.0654
FB
Para CER            1.0694    1.0701    1.0667    1.0461
p-value (P)         (0.5035)  (0.4232)  (0.9918)  (0.3458)
Non-para CER        1.0666    1.0581    1.0463    1.0521
p-value (NP)        (0.9893)  (0.4008)  (0.1662)  (0.0583)
p-value (P vs NP)   (0.5157)  (0.2850)  (0.1211)  (0.7253)
CP
Para CER            1.0320    1.0344    1.0297    1.0289
p-value (P)         (0.2794)  (0.2040)  (0.1921)  (0.2428)
Non-para CER        1.0023    0.9961    1.0273    1.0394
p-value (NP)        (0.2929)  (0.2691)  (0.2261)  (0.2305)
p-value (P vs NP)   (0.3479)  (0.3646)  (0.9210)  (0.6842)
cf
Para CER            0.7131    0.4170    0.3134    0.2500
p-value (P)         (0.3004)  (0.3024)  (0.2991)  (0.3012)
Non-para CER        0.2892    0.0177    0.0191    0.0385
p-value (NP)        (0.3036)  (0.2966)  (0.3038)  (0.3038)
p-value (P vs NP)   (0.3036)  (0.2966)  (0.3038)  (0.3038)
LN
Para CER            0.7765    0.8203    0.8525    0.8264
p-value (P)         (0.3011)  (0.2960)  (0.2894)  (0.2970)
Non-para CER        1.0519    1.0393    1.0255    1.0118
p-value (NP)        (0.2628)  (0.1476)  (0.1991)  (0.2478)
p-value (P vs NP)   (0.3045)  (0.3079)  (0.3217)  (0.3319)
TA
Para CER            0.8701    0.7503    0.6851    0.5968
p-value (P)         (0.2896)  (0.2998)  (0.3006)  (0.2993)
Non-para CER        1.0297    0.9315    0.9194    0.9439
p-value (NP)        (0.1591)  (0.2197)  (0.1841)  (0.1292)
p-value (P vs NP)   (0.3006)  (0.3091)  (0.3075)  (0.3019)

Table 1.4: Conditional Evaluation of Bond Allocation Rules: Unemployment Rate
Notes: This table reports, for each predictor in turn, the evaluation results of alternative bond market timing strategies conditional on either a boom or a bust regime. A boom / bust regime is defined as the current level of the unemployment rate being below / above its whole sample mean. The rows labeled "Para CER" and "Non-para CER" denote, respectively, the point estimates of the certainty equivalent gross return of the parametric and non-parametric allocation rules based on a particular predictor. The rows labeled "p-value (P)" and "p-value (NP)" report the p-values of a test of the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding timing rule conditional on each regime. Each column corresponds to a different maturity of the long-term bond considered. All portfolios are constructed from 15-year rolling window estimates and the investor's risk aversion is set at 10.

Condition on    Boom                                      Bust
Maturity        2         3         4         5           2         3         4         5
Bench CER       1.0605    1.0623    1.0632    1.0614      1.0718    1.0714    1.0720    1.0714
FB
Para CER        1.0557    1.0585    1.0608    1.0616      1.0801    1.0810    1.0751    1.0390
p-value (P)     (0.3840)  (0.3873)  (0.6155)  (0.9463)    (0.1173)  (0.1964)  (0.8134)  (0.3546)
Non-para CER    1.0553    1.0442    1.0442    1.0486      1.0768    1.0770    1.0545    1.0607
p-value (NP)    (0.1065)  (0.2490)  (0.2302)  (0.1411)    (0.3177)  (0.1930)  (0.4320)  (0.2818)
CP
Para CER        0.9170    0.9158    0.8990    0.8581      1.0608    1.0482    1.0288    1.0199
p-value (P)     (0.1765)  (0.2088)  (0.2409)  (0.2769)    (0.4681)  (0.3802)  (0.3195)  (0.3062)
Non-para CER    0.9609    0.9560    0.9978    1.0126      1.0670    1.0613    1.0601    1.0625
p-value (NP)    (0.2401)  (0.2350)  (0.1821)  (0.1502)    (0.5758)  (0.3772)  (0.3752)  (0.4228)
cf
Para CER        0.9876    0.9744    0.9288    0.8181      0.6708    0.3917    0.2944    0.2348
p-value (P)     (0.1376)  (0.1818)  (0.2296)  (0.2817)    (0.2901)  (0.2921)  (0.2884)  (0.2909)
Non-para CER    1.0397    1.0378    1.0311    1.0239      0.2717    0.0166    0.0179    0.0361
p-value (NP)    (0.1966)  (0.2272)  (0.2874)  (0.3189)    (0.2935)  (0.2859)  (0.2932)  (0.2937)
LN
Para CER        1.0388    1.0275    1.0212    1.0307      0.7193    0.7625    0.7955    0.7684
p-value (P)     (0.3222)  (0.2909)  (0.2915)  (0.3095)    (0.2889)  (0.2896)  (0.2904)  (0.2901)
Non-para CER    1.0549    1.0494    1.0511    1.0522      1.0477    1.0281    1.0027    0.9804
p-value (NP)    (0.6707)  (0.3141)  (0.2768)  (0.2421)    (0.2623)  (0.1945)  (0.2221)  (0.2518)
TA
Para CER        1.0313    1.0406    1.0473    1.0548      0.8276    0.7070    0.6443    0.5608
p-value (P)     (0.0192)  (0.0232)  (0.0698)  (0.4108)    (0.3000)  (0.2935)  (0.2916)  (0.2892)
Non-para CER    1.0409    1.0429    1.0492    1.0563      1.0292    0.8939    0.8786    0.9049
p-value (NP)    (0.0794)  (0.1003)  (0.1415)  (0.5329)    (0.3543)  (0.2363)  (0.1849)  (0.1199)

Table 1.5 examines whether the relative performance of bond market timing is specific to the market turbulence regime. Following Viceira (2012), I measure such turbulence at annual frequency through the realized or integrated daily return volatility between time t−252 and t, i.e., $\sum_{i=t-252}^{t} (r^{n}_{i,d})^{2}$, where $r^{n}_{i,d}$ is the daily return of an n-year bond. Using this volatility measure, I report separately the portfolio evaluation results conditioning on the state of higher (left panel) and lower (right panel) than mean volatility. Based on the relevant CER estimates, we find that most timing strategies are more profitable in the low volatility state than in the high volatility state. One exception is the non-parametric policy coupled with the CP factor, which has higher CER estimates in the turbulent state. Conditioning on either volatility regime, none of the relative performance against the benchmark is significant according to the p-values of the conditional tests (rows p-value (P) and p-value (NP)).

Table 1.5: Conditional Evaluation of Bond Allocation Rules: Realized Volatility
Notes: This table reports, for each predictor in turn, the evaluation results of alternative bond market timing strategies conditional on either a high volatility or a low volatility regime. A high vol / low vol regime is defined as the current level of past annual realized volatility being above / below its whole sample mean. The rows labeled "Para CER" and "Non-para CER" denote, respectively, the point estimates of the certainty equivalent gross return of the parametric and non-parametric allocation rules based on a particular predictor. The rows labeled "p-value (P)" and "p-value (NP)" report the p-values of a test of the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding timing rule conditional on each regime. Each column corresponds to a different maturity of the long-term bond considered. All portfolios are constructed from 15-year rolling window estimates and the investor's risk aversion is set at 10.

Condition on    High volatility                           Low volatility
Maturity        2         3         4         5           2         3         4         5
Bench CER       1.0631    1.0659    1.0679    1.0680      1.0691    1.0679    1.0676    1.0652
FB
Para CER        1.0650    1.0696    1.0665    1.0358      1.0701    1.0697    1.0701    1.0668
p-value (P)     (0.5530)  (0.3901)  (0.8837)  (0.3398)    (0.8850)  (0.7961)  (0.7563)  (0.7060)
Non-para CER    1.0626    1.0673    1.0463    1.0533      1.0694    1.0539    1.0540    1.0570
p-value (NP)    (0.8922)  (0.7599)  (0.2853)  (0.1093)    (0.9470)  (0.3301)  (0.3576)  (0.2657)
CP
Para CER        0.9462    0.9357    0.9103    0.8653      0.9959    1.0052    1.0068    1.0076
p-value (P)     (0.2423)  (0.2193)  (0.2017)  (0.2304)    (0.1065)  (0.1014)  (0.1338)  (0.2120)
Non-para CER    1.0630    1.0605    1.0598    1.0613      0.9658    0.9599    1.0004    1.0152
p-value (NP)    (0.9873)  (0.5718)  (0.4883)  (0.4749)    (0.2216)  (0.2175)  (0.1508)  (0.1162)
cf
Para CER        0.6699    0.3917    0.2944    0.2348      1.0422    1.0365    1.0103    0.9590
p-value (P)     (0.2906)  (0.2980)  (0.2937)  (0.2968)    (0.4749)  (0.4884)  (0.3964)  (0.3347)
Non-para CER    0.2717    0.0166    0.0179    0.0361      1.0708    1.0653    1.0547    1.0448
p-value (NP)    (0.2994)  (0.2916)  (0.2982)  (0.2996)    (0.9305)  (0.9110)  (0.7045)  (0.6225)
LN
Para CER        0.7212    0.7636    0.7955    0.7696      1.0603    1.0601    1.0594    1.0588
p-value (P)     (0.2950)  (0.2902)  (0.2838)  (0.2907)    (0.5336)  (0.5834)  (0.6006)  (0.6991)
Non-para CER    1.0626    1.0576    1.0612    1.0652      1.0418    1.0234    0.9988    0.9766
p-value (NP)    (0.7651)  (0.4407)  (0.5036)  (0.7456)    (0.2619)  (0.1777)  (0.2075)  (0.2375)
TA
Para CER        0.8239    0.7063    0.6441    0.5608      1.0652    1.0707    1.0734    1.0759
p-value (P)     (0.2871)  (0.2954)  (0.2961)  (0.2948)    (0.7953)  (0.8300)  (0.6527)  (0.3382)
Non-para CER    1.0088    0.8892    0.8757    0.9026      1.0728    1.0710    1.0712    1.0706
p-value (NP)    (0.1745)  (0.2192)  (0.1774)  (0.1158)    (0.6949)  (0.7557)  (0.7255)  (0.5929)

Beyond inspecting heterogeneity over business cycle and volatility regimes, I further check whether the relative performance of market timing is persistent. For example, if market timing outperformed the benchmark last period, is it more or less likely to beat the benchmark this period?
To this end, I condition on the lagged utility benefit $\Delta U_{t-\tau,t} = U(\alpha(\phi_{t-\tau})) - U(\alpha_{0}(\phi_{t-\tau}/z_{t-\tau}))$ and define a positive / negative past performance state if $\Delta U_{t-\tau,t}$ > / < 0. I evaluate strategies conditioning on these two states and document the results in Table 1.6. Interestingly, we find that, except for the CP factor, market timing generates higher CER estimates when its lagged relative performance is negative. But, as before, conditioning on either state, the welfare difference against the benchmark is generally not significant.

Table 1.6: Conditional Evaluation of Bond Allocation Rules: Lagged Utility Difference
Notes: This table reports, for each predictor in turn, the evaluation results of alternative bond market timing strategies conditional on either a positive or a negative lagged relative performance regime. A positive / negative lagged relative performance regime is defined as the lagged realized utility difference being above / below zero. The rows labeled "Para CER" and "Non-para CER" denote, respectively, the point estimates of the certainty equivalent gross return of the parametric and non-parametric allocation rules based on a particular predictor. The rows labeled "p-value (P)" and "p-value (NP)" report the p-values of a test of the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding timing rule conditional on each regime. Each column corresponds to a different maturity of the long-term bond considered. All portfolios are constructed from 15-year rolling window estimates and the investor's risk aversion is set at 10.

Condition on    Positive lagged ∆U                        Negative lagged ∆U
Maturity        2         3         4         5           2         3         4         5
FB
Para CER        1.0600    1.0620    1.0614    1.0345      1.0715    1.0736    1.0709    1.0683
p-value (P)     (0.5685)  (0.8977)  (0.9961)  (0.3245)    (0.1821)  (0.2389)  (0.7643)  (0.7636)
Non-para CER    1.0540    1.0419    1.0454    1.0445      1.0762    1.0791    1.0431    1.0643
p-value (NP)    (0.9309)  (0.4116)  (0.3700)  (0.1130)    (0.9434)  (0.3194)  (0.3222)  (0.3174)
CP
Para CER        1.0267    1.0471    1.0597    1.0579      0.9373    0.9269    0.9034    0.8638
p-value (P)     (0.2506)  (0.3149)  (0.5566)  (0.5253)    (0.1858)  (0.1830)  (0.1829)  (0.2302)
Non-para CER    0.9587    1.0037    1.0674    1.0648      1.0406    0.9807    0.9973    1.0126
p-value (NP)    (0.2502)  (0.2462)  (0.5913)  (0.4703)    (0.0343)  (0.1837)  (0.1164)  (0.0792)
cf
Para CER        0.6844    0.4003    0.3015    0.2404      0.9710    0.9619    0.9061    0.7936
p-value (P)     (0.2999)  (0.3026)  (0.2988)  (0.3017)    (0.1307)  (0.2150)  (0.2199)  (0.2585)
Non-para CER    0.2714    0.0166    0.0180    0.0363      1.0359    1.0464    1.0245    1.0174
p-value (NP)    (0.3024)  (0.2933)  (0.3004)  (0.3017)    (0.3318)  (0.4554)  (0.3317)  (0.3566)
LN
Para CER        0.7228    0.7683    0.8016    0.7731      1.0480    1.0332    1.0270    1.0409
p-value (P)     (0.2998)  (0.2985)  (0.2987)  (0.2984)    (0.3609)  (0.3034)  (0.3009)  (0.3711)
Non-para CER    1.0335    1.0217    0.9964    0.9698      1.0756    1.0559    1.0535    1.0574
p-value (NP)    (0.2896)  (0.2147)  (0.2407)  (0.2489)    (0.4798)  (0.1202)  (0.1115)  (0.1982)
TA
Para CER        0.8321    0.7160    0.6559    0.5735      1.0380    1.0415    1.0476    1.0650
p-value (P)     (0.2998)  (0.3010)  (0.3010)  (0.2997)    (0.2075)  (0.2551)  (0.3551)  (0.9757)
Non-para CER    1.0257    0.8981    0.8783    0.9064      1.0398    1.0181    1.0425    1.0546
p-value (NP)    (0.2918)  (0.2580)  (0.1921)  (0.1215)    (0.1437)  (0.1951)  (0.1596)  (0.5978)

Understanding the failure of timing

To better understand the performance of timing strategies, I plot in Figure 1.3, for each predictor in turn, the rolling window realized returns of the parametric (blue curve) and non-parametric (green curve) policies in comparison to the benchmark one (red curve). I observe that, although market timing generates extra profits in many periods (blue/green curve above the red one), there are certain episodes in which it leads to huge losses. I attribute these large losses to the presence of severe mis-specification, especially un-modeled instability in the forecasting relation. For parametric strategies, such instability may show up as realized returns falling into the tail of the forecasted distribution at an abnormally high frequency.

Figure 1.3: Out-of-Sample Bond Portfolio Returns under Different Rules
Notes: This figure plots the out-of-sample realizations of the annual gross return of the portfolio rules depicted in the previous figure. As before, the illustrated portfolios are constructed by allocating between a one-year risk-free bond and a 5-year maturity bond. Different panels correspond to different bond predictors: Fama-Bliss (1987) forward spread (FB, top left); Cochrane and Piazzesi (2005) forward factor (CP, top right); Cieslak and Povala (2012) cycle factor (Cycle, middle left); Ludvigson and Ng (2009) macro factor (LN, middle right); and Goh et al. (2011) technical analysis factor (TA, bottom). Within each panel, the gross return of the portfolio based on predominant mean is plotted as a red line and serves as the benchmark. Blue and green curves denote the gross returns generated from, respectively, the parametric and non-parametric portfolio rules in conjunction with the corresponding predictor. Risk aversion is set at 10.
[Figure: five panels (FB, CP, Cycle, LN, TA). Horizontal axis: year, from 1980 onward; vertical axis: annual gross return. Series in each panel: benchmark return (5-yr bond), parametric rule, trimmed non-parametric rule.]

Table 1.7: Frequency of Forecast Breaks by Different Predictors
Notes: This table reports, for each maturity of the risky bond in turn, the frequency of forecast breaks by different return predictors over the out-of-sample periods. All forecasts are constructed with a 15-year rolling window. When the forecasted mean excess return is positive, a break is defined as a period where the realized excess return falls below the lower 2.5 percentile of the distributional forecast. In the other case, when the forecasted mean excess return is negative, a break is defined as a period where the realized excess return rises above the upper 2.5 percentile of the distributional forecast.

Maturity    FB      CP      cf      LN      TA
n = 2-yr    0.044   0.146   0.120   0.095   0.117
n = 3-yr    0.036   0.143   0.140   0.089   0.104
n = 4-yr    0.036   0.135   0.140   0.089   0.101
n = 5-yr    0.044   0.117   0.133   0.086   0.086

To explore this possibility, I identify failures or breaks in the return forecasts by each predictor over the out-of-sample periods. Specifically, when the forecasted mean excess return is positive / negative, we define a forecast break as a period in which the realized return falls below the lower / rises above the upper 2.5 percentile[20] of the forecasted distribution.[21] According to Table 1.7, the frequencies of such breaks exceed the theoretical expectation of 2.5% for all of the predictors. For example, the failure rate of the FB factor is around 4%, while those of the other predictors range from roughly 9% to 15%. Beyond this frequency analysis, I further examine whether these breaks are themselves predictable by the conditioning variables considered in the previous sections. In particular, I label the occurrence of breaks through a 0-1 sequence and run linear Probit regressions of this series on either the level of the unemployment rate, the lagged annualized volatility, or the lagged relative realized returns. The p-values on the significance of the slope coefficients are then documented in Table 1.8.
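The break definition above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the distributional forecast is normal with a given forecast mean and standard deviation; the function names `is_forecast_break` and `break_frequency` are hypothetical and are not taken from the thesis.

```python
from statistics import NormalDist


def is_forecast_break(forecast_mean, forecast_sd, realized, tail=0.025):
    """Flag a forecast break under an ASSUMED normal distributional forecast.

    A break is a realization in the tail that hurts the position implied by
    the sign of the forecasted mean excess return. Realizations in the
    favorable tail are not counted as breaks.
    """
    dist = NormalDist(mu=forecast_mean, sigma=forecast_sd)
    if forecast_mean > 0:
        # Positive forecast: break if realized return is below the lower tail cutoff.
        return realized < dist.inv_cdf(tail)
    if forecast_mean < 0:
        # Negative forecast: break if realized return is above the upper tail cutoff.
        return realized > dist.inv_cdf(1.0 - tail)
    return False


def break_frequency(forecast_means, forecast_sds, realized_returns, tail=0.025):
    """Share of out-of-sample periods flagged as breaks."""
    flags = [
        is_forecast_break(m, s, r, tail)
        for m, s, r in zip(forecast_means, forecast_sds, realized_returns)
    ]
    return sum(flags) / len(flags)
```

Under a correctly specified forecast distribution, the frequency returned by `break_frequency` should be close to the chosen tail probability (2.5% here), which is the yardstick the frequencies in Table 1.7 are compared against. The resulting 0-1 flags are also the kind of series one would use as the dependent variable in the break predictability regressions.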
As illustrated, forecast breaks are generally hard to predict by the above conditioning variables at the 5% significance level. Exceptions are the breaks in FB driven forecasts, which are predictable by lagged volatility. Yet, all these breaks are concentrated only at the beginning of the out-of-sample periods.[22] At 10% significance, there is weak evidence on the predictability of breaks in LN and CP driven forecasts by, respectively, the lagged volatility and the relative return. But unreported analysis suggests that switching between timing and no-timing strategies according to those predictions of breaks does not help to beat the benchmark.[23]

[20] We are attempting to identify "large" losses, hence the threshold for the tail of the distribution is chosen as a small number.
[21] Overshooting / undershooting when the forecasted mean excess return is positive / negative is not recognized as a break, since the portfolio benefits ex post and is not penalized by the loss function.
[22] Also, rejections in support of break forecastability do not survive the Bonferroni and Holm-Bonferroni corrections for multiple hypothesis testing.
[23] This is consistent with the results in the conditional evaluations, where failures of timing are found not to be specific to any business cycle, lagged volatility or relative performance based regime.

Table 1.8: Predictability of Forecast Breaks
Notes: This table examines the predictability of the identified forecast breaks by, respectively, the level of the unemployment rate (top panel), the lagged annualized realized volatility (middle panel) and the lagged relative realized return (bottom panel). Entries in brackets are the p-values of tests on the slope coefficients in linear Probit regressions of the break series on the corresponding instruments, with Newey-West corrections at a 12-month lag. The superscripts ∗, † and ‡ denote, respectively, the 1%, 5% and 10% significance levels. Different rows document results for long-term bonds of different maturities and each column analyzes a particular return predictor. All forecasts are constructed from 15 years of rolling data and the breaks are defined as in the notes of the previous table.

            FB         CP         cf         LN         TA
Maturity    By unemployment rate
n=2         (0.428)    (0.366)    (0.188)    (0.406)    (0.159)
n=3         (0.183)    (0.343)    (0.217)    (0.142)    (0.450)
n=4         (0.223)    (0.413)    (0.394)    (0.073)‡   (0.532)
n=5         (0.106)    (0.504)    (0.365)    (0.040)†   (0.745)
Maturity    By lagged volatility
n=2         (0.015)†   (0.405)    (0.283)    (0.082)‡   (0.354)
n=3         (0.006)∗   (0.527)    (0.202)    (0.076)‡   (0.136)
n=4         (0.039)†   (0.470)    (0.140)    (0.073)‡   (0.088)‡
n=5         (0.005)∗   (0.227)    (0.123)    (0.058)‡   (0.069)‡
Maturity    By lagged relative return
n=2         (0.242)    (0.087)‡   (0.431)    (0.006)∗   (0.557)
n=3         (0.245)    (0.109)    (0.382)    (0.657)    (0.920)
n=4         (0.141)    (0.092)‡   (0.557)    (0.842)    (0.721)
n=5         (0.032)†   (0.075)‡   (0.560)    (0.643)    (0.560)

In view of the above findings, we recognize that errors in forecasting model estimation and specification can indeed create a large welfare loss. A typical example is the strategy driven by the cycle factor, cf. As the in-sample (within estimation window) fit of this predictor is very good in many episodes,[24] the leverage ratio in the portfolio decision becomes extreme. While such high leverage effectively captures much of the risk premium when the forecast is correct, it also enormously magnifies the portfolio loss in the presence of forecast instability or breaks. On the other hand, the no-predictability benchmark, although mis-specified, does not create that amount of downside risk and hence is not dominated by the timing strategy. In addition, market timing in a limited data environment also renders the allocation decision more sensitive to data realizations. While the benchmark strategy only needs the unconditional mean and volatility estimates, a timing strategy requires the estimation of both the return predictor and a parametric or non-parametric policy.
Such complexity, in the form of additional parameters (for the parametric policy) or a different estimation procedure (for the non-parametric policy), leads to higher estimation uncertainty and eventually translates into extra volatility in realized returns. The same argument also explains why the less mis-specified non-parametric policy does not outperform the parametric one in a limited data environment. Finally, the negative effects of model complexity and forecast instability intertwine, and our findings suggest that the resulting portfolio losses are not dominated by the benefits of incorporating predictors, at least for the parametric and non-parametric policies.

[24] For the cf based predictive regression, the in-sample R^2 is high, or equivalently the variance of the unexplained / residual part is small, relative to all other predictors.

Shrinkage policy

Given the above findings, it is natural to ask whether we can reduce the loss due to model mis-specification and estimation in exploiting predictability. Here, I make one such attempt by compromising between our benchmark strategy and dogmatic market timing. In particular, I adopt a shrinkage strategy, suggested in Connor (1997) and Brandt (2009) and implemented through a Bayesian predictive regression with an informative prior on the slope coefficients. By setting the prior to be no-predictability (slope equal to zero), I effectively shrink the estimated return forecast towards the unconditional mean and only partially exploit the predictability. With confidence in the prior expressed in terms of the expected R^2 in the predictive regression, the shrunk return forecast $\hat{r}^{s}_{t+1}$ can then be derived as

$$\hat{r}^{s}_{t+1} = \left[1 - \frac{t}{t + \frac{1}{\rho}}\right]\hat{\bar{r}} + \left[\frac{t}{t + \frac{1}{\rho}}\right]\hat{\beta}_{ols} z_{t}, \quad \text{for } \rho = E^{prior}\left[\frac{R^{2}}{1 - R^{2}}\right],$$

where $\hat{\bar{r}}$ is the estimated unconditional mean return, $\hat{\beta}_{ols} z_{t}$ is the original forecast, $t$ is the sample size and $\frac{t}{t + \frac{1}{\rho}}$ is the shrinkage factor.[25] Such a representation can be thought of as an intermediate view between the benchmark and the predictive regression, with the shrinkage factor as the relative weight. In specifying this weight, I examine the whole range of prior confidence, or equivalently of the shrinkage factor, to see whether this strategy has the potential to outperform the benchmark.

I report in Table 1.9, for each bond maturity in turn, the estimated certainty equivalent returns of various shrinkage strategies. I consider shrinkage factors ranging from 0.1 to 0.9 with an increment of 0.1.[26] As the degree of shrinkage gets larger (shrinkage factor smaller), the estimated CERs first increase and then drop. This pattern holds for all predictors. It suggests that, when return forecasts are shrunk, the reduction in estimation and specification risks initially benefits welfare despite the distorted return forecast. But gradually, the risk reduction becomes limited and the forecast distortion becomes too large.
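As a minimal sketch of the shrinkage formula above, the snippet below computes the shrinkage factor and the shrunk forecast from the unconditional mean, the original (OLS) forecast, the sample size and ρ. The function names, and the helper that backs out the ρ implied by a target shrinkage factor, are hypothetical illustrations rather than the thesis's own code.

```python
def shrinkage_factor(t, rho):
    """Weight on the original predictive-regression forecast: t / (t + 1/rho)."""
    return t / (t + 1.0 / rho)


def shrunk_forecast(mean_return, ols_forecast, t, rho):
    """Convex combination of the unconditional mean and the OLS forecast."""
    w = shrinkage_factor(t, rho)
    return (1.0 - w) * mean_return + w * ols_forecast


def rho_for_factor(t, factor):
    """Back out the rho implied by a target shrinkage factor strictly between 0 and 1.

    Solving factor = t / (t + 1/rho) for rho gives rho = factor / (t * (1 - factor)).
    """
    return factor / (t * (1.0 - factor))


# Illustrative numbers only: 180 observations (an assumed 15 years of monthly
# data) and a target shrinkage factor of 0.5.
rho = rho_for_factor(180, 0.5)
forecast = shrunk_forecast(mean_return=0.01, ols_forecast=0.03, t=180, rho=rho)
```

With a factor of 0.5 the shrunk forecast sits halfway between the unconditional mean and the original forecast; a factor of 0 reproduces the benchmark forecast and a factor of 1 reproduces dogmatic market timing.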
When compared against the benchmark, we find that at certain (high) levels of shrinkage (low values of the factor), and especially for the strategies based on the TA predictor, the welfare benefit is statistically significant at conventional levels.[27] For example, with a 5-year maturity bond, the TA driven strategy with 50% shrinkage generates a gain of almost 100 basis points in certainty equivalent, which is significant at the 5% level.[28] This indicates that the investor can partially exploit return predictability without the gain being completely offset by the associated estimation and mis-specification risks.

Table 1.9: Unconditional Evaluation of Shrinkage Strategies
Notes: This table reports, for each maturity of the risky bond in turn, the evaluation results of alternative portfolio shrinkage strategies implemented through Bayesian predictive regressions with a random walk prior. Entries are the estimated certainty equivalent returns (CER) of different strategies, varying by both the predictor used (each column) and the shrinkage factor (each row). The superscripts ∗, † and ‡ indicate that the null hypothesis that the no-predictability benchmark strategy achieves the same expected utility as the corresponding shrinkage rule is rejected at the 1%, 5% or 10% significance level, respectively. All portfolios are constructed from 15-year rolling window estimates and the investor's risk aversion is set at 10.

          Maturity n=2 (Bench CER: 1.0661)                    Maturity n=3 (Bench CER: 1.0669)
shrink    FB        CP        cf        LN        TA          FB        CP        cf        LN        TA
0.1       1.0666    1.0669    1.0684‡   1.0677    1.0673      1.0677‡   1.0680    1.0697‡   1.0671    1.0688†
0.2       1.0670    1.0672    1.0700    1.0686    1.0682      1.0684    1.0685    1.0714    1.0677    1.0703‡
0.3       1.0673    1.0670    1.0705    1.0691    1.0685      1.0689    1.0685    1.0718    1.0679    1.0712
0.4       1.0676    1.0661    1.0693    1.0690    1.0679      1.0693    1.0679    1.0697    1.0677    1.0711
0.5       1.0678    1.0645    1.0651    1.0679    1.0655      1.0696    1.0665    1.0622    1.0669    1.0693
0.6       1.0679    1.0618    1.0538    1.0650    1.0590      1.0697    1.0642    1.0388    1.0653    1.0637
0.7       1.0679    1.0580    1.0244    1.0590    1.0414      1.0697    1.0610    0.9683    1.0626    1.0481
0.8       1.0679    1.0531    0.9548    1.0469    0.9932      1.0696    1.0567    0.8175    1.0582    1.0060
0.9       1.0678    1.0471    0.8350    1.0246    0.8918      1.0694    1.0514    0.6235    1.0515    0.9166

          Maturity n=4 (Bench CER: 1.0678)                    Maturity n=5 (Bench CER: 1.0666)
shrink    FB        CP        cf        LN        TA          FB        CP        cf        LN        TA
0.1       1.0688‡   1.0691    1.0713†   1.0674    1.0703†     1.0668    1.0676    1.0703†   1.0661    1.0695∗
0.2       1.0695‡   1.0696    1.0736    1.0679    1.0724†     1.0669    1.0679    1.0727†   1.0665    1.0720∗
0.3       1.0701    1.0696    1.0744    1.0681    1.0739†     1.0668    1.0675    1.0735    1.0667    1.0740∗
0.4       1.0705    1.0688    1.0725    1.0679    1.0746‡     1.0665    1.0664    1.0715    1.0665    1.0753†
0.5       1.0707    1.0672    1.0647    1.0673    1.0741      1.0661    1.0642    1.0637    1.0658    1.0755†
0.6       1.0707    1.0644    1.0397    1.0660    1.0713      1.0655    1.0608    1.0409    1.0645    1.0735
0.7       1.0705    1.0604    0.9633    1.0639    1.0637      1.0646    1.0559    0.9761    1.0624    1.0665
0.8       1.0700    1.0550    0.8020    1.0607    1.0447      1.0634    1.0492    0.8323    1.0594    1.0470
0.9       1.0693    1.0483    0.5991    1.0561    1.0030      1.0619    1.0405    0.6335    1.0549    0.9989

Table 1.10: Unconditional Evaluation of Shrinkage Strategies (Pre-crisis data)
Notes: This table repeats the analysis in Table 1.9 using pre-crisis data. Entries are the estimated certainty equivalent returns (CER) of different strategies, varying by both the predictor used (each column) and the shrinkage factor (each row). The superscripts ∗, † and ‡ indicate that the null hypothesis that the no-predictability benchmark strategy achieves the same expected utility as the corresponding shrinkage rule is rejected at the 1%, 5% or 10% significance level, respectively. All portfolios are constructed from 15-year rolling window estimates and the investor's risk aversion is set at 10.

          Maturity n=2 (Bench CER: 1.0665)                    Maturity n=3 (Bench CER: 1.0663)
shrink    FB        CP        cf        LN        TA          FB        CP        cf        LN        TA
0.1       1.0672    1.0682    1.0703∗   1.0677    1.0677      1.0673‡   1.0682‡   1.0706∗   1.0671    1.0681‡
0.2       1.0677    1.0694    1.0734∗   1.0686    1.0685      1.0681‡   1.0696    1.0741∗   1.0677    1.0694
0.3       1.0683    1.0702    1.0757†   1.0691    1.0686      1.0688‡   1.0706    1.0764†   1.0679    1.0702
0.4       1.0687    1.0705    1.0764‡   1.0690    1.0677      1.0693    1.0711    1.0764    1.0677    1.0699
0.5       1.0690    1.0700    1.0739    1.0679    1.0649      1.0697    1.0710    1.0704    1.0669    1.0677
0.6       1.0693    1.0686    1.0631    1.0650    1.0575      1.0700    1.0701    1.0454    1.0653    1.0613
0.7       1.0695    1.0662    1.0304    1.0590    1.0378      1.0701    1.0684    0.9663    1.0626    1.0439
0.8       1.0695    1.0626    0.9524    1.0469    0.9857      1.0701    1.0657    0.8081    1.0582    0.9986
0.9       1.0695    1.0580    0.8262    1.0246    0.8810      1.0699    1.0622    0.6145    1.0515    0.9060

          Maturity n=4 (Bench CER: 1.0667)                    Maturity n=5 (Bench CER: 1.0654)
shrink    FB        CP        cf        LN        TA          FB        CP        cf        LN        TA
0.1       1.0677    1.0686‡   1.0716∗   1.0674    1.0690†     1.0655    1.0673‡   1.0706∗   1.0661    1.0681∗
0.2       1.0686    1.0701‡   1.0757∗   1.0679    1.0710†     1.0655    1.0686    1.0751∗   1.0665    1.0705∗
0.3       1.0692    1.0711    1.0786∗   1.0681    1.0724‡     1.0653    1.0695    1.0785∗   1.0667    1.0724†
0.4       1.0697    1.0715    1.0792‡   1.0679    1.0730      1.0650    1.0698    1.0799†   1.0665    1.0736†
0.5       1.0699    1.0712    1.0736    1.0673    1.0723      1.0644    1.0693    1.0761    1.0658    1.0737‡
0.6       1.0699    1.0701    1.0478    1.0660    1.0691      1.0637    1.0678    1.0559    1.0645    1.0714
0.7       1.0696    1.0680    0.9624    1.0639    1.0607      1.0626    1.0650    0.9846    1.0624    1.0637
0.8       1.0691    1.0649    0.7929    1.0607    1.0399      1.0613    1.0606    0.8274    1.0594    1.0424
0.9       1.0684    1.0606    0.5905    1.0561    0.9953      1.0596    1.0544    0.6251    1.0549    0.9911

[25] For a multivariate predictive regression, each slope coefficient is shrunk according to its marginal degree of predictability, i.e., by $\frac{t}{t + \frac{1}{\rho_{j}}}$, where $\rho_{j} = E\left[\frac{R_{j}^{2}}{1 - R^{2}}\right]$ and $R_{j}^{2}$ is the marginal coefficient of determination of variable $j$. See Connor (1997) and Brandt (2009) for more detail.
[26] A value of 0 or 1 corresponds to the two extreme cases of the benchmark and dogmatic market timing.
[27] However, with a total of 45 hypotheses tested simultaneously, the rejection of the null does not survive the Bonferroni correction that controls the familywise type I error.
[28] Showing that certain choices of the shrinkage factor lead to portfolio out-performance using only one realized data set is in fact data mining. A reality check in the spirit of White (2000) may be done, but the inference needs to comply with the framework of our limited data, utility-based forecast evaluation.
[29] However, as before, rejections of equal performance under multiple hypotheses do not survive the Bonferroni correction.

The results become stronger when we conduct the same analysis using pre-crisis data only. As shown in Table 1.10, the cycle factor cf with a shrinkage factor of 0.1 to 0.4 now also leads to a significant CER benefit, with a magnitude of 50 to 130 basis points.[29] In addition, and as will be shown in the robustness section below, the range of desirable shrinkage that results in utility improvement is not specific to the level of risk aversion. One explanation is that shrinkage here only measures the conservativeness of the forecasted return distribution.
It is not directly affected by the risk averselevel, which will be accounted for in the portfolio decision stage.1.3.4 Other strategiesIn this subsection, I check several other portfolio strategies that have been studiedin the asset allocation literature to see whether they can effectively extract informa-37tion in the return predictors. I focus only on the unconditional evaluation as it turnsout that relative performance is not specific to any economic regime.Volatility timingI first examine the role of volatility timing. So far, our parametric policy has reliedon a distributional forecast with only time varying conditional mean.30 But condi-tional volatility is also fluctuating in a predictable way. Motivated by the stylizedfacts of volatility clustering, I follow Campbell and Thompson (2008) to estimateits dynamics through a rolling window sample variance estimator applied to recentsubset of available data.31 Specifically, I estimate conditional volatility using either3 or 5-year window of past realized returns. A shorter window length for volatilityestimation may reflect the dynamics more timely. Yet, the associated estimationuncertainty and its negative effect would be increased. I compute the rolling win-dow allocation decisions of modified parametric strategies with both (3 and 5 year)volatility timing schemes. The estimated strategies performance and inference re-sults are documented in Table 1.11, for each predictor and risky bond’s maturity.Comparing with entries in Table 1.2, rows “Para CER”, we find that adding volatil-ity timing reduce the welfare estimates when the strategies are based on FB, CP,and LN predictors. However, under cycle factor and a 5 year window for volatilityestimation, estimated CERs are increased relative to the corresponding parametricstrategy without volatility timing. 
Under the technical analysis factor TA, the evidence is mixed: volatility timing generates a marginal benefit only when the risky bond's maturity is long (n = 4 or n = 5), yet it hurts expected utility in the other two cases (n = 2 or n = 3). As for the comparison between rows “CER V3” and rows “CER V5”, we see that the shorter (3-year) window for volatility estimation (i.e., CER V3) lowers the welfare estimates substantially. The exception is again the TA based strategies, for which the estimated CERs using the 3-year volatility window are higher (only slightly so for n = 4 and n = 5). Finally, when compared against the benchmark, none of the volatility timing strategies beats it, suggesting that portfolio losses due to potential mis-specification in the mean and volatility dynamics, as well as estimation uncertainty, are not dominated by the benefit of market timing.

³⁰ Our predictive regression is assumed to be homoscedastic.
³¹ As additional checks, I also examine some more heavily parameterized models of volatility dynamics. I use an exponentially decaying weighted sample variance or a GARCH type volatility model. The estimated welfare is even worse than under the simpler timing strategy considered here, and these results are thus omitted.

Table 1.11: Volatility Timing
Notes: This table reports, for each predictor in turn, the evaluation results of alternative parametric strategies with volatility timing. The volatility forecasts are based on either a 3-year or 5-year moving window of the past sample variance. The rows labeled “CER V3” and “CER V5” denote, respectively, the point estimates of the certainty equivalent gross return of the parametric rules based on a 3-year and 5-year window volatility estimate. The rows labeled “p-value” report the p-values of an unconditional test on the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding mean and volatility timing rule. Each column corresponds to a different maturity of the long term bond considered. All portfolios are constructed by 15 years rolling window estimate and investor's risk aversion is set at 10.

Whole sample: 1964/01 : 2011/12
Maturity        2-year     3-year     4-year     5-year
Bench CER       1.0661     1.0669     1.0678     1.0666
FB
  CER V3        0.6781     0.8545     0.5643     0.0194
  p-value      (0.2777)   (0.2683)   (0.3029)   (0.3055)
  CER V5        1.0584     1.0420     0.9709     1.0019
  p-value      (0.5772)   (0.4139)   (0.3115)   (0.2715)
CP
  CER V3        0.0171     0.0171     0.0171     0.0162
  p-value      (0.2927)   (0.2927)   (0.2339)   (0.2332)
  CER V5        0.8157     0.6487     0.4999     0.0194
  p-value      (0.1680)   (0.2973)   (0.3038)   (0.3055)
cf
  CER V3        0.3541     0.5403     0.0293     0.0194
  p-value      (0.3023)   (0.2718)   (0.3055)   (0.3055)
  CER V5        0.9377     0.8244     0.6003     0.5128
  p-value      (0.0914)   (0.2552)   (0.2944)   (0.1817)
LN
  CER V3        0.0396     0.0191     0.0191     0.0191
  p-value      (0.3038)   (0.3038)   (0.3038)   (0.3038)
  CER V5        0.6544     0.3949     0.2528     0.3025
  p-value      (0.2794)   (0.3036)   (0.3038)   (0.3037)
TA
  CER V3        0.5983     0.7495     0.8918     0.8799
  p-value      (0.1661)   (0.2479)   (0.0946)   (0.2311)
  CER V5        0.5092     0.6823     0.8878     0.8794
  p-value      (0.3040)   (0.3027)   (0.2944)   (0.2991)

Multiple factors prediction

In the second exercise, I investigate whether combining predictors based on different sources of information, such as the yield curve, macro fundamentals and technical analysis, can lead to superior allocation decisions. In particular, I generate return forecasts by running multivariate predictive regressions that include identified predictors from different categories simultaneously. I document in Table 1.12 and Table 1.13 the evaluation results of parametric strategies based on a list of possible combinations of identified factors.³² According to the rows “CER”, we find that multiple factors prediction does not improve asset allocation. In fact, when compared against the rows “Para CER” in Table 1.2, we see that it even impairs welfare relative to the strategies using an individual factor.
For example, the CP factor alone generates estimated CERs above 0.91 and those of the TA based strategy fall between 0.60 and 0.88, but combining TA with CP creates CER estimates (panel CP + TA) lower than those of either univariate strategy. Also, a tri-variate predictive regression (such as CP + LN + TA) further deteriorates the estimated strategy performance relative to any of the bivariate ones (such as LN + TA). I attribute this unfavorable performance to the following. While adding a predictor from a distinct category may provide complementary information, it increases the complexity of the forecasting model and turns out to worsen the effect of instability and estimation uncertainty. In a limited data environment, such extra risk is again not outweighed by the gain from adding information.

³² I do not combine factors within the same category of yield curve, as these factors are nested in each other and based on the same raw information.

Table 1.12: Multiple Factors Prediction
Notes: This table reports the evaluation results of alternative parametric strategies based on return forecasts generated by multiple factors predictive regressions. Here, a variety of combinations between the technical analysis factor TA and identified predictors in the yield curve categories are considered. The rows labeled “CER” denote the point estimates of the certainty equivalent gross return and the rows labeled “p-value” report the p-values of an unconditional test on the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding timing rule. The top panel reports the whole sample results and the pre-crisis (until Dec 2007) ones are shown in the bottom panel. Each column corresponds to a different maturity of the long term bond considered. All portfolios are constructed by 15 years rolling window estimate and investor's risk aversion is set at 10.

Whole sample: 1964/01 : 2011/12
Maturity        2-year     3-year     4-year     5-year
Bench CER       1.0661     1.0669     1.0678     1.0666
FB + TA
  CER           0.8822     0.6826     0.6894     0.7434
  p-value      (0.2935)   (0.3036)   (0.3013)   (0.2926)
CP + TA
  CER           0.8278     0.5989     0.4689     0.3786
  p-value      (0.2817)   (0.3026)   (0.3026)   (0.3024)
cf + TA
  CER           0.1883     0.0194     0.0179     0.0171
  p-value      (0.3055)   (0.3055)   (0.2984)   (0.2927)

Pre-crisis sample: 1964/01 : 2007/12
Maturity        2-year     3-year     4-year     5-year
Bench CER       1.0665     1.0663     1.0667     1.0654
FB + TA
  CER           0.8296     0.8041     0.6866     0.5612
  p-value      (0.2925)   (0.2967)   (0.3005)   (0.3006)
CP + TA
  CER           0.8186     0.5902     0.4621     0.3730
  p-value      (0.2904)   (0.3016)   (0.3010)   (0.3007)
cf + TA
  CER           0.1856     0.0191     0.0177     0.0169
  p-value      (0.3038)   (0.3038)   (0.2966)   (0.2908)

Table 1.13: Multiple Factors Prediction (continued)
Notes: This table reports the evaluation results of alternative parametric strategies based on return forecasts generated by multiple factors predictive regressions. Here, a variety of combinations between the macro factor LN, or the macro plus technical indicator LN + TA, and identified predictors in the yield curve categories are considered. Notice that the macro factor LN is available only until Dec 2007. The rows labeled “CER” denote the point estimates of the certainty equivalent gross return and the rows labeled “p-value” report the p-values of an unconditional test on the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding timing rule. Each column corresponds to a different maturity of the long term bond considered. All portfolios are constructed by 15 years rolling window estimate and investor's risk aversion is set at 10.

Pre-crisis sample: 1964/01 : 2007/12
Maturity        2-year     3-year     4-year     5-year
Bench CER       1.0665     1.0663     1.0667     1.0654
FB + LN
  CER           0.8909     0.8582     0.8520     0.8270
  p-value      (0.2827)   (0.2467)   (0.2158)   (0.2397)
CP + LN
  CER           0.7484     0.5346     0.4528     0.4307
  p-value      (0.3003)   (0.3035)   (0.3037)   (0.3034)
cf + LN
  CER           0.0217     0.0191     0.0191     0.0191
  p-value      (0.3038)   (0.3038)   (0.3038)   (0.3038)
TA + LN
  CER           0.7101     0.5741     0.5118     0.4136
  p-value      (0.3024)   (0.3035)   (0.3035)   (0.3036)
FB + TA + LN
  CER           0.4963     0.5882     0.5704     0.4437
  p-value      (0.3036)   (0.3030)   (0.3029)   (0.3034)
CP + TA + LN
  CER           0.6627     0.4015     0.2722     0.1731
  p-value      (0.3019)   (0.3036)   (0.3037)   (0.3038)
cf + TA + LN
  CER           0.1209     0.0191     0.0177     0.0169
  p-value      (0.3038)   (0.3038)   (0.2966)   (0.2908)

Fixed weight strategies

DeMiguel et al. (2009) report that, in equity asset allocation, a 1/N naive diversification among all individual assets outperforms a variety of sample based mean-variance rules in an out-of-sample environment. Motivated by their findings, I investigate a similar strategy in the bond market that diversifies between the 1-yr short term bond and the n-yr risky one using a fixed portfolio weight.³³ But rather than putting a 50%-50% equal weight on each asset, we consider a list of weights (on the risky bond) ranging from 0.25 to 1.25 with an increment of 0.25.³⁴ According to Table 1.14, we find that these fixed weight diversification and leverage strategies do not significantly outperform the benchmark. This suggests that such attempts to eliminate estimation error by completely ignoring the return forecast do not generate enough welfare gain.

³³ I thank Prof. Garlappi for suggesting this additional check.
³⁴ Strictly speaking, I have expanded pure diversification (wt < 1) to include asset selection (wt ≡ 1) and fixed leveraging (wt ≡ 1.25). But the common feature is that portfolio weights are constant over time.

Table 1.14: Fixed Weight Strategies
Notes: This table reports the evaluation results of alternative fixed weight strategies. The fixed weights considered (on the long term bond) range from 0.25 to 1.25 with an increment of 0.25. The rows labeled “CER” denote the point estimates of the certainty equivalent gross return and the rows labeled “p-value” report the p-values of an unconditional test on the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding fixed weight rule. Each column corresponds to a different maturity of the long term bond considered. All portfolios are constructed by 15 years rolling window estimate and investor's risk aversion is set at 10.

Whole sample: 1964/01 : 2011/12
Maturity        2-year     3-year     4-year     5-year
Bench CER       1.0661     1.0669     1.0678     1.0666
αt=0.25
  CER           1.0576     1.0592     1.0607     1.0613
  p-value      (0.2814)   (0.2784)   (0.2943)   (0.4160)
αt=0.5
  CER           1.0595     1.0623     1.0644     1.0650
  p-value      (0.3824)   (0.4850)   (0.6029)   (0.8068)
αt=0.75
  CER           1.0612     1.0647     1.0668     1.0665
  p-value      (0.5017)   (0.7244)   (0.8783)   (0.9847)
αt=1
  CER           1.0628     1.0663     1.0676     1.0655
  p-value      (0.6341)   (0.9230)   (0.9776)   (0.9068)
αt=1.25
  CER           1.0641     1.0671     1.0668     1.0618
  p-value      (0.7710)   (0.9758)   (0.9128)   (0.7065)

One may argue that naive diversification works for the equity market since the equity risk premium is large and always positive, while the bond risk premium can switch sign across states. In response, I check a conditional fixed weight strategy that depends on the sign of the directional forecast. For example, I put a weight of 0.75 on the long term bond when the forecasted risk premium is positive and a weight of -0.25 (a short position) otherwise.³⁵ We summarize some of the results in Table 1.15.
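The conditional fixed weight rule can be sketched in a few lines. This is an illustrative sketch only: the function names, the treatment of a forecast of exactly zero as non-positive, the two-asset return formula, and the sample numbers are assumptions, not details taken from the thesis.

```python
def conditional_fixed_weight(forecast_excess_return, w_pos=0.75, w_neg=-0.25):
    """Weight on the long term bond, chosen by the sign of the forecast.

    A forecast of exactly zero is treated as non-positive (an assumption).
    """
    return w_pos if forecast_excess_return > 0 else w_neg


def portfolio_gross_return(weight, short_bond_gross, long_bond_gross):
    """Gross return of a two-asset portfolio with `weight` on the long bond."""
    return (1.0 - weight) * short_bond_gross + weight * long_bond_gross


# Hypothetical forecasts and realized gross returns, for illustration only.
w_up = conditional_fixed_weight(0.012)     # positive forecast: hold 0.75
w_down = conditional_fixed_weight(-0.004)  # negative forecast: short 0.25
r_up = portfolio_gross_return(w_up, short_bond_gross=1.05, long_bond_gross=1.09)
```

Only the sign of the forecast enters the rule, so the magnitude of any estimation error in the forecast has no effect on the position size.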
As before, the benchmark is still not significantly outperformed by any conditional fixed weight rule based on any predictor.

³⁵ I check a range of other values for the conditional fixed weights. The results are qualitatively the same and are therefore omitted.

Table 1.15: Conditional Fixed Weight Strategies
Notes: This table reports, for each predictor in turn, the evaluation results of alternative conditional fixed weight strategies. The fixed weight considered (on the long term bond) equals 0.75 when the forecasted future expected return is positive and -0.25 otherwise. The rows labeled “CER” denote the point estimates of the certainty equivalent gross return and the rows labeled “p-value” report the p-values of an unconditional test on the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding conditional fixed weight rule. Each column corresponds to a different maturity of the long term bond considered. All portfolios are constructed by 15 years rolling window estimate and investor's risk aversion is set at 10.

Whole sample: 1964/01 : 2011/12
Rule: w = 0.75 if E_t(r_{t+1}) > 0, and w = −0.25 if E_t(r_{t+1}) < 0
Maturity        2-year     3-year     4-year     5-year
Bench CER       1.0661     1.0669     1.0678     1.0666
FB
  CER           1.0612     1.0651     1.0680     1.0667
  p-value      (0.4934)   (0.7578)   (0.9811)   (0.9914)
CP
  CER           1.0579     1.0596     1.0610     1.0614
  p-value      (0.3254)   (0.3611)   (0.4043)   (0.5223)
cf
  CER           1.0601     1.0633     1.0661     1.0672
  p-value      (0.4563)   (0.6420)   (0.8339)   (0.9474)
LN
  CER           1.0658     1.0673     1.0678     1.0665
  p-value      (0.9225)   (0.8784)   (0.8473)   (0.8649)
TA
  CER           1.0604     1.0640     1.0668     1.0680
  p-value      (0.4560)   (0.6705)   (0.8759)   (0.8391)

Estimation window averaging

In previous sections, our portfolio strategies have relied on the use of all available data in the information set φt. But in the presence of a potential break in the forecast relation within this estimation window, it may or may not be optimal to use the whole 15-year length.³⁶ The reason is that, when the size of the break is small, adding pre-break data may reduce the forecast error variance. However, when the size is big, it is the effect of bias that dominates. In addition, the estimation of the time and size of a break with limited data is usually subject to considerable uncertainty. To alleviate these concerns, I borrow the idea in Pesaran and Timmermann (2007), which combines return distribution forecasts based on predictive regressions with estimation windows of different lengths. In particular, rather than selecting a single estimation window, I pool three return forecasts, based respectively on 5 years, 10 years, and 15 years of data, with equal weight. More specifically, I generate one set of simulated returns from all three estimated distributional forecasts (by a particular predictor) and then solve for the portfolio decision numerically. I also pool over different predictors as an additional model averaging check. Table 1.16 documents the evaluation results of such window averaging strategies. Comparing with Table 1.2, pooling over windows improves the estimated CERs for most of the predictors, with the exception of FB and TA at short term maturities (n = 2, 3). But relative to the benchmark, it still fails to outperform. Finally, pooling over predictors does not help either (sub-panel “Model average”).

³⁶ Previously, we have focused only on the negative effect of instability during the forecasting horizon. In contrast, attention here is on instability within the estimation window.

Table 1.16: Estimation Window Averaging
Notes: This table reports, for each predictor in turn, the evaluation results of alternative parametric strategies pooling different lengths of estimation window. Equal pooling weights on the 5-year, 10-year and 15-year sub-windows are used here. The sub-panel “Model average” pools over different predictors as well. The rows labeled “CER” denote the point estimates of the certainty equivalent gross return and the rows labeled “p-value” report the p-values of an unconditional test on the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding averaging rule. Each column corresponds to a different maturity of the long term bond considered and investor's risk aversion is set at 10.

Whole sample: 1964/01 : 2011/12
Maturity        2-year     3-year     4-year     5-year
Bench CER       1.0661     1.0669     1.0678     1.0666
FB
  CER           1.0631     1.0637     1.0619     1.0485
  p-value      (0.5378)   (0.5460)   (0.4518)   (0.2812)
CP
  CER           1.0258     1.0280     1.0246     1.0023
  p-value      (0.0582)   (0.0542)   (0.0543)   (0.0863)
cf
  CER           1.0004     0.9205     0.8382     0.8567
  p-value      (0.2138)   (0.2653)   (0.2775)   (0.2294)
LN
  CER           1.0559     1.0522     1.0509     1.0491
  p-value      (0.3939)   (0.2511)   (0.2115)   (0.2191)
TA
  CER           0.7378     0.8572     0.9265     0.8967
  p-value      (0.3034)   (0.3023)   (0.3038)   (0.3019)
Model average
  CER           1.0615     1.0630     1.0670     1.0629
  p-value      (0.6207)   (0.6862)   (0.9324)   (0.7383)

1.4 Robustness Check

This section checks the robustness of our baseline evaluation results in Section 1.3.3 with respect to: (1) different levels of relative risk aversion γ; and (2) a different size of the estimation window.³⁷ The first exercise considers the parametric and non-parametric strategies at two alternative risk aversion coefficients, γ = 5 and γ = 15. As shown in Table 1.17, in both cases the results are qualitatively similar to the ones in Table 1.2, so that none of the timing strategies differs significantly from the benchmark. Notably, when risk aversion is low, i.e., γ = 5, some of the CER estimates, especially for cf, LN and TA, are close to zero. This is because the leverage ratio of a less risk averse investor goes up sharply. Under forecasting model instability, such high leverage then leads to ex-post bankruptcy in some states and kills the corresponding strategy.³⁸

The second test focuses on the shrinkage strategies with risk aversion ranging over γ = 5, 10, 15 and 20.
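The way a single bankruptcy state can drive a CER estimate towards zero, as described above, can be illustrated with a small numerical sketch. It assumes the standard CRRA certainty equivalent, CER = (mean of R^(1-γ))^(1/(1-γ)), together with the truncation of gross returns at 0.01 mentioned in footnote 38. The welfare metric itself is defined in an earlier section not shown here, so the function `crra_cer`, the sample length of 396 periods, and the return values are all hypothetical.

```python
def crra_cer(gross_returns, gamma, floor=0.01):
    """Certainty equivalent gross return under CRRA utility (gamma != 1).

    Realized gross returns below `floor` are truncated at `floor`,
    mimicking a near-bankruptcy outcome.
    """
    truncated = [max(r, floor) for r in gross_returns]
    mean_power = sum(r ** (1.0 - gamma) for r in truncated) / len(truncated)
    return mean_power ** (1.0 / (1.0 - gamma))


# Hypothetical return paths: 396 calm periods versus 395 calm periods
# plus one leveraged wipe-out (made-up numbers).
calm = [1.07] * 396
one_blowup = [1.07] * 395 + [-0.50]

cer_calm = crra_cer(calm, gamma=10)
cer_blowup = crra_cer(one_blowup, gamma=10)
```

In this toy path one truncated period pulls the CER from about 1.07 down to roughly 0.02, because the term R^(1-γ) of the bankruptcy state dominates the sample average.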
I report only the estimated CERs and their significance against the benchmark when the maturity of the long term bond equals 5 years. As illustrated in Table 1.18, the entries in different panels, which correspond to different values of γ, exhibit a similar pattern. Just as in our baseline results, when the shrinkage factor gets gradually smaller, all estimated CERs initially go up and then slightly drop. In addition, the range of shrinkage that leads to a significant utility benefit remains the same across γ, especially for TA based strategies.³⁹ This indicates that the role of shrinkage is not specific to any particular choice of risk aversion.

³⁷ To save space, we report only the unconditional evaluation results; those of the conditional ones are available upon request.
³⁸ Since CRRA utility is not defined on negative payoffs, we truncate the loss at a gross return of 0.01 (close to bankruptcy). This explains why some of the CER estimates, especially in the left panel when γ = 5, are close to zero.
³⁹ The range of shrinkage in which cycle factor cf based strategies outperform the benchmark is also approximately the same, at least up to γ = 15.

Table 1.17: Parametric and Non-parametric Strategies at Different Risk Aversion
Notes: This table reports, for each predictor in turn, the evaluation results of alternative bond market timing strategies when investor's risk aversion level γ equals 5 or 15. The rows labeled “Para CER” and “Non-para CER” denote, respectively, the point estimates of the certainty equivalent gross return of the parametric and non-parametric allocation rules based on a particular predictor. The rows labeled “p-value (P)” and “p-value (NP)” report the p-values of an unconditional test on the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding timing rule. Rows labeled “p-value (P vs NP)” record testing results that compare parametric against non-parametric rules. Each column corresponds to a different maturity of the long term bond considered. All portfolios are constructed by 15 years rolling window estimates.

Risk aversion γ = 5
Maturity               2          3          4          5
Bench CER              1.0772     1.0775     1.0784     1.0755
FB
  Para CER             1.0811     1.0834     1.0670     0.7907
  p-value (P)         (0.6424)   (0.5429)   (0.6171)   (0.3075)
  Non-para CER         1.0774     1.0782     1.0072     1.0433
  p-value (NP)        (0.9819)   (0.9206)   (0.2824)   (0.0829)
  p-value (P vs NP)   (0.5902)   (0.6281)   (0.2608)   (0.3250)
CP
  Para CER             0.4801     0.3892     0.2344     0.0443
  p-value (P)         (0.2956)   (0.3016)   (0.3047)   (0.3055)
  Non-para CER         0.8368     0.7813     0.9518     0.9980
  p-value (NP)        (0.2783)   (0.2777)   (0.1918)   (0.1379)
  p-value (P vs NP)   (0.3333)   (0.3252)   (0.3054)   (0.3055)
cf
  Para CER             0.0372     0.0337     0.0331     0.0296
  p-value (P)         (0.2984)   (0.2927)   (0.2907)   (0.1922)
  Non-para CER         0.0372     0.0336     0.0337     0.0337
  p-value (NP)        (0.2984)   (0.2926)   (0.2927)   (0.2927)
  p-value (P vs NP)   (0.3928)   (0.3059)   (0.3055)   (0.1371)
LN
  Para CER             0.0428     0.0428     0.0428     0.0428
  p-value (P)         (0.3038)   (0.3038)   (0.3038)   (0.3038)
  Non-para CER         1.0176     0.9351     0.7614     0.5044
  p-value (NP)        (0.2758)   (0.2497)   (0.2894)   (0.3025)
  p-value (P vs NP)   (0.3038)   (0.3038)   (0.3038)   (0.3038)
TA
  Para CER             0.0443     0.0443     0.0371     0.0313
  p-value (P)         (0.3055)   (0.3055)   (0.2979)   (0.2873)
  Non-para CER         0.2048     0.0443     0.0154     0.0844
  p-value (NP)        (0.3054)   (0.3055)   (0.3055)   (0.3054)
  p-value (P vs NP)   (0.3055)   (0.2948)   (0.3060)   (0.2971)

Risk aversion γ = 15
Maturity               2          3          4          5
Bench CER              1.0605     1.0615     1.0624     1.0617
FB
  Para CER             1.0611     1.0631     1.0638     1.0578
  p-value (P)         (0.7947)   (0.5668)   (0.7123)   (0.5262)
  Non-para CER         1.0598     1.0547     1.0538     1.0555
  p-value (NP)        (0.7562)   (0.3928)   (0.1998)   (0.0914)
  p-value (P vs NP)   (0.6610)   (0.3389)   (0.1459)   (0.7060)
CP
  Para CER             1.0024     1.0020     0.9935     0.9786
  p-value (P)         (0.1621)   (0.1725)   (0.1845)   (0.2247)
  Non-para CER         1.0262     1.0241     1.0379     1.0430
  p-value (NP)        (0.2284)   (0.2083)   (0.1371)   (0.1128)
  p-value (P vs NP)   (0.4638)   (0.5324)   (0.3060)   (0.2833)
cf
  Para CER             0.9152     0.7700     0.7115     0.6723
  p-value (P)         (0.2647)   (0.2976)   (0.2962)   (0.2930)
  Non-para CER         0.6304     0.4043     0.3579     0.4423
  p-value (NP)        (0.3025)   (0.3050)   (0.3040)   (0.2999)
  p-value (P vs NP)   (0.3027)   (0.3050)   (0.3040)   (0.2999)
LN
  Para CER             0.9480     0.9654     0.9776     0.9677
  p-value (P)         (0.2919)   (0.2758)   (0.2606)   (0.2796)
  Non-para CER         1.0545     1.0494     1.0439     1.0401
  p-value (NP)        (0.2300)   (0.1274)   (0.1554)   (0.2013)
  p-value (P vs NP)   (0.3048)   (0.3072)   (0.3284)   (0.3495)
TA
  Para CER             0.9870     0.9383     0.9063     0.8599
  p-value (P)         (0.2701)   (0.2931)   (0.2974)   (0.2972)
  Non-para CER         1.0465     1.0057     0.9989     0.9898
  p-value (NP)        (0.2606)   (0.2054)   (0.1716)   (0.1636)
  p-value (P vs NP)   (0.2789)   (0.3218)   (0.3249)   (0.3310)

Table 1.18: Shrinkage Strategies at Different Risk Aversion
Notes: This table repeats the analysis in Table 1.9 using different risk aversion levels of γ = 5, 10, 15 and 20. Reported are the evaluation results of alternative portfolio shrinkage strategies, for maturity of the risky bond equal to 5 years, implemented through Bayesian predictive regressions with a random walk prior. Entries are the estimated certainty equivalent returns (CER) of different strategies, which vary in both the predictor used (each column) and the shrinkage factor (each row). The superscripts ∗, † and ‡ denote, respectively, that the null hypothesis that the no-predictability benchmark strategy achieves the same expected utility as the corresponding shrinkage rule is rejected at the 1%, 5% or 10% significance level. All portfolios are constructed by 15 years rolling window estimate.

γ = 5 (Bench CER: 1.0755)
shrink   FB        CP        cf        LN        TA
0.1      1.0755    1.0782    1.0853†   1.0699    1.0819∗
0.2      1.0751    1.0795    1.0920‡   1.0711    1.0876∗
0.3      1.0742    1.0793    1.0951    1.0717    1.0922∗
0.4      1.0727    1.0773    1.0914    1.0715    1.0952†
0.5      1.0703    1.0731    1.0695    1.0701    1.0952‡
0.6      1.0668    1.0661    0.9558    1.0670    1.0888
0.7      1.0615    1.0551    0.3025    1.0614    1.0609
0.8      1.0536    1.0385    0.0364    1.0512    0.9260
0.9      1.0414    1.0137    0.0337    1.0319    0.3880

γ = 10 (Bench CER: 1.0666)
shrink   FB        CP        cf        LN        TA
0.1      1.0668    1.0676    1.0703†   1.0661    1.0695∗
0.2      1.0669    1.0679    1.0727‡   1.0665    1.0720∗
0.3      1.0668    1.0675    1.0735    1.0667    1.0740∗
0.4      1.0665    1.0664    1.0715    1.0665    1.0753†
0.5      1.0661    1.0642    1.0637    1.0658    1.0755†
0.6      1.0655    1.0608    1.0409    1.0645    1.0735
0.7      1.0646    1.0559    0.9761    1.0624    1.0665
0.8      1.0634    1.0492    0.8323    1.0594    1.0470
0.9      1.0619    1.0405    0.6335    1.0549    0.9989

γ = 15 (Bench CER: 1.0617)
shrink   FB        CP        cf        LN        TA
0.1      1.0619    1.0621    1.0635‡   1.0631    1.0635∗
0.2      1.0621    1.0622    1.0646    1.0633    1.0650∗
0.3      1.0621    1.0618    1.0648    1.0633    1.0663∗
0.4      1.0621    1.0610    1.0634    1.0631    1.0671∗
0.5      1.0620    1.0595    1.0587    1.0627    1.0673†
0.6      1.0618    1.0572    1.0470    1.0619    1.0664
0.7      1.0616    1.0541    1.0185    1.0607    1.0632
0.8      1.0612    1.0499    0.9586    1.0589    1.0549
0.9      1.0607    1.0447    0.8683    1.0566    1.0359

γ = 20 (Bench CER: 1.0580)
shrink   FB        CP        cf        LN        TA
0.1      1.0582    1.0582    1.0590    1.0604    1.0592∗
0.2      1.0583    1.0581    1.0595    1.0605    1.0604∗
0.3      1.0585    1.0578    1.0594    1.0605    1.0613∗
0.4      1.0585    1.0571    1.0582    1.0603    1.0619∗
0.5      1.0585    1.0559    1.0550    1.0600    1.0622†
0.6      1.0585    1.0543    1.0475    1.0594    1.0617
0.7      1.0584    1.0520    1.0308    1.0586    1.0600
0.8      1.0583    1.0490    0.9968    1.0575    1.0556
0.9      1.0581    1.0453    0.9435    1.0560    1.0457

The third robustness test considers changing the size of the information set. Specifically, I set the length of limited data available to the investor at 20 years. According to Table 1.19, the evaluation results are still qualitatively similar to Table 1.2, so that almost all of the competing forecasts fail to beat the benchmark. The only exception is the technical indicator TA based parametric timing strategy operated with a 5-year bond.
But the significance of this welfare improvement only barely crosses the 5% threshold. To summarize, our conclusion that it is generally hard to exploit bond return predictability in a limited data environment is not sensitive to the level of risk aversion and the size of the information set.

Table 1.19: A Longer Length of Information Set / Limited Data
Notes: This table reports, for each predictor in turn, the unconditional evaluation results of alternative bond market timing strategies when the information set / available data is of length 20 years. The rows labeled “Para CER” and “Non-para CER” denote, respectively, the point estimates of the certainty equivalent gross return of the parametric and non-parametric allocation rules based on a particular predictor. The rows labeled “p-value (P)” and “p-value (NP)” report the p-values of an unconditional test on the null that the no-predictability benchmark strategy achieves the same expected utility as the corresponding timing rule. Rows labeled “p-value (P vs NP)” record testing results that compare parametric against non-parametric rules. Each column corresponds to a different maturity of the long term bond considered. Investor's risk aversion is set at 10.

Whole sample: 1964/01 : 2011/12
Maturity               2-year     3-year     4-year     5-year
Bench CER              1.0601     1.0614     1.0619     1.0607
FB
  Para CER             1.0635     1.0662     1.0688     1.0649
  p-value (P)         (0.4002)   (0.3221)   (0.2636)   (0.1792)
  Non-para CER         1.0633     1.0631     1.0621     1.0608
  p-value (NP)        (0.3826)   (0.4904)   (0.9500)   (0.9730)
  p-value (P vs NP)   (0.9508)   (0.4148)   (0.1062)   (0.0779)
CP
  Para CER             1.0267     1.0284     1.0266     1.0253
  p-value (P)         (0.1362)   (0.1517)   (0.1867)   (0.2283)
  Non-para CER         1.0549     1.0560     1.0578     1.0576
  p-value (NP)        (0.4579)   (0.4265)   (0.5388)   (0.6313)
  p-value (P vs NP)   (0.1049)   (0.1428)   (0.1775)   (0.2215)
cf
  Para CER             1.0413     1.0325     1.0142     0.9514
  p-value (P)         (0.4856)   (0.4164)   (0.3695)   (0.3192)
  Non-para CER         1.0654     1.0639     1.0589     1.0500
  p-value (NP)        (0.5654)   (0.8365)   (0.8641)   (0.6640)
  p-value (P vs NP)   (0.2557)   (0.2623)   (0.2889)   (0.3042)
LN
  Para CER             1.0557     1.0576     1.0586     1.0586
  p-value (P)         (0.7792)   (0.8847)   (0.9858)   (0.8836)
  Non-para CER         1.0308     1.0015     0.8434     0.8211
  p-value (NP)        (0.3262)   (0.2951)   (0.3039)   (0.3045)
  p-value (P vs NP)   (0.2291)   (0.2635)   (0.3013)   (0.3022)
TA
  Para CER             1.0634     1.0692     1.0729     1.0762
  p-value (P)         (0.6799)   (0.3416)   (0.1820)   (0.0475)
  Non-para CER         1.0598     1.0604     1.0610     1.0609
  p-value (NP)        (0.9668)   (0.8989)   (0.9010)   (0.9801)
  p-value (P vs NP)   (0.3234)   (0.0478)   (0.0315)   (0.0267)

1.5 Concluding Remarks

In this chapter, I adopt a hypothesis testing approach to assess the portfolio value of a variety of identified bond return forecasts in timing the bond market. I emphasize the practical usefulness of return predictors in a limited data environment where forecast relations can merely be estimated. I consider allocation rules that vary not only in the return predictors but also in the policy functions. I evaluate their performance relative to that of a simple no-predictability benchmark strategy on both unconditional and conditional bases.
While the unconditional assessments ask whether a return predictor is valuable on average, the conditional ones allow for performance heterogeneity and gauge relative performance conditional on different economic regimes. The estimation of the performance measure relies on an out-of-sample portfolio construction exercise, and the inference procedure builds on the forecast evaluation literature in a structural way.

Empirically, using monthly US data, I find that major return predictors identified from yield curve, macro-fundamental or technical analysis indicators, coupled with parametric or non-parametric strategies, fail to outperform the benchmark rule. This suggests that the welfare loss due to estimation uncertainty and forecasting model instability outweighs the benefit of incorporating return predictors. Conditional tests indicate that the failure of market timing is not specific to any economic regime defined by boom or recession, market turbulence, or lagged relative performance. On the other hand, a shrinkage strategy implemented through Bayesian predictive regression, combined with a random walk prior, manages to beat the benchmark over a certain range of prior confidence. The above baseline results are shown to be robust to modifications of the portfolio strategies that allow for volatility timing, multiple factors prediction, fixed weight allocation, and estimation window averaging.
Conclusions are not sensitive to the investor's risk aversion, nor to the size of the information set, and are not completely driven by the outlier of the 2008 financial crisis.

Chapter 2
Equity Asset Allocation: Can Investors Exploit Cross-Sectional Return Predictability by a Parametric Strategy ?

2.1 Introduction

In the endeavor to uncover how expected return varies across individual stocks, researchers have linked the magnitude of expected return to each stock's characteristics, such as the underlying firm's market capitalization and book-to-market ratio (Fama and French (1992) and Fama and French (1996)), momentum (Carhart (1997)), the annual asset growth rate (Cooper et al. (2008)), the gross profit of the firm (Novy-Marx (2013)), and other accounting or financial variables (Goyal (2012)). Another key finding is the co-movement among stocks with similar characteristics.¹ That feature of the data motivates researchers to explain the documented cross-sectional dispersion in expected return through the dispersion of covariance between the individual stock's return and certain common factors. The common factors are typically constructed from the return of a spread portfolio, which takes a long or short position on stocks according to their ranks in the characteristic sorts. Interestingly, not every return-predicting characteristic requires a new factor. For instance, Fama and French (1993) found that although sales growth produces a spread in expected return, it can be accounted for by the factor loadings on the size (SMB) and book-to-market (HML) driven factors. Likewise, such attempts at data reduction have been conducted repeatedly as new return predicting characteristics emerged (e.g., Fama and French (1996) and Fama and French (2008)). In the history of organizing those predicting signals, the CAPM with the market return as a single factor, the Fama French three factor model, the Carhart four factor model (FF3 + momentum), and more recently, the Fama French five factor model (FF3 + Profit + Investment) have shaped our progressive understanding of the underlying stock return generating process.²

¹ Thus, the characteristics that are identified to forecast mean return in the cross section would also forecast variance, a fact confirmed by Chan et al. (1998).

In this chapter, we examine whether the documented cross-sectional return predicting characteristics can be exploited by an equity investor to generate a welfare benefit. More specifically, we ask whether tilting a portfolio of stocks based on firm characteristics such as size, book-to-market ratio, momentum, etc., would bring a significant utility gain. We argue that, although return predictability is an established fact, its implication for portfolio management is nontrivial for at least two reasons. First of all, while investors face a zoo of return predictors, there is still only limited information on the true stock return generating process. Harvey et al. (2013) count up to 300 variables identified in the literature as predicting cross-sectional expected returns. They are all candidate right hand side variables for the panel stock return forecasting regressions, but we do not know which of them provide independent information on average return and which of them are subsumed by others. Secondly, even if our investor pins down the forecasting characteristics to consider, the exact conditional distribution of returns over a large number of stocks, given those firm characteristics, remains unknown. Hence, either a parametric or a non-parametric return model needs to be imposed and then estimated with the limited amount of historical data observed by the investor.
Meanwhile, since any model-based portfolio decision would take the estimated and potentially mis-specified return model as an input, the associated errors in estimation and model specification would be propagated through the portfolio decision process, and the ultimate impact on portfolio return and welfare is unknown to the investor.

² Deeper theories would further connect the factor risk premium to either the consumption side (e.g., Bansal et al. (2005); Hansen et al. (2008)) or the production side (e.g., Zhang (2005); Liu et al. (2009)) of economic fundamentals.

Given the above mentioned concerns, the objective of this work is to quantify the utility value of a list of predicting characteristics, or their combination, in portfolio allocation net of the estimation and specification risks. To this end, I consider a limited data trading environment. I assume that under this environment, a CRRA equity investor is equipped with an information set that consists of a finite history of firm characteristics on individual stocks and subsequent realized returns. I then adopt a decision theory approach and view any portfolio strategy that allocates weight to each individual stock as a function of historical data, i.e., an estimator. That portfolio estimator then maps the observed data in the information set to a vector of weights depending on the current level of the predicting characteristics. Finally, the welfare value of each characteristic, or their combination, is measured by the level of expected utility achieved by the associated portfolio strategy. Note that the expectation should be taken over the joint distribution of historical data, the current level of characteristics, and next period returns.

The contribution of my approach, methodologically, is to propose a conceptually new means of assessing cross-sectional return predictability.
Traditional works focus on the cross-sectional R² of average returns on factor loadings at the second stage of the Fama-MacBeth regression (Fama and MacBeth (1973)) or on the joint significance of pricing errors through the GRS test (Gibbons et al. (1989)). In contrast, my analysis looks directly into the welfare consequences of return predictability, an object of ultimate interest to investors. I argue that, although the traditional focus is valid for asset pricing purposes, it does not provide sufficient information for portfolio management. First, a characteristic may explain a lot of return variation over time, but if that variation does not generate a big risk premium, the contribution of that characteristic to separating average returns cross-sectionally is only marginal. Statistical tests may well suggest dropping that variable if it does not significantly affect the magnitude of pricing errors. However, for portfolio allocation purposes, such a variance-related factor is clearly valuable, as it can be used to hedge risk. In addition, our utility metric, unlike the statistical ones, naturally takes into account its contribution to both the portfolio mean and higher moments. Second, traditional works make inference on the return generating process. For example, they ask whether incorporating extra characteristics reduces pricing errors if the true values of model parameters are known. My approach instead emphasizes the welfare value of these predicting variables when the relevant forecast relation is estimated with limited data. The attention then shifts to the finite-sample properties of a portfolio estimator. Under this new criterion, a less mis-specified return model supported by asset pricing tests could be inferior to a simpler model with larger pricing errors.
After all, a more complex portfolio strategy (one that involves more characteristics) carries more estimation burden, while a simpler one could be more robust to sampling uncertainty in the estimation window.

Using CRSP-Compustat data on US individual stocks, I estimate and test the significance of the welfare gain from a list of cross-sectional expected return predictors and their combinations in the aforementioned limited-data framework. I include market capitalization, book-to-market ratio, lagged one-year return, gross profit, and investment, as these variables are believed to reliably forecast returns and to organize many expected return anomalies (see Fama and French (2014)). For portfolio optimization based on these characteristics, I adopt the parametric approach in Brandt et al. (2009). These authors propose to directly parameterize the optimal portfolio weight on an individual stock as a function of firm characteristics and then solve for the parameters that maximize average realized utility. This approach has the benefit of dimension reduction and, to the best of my knowledge, it is so far the only effective approach that can be implemented on a large number of securities.³

³ More specifically, the traditional Markowitz approach on N securities requires modeling N first moments and N² + N second moments, which is a formidable econometric problem when N is big. Imposing a K-factor structure on the covariance matrix reduces the number of parameters to K + N(K + 2), but this is still large. As a result, the literature first sorts individual stocks into deciles based on a characteristic and then makes allocation decisions over these decile portfolios. With two characteristics, the investable universe typically becomes 25 pre-sorted portfolios (quintile sorts crossed with quintile sorts), but it is not clear how this process would continue as the number of sorting characteristics goes up.

Empirically, I find that, with this parametric approach, book-to-market ratio and asset growth rate can each create a significant utility benefit relative to a market-capitalization-weighted benchmark. The good performance of these strategies is concentrated in high-unemployment (economic bust), high aggregate dividend-to-price ratio (low valuation), and low-volatility regimes of the market. When combining multiple characteristics to form the portfolio decision, I find that multivariate strategies tend to have infrequent but strongly negative portfolio returns. Such instability losses drag the associated expected utility estimates below that of the benchmark. However, conditional on a high dividend-to-price or low-volatility regime, the strategy based on size, book-to-market, and momentum significantly outperforms the cap-weighted benchmark, as well as the univariate strategy driven by the book-to-market ratio.

Operationally, in estimating expected utility with a single path of realized US data, I rely on an out-of-sample equity allocation exercise, which creates a series of "pseudo" repeated portfolio experiments. I impose a rolling-window scheme, so that the length of historical data available remains the same at each portfolio decision. The resulting average realized utility thus serves as a consistent estimator of the (unconditional) welfare measure. In testing the significance of the utility benefit, I build the inference procedures upon those established in the forecast evaluation literature (e.g., Diebold and Mariano (1995); West (1996); Giacomini and White (2006)). This stream of work has traditionally focused on evaluating point forecasts in time series regressions by statistical measures of accuracy, such as mean squared forecast error, sign correctness, or predictive likelihood. In our context, however, the object of evaluation is a panel data forecast and the evaluation metric is modified to an economically meaningful criterion of expected utility.

My work complements the analysis in DeMiguel et al.
(2009) on the out-of-sample performance of mean-variance equity allocation rules in two dimensions. First, in DeMiguel et al. (2009), the investable universe is either a set of 10 industry portfolios or a set of 25 size and book-to-market sorted portfolios. In other words, the predicting characteristic is fixed first and then a variety of policy functions (14 extensions of the classical mean-variance rule) are compared. My work instead fixes the portfolio policy and selects over a list of cross-sectional return predictors. The focus is now on the economic value of return forecasts, as opposed to the effectiveness of any policy function alone. Second, while an out-of-sample exercise is conducted in DeMiguel et al. (2009), the inference is on population-level statements about the Sharpe ratio or certainty equivalent, i.e., whether a Sharpe ratio gain can be achieved if the true parameter values are known. In contrast, my analysis relies on a formal estimation and inference procedure that validates the finite-sample properties of a portfolio estimator against estimation and mis-specification risks. Since investors only face limited data in reality, we believe that the finite-sample approach is more relevant for portfolio management.

My analysis also supplements the extensive literature on asset allocation under return predictability, e.g., Kandel and Stambaugh (1996), Aït-Sahalia and Brandt (2001), Pástor (2000), Barberis (2000), Brennan and Xia (2001), Campbell and Viceira (2002), and Avramov (2004). The majority of this literature examines optimal allocation under time series predictability of the aggregate market. Less attention has been paid to the portfolio impact of cross-sectional predictability in a disaggregated market. My results provide some new evidence in this respect, with an emphasis on the limited-data constraint. Meanwhile, existing studies focus almost exclusively on the unconditional or average performance of return predictors.
Yet, I recognize the potential heterogeneity in strategy performance and use conditional evaluations to judge the economic value of predictors separately over different economic episodes.

The rest of this chapter is organized as follows. Section 2 illustrates the limited-data investment framework. There, I also discuss how to estimate and make inference on the welfare measure of any portfolio strategy. In Section 3, I present the empirical results on the US stock market when the investor adopts a parametric policy to exploit cross-sectional predictability. Both unconditional and conditional evaluations are conducted. The last section concludes.

2.2 Investment Framework

This section lays out the investment decision framework. I consider a single-period equity allocation problem in which the expected return on each individual stock is predictable cross-sectionally. However, the true conditional joint distribution of individual stock returns given the predicting characteristics is unknown and has to be estimated with a finite history of data. I describe, respectively, the portfolio strategy, its performance measure in terms of welfare, the estimation of this performance, and the relevant inference procedure.

2.2.1 Equity allocation rule

Consider an equity investor who allocates his current wealth $W_t$ among a large number $N_t$ of stocks. The investment horizon is $\tau$, so the position is held until $t+\tau$ and then liquidated. The return of each individual stock $j$ between time $t$ and $t+\tau$ is denoted by $r_{j,t+\tau}$ and is associated with a vector of firm characteristics $z_{j,t}$ observed at time $t$. The characteristics could be the firm's market capitalization, book-to-market ratio, lagged annual stock return, asset growth, or gross profit.

The investor's preference admits an expected utility representation with a CRRA function defined over the terminal wealth $W_{t+\tau}$.
At time $t$, the investor puts a fraction $\alpha_{j,t}$ of wealth into each individual stock $j$ based on the conditional joint density of returns

\[ f\left(r_{1,t+\tau}, \ldots, r_{N_t,t+\tau} \,\middle|\, \{z_{j,t}\}_{j=1}^{N_t}\right), \]

given observed firm attributes $\{z_{j,t}\}_{j=1}^{N_t}$ for all stocks and his risk tolerance $1/\gamma$.

[Timeline figure: the estimation window runs from 0 to $t$ and contains the historical data $\phi_t$; the equity allocation decision $\alpha(\phi_t)$ is made at $t$; the forecasting window runs from $t$ to $t+\tau$.]

This conditional joint density is in practice unknown, but a finite history (sample realization) of returns on individual stocks, $\vec{r}_t = \{\{r_{j,\tau}\}_{j=1}^{N_0}, \ldots, \{r_{j,t}\}_{j=1}^{N_{t-\tau}}\}$, and of their attributes, $\vec{z}_t = \{\{z_{j,0}\}_{j=1}^{N_0}, \ldots, \{z_{j,t-\tau}\}_{j=1}^{N_{t-\tau}}\}$, is available. This historical data, of length $t$ and denoted by $\phi_t = \{\vec{r}_t, \vec{z}_t\}$, may be used to help the investor understand the relationship between the expected return of an individual stock, $E(r_{j,t+\tau} \mid z_{j,t})$, and its attributes $z_{j,t}$, i.e., $E(r_{j,t+\tau} \mid z_{j,t}) = g(z_{j,t})$. The function $g$ can be estimated through either panel regressions or attribute-based portfolio sorts.⁴ Hence, the subsequent allocation choice $\alpha_{j,t}$ in this finite-history setting is data dependent, and any equity allocation strategy $\alpha(.)$ can be viewed as a generic estimator, formally defined as follows:

Definition 3. An allocation rule, or portfolio strategy, $\alpha(.)$ is a mapping from any realization of historical data in the estimation window to a vector of portfolio weights on each individual stock:

\[ \alpha(\phi_t) : \Phi_t \to \mathcal{A}, \]

where $\Phi_t$ is the range of historical data $\phi_t$, and $\mathcal{A}$ is the admissible set of $N_t$-dimensional portfolio positions. Admissibility here requires the portfolio weights on different stocks to sum to 1, so $\mathcal{A} = \{\{\alpha_{j,t}\}_{j=1}^{N_t} \in \mathbb{R}^{N_t} : \sum_j \alpha_{j,t} = 1\}$.

⁴ Portfolio sorts are really the same thing as nonparametric cross-sectional regressions, using non-overlapping histogram weights. See Cochrane (2011).

2.2.2 Measurement of performance

I assess the performance of each allocation strategy $\alpha(.)$ based on the expected utility it generates.
Given a history $\phi_t$, the realized utility is

\[ U\left(\alpha(\phi_t), \{r_{j,t+\tau}\}_{j=1}^{N_t}\right) = \frac{\left(\sum_{j=1}^{N_t} \alpha(\phi_t)[j]\, r_{j,t+\tau}\right)^{1-\gamma}}{1-\gamma}, \]

where $\alpha(\phi_t)[j]$ is the $j$th element of the portfolio position vector $\alpha(\phi_t)$. This realized utility is a random variable, as both $\phi_t$ and $r_{j,t+\tau}$ are random. Accordingly, it should be averaged across realizations of both historical data and future returns, as suggested in the following (unconditional) notion of performance measure:

Definition 4. An unconditional welfare measure of the allocation rule $\alpha(.)$ is the unconditional expectation of realized utility:

\[ EU[\alpha(.)] = E_{f_{\phi_t, \{r_{j,t+\tau}\}_{j=1}^{N_t}}}\left[U\left(\alpha(\phi_t), \{r_{j,t+\tau}\}_{j=1}^{N_t}\right)\right], \]

where $f_{\phi_t, \{r_{j,t+\tau}\}_{j=1}^{N_t}}$ is the true joint density of historical data and forecasting-period returns.

Under the above decision-theoretic framework, the performance metric explicitly accounts for the effect of estimation uncertainty on portfolio performance. Meanwhile, it also reflects mis-specification risk, as the modeled forecasting relation, $E(r_{j,t+\tau} \mid z_{j,t}) = g(z_{j,t})$, does not necessarily coincide with the one implied by the true distribution. Thus, this metric helps us quantify the practical usefulness of any portfolio strategy when there is only limited information on the true predictive relation.

Beyond such unconditional evaluation, the (unconditional) welfare measure can further be modified to examine potential heterogeneity in strategy performance. In particular, I will consider utility expectations that are conditional on a certain regime of the business cycle, aggregate market valuation, or market volatility, measured by an economic state variable $s_t$, i.e.,

\[ EU[\alpha(.) \mid s_t = s] = E_{f_{\phi_t, \{r_{j,t+\tau}\}_{j=1}^{N_t} \mid s_t = s}}\left[U\left(\alpha(\phi_t), \{r_{j,t+\tau}\}_{j=1}^{N_t}\right)\right], \]

where $f_{\phi_t, \{r_{j,t+\tau}\}_{j=1}^{N_t} \mid s_t = s}$ is the joint density of historical data and future returns conditional on the current economic state being $s$.
The regime $s$, for example, can be a boom / bust episode, $s_{boom}$ / $s_{bust}$, if the contemporaneous unemployment rate is below / above its average level.

2.2.3 Estimation of welfare metric

The welfare measure, either unconditional or conditional, is a frequentist notion of average realized utility achieved over repeated samples drawn from the true distribution $f$. However, since the true joint distribution is unknown to the econometrician, this welfare measure itself has to be estimated. Note that, in estimating this quantity, only a single path of data is available. To overcome this issue, we rely on a sequence of "pseudo" repeated experiments generated by an out-of-sample equity allocation exercise. Specifically, let $T$ denote the total number of observations available to the econometrician and $t$ the number of observations accessible to the investor (in his portfolio estimation window). Thus, $m = T - t - \tau + 1$ represents the number of out-of-sample periods. At each time $k$, $t \le k < t + m$, our investor is asked to make the above equity allocation decision based only on the historical data within $[k - t + 1, k]$, i.e., under a rolling-window scheme. The rationale for using a rolling window rather than an expanding one is that, in quantifying the value of limited data, our allocation strategy $\alpha(\phi_t)$ and its performance measure are defined for a specific history length. Accordingly, the length of data available to the investor at each portfolio decision experiment needs to remain the same.

Based on the above argument, I propose as an estimator of the (unconditional) welfare the out-of-sample average of realized utility:

\[ \widehat{EU}[\alpha(.)] = \frac{1}{m} \sum_{k=t}^{T-\tau} U\left(\alpha(\phi_k), \{r_{j,k+\tau}\}_{j=1}^{N_k}\right), \]

where $\phi_k$ stands for the sample data within $[k - t + 1, k]$.
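The rolling-window estimator above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the function names, the fixed number of stocks across periods, the array layout, and the `strategy` callback are all hypothetical, and $\gamma \neq 1$ is assumed. The helper `certainty_equivalent` simply inverts the CRRA function, which is the transformation used when results are reported as certainty equivalent returns.

```python
import numpy as np

def crra_utility(gross_return, gamma):
    """CRRA utility of a gross portfolio return (assumes gamma != 1)."""
    return gross_return ** (1.0 - gamma) / (1.0 - gamma)

def certainty_equivalent(expected_utility, gamma):
    """Inverse of the CRRA function: the sure gross return with the same utility."""
    return ((1.0 - gamma) * expected_utility) ** (1.0 / (1.0 - gamma))

def rolling_welfare(hold_returns, chars, strategy, t, tau, gamma):
    """Out-of-sample average of realized utility under a rolling window.

    hold_returns : (T, N) array; row i holds the gross holding-period returns
                   realized at time i + 1 (hypothetical layout, fixed N).
    chars        : (T, N, K) array of firm characteristics, same time index.
    strategy     : callable(window_returns, window_chars) -> (N,) weights.
    t, tau       : estimation-window length and investment horizon.
    """
    T = hold_returns.shape[0]
    m = T - t - tau + 1                      # number of out-of-sample periods
    if m <= 0:
        raise ValueError("not enough observations for one out-of-sample period")
    utilities = np.empty(m)
    # Decision times k = t, ..., T - tau (1-indexed, as in the text).
    for i, k in enumerate(range(t, T - tau + 1)):
        window = slice(k - t, k)             # data within [k - t + 1, k]
        weights = strategy(hold_returns[window], chars[window])
        realized = hold_returns[k + tau - 1]  # returns realized at k + tau
        utilities[i] = crra_utility(weights @ realized, gamma)
    return utilities.mean(), utilities
```

Passing a callback keeps the estimator agnostic about how weights are formed, so the same loop can evaluate the benchmark and any characteristic-based rule on identical windows.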
If the cross-sectional stock returns as well as their associated firm characteristics (after some normalization) are stationary and ergodic, $\widehat{EU}[\alpha(.)]$ will converge almost surely to the unconditional welfare measure as $m$ goes to infinity. (See the strong law of large numbers for stationary and ergodic processes in White (1984).)

The conditional notion of welfare is estimated in a similar way, except that averaging is now only over the allocation exercises with the same economic regime. Denoting by $s_k$ the level / regime of a certain economic state at allocation time $k$, the performance of $\alpha(.)$ conditional on regime $s$ is estimated as

\[ \widehat{EU}[\alpha(.) \mid s] = \frac{1}{m_s} \sum_{k=t}^{T-\tau} U\left(\alpha(\phi_k), \{r_{j,k+\tau}\}_{j=1}^{N_k}\right) I(s_k = s), \]

where $I(.)$ is the indicator function and $m_s$ is the number of observations with regime $s$. Finally, all welfare estimates, either unconditional or conditional, are translated into certainty equivalent return estimates, $\widehat{CE}[\alpha(.)] = U^{-1}(\widehat{EU}[\alpha(.)])$, for ease of exposition.

2.2.4 Inference on utility benefit

The performance estimate of any equity allocation rule $\alpha(\phi_t)$ is benchmarked against that of a simple market-cap-weighted strategy, which ignores any return-predicting characteristics. The benchmark rule mimics the return of passively holding the aggregate market and is denoted by $\alpha_0(\phi_t) = \alpha_0(\overrightarrow{mktcap}_t)$. The difference in estimated welfare, $\widehat{EU}[\alpha(.)] - \widehat{EU}[\alpha_0(.)]$, reflects the incremental portfolio value of the considered cross-sectional return-predicting attributes under limited data. However, $\widehat{EU}$ itself is only a point estimate of the true expected utility.
Hence, to account for the sampling variability in $\widehat{EU}$, a formal inference procedure is needed.

Unconditional inference

For unconditional inference, the null hypothesis of interest is that, on average, return-predicting firm attributes do not generate any expected utility difference relative to the benchmark:

\[ H_0 : E\left[U\left(\alpha(\phi_k), \{r_{j,k+\tau}\}_{j=1}^{N_k}\right) - U\left(\alpha_0(\phi_k), \{r_{j,k+\tau}\}_{j=1}^{N_k}\right)\right] = 0, \quad \forall k = t, \ldots, T - \tau. \]

Note that the expectation is taken with respect to all possible sample paths of the entire panel of stock returns and firm attributes, and that, under stationarity, the unconditional mean stays the same for all $k$.

The alternative to $H_0$ is specified as follows. Denoting $\Delta U_{k,k+\tau} = U(\alpha(\phi_k), \{r_{j,k+\tau}\}_{j=1}^{N_k}) - U(\alpha_0(\phi_k), \{r_{j,k+\tau}\}_{j=1}^{N_k})$,

\[ H_A : E\left[\,|\Delta U_{k,k+\tau}|\,\right] > 0, \quad \forall k. \]

The testing procedure borrows from those developed in the forecast evaluation literature (e.g., Diebold and Mariano (1995), West (1996), Clark and McCracken (2001), Giacomini and White (2006)). This stream of research has traditionally focused on tests of equal forecast accuracy between two competing point forecasts in time series regressions, based on quadratic loss (squared error), directional accuracy, or predictive log-likelihood. In our framework, however, the object of interest is a panel forecast and the purpose of the return forecast is to make an allocation decision. Accordingly, forecast evaluation is addressed in a structural way that integrates the portfolio optimization process. The relevant loss is the negative of realized utility and can no longer be expressed as a function of forecast errors.

While conceptually distinct, the asymptotic results established in the existing literature can still be applied. In particular, we employ the testing framework (for predictive superiority) in Giacomini and White (2006).
The test is based on the following Wald-type statistic:

\[ T_{t,m} = m \,\overline{\Delta U}_{t,m}\, \hat{\Omega}_m^{-1}\, \overline{\Delta U}_{t,m}, \]

where $\overline{\Delta U}_{t,m} = \frac{1}{m} \sum_{j=t}^{T-\tau} \Delta U_{j,j+\tau}$ is the out-of-sample average of the realized utility difference and $\hat{\Omega}_m$ is a suitable HAC estimator of its asymptotic variance $\Omega_m = \mathrm{var}[\sqrt{m}\,\overline{\Delta U}_{t,m}]$.⁵

A level-$\alpha$ test rejects the null of equal strategy performance whenever $T_{t,m} > \chi^2_{1,1-\alpha}$, where $\chi^2_{1,1-\alpha}$ is the $1-\alpha$ quantile of the $\chi^2_1$ distribution. The underlying justification of such a test follows from the central limit theorem for stationary and ergodic processes and other standard asymptotic arguments on the positive definiteness of covariance estimates (cf. White (1984); Giacomini and White (2006)).

Conditional inference

Whereas the above analysis focuses on the unconditional portfolio value of return-predicting firm attributes, conditional inference tests for an expected utility difference conditional on a particular economic regime. The null hypothesis considered is

\[ H^c_0 : E[\Delta U_{k,k+\tau} \mid s_k = s] = 0, \quad \forall k = t, \ldots, T - \tau, \]

with $s$ being a certain business cycle, market valuation, or volatility regime. As mentioned above, I consider the current economy to be in a boom / bust state, $s_{boom}$ / $s_{bust}$, if the contemporaneous unemployment rate is below / above its average level. I also condition on a high / low market valuation regime, $s_{ldp}$ / $s_{hdp}$, defined as states in which the dividend-to-price ratio at the aggregate level is less / greater than its mean,⁶ or on a high / low volatility regime, $s_{hvol}$ / $s_{lvol}$, if the VIX index exceeds / falls beneath its average. Those conditioning instruments help us examine whether the relative performance of characteristics-based portfolio tilting is uniform across the business cycle, turbulence dependent, or sensitive to the aggregate market valuation level.
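As a rough illustration of how the Wald-type statistic and its regime-conditional variant might be computed, the sketch below uses a Bartlett-kernel (Newey-West style) estimate of the long-run variance. The kernel choice, the function names, and the `regime` mask argument are assumptions made for this example; the default lag of 12 follows the truncation mentioned in the chapter's footnote on the HAC estimator. In the conditional case the lags are taken over the retained observations in their original order, which is a simplification.

```python
import math
import numpy as np

def hac_variance(d, max_lag):
    """Bartlett-kernel long-run variance of a scalar series (assumed kernel)."""
    d = np.asarray(d, dtype=float)
    m = d.size
    e = d - d.mean()
    omega = e @ e / m                                  # lag-0 autocovariance
    for lag in range(1, min(max_lag, m - 1) + 1):
        weight = 1.0 - lag / (max_lag + 1.0)           # Bartlett weight
        omega += 2.0 * weight * (e[lag:] @ e[:-lag]) / m
    return omega

def wald_equal_utility(delta_u, max_lag=12, regime=None):
    """Wald-type statistic m * mean^2 / omega and its chi-square(1) p-value.

    delta_u : realized utility differences against the benchmark.
    regime  : optional boolean mask; if given, only periods with
              regime == True are used (conditional test).
    """
    d = np.asarray(delta_u, dtype=float)
    if regime is not None:
        d = d[np.asarray(regime, dtype=bool)]
    m = d.size
    omega = hac_variance(d, max_lag)   # a zero estimate (constant series) would fail below
    stat = m * d.mean() ** 2 / omega
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

The p-value uses the closed form for a chi-square variable with one degree of freedom, which avoids any dependency beyond NumPy and the standard library.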
As a result, it could potentially guide us in fine-tuning the portfolio advice based on the current state if performance heterogeneity is detected.⁷

Fixing a regime $s$, the testing procedure relies on the same Wald-type statistic as in the unconditional test, except that it uses only the samples with $s_k = s$. As before, under certain regularity conditions (cf. White (1984); Giacomini and White (2006)), such a test has correct size and is consistent against the alternative

\[ H^c_A : E\left[\,|\Delta U_{k,k+\tau}| \mid s\,\right] > 0, \quad \text{for all } k \text{ with } s_k = s. \]

⁵ We truncate the covariance estimator at a lag of 12, since most of our accounting-based firm characteristics stay the same for 12 months. The construction of the HAC estimator is illustrated in Andrews (1991).

⁶ A low dividend-to-price ratio corresponds to high valuation and on average signals a low future return.

⁷ For instance, if relative performance is significantly positive in booms and negative in recessions, this would imply that we should avoid characteristics-based portfolio tilts during economic recessions.

2.3 Empirical Analysis

Using the framework developed in the previous section, I now look into the performance of characteristics-based portfolio strategies empirically. I first describe the data used, the return-predicting characteristics considered, and the portfolio policy entertained. I then present the empirical findings and discuss their implications.

2.3.1 Data and utility assumption

I use monthly firm-level return data obtained from CRSP and annually reported firm-level accounting attributes from Compustat Fundamentals Annual. Unlike those in quarterly reports, the balance sheet and income statement items documented in Fundamentals Annual are audited and not revised at a future date (real time). For each firm and month within these data sets, I link the stock return to the firm's most recent accounting attributes as of its fiscal year end. To allow for publication lag, I require that fiscal year end characteristics have been released for at least 6 months at the time of portfolio formation.
I include all CRSP firms that are incorporated in the US and listed on NYSE, AMEX, or NASDAQ with a share code of 10 or 11 (common stock). In the month of delisting, a missing delisting return (dlret) is set to -30% when the delisting code is between 400 and 600, and to 0% otherwise. The return is set to the delisting return (dlret) directly when there is no market return (ret), and is compounded with the market return (ret) when that variable is available in the delisting month.

At each firm's fiscal year end, I construct the following variables: market capitalization, defined as the log of price per share times the number of shares outstanding (abs(prc)*shrout); the book-to-market ratio, defined as the log of one plus book equity divided by market capitalization; the asset growth rate, defined as the percentage change in the book value of total assets (at) relative to the previous year; and gross profitability, defined as the log of one plus gross profit (gp) divided by total assets (at). In computing the book-to-market ratio, the book value of equity is computed as stockholders' equity, plus balance sheet deferred taxes (txdb) and investment tax credit (itcb) (if available), minus the book value of preferred stock. Depending on availability, the book value of preferred stock is measured by redemption (pstkrv), liquidation (pstkl), or par value (pstk), in that order. Stockholders' equity is taken as reported in Compustat (seq); if that number is missing, I use the book value of common equity (ceq) plus the book value of preferred stock computed above. If the book value of common equity is unavailable, stockholders' equity is measured as the book value of total assets (at) minus total liabilities (lt).
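The fallback rules just described lend themselves to a short sketch. The functions below are a hypothetical illustration in plain Python, with missing values represented by `None`. The argument names follow the mnemonics quoted in the text, except `seq`, which is an assumed name for the stockholders' equity item, and `dlstcd`, an assumed name for the delisting code. Treating an entirely missing preferred stock value or tax item as zero, and reading "between 400 and 600" as inclusive, are also assumptions of this example.

```python
def first_available(*values):
    """Return the first value that is not None, or None if all are missing."""
    for value in values:
        if value is not None:
            return value
    return None

def book_equity(seq, ceq, at, lt, txdb, itcb, pstkrv, pstkl, pstk):
    """Book equity: stockholders' equity + deferred taxes and investment
    tax credit (if available) - book value of preferred stock."""
    # Preferred stock: redemption, then liquidation, then par value.
    preferred = first_available(pstkrv, pstkl, pstk)
    pref_value = preferred if preferred is not None else 0.0  # assumption
    # Stockholders' equity with the two fallbacks described in the text.
    if seq is None:
        if ceq is not None:
            seq = ceq + pref_value
        elif at is not None and lt is not None:
            seq = at - lt
        else:
            return None                      # nothing to build on
    taxes = (txdb or 0.0) + (itcb or 0.0)    # missing tax items count as zero
    return seq + taxes - pref_value

def delisting_adjusted_return(ret, dlret, dlstcd):
    """Return for a delisting month, combining ret and dlret as described."""
    if dlret is None:
        # Inclusive bounds on the delisting code are an assumption.
        dlret = -0.30 if 400 <= dlstcd <= 600 else 0.0
    if ret is None:
        return dlret
    return (1.0 + ret) * (1.0 + dlret) - 1.0
```

Keeping each rule in a small function makes the order of the fallbacks explicit, which is where replication attempts most often diverge.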
Beyond those accounting-based firm attributes, I also construct a price-based variable, momentum, which can be updated monthly. Following the standard literature, such as Jegadeesh (1990), momentum at month $t$ is defined as the compounded return between months $t-13$ and $t-1$.⁸

Regarding the primitives of the CRRA preference, it is common practice in the portfolio allocation literature to consider relative risk aversion $\gamma$ ranging from 5 to 10, but a higher value of $\gamma = 15$ is sometimes also entertained when gauging the effect of varying $\gamma$ (see, for instance, Barberis (2000)).⁹ Following this tradition, I pick $\gamma = 5$ for most of the empirical exercises and set the risk aversion level to 10 and 15 as robustness checks.

⁸ The motivation for skipping the last month is the presence of the short-term reversal effect, as documented by Jegadeesh (1990).

⁹ The decision theory literature and experimental economists have shown some evidence that an individual's risk aversion level when making lottery choices should not exceed 5. We point out here that a portfolio manager operating in the financial market may have a different risk appetite. In fact, according to Figure 1 in van Binsbergen et al. (2012), which estimates the cross-sectional distribution of US mutual fund managers' risk appetite, the density of risk aversion peaks at 10 to 25 and is skewed to the right.

2.3.2 Implementing a parametric policy

Recall that any portfolio strategy is viewed as a generic estimator that maps a finite history of panel data on stock returns and firm attributes to a vector of portfolio weights. In this chapter, I focus on the parametric approach proposed by Brandt et al. (2009), which, to the best of my knowledge, is the only effective strategy for exploiting predictability over a large number of securities. In particular, I parameterize the optimal portfolio weight $\alpha_{j,t}$ on each individual stock $j$ as a linear function of its characteristics:

\[ \alpha_{j,t}(\theta, \tilde{z}_{j,t}) = \alpha_0(\phi_t)[j] + \frac{1}{N_t}\, \theta' \tilde{z}_{j,t}, \]

where $\alpha_0(\phi_t)[j]$ is the portfolio weight on stock $j$ in our benchmark rule (value weighted), and $\theta$ captures the deviation, or active portion, relative to the benchmark. The linear relationship with firm characteristics is not restrictive, since we can always apply non-linear transformations to the basic attributes, as I already did through the log function. The term $\tilde{z}_{j,t}$ is a cross-sectionally standardized version of firm characteristics. Standardization is conducted by subtracting either the mean characteristic of the whole stock universe or that of the same industry.¹⁰ In both cases, it guarantees that the cross-sectional distribution of firm attributes is stable over time and that the deviations of the optimal portfolio weights from the benchmark sum to zero. Finally, the term $1/N_t$ normalizes the portfolio policy so that it is applicable to an arbitrary number of stocks: changing $N_t$ under the same cross-sectional distribution does not affect the riskiness of the portfolio.

Given historical data of length $t$, the coefficients in the portfolio function are then estimated by maximizing the sample analogue of the investor's expected utility:¹¹

\[ \hat{\theta} = \arg\max_{\theta} \sum_{k=1}^{t-\tau} u\left(\sum_{j=1}^{N_k} \alpha_{j,k}(\theta, \tilde{z}_{j,k})\, r_{j,k+\tau}\right) = \arg\max_{\theta} \sum_{k=1}^{t-\tau} u\left(\sum_{j=1}^{N_k} \left(\alpha_0(\phi_k)[j] + \frac{1}{N_k}\, \theta' \tilde{z}_{j,k}\right) r_{j,k+\tau}\right). \]

Note that the burden of computation grows only with the number of characteristics considered and is independent of the number of assets. Finally, the portfolio weight on individual stock $j$, given historical data and the current realized predictor values $\tilde{z}_{j,t}$, can be expressed as

\[ \alpha(\phi_t)[j] = \alpha_0(\phi_t)[j] + \frac{1}{N_t}\, \hat{\theta}' \tilde{z}_{j,t}. \]

¹⁰ Asness et al. (2000) stress the importance of industry normalization to clean out the systematic operational or financial differences across industries. For instance, if we sort individual stocks by book-to-market ratio, biotech firms always stay at the bottom. Dogmatically shorting these stocks ignores this systematic feature of the biotech industry.

¹¹ Since the same coefficients maximize expected utility given any realized value of the cross-sectional predictors, they also maximize unconditional expected utility.

2.3.3 Empirical findings

I now turn to the empirical assessment of the portfolio value of cross-sectional return predictors under the above parametric policy. Unless otherwise stated, I assume that the history starts in Jan 1974 and that the size of the information set is 10 years. So, the out-of-sample period spans 30 years, from Jan 1984 to Dec 2013. I present both the unconditional and conditional evaluation results.

Unconditional performance

Univariate strategies: I start from the univariate case, in which each individual firm characteristic (predictor) is incorporated separately. Figure 2.1 plots the estimated slope coefficient $\theta$ on each firm characteristic in the 10-year out-of-sample rolling-window environment. As firm characteristics are all standardized, these coefficients measure the degree of portfolio weight tilting on any individual stock for a one standard deviation difference in the corresponding firm characteristic cross-sectionally. The signs of these estimated coefficients suggest that, consistent with the literature, investors would like to tilt their portfolio towards stocks with lower-than-average market capitalization and asset growth rate, and higher-than-average book-to-market ratio, past-year return, and gross profit to assets.¹² In terms of magnitude, however, the estimated coefficients are not stable over time. For instance, the past return (momentum) effect on portfolio choice almost disappears after 2000 and the asset growth (investment) effect shrinks gradually over time.
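To make the estimation of the slope coefficient $\theta$ concrete, the following sketch fits the linear parametric policy on a window of data by maximizing average CRRA utility. It is a minimal illustration and not the thesis code: the function names, the fixed number of stocks per period, the array layout, and the use of a damped Newton iteration (chosen here because the objective is concave in $\theta$) are all assumptions of this example. Characteristics are assumed to be already standardized cross-sectionally, and $\gamma \neq 1$.

```python
import numpy as np

def mean_utility(theta, base, tilt, gamma):
    """Sample average CRRA utility of the policy's gross portfolio returns."""
    port = base + tilt @ theta
    if np.any(port <= 0):
        return -np.inf                      # utility undefined for wealth <= 0
    return np.mean(port ** (1.0 - gamma) / (1.0 - gamma))

def fit_policy(bench_w, z, r, gamma, max_iter=100, tol=1e-10):
    """Estimate theta in  w_j = bench_w_j + theta' z_j / N.

    bench_w : (P, N) benchmark weights per period (hypothetical layout).
    z       : (P, N, K) standardized characteristics.
    r       : (P, N) gross returns realized after each decision.
    """
    n_periods, n_stocks, n_chars = z.shape
    base = np.sum(bench_w * r, axis=1)                  # benchmark return
    tilt = np.einsum("pnk,pn->pk", z, r) / n_stocks     # return per unit of theta
    theta = np.zeros(n_chars)
    for _ in range(max_iter):
        port = base + tilt @ theta
        grad = (port ** -gamma) @ tilt / n_periods
        curvature = gamma * port ** (-gamma - 1.0)
        hess = -(tilt * curvature[:, None]).T @ tilt / n_periods
        step = np.linalg.solve(hess, -grad)             # Newton direction
        current = mean_utility(theta, base, tilt, gamma)
        scale = 1.0
        # Halve the step until utility does not fall (and wealth stays positive).
        while (mean_utility(theta + scale * step, base, tilt, gamma) < current
               and scale > 1e-8):
            scale *= 0.5
        theta = theta + scale * step
        if np.linalg.norm(scale * step) < tol:
            break
    return theta

def policy_weights(bench_w_now, z_now, theta):
    """Portfolio weights for the current cross-section, z_now of shape (N, K)."""
    return bench_w_now + z_now @ theta / z_now.shape[0]
```

Because each period enters the objective only through two aggregates (the benchmark return and the characteristic-weighted return), the cost of each iteration depends on the number of characteristics rather than the number of stocks, in line with the remark on computational burden in Section 2.3.2.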
Among the five predictors considered, book-to-market ratio, gross profit to assets, and the asset growth rate seem to have a stronger impact on the estimated portfolio choice than the size and momentum characteristics. However, as argued above, their welfare consequences are not trivial given estimation error and instability in forecasting power.

I henceforth turn to the ex-post performance of these characteristic-based strategies. Table 2.1 presents the expected utility estimates of the parametric policy for each individual predicting characteristic. Numbers are translated into certainty equivalent returns (CER). We observe that, compared to the value-weighted benchmark, book-to-market ratio and gross profit to assets produce monthly CER gains of 1.18% and 0.12%, respectively, when firm characteristics are normalized over the whole investable universe.

¹² Industry-wide standardization exhibits a similar pattern, as depicted in Figure 2.2.

Figure 2.1: Out-of-Sample Estimates of Parametric Policy with Univariate Character

Notes: This figure depicts the 10-year rolling-window estimates of the θ coefficient in the parametric policy with alternative univariate characteristics. The coefficient measures the degree of portfolio weight tilting on each individual stock for a one standard deviation difference in the firm characteristic cross-sectionally. Plotted in each panel is the slope coefficient θ on standardized market capitalization (Size, top left); book-to-market ratio (Book to Market, top right); lagged return momentum (Momentum, middle left); gross profitability over assets (Gross Profit, middle right); and asset growth rate (Asset Growth, bottom). Data are observed at monthly frequency and range from Jan 1984 to Dec 2013.
Risk aversion level is assumed to be 5.

[Figure 2.1 panels, horizontal axis 1985 to 2010: Size; Book to Market; Momentum; Gross Profit; Asset Growth.]

Figure 2.2: Out-of-Sample Estimates of Parametric Policy with Univariate Industry Standardized Character

Notes: This figure depicts the 10-year rolling-window estimates of the θ coefficient in the parametric policy with alternative univariate industry standardized characteristics. The coefficient measures the degree of portfolio weight tilting on each individual stock for a one standard deviation difference in the firm characteristic compared to its industry peers cross-sectionally. Plotted in each panel is the slope coefficient θ on industry standardized market capitalization (Size, top left); book-to-market ratio (Book to Market, top right); lagged return momentum (Momentum, middle left); gross profitability over assets (Gross Profit, middle right); and asset growth rate (Asset Growth, bottom). Data are observed at monthly frequency and range from Jan 1984 to Dec 2013. Risk aversion level is assumed to be 5.

[Figure 2.2 panels, horizontal axis 1985 to 2010: Size (Industry); Book to Market (Industry); Momentum (Industry); Gross Profit (Industry); Asset Growth (Industry).]

Table 2.1: Unconditional Evaluation of Parametric Policy with Univariate Character

Notes: This table reports the unconditional evaluation results of the parametric policy with alternative univariate standardized predicting characteristics. The rows labeled "CER" denote the point estimates of the certainty equivalent gross return of the relevant parametric strategy.
And the rows labeled "p-val" report the p-values of an unconditional test of the null that the value weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the firm characteristic of (a) Size; (b) Book to Market ratio; (c) Momentum; (d) Gross Profit over Total Asset; (e) Asset Growth rate. Panel (1) covers the case where the characteristic is standardized over the whole investable universe, while panel (2) covers the case where standardization is over the peer firms within the same industry. Data ranges from Jan 1984 to Dec 2013 and risk aversion is set at 5.

CER estimate on value weighted benchmark: 1.0115

          Size      Book to market  Momentum  Gross Profit  Asset Growth
          (a)       (b)             (c)       (d)           (e)
(1) Standardized Character
CER       1.0107    1.0233          1.0105    1.0127        1.0091
p-val     (0.6907)  (0.0229)        (0.8217)  (0.6669)      (0.8892)
(2) Industry Standardized Character
CER       1.0119    1.0265          1.0104    1.0102        1.0269
p-val     (0.8206)  (0.0119)        (0.8090)  (0.9036)      (0.0113)

In the case where these characteristics are normalized within the same industry (single digit SIC code), book to market and asset growth rate create a monthly CER benefit of 1.50% and 1.54% respectively. Almost all other strategies produce CER estimates below the benchmark (industry standardized size, at 1.0119, is marginally above it).

While the above analysis focuses on point estimates of welfare values, inference exercises help us gauge the significance of the utility differences. According to the p-values reported in Table 2.1, the utility benefit from incorporating either the whole market or the industry standardized book to market ratio is significant at the 5% level. The same finding holds for the industry normalized asset growth rate characteristic.
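The CER gains quoted above are simple differences between each strategy's CER and the benchmark CER of 1.0115. The sketch below reproduces that arithmetic from the Table 2.1 entries and flags p-values below 5%; the dictionary layout and the helper names (`cer_gain_pct`, `summarize`) are illustrative assumptions, not part of the thesis.

```python
# CER values and p-values transcribed from Table 2.1 (monthly gross returns).
BENCHMARK_CER = 1.0115

TABLE_2_1 = {
    "standardized": {
        "Size": (1.0107, 0.6907),
        "Book to market": (1.0233, 0.0229),
        "Momentum": (1.0105, 0.8217),
        "Gross Profit": (1.0127, 0.6669),
        "Asset Growth": (1.0091, 0.8892),
    },
    "industry standardized": {
        "Size": (1.0119, 0.8206),
        "Book to market": (1.0265, 0.0119),
        "Momentum": (1.0104, 0.8090),
        "Gross Profit": (1.0102, 0.9036),
        "Asset Growth": (1.0269, 0.0113),
    },
}


def cer_gain_pct(cer, benchmark=BENCHMARK_CER):
    """Monthly CER gain over the benchmark, in percentage points."""
    return round((cer - benchmark) * 100.0, 2)


def summarize(table, alpha=0.05):
    """Return (panel, strategy, gain in %, significant at alpha) tuples."""
    rows = []
    for panel, strategies in table.items():
        for name, (cer, p_value) in strategies.items():
            rows.append((panel, name, cer_gain_pct(cer), p_value < alpha))
    return rows


if __name__ == "__main__":
    for panel, name, gain, significant in summarize(TABLE_2_1):
        flag = "significant at 5%" if significant else "not significant"
        print(f"{panel:22s} {name:15s} gain = {gain:+.2f}%  ({flag})")
```

The flags here treat each test in isolation; they do not adjust for the fact that five hypotheses are tested per panel.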
In contrast, the utility gain from the gross-profit-to-asset ratio is not statistically significant.[13]

Footnote 13: We should be aware of the multiple and simultaneous hypothesis testing issue: with a total of 5 hypotheses tested at the same time, the likelihood of witnessing a rare event, and therefore the family-wise type I error rate, increases (see Dunn (1961) and Holm (1979)). I henceforth conduct

Figure 2.3: Out-of-Sample Returns by Parametric Policy with Univariate Character
Notes: This figure plots the out-of-sample realizations of monthly gross returns by the parametric policy with alternative univariate characteristics. The green line is the return series of a market capitalization weighted passive portfolio, which serves as a benchmark. The blue lines, in contrast, are the returns generated by the strategy on standardized market capitalization (Size, top left); book to market ratio (Book to Market, top right); lagged return momentum (Momentum, middle left); gross profitability over asset (Gross Profit, middle right) and asset growth rate (Asset Growth, bottom). Data is observed at monthly frequency and ranges from Jan 1984 to Dec 2013. Risk aversion level is assumed to be 5.
[Figure 2.3 panels, each plotting the value weighted benchmark (vw) against one strategy: Size; Book to Market; Momentum; Gross Profit; Asset Growth. Horizontal axis: 1985 to 2010.]

Figure 2.4: Out-of-Sample Returns by Parametric Policy with Industry standardized Univariate Character
Notes: This figure plots the out-of-sample realizations of monthly gross returns by the parametric policy with alternative industry standardized univariate characteristics. The green line denotes the returns of a market capitalization weighted passive portfolio, which serves as a benchmark.
The blue lines, in contrast, are the returns generated by the strategy on industry standardized market capitalization (Size, top left); book to market ratio (Book to Market, top right); lagged return momentum (Momentum, middle left); gross profitability over asset (Gross Profit, middle right) and asset growth rate (Asset Growth, bottom). Data is observed at monthly frequency and ranges from Jan 1984 to Dec 2013. Risk aversion level is assumed to be 5.
[Figure 2.4 panels, each plotting the value weighted benchmark (vw) against one strategy: Size (Industry); Book to Market (Industry); Momentum (Industry); Gross Profit (Industry); Asset Growth (Industry). Horizontal axis: 1985 to 2010.]

To better understand these welfare consequences, I plot in Figure 2.3 the realized monthly portfolio returns of the characteristic based strategies (blue) compared with the value weighted benchmark (green) in the 10-year rolling window environment.[14] The general observation is that, although predicting-characteristic based strategies dominate in many episodes, they exhibit significant tail risks. For example, momentum based strategies create monthly losses as large as 25% and 40% in 1987 and early 2000, respectively. The presence of such infrequent but strong negative returns is consistent with the findings of Daniel and Moskowitz (2013) on "momentum crashes". Similarly, the gross profit based strategy crashes in 1987, early 1990, and 2009, with monthly losses on the order of 25%. And the asset growth rate strategy, when standardized across the whole market, generates a monthly loss of 60% in 2001 shortly after an astonishing 200% gross return in early 2000.

The above figures indeed help us to explain our utility value findings in Table 2.1.
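To illustrate how a single crash month can dominate a certainty equivalent estimate, the sketch below assumes CRRA (power) utility with risk aversion 5. The utility form is an assumption made for illustration, since this section does not restate the utility specification, and both return series are made up rather than taken from the data.

```python
def crra_cer(gross_returns, gamma=5.0):
    """Certainty equivalent gross return under (assumed) CRRA utility.

    With u(R) = R**(1 - gamma) / (1 - gamma), the certainty equivalent is the
    constant gross return whose utility equals the average realized utility.
    Requires gamma != 1 and strictly positive gross returns.
    """
    if gamma == 1.0:
        raise ValueError("gamma = 1 (log utility) is not handled in this sketch")
    if any(r <= 0.0 for r in gross_returns):
        raise ValueError("CRRA utility is undefined for non-positive gross returns")
    power = 1.0 - gamma
    mean_term = sum(r ** power for r in gross_returns) / len(gross_returns)
    return mean_term ** (1.0 / power)


# Hypothetical 10-year samples: a steady 2% monthly return, with and without
# one month in which the portfolio loses 40%.
calm = [1.02] * 120
one_crash = [1.02] * 119 + [0.60]

if __name__ == "__main__":
    mean_crash = sum(one_crash) / len(one_crash)
    print(f"CER, no crash:   {crra_cer(calm):.4f}")
    print(f"Mean, one crash: {mean_crash:.4f}")
    print(f"CER, one crash:  {crra_cer(one_crash):.4f}")
```

Under these assumptions the arithmetic mean return barely moves when the crash month is added, while the certainty equivalent falls by much more, which is the sense in which rare large losses can offset an otherwise attractive average return.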
In particular, from a data generating perspective, as firm characteristics are correlated with future returns cross-sectionally, incorporating these factors should carry an information benefit. On the other hand, implementing the relevant portfolio policy requires model specification and estimation. With limited data, the resultant portfolio decision is sensitive to the realization of historical stock returns and vulnerable to instability in the forecast relation. The noise incurred then translates into extra randomness and tail events in the portfolio returns, of which the downside is viewed as risk. The documented infrequent but strong negative returns are evidence of such extra risk, and the failures of many characteristic based strategies suggest that the losses due to mis-specification and estimation are not dominated by the information gain from incorporating predicting characteristics. Similarly, the successes of the book-to-market and (industry standardized) asset growth rate based strategies can be attributed to their limited downside risks, which do not offset the information gain of incorporating predictors.

Footnote 13 (continued): Bonferroni correction and Holm-Bonferroni method as conservative ways to control for the family-wise error rate. Unfortunately, we can then no longer reject the null hypothesis of equal expected utility.
Footnote 14: Again, the case with industry wide characteristic normalization is depicted in Figure 2.4.

Multivariate strategies: The above strategies each utilize only a single return predicting firm characteristic. Yet, multiple characteristics may provide complementary information on the distribution of future returns. Thus, adding additional characteristics to the

Table 2.2: Unconditional Evaluation of Parametric Policy with Multivariate Characters
Notes: This table reports the unconditional evaluation results of the parametric policy with alternative multivariate standardized predicting characteristics.
The rows labeled "CER" denote the point estimates of the certainty equivalent gross return of the relevant multivariate parametric strategy. And the rows labeled "p-val" report the p-values of an unconditional test of the null that the value weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Columns (a) to (e) correspond, respectively, to the firm characteristic combinations of (a) Book to Market + Gross Profit; (b) Book to Market + Asset Growth; (c) Size + Book to Market + Momentum; (d) Size + Gross Profit + Asset Growth; and (e) all five characteristics. Panel (1) covers the case where characteristics are standardized over the whole investable universe, while panel (2) covers the case where standardization is over the peer firms within the same industry. Data ranges from Jan 1984 to Dec 2013 and risk aversion is set at 5.

CER estimate on value weighted benchmark: 1.0115

          B/M+GP    B/M+AG    Size+B/M+Mom  Size+GP+AG  5-factors
          (a)       (b)       (c)           (d)         (e)
(1) Standardized Character
CER       1.0275    1.0123    0.4322        0.9341      0.4322
p-val     (0.0157)  (0.9592)  (0.3059)      (0.3475)    (0.3060)
(2) Industry Standardized Character
CER       1.0338    1.0288    0.4322        0.9359      0.4323
p-val     (0.0044)  (0.0092)  (0.3061)      (0.4087)    (0.3063)

portfolio strategy should create a welfare benefit. But the cost here is more estimation uncertainty due to increased model complexity. The net tradeoff hence needs to be examined empirically.

Table 2.2 exhibits the unconditional performance of several multiple-characteristic based strategies. The first one, Book-to-Market plus Gross Profit to Asset ratio, represents our attempt to tilt the portfolio towards cheap (value) and profitable firms.
The associated CER estimates (Column (a)) exceed that of the market cap weighted benchmark at the 5% and 1% significance levels for whole market and industry standardized characteristics, respectively.[15] More importantly, these CER estimates rise above that of the univariate Book to Market ratio strategy by 0.42% and 0.73% respectively. However, unreported analysis shows that such out-performance (over the univariate one) is only significant for the industry standardized case, at the 10% level. The second strategy I entertain combines Book-to-Market with Asset Growth rate. The CER estimates there (Column (b)) also exceed the cap weighted benchmark, but the difference is significant only for industry standardized characteristics. And now, relative to the univariate Book-to-Market strategy, adding Asset Growth rate does not generate a significant advantage.

Footnote 15: The industry standardized case survives the multiple hypothesis testing correction mentioned before.

The next three strategies considered involve more (three and five) characteristics. These strategies comply with either the standard value, size, and momentum model; the investment based size, profitability, and investment model (Hou et al. (2012));[16] or the "kitchen sink" five factor model (Fama and French (2014)). The associated unconditional evaluation results are reported in the last three columns of Table 2.2. Indeed, none of these strategies creates even a higher point estimate of CER relative to the cap weighted benchmark. A glance at Figure 2.5, or Figure 2.6 for the industry standardized case, reveals the presence of a number of large negative realized portfolio returns, which kill these strategies. The results indicate that it is difficult to exploit three or more return predicting firm characteristics jointly, at least in an unconditional sense.

Conditional evaluation

Thus far, my analysis has only focused on the average, or unconditional, performance of characteristic based portfolio strategies.
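As a recap of the unconditional inference, the family-wise corrections mentioned in footnotes 13 and 15 can be sketched as follows. The code implements the textbook Bonferroni and Holm-Bonferroni (step-down) rules with five hypotheses per panel; it is an illustrative assumption about how the correction is applied, since the thesis does not spell out its implementation, and the function names are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m, with m hypotheses in the family."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down rule.

    Sort p-values in ascending order and compare the k-th smallest
    (k = 0, 1, ...) with alpha / (m - k); stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


# p-values transcribed from Table 2.1 and Table 2.2, columns (a) to (e).
TABLE_2_1_PANEL_1 = [0.6907, 0.0229, 0.8217, 0.6669, 0.8892]
TABLE_2_1_PANEL_2 = [0.8206, 0.0119, 0.8090, 0.9036, 0.0113]
TABLE_2_2_PANEL_1 = [0.0157, 0.9592, 0.3059, 0.3475, 0.3060]
TABLE_2_2_PANEL_2 = [0.0044, 0.0092, 0.3061, 0.4087, 0.3063]

if __name__ == "__main__":
    panels = {
        "Table 2.1 (1)": TABLE_2_1_PANEL_1,
        "Table 2.1 (2)": TABLE_2_1_PANEL_2,
        "Table 2.2 (1)": TABLE_2_2_PANEL_1,
        "Table 2.2 (2)": TABLE_2_2_PANEL_2,
    }
    for label, p_values in panels.items():
        print(label, "Bonferroni:", bonferroni_reject(p_values),
              "Holm:", holm_reject(p_values))
```

With a family of five tests at the 5% level, the smallest p-value must fall below 0.01 to be rejected. Under this reading, no univariate strategy in Table 2.1 survives, while the industry standardized Book to Market + Gross Profit strategy in Table 2.2 does, in line with the two footnotes.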
However, as argued above, their relative performance may be heterogeneous across different economic regimes. Accordingly, I repeat the expected utility estimation and inference exercises conditional on alternative economic or market states. In particular, I let the economy be in a boom / bust state if the contemporaneous unemployment rate is below / above its average (which amounts to 6.18% for our sample). For market based states, I define a low / high market valuation state as periods when the aggregate dividend-to-price ratio rises above / drops below its mean, and a turbulent / quiet state as periods when volatility measured by the VIX index exceeds / falls short of its historical average.

Footnote 16: Hou et al. (2012) use asset growth rate to measure investment and return on equity to measure profitability.

Table 2.3 presents the evaluation results for univariate strategies (with whole

Figure 2.5: Out-of-Sample Returns by Parametric Policy with Multivariate Characters
Notes: This figure plots the out-of-sample realizations of monthly gross returns by the parametric policy with alternative multivariate characteristics. The green line is the return series of a market capitalization weighted passive portfolio, which serves as a benchmark. The blue lines, in contrast, are the returns generated by the strategy on standardized book to market ratio and gross profit over asset (Book to Market + Gross Profit, top left); book to market ratio and asset growth rate (Book to Market + Asset Growth, top right); size, value and momentum (Size + Book to Market + Momentum, middle left); size, gross profitability and asset growth rate (Size + Gross Profit + Asset Growth, middle right) and all five characteristics (5 factors, bottom). Data is observed at monthly frequency and ranges from Jan 1984 to Dec 2013.
Risk aversion level is assumed to be 5.
[Figure 2.5 panels, each plotting the value weighted benchmark (vw) against one strategy: Book to Market + Gross Profit; Book to Market + Asset Growth; Size + BM + Momentum; Size + Gross Profit + Asset Growth; 5 factors. Horizontal axis: 1985 to 2010.]

Figure 2.6: Out-of-Sample Returns by Parametric Policy with Multivariate Industry Standardized Characters
Notes: This figure plots the out-of-sample realizations of monthly gross returns by the parametric policy with alternative multivariate industry standardized characteristics. The green line is the return series of a market capitalization weighted passive portfolio, which serves as a benchmark. The blue lines, in contrast, are the returns generated by the strategy on industry standardized book to market ratio and gross profit over asset (Book to Market + Gross Profit, top left); book to market ratio and asset growth rate (Book to Market + Asset Growth, top right); size, value and momentum (Size + Book to Market + Momentum, middle left); size, gross profitability and asset growth rate (Size + Gross Profit + Asset Growth, middle right) and all five characteristics (5 factors, bottom). Data is observed at monthly frequency and ranges from Jan 1984 to Dec 2013.
Risk aversion level is assumed to be 5.
[Figure 2.6 panels, each plotting the value weighted benchmark (vw) against one strategy: Book to Market + Gross Profit (Ind); Book to Market + Asset Growth (Ind); Size + BM + Momentum (Ind); Size + Gross Profit + Asset Growth (Ind); 5 factors (Ind). Horizontal axis: 1985 to 2010.]

Table 2.3: Conditional Evaluation of Parametric Policy with Univariate Character
Notes: This table reports the conditional evaluation results of the parametric policy with alternative univariate standardized predicting characteristics. The rows labeled "CER" denote the point estimates of the certainty equivalent gross return of the relevant parametric strategy. And the rows labeled "p-val" report the p-values of a conditional test of the null that the value weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the firm characteristic of (a) Size; (b) Book to Market ratio; (c) Momentum; (d) Gross Profit over Total Asset; (e) Asset Growth rate. Panel (A) conditions the strategy performance on low (boom) and high (bust) unemployment regimes; panel (B) conditions on high or low market expected return proxied by the aggregate dividend to price ratio; and panel (C) on two volatility states measured by the level of the VIX index.
Data ranges from Jan 1984 to Dec 2013 and risk aversion is set at 5.

Parametric policy with univariate standardized character

          Size      Book to market  Momentum  Gross Profit  Asset Growth
          (a)       (b)             (c)       (d)           (e)
(A) Low unemployment: Benchmark CER 1.0137
CER       1.0130    1.0173          1.0112    1.0179        0.9908
p-val     (0.7122)  (0.5875)        (0.7651)  (0.2710)      (0.4750)
(A) High unemployment: Benchmark CER 1.0110
CER       1.0091    1.0295          1.0106    1.0093        1.0315
p-val     (0.5391)  (0.0045)        (0.9005)  (0.7249)      (0.0011)
(B) High dividend to price: Benchmark CER 1.0145
CER       1.0127    1.0344          1.0173    1.0122        1.0328
p-val     (0.4948)  (0.0060)        (0.3831)  (0.6179)      (0.0055)
(B) Low dividend to price: Benchmark CER 1.0084
CER       1.0086    1.0127          1.0039    1.0131        0.9878
p-val     (0.9529)  (0.5532)        (0.5620)  (0.0693)      (0.5024)
(C) High VIX index: Benchmark CER 1.0022
CER       1.0034    1.0079          0.9926    1.0000        0.9884
p-val     (0.6760)  (0.4490)        (0.2345)  (0.5454)      (0.6788)
(C) Low VIX index: Benchmark CER 1.0206
CER       1.0208    1.0355          1.0267    1.0249        1.0312
p-val     (0.9268)  (0.0137)        (0.0151)  (0.3331)      (0.1091)

Table 2.4: Conditional Evaluation of Parametric Policy with Univariate Industry Standardized Character
Notes: This table reports the conditional evaluation results of the parametric policy with alternative univariate industry standardized predicting characteristics. The rows labeled "CER" denote the point estimates of the certainty equivalent gross return of the relevant parametric strategy. And the rows labeled "p-val" report the p-values of a conditional test of the null that the value weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the firm characteristic of (a) Size; (b) Book to Market ratio; (c) Momentum; (d) Gross Profit over Total Asset; (e) Asset Growth rate.
Panel (A) conditions the strategy performance on low (boom) and high (bust) unemployment regimes; panel (B) conditions on high or low market expected return proxied by the aggregate dividend to price ratio; and panel (C) on two volatility states measured by the level of the VIX index. Data ranges from Jan 1984 to Dec 2013 and risk aversion is set at 5.

Parametric policy with univariate industry standardized character

          Size      Book to market  Momentum  Gross Profit  Asset Growth
          (a)       (b)             (c)       (d)           (e)
(A) Low unemployment: Benchmark CER 1.0137
CER       1.0125    1.0215          1.0109    1.0235        1.0209
p-val     (0.5431)  (0.2661)        (0.7421)  (0.0237)      (0.5072)
(A) High unemployment: Benchmark CER 1.0110
CER       1.0121    1.0313          1.0108    0.9974        1.0349
p-val     (0.6662)  (0.0071)        (0.9205)  (0.5104)      (0.0000)
(B) High dividend to price: Benchmark CER 1.0145
CER       1.0152    1.0384          1.0167    1.0009        1.0372
p-val     (0.7880)  (0.0060)        (0.4799)  (0.5026)      (0.0002)
(B) Low dividend to price: Benchmark CER 1.0084
CER       1.0086    1.0153          1.0044    1.0199        1.0172
p-val     (0.9507)  (0.4130)        (0.6149)  (0.0085)      (0.3785)
(C) High VIX index: Benchmark CER 1.0022
CER       1.0043    1.0116          0.9931    1.0086        1.0216
p-val     (0.4496)  (0.2819)        (0.2648)  (0.1909)      (0.0377)
(C) Low VIX index: Benchmark CER 1.0206
CER       1.0213    1.0389          1.0268    1.0084        1.0319
p-val     (0.7928)  (0.0088)        (0.0088)  (0.5827)      (0.1208)

market standardized characteristics) conditional on each of the regimes.[17] Panel (A) suggests that, based on the CER estimates, size, momentum, and gross profit driven strategies seem to do better in economic booms, while book-to-market and asset growth driven strategies appear to work better in economic busts. Conditioning on the economy being in a bust, book-to-market and asset growth rate driven strategies significantly outperform the cap weighted benchmark. Conditioning on a boom, in contrast, nothing significantly beats the benchmark.
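The regime split behind panels (A) to (C) compares a state variable with its own sample average, as described at the start of this subsection. A minimal sketch of that split is given below; the function name and the six-month toy series are hypothetical, and the treatment of observations exactly at the mean is an arbitrary choice made here.

```python
from statistics import mean


def split_by_state(values, state_variable):
    """Split `values` by whether the state variable is below or above its mean.

    Returns (values in below-mean months, values in above-mean months, mean).
    Months exactly at the mean are assigned to the above-mean group, which is
    an assumption of this sketch.
    """
    if len(values) != len(state_variable):
        raise ValueError("series must have the same length")
    threshold = mean(state_variable)
    below = [v for v, s in zip(values, state_variable) if s < threshold]
    above = [v for v, s in zip(values, state_variable) if s >= threshold]
    return below, above, threshold


# Hypothetical monthly unemployment rates (%) and strategy gross returns.
unemployment = [4.5, 5.0, 7.5, 8.0, 5.5, 6.7]
strategy_returns = [1.01, 1.02, 0.97, 1.03, 1.00, 1.04]

# Low unemployment corresponds to the boom regime, high to the bust regime.
boom_returns, bust_returns, average_rate = split_by_state(
    strategy_returns, unemployment
)

if __name__ == "__main__":
    print(f"average unemployment: {average_rate:.2f}%")
    print("boom months:", boom_returns)
    print("bust months:", bust_returns)
```

The same helper could be applied with the dividend-to-price ratio or the VIX index as the state variable; the expected utility estimate and test would then be computed separately on each subsample.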
Compared with the unconditional evaluation reported in panel (1) of Table 2.1, the results indicate that the average success of the book-to-market based strategy is attributable to its good performance in economic busts, and that the failure of the asset growth rate driven one is mainly due to its unsatisfactory performance during economic booms. Panels (B) and (C) conduct strategy evaluation over different aggregate market valuation and volatility regimes. We find that, as expected, performance in terms of estimated CER is generally better in low valuation (high dividend-to-price) and low volatility states. The exception is the gross profit driven strategy, which has a larger CER estimate in the high valuation state. Conditional on low valuation states, book-to-market and asset growth rate driven strategies outperform the benchmark at the 1% significance level. Yet, the CER improvement of the gross profit based one in the high valuation state is only significant at the 10% level. During low volatility states, the Book-to-Market strategy continues to outperform significantly, but the outperformance of the asset growth rate driven rule is no longer significant. Interestingly, the momentum strategy now creates a 0.6% monthly CER benefit that is significant at the 5% level. This suggests that momentum crashes can be avoided in less volatile markets and hence the momentum characteristic can be exploited conditionally.

Table 2.5 documents the conditional evaluation results for multiple-characteristic based strategies.[18] Recall that in the unconditional analysis (Table 2.2), multivariate policy rules generally under-perform the cap weighted benchmark. Conditionally, however, this finding does not hold.
In fact, almost all multiple-characteristic based strategies significantly outperform the cap weighted benchmark when the economy is in a bust, when the market is in a high dividend-to-price ratio (low valuation)

Footnote 17: The case with industry standardized characteristics generates qualitatively similar results, which are reported in Table 2.4.
Footnote 18: Again, the industry standardized case is documented in Table 2.6.

Table 2.5: Conditional Evaluation of Parametric Policy with Multivariate Characters
Notes: This table reports the conditional evaluation results of the parametric policy with alternative multivariate standardized predicting characteristics. The rows labeled "CER" denote the point estimates of the certainty equivalent gross return of the relevant parametric strategy. And the rows labeled "p-val" report the p-values of a conditional test of the null that the value weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the combination of firm characteristics (a) Book to Market + Gross Profit; (b) Book to Market + Asset Growth; (c) Size + Book to Market + Momentum; (d) Size + Gross Profit + Asset Growth; (e) all five factors. Panel (A) conditions the strategy performance on low (boom) and high (bust) unemployment regimes; panel (B) conditions on high or low market expected return proxied by the aggregate dividend to price ratio; and panel (C) on two volatility states measured by the level of the VIX index.
Data ranges from Jan 1984 to Dec 2013 and risk aversion is set at 5.

Parametric policy with multivariate standardized character

          B/M+GP    B/M+AG    Size+B/M+Mom  Size+GP+AG  5-factors
          (a)       (b)       (c)           (d)         (e)
(A) Low unemployment: Benchmark CER 1.0137
CER       1.0215    0.9948    0.3638        0.8698      0.3639
p-val     (0.4353)  (0.5119)  (0.2911)      (0.2812)    (0.2913)
(A) High unemployment: Benchmark CER 1.0110
CER       1.0337    1.0349    1.0307        1.0335      1.0372
p-val     (0.0017)  (0.0000)  (0.0113)      (0.0896)    (0.0339)
(B) High dividend to price: Benchmark CER 1.0145
CER       1.0382    1.0396    1.0434        1.0328      1.0482
p-val     (0.0150)  (0.0002)  (0.0003)      (0.2014)    (0.0185)
(B) Low dividend to price: Benchmark CER 1.0084
CER       1.0174    0.9882    0.3648        0.8697      0.3648
p-val     (0.3478)  (0.4601)  (0.2913)      (0.2878)    (0.2912)
(C) High VIX index: Benchmark CER 1.0022
CER       1.0087    0.9883    0.3581        0.8593      0.3581
p-val     (0.5294)  (0.6427)  (0.2889)      (0.2948)    (0.2889)
(C) Low VIX index: Benchmark CER 1.0206
CER       1.0428    1.0360    1.0482        1.0362      1.0527
p-val     (0.0008)  (0.0392)  (0.0000)      (0.0652)    (0.0014)

Table 2.6: Conditional Evaluation of Parametric Policy with Multivariate Industry Standardized Characters
Notes: This table reports the conditional evaluation results of the parametric policy with alternative multivariate industry standardized predicting characteristics. The rows labeled "CER" denote the point estimates of the certainty equivalent gross return of the relevant parametric strategy. And the rows labeled "p-val" report the p-values of a conditional test of the null that the value weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the combination of firm characteristics (a) Book to Market + Gross Profit; (b) Book to Market + Asset Growth; (c) Size + Book to Market + Momentum; (d) Size + Gross Profit + Asset Growth; (e) all five factors.
Panel (A) conditions the strategy performance on low (boom) and high (bust) unemployment regimes; panel (B) conditions on high or low market expected return proxied by the aggregate dividend to price ratio; and panel (C) on two volatility states measured by the level of the VIX index. Data ranges from Jan 1984 to Dec 2013 and risk aversion is set at 5.

Parametric policy with multivariate industry standardized character

          B/M+GP    B/M+AG    Size+B/M+Mom  Size+GP+AG  5-factors
          (a)       (b)       (c)           (d)         (e)
(A) Low unemployment: Benchmark CER 1.0137
CER       1.0268    1.0232    0.3638        0.8638      0.3638
p-val     (0.2142)  (0.3486)  (0.2912)      (0.2988)    (0.2909)
(A) High unemployment: Benchmark CER 1.0110
CER       1.0400    1.0366    1.0351        1.0497      1.0515
p-val     (0.0011)  (0.0001)  (0.0045)      (0.0363)    (0.0281)
(B) High dividend to price: Benchmark CER 1.0145
CER       1.0472    1.0437    1.0494        1.0394      1.0433
p-val     (0.0075)  (0.0002)  (0.0002)      (0.3123)    (0.3120)
(B) Low dividend to price: Benchmark CER 1.0084
CER       1.0211    1.0149    0.3648        0.8695      0.3649
p-val     (0.1958)  (0.5111)  (0.2913)      (0.3358)    (0.2917)
(C) High VIX index: Benchmark CER 1.0022
CER       1.0162    1.0190    0.3581        0.8646      0.3582
p-val     (0.2099)  (0.1010)  (0.2890)      (0.3696)    (0.2898)
(C) Low VIX index: Benchmark CER 1.0206
CER       1.0451    1.0365    1.0534        1.0202      1.0234
p-val     (0.0042)  (0.0537)  (0.0000)      (0.9836)    (0.9150)

state, or during the low volatility regime. These results suggest that the instability losses we found before (for multivariate strategies) are concentrated in certain economic regimes. By avoiding these regimes, the welfare gain from incorporating multiple characteristics can outweigh the associated estimation and mis-specification risks. However, when we set the benchmark to be the univariate book to market driven strategy, the expected utility benefits are no longer significant for most strategies.[19] The exception is the three-factor value, size, and momentum driven one, which outperforms at the 5% significance level in high dividend-to-price ratio states and at the 1% level in the low volatility regime.
This finding indicates that the size and momentum characteristics are practically valuable, as the incremental information on returns they provide (over the book-to-market characteristic) can indeed be exploited conditionally.

Footnote 19: This is not shown in Table 2.5.

Robustness checks

This part of the paper examines whether the baseline results are robust to (1) the presence of a short sale constraint, (2) a smaller investable universe, (3) a shorter or longer length of historical information, and (4) a higher risk aversion level for the investor. The first exercise restricts the investor from taking short positions. Here, I consider a simple behavioral rule, which eliminates all the short positions in the non-restricted portfolio allocation and then re-normalizes the long position weights. This constrained policy rule precludes many trading opportunities but, meanwhile, improves the robustness of the portfolio, as the maximum loss on any individual security is now bounded by -100%. The tradeoff from this short sale constraint is estimated and documented in Table 2.7 for the case of univariate strategies. In that case, the sacrifice in the ability to short seems large. For instance, the book-to-market and industry standardized asset growth rate driven strategies, whose non-restricted versions are successful, no longer significantly outperform the cap weighted benchmark. Other univariate strategies, when short sale constrained, even significantly under-perform. Regarding multiple-characteristic based strategies, the evidence is mixed. Table 2.8 shows that, when short positions are abandoned, the average performance of two-characteristic driven strategies

Table 2.7: Short Sale Constraint: Unconditional Evaluation of Univariate Character
Notes: This table reports the unconditional evaluation results of the parametric policy with alternative univariate standardized predicting characteristics under a short sale constraint.
The rows labeled "CER" denote the point estimates of the certainty equivalent gross return of the relevant parametric strategy. And the rows labeled "p-val" report the p-values of an unconditional test of the null that the value weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the firm characteristic of (a) Size; (b) Book to Market ratio; (c) Momentum; (d) Gross Profit over Total Asset; (e) Asset Growth rate. Panel (1) covers the case where the characteristic is standardized over the whole investable universe, while panel (2) covers the case where standardization is over the peer firms within the same industry. Data ranges from Jan 1984 to Dec 2013 and risk aversion is set at 5.

CER estimate on value weighted benchmark: 1.0115

          Size      Book to market  Momentum  Gross Profit  Asset Growth
          (a)       (b)             (c)       (d)           (e)
(1) Standardized Character
CER       1.0079    1.0086          1.0084    1.0077        1.0052
p-val     (0.0320)  (0.1428)        (0.0190)  (0.0048)      (0.0107)
(2) Industry Standardized Character
CER       1.0079    1.0080          1.0085    1.0074        1.0053
p-val     (0.0356)  (0.1174)        (0.0170)  (0.0050)      (0.0112)

drops significantly and falls below the benchmark. In contrast, for strategies based on three or more characteristics, the point estimates of unconditional CER increase compared to the unrestricted ones. Yet, their performance still fails to exceed that of the benchmark. Conditional evaluation results for these short sale constrained strategies are documented in Table 2.9 and Table 2.10 for univariate and multivariate strategies respectively.[20] Even in the favorable regimes of high unemployment, high dividend-to-price ratio, or low volatility, none of the short-sale constrained single-characteristic based strategies, including the book-to-market driven one, significantly outperforms the benchmark in terms of CER.
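The no-short-sale rule used in this robustness check (eliminate all short positions in the unrestricted allocation, then re-normalize the long weights) can be sketched as follows. The function name and the four-asset weight vector are hypothetical, and the handling of a portfolio with no long positions is a choice made for this sketch only.

```python
def drop_shorts_and_renormalize(weights):
    """Set negative weights to zero and rescale the rest to sum to one."""
    longs = [max(w, 0.0) for w in weights]
    total = sum(longs)
    if total <= 0.0:
        raise ValueError("no long positions left to re-normalize")
    return [w / total for w in longs]


# Hypothetical unrestricted weights on four stocks (they sum to one).
unrestricted = [0.5, -0.2, 0.4, 0.3]
constrained = drop_shorts_and_renormalize(unrestricted)

if __name__ == "__main__":
    print("unrestricted:", unrestricted)
    print("constrained: ", [round(w, 4) for w in constrained])
```

Because every constrained weight lies between zero and one, the loss on any single position is capped at the amount invested in it, which is the robustness gain the text refers to; the cost is that the information in negative tilts is discarded.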
Footnote 20: The cases with industry standardized characteristics are presented in Table 2.11 and Table 2.12.

Table 2.8: Short Sale Constraint: Unconditional Evaluation of Multivariate Characters
Notes: This table reports the unconditional evaluation results of the parametric policy with alternative multivariate standardized predicting characteristics under a short sale constraint. The rows labeled "CER" denote the point estimates of the certainty equivalent gross return of the relevant multivariate parametric strategy. And the rows labeled "p-val" report the p-values of an unconditional test of the null that the value weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Columns (a) to (e) correspond, respectively, to the firm characteristic combinations of (a) Book to Market + Gross Profit; (b) Book to Market + Asset Growth; (c) Size + Book to Market + Momentum; (d) Size + Gross Profit + Asset Growth; and (e) all five characteristics. Panel (1) covers the case where characteristics are standardized over the whole investable universe, while panel (2) covers the case where standardization is over the peer firms within the same industry. Data ranges from Jan 1984 to Dec 2013 and risk aversion is set at 5.

CER estimate on value weighted benchmark: 1.0115

          B/M+GP    B/M+AG    Size+B/M+Mom  Size+GP+AG  5-factors
          (a)       (b)       (c)           (d)         (e)
(1) Standardized Character
CER       1.0088    1.0062    1.0090        1.0062      1.0068
p-val     (0.1851)  (0.0202)  (0.2511)      (0.0201)    (0.0394)
(2) Industry Standardized Character
CER       1.0087    1.0065    1.0087        1.0066      1.0067
p-val     (0.1718)  (0.0315)  (0.2139)      (0.0275)    (0.0257)

For multiple-characteristic based policies, the constrained size, book-to-market, plus momentum driven one generates the largest estimated CER gain over the benchmark among the no-short-sale strategies, conditional on the low volatility regime.
However, the utility improvement is not statistically significant.

The second robustness test restricts the investor to trade only larger stocks. In particular, I consider alternative investable universes that exclude either the smallest (in terms of market cap) 20% or 40% of stocks. Note that all the firm characteristics need to be re-normalized over the new investable universe. Facing fewer tradeable assets certainly precludes investors from many trading opportunities. But, if the returns on small stocks are more volatile and hence harder to model and estimate, excluding them may help improve the robustness of the parametric strategy. According to Table 2.13, we find that empirically the net effect is generally negative for univariate strategies. Specifically, excluding either 20% or 40% of the smallest stocks renders the utility benefit from the book-to-market or asset growth rate driven strategy

Table 2.9: Conditional Evaluation of Parametric Policy with Univariate Character and Short Sale Constraint
Notes: This table reports the conditional evaluation results of the parametric policy with alternative univariate standardized predicting characteristics under a short sale constraint. The rows labeled "CER" denote the point estimates of the certainty equivalent gross return of the relevant parametric strategy. And the rows labeled "p-val" report the p-values of a conditional test of the null that the value weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the firm characteristic of (a) Size; (b) Book to Market ratio; (c) Momentum; (d) Gross Profit over Total Asset; (e) Asset Growth rate. Panel (A) conditions the strategy performance on low (boom) and high (bust) unemployment regimes; panel (B) conditions on high or low market expected return proxied by the aggregate dividend to price ratio; and panel (C) on two volatility states measured by the level of the VIX index.
Data ranges from Jan1984 to Dec 2013 and risk version is set at 5.Parametric policy with univariate standardized characterSize Book to market Momentum Gross Profit Asset Growth(a) (b) (c) (d) (e)(A) Low unemployment: Benchmark CER 1.0137CER 1.0093 1.0084 1.0098 1.0091 1.0038p-val (0.0093) (0.0103) (0.0416) (0.0008) (0.0001)(A) High unemployment: Benchmark CER 1.0110CER 1.0077 1.0102 1.0084 1.0080 1.0088p-val (0.1550) (0.7660) (0.1765) (0.1891) (0.5474)(B) High dividend to price: Benchmark CER 1.0145CER 1.0121 1.0140 1.0120 1.0113 1.0115p-val (0.2481) (0.8049) (0.1273) (0.1072) (0.4142)(B) Low dividend to price: Benchmark CER 1.0084CER 1.0038 1.0034 1.0048 1.0042 0.9991p-val (0.1031) (0.1401) (0.0651) (0.0416) (0.0074)(C) High VIX index: Benchmark CER 1.0022CER 0.9980 0.9961 0.9958 0.9967 0.9939p-val (0.0598) (0.0197) (0.0032) (0.0040) (0.0060)(C) Low VIX index: Benchmark CER 1.0206CER 1.0197 1.0217 1.0213 1.0194 1.0182p-val (0.4361) (0.5544) (0.5179) (0.1553) (0.3234)86Table 2.10: Conditional Evaluation of Parametric Policy with MultivariateCharacters and Short Sale ConstraintNotes: This table reports the conditional evaluation results of parametric policy with alternative multivariatestandardized predicting characteristics under short sale constraint. The rows labeled ”CER” denote the pointestimates of the certainty equivalent gross return of the relevant parametric strategy. And the rows labeled ”p-value” reports the p-values of an unconditional test on the null that value weighted benchmark strategy achievesthe same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds,respectively, to the combination of firm characteristics as (a) Book to Market + Gross Profit; (b) Book to Market+ Asset Growth; (c) Size + Book to Market + Momentum; (d) Size + Gross Profit + Asset Growth; (e) All fivefactors. 
Panel (A) conditions the strategy performance on low (boom) and high (bust) unemployment regimes;panel (B) conditions on high or low market expected return proxied by aggregate dividend to price ratio and panel(C) on two volatility states measured by the level of VIX index. Data ranges from Jan 1984 to Dec 2013 and riskversion is set at 5.Parametric policy with multivariate standardized characterB/M+GP B/M+AG Size+B/M+Mom Size+GP+AG 5-factors(a) (b) (c) (d) (e)(A) Low unemployment: Benchmark CER 1.0137CER 1.0085 1.0048 1.0089 1.0053 1.0056p-val (0.0175) (0.0003) (0.0327) (0.0005) (0.0011)(A) High unemployment: Benchmark CER 1.0110CER 1.0106 1.0099 1.0102 1.0091 1.0101p-val (0.8795) (0.7266) (0.7814) (0.5464) (0.7654)(B) High dividend to price: Benchmark CER 1.0145CER 1.0144 1.0134 1.0143 1.0119 1.0135p-val (0.9537) (0.7160) (0.9060) (0.4123) (0.7507)(B) Low dividend to price: Benchmark CER 1.0084CER 1.0034 0.9992 1.0038 1.0006 1.0002p-val (0.1556) (0.0093) (0.2375) (0.0308) (0.0244)(C) High VIX index: Benchmark CER 1.0022CER 0.9962 0.9942 0.9959 0.9947 0.9939p-val (0.0304) (0.0058) (0.0345) (0.0138) (0.0068)(C) Low VIX index: Benchmark CER 1.0206CER 1.0220 1.0197 1.0226 1.0190 1.0207p-val (0.3840) (0.6666) (0.2786) (0.3559) (0.9935)87Table 2.11: Conditional Evaluation of Parametric Policy with Univariate In-dustry Standardized Character and Short Sale ConstraintNotes: This table reports the conditional evaluation results of parametric policy with alternative univariate indus-try standardized predicting characteristic under short sale constraint. The rows labeled ”CER” denote the pointestimates of the certainty equivalent gross return of the relevant parametric strategy. And the rows labeled ”p-value” reports the p-values of an unconditional test on the null that value weighted benchmark strategy achievesthe same expected utility as the relevant parametric rule under 10 years of limited data. 
Each column corresponds,respectively, to the firm characteristic of (a) Size; (b) Book to Market ratio; (c) Momentum; (d) Gross Profit overTotal Asset; (e) Asset Growth rate. Panel (A) conditions the strategy performance on low (boom) and high (bust)unemployment regimes; panel (B) conditions on high or low market expected return proxied by aggregate divi-dend to price ratio and panel (C) on two volatility states measured by the level of VIX index. Data ranges fromJan 1984 to Dec 2013 and risk version is set at 5.Parametric policy with univariate industry standardized characterSize Book to market Momentum Gross Profit Asset Growth(a) (b) (c) (d) (e)(A) Low unemployment: Benchmark CER 1.0137CER 1.0084 1.0080 1.0097 1.0090 1.0040p-val (0.0018) (0.0105) (0.0235) (0.0029) (0.0001)(A) High unemployment: Benchmark CER 1.0110CER 1.0086 1.0093 1.0089 1.0073 1.0087p-val (0.3302) (0.5794) (0.2044) (0.1394) (0.4924)(B) High dividend to price: Benchmark CER 1.0145CER 1.0124 1.0138 1.0123 1.0109 1.0119p-val (0.3493) (0.7666) (0.1412) (0.0952) (0.4420)(B) Low dividend to price: Benchmark CER 1.0084CER 1.0034 1.0024 1.0048 1.0039 0.9990p-val (0.0733) (0.1270) (0.0523) (0.0420) (0.0108)(C) High VIX index: Benchmark CER 1.0022CER 0.9975 0.9953 0.9961 0.9961 0.9938p-val (0.0371) (0.0252) (0.0021) (0.0051) (0.0042)(C) Low VIX index: Benchmark CER 1.0206CER 1.0197 1.0215 1.0213 1.0196 1.0183p-val (0.5818) (0.6442) (0.5423) (0.2450) (0.3278)88Table 2.12: Conditional Evaluation of Parametric Policy with Multivariate In-dustry Standardized Characters and Short Sale ConstraintNotes: This table reports the conditional evaluation results of parametric policy with alternative multivariateindustry standardized predicting characteristics under short sale constraint. The rows labeled ”CER” denote thepoint estimates of the certainty equivalent gross return of the relevant parametric strategy. 
And the rows labeled”p-value” reports the p-values of an unconditional test on the null that value weighted benchmark strategy achievesthe same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds,respectively, to the combination of firm characteristics as (a) Book to Market + Gross Profit; (b) Book to Market+ Asset Growth; (c) Size + Book to Market + Momentum; (d) Size + Gross Profit + Asset Growth; (e) All fivefactors. Panel (A) conditions the strategy performance on low (boom) and high (bust) unemployment regimes;panel (B) conditions on high or low market expected return proxied by aggregate dividend to price ratio and panel(C) on two volatility states measured by the level of VIX index. Data ranges from Jan 1984 to Dec 2013 and riskversion is set at 5.Parametric policy with multivariate standardized characterB/M+GP B/M+AG Size+B/M+Mom Size+GP+AG 5-factors(a) (b) (c) (d) (e)(A) Low unemployment: Benchmark CER 1.0137CER 1.0088 1.0052 1.0088 1.0064 1.0060p-val (0.0276) (0.0004) (0.0307) (0.0021) (0.0011)(A) High unemployment: Benchmark CER 1.0110CER 1.0098 1.0099 1.0098 1.0086 1.0093p-val (0.6910) (0.7077) (0.6960) (0.4305) (0.5713)(B) High dividend to price: Benchmark CER 1.0145CER 1.0138 1.0142 1.0144 1.0120 1.0132p-val (0.7628) (0.8989) (0.9409) (0.3918) (0.6358)(B) Low dividend to price: Benchmark CER 1.0084CER 1.0037 0.9991 1.0032 1.0014 1.0004p-val (0.1769) (0.0153) (0.1902) (0.0510) (0.0179)(C) High VIX index: Benchmark CER 1.0022CER 0.9963 0.9943 0.9956 0.9949 0.9943p-val (0.0446) (0.0081) (0.0357) (0.0150) (0.0067)(C) Low VIX index: Benchmark CER 1.0206CER 1.0219 1.0197 1.0226 1.0196 1.0200p-val (0.4551) (0.6687) (0.2921) (0.5091) (0.7234)89Table 2.13: Alternative Investable Universe: Unconditional Evaluation ofUnivariate CharacterNotes: This table reports the unconditional evaluation results of parametric policy with alternative investableuniverse given univariate standardized predicting characteristic. 
Part (A) and (B) document, respectively, on thecase where the investable universe excludes the bottom 20% or 40% of the smallest stocks in terms of marketcapitalization. The rows labeled ”CER” denote the point estimates of the certainty equivalent gross return of therelevant parametric strategy. And the rows labeled ”p-value” reports the p-values of an unconditional test on thenull that value weighted benchmark strategy achieves the same expected utility as the relevant parametric ruleunder 10 years of limited data. Each column corresponds, respectively, to the firm characteristic of (a) Size; (b)Book to Market ratio; (c) Momentum; (d) Gross Profit over Total Asset; (e) Asset Growth rate. Panel (1) is onthe case where character is standardized over the whole investable universe while panel (2) is on the case wherestandardization is over the peer firms within the same industry. Data ranges from Jan 1984 to Dec 2013 and riskversion is set at 5.(A) Excluding 20% smallest stock: benchmark CER 1.0115Size Book to market Momentum Gross Profit Asset Growth(a) (b) (c) (d) (e)(1) Standardized CharacterCER 1.0176 1.0146 1.0088 1.0100 0.9983p-val (0.0579) (0.1852) (0.3869) (0.3141) (0.6068)(2) Industry Standardized CharacterCER 1.0081 1.0154 1.0086 1.0084 1.0174p-val (0.0238) (0.1109) (0.3377) (0.5830) (0.4948)(B) Excluding 40% smallest stock: benchmark CER 1.0115Size Book to market Momentum Gross Profit Asset Growth(a) (b) (c) (d) (e)(1) Standardized CharacterCER 1.0098 1.0106 1.0071 1.0102 1.0129p-val (0.3011) (0.3631) (0.3390) (0.1110) (0.2751)(2) Industry Standardized CharacterCER 1.0098 1.0102 1.0050 1.0106 1.0128p-val (0.2595) (0.1190) (0.2431) (0.4857) (0.2976)90insignificant. The only exception is the size character under the 20% exclusioncase. 
When standardized under this new investable universe, this character can be exploited with limited data, although the utility benefit is only weakly significant. When standardized within the same industry, however, the associated strategy significantly under-performs the benchmark. The results for strategies based on multiple characters are similar. Table 2.14 indicates that, when the smallest 20% of stocks are excluded, the two-character policies fail to significantly outperform the benchmark. When the smallest 40% of stocks are excluded, the point estimates of CER for the two-character strategies decrease further. However, the strategy driven by size, gross profitability, and asset growth rate, restricted to large stocks, now significantly outperforms the benchmark when all characters are industry standardized.

In the third robustness test, I explore the performance of the parametric policy when the investor has an information set of a different size. A longer history of data would reduce estimation error and hence improve efficiency. However, in the presence of model instability, a shorter history could capture the dynamics of the return generating process in a more timely way. Table 2.15 and Table 2.16 report the unconditional evaluation results of the parametric policy under 5 years and 15 years of historical data. With the shorter history, the book-to-market driven strategy as well as the two-character strategies cease to outperform the benchmark. Yet the asset growth rate, when standardized within industry, still creates a significant utility benefit. With 15 years of historical data, the strategies based on the industry-standardized book-to-market ratio and on book-to-market plus asset growth regain their performance.
Yet, book-to-market combined with gross profitability does not create a significant utility gain when given the longer information set.

The last test checks the impact of different risk aversion levels on the baseline unconditional evaluation results. Two alternative relative risk aversion levels, γ = 10 and γ = 15, are entertained. The main finding, shown in Table 2.17 and Table 2.18, is that as the risk aversion level increases, the CER estimates decrease. Yet the relative performance against the benchmark and its significance level are qualitatively similar across different γ.

Table 2.14: Alternative Investable Universe: Unconditional Evaluation of Multivariate Characters

Notes: This table reports the unconditional evaluation results of the parametric policy with an alternative investable universe, given multivariate standardized predicting characteristics. Parts (A) and (B) cover, respectively, the cases where the investable universe excludes the smallest 20% or 40% of stocks in terms of market capitalization. The rows labeled "CER" give the point estimates of the certainty equivalent gross return of the relevant parametric strategy. The rows labeled "p-val" report the p-values of an unconditional test of the null that the value-weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the combination of firm characteristics (a) Book to Market + Gross Profit; (b) Book to Market + Asset Growth; (c) Size + Book to Market + Momentum; (d) Size + Gross Profit + Asset Growth; (e) all five factors. Panel (1) covers the case where the character is standardized over the whole investable universe, while panel (2) covers the case where standardization is over the peer firms within the same industry.
The data range from Jan 1984 to Dec 2013 and risk aversion is set at 5.

| Part | Benchmark CER | Panel | Statistic | (a) B/M+GP | (b) B/M+AG | (c) Size+B/M+Mom | (d) Size+GP+AG | (e) 5-factors |
|---|---|---|---|---|---|---|---|---|
| (A) Excluding 20% smallest stocks | 1.0115 | (1) Standardized Character | CER | 1.0125 | 0.9616 | 0.4321 | 0.9887 | 0.4320 |
| | | | p-val | (0.7934) | (0.4073) | (0.3053) | (0.4807) | (0.3050) |
| | | (2) Industry Standardized Character | CER | 1.0114 | 1.0172 | 0.2812 | 1.0063 | 0.4320 |
| | | | p-val | (0.9956) | (0.5425) | (0.3048) | (0.7681) | (0.3049) |
| (B) Excluding 40% smallest stocks | 1.0115 | (1) Standardized Character | CER | 1.0072 | 1.0126 | 0.4318 | 0.9929 | 0.3863 |
| | | | p-val | (0.1080) | (0.7607) | (0.3041) | (0.1349) | (0.1507) |
| | | (2) Industry Standardized Character | CER | 1.0115 | 1.0140 | 0.4320 | 1.0229 | 0.0724 |
| | | | p-val | (0.9988) | (0.5975) | (0.3049) | (0.0082) | (0.3055) |

Table 2.15: Different Size of Information Set: Unconditional Evaluation of Univariate Character

Notes: This table reports the unconditional evaluation results of the parametric policy with an alternative size of information set, given a univariate standardized predicting characteristic. Parts (A) and (B) cover, respectively, the cases where the investor's information set contains 5 years or 15 years of historical data. The rows labeled "CER" give the point estimates of the certainty equivalent gross return of the relevant parametric strategy. The rows labeled "p-val" report the p-values of an unconditional test of the null that the value-weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under the 5 years or 15 years of limited data. Each column corresponds, respectively, to the firm characteristic of (a) Size; (b) Book to Market ratio; (c) Momentum; (d) Gross Profit over Total Assets; (e) Asset Growth rate. Panel (1) covers the case where the character is standardized over the whole investable universe, while panel (2) covers the case where standardization is over the peer firms within the same industry.
The data range from Jan 1984 to Dec 2013 and risk aversion is set at 5.

| Part | Benchmark CER | Panel | Statistic | (a) Size | (b) Book to market | (c) Momentum | (d) Gross Profit | (e) Asset Growth |
|---|---|---|---|---|---|---|---|---|
| (A) 5 years of historical data | 1.0115 | (1) Standardized Character | CER | 1.0092 | 1.0156 | 1.0098 | 1.0108 | 1.0163 |
| | | | p-val | (0.3936) | (0.5752) | (0.6601) | (0.8808) | (0.5772) |
| | | (2) Industry Standardized Character | CER | 1.0096 | 1.0155 | 1.0089 | 1.0028 | 1.0264 |
| | | | p-val | (0.4892) | (0.6849) | (0.5521) | (0.4799) | (0.0040) |
| (B) 15 years of historical data | 1.0115 | (1) Standardized Character | CER | 1.0129 | 1.0204 | 1.0072 | 1.0129 | 1.0217 |
| | | | p-val | (0.4045) | (0.1306) | (0.5528) | (0.5786) | (0.1548) |
| | | (2) Industry Standardized Character | CER | 1.0133 | 1.0236 | 1.0055 | 1.0089 | 1.0282 |
| | | | p-val | (0.3183) | (0.0637) | (0.4995) | (0.8018) | (0.0054) |

Table 2.16: Different Size of Information Set: Unconditional Evaluation of Multivariate Characters

Notes: This table reports the unconditional evaluation results of the parametric policy with an alternative size of information set, given multivariate standardized predicting characteristics. Parts (A) and (B) cover, respectively, the cases where the investor's information set contains 5 years or 15 years of historical data. The rows labeled "CER" give the point estimates of the certainty equivalent gross return of the relevant parametric strategy. The rows labeled "p-val" report the p-values of an unconditional test of the null that the value-weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under the 5 years or 15 years of limited data. Each column corresponds, respectively, to the combination of firm characteristics (a) Book to Market + Gross Profit; (b) Book to Market + Asset Growth; (c) Size + Book to Market + Momentum; (d) Size + Gross Profit + Asset Growth; (e) all five factors.
Panel (1) covers the case where the character is standardized over the whole investable universe, while panel (2) covers the case where standardization is over the peer firms within the same industry. The data range from Jan 1984 to Dec 2013 and risk aversion is set at 5.

| Part | Benchmark CER | Panel | Statistic | (a) B/M+GP | (b) B/M+AG | (c) Size+B/M+Mom | (d) Size+GP+AG | (e) 5-factors |
|---|---|---|---|---|---|---|---|---|
| (A) 5 years of historical data | 1.0115 | (1) Standardized Character | CER | 1.0121 | 1.0034 | 0.4291 | 0.9340 | 0.2593 |
| | | | p-val | (0.9602) | (0.5214) | (0.2909) | (0.3100) | (0.2360) |
| | | (2) Industry Standardized Character | CER | 1.0110 | 1.0152 | 0.4254 | 0.8748 | 0.4028 |
| | | | p-val | (0.9752) | (0.7365) | (0.2736) | (0.1810) | (0.1765) |
| (B) 15 years of historical data | 1.0115 | (1) Standardized Character | CER | 1.0215 | 1.0183 | 0.4322 | 0.9417 | 0.4323 |
| | | | p-val | (0.2917) | (0.4373) | (0.3059) | (0.3593) | (0.3063) |
| | | (2) Industry Standardized Character | CER | 1.0274 | 1.0268 | 0.4322 | 0.9671 | 0.4323 |
| | | | p-val | (0.1187) | (0.0113) | (0.3060) | (0.4919) | (0.3063) |

Table 2.17: Different Risk Aversion: Unconditional Evaluation of Univariate Character

Notes: This table reports, for investors with risk aversion levels 10 and 15, the unconditional evaluation results of the parametric policy with alternative univariate standardized predicting characteristics. The rows labeled "CER" give the point estimates of the certainty equivalent gross return of the relevant parametric strategy. The rows labeled "p-val" report the p-values of an unconditional test of the null that the value-weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the firm characteristic of (a) Size; (b) Book to Market ratio; (c) Momentum; (d) Gross Profit over Total Assets; (e) Asset Growth rate. Panel (1) covers the case where the character is standardized over the whole investable universe, while panel (2) covers the case where standardization is over the peer firms within the same industry.
The data range from Jan 1984 to Dec 2013 and risk aversion is set at 10 for Part (A) and at 15 for Part (B).

| Part | Benchmark CER | Panel | Statistic | (a) Size | (b) Book to market | (c) Momentum | (d) Gross Profit | (e) Asset Growth |
|---|---|---|---|---|---|---|---|---|
| (A) Risk Aversion = 10 | 1.0054 | (1) Standardized Character | CER | 1.0055 | 1.0154 | 1.0055 | 1.0055 | 1.0081 |
| | | | p-val | (0.9103) | (0.0099) | (0.9254) | (0.9016) | (0.7044) |
| | | (2) Industry Standardized Character | CER | 1.0068 | 1.0169 | 1.0058 | 1.0073 | 1.0163 |
| | | | p-val | (0.2524) | (0.0119) | (0.7201) | (0.7087) | (0.0051) |
| (B) Risk Aversion = 15 | 0.9973 | (1) Standardized Character | CER | 0.9969 | 1.0086 | 0.9973 | 0.9964 | 1.0016 |
| | | | p-val | (0.6557) | (0.0292) | (0.9639) | (0.4708) | (0.4955) |
| | | (2) Industry Standardized Character | CER | 0.9990 | 1.0092 | 0.9979 | 0.9986 | 1.0084 |
| | | | p-val | (0.1271) | (0.0447) | (0.6053) | (0.8075) | (0.0294) |

Table 2.18: Different Risk Aversion: Unconditional Evaluation of Multivariate Characters

Notes: This table reports, for investors with risk aversion levels 10 and 15, the unconditional evaluation results of the parametric policy with alternative multivariate standardized predicting characteristics. The rows labeled "CER" give the point estimates of the certainty equivalent gross return of the relevant parametric strategy. The rows labeled "p-val" report the p-values of an unconditional test of the null that the value-weighted benchmark strategy achieves the same expected utility as the relevant parametric rule under 10 years of limited data. Each column corresponds, respectively, to the combination of firm characteristics (a) Book to Market + Gross Profit; (b) Book to Market + Asset Growth; (c) Size + Book to Market + Momentum; (d) Size + Gross Profit + Asset Growth; (e) all five factors. Panel (1) covers the case where the character is standardized over the whole investable universe, while panel (2) covers the case where standardization is over the peer firms within the same industry.
The data range from Jan 1984 to Dec 2013 and risk aversion is set at 10 for Part (A) and at 15 for Part (B).

| Part | Benchmark CER | Panel | Statistic | (a) B/M+GP | (b) B/M+AG | (c) Size+B/M+Mom | (d) Size+GP+AG | (e) 5-factors |
|---|---|---|---|---|---|---|---|---|
| (A) Risk Aversion = 10 | 1.0054 | (1) Standardized Character | CER | 1.0174 | 1.0102 | 0.5466 | 0.9945 | 0.1923 |
| | | | p-val | (0.0067) | (0.5451) | (0.3049) | (0.5901) | (0.3047) |
| | | (2) Industry Standardized Character | CER | 1.0224 | 1.0182 | 0.8483 | 1.0093 | 0.1923 |
| | | | p-val | (0.0010) | (0.0057) | (0.3224) | (0.8166) | (0.3047) |
| (B) Risk Aversion = 15 | 0.9973 | (1) Standardized Character | CER | 1.0093 | 1.0041 | 0.7525 | 0.9942 | 0.3410 |
| | | | p-val | (0.0243) | (0.3820) | (0.3064) | (0.8021) | (0.3047) |
| | | (2) Industry Standardized Character | CER | 1.0146 | 1.0103 | 0.9267 | 1.0108 | 0.2472 |
| | | | p-val | (0.0049) | (0.0306) | (0.3548) | (0.1935) | (0.3047) |

2.4 Concluding Remarks

In this chapter, I adopt a hypothesis testing approach to assess the portfolio value of firm characters that predict the cross-section of stock returns. I emphasize the practical usefulness of these characters under limited historical data. I fix a parametric portfolio policy that exploits cross-sectional predictability, and I consider a variety of predicting characters: size, book-to-market, momentum, gross profitability, and asset growth rate. I evaluate the performance of these character-based strategies and contrast it with that of the market capitalization weighted benchmark. I conduct both unconditional and conditional analyses. The estimation of investor welfare relies on an out-of-sample portfolio construction exercise, and the inference procedure builds upon the forecast evaluation literature in a structural way.

Empirically, with monthly US stock return data, I find that the book-to-market ratio and the asset growth rate can each create a significant utility benefit relative to the benchmark. The good performance of these strategies is concentrated in economic bust, high aggregate dividend-to-price ratio (low valuation), and low volatility regimes.
For other single character-based strategies, I document infrequent but strong negative portfolio returns and no significant utility improvement. When combining different characters to form a portfolio, I find that multiple characters tend to carry more instability loss. Yet, conditional on economic bust, low market valuation, and low volatility regimes, they still outperform the cap-weighted benchmark. The baseline results are robust to varying risk aversion levels. Yet, when investors face short sale constraints, or have to trade large stocks only, the strategy performance drops below that of the benchmark.

Chapter 3

Bayesian Estimation of Cox-Ingersoll-Ross Interest Rate Model with Particle-Filter based Simulated-Likelihood

3.1 Introduction

This paper proposes a new methodology for the likelihood-based estimation of the Cox-Ingersoll-Ross interest rate model. Cox-Ingersoll-Ross (hereafter CIR) is a Markovian term-structure model of the dynamics of the yield curve. It captures the main empirical features of the short rate, such as mean reversion, conditional heteroscedasticity and non-negativity, and it offers closed form solutions for the bond prices of other maturities that exclude arbitrage opportunities. With these merits, the CIR model has served as a building block in the development of various subsequent continuous time affine interest rate models.[1]

[1] The original CIR has been extended or modified to include a joint system of yield curve and macroeconomic factors as state variables, along with a more involved statistical model, such as jump diffusion, for the variables' joint dynamics. See Duffie and Kan (1996) and the survey article of Piazzesi (2010).

Parameter estimation of the CIR model could utilize either the cross-sectional property of the data, the time series property, or both.
This paper takes into consideration the entire panel of data in order to exploit information from whole yield curves.[2] As the CIR model is non-linear, maximum likelihood seems to be an ideal approach for parameter estimation. However, additional obstacles exist. In particular, the associated likelihood function for the CIR model is hard to compute. The main challenge is that, to get the likelihood of the observed yields, one needs to integrate out the unobserved state variable. With a long history of data, this typically turns out to be a very high dimensional integral. For a linear and Gaussian model, the standard Kalman filter provides a way to recursively compute the exact likelihood, while for a non-linear and non-Gaussian model such as the CIR, there is no closed form expression for the likelihood function.

To overcome this econometric challenge, I carry out the likelihood-based estimation in a Bayesian setting. More specifically, I adopt a marginal Metropolis-Hastings algorithm with a particle-filter based simulated likelihood placed in each of the iterations. The benefit of this Bayesian approach is that it bypasses the need to compute the exact likelihood function, and its validity rests upon a recent development in Bayesian statistical theory (i.e., Andrieu et al. (2010), hereafter ADH). ADH prove that if one puts an unbiased simulation-based likelihood estimate inside a Metropolis-Hastings step, the likelihood estimation error does not accumulate and the equilibrium posterior distribution of the parameters remains unaffected. In other words, exact parameter estimation is still possible although the likelihood is only estimated.

For the CIR model, I generate a simulated likelihood by particle filter, a technique to be reviewed briefly in Section 4. The choice of a particle filter to generate a likelihood estimate for a non-linear and non-Gaussian model was initially proposed in Gordon et al. (1993). Andrieu et al.
(2010) later suggested combining it with standard Bayesian estimation and named their method Particle-MCMC. The contribution of this paper, besides illustrating how to make likelihood estimation feasible for the CIR model, is to further push the efficiency of both the particle filter and the marginal Metropolis-Hastings step. In particular, I document that with a small measurement error on bond yields,[3] the CIR model shows a peaky likelihood and the standard particle filter does not perform well. I therefore use an approximated conditional optimal importance distribution, rather than the transition density as in a standard particle filter, to reduce the variance of the importance weights on different particles. The particle-filter based simulated likelihood is then embedded in the marginal Metropolis-Hastings algorithm, which provides Markov chain Monte Carlo draws from the posterior distribution of parameters and hidden states given the yields data. In sampling the parameters, I improve the mixing property of the Markov chains by using an adaptive MCMC method that allows the proposal distribution to learn the covariance structure of the true posterior, or target, distribution adaptively.

[2] The cross-sectional approach is usually performed on no-arbitrage models for the purpose of pricing derivative assets. However, the risk premium parameter is subsumed in the drift term and typically unidentified. The univariate time series approach often fits the short-term rate; see Gourieroux and Monfort (2007) for a nice review of existing estimation methods.

The Bayesian estimation with the simulated likelihood approach considered here complements existing solutions to term structure model estimation. Among them, a common approach is to use Quasi-Maximum Likelihood Estimation (hereafter QMLE; e.g., De Jong and Santa-Clara (1999)), which approximates the likelihood function through local linearization.
The resulting Quasi-Maximum Likelihood Estimator is, however, inconsistent (unless the sampling interval goes to zero) and inefficient (since the wrong density is used), and its asymptotic distribution is unknown.[4]

A second popular approach is to pick several moments to match and to adopt a Method of Moments estimator. The relevant moment conditions can be derived, for example, from the infinitesimal generator of the diffusion (Hansen and Scheinkman (1995)). Unlike QMLE, the Method of Moments estimator is consistent, but it is usually sensitive to the chosen moments and therefore less efficient than Maximum Likelihood Estimation. A more compelling solution is to follow Gallant and Tauchen (1996) with the Efficient Method of Moments (hereafter EMM). EMM has the same asymptotic efficiency as Maximum Likelihood Estimation if the score generator (and therefore the moments to match) is chosen optimally. However, Duffee and Stanton (2008) report Monte Carlo evidence that EMM may have poor finite sample properties with persistent data. Besides, EMM does not estimate the latent factor, which impedes the estimation of the time varying risk premium and the prediction of future spot rates or derivative prices.

[3] With panel data, as we generally have more instruments (bonds/yields) than state variables, a perfect fit is not achievable. Therefore, it is typically assumed that bond prices or yields are observed with some measurement error, caused by non-synchronous trading, etc.

[4] See Lund (1995) for a detailed discussion of Quasi-Maximum Likelihood Estimation of interest rate term structure models.

A third existing method is to simply maximize the simulated likelihood (e.g., De Rossi (2010)). However, Maximum Simulated Likelihood (hereafter MSL) is in general quite fragile. Denoting by M the number of i.i.d. draws/simulations used in computing the estimated likelihood, Flury and Shephard (2011) show that "based on a sample of size T, (even) for i.i.d.
data we need that T/M → 0 for this MSL estimator to be consistent and T/√M → 0 to have the same distribution as the maximum (true) likelihood estimator." In a data rich environment, it is hard to tell whether M is already large enough.

The rest of the chapter is structured as follows. Section 2 presents the CIR model in state space form. Section 3 sets up the framework for the Bayesian estimation of CIR using simulated likelihood. An adaptive marginal Metropolis-Hastings algorithm is adopted there. Section 4 briefly reviews the particle filter as a tool to obtain an unbiased simulation-based likelihood estimator. Then, an approximated conditional optimal importance density in the context of CIR is derived to improve efficiency. Section 5 tests the estimation strategy on a simulated data set. Performance is shown to be satisfactory. Section 6 concludes and points out several potential extensions.

3.2 The CIR Model in a State Space Form

3.2.1 The CIR model

Cox et al. (1985) provide a production based equilibrium asset pricing model for the term structure of interest rates. The model links bond yields to one or more latent factors. In the single factor case, the latent variable has the interpretation of the instantaneous spot rate. The time series evolution of this factor is derived as a square root process:[5]

$$dr_t = k(m - r_t)\,dt + \sigma\sqrt{r_t}\,dB_t, \tag{3.1}$$

where $B_t$ is a Brownian motion, $m$ is the long run mean of the short rate process, $k$ is the speed of reversion to the long run mean and $\sigma$ is the volatility. When the Feller condition, i.e., $2km > \sigma^2$, holds, the square root process is stationary and guaranteed to be strictly positive.[6]

[5] Most of the notation in this section is taken directly from CIR's original article.

Bond prices depend on the current value of the latent state as well as the risk premium parameter $\lambda$.[7]
In particular, at time $t$, a bond with time to maturity $\tau$ has a yield of:

$$ y_t(\tau) = -A(\tau) + B(\tau)\,r_t, \tag{3.2} $$

where

$$ A(\tau) = \frac{2km}{\tau\sigma^2}\log\left\{\frac{2\gamma\exp(\tau(k+\lambda+\gamma)/2)}{(k+\lambda+\gamma)(e^{\gamma\tau}-1)+2\gamma}\right\}, \tag{3.3} $$

$$ B(\tau) = \frac{1}{\tau}\,\frac{2(e^{\gamma\tau}-1)}{(k+\lambda+\gamma)(e^{\gamma\tau}-1)+2\gamma}, \tag{3.4} $$

and

$$ \gamma = \sqrt{(k+\lambda)^2 + 2\sigma^2}. \tag{3.5} $$

[6] Actually, even if that condition is not satisfied, $r = 0$ is still not an absorbing state.
[7] $\lambda$ measures the market price of the Brownian risk. It depends on both consumers' aggregate risk aversion and the state of the production opportunity set.

3.2.2 State space representation and identification

Since yield data are only seen at equally spaced discrete times, it is more convenient to express CIR as a state space model. A transition equation describes the probability density of the future value of the latent state given its current value. A measurement equation connects observed yields with the current-period value of the state variable.

To be concrete, denote by $d$ the time interval between two observations of yields (i.e., for weekly data, $d = 1/52$). $\{r_t\}_{t=1}^T$ would then correspond to state variables at the sequence of times $t_0, t_0 + d, t_0 + 2d, \ldots, t_0 + Td$. Similarly, $\{y_t(\tau)\}_{t=1}^T$ would be the yields of maturity $\tau$ observed at the same equally spaced discrete times.

Given that $r_t$ in continuous time follows a square root process, it can be shown (e.g., by Fourier transform analysis as in Gourieroux and Monfort (2007)) that the state variable $\{r_t\}_{t=1}^T$ has an initial distribution

$$ \mu(r_1) = \mathrm{Gamma}\left(r_1;\ \frac{2km}{\sigma^2},\ \frac{\sigma^2}{2k}\right), \tag{3.6} $$

and a transition dynamic

$$ f(r_t|r_{t-1}) = 2c \cdot \text{noncentral}\,\chi^2\left(2c\,r_t;\ \frac{4km}{\sigma^2},\ 2c\,e^{-kd}r_{t-1}\right), \tag{3.7} $$

where

$$ c = \frac{2k}{(1-e^{-kd})\,\sigma^2}. \tag{3.8} $$

Notice that the transition density here is itself a function of the previous-step state variable.

Yields in the CIR model are linear functions of the state variable, as described in equation (3.2). However, as mentioned in the introduction, there is an issue of stochastic singularity. When the number of bonds with different maturities exceeds the number of state variables, the CIR model is over-identified.
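As a minimal sketch (not the code used in this chapter), the loadings in (3.3) to (3.5) and one exact draw from the state dynamics (3.6) to (3.8) could be written as below. The function names are illustrative assumptions, and the transition sampler assumes more than one degree of freedom so that the noncentral $\chi^2$ variate can be built from one shifted normal plus a central $\chi^2$.

```python
import math
import random


def cir_loadings(tau, k, m, sigma, lam):
    """Return (A, B) of eqs. (3.3)-(3.5), so that the yield is -A + B * r."""
    gamma = math.sqrt((k + lam) ** 2 + 2.0 * sigma ** 2)
    growth = math.expm1(gamma * tau)                      # e^{gamma*tau} - 1
    denom = (k + lam + gamma) * growth + 2.0 * gamma
    A = (2.0 * k * m / (tau * sigma ** 2)) * math.log(
        2.0 * gamma * math.exp(tau * (k + lam + gamma) / 2.0) / denom
    )
    B = 2.0 * growth / (tau * denom)
    return A, B


def sample_initial(k, m, sigma, rng):
    """Draw r_1 from the Gamma law (3.6): shape 2km/sigma^2, scale sigma^2/(2k)."""
    return rng.gammavariate(2.0 * k * m / sigma ** 2, sigma ** 2 / (2.0 * k))


def sample_transition(r_prev, k, m, sigma, d, rng):
    """Draw r_t given r_{t-1} from (3.7): 2c * r_t is noncentral chi-square."""
    c = 2.0 * k / ((1.0 - math.exp(-k * d)) * sigma ** 2)   # eq. (3.8)
    df = 4.0 * k * m / sigma ** 2
    nc = 2.0 * c * math.exp(-k * d) * r_prev
    if df <= 1.0:
        raise ValueError("this sketch assumes more than one degree of freedom")
    # For df > 1: chi'^2(df, nc) = (Z + sqrt(nc))^2 + chi^2(df - 1).
    z = rng.gauss(0.0, 1.0)
    x = (z + math.sqrt(nc)) ** 2 + rng.gammavariate((df - 1.0) / 2.0, 2.0)
    return x / (2.0 * c)
```

A yield for maturity `tau` is then `-A + B * r` plus a measurement error. As `tau` shrinks, `B` approaches one and `A` approaches zero, so the yield collapses to the short rate itself.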
For any parameter values, an over-identified model is inevitably rejected by the observed data. The common solution is to introduce measurement error between the observed yields and the theoretical ones derived from the model. Written in vector form, and assuming that we observe $L$ bonds, the measurement equation is:

$$ \begin{pmatrix} y_t(\tau_1) \\ \vdots \\ y_t(\tau_L) \end{pmatrix} = \begin{pmatrix} B(\tau_1) \\ \vdots \\ B(\tau_L) \end{pmatrix} r_t - \begin{pmatrix} A(\tau_1) \\ \vdots \\ A(\tau_L) \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \vdots \\ \varepsilon_{L,t} \end{pmatrix}, \tag{3.9} $$

where $\varepsilon_t$ is the error term on the observed rates.

The idea of attributing the unexplained part of the yields to measurement error is consistent with the fact that, due to the on-the-run/off-the-run spread or non-synchronous trading, bond yields are rarely measured precisely. Gürkaynak et al. (2007), among others, provide some empirical evidence from US data that supports this argument. In this paper, more restrictions are imposed on the structure of the measurement error vector. I adopt a typical assumption in the literature, which states that the measurement errors are mutually independent and i.i.d. Gaussian. However, the method to be described in later sections is valid under more general assumptions such as correlated error terms.

Let $y_t = \{y_t(\tau_1), \cdots, y_t(\tau_L)\}$ and assume the covariance matrix of $\varepsilon_{1,t}, \ldots, \varepsilon_{L,t}$ to be $h\,I_L$, where $I_L$ is an $L$-dimensional identity matrix. The measurement density can therefore be derived as:

$$ g(y_t|r_t) = \prod_{j=1}^{L} N\big(y_t(\tau_j);\ -A(\tau_j)+B(\tau_j)r_t,\ h\big) = \prod_{j=1}^{L} \frac{1}{\sqrt{2\pi h}}\exp\left[-\frac{[y_t(\tau_j)+A(\tau_j)-B(\tau_j)r_t]^2}{2h}\right]. \tag{3.10} $$

Equations (3.6), (3.7) and (3.10) constitute our state space model.

3.2.3 The econometric challenge

It is now easier to see why maximum likelihood estimation is econometrically challenging.
For fixed parameter values, let $p(y_{1:T})$ be the likelihood of the observed yields and $p(y_{1:T}, r_{1:T})$ be the joint density of state variables and yields. Then

$$ p(y_{1:T}) = \int p(y_{1:T}, r_{1:T})\,dr_{1:T} \tag{3.11} $$

with the following high-dimensional joint density:

$$ p(y_{1:T}, r_{1:T}) = p(r_{1:T})\,p(y_{1:T}|r_{1:T}) = \mu(r_1)\prod_{t=2}^{T} f(r_t|r_{t-1})\prod_{t=1}^{T} g(y_t|r_t), \tag{3.12} $$

where $\mu(r_1)$ is Gamma distributed, $f(r_t|r_{t-1})$ follows a noncentral chi-squared distribution, and $g(y_t|r_t)$ is itself a product of normal densities. For this system, the $T$-dimensional integral is very hard to compute.

3.3 Bayesian Estimation with Simulated Likelihood

To overcome the difficulty in computing the exact likelihood, I perform Bayesian estimation on the CIR model. Yield data are treated as fixed once observed, and parameters are viewed as random quantities about which beliefs can be formed. With the above parametric state space model and a prior on parameters, the Bayesian approach then uses a generic algorithm to produce Markov chain Monte Carlo (MCMC) samples from the posterior distribution of the parameters given the yield data. Moments of the samples are then used to approximate moments of the posterior distribution. Random draws automatically concentrate on the area of high probability mass, while the burden of multi-dimensional optimization in the parameter space is avoided.

The method is conceptually simple and highly adaptable to a wide range of complicated systems. Besides, as we will see, the fact that we can only come up with an estimator of the likelihood rather than its true value does not undermine the power of the Bayesian method. The Bayesian approach still efficiently combines the information contained in the estimated likelihood with the Monte Carlo draws.

To get immediately into the heart of the estimation procedure, this section focuses on the Bayesian estimation part.
The discussion on how to generate the estimated likelihood with a particle filter is postponed to the next section.

3.3.1 The marginal Metropolis-Hastings algorithm

Bayesian estimation using MCMC algorithms is now well established in economics. I refer to Chib (2001), a Handbook of Econometrics chapter, for a detailed treatment of MCMC methods and references therein. This paper adopts the Metropolis-Hastings algorithm, a particular class of MCMC, to generate random samples from the target distribution.

Denote by $\theta = \{k, m, \sigma, \lambda, h\}$ the collection of all the parameters. For the CIR model, the target we are interested in is the posterior of both parameters and latent state variables given the yield data,[8]

$$ p(\theta, r_{1:T}|y_{1:T}) = p(\theta|y_{1:T})\,p(r_{1:T}|y_{1:T},\theta). \tag{3.13} $$

[8] The posterior on parameters alone given yields is simply the marginal distribution of this target, and marginalization with Monte Carlo samples is straightforward: one just keeps the relevant elements. This setup is more general, since it allows estimation of the latent state variables as well.

To simplify notation, let $z = \{\theta, r_{1:T}\}$, so that the target can be expressed as $p(z|y_{1:T})$. The Metropolis-Hastings sampler generates a Markov chain $\{z^{(i)}\}_{i=1}^{N}$ according to the following mechanism:

Given $z^{(i-1)}$, propose a candidate $z^* \sim q(z^*|z^{(i-1)})$, where $q$ is some arbitrary proposal distribution. With probability

$$ \alpha(z^{(i-1)}, z^*) = \min\left(1,\ \frac{p(z^*|y_{1:T})\,q(z^{(i-1)}|z^*)}{p(z^{(i-1)}|y_{1:T})\,q(z^*|z^{(i-1)})}\right), \tag{3.14} $$

set $z^{(i)} = z^*$. Otherwise, set $z^{(i)} = z^{(i-1)}$.

It can be easily shown that

$$ p(z'|y_{1:T}) = \int p(z|y_{1:T})\,K(z'|z)\,dz, $$

for any $z'$, where $K(z'|z)$ is the transition kernel of this particular Metropolis-Hastings algorithm. In other words, the target density coincides with the invariant/equilibrium distribution with respect to the constructed transition kernel.

Under the assumptions of irreducibility and aperiodicity, the generated Markov chain asymptotically approximates the equilibrium distribution, and therefore the true target, i.e.
$\{z^{(i)}\}_{i=1}^{N} \sim p(z|y_{1:T})$, as $N \to \infty$.

In choosing the proposal distribution that generates new draws, it is practically recommended to pick a $q(z^*|z)$ that is easy to sample from while still reasonably close to the target. Given equation (3.13), I use

$$ q(z^*|z) = q\big((\theta^*, r^*_{1:T})\,|\,(\theta, r_{1:T})\big) = q(\theta^*|\theta)\,p(r^*_{1:T}|y_{1:T},\theta^*). \tag{3.15} $$

In words, I first sample the new parameters $\theta^*$ given the previous draw $\theta$, and then, conditional on the new draw $\theta^*$, I sample $r^*_{1:T}$ from the posterior of the latent variables. Therefore, for fixed $\theta$, the second term in the proposal distribution (3.15) (of dimension $T$) mimics the target up to a constant. In Bayesian language, this is an efficient sampling procedure with high-dimensional block updates, which keeps the dependence structure among the latent states. The first term $q(\theta^*|\theta)$ (of 5 dimensions) is taken to be a mixture of Gaussians. More details on the choice of the variance-covariance matrix are given in the subsection on adaptive MCMC.

Now, the acceptance probability can be simplified to

$$ \begin{aligned} \alpha(z, z^*) &= \min\left(1,\ \frac{p(z^*|y_{1:T})\,q(z|z^*)}{p(z|y_{1:T})\,q(z^*|z)}\right) \\ &= \min\left(1,\ \frac{p(\theta^*, r^*_{1:T}|y_{1:T})\,q\big((r_{1:T},\theta)\,|\,(r^*_{1:T},\theta^*)\big)}{p(\theta, r_{1:T}|y_{1:T})\,q\big((r^*_{1:T},\theta^*)\,|\,(r_{1:T},\theta)\big)}\right) \\ &= \min\left(1,\ \frac{p(y_{1:T}|\theta^*)\,p(\theta^*)\,q(\theta|\theta^*)}{p(y_{1:T}|\theta)\,p(\theta)\,q(\theta^*|\theta)}\right), \end{aligned} \tag{3.16} $$

where $p(\theta)$ is the prior on parameters, and $p(y_{1:T}|\theta)$ is the likelihood of the observed yields for given parameters. The first two equalities follow by definition, and the third one comes from Bayes' rule after plugging in equation (3.15).

To summarize, the marginal Metropolis-Hastings algorithm runs as follows:

1. At iteration $i$, given $\{\theta^{(i-1)}, r^{(i-1)}_{1:T}\}$, propose new parameter values $\theta^*$ from $q(\theta|\theta^{(i-1)})$; then, conditional on $\theta^*$, draw $r^*_{1:T}$ according to $p(r_{1:T}|y_{1:T},\theta^*)$.
2. Compute the acceptance probability
$$ \alpha = \min\left(1,\ \frac{p(y_{1:T}|\theta^*)\,p(\theta^*)\,q(\theta^{(i-1)}|\theta^*)}{p(y_{1:T}|\theta^{(i-1)})\,p(\theta^{(i-1)})\,q(\theta^*|\theta^{(i-1)})}\right). $$
3. Draw a uniformly distributed sample $u \sim U[0,1]$. If $u < \alpha$, set $\theta^{(i)} = \theta^*$ and $r^{(i)}_{1:T} = r^*_{1:T}$; otherwise set $\theta^{(i)} = \theta^{(i-1)}$ and $r^{(i)}_{1:T} = r^{(i-1)}_{1:T}$.
4.
Go to iteration $i+1$.

The term "marginal" is used since the latent variables $r_{1:T}$ no longer appear in the acceptance probability. There are several implications. First, it saves a lot of computation when evaluating the acceptance probability in each MCMC iteration. Second, if we are only interested in estimating the parameters, the above algorithm can be further simplified. We do not need to sample from the posterior of the latent variables, given yield data and parameter values, if we are not concerned with filtering the latent state.

In that case, the marginal Metropolis-Hastings algorithm reduces to:

1. At iteration $i$, given $\theta^{(i-1)}$, propose new parameter values $\theta^*$ from $q(\theta|\theta^{(i-1)})$.
2. Compute the acceptance probability
$$ \alpha = \min\left(1,\ \frac{p(y_{1:T}|\theta^*)\,p(\theta^*)\,q(\theta^{(i-1)}|\theta^*)}{p(y_{1:T}|\theta^{(i-1)})\,p(\theta^{(i-1)})\,q(\theta^*|\theta^{(i-1)})}\right). $$
3. Draw a uniformly distributed sample $u \sim U[0,1]$. If $u < \alpha$, set $\theta^{(i)} = \theta^*$; otherwise set $\theta^{(i)} = \theta^{(i-1)}$.
4. Go to iteration $i+1$.

In many cases, however, having an estimator of the latent variables is still desirable, especially for the purpose of prediction.

3.3.2 Metropolis-Hastings with simulated likelihood

Notice that the marginal Metropolis-Hastings algorithm in the above subsection is not directly implementable. In evaluating the acceptance probability, the exact likelihood of the yields $p(y_{1:T}|\theta)$ cannot be computed for CIR. In addition, there is no way to generate samples from $p(r_{1:T}|y_{1:T},\theta)$, the posterior distribution of latent states given yields and parameters.

The best we can do is to have a simulation-based likelihood estimate and an approximation of the posterior distribution of the latent states from which we can sample. A natural idea is to just replace $p(y_{1:T}|\theta)$ and $p(r_{1:T}|y_{1:T},\theta)$ with their estimators $\hat p(y_{1:T}|\theta)$ and $\hat p(r_{1:T}|y_{1:T},\theta)$. A recent paper in statistical theory by Andrieu et al.
(2010) has shown a very desirable result that validates this idea. They proved that as long as the estimators of the likelihood and the latent states are unbiased and simulated in each iteration of the MCMC, the estimation error does not accumulate. The equilibrium distribution of the MCMC procedure remains unaffected.

This method (MCMC with simulated likelihood) is powerful since it implies that we can perform likelihood estimation as long as we can simulate from the model. It contrasts with the Maximum Simulated Likelihood estimator since the precision of the likelihood estimate is now sidestepped.[9] It was introduced to economics by Flury and Shephard (2011), who illustrated this method with several examples in both micro- and macroeconomics. Flury and Shephard (2011) used a simpler framework where no latent states are estimated, and they provided a very intuitive proof of validity by imagining the simulation itself to be an auxiliary variable and looking at the MCMC procedure in an enlarged space. Other applications in economics are rare so far, with the exception of Andreasen and Meldrum (2011), who used this method to estimate a discrete-time quadratic term-structure model.

To generate simulation-based likelihood estimates and samples of the latent variables, this paper uses a particle filter, which is discussed in detail in section four. This choice is also suggested by Andrieu et al. (2010), who named their method Particle MCMC. Given parameter values, the particle filter provides a simulation-based estimator of the likelihood of the yields, $\hat p(y_{1:T}|\theta)$, and an approximation of the posterior distribution of the state variables, $\hat p(r_{1:T}|y_{1:T},\theta)$, from which samples of the unobserved states $r_{1:T}$ can then be drawn.

Relying on the above argument, the Metropolis-Hastings algorithm with simulation-based likelihood estimation is detailed as:

1. At iteration $i$, given $\{\theta^{(i-1)}, r^{(i-1)}_{1:T}\}$, propose new parameter values $\theta^*$ from $q(\theta|\theta^{(i-1)})$.
2.
Conditional on $\theta^*$, run a particle filter to obtain $\hat p(r_{1:T}|y_{1:T},\theta^*)$ and $\hat p(y_{1:T}|\theta^*)$.
3. Sample $r^*_{1:T} \sim \hat p(r_{1:T}|y_{1:T},\theta^*)$.
4. Compute the acceptance probability
$$ \alpha = \min\left(1,\ \frac{\hat p(y_{1:T}|\theta^*)\,p(\theta^*)\,q(\theta^{(i-1)}|\theta^*)}{\hat p(y_{1:T}|\theta^{(i-1)})\,p(\theta^{(i-1)})\,q(\theta^*|\theta^{(i-1)})}\right). $$
5. Draw a uniformly distributed sample $u \sim U[0,1]$. If $u < \alpha$, set $\theta^{(i)} = \theta^*$ and $r^{(i)}_{1:T} = r^*_{1:T}$; otherwise set $\theta^{(i)} = \theta^{(i-1)}$ and $r^{(i)}_{1:T} = r^{(i-1)}_{1:T}$.
6. Go to iteration $i+1$.

This algorithm admits exactly $p(\theta, r_{1:T}|y_{1:T})$ as its invariant distribution.

[9] In theory, unbiasedness is enough; however, the precision of the estimate will affect the acceptance rate of the Metropolis-Hastings algorithm. Still, this is much less demanding than MSL, which requires the accuracy of the estimated likelihood to increase at the order of the sample size squared. As shown in Flury and Shephard (2011), for many economic models, we do not need an extremely accurate likelihood estimate to do inference on parameters.

3.3.3 Adaptive MCMC

In proposing new values for the parameters (step 1 of the algorithm), it turns out that choosing a better $q(\theta|\theta^{(i-1)})$ will improve the acceptance rate of the proposed candidates and therefore the mixing property of the generated Markov chain. Here I use an adaptive MCMC method, which allows the proposal distribution to adaptively learn the covariance structure of the target (the posterior distribution of the parameters).

The adaptive algorithm uses a time-inhomogeneous proposal $q_i(\theta|\theta^{(i-1)})$, which depends on the empirical variance-covariance matrix of the Markov chain at each iteration. However, given a time-varying transition kernel, the Markov chain is not necessarily ergodic and the invariant distribution might not even exist.
To address this concern, this paper uses a version of adaptive MCMC proposed by Rosenthal (2011), which has been proved to satisfy all the theoretical conditions that ensure the validity of the adaptive algorithm.

The proposal to sample the parameters, $q_i(\theta|\theta^{(i-1)})$, is constructed as follows. During a burn-in period, $1 \le i \le K$, $q_i(\theta|\theta^{(i-1)})$ is set to $N(\theta;\theta^{(i-1)},\Sigma_0)$, where $\Sigma_0$ is some initial variance-covariance matrix for the Gaussian proposal density. Elements of $\Sigma_0$ typically have very small values. For the period $i > K$, the proposal is a mixture of Gaussians,

$$ q_i(\theta|\theta^{(i-1)}) = w\,N(\theta;\theta^{(i-1)},\Sigma_i) + (1-w)\,N(\theta;\theta^{(i-1)},\Sigma_0), \tag{3.17} $$

where $\Sigma_i$ is the empirical variance-covariance matrix of the Markov chain up to iteration $i$, and $w$, set at 0.95 here, is the probability of drawing from the first Gaussian density, whose covariance is updated at each iteration. The weight $w$ is less than one because in the beginning, the empirical covariance is not necessarily a good estimate of the true covariance structure yet. The acceptance rate would therefore be low during that period and we might still want to draw from the second Gaussian, using the initial covariance matrix $\Sigma_0$. Besides, as stated in Rosenthal (2011), if the dimension of the parameters (or target distribution), denoted by $d$, is relatively large, it is optimal to use $(2.38)^2/d$ as a multiplier before the empirical covariance matrix.
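The mixture proposal (3.17), combined with a Metropolis-Hastings step that uses an estimated likelihood, can be sketched as follows. This is only an illustration under stated assumptions: `log_post_hat` is a hypothetical user-supplied function returning the log prior plus the log of a (possibly noisy) likelihood estimate, the $(2.38)^2/d$ multiplier is omitted, and the chain is assumed to have moved during burn-in so that the empirical covariance is positive definite.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_step(theta, cov, rng):
    """Draw one candidate from N(theta, cov)."""
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    return [t + sum(L[a][b] * z[b] for b in range(a + 1))
            for a, t in enumerate(theta)]


def empirical_cov(n, s1, s2):
    """Sample covariance from running sums of draws (s1) and outer products (s2)."""
    dim = len(s1)
    return [[(s2[a][b] - s1[a] * s1[b] / n) / (n - 1) for b in range(dim)]
            for a in range(dim)]


def adaptive_mh(log_post_hat, theta0, sigma0, n_iter, burn_in, w=0.95, seed=0):
    """Random-walk MH whose proposal follows the mixture (3.17) after burn-in."""
    rng = random.Random(seed)
    dim = len(theta0)
    theta, lp = list(theta0), log_post_hat(theta0, rng)
    n, s1 = 0, [0.0] * dim
    s2 = [[0.0] * dim for _ in range(dim)]
    chain = []
    for i in range(1, n_iter + 1):
        use_empirical = i > burn_in and rng.random() < w
        cov = empirical_cov(n, s1, s2) if use_empirical else sigma0
        cand = gaussian_step(theta, cov, rng)
        lp_cand = log_post_hat(cand, rng)
        # Both mixture components are symmetric in (theta, cand), so q cancels.
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            theta, lp = cand, lp_cand  # the stored estimate is reused, never refreshed
        chain.append(list(theta))
        n += 1
        for a in range(dim):
            s1[a] += theta[a]
            for b in range(dim):
                s2[a][b] += theta[a] * theta[b]
    return chain
```

The stored value `lp` of the current state is deliberately kept rather than re-estimated at each iteration; recomputing it would change the algorithm and is not what the simulated-likelihood argument covers.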
In this paper, $d = 5$, so it makes little difference whether the $(2.38)^2/d$ multiplier is applied or not.

3.4 Particle-Filter based Likelihood Estimation

In this section, I discuss how to use the particle filter to generate an estimate of the likelihood $\hat p_\theta(y_{1:T})$, and how to sample from the posterior distribution of the latent states $p_\theta(r_{1:T}|y_{1:T})$.[10] As mentioned above, this is the key element that makes our adaptive marginal Metropolis-Hastings algorithm feasible.

The particle filter is a simulation-based device that delivers filtered estimates of the latent states sequentially in general nonlinear, non-Gaussian state space models.[11] It also produces an estimate of the one-step-ahead predictive density $\hat p_\theta(y_t|y_{1:t-1})$ and, by predictive decomposition, the estimated likelihood of the observed data. The particle filter originates from the work of Gordon et al. (1993). Its theoretical properties are studied intensively in Del Moral (2004). Doucet et al. (2001) is now the standard reference on particle filters, and it includes more advanced algorithms beyond the context of state space models.

Applications of the particle filter in the economics literature start from Kim et al. (1998), who used this method to extract current volatility from a stochastic volatility model. Later works include Fernández-Villaverde and Rubio-Ramírez (2007), who took the particle filter as the basis for estimating dynamic stochastic general equilibrium models. Gallant et al. (2008) also relied on the particle filter to estimate a dynamic oligopoly game with serially correlated unobserved production cost.

[10] Within this section, I use $\hat p_\theta(y_{1:T}) = \hat p(y_{1:T}|\theta)$ and $p_\theta(r_{1:T}|y_{1:T}) = p(r_{1:T}|y_{1:T},\theta)$ to emphasize that parameters are always fixed during the filtering procedure.
[11] Hence, it is widely viewed as a modern generalization of the Kalman filter.

3.4.1 Particle filter for CIR

In this subsection, I briefly review the main idea and algorithm of the standard particle filter in the context of the CIR model.
I refer to Creal (2012) for a more detailed survey and tutorial on particle filters for economists.

Recall that our purpose here is to sample from $p_\theta(r_{1:T}|y_{1:T})$ and estimate $p_\theta(y_{1:T})$ for given parameter values under the state space model (3.6), (3.7), and (3.10). The particle filter breaks the problem of sampling from the $T$-dimensional $p_\theta(r_{1:T}|y_{1:T})$ into a collection of simpler problems. It deals with $p_\theta(r_1|y_1)$ first, then $p_\theta(r_{1:2}|y_{1:2})$, and so on. At each step $t$, $p_\theta(r_{1:t}|y_{1:t})$ is approximated by a cloud of random sample paths termed particles, evolving according to the following Sequential Importance Sampling and Resampling scheme. Sampling from $p_\theta(r_{1:t}|y_{1:t})$ is therefore replaced by sampling from its particle approximation $\hat p_\theta(r_{1:t}|y_{1:t})$, which is an easier task since the latter is a discrete (empirical) distribution. Likelihood estimation turns out to be a simple by-product.

Sequential importance sampling

Assume that at step $t-1$, the target $p_\theta(r_{1:t-1}|y_{1:t-1})$ is approximated by a "weighted" empirical distribution of $M$ random sample paths (particles) $\{r^{(i)}_{1:t-1}\}_{i=1}^{M}$, i.e.,

$$ \hat p_\theta(r_{1:t-1}|y_{1:t-1}) = \sum_{i=1}^{M} W^{(i)}_{t-1}\,\delta_{r^{(i)}_{1:t-1}}(r_{1:t-1}), \tag{3.18} $$

where $W^{(i)}_{t-1}$, summing to 1 over $i$, is the normalized importance weight on each particle and $\delta_r$ is the Dirac delta measure located at $r$.

At step $t$, I aim to approximate the target $p_\theta(r_{1:t}|y_{1:t})$. I move each particle one step forward by sampling the current state $\tilde r^{(i)}_t$ from the noncentral $\chi^2$ transition density $f(r_t|r^{(i)}_{t-1})$ and then set $\tilde r^{(i)}_{1:t} = (r^{(i)}_{1:t-1}, \tilde r^{(i)}_t)$. Trajectories up to $t-1$ are kept unchanged and newly simulated current states are appended to the end. The expanded paths (new particles) now approximate the one-step-ahead joint conditional density $p_\theta(r_{1:t}|y_{1:t-1})$, i.e.,

$$ \hat p_\theta(r_{1:t}|y_{1:t-1}) = \sum_{i=1}^{M} W^{(i)}_{t-1}\,\delta_{\tilde r^{(i)}_{1:t}}(r_{1:t}). \tag{3.19} $$

The target at time $t$, $p_\theta(r_{1:t}|y_{1:t})$, is connected to the one-step-ahead joint conditional density by the following relation:

$$ p_\theta(r_{1:t}|y_{1:t}) = \frac{g_\theta(y_t|r_t)\,p_\theta(r_{1:t}|y_{1:t-1})}{\int g_\theta(y_t|r_t)\,p_\theta(r_{1:t}|y_{1:t-1})\,dr_{1:t}}. \tag{3.20} $$

By substituting $\hat p_\theta(r_{1:t}|y_{1:t-1})$ for $p_\theta(r_{1:t}|y_{1:t-1})$, I obtain a "re-weighted" discrete distribution approximating the current target,

$$ \tilde p_\theta(r_{1:t}|y_{1:t}) = \sum_{i=1}^{M} W^{(i)}_{t}\,\delta_{\tilde r^{(i)}_{1:t}}(r_{1:t}), \tag{3.21} $$

$$ W^{(i)}_t \propto W^{(i)}_{t-1}\,w^{(i)}_t, \tag{3.22} $$

where $w^{(i)}_t = g_\theta(y_t|\tilde r^{(i)}_t)$ is the incremental importance weight on particle $i$, and $W^{(i)}_t$ is the normalized version of $W^{(i)}_{t-1}w^{(i)}_t$, i.e., $W^{(i)}_t = W^{(i)}_{t-1}w^{(i)}_t \big/ \sum_{j=1}^{M} W^{(j)}_{t-1}w^{(j)}_t$.

Resampling at random times

Sequential importance sampling suffers from the problem that, as the number of iterations increases, all the importance weight concentrates on a few particles.[12] The standard particle filter adds a resampling step after sequential importance sampling at each iteration to mitigate this weight degeneracy issue. Here, I use an algorithm that resamples at random times to reduce the Monte Carlo errors generated by resampling.

[12] Chopin (2002) showed that the normalized sequential importance weights $W^{(i)}_t$ on all but one particle will eventually converge to zero.

I use the Effective Sample Size (ESS) to detect weight degeneracy at each iteration. ESS at iteration $t$ is defined to be:

$$ ESS_t = \frac{1}{\sum_{i=1}^{M}\big(W^{(i)}_t\big)^2}. \tag{3.23} $$

If weights are equal across particles, then $ESS = M$, while in the extreme case when all weight concentrates on a single particle (i.e., $W^{(j)}_t = 1$ and $W^{(i)}_t = 0$ for $i \neq j$), $ESS = 1$.

I compute ESS after the sequential importance sampling procedure at each iteration and perform an extra resampling step whenever $ESS < \tfrac{1}{2}M$.
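A minimal sketch of the effective sample size (3.23) and of the resampling trigger follows; the function names are illustrative assumptions, and the weights are assumed to be already normalized.

```python
def effective_sample_size(weights):
    """ESS of eq. (3.23) for normalized importance weights that sum to one."""
    return 1.0 / sum(w * w for w in weights)


def needs_resampling(weights):
    """Trigger used in the text: resample when ESS falls below half the particle count."""
    return effective_sample_size(weights) < 0.5 * len(weights)
```

For example, four equal weights give an ESS of 4 and no resampling, whereas weights of (0.97, 0.01, 0.01, 0.01) give an ESS close to 1 and trigger a resampling step.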
If the resampling condition is triggered at iteration $t$, given the "weighted" approximation $\tilde p_\theta(r_{1:t}|y_{1:t})$ of $p_\theta(r_{1:t}|y_{1:t})$, where

$$ \tilde p_\theta(r_{1:t}|y_{1:t}) = \sum_{i=1}^{M} W^{(i)}_t\,\delta_{\tilde r^{(i)}_{1:t}}(r_{1:t}), \tag{3.24} $$

I resample $M$ times

$$ r^{(i)}_{1:t} \sim \tilde p_\theta(r_{1:t}|y_{1:t}) \tag{3.25} $$

to get new samples $r^{(i)}_{1:t}$ approximately distributed according to $p_\theta(r_{1:t}|y_{1:t})$, i.e.,

$$ \hat p_\theta(r_{1:t}|y_{1:t}) = \frac{1}{M}\sum_{i=1}^{M} \delta_{r^{(i)}_{1:t}}(r_{1:t}). \tag{3.26} $$

The resampling algorithm used here is the systematic resampling scheme introduced by Carpenter et al. (1999)[13] (the algorithm is described in Appendix A). During the resampling step, new particles are created by replicating the original ones in proportion to their importance weights. Particles with high weights are copied multiple times, and particles with low weights die. After resampling, the normalized importance weights are reset to $1/M$ for each particle, so that $W^{(i)}_t = 1/M$ for all $i$ in the sequential importance sampling part of the next ($(t+1)$th) iteration.

[13] Other resampling algorithms, besides systematic resampling, include multinomial resampling, stratified resampling and residual resampling. Douc and Cappé (2005) compared the efficiency of these methods in terms of the Monte Carlo variance generated.

Sampling latent states and likelihood estimation

At iteration $T$, to sample from $p_\theta(r_{1:T}|y_{1:T})$, I simply draw from the weighted approximation $\tilde p_\theta(r_{1:T}|y_{1:T})$ by picking each particle with a probability equal to its normalized importance weight.

Likelihood estimation is also based on the importance weights. It can be shown that an estimate of the predictive density is

$$ \hat p_\theta(y_t|y_{1:t-1}) = \sum_{i=1}^{M} W^{(i)}_{t-1}\,w^{(i)}_t. \tag{3.27} $$

An estimate of the likelihood of all observed yields then follows from the predictive decomposition:

$$ \hat p_\theta(y_{1:T}) = \prod_{t=1}^{T} \hat p_\theta(y_t|y_{1:t-1}). \tag{3.28} $$

Based on the general theory of Del Moral (2004), $\hat p_\theta(y_{1:T})$ will converge almost surely (with respect to $\theta$) to the true likelihood as $M \to \infty$.
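The recursion described above (propagate, reweight, resample when ESS is low, accumulate the predictive densities) can be sketched generically as below. This is an illustration rather than the thesis code: `sample_init`, `sample_trans` and `log_meas` are hypothetical callables standing in for (3.6), (3.7) and the logarithm of (3.10), the systematic resampler is a common textbook variant that may differ in detail from the one in Appendix A, and at least one particle is assumed to receive a nonzero weight at every step.

```python
import math
import random


def systematic_resample(weights, rng):
    """Ancestor indices from systematic resampling of normalized weights."""
    M = len(weights)
    start = rng.random() / M          # single uniform draw shared by all strata
    indices, cum, i = [], weights[0], 0
    for j in range(M):
        target = start + j / M
        while cum < target and i < M - 1:
            i += 1
            cum += weights[i]
        indices.append(i)
    return indices


def particle_filter(ys, sample_init, sample_trans, log_meas, M, rng):
    """Bootstrap filter: returns (log-likelihood estimate, one sampled state path)."""
    paths = [[sample_init(rng)] for _ in range(M)]
    W = [1.0 / M] * M
    loglik = 0.0
    for t, y in enumerate(ys):
        if t > 0:                                    # propagate through the transition
            for p in paths:
                p.append(sample_trans(p[-1], rng))
        logw = [log_meas(y, p[-1]) for p in paths]   # incremental weights, in logs
        top = max(logw)
        scaled = [Wi * math.exp(lw - top) for Wi, lw in zip(W, logw)]
        total = sum(scaled)
        loglik += top + math.log(total)              # log of eq. (3.27)
        W = [s / total for s in scaled]
        if 1.0 / sum(w * w for w in W) < 0.5 * M:    # ESS rule, eq. (3.23)
            idx = systematic_resample(W, rng)
            paths = [list(paths[a]) for a in idx]
            W = [1.0 / M] * M
    pick = rng.choices(range(M), weights=W)[0]       # one path, drawn by weight
    return loglik, paths[pick]
```

Copying whole paths at every resampling step keeps the sketch short but is wasteful for long samples; storing ancestor indices instead would be a natural refinement.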
The estimator $\hat p_\theta(y_{1:T})$ is also unbiased, which is essentially what we need for the Bayesian approach here.

3.4.2 Approximated conditional optimal importance distribution

The filtering algorithm described above uses the transition density $f(r_t|r_{t-1})$ as the sampling distribution to draw new state variables $r_t$ at each iteration. In the particle filter literature, the sampling distribution is often called the (incremental) importance distribution, and different choices of importance distribution lead to different algorithms. Using the transition density (of the state variables) as the importance distribution was initially suggested in Gordon et al. (1993), who named their algorithm the bootstrap particle filter. It has the merit of being simple and performs well for many filtering problems. However, this choice of importance/sampling distribution ignores the information contained in the current observation $y_t$ and turns out to severely undermine the estimation of latent states in the CIR model.

To see this more clearly, notice that the transition density $f(r_t|r_{t-1})$ is a function of the last-step state variable $r_{t-1}$, which is distributed according to the marginal posterior density $p_\theta(r_{t-1}|y_{1:t-1})$. New samples drawn from $f(r_t|r^{(i)}_{t-1})$ then approximate the marginal predictive posterior density $p_\theta(r_t|y_{1:t-1})$. For the CIR model, since measurement errors are typically small ($\sqrt{h}$ less than 10 basis points), the measurement density $g(y_t|r_t)$ (a product of normal densities) is a very peaked function of $r_t$. The observation $y_t$ is highly informative about $r_t$, so the marginal posterior distribution of the target at $t$, $p_\theta(r_t|y_{1:t})$, is quite different from $p_\theta(r_t|y_{1:t-1})$. If new samples are still blindly drawn from the transition $f(r_t|r_{t-1})$, most draws will fall into the tails of $p_\theta(r_t|y_{1:t})$. Those samples are then assigned very small incremental importance weights, while weights on the very few samples closer to the high probability mass region of $p_\theta(r_t|y_{1:t})$ are unusually large.
The incremental importance weights will have a high variance, so the approximation of the marginal posterior density will be poor. Besides, this problem is not resolved by the resampling step, since resampling only prevents future weight degeneracy and does not cure current degeneracy caused by an improper importance distribution.

To pick an efficient importance distribution that incorporates the currently observed yields $y_t$, I rely on the idea of the conditional optimal importance density introduced in Liu and Chen (1995). Their choice of the importance distribution, based on both the current observed data and the last step's latent states, is proportional to the product of the transition and measurement densities, i.e.,

$$ p_\theta(r_t|y_t, r_{t-1}) \propto f_\theta(r_t|r_{t-1})\,g_\theta(y_t|r_t). \tag{3.29} $$

This choice is shown to be "conditionally optimal" in the sense that it minimizes the variance of the incremental importance weight conditional on the current yields $y_t$ and the previous locations of the particles. However, given the state space model (3.6), (3.7) and (3.10), the density $f_\theta(r_t|r_{t-1})\,g_\theta(y_t|r_t)$ cannot be sampled from directly.
I approximate the conditional optimal importance distribution with a normal density and use this approximation as a sub-optimal choice.

Recall that the transition density follows a noncentral $\chi^2$ distribution as described in (3.7). I approximate the transition density with a normal distribution by matching the first two moments:

$$ f(r_t|r_{t-1}) \approx N\left(r_t;\ \frac{4km + 2c\,e^{-kd}\sigma^2 r_{t-1}}{2c\,\sigma^2},\ \frac{2km + 2c\,e^{-kd}\sigma^2 r_{t-1}}{c^2\sigma^2}\right), \tag{3.30} $$

where $(4km + 2c\,e^{-kd}\sigma^2 r_{t-1})/(2c\,\sigma^2)$, denoted $e_t$, and $(2km + 2c\,e^{-kd}\sigma^2 r_{t-1})/(c^2\sigma^2)$, denoted $f_t$, are respectively the mean and variance implied by the noncentral $\chi^2$ transition.

Then, given the measurement density (3.10), the conditional optimal importance distribution is approximated (up to a constant) by:

$$ \begin{aligned} f_\theta(r_t|r_{t-1})\,g_\theta(y_t|r_t) &\approx N(r_t; e_t, f_t)\,\prod_{j=1}^{L} N\big(y_t(\tau_j);\ -A(\tau_j)+B(\tau_j)r_t,\ h\big) \\ &\propto \exp\left(-\frac{(r_t-e_t)^2}{2f_t}\right)\prod_{j=1}^{L}\exp\left[-\frac{[y_t(\tau_j)+A(\tau_j)-B(\tau_j)r_t]^2}{2h}\right] \\ &\propto \exp\left[-P_t r_t^2 - Q_t r_t\right] \propto N\left(r_t;\ -\frac{Q_t}{2P_t},\ \frac{1}{2P_t}\right), \end{aligned} \tag{3.31} $$

where

$$ P_t = \frac{1}{2h}\sum_{j=1}^{L} B^2(\tau_j) + \frac{1}{2f_t}, \tag{3.32} $$

and

$$ Q_t = -\frac{1}{h}\sum_{j=1}^{L}\big(y_t(\tau_j)+A(\tau_j)\big)B(\tau_j) - \frac{e_t}{f_t}. \tag{3.33} $$

I choose the sub-optimal importance distribution to be $q_\theta(r_t|y_t, r_{t-1}) = N\big(r_t;\ -\tfrac{Q_t}{2P_t},\ \tfrac{1}{2P_t}\big)$. The incremental importance weight becomes:

$$ w_t = \frac{f_\theta(r_t|r_{t-1})\,g_\theta(y_t|r_t)}{q_\theta(r_t|y_t, r_{t-1})}. \tag{3.34} $$

The rest of the filtering procedure is not affected, and a summary of the whole (approximated conditional optimal) algorithm is provided in Appendix B.

I will show in the next section that, with this choice of importance distribution, the incremental importance weights are more balanced and the average ESS increases significantly over that of the standard bootstrap particle filter. Latent state sampling and likelihood estimation based on those balanced particles will also be more stable.

3.5 Performance with Simulated Data

This section uses simulated data to test the performance of the particle filter and the associated Bayesian estimation on the CIR model.

I simulate $T = 1000$ weekly observations (therefore $d = 1/52$) from the square root process, using parameters $k = 0.1860$, $m = 0.0654$, $\sigma = 0.0481$ and $\lambda = -0.0741$.
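The moment-matched proposal of (3.30) to (3.33) can be sketched as follows. The function names are illustrative assumptions, and the loadings `A` and `B` are taken as precomputed lists, one entry per maturity. The incremental weight (3.34) is not included, since it requires the noncentral $\chi^2$ density, which the Python standard library does not provide.

```python
import math


def transition_moments(r_prev, k, m, sigma, d):
    """Mean e_t and variance f_t of r_t given r_{t-1}, as in eq. (3.30)."""
    decay = math.exp(-k * d)
    c = 2.0 * k / ((1.0 - decay) * sigma ** 2)                       # eq. (3.8)
    e_t = (4.0 * k * m + 2.0 * c * decay * sigma ** 2 * r_prev) / (2.0 * c * sigma ** 2)
    f_t = (2.0 * k * m + 2.0 * c * decay * sigma ** 2 * r_prev) / (c ** 2 * sigma ** 2)
    return e_t, f_t


def approx_optimal_proposal(r_prev, y, A, B, h, k, m, sigma, d):
    """Mean and variance of the Gaussian proposal N(-Q_t/(2 P_t), 1/(2 P_t))."""
    e_t, f_t = transition_moments(r_prev, k, m, sigma, d)
    P = sum(b * b for b in B) / (2.0 * h) + 1.0 / (2.0 * f_t)               # eq. (3.32)
    Q = -sum((yj + a) * b for yj, a, b in zip(y, A, B)) / h - e_t / f_t     # eq. (3.33)
    return -Q / (2.0 * P), 1.0 / (2.0 * P)
```

With the parameter values listed above and, say, $r_{t-1} = 0.05$, `transition_moments` gives a one-week conditional standard deviation of roughly 15 basis points. That is wide relative to a small measurement error, which is why letting the current yields pull the proposal toward the data matters. When `h` is very large the proposal collapses to the transition moments, and when `h` is very small it is centered on the value of $r_t$ implied by the yields.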
These parameter values come from the estimation results of De Jong and Santa-Clara (1999), and are viewed as one reasonable set of parameter values for the single-factor CIR model. Next, I pick four bond maturities, 0.5, 1, 5 and 10 years respectively,[14] and simulate the measurement errors with standard deviation $\sqrt{h}$ fixed at a small value of 5 basis points.[15] Yield observations of these four maturities are then formed based on equation (3.9), and the whole panel is plotted in Figure 3.1.

Figure 3.1: Simulated Yields based on Cox Ingersoll Ross Model
[Figure: yield to maturity plotted against time t, one line for each of the maturities 0.5, 1, 5 and 10.]
Notes: This graph plots $T = 1000$ weekly data of bond yields simulated from the CIR model. Parameter values are set at $k = 0.1860$, $m = 0.0654$, $\sigma = 0.0481$, $\lambda = -0.0741$, $\sqrt{h} = 0.0005$, and the legend (0.5, 1, 5, 10) gives the maturities of the corresponding bonds.

Given the simulated data, I start by looking at the latent states approximation and likelihood estimation generated by the particle filter.
Parameters are assumed to be known at their true values for the moment, and the approximated conditional optimal filter described above is implemented.

[14] The choice of these four maturities only reflects a desire to better span the whole yield curve, so that bond prices are more informative about the state variables.
[15] With one exception, when I show the evolution of particles in Figure 3.2.

Figure 3.2: Particle Filtering on CIR Yields
[Figure: four panels over time periods 0 to 50, each showing the true value of the states together with the particle paths.]
Notes: Pictured are the true values of the latent states $r_t$ over the first 50 periods (solid blue line) together with the evolution of a particle system with $M = 20$ particles after: (i) 10 time periods; (ii) 20 time periods; (iii) 30 time periods; and (iv) 50 time periods.

To provide some intuition on how this filtering procedure works, Figure 3.2 plots the evolution of the generated particles in the first 50 periods together with the true values of the latent states $r_t$ over the same range. For illustration purposes, $M = 20$ particles are used, and $\sqrt{h}$ is set to 20 basis points to inflate the measurement errors.[16] Though only distinct sample paths are shown in the graph (duplicates overlap each other), it is clear that particles keep track of the latent states and progressively approximate the joint posterior density $p_\theta(r_{1:50}|y_{1:50})$. Nonetheless, moving from panel (i) to panel (iv), there are fewer and fewer distinct paths over the early time periods.
This degeneracy is due to repeated resamplingand is a widely known phenomenon for particle-filter.17 The remaining/duplicated16So that different sample paths are more distinguishable in the graph.17It is actually not that severe when I reset√h to 5 basis points. With conditional optimal filter,as I will show, the number of resampling steps is low and degeneracy is not very obvious with 1000119Figure 3.3: Empirical Filtering Distribution with Different Particle Size0.0436 0.0438 0.044 0.0442 0.0444 0.0446 0.044800.10.20.30.40.50.60.70.80.91F(x)Empirical CDF (iv) M=150.0436 0.0438 0.044 0.0442 0.0444 0.0446 0.0448 0.04500.10.20.30.40.50.60.70.80.91F(x)Empirical CDF (iv) M=300.043 0.0435 0.044 0.0445 0.045 0.045500.10.20.30.40.50.60.70.80.91F(x)Empirical CDF (iv) M=1000.043 0.0435 0.044 0.0445 0.045 0.045500.10.20.30.40.50.60.70.80.91F(x)Empirical CDF (iv) M=200Notes: Pictured is the empirical distribution functions created using the particles to approximate the marginalfiltering distribution, pθ (r1000|y1:1000), with : (i) 15 particles; (ii) 30 particles; (iii) 100 particles; and (iv) 200particles.sample paths over the beginning periods are still close to the true states, howeverthe empirical distribution is now a poor approximation of the joint posterior density.Nonetheless, for the Bayesian approach implemented below, this would not be aconcern since one only needs to draw a single sample from the approximated jointdensity at each Metropolis Hasting step.I then vary the size of the particles to see its effect on latent states filtering.18For graphical diagnose, I only plot the cumulative distribution function (hereafterc.d.f) of the particles at last period, i.e., r(i)t=1000 for i = 1 to M. 
The correspond-ing probability mass therefore approximates the marginal posterior/filtering den-periods.18√h is reset to 5 basis points and the entire time periods is considered from now on.120Table 3.1: Likelihood Estimation with Different Particle SizeM=15 M=30 M=100 M=200Mean of Estimated Log-Likelihood 22983 22983 22984 22984Std of Estimated Log-Likelihood 1.7585 1.2415 0.6636 0.4788Notes: The table reports the standard deviation of log-likelihood estimator based on 100 repetitions of the approx-imated conditional optimal particle filter as size of particles varies from 15, 30, 100 to 200.sity pθ (r1000|y1:1000). Figure 3.3 demonstrates that, as M increases from 15, 30,100 to 200, the empirical c.d.f (step functions) converges towards the c.d.f of truemarginal posterior density. Hence, a larger number of particles produce a betterapproximation. Given that panel (iii) is already quite indistinguishable from a con-tinuous distribution, M = 100 seems enough and is thus chosen for the rest of theexercise.Likelihood estimation is also more stable with larger particle size as shownin Table 3.1, where sample standard deviations are calculated based on 100 runsof particle filtering. For the Bayesian approach, however, M = 100 turns out ad-equate and a more precise likelihood estimator does not significantly improve theconvergence speed of Monte Carlo draws.Next, I compare the proposed approximated conditional optimal filter with thestandard bootstrap filter. For each of the two algorithms, I run 100 repetitions ofthe filter and compute the mean of average (cross time) effective sample size (ESS),defined as,AvESS = ∑100j=1 ∑Tt=1 ESS jt100∗T . (3.35)Table 3.2 shows that AvESS for bootstrap filter is around 18 percent, indicatingthat for every 100 particles only about 18 are used for approximation. In contrary,AvESS for the proposed sub optimal filter is as high as 72 percent. So, on average,a lot more particles fall into the high probability mass region of pθ (rt |yt). 
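
As a rough illustration of (3.35), the sketch below computes the effective sample size from a set of importance weights, using the definition ESS = 1 / Σ (W^(i))² given in Appendix B, and then averages it over runs and periods. The function names and the randomly generated toy weights are assumptions made purely for illustration; they do not reproduce the thesis code or its results.

```python
import random


def effective_sample_size(weights):
    """ESS = 1 / sum_i (W_i)^2, with the weights normalized to sum to one."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def average_ess(ess_by_run):
    """AvESS as in (3.35): ess_by_run[j][t] is ESS_t in run j."""
    n_runs = len(ess_by_run)
    n_periods = len(ess_by_run[0])
    return sum(sum(run) for run in ess_by_run) / (n_runs * n_periods)


# Toy input (hypothetical): random positive weights stand in for the
# importance weights that a real particle filter would produce.
random.seed(0)
M, n_runs, T = 100, 5, 20
ess_by_run = [
    [effective_sample_size([random.random() for _ in range(M)]) for _ in range(T)]
    for _ in range(n_runs)
]
av_ess = average_ess(ess_by_run)
print(f"AvESS = {av_ess:.2f}, percentage AvESS = {100 * av_ess / M:.2f}")
```

With equal weights the ESS equals M, and with all weight on one particle it equals 1, which is why the percentage AvESS can be read as the share of particles that effectively contribute to the approximation.
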
A higher AvESS also implies fewer resampling steps, hence a faster algorithm and smaller Monte Carlo errors associated with resampling. In addition, the last row of Table 3.2 reports that the standard deviation of the estimated log-likelihood is much smaller for the sub-optimal filter, which confirms the improved efficiency. Notably, even the bootstrap filter equipped with 300 particles is less stable than the sub-optimal one with only 100 particles. This is not puzzling given the huge difference in AvESS. It also partially justifies why 100 particles (under the proposed sub-optimal filter) turn out to be adequate for the estimation exercise.

Table 3.2: Comparing Bootstrap Filter with Approximated Conditional Optimal Filter

                               Standard Bootstrap Filter    Approximated Optimal Filter
                               M=100    M=200    M=300      M=100    M=200    M=300
AvESS                          17.94    35.74    53.59      71.78    143.30   215.32
Percentage AvESS               17.94    17.87    17.86      71.78    71.65    71.77
Std of Estimated Log-Likeli.   25.66    17.01    10.62      0.6636   0.4788   0.3828

Notes: This table compares the bootstrap particle filter with the proposed approximated conditional optimal filter in terms of average effective sample size (AvESS), based on 100 runs of each filter. The standard deviation of the log-likelihood estimate is also reported, and the same exercise is repeated as the number of particles varies from 100 to 200 and 300.

After investigating the performance of the particle filter, I turn to testing the Bayesian (simulated) likelihood estimation. The true parameter values are now assumed unknown,[19] and the adaptive marginal Metropolis-Hastings algorithm is implemented to draw samples from the posterior distribution of both parameters and latent states, $p(\theta, r_{1:T} \mid y_{1:T})$.

To better explore the correlation structure of the different parameters, I re-scale $\theta = (k, m, \sigma, \lambda)$ into $\Theta = (k/10,\ k \cdot m,\ \sigma,\ k + \lambda)$.[20] The corresponding true values then become (0.0186, 0.0122, 0.0481, 0.1119).
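
The re-scaling can be written as a pair of simple maps between θ and Θ. The following is a minimal sketch: the class and function names are hypothetical, and the values of θ it recovers are only those implied by the rounded true values of Θ quoted above, not figures stated in the text.

```python
from typing import NamedTuple


class Theta(NamedTuple):
    """Original parameters (k, m, sigma, lambda)."""
    k: float
    m: float      # long-run mean
    sigma: float
    lam: float


class BigTheta(NamedTuple):
    """Re-scaled parameters (k/10, k*m, sigma, k+lambda)."""
    k_over_10: float
    k_times_m: float
    sigma: float
    k_plus_lam: float


def to_big_theta(theta: Theta) -> BigTheta:
    """Map theta to the re-scaled vector used for sampling."""
    return BigTheta(theta.k / 10.0, theta.k * theta.m, theta.sigma, theta.k + theta.lam)


def to_theta(big: BigTheta) -> Theta:
    """Invert the re-scaling; requires k/10 > 0, the only prior restriction."""
    k = 10.0 * big.k_over_10
    if k <= 0.0:
        raise ValueError("k/10 must be positive")
    return Theta(k, big.k_times_m / k, big.sigma, big.k_plus_lam - k)


# Rounded true values of the re-scaled parameters, as quoted in the text.
true_big = BigTheta(0.0186, 0.0122, 0.0481, 0.1119)
implied_theta = to_theta(true_big)  # implied by rounding, for illustration only
print(implied_theta)
```

Because the map is one-to-one whenever k > 0, draws of Θ can be converted back to θ after sampling without loss of information.
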
Meanwhile, the prior on Θ is chosen to be flat (non-informative) with the only restriction that k/10 > 0, and the initial guess of these new parameters is set to (0.016, 0.014, 0.07, 0.1).[21]

[19] However, $\sqrt{h}$ is still fixed at 5 basis points and not estimated.
[20] My preliminary analysis shows that random samples from the posterior distribution of the different parameters in θ co-move a lot and the Markov chains are quite persistent. This re-parameterization takes into account how the original parameters enter the state space model and significantly improves the mixing property of the Monte Carlo draws.
[21] The initial value is set quite arbitrarily since it does not affect the equilibrium distribution of the Metropolis-Hastings samples.

I then run a Markov chain of length N = 20,000 to see whether the empirical posterior distributions of Θ and the states are centered around the underlying true values. For the latent states, although the entire sequence can be estimated, I only record the random samples (hence the empirical posterior) of the last-period state $r_T$.[22]

[22] So I do not need to store a 1 x T vector of sampled states at each of the 20,000 iterations.

Table 3.3: Statistics on the MCMC draws of Parameters and State

Parameters   Median   5th percentile   95th percentile   Truth
k/10         0.0225   0.0157           0.0290            0.0186
k*m          0.0121   0.0121           0.0122            0.0122
σ            0.0477   0.0461           0.0495            0.0481
k+λ          0.1115   0.1104           0.1126            0.1119
State r_T    0.0443   0.0438           0.0448            0.0440

Notes: This table reports some statistics on the second half of the 20,000 Monte Carlo draws representing the posterior distribution of the parameters k/10, k*m, σ, k+λ and the last-period state $r_T$. The underlying true value of each item is shown in the corresponding row of the last column.

Figure 3.4 depicts all the Markov chain Monte Carlo draws for k/10, k*m, σ, k+λ, and $r_T$ respectively. Notice that initially, when deviations from the true values are jointly large, the Markov chains swing dramatically. After a burn-in period of roughly 2,500 iterations, however, all five series start to fluctuate locally around the true values.[23] The convergence of these Markov chains is further illustrated in the third column of Figure 3.4, where the cumulative average of each series gradually stabilizes.

[23] For this exercise, I have first tuned the proposal covariance matrix with the initial guess set directly at the true value. In practice, when the true value is unknown, convergence will be slower.

Figure 3.4: The Markov chain Monte Carlo Samples
[Figure omitted: five rows (k/10, k*m, σ, k+λ, r_T) by three columns (iterations 1 to 10,000; iterations 10,001 to 20,000; cumulative average).]
Notes: Column 1 plots the first half of the MCMC samples, from iteration 1 to 10,000. Column 2 presents the second half, from iteration 10,001 to 20,000. Column 3 depicts the cumulative average of these samples. The five rows correspond to the four parameters k/10, k*m, σ, k+λ and the last-period state $r_T$ respectively. Reference lines (red) are the underlying true values.

To form the empirical posterior distributions and the relevant statistics from these MCMC samples, I discard the first half of the iterations and use only the remaining 10,000 draws, to reduce the impact of burn-in. Figure 3.5 shows the histograms of those draws. The medians of those draws, as well as the 5th and 95th percentile values, are documented in Table 3.3.

Figure 3.5: Empirical Posterior Distribution of Parameters and State
[Figure omitted: histograms of the draws of k/10, k*m, σ, k+λ and r_T.]
Notes: Pictured is the empirical posterior distribution of the parameters k/10, k*m, σ, k+λ and the last-period state $r_T$, based on the second half of the 20,000 Monte Carlo draws.

Comparing the posterior medians with the true values, and given the bell-shaped empirical distributions, we can see that most of the parameters, i.e., k*m, σ and k+λ, as well as the last-period state $r_T$, are very precisely estimated. One exception is that k/10 seems over-estimated and, therefore, given that k*m is precisely estimated, the long-run mean m is under-estimated. However, this is consistent with the typical finding in the literature that, when the underlying time series is close to a unit root process, as is the case for this square root process with the assumed parameters, the level of persistence tends to be under-measured.

3.6 Concluding Remarks

To summarize, this chapter shows how to combine the MCMC method with a particle filter to perform Bayesian estimation of the single-factor Cox-Ingersoll-Ross model. A marginal Metropolis-Hastings algorithm is adopted, with a simulation-based likelihood estimate placed in each of the iterations. The benefit of this approach is that it avoids the burden of computing analytical likelihood functions, while the validity of replacing the likelihood with its estimated value relies on the fact that, as long as the likelihood estimator is unbiased and re-simulated each time, Monte Carlo errors do not accumulate and the equilibrium distribution of the random samples remains unaffected. Likelihood and latent-state estimation are carried out here by particle filtering. To improve on the efficiency of a standard bootstrap particle filter, I design an approximated conditional optimal filter that accounts for the informativeness of current yields and hence reduces the variance of the particle weights. In addition, in sampling the parameters, an adaptively learned proposal distribution is incorporated to speed up the convergence of the generated Markov chain. This method is then implemented and tested on a simulated dataset.
For typical parameter values, estimates are shown to be accurate with just 100 particles.

Although this chapter has focused on a particular term-structure model, the suggested estimation procedure potentially applies to general state space models as well. For example, a natural extension would be to estimate the multi-factor version of the CIR model. However, one thing impeding those applications is that repeated runs of a particle filter take a lot of time.[24] A possible solution may be to resort to the work of Johansen (2009), who built a C++ template for general particle filtering algorithms. He also showed that the template was much faster than a similar program written in Matlab. Therefore, it could be fruitful to tailor the C++ template and explore other state space models in economics that are traditionally hard to estimate. In addition, more and more theoretical models now incorporate jump diffusion processes to capture stylized facts of different financial and economic variables. Unlike the CIR model, those processes do not typically admit a closed-form transition density and, therefore, an Euler discretization of the process is usually performed before estimation. Whether the errors due to discretization would accumulate in the Bayesian approach is not yet known. Hence, in the future, it would be important to investigate the performance of estimation on those general partially observed jump-diffusion models and to understand the associated theoretical properties.

[24] Given the 20,000 iterations programmed in Matlab, the exercise is anticipated to take at least 8 hours on a powerful PC.

Bibliography

Aït-Sahalia, Yacine, and Brandt, Michael W. 2001. Variable selection for portfolio choice. The Journal of Finance, 56(4), 1297–1351.
Andreasen, Martin M, and Meldrum, Andrew. 2011. Likelihood Inference in Non-Linear Term Structure Models: The Importance of the Zero Lower Bound. Available at SSRN 1738206.
Andrews, Donald WK. 1991.
Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica: Journal of the Econometric Society, 817–858.
Andrieu, Christophe, Doucet, Arnaud, and Holenstein, Roman. 2010. Particle markov chain monte carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3), 269–342.
Asness, Clifford S, Porter, R Burt, and Stevens, Ross L. 2000. Predicting stock returns using industry-relative firm characteristics. Available at SSRN 213872.
Avramov, Doron. 2004. Stock return predictability and asset pricing models. Review of Financial Studies, 17(3), 699–738.
Bai, Jushan, and Ng, Serena. 2002. Determining the number of factors in approximate factor models. Econometrica, 70(1), 191–221.
Bansal, Ravi, Dittmar, Robert F, and Lundblad, Christian T. 2005. Consumption, dividends, and the cross section of equity returns. The Journal of Finance, 60(4), 1639–1672.
Barberis, Nicholas. 2000. Investing for the long run when returns are predictable. The Journal of Finance, 55(1), 225–264.
Brandt, Michael W. 1999. Estimating portfolio and consumption choice: A conditional Euler equations approach. The Journal of Finance, 54(5), 1609–1645.
Brandt, Michael W, Santa-Clara, Pedro, and Valkanov, Rossen. 2009. Parametric portfolio policies: Exploiting characteristics in the cross-section of equity returns. Review of Financial Studies, hhp003.
Brandt, Michael W. 2009. Portfolio choice problems. Handbook of financial econometrics, 1, 269–336.
Brennan, Michael J, and Xia, Yihong. 2001. Assessing asset pricing anomalies. Review of Financial Studies, 14(4), 905–942.
Campbell, John Y, and Thompson, Samuel B. 2008. Predicting excess stock returns out of sample: Can anything beat the historical average? Review of Financial Studies, 21(4), 1509–1531.
Campbell, John Y, and Viceira, Luis M. 2002. Strategic asset allocation: portfolio choice for long-term investors. Oxford University Press.
Carhart, Mark M. 1997. On persistence in mutual fund performance. The Journal of Finance, 52(1), 57–82.
Carpenter, James, Clifford, Peter, and Fearnhead, Paul. 1999. Building robust simulation-based filters for evolving data sets. Technical report, University of Oxford, Dept. of Statistics.
Chan, Louis KC, Karceski, Jason, and Lakonishok, Josef. 1998. The risk and return from factors. Journal of Financial and Quantitative Analysis, 33(02), 159–188.
Chib, Siddhartha. 2001. Markov chain Monte Carlo methods: computation and inference. Handbook of econometrics, 5, 3569–3649.
Chopin, Nicolas. 2002. A sequential particle filter method for static models. Biometrika, 89(3), 539–552.
Cieslak, Anna, and Povala, Pavol. 2012. Understanding bond risk premia. Unpublished working paper. Kellogg School of Management, Evanston, IL.
Clark, Todd E, and McCracken, Michael W. 2001. Tests of equal forecast accuracy and encompassing for nested models. Journal of Econometrics, 105(1), 85–110.
Clark, Todd E, and West, Kenneth D. 2007. Approximately normal tests for equal predictive accuracy in nested models. Journal of Econometrics, 138(1), 291–311.
Cochrane, John H. 2011. Presidential address: Discount rates. The Journal of Finance, 66(4), 1047–1108.
Cochrane, John H, and Piazzesi, Monika. 2005. Bond Risk Premia. The American Economic Review, 95(1), 138–160.
Connor, Gregory. 1997. Sensible return forecasting for portfolio management. Financial Analysts Journal, 44–51.
Cooper, Michael J, Gulen, Huseyin, and Schill, Michael J. 2008. Asset growth and the cross-section of stock returns. The Journal of Finance, 63(4), 1609–1651.
Cox, John C, Ingersoll Jr, Jonathan E, and Ross, Stephen A. 1985. A theory of the term structure of interest rates. Econometrica: Journal of the Econometric Society, 385–407.
Creal, Drew. 2012. A survey of sequential Monte Carlo methods for economics and finance. Econometric Reviews, 31(3), 245–296.
Daniel, Kent, and Moskowitz, Tobias J. 2013. Momentum crashes. University of Geneva.
De Jong, Frank, and Santa-Clara, Pedro. 1999. The dynamics of the forward interest rate curve: A formulation with state variables. Journal of Financial and Quantitative Analysis, 34(01), 131–157.
De Rossi, Giuliano. 2010. Maximum Likelihood Estimation of the Cox–Ingersoll–Ross Model Using Particle Filters. Computational Economics, 36(1), 1–16.
Del Moral, Pierre. 2004. Feynman-Kac Formulae.
DeMiguel, Victor, Garlappi, Lorenzo, and Uppal, Raman. 2009. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? Review of Financial Studies, 22(5), 1915–1953.
Diebold, Francis X, and Mariano, Roberto S. 1995. Comparing Predictive Accuracy. Journal of Business and Economic Statistics, 13(3).
Douc, Randal, and Cappé, Olivier. 2005. Comparison of resampling schemes for particle filtering. Pages 64–69 of: Image and Signal Processing and Analysis, 2005. ISPA 2005. Proceedings of the 4th International Symposium on. IEEE.
Doucet, Arnaud, De Freitas, Nando, and Gordon, Neil. 2001. Sequential Monte Carlo methods in practice. Springer.
Duffee, Gregory R, and Stanton, Richard H. 2008. Evidence on simulation inference for near unit-root processes with implications for term structure estimation. Journal of Financial Econometrics, 6(1), 108–142.
Duffie, Darrell, and Kan, Rui. 1996. A yield-factor model of interest rates. Mathematical Finance, 6(4), 379–406.
Dunn, Olive Jean. 1961. Multiple comparisons among means. Journal of the American Statistical Association, 56(293), 52–64.
Fama, Eugene F, and Bliss, Robert R. 1987. The information in long-maturity forward rates. The American Economic Review, 680–692.
Fama, Eugene F, and French, Kenneth R. 1992. The cross-section of expected stock returns. The Journal of Finance, 47(2), 427–465.
Fama, Eugene F, and French, Kenneth R. 1993. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
Fama, Eugene F, and French, Kenneth R. 1996. Multifactor explanations of asset pricing anomalies. The Journal of Finance, 51(1), 55–84.
Fama, Eugene F, and French, Kenneth R. 2008. Dissecting anomalies. The Journal of Finance, 63(4), 1653–1678.
Fama, Eugene F, and French, Kenneth R. 2014. A five-factor asset pricing model. Fama-Miller working paper, University of Chicago, Dartmouth College and NBER.
Fama, Eugene F, and MacBeth, James D. 1973. Risk, return, and equilibrium: Empirical tests. The Journal of Political Economy, 607–636.
Fernández-Villaverde, Jesús, and Rubio-Ramírez, Juan F. 2007. Estimating macroeconomic models: A likelihood approach. The Review of Economic Studies, 74(4), 1059–1087.
Flury, Thomas, and Shephard, Neil. 2011. Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models. Econometric Theory, 27(05), 933–956.
Gallant, A Ronald, and Tauchen, George. 1996. Which moments to match? Econometric Theory, 12(04), 657–681.
Gallant, A Ronald, Hong, Han, and Khwaja, Ahmed. 2008. Estimating Dynamic Games of Complete Information with an Application to the Generic Pharmaceutical Industry. Tech. rept. working paper.
Giacomini, Raffaella, and White, Halbert. 2006. Tests of conditional predictive ability. Econometrica, 74(6), 1545–1578.
Gibbons, Michael R, Ross, Stephen A, and Shanken, Jay. 1989. A test of the efficiency of a given portfolio. Econometrica: Journal of the Econometric Society, 1121–1152.
Goh, Jeremy, Jiang, Fuwei, Tu, Jun, and Zhou, Guofu. 2012. Forecasting bond risk premia using technical indicators. Tech. rept. Working paper, Washington University in St. Louis.
Gordon, Neil J, Salmond, David J, and Smith, Adrian FM. 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. Pages 107–113 of: IEE Proceedings F (Radar and Signal Processing), vol. 140. IET.
Gourieroux, C, and Monfort, A. 2007. Estimating the historical mean reverting parameter in the CIR model. Tech. rept. CREST Working Paper.
Goyal, Amit. 2012. Empirical cross-sectional asset pricing: a survey. Financial Markets and Portfolio Management, 26(1), 3–38.
Gürkaynak, Refet S, Sack, Brian, and Wright, Jonathan H. 2007. The US Treasury yield curve: 1961 to the present. Journal of Monetary Economics, 54(8), 2291–2304.
Hansen, Lars Peter, Heaton, John C, and Li, Nan. 2008. Consumption strikes back? Measuring long-run risk. Journal of Political Economy, 116(2), 260–302.
Hansen, LP, and Scheinkman, JA. 1995. Back to the future: generating moment implications for continuous-time Markov processes. Econometrica, 63(4), 767–804.
Harvey, Campbell, Liu, Yan, and Zhu, Heqing. 2013. ... and the cross-section of expected returns. Available at SSRN 2249314.
Holm, Sture. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 65–70.
Hou, Kewei, Xue, Chen, and Zhang, Lu. 2012. Digesting anomalies: An investment approach. Tech. rept. National Bureau of Economic Research.
Jegadeesh, Narasimhan. 1990. Evidence of predictable behavior of security returns. The Journal of Finance, 45(3), 881–898.
Johansen, Adam M. 2009. SMCTC: sequential Monte Carlo in C++. Journal of Statistical Software, 30(6), 1–41.
Kandel, Shmuel, and Stambaugh, Robert F. 1996. On the Predictability of Stock Returns: An Asset-Allocation Perspective. The Journal of Finance, 51(2), 385–424.
Kim, Sangjoon, Shephard, Neil, and Chib, Siddhartha. 1998. Stochastic volatility: likelihood inference and comparison with ARCH models. The Review of Economic Studies, 65(3), 361–393.
Liu, Jun S, and Chen, Rong. 1995. Blind deconvolution via sequential imputations. Journal of the American Statistical Association, 90(430), 567–576.
Liu, Laura Xiaolei, Whited, Toni M, and Zhang, Lu. 2009. Investment-Based Expected Stock Returns. Journal of Political Economy, 117(6), 1105–1139.
Ludvigson, Sydney C, and Ng, Serena. 2009. Macro factors in bond risk premia. Review of Financial Studies, hhp081.
Ludvigson, Sydney C, and Ng, Serena. 2011. A Factor Analysis of Bond Risk Premia. Handbook of Empirical Economics and Finance, 313.
Lund, Jesper. 1995. Econometric analysis of continuous-time: arbitrage-free models of the term structure of interest rates. Handelshøjskolen i Århus, Institut for informationsbehandling, Anvendt statistik og databehandling.
Novy-Marx, Robert. 2013. The other side of value: The gross profitability premium. Journal of Financial Economics, 108(1), 1–28.
Pástor, Ľuboš. 2000. Portfolio selection and asset pricing models. The Journal of Finance, 55(1), 179–223.
Pesaran, M Hashem, and Timmermann, Allan. 2007. Selection of estimation window in the presence of breaks. Journal of Econometrics, 137(1), 134–161.
Piazzesi, Monika. 2010. Affine term structure models. Handbook of financial econometrics, 1, 691–766.
Rosenthal, Jeffrey S. 2011. Optimal proposal distributions and adaptive MCMC. Handbook of Markov Chain Monte Carlo, 93–112.
Stock, James H, and Watson, Mark W. 2002. Macroeconomic forecasting using diffusion indexes.
Journal of Business and Economic Statistics, 20(2), 147–162.
Stock, James H, and Watson, Mark W. 2005. Implications of dynamic factor models for VAR analysis. Tech. rept. National Bureau of Economic Research.
Thornton, Daniel L, and Valente, Giorgio. 2012. Out-of-sample predictions of bond excess returns and forward rates: An asset allocation perspective. Review of Financial Studies, 25(10), 3141–3168.
van Binsbergen, Jules H, Brandt, Michael W, and Koijen, Ralph SJ. 2012. Decentralized decision making in investment management.
Viceira, Luis M. 2012. Bond risk, bond return volatility, and the term structure of interest rates. International Journal of Forecasting, 28(1), 97–117.
West, Kenneth D. 1996. Asymptotic inference about predictive ability. Econometrica: Journal of the Econometric Society, 1067–1084.
White, Halbert. 1984. Asymptotic theory for econometricians. Academic Press, New York.
Wooldridge, Jeffrey M, and White, Halbert. 1988. Some invariance principles and central limit theorems for dependent heterogeneous processes. Econometric Theory, 4(02), 210–230.
Zhang, Lu. 2005. The value premium. The Journal of Finance, 60(1), 67–103.

Appendix A
Systematic Resampling

Suppose that at time t, importance sampling has provided the following "weighted" approximation of $p_\theta(r_{1:t} \mid y_{1:t})$:

$$\tilde{p}_\theta(r_{1:t} \mid y_{1:t}) = \sum_{i=1}^{M} W_t^{(i)} \, \delta_{\tilde{r}_{1:t}^{(i)}}(r_{1:t}), \tag{A.1}$$

where $\tilde{r}_{1:t}^{(i)}$ for $i = 1, \ldots, M$ are M different particles (sample paths) and $W_t^{(i)}$ are the corresponding importance weights. A resampling scheme then consists of sampling M times $r_{1:t}^{(i)} \sim \tilde{p}_\theta(r_{1:t} \mid y_{1:t})$ and using these new samples to build the balanced (equally weighted) approximation:

$$\hat{p}_\theta(r_{1:t} \mid y_{1:t}) = \frac{1}{M} \sum_{i=1}^{M} \delta_{r_{1:t}^{(i)}}(r_{1:t}). \tag{A.2}$$

Systematic resampling is one of the popular algorithms that achieve this task. It first selects a sequence of numbers $U_j$, $j = 1, \ldots, M$, with $U_1$ randomly drawn from a uniform distribution:

$$U_1 \sim U\left[0, \tfrac{1}{M}\right] \tag{A.3}$$

and the rest of the $U_j$ (for $j = 2, \ldots, M$) set at

$$U_j = U_1 + \frac{j-1}{M}. \tag{A.4}$$

Next, let $m_t^{(i)}$ be the number of $U_j$'s that fall into the following region created with the importance weights:

$$m_t^{(i)} = \#\Big\{ U_j : \sum_{s=1}^{i-1} W_t^{(s)} \le U_j < \sum_{s=1}^{i} W_t^{(s)} \Big\}, \tag{A.5}$$

where, in particular, $\sum_{s=1}^{0} W_t^{(s)} = 0$. Then each original sample path $\tilde{r}_{1:t}^{(i)}$ is duplicated $m_t^{(i)}$ times to form the set of new particles (sample paths) $r_{1:t}^{(i)}$, which re-approximate the target at time t. Since $\sum_{i=1}^{M} m_t^{(i)} = M$ by construction, the resampling step can be expressed as

$$\hat{p}_\theta(r_{1:t} \mid y_{1:t}) = \sum_{i=1}^{M} \frac{m_t^{(i)}}{M} \, \delta_{\tilde{r}_{1:t}^{(i)}}(r_{1:t}). \tag{A.6}$$

It can also be proved that, for each i, $E[m_t^{(i)}] = W_t^{(i)} \cdot M$. So, intuitively, the original particles are replicated in proportion to their importance weights. Those with high weights are copied multiple times, and the ones with low weights die. After resampling, the normalized importance weight is reset to 1/M for each particle, so $W_t^{(i)} = 1/M$ for all i at the sequential importance sampling part of the next, (t+1)-th, iteration.

Appendix B
Approximated Conditional Optimal Filter

The algorithm for the approximated conditional optimal filter proposed in section 4.2 is summarized as follows:

At time t = 1
• Sample M times the initial state $\tilde{r}_1^{(i)} \sim \mu(r_1)$ and then approximate $p_\theta(r_1 \mid y_1)$ by the weighted empirical distribution:
$$\tilde{p}_\theta(r_1 \mid y_1) = \sum_{i=1}^{M} W_1^{(i)} \, \delta_{\tilde{r}_1^{(i)}}(r_1),$$
where $W_1^{(i)}$ is the normalized importance weight at time 1, i.e., $W_1^{(i)} \propto g_\theta(y_1 \mid \tilde{r}_1^{(i)})$ and $\sum_{i=1}^{M} W_1^{(i)} = 1$.
• Resample $r_1^{(i)} \sim \tilde{p}_\theta(r_1 \mid y_1)$ for all i using the systematic resampling scheme to obtain new particles and build a balanced approximation:
$$\hat{p}_\theta(r_1 \mid y_1) = \frac{1}{M} \sum_{i=1}^{M} \delta_{r_1^{(i)}}(r_1).$$
• Reset $W_1^{(i)} = 1/M$ for all i and let t = t + 1.

At time t ≥ 2
• Given M particles $r_{1:t-1}^{(i)}$, for each i, sample
$$\tilde{r}_t^{(i)} \sim q_\theta(r_t \mid y_t, r_{t-1}^{(i)}) = N\left(r_t;\ -\frac{Q_t}{2P_t},\ \frac{1}{2P_t}\right),$$
and append the new sample to the end of the corresponding path: $\tilde{r}_{1:t}^{(i)} = (r_{1:t-1}^{(i)}, \tilde{r}_t^{(i)})$.
• The "weighted approximation" at time t is therefore:
$$\tilde{p}_\theta(r_{1:t} \mid y_{1:t}) = \sum_{i=1}^{M} W_t^{(i)} \, \delta_{\tilde{r}_{1:t}^{(i)}}(r_{1:t}),$$
where $W_t^{(i)}$ is the normalized importance weight at time t, constructed by
$$W_t^{(i)} \propto w_t^{(i)} \, W_{t-1}^{(i)},$$
and $w_t^{(i)}$ is the incremental importance weight:
$$w_t^{(i)} = \frac{f_\theta(\tilde{r}_t^{(i)} \mid r_{t-1}^{(i)}) \, g_\theta(y_t \mid \tilde{r}_t^{(i)})}{q_\theta(\tilde{r}_t^{(i)} \mid y_t, r_{t-1}^{(i)})}.$$
• Compute
$$\mathrm{ESS}_t = \frac{1}{\sum_{i=1}^{M} \big(W_t^{(i)}\big)^2}.$$
• If $\mathrm{ESS}_t < M/2$, perform the next (resampling) step; otherwise set $r_{1:t}^{(i)} = \tilde{r}_{1:t}^{(i)}$, keep the current weights, and let t = t + 1.
• Resample $r_{1:t}^{(i)} \sim \tilde{p}_\theta(r_{1:t} \mid y_{1:t})$ using the systematic resampling scheme to obtain new particles and re-approximate:
$$\hat{p}_\theta(r_{1:t} \mid y_{1:t}) = \frac{1}{M} \sum_{i=1}^{M} \delta_{r_{1:t}^{(i)}}(r_{1:t}).$$
• Reset $W_t^{(i)} = 1/M$ for all i and let t = t + 1.
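
The systematic resampling scheme of Appendix A can be sketched in a few lines. The code below is only an illustration under stated assumptions: the function names `systematic_resample` and `ancestors` are hypothetical, the example weights are made up, and the optional argument `u1` exists solely so that the single uniform draw of (A.3) can be fixed for reproducibility.

```python
import random
from itertools import accumulate


def systematic_resample(weights, u1=None):
    """Return m, where m[i] is the number of copies of particle i to keep.

    Follows (A.3)-(A.5): U_1 ~ U[0, 1/M], U_j = U_1 + (j - 1)/M, and m[i]
    counts the U_j that fall in [sum_{s<i} W_s, sum_{s<=i} W_s).
    """
    M = len(weights)
    total = sum(weights)
    # Cumulative sums of the normalized weights; force the last one to 1
    # so that rounding error cannot leave a U_j unassigned.
    cum = list(accumulate(w / total for w in weights))
    cum[-1] = 1.0
    if u1 is None:
        u1 = random.uniform(0.0, 1.0 / M)
    counts = [0] * M
    i = 0
    for j in range(M):
        u = u1 + j / M  # j runs from 0, so this is U_{j+1} in (A.4)
        while i < M - 1 and u >= cum[i]:
            i += 1
        counts[i] += 1
    return counts


def ancestors(counts):
    """Expand counts into the list of parent indices of the new particles."""
    return [i for i, c in enumerate(counts) for _ in range(c)]


weights = [0.1, 0.2, 0.3, 0.4]
counts = systematic_resample(weights, u1=0.07)
print(counts)             # copies kept of each particle
print(ancestors(counts))  # parent index of each new particle
```

Because the points U_j are equally spaced and increasing, a single pass over the cumulative weights is enough, so the cost is linear in M, and only one random number is needed per resampling step.
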
Title | Essays on forecast evaluation and model estimation in financial markets |
Creator |
Tong, Guoshi |
Publisher | University of British Columbia |
Date Issued | 2015 |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2015-01-12 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0135660 |
URI | http://hdl.handle.net/2429/51883 |
Degree |
Doctor of Philosophy - PhD |
Program |
Economics |
Affiliation |
Arts, Faculty of Vancouver School of Economics |
Degree Grantor | University of British Columbia |
Graduation Date | 2015-02 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
Aggregated Source Repository | DSpace |