On the choice of scoring functions for forecast comparisons

by

Jonathan Agyeman

B.A. Economics and Statistics, University of Ghana, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Statistics)

The University of British Columbia (Vancouver)

April 2017

© Jonathan Agyeman, 2017

Abstract

Forecasting of risk measures is an important part of risk management for financial institutions. Value-at-Risk and Expected Shortfall are two commonly used risk measures, and accurately predicting these risk measures enables financial institutions to plan adequately for possible losses. Point forecasts from different methods can be compared using consistent scoring functions, provided the underlying functional to be forecasted is elicitable. It has been shown that the choice of a scoring function from the family of consistent scoring functions does not influence the ranking of forecasting methods as long as the underlying model is correctly specified and nested information sets are used. However, in practice, these conditions do not hold, which may lead to discrepancies in the ranking of methods under different scoring functions.

We investigate the choice of scoring functions in the face of model misspecification, parameter estimation error and nonnested information sets. We concentrate on the family of homogeneous consistent scoring functions for Value-at-Risk and the pair of Value-at-Risk and Expected Shortfall and identify conditions required for the existence of the expectation of these scoring functions. We also assess the finite-sample properties of the Diebold-Mariano test, and examine how these scoring functions penalize over-prediction and under-prediction with the aid of simulation studies.

Preface

This dissertation is solely authored, unpublished work by the author, Jonathan Agyeman. The research topic was suggested by the supervisor, Prof. Natalia Nolde, and the supporting simulation studies were designed and carried out by the author. The issue of the choice of scoring functions in the face of model misspecification, parameter estimation error or nonnested information sets of forecasting procedures was raised by Patton.

Table of Contents

Abstract (ii)
Preface (iii)
Table of Contents (iv)
List of Tables (vi)
List of Figures (x)
Glossary (xiii)
Acknowledgments (xiv)
1 Introduction (1)
2 Preliminaries (5)
  2.1 Consistency and elicitability (5)
  2.2 Value-at-Risk and Expected Shortfall (7)
  2.3 Scoring functions for VAR and (VAR, ES) (8)
3 Choice of scoring functions (15)
4 Scoring functions and tail behaviour (19)
  4.1 Expectation of homogeneous scoring function (19)
    4.1.1 GARCH Processes in QRM (20)
5 Finite-Sample Properties of the Diebold-Mariano test
. . . 235.1 Diebold-Mariano test . . . . . . . . . . . . . . . . . . . . . . . . 235.2 Simulation-based result for size properties . . . . . . . . . . . . . 265.3 Simulation-based result for power properties . . . . . . . . . . . . 316 Over-prediction and Under-prediction in finance . . . . . . . . . . . 446.1 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . 447 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52A Forecasting Risk Measures . . . . . . . . . . . . . . . . . . . . . . . 55A.1 Fully parametric estimation . . . . . . . . . . . . . . . . . . . . . 56A.2 Filtered historic simulation . . . . . . . . . . . . . . . . . . . . . 57vList of TablesTable 5.1 Models for parameter estimation error, model misspecificationand nonnested information sets in the simulation study for thefinite-sample size property of the D-M test. Models are used inmaking quantile and expected shortfall forecasts and evaluatedusing selected consistent scoring functions. . . . . . . . . . . . 24Table 5.2 Models for the three scenarios (parameter estimation error, modelmisspecification and nonnested information sets) in the simula-tion study for the finite-sample power property of the D-M test.Models are used in making quantile and expected shortfall fore-casts and evaluated using selected consistent scoring functions. 24Table 5.3 Parameter estimation error: (Case 1) Size values of the D-Mtest for the one-step ahead forecast of the 0.90, 0.95 and 0.99quantiles at various out-of-sample sizes. The fully parametricapproach is used in the estimation of the model parameters inthis case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27Table 5.4 Parameter estimation error: (Case 1) Size values of the D-Mtest for the one-step ahead forecast of the 0.90, 0.95 and 0.99quantiles at various out-of-sample sizes. The filtered historicsimulation is used in the estimation of the model parameters inthis case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28viTable 5.5 Parameter estimation error: (Case 2) Size values of the D-Mtest for the one-step ahead forecast of the 0.90, 0.95 and 0.99quantiles at various out-of-sample sizes. The fully parametricapproach is used in the estimation of the model parameters inthis case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29Table 5.6 Model misspecification: (Case 1) Size values of the D-M testfor the one-step ahead forecast of the 0.90, 0.95 and 0.99 quan-tiles at various out-of-sample sizes. The fully parametric ap-proach is used in the estimation of the model parameters in thiscase. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30Table 5.7 Model misspecification: (Case 2) Size values of the D-M testfor the one-step ahead forecast of the 0.90, 0.95 and 0.99 quan-tiles at various out-of-sample sizes. The fully parametric ap-proach is used in the estimation of the model parameters in thiscase. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31Table 5.8 Nonnested information sets: Size values of the D-M test for theone-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles atvarious out-of-sample sizes. The fully parametric approach isused in the estimation of the model parameters in this case. . . 32Table 5.9 Parameter estimation error: (Case 1) Size values of the D-Mtest for the one-step ahead forecast of (VAR, ES) for ν values0.754, 0.875 and 0.975 with various out-of-sample sizes. 
Thefully parametric approach is used in the estimation of the modelparameters in this case. . . . . . . . . . . . . . . . . . . . . . 32Table 5.10 Parameter estimation error: (Case 2) Size values of the D-Mtest for the one-step ahead forecast of (VAR, ES) for ν values0.754, 0.875 and 0.975 at various out-of-sample sizes. The fullyparametric approach is used in the estimation of the model pa-rameters in this case. . . . . . . . . . . . . . . . . . . . . . . . 33viiTable 5.11 Model misspecification: (Case 1) Size values of the D-M testfor the one-step ahead forecast of (VAR, ES) for ν values 0.754,0.875 and 0.975 with various out-of-sample sizes. The fullyparametric approach is used in the estimation of the model pa-rameters in this case. . . . . . . . . . . . . . . . . . . . . . . 33Table 5.12 Model misspecification: (Case 2) Size values of the D-M testfor the one-step ahead forecast of (VAR, ES) for ν values 0.754,0.875 and 0.975 with various out-of-sample sizes. The fullyparametric approach is used in the estimation of the model pa-rameters in this case. . . . . . . . . . . . . . . . . . . . . . . 34Table 5.13 Nonnested information sets: Size values of the D-M test for theone-step ahead forecast of (VAR, ES) for ν values 0.754, 0.875and 0.975 at various out-of-sample sizes. The fully parametricapproach is used in the estimation of the model parameters inthis case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34Table 5.14 Parameter estimation error: Power values of the D-M test forthe one-step ahead forecast of the 0.90, 0.95 and 0.99 quantilesat various out-of-sample sizes. The fully parametric approachis used in the estimation of the model parameters in this case. . 35Table 5.15 Model misspecification: Power values of the D-M test for theone-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles atvarious out-of-sample sizes. The fully parametric approach isused in the estimation of the model parameters in this case. . . 37Table 5.16 Nonnested information sets: Power values of the D-M test forthe one-step ahead forecast of the 0.90, 0.95 and 0.99 quantilesat various out-of-sample sizes. The fully parametric approachis used in the estimation of the model parameters in this case. . 39Table 5.17 Parameter estimation error: Power values of the D-M test for theone-step ahead forecast of (VAR, ES) for ν values 0.754, 0.875and 0.975 at various out-of-sample sizes. The fully parametricapproach is used in the estimation of the model parameters inthis case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39viiiTable 5.18 Model misspecification: Power values of the D-M test for theone-step ahead forecast of (VAR, ES) for ν values 0.754, 0.875and 0.975 at various out-of-sample sizes. The fully parametricapproach is used in the estimation of the model parameters inthis case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Table 5.19 Nonnested information sets: Power values of the D-M test forthe one-step ahead forecast of (VAR, ES) for ν values 0.754,0.875 and 0.975 at various out-of-sample sizes. The fully para-metric approach is used in the estimation of the model parame-ters in this case. . . . . . . . . . . . . . . . . . . . . . . . . . 40Table 6.1 Expected scores and corresponding forecasts for selected val-ues of the homogeneity order b used in the scoring function forVAR0.95 forecasts. Sop, Sov and Sud indicate scoring functionswhen the optimal forecast, over-predicted forecast and under-predicted forecasts, respectively, are used. . . . . . . . . . . . 
46ixList of FiguresFigure 2.1 An example of a loss distribution with the 95% VAR markedas a vertical line and 95% ES shown with a dotted line. . . . . 8Figure 2.2 Various homogeneous GPL scoring function S(yˆ,y) in (2.8),with α = 0.90 (a) and α = 0.99 (b). The Lin-Lin scoring func-tion is obtained when b = 1. The value of y is set at 1. . . . . . 10Figure 2.3 Homogeneous scoring functions S(yˆ1, yˆ2,y) in (2.10) and (2.11)for the pair (VARν , ESν ), with ν = 0.90 (a) and ν = 0.99 (b).The value of y is set at 1. . . . . . . . . . . . . . . . . . . . . 12Figure 2.4 Murphy diagrams for simulation study comparing forecasts for0.99-quantile. Data generating process is a GARCH (1, 1)model with skew-t innovations. The first forecasting proce-dure uses a GARCH (1, 1) model and the second forecastingprocedure uses an AR (1)-GARCH (1, 1) model. (a) the plot ofthe scores from the elementary scoring function against the θ .(b) the plot of the scores differences against θ . No differencein the performance of forecasting procedures. . . . . . . . . . 13xFigure 2.5 Murphy diagrams for simulation study comparing forecasts for0.99-quantile. Data generating process is a GARCH (1, 1)model with skew-t innovations. The first forecasting proce-dure uses a GARCH (1, 1) model with information on the con-ditional variance and the second forecasting procedure uses anARCH (1) model with no information on the conditional vari-ance. (a) the plot of the scores from the elementary scoringfunction against the θ . (b) the plot of the scores differencesagainst θ . Forecasting procedure with GARCH model per-forms better than forecasting procedure with ARCH model. . 14Figure 4.1 Upper bounds of the homogeneity order b for the expectationof the b-homogeneous scoring functions for VARα to exist. . . 22Figure 5.1 Plot of power of the D-M test against homogeneity order b ofthe homogeneous GPL for the different out-of-sample sizes. . 36Figure 5.2 Plot of power of the D-M test against homogeneity order b ofthe homogeneous GPL for the different out-of-sample sizes. . 38Figure 5.3 Plot of power of the D-M test against homogeneity order b ofthe homogeneous GPL for the different out-of-sample sizes. . 40Figure 5.4 Parameter estimation Error: Plot of power of the D-M testagainst out-of-sample size (T) for the homogeneous scoringfunctions for the pair (VARν , ESν ). . . . . . . . . . . . . . . 41Figure 5.5 Model misspecification: Plot of power of the D-M test againstout-of-sample size (T) for the homogeneous scoring functionsfor the pair (VARν , ESν ). . . . . . . . . . . . . . . . . . . . . 42Figure 5.6 Nonnested information sets: Plot of power of the D-M testagainst out-of-sample size (T) for the homogeneous scoringfunctions for the pair (VARν , ESν ). . . . . . . . . . . . . . . 43Figure 6.1 Plot depicting the optimal forecast value, over-predicted andunder-predicted VAR forecasts for a case where returns are as-sumed to follow a standard normal distribution . . . . . . . . 45xiFigure 6.2 Plot of expectation of homogeneous scoring function for VAR0.95forecasts against the magnitude of difference for over-predictionand under-prediction. Top panel shows the case for the lowerhomogeneous order b, where under-prediction is penalized morethan over-prediction. The lower panel shows the case for highervalues of b where over-prediction is penalized more than under-prediction. Data is generated from a skewed-normal distribu-tion with mean zero, unit variance and skewness parameter of3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 47Figure 6.3 Plot of expectation of homogeneous scoring function for VAR0.95forecasts against the magnitude of difference for over-predictionand under-prediction. Top panel shows the case for the lowerhomogeneous order b, where under-prediction is penalized morethan over-prediction. The lower panel shows the case for highervalues of b where over-prediction is penalized more than under-prediction. Data is generated from a skewed-t distribution withmean zero, unit variance, 5 degrees of freedom and skewnessparameter of 3. . . . . . . . . . . . . . . . . . . . . . . . . . 48Figure 6.4 Plot of expectation of homogeneous scoring function for(VAR0.95,ES0.95) forecasts against the magnitude of difference for over-prediction and under-prediction. Top panel shows the casewhere data is generated from a skewed-t distribution with meanzero, unit variance, 5 degrees of freedom and skewness param-eter of 3. The lower panel shows the case where data is gen-erated from a skewed-normal distribution with mean zero, unitvariance and skewness parameter of 3. . . . . . . . . . . . . . 49xiiGlossaryDGP data generating processD-M Diebold-MarianoES Expected ShortfallFHS filtered historic simulationFPE fully parametric estimationGPL generalized piecewise linearVaR Value-at-RiskxiiiAcknowledgmentsFirstly, I want to thank the Almighty God for seeing me through my Masters pro-gram at UBC. Being at UBC has been a great experience for me. I had the op-portunity to learn from the greatest minds in statistics and also interact with veryintelligent students from all over the world.My deepest gratitude to my supervisor, Prof. Natalia Nolde. Natalia has been agreat mentor and working with her has taught me a lot. Her dedication towards herwork is second to none and this is a quality I want to emulate as I begin my careerin this field. Her invaluable advice and ability to present complicated issues in asimple way have been very helpful throughout my research work. I am also grate-ful to Prof. Jiahua Chen for being my second reader and his helpful suggestionsconcerning my thesis.A big thank you to my mother Juliana and my siblings Anne-Mary, Yvonne,David, Emmanuel, and Joshua for their tremendous support and prayers. Thankyou to my wonderful nephews Gidon and Winston, and my lovely nieces Annabeland Valerie. I also wish to thank Rev.Dr. Enerstina Afriyie, Dr. Isaac Baidoo andMr. Albert Owusu for their support throughout my education.I wish to thank my friends Rowena, Kwasi, Yeng, Kojo, Andy, Farouk, Nay-orm, Elsie, Naki and Ann for their support and sacrifices made for me. I alsowant to thank my ever supportive office mates, Creagh, Derek, Sohrab and Yijun.Thanks to my dear friends Qiong, Yiwei, Harry and Julian who have been of greathelp all through my program. Special thanks to Ali, Andrea and Peggy for their pa-tience and willingness to answer all my questions and clarify any misunderstandingI had here. Lastly, I want to say a big thank you to my mean square terror matesDanny, David, Joe and Pablo. God bless you all.xivChapter 1IntroductionPoint forecasting plays an important role in many fields. To make accurate predic-tions about an uncertain future, there is a need to make desicions on the appropri-ate forecasting procedure as well as measures for evaluating forecasts. Competingforecasting procedures can be compared using scoring functions. Scoring func-tions play an important role in the theory and practice of forecasting [Gneiting,2011a]. 
Gneiting [2011a] argues that in the situation where point forecasts areto be issued and evaluated, it is important to either specify a scoring function forevaluation from which the functional to be forecasted can be deduced or make thestatistical functional (e.g., the mean, quantile or expectile) to be forecasted known.In finance, point forecasting plays an important role in decision making. Financialinstitutions rely on conditional forecasts of risk measures for the purposes of inter-nal risk management as well as regulatory capital calculations [Nolde and Ziegel,2016]. Most modern risk measures for a portfolio are statistical quantities whichdescribe the (conditional) loss distribution of the portfolio [McNeil et al., 2005].Value-at-Risk (VaR) and Expected Shortfall (ES) are examples of such risk mea-sures. The decision on the forecasting procedure to use in estimating a chosen riskmeasure can be made using scoring functions. The choice of scoring function istherefore an important aspect of forecasting in finance.A scoring function is an error measure which assesses how close or far-off pointforecasts are from the verifying observations. When comparing two competingforecasting procedures using a scoring function, the forecast cases obtained from1each of the forecasting procedures and the corresponding verifying observationsare used to obtain score values for each forecast case. These score values areaveraged to obtain a single estimate for each forecasting procedure as shown in(1.1)S¯ j =1nn∑i=1S(yˆ ji ,yi), i = 1, ...,n j = 1,2 (1.1)where yˆ ji is the ith point forecast from forecasting procedure j and yi is the corre-spoding verifying obeservation. Smaller average scores correspond to better fore-casting procedures. Commonly used scoring functions for the mean and medianforecasts include the squared error and the absolute error, respectively. A func-tional is a potentially set-valued mapping T (F) from a class of probability distri-butions,F , to the real line, R. Examples include the mean and quantile. Gneiting[2011a] argues that it is important for a scoring function S(yˆ,y), where yˆ is the pointforecast and y is the observed value, to be consistent for the functional T relativeto the class F . This means that the expectation of the scoring function with re-spect to the distribution F is minimized at a given functional compared to all otherforecasts.Statistical functionals for which a consistent scoring function exists are knownas elicitable functionals. We discuss the concept of consistency and elicitabilityin Section 2.1. Patton argues that evaluating forecasts of a given functional us-ing consistent scoring functions is a minimal requirement for sensible rankings ofcompeting forecasts.It is shown in Thomson [1979] and Saerens [2000] that quantiles are elicitablefunctionals. Hence, VAR is an elicitable functional since it is simply the quantile ofa given distribution in probabilistic terms. On the other hand, ES is not elicitable.However, it is jointly elicitable with VAR; see [Acerbi and Szekely, 2014] and[Fissler and Ziegel, 2016]. This implies that there are scoring functions that areconsistent for VAR and the pair (VAR, ES). Detailed review on these elicitable riskmeasures and their respective consistent scoring functions is presented in Section2.2 and Section 2.3.For a given elicitable functional, there are infinitely many consistent scoring2functions that can be used in the evaluation of forecasts. 
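To make (1.1) concrete, here is a minimal sketch, assuming numpy and two hypothetical forecast series for the mean functional (this is an illustration, not code from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)

# Verifying observations and point forecasts from two hypothetical procedures.
y = rng.normal(loc=0.0, scale=1.0, size=1000)
y_hat_A = np.zeros_like(y)           # procedure A forecasts the true mean (0)
y_hat_B = np.full_like(y, 0.25)      # procedure B issues a biased mean forecast

def squared_error(y_hat, y):
    """Squared error, a consistent scoring function for the mean functional."""
    return (y_hat - y) ** 2

# Average scores as in (1.1); the procedure with the smaller average score is preferred.
S_bar_A = squared_error(y_hat_A, y).mean()
S_bar_B = squared_error(y_hat_B, y).mean()
print(f"S_bar_A = {S_bar_A:.3f}, S_bar_B = {S_bar_B:.3f}")
```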
Patton assesses the per-formance of consistent scoring functions for given functionals in the presence ofmisspecified models, parameter estimation error, or nonnested information sets.He concludes that the choice of the scoring function for ranking forecasts does notmatter provided that models are correctly specified and information sets of com-peting forecasting procedures are nested. However, in practice, the model usedfor forecasting is uncertain, the parameters need to be estimated, and forecastingprocedures do not have much information about competing forecast models. Heargues that scoring functions are sensitive to model misspecification, parameterestimation error, or nonnested information sets and the ranking of forecasts maybe inconsistent when different scoring functions are used in evaluating forecasts.Gneiting [2011a] also presents a simulation study where he shows that the choiceof an arbitrary scoring function in the assessment of forecasts could produce mis-leading results. Due to this, there is the need to establish criteria for desirablescoring functions within the class of consistent scoring functions.This research work seeks to identify the criteria for choosing desirable consis-tent scoring functions for the evaluation of VAR and the pair (VAR, ES) forecasts.We seek to identify the class of consistent scoring functions that satisfy the cho-sen criteria and may lead to accurate ranking of forecasting procedures in practice.The report is organized as follows: Chapter 2 presents key definitions and theoreti-cal discussion of consistent scoring functions, elicitable functionals, risk measuresVAR and ES, and the family of consistent scoring functions for VAR and the pair(VAR, ES). Chapter 3 presents an overview of the criteria to consider in choos-ing a consistent scoring function for evaluating the risk measures and introducesthe class of homogeneous scoring functions for VAR and (VAR, ES). In Chapter4, we look at the tail behaviour of GARCH processes, examining the conditionsunder which the expected value of chosen scoring function with respect to the ran-dom variable Y with distribution F will exist. In Chapter 5, we present simulationstudies where the finite-sample size and power properties of the Diebold-Marianotest [Diebold and Mariano, 1995] are assessed for consistent scoring functions forVAR and the pair (VAR, ES). Chapter 6 assesses how consistent scoring functionspenalize for under-prediction and over-prediction of the same magnitude with theaid of simulation studies. Finally, conclusions for our work and possible further3studies are presented in Chapter 7.4Chapter 2PreliminariesA key component of point forecasting is the means by which competing forecastsare evaluated. Competing point forecasting procedures are typically comparedbased on a scoring function which is averaged over forecast cases. A scoring func-tion, S, is any mapping S : Rk×R→ R, where S(yˆ,y) represents the score whena point forecast yˆ ∈ Rk is issued and the observation y ∈ R is realized. Decisionsconcerning the choice of scoring functions when evaluating competing forecasts isimportant in point forecasting since an arbitrary choice of the scoring function forthe evaluation of forecasts may give misleading results.2.1 Consistency and elicitabilityFollowing Gneiting [2011a], a functional is a potentially set-valued mapping T(F)from a class of probability distributions, F , to the real line R. 
It is important thatfor the given functional relative to the class F , the scoring function chosen forevaluation of forecasts be consistent. A scoring function is consistent for a givenfunctional if the expected score is minimized under the true distribution comparedto all other forecasts.Definition 2.1.1. (Gneiting [2011a]) Let F a family of probability measures onR, T : F → R a functional, and S : Rk×R→ R a scoring function. The scoringfunction S is consistent for the functional T relative to the class F , if EFS(yˆ,Y )5exists and is finite for all F ∈F and a given point forecast yˆ ∈ R, and ifEFS(t,Y )≤ EFS(yˆ,Y ) (2.1)for all F ∈F , all t ∈ T (F), and all point forecasts yˆ ∈ R. It is strictly consistent ifit is consistent and equality of expectations implies that yˆ ∈ T (F).This means that, for a given scoring function S, and a predictive distribution F , thefunctional T (F) is the optimal point predictor ifT (F) = argminyˆEF [S(yˆ,Y )].An example of a consistent scoring function for the mean functional is thesquared error given by S(yˆ,y) = (yˆ− y)2. Even though it is desirable for a givenfunctional to have a consistent scoring function for forecast evaluation. However,not all functionals have consistent scoring functions. One example is the variance.A functional for which a consistent scoring function exists is called an elicitablefunctional [Lambert et al., 2008]. Examples of elicitable functionals are the mean,quantile, and expectile functionals. A formal definition for an elicitable functionalis given below.Definition 2.1.2. (Lambert et al. [2008]) A functional T is elicitable relative to theclass F if there exists a scoring function S that is strictly consistent for T relativetoF .Misleading results about the quality of forecasts may be obtained when the scoringfunction used in the evaluation of point forecasts is not consistent for the givenfunctional under the true distribution. Nolde and Ziegel [2016] illustrate with asimulation study how the methods used in forecasting VAR and the pair (VAR,ES) have reasonable rankings when consistent scoring functions for these func-tionals are used in evaluating forecasts. In particular, the optimal forecast withknowledge of the data generating process is ranked as “best” among competingforecasting procedures. Ziegel [2016] describes elicitable functionals as function-als for which meaningful point forecasts and forecast performance comparisons arepossible. This report concentrates on forecasting of two risk measures, the VARand the pair (VAR, ES), which are introduced next.62.2 Value-at-Risk and Expected ShortfallThe use of VAR as a risk measure in risk management is very common. For agiven confidence level α ∈ (0,1), VAR is the value x such that, the probability ofobserving a value greater than x is no larger than 1-α . Typical α values in riskmanagement are α = 0.95 (internal risk management) and α = 0.99 (regulatorylevel). A formal definition for VAR is given below.Definition 2.2.1. For a random variable X , VAR at confidence level α ∈ (0,1) isdefined as:VARα(X) = in f{x ∈ R : P(X > x)≤ 1−α}. (2.2)From the statistical perspective, VARα is simply an α-quantile of a given loss (orprofit) distribution, provided the α-quantile is single-valued. As mentioned earlier,the α-quantile is an elicitable functional, and the family of consistent scoring func-tions for the α-quantile consists of generalized piecewise linear functions of orderα ∈ (0,1). 
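As a quick numerical check of this consistency property, the toy sketch below (assuming numpy; the Lin-Lin score used here is the GPL member introduced in Section 2.3) verifies that the average score over a large sample is minimized near the true α-quantile:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.95
y = rng.standard_t(df=5, size=200_000)      # a heavy-tailed sample playing the role of losses

def lin_lin(y_hat, y, alpha):
    """Lin-Lin (tick) score, a GPL scoring function consistent for the alpha-quantile."""
    return ((y_hat >= y).astype(float) - alpha) * (y_hat - y)

candidates = np.linspace(0.5, 3.5, 301)
avg_scores = np.array([lin_lin(c, y, alpha).mean() for c in candidates])

print("score-minimizing forecast:", round(candidates[avg_scores.argmin()], 3))
print("empirical 0.95-quantile:  ", round(np.quantile(y, alpha), 3))
# The two values agree closely, illustrating (strict) consistency of the GPL family.
```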
VAR as a risk measure has been criticized for structural drawbacks bya number of researchers over the years. ES, a coherent risk measure, was proposedby Artzner et al. [1999] as an alternative to VAR and there has been a growingconsensus on the use of ES as a risk measure over the VAR; [Kusuoka, 2001] and[Acerbi and Tasche, 2002]. The Bank for International Settlements [2013] hasbeen looking into the arguments for and against the change of the regulatory riskmeasure from VAR to ES. There is a close relation between ES and VAR and thetwo have been found to be jointly elicitable. This means that for there is a consis-tent scoring function for the pair (VAR, ES). The formal definition for ES is givenbelow.Definition 2.2.2. For a random variable X , ES at confidence levels ν ∈ (0,1) isdefined as:ESν(X) =11−ν∫ 1νVARα(X)dα. (2.3)7When X is a continuous random variable, ES can be written asESν(X) = E(X |X > VARν(X)). (2.4)As seen from the definition, the ES looks at the entire tail of the loss (or profit)distribution and averages the VAR over all levels of α ≥ ν . Generally, ESα ≥VARα as illustrated in Figure 2.1. For α = 0.95, a VARα value of 3.41 is obtainedwhile the ESα is 4.77.−4 −2 0 2 40.000.050.100.150.200.250.300.35LossProbability densityVARESFigure 2.1: An example of a loss distribution with the 95% VAR marked as a vertical line and 95% ESshown with a dotted line.2.3 Scoring functions for VAR and (VAR, ES)There are many possibilities from which to choose a consistent scoring functionfor an elicitable functional. Gneiting [2011b] showcases a number of consistentscoring functions for VAR that arises in different fields where different weightsare assigned in penalizing over-prediction and under-prediction in the evaluation8of forecasts. Nolde and Ziegel [2016] use the homogeneous consistent scoringfunctions for the pair (VAR, ES) for comparative backtesting of the risk measures.There exists an entire family of consistent scoring functions for VAR and the pair(VAR, ES). We next review the characterization of the family of consistent scoringfunctions for VAR and the pair (VAR, ES).Theorem 2.3.1. (Thomson [1979]: Saerens [2000]) Up to equivalence and mildregularity conditions, a consistent scoring function for VARα (α-quantile), α ∈(0,1) relative to the class of compactly supported probability measures on I ⊆R isgiven byS(yˆ,y) = (1(yˆ≥ y)−α)(G(yˆ)−G(y)), (2.5)where G is a non-decreasing function on I, yˆ is the point forecast and y is theobserved value. Scoring functions in (2.5) form the family of generalized piecewiselinear (GPL) functions.Theorem 2.3.2. (Fissler and Ziegel [2016]) Up to equivalence and mild regularityconditions, all consistent scoring functions for (VARν , ESν ) , ν ∈ (0,1) relative tothe class of probability measures on I ⊆ R are of the formS(yˆ1, yˆ2,y) = (1{yˆ1 ≥ y}−ν)(G1(yˆ1)−G1(y))+ 1νG2(yˆ2)1{yˆ1 ≥ y}(yˆ1− y)+G2(yˆ2)(yˆ2− yˆ1)−G2(yˆ2),(2.6)where yˆ1 is the VAR forecast and yˆ2 is the ES forecast with G1 and G2 beingstrictly increasing continuously differentiable functions such that the expectationE[G1(X)] exists, limx→−∞G2(x) = 0 and G′2 = G2; (see Fissler and Ziegel [2016],corollary 5.5).From Theorem 2.3.1 and Theorem 2.3.2 above, it is seen that the choice of theconsistent scoring functions for the VAR and the pair (VAR, ES) depends on thechoice G in (2.5) and G1 and G2 in (2.6), respectively. 
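To illustrate Definitions 2.2.1 and 2.2.2 above, a short sketch (assuming numpy and a simulated loss sample; the 0.95 level mirrors Figure 2.1) of empirical VaR and ES estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
losses = rng.standard_t(df=4, size=500_000)      # simulated loss distribution

def var_es(losses, level):
    """Empirical VaR (the level-quantile of the losses) and ES (average loss beyond VaR)."""
    var = np.quantile(losses, level)
    es = losses[losses > var].mean()             # sample analogue of E[X | X > VaR_level(X)]
    return var, es

var95, es95 = var_es(losses, 0.95)
print(f"VaR_0.95 = {var95:.3f}, ES_0.95 = {es95:.3f}")   # ES exceeds VaR, as in Figure 2.1
```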
A well known example ofa GPL scoring function is the Lin-Lin (or tick) scoring function which is obtainedwhen G in (2.5) is the identity function:S(yˆ,y) = (1(yˆ≥ y)−α)(yˆ− y). (2.7)9Patton defines a special case of GPL scoring functions which is obtained whenG(y) = sgn(y) |y|bb for b > 0. The scoring function is then given byS(yˆ,y) = (1{yˆ≥ y}−α)(sgn(yˆ)|yˆ|b− sgn(y)|y|b)/b , b > 0. (2.8)This family of GPL scoring functions is a homogeneous family of scoring func-tions. The Lin-Lin score function arises when b= 1. Figure 2.2 presents the plot ofS(yˆ,y) for a range of values of yˆ for the homogeneous GPL scoring functions ob-tained for varying values of b and selected values of α . We examine the behaviourof the scoring functions for α = 0.9 and α = 0.99 as we are concerned with thetails of loss distributions. It can be seen that the shape of the different homoge-neous GPL differs even though they all assign the same score at the optimal valuefor a given α level.0.0 0.5 1.0 1.5 2.00.00.51.01.5α = 0.90y^Scoring function b = 0.5b = 1b = 2(a)0.0 0.5 1.0 1.5 2.00.00.51.01.52.0α = 0.99y^Scoring functionb = 0.5b = 1b = 2(b)Figure 2.2: Various homogeneous GPL scoring function S(yˆ,y) in (2.8), with α = 0.90 (a) and α = 0.99(b). The Lin-Lin scoring function is obtained when b = 1. The value of y is set at 1.For the case where b = 0, Nolde and Ziegel [2016] present the 0-homogeneousscore differences for VAR which is given asS(yˆ,y) = (1−α−1{y > yˆ})logyˆ+1{y > yˆ}logy , yˆ > 0. (2.9)10Another example of a GPL scoring function is the exponential GPL scoring func-tion which is obtained when G(y) = exp(y).For consistent scoring functions for the pair (VAR, ES), Nolde and Ziegel[2016] present two homogeneous scoring functions for score differences. The(1/2)-homogeneous scoring function is obtained when G1(y) = 0 and G2(y) = 12√y, y > 0 and is given byS(yˆ1, yˆ2,y) =( 1ν) 12√yˆ21{yˆ1 ≥ y}+ 12√yˆ2 (yˆ2− yˆ1)−√yˆ2 (2.10)where yˆ1 is the VARν forecast, yˆ2 is the ESν forecast and y is the observed value.The 0-homogeneous scoring function is obtained when G1(y) = 0 and G2(y) =1y , y > 0 and is given byS(yˆ1, yˆ2,y) =( 1ν) 1yˆ21{yˆ1 ≥ y}+ 1yˆ2 (yˆ2− yˆ1)− log(yˆ2) (2.11)Figure 2.3 displays the behaviour of the two homogeneous scoring functions fora range of values of yˆ1 and yˆ2 when ν = 0.90 and ν = 0.99. It is seen that bothscoring functions assign the lowest score at the optimal forecast. Other examplesof consistent scoring functions for the pair (VAR, ES) presented by Fissler et al.[2015] include the case where G1(y) = y and G2(y) = exp(y) and the case whereG1(y) = y and G2(y) =exp(y)1+exp(y) . It can be seen from Figure 2.2 and Figure 2.3 thatthe scoring functions for evaluating these risk measures have varying shapes withdifferent scores assigned to under-prediction and over-prediction of the same mag-nitude. We assess the behaviour of the consistent scoring function when penalizingfor under-prediction and over-prediction in chapter 6 of this report.Ehm et al. [2016] present a Choquet type mixture representation of GPL scor-ing functions. This mixture representation is a reduction of the infinite number ofGPL scoring functions to a one-dimensional family of elementary scoring func-tions, in the sense that every GPL scoring function admits a representation as amixture of elementary elements.Theorem 2.3.3. (Ehm et al. 
[2016]) Any member of the class of GPL scoring110.0 0.5 1.0 1.5 2.00.00.20.40.60.8ν = 0.90y^1,y^2Scoring function 0−homogeneous0.5−homogeneous(a)0.0 0.5 1.0 1.5 2.00.00.20.40.60.81.0ν = 0.99y^1,y^2Scoring function0−homogeneous0.5−homogeneous(b)Figure 2.3: Homogeneous scoring functions S(yˆ1, yˆ2,y) in (2.10) and (2.11) for the pair (VARν , ESν ),with ν = 0.90 (a) and ν = 0.99 (b). The value of y is set at 1.functions (SQα ) admits a representation of the formS(yˆ,y) =∫ +∞−∞SQα,θ (yˆ,y) dH(θ) (yˆ,y ∈ R), (2.12)where H is a nonnegative measure andSQα,θ (yˆ,y) = (1(y < yˆ)−α) (1(θ < yˆ)−1(θ < y))=1−α, y≤ θ < yˆ,α, yˆ≤ θ < y,0, otherwise.(2.13)The mixing measure H is unique and satisfies dH(θ) = dG(θ) for θ ∈R, where Gis the non-decreasing function in the representation (2.5).For the mixture representations, plots of the average scores obtained based onSQα,θ (yˆ,y) are used in comparing competing forecasts. These plots are called Mur-12phy diagrams. Examples of Murphy diagrams are illustrated in Figure 2.4 andFigure 2.5. The Murphy diagrams display the plot of the scores for two compet-ing forecasting procedures obtained from the elementary scoring function in (2.13)against the parameter θ and the plot of the score differences against the parameterθ . When there is no difference in the performance of competing forecasting pro-cedures, the plot of the scores overlap and the score differences are close to zero(Figure 2.4). However, when the first listed forecasting procedure performs betterthan the second, there is a distinction in the plot of the scores over a range of θvalues and the score differences are negative (Figure 2.5).−1 0 1 2 3 40.0000.0050.0100.015Parameter θGARCH AR−GARCH(a)−1 0 1 2 3 4−0.0020.0000.001Parameter θ(b)Figure 2.4: Murphy diagrams for simulation study comparing forecasts for 0.99-quantile. Data generatingprocess is a GARCH (1, 1) model with skew-t innovations. The first forecasting procedure uses aGARCH (1, 1) model and the second forecasting procedure uses an AR (1)-GARCH (1, 1) model. (a)the plot of the scores from the elementary scoring function against the θ . (b) the plot of the scoresdifferences against θ . No difference in the performance of forecasting procedures.13−1 0 1 2 3 40.000.040.08Parameter θGARCH ARCH(a)−1 0 1 2 3 4−0.10−0.06−0.02Parameter θ(b)Figure 2.5: Murphy diagrams for simulation study comparing forecasts for 0.99-quantile. Data generat-ing process is a GARCH (1, 1) model with skew-t innovations. The first forecasting procedure usesa GARCH (1, 1) model with information on the conditional variance and the second forecasting pro-cedure uses an ARCH (1) model with no information on the conditional variance. (a) the plot of thescores from the elementary scoring function against the θ . (b) the plot of the scores differences againstθ . Forecasting procedure with GARCH model performs better than forecasting procedure with ARCHmodel.14Chapter 3Choice of scoring functionsThe choice of a consistent scoring function for evaluating VAR and pair (VAR, ES)forecasts is a challenging task due to issues that arise in forecasting. Holzmannand Eulert [2014] show that increasing the information sets, which is the amountof information forecasting procedures have access to in making forecasts, leadsto better point forecasts and smaller average scores when strictly consistent scor-ing functions are used in evaluating the forecasts provided the underlying model iscorrectly specified. 
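Relating back to the elementary scores S^Q_{α,θ} in (2.13), the hypothetical sketch below (assuming numpy; the forecasts are illustrative constants, not those used in the simulation studies) shows how the points of a Murphy diagram can be computed:

```python
import numpy as np

def elementary_score(y_hat, y, alpha, theta):
    """Elementary quantile score S^Q_{alpha,theta} from (2.13)."""
    return (((y < y_hat).astype(float) - alpha)
            * ((theta < y_hat).astype(float) - (theta < y).astype(float)))

def murphy_curve(y_hat, y, alpha, thetas):
    """Average elementary score at each theta; plotted against theta this gives a Murphy diagram."""
    return np.array([elementary_score(y_hat, y, alpha, th).mean() for th in thetas])

rng = np.random.default_rng(3)
y = rng.standard_t(df=5, size=5_000)
fc_A = np.full_like(y, np.quantile(y, 0.99))    # a reasonable static 0.99-quantile forecast
fc_B = np.full_like(y, np.quantile(y, 0.90))    # an under-predicting forecast
thetas = np.linspace(-1.0, 4.0, 101)
score_diff = murphy_curve(fc_A, y, 0.99, thetas) - murphy_curve(fc_B, y, 0.99, thetas)
# Predominantly non-positive differences across theta indicate that forecast A dominates forecast B.
```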
Patton examines the case of forecasting procedures using cor-rectly specified models and competing forecasting procedures with nested infor-mation sets compared to forecasting procedures with nonnested information setsor using misspecified models. He argues that the choice of consistent scoring func-tions for a given functional is of no concern when models are correctly specified,and if information sets are nested. However, when the models are misspecifiedor forecasting procedures’ information sets are nonnested, care should be taken inchoosing consistent scoring functions for evaluating forecasts. He gives the fol-lowing propositions for VAR forecasts.Proposition 1. (Patton) Assume that (i) The information sets of two forecastingprocedures are nested, soF Bt ⊆F At orF At ⊆F Bt , and (ii) Forecasts A and B areoptimal under some GPL scoring function. Then the ranking of these forecasts byexpected Lin-Lin scoring function is sufficient for their ranking by any GPL scoringfunction.15Proposition 2. Assume that (a) The information sets of two forecasting proceduresare nonnested, soF Bt *F At andF At *F Bt , for some t, but Forecasts A and B areoptimal under some GPL scoring function, or (b) one or both of the α-quantileforecasts are based on misspecified models. Then the ranking of these forecasts is,in general, sensitive to the choice of GPL scoring function.He presents a realistic set-up where he compares the ranking of forecasts bytwo forecasting procedures. Forecasting procedure A has knowledge of only theconditional mean of the time series process while forecasting procedure B hasknowledge of only the conditional variance of the process. In assessing the per-formance of quantile forecasts from the two forecasting procedures using a homo-geneous GPL scoring function, inconsistent results are obtained with forecastingprocedure B performing better than forecasting procedure A for quantiles close tothe lower tail while for quantiles between the lower tail and the center, forecastingprocedure A performs better than forecasting procedure B.Inconsistency in the results when ranking forecasts with different consistentscoring functions poses a possible problem in forecasting. Since the issue of pa-rameter estimation error, model misspecification and nonnested information setsare often encountered in practice, it is desirable to identify which of the consistentscoring functions are better at identifying forecasting procedures that give more ac-curate forecasts. That is, since two competing forecasting procedures can be com-pared based on the average score obtained from the scoring functions as shown in(1.1), we seek to identify consistent scoring functions that may lead to making theright decision on forecasting procedures. From Theorem 2.3.1 and Theorem 2.3.2,we realize that there is a large number of consistent scoring functions that can beused in assessing VAR and pair (VAR, ES) forecasts. The choice of a consistentscoring function depends on the choice of G in (2.5) and G1 and G2 in (2.6). Webegin by identifying some criteria for the selection of consistent scoring functionsto assess forecasts of the risk measures.A desirable property of scoring functions is the property of homogeneity. 
Pat-ton [2011] argues that for homogeneous scoring functions, the ranking of forecastsis invariant to re-scaling of data and this is very useful in economic and financialapplications where the choice of units of measurements is arbitrary (e.g., measur-16ing in US dollars versus Canadian dollars).A scoring function is positive homogeneous of order b ifS(cyˆ,cy) = cbS(yˆ,y) for all yˆ = (yˆ1, ..., yˆk), all y and for c > 0.We concentrate on the homogeneous scoring functions for VAR and pair (VAR,ES) presented in (2.8), (2.9), (2.10), and (2.11) and identify the scoring functionsthat yield more accurate ranking of forecasts based on the criteria presented infollowing paragraphs.The use of consistent scoring functions for a given functional is a minimal re-quirement in forecast evaluation, and hence one criterion for a desirable scoringfunction is that the expected value of the scoring function with respect to the dis-tribution of random variable Y (the observed value) should exist under the weakestconditions. In finance, the GARCH model is commonly used in modeling log-returns data and the stationary solutions of GARCH processes are known to haveheavy tails which may lead to restrictions on the choice of scoring functions asthe EFS(yˆ,Y ) may not exist. Following Sun and Zhou [2014], we examine the tailbehaviour of a GARCH (1, 1) model and what implications it has on the range ofvalues of b, the homogeneity order.Next, Patton mentions that the use of multiple scoring functions in evaluatingforecasts can lead to clouded results since a forecaster may be the best under onescoring function and the worst under the other. He states that consistent scoringfunctions have different sampling properties and a careful choice of the scoringfunction to use in forecast evaluation may result in improved efficiency. We assessthe finite-sample properties of the Diebold-Mariano (D-M) test which is used totest the significance of the score difference for two forecasting procedures. The D-M test makes use of score values obtained from scoring functions. We assess theperformance of the different homogeneous consistent scoring functions for VARand the pair (VAR, ES) when used for the D-M test.Lastly, the ability of scoring functions to penalize more for under-predictionthan over-prediction is a desirable property in financial applications. To enablefinancial institutions make informed decisions in risk management, risk measuresare estimated and the predicted amount is put aside to cater for any possible future17losses. Forecasts may over-predict or under-predict the possible loss and this mayhave some negative effect on the operations of financial institutions. The magnitudeof under-prediction could lead to institutions making big losses. We examine thehomogeneous scoring functions to identify the scoring functions that penalize morefor under-prediction than for over-prediction of VAR and ES. We expand on theabove listed criteria in the following chapters.18Chapter 4Scoring functions and tailbehaviour4.1 Expectation of homogeneous scoring functionThe choice of a consistent scoring function from the family of consistent scoringfunctions for VAR and pair (VAR, ES) respectively, depends on the choice of Gfor the consistent scoring functions for VAR and the choice of G1 and G2 for theconsistent scoring functions for the pair (VAR, ES). The EFG(Y ), for a chosen G,should exist and be finite for the expectation of the scoring function to exist andhence be consistent. 
The homogeneous scoring functions with homogeneity orderb in (2.8) is consistent for VARα if EF(|Y |b) exists and is finite.The use of GARCH processes to model returns is very common in finance.We seek to identify the conditions under which the expected value of a return Y,modeled as a GARCH process, will exist. This section looks at the tail behaviourof GARCH processes. In particular, we concentrate on the range of values for theparameters of a GARCH (1, 1) model for which the expectation of a homogeneousGPL scoring function exists.194.1.1 GARCH Processes in QRMWhen looking at VAR and ES in quantitative risk management, we concentrateon the tail behaviour of the loss distribution. Various models have been proposedwhen describing common features of financial returns. Amongst these are the mod-els of the GARCH and ARCH family. One popular model used is the GARCH (1,1) model. The stationary GARCH (1, 1) model is believed to capture various em-pirical observed properties of financial returns despite its simplicity [Mikosch andStarcia, 2000].The marginal distributions of GARCH models are known to have heavy tails.This may lead to restrictions on the choice of a consistent scoring function as theE[S(yˆt ,Yt)] may not exist. We next look at the tail behaviour of a GARCH (1, 1)model presented by [Sun and Zhou, 2014] and what implications it has on the rangeof values of b, the homogeneity order.Consider a GARCH (1, 1) model Yt ,Yt = σtεt ,σ2t = α0+α1Y2t−1+β1σ2t−1,(4.1)where {εt} are independent and identically distributed (i.i.d) innovations with zeromean and unit variance, and the parameters α0, α1 and β1 are non-negative.Suppose κ is the non-zero solution to the equation 1E[(α1ε2t−1+β1)κ ] = 1. (4.2)The stationary solution of {Yt} follows a heavy-tailed distribution with tail index2κ:P(|Yt |> x)∼Cx−2κ ,as x→ ∞. (4.3)From (4.3) it follows that E(|Yt |b)<∞ as long as b < 2κ . This leads to the follow-ing proposition.1the solution to (4.2) exists provided that α1 + β1 < 1 and E[ε2κ0t ] = +∞ for κ0 := sup{m :E[ε2mt ]<+∞} (see Davis and Mikosh, 2009).20Proposition 3. For a GARCH (1, 1) model in (4.1), the homogeneous scoringfunction with homogeneity order b in (2.8) is strictly consistent for VARα providedb < 2κ , where κ is the solution to (4.2).For the pair (VAR, ES), the expected value for the 0-homogeneous scoringfunction and the (1/2)-homogeneous scoring functions for the score differences isfinite and exists for given values of yˆ1 and yˆ2 and information set up to time t-1 aslong as κ > 12 .We present in Figure 4.1 the plot of tail index (2κ) values against selectedparameter values of α1 and β1 for the GARCH (1, 1) model to show the upperbound of the homogeneity order b for which the expected value of the scoringfunction will exist. We consider the case of the standard normal innovations, t-distributed and skewed t innovations for 3, 4, 5, 6, 7, and 8 degrees of freedom.With all tail indices falling above 2 for the different parameter value combinations,the different distributions of the innovations, and the different degrees of freedomfor the t-distribution and skewed t-distribution, it indicates that the homogeneousscoring function is consistent for VAR in most cases where the homogeneity orderranges from 0 to 2 and a GARCH (1, 1) model with α1 +β1 < 1 is used to modelfinancial returns. 
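The condition in (4.2) is easy to check numerically. The sketch below (an illustration assuming numpy and scipy; the parameter values match the GARCH(1,1) specification used in the later simulations, with standard normal innovations) solves for κ by Monte Carlo and root-finding:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)

def tail_kappa(alpha1, beta1, eps, kappa_max=50.0):
    """Solve E[(alpha1*eps^2 + beta1)^kappa] = 1 for kappa > 0, as in (4.2),
    replacing the expectation by a Monte Carlo average over the innovations eps."""
    a = alpha1 * eps ** 2 + beta1
    return brentq(lambda k: np.mean(a ** k) - 1.0, 1e-6, kappa_max)

eps = rng.standard_normal(1_000_000)   # t or skew-t draws standardized to unit variance work the same way
kappa = tail_kappa(0.10, 0.85, eps)
print(f"kappa = {kappa:.2f}, tail index 2*kappa = {2 * kappa:.2f}")
# By Proposition 3, the b-homogeneous GPL score then has a finite expectation only for b < 2*kappa.
```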
More care has to be taken in choosing homogeneous scoring functions with b > 2 since the moments might not exist.

[Figure 4.1: four panels plotting the tail index 2κ against β1 for standard normal innovations (α1 = 0.06 to 0.1) and against the degrees of freedom for t and skewed t innovations (α1 = 0.09, 0.1; γ = 5; β1 = 0.65 to 0.88).]

Figure 4.1: Upper bounds of the homogeneity order b for the expectation of the b-homogeneous scoring functions for VARα to exist.

Chapter 5

Finite-Sample Properties of the Diebold-Mariano test

In this chapter, we assess the finite-sample size and power properties of the Diebold-Mariano (D-M) test when the homogeneous consistent scoring functions are used in assessing the VAR and the pair (VAR, ES) forecasts from misspecified models, forecasting procedures with nonnested information sets, and models with parameter estimation errors.

For the simulation study, data is generated from an ARMA-GARCH process with different model specifications. We consider the three cases: parameter estimation error, wrongly specified models, and forecasting procedures with nonnested information sets in forecasting the risk measures. Table 5.1 and Table 5.2 summarize the models for the data generating process (DGP), the model used by the first forecaster in making predictions and the model used by the second forecaster in making predictions for the various cases when looking at the size and power properties of the D-M test for the homogeneous consistent scoring functions for VAR and the pair (VAR, ES).

Table 5.1: Models for parameter estimation error, model misspecification and nonnested information sets in the simulation study for the finite-sample size property of the D-M test. Models are used in making quantile and expected shortfall forecasts and evaluated using selected consistent scoring functions.

Parameter estimation error
  Case 1 (FPE & FHS): DGP GARCH(1,1), ε ~ Skew t(0,1,5,1.5); Procedure A: GARCH(1,1), ε ~ Skew t(0,1,ν,γ); Procedure B: AR(1)-GARCH(1,1), ε ~ Skew t(0,1,ν,γ)
  Case 2 (FPE): DGP GARCH(1,1), ε ~ t(0,1,ν = 5); Procedure A: GARCH(1,1), ε ~ t(0,1,ν); Procedure B: AR(1)-GARCH(1,1), ε ~ t(0,1,ν)
Model misspecification
  Case 1 (FPE): DGP GARCH(1,1), ε ~ Skew t(0,1,5,1.5); Procedure A: GARCH(1,1), ε ~ t(0,1,ν); Procedure B: AR(1)-GARCH(1,1), ε ~ t(0,1,ν)
  Case 2 (FPE): DGP GARCH(1,1), ε ~ Skew t(0,1,5,1.5); Procedure A: GARCH(1,1), ε ~ N(0,1); Procedure B: AR(1)-GARCH(1,1), ε ~ N(0,1)
Nonnested information sets
  (FPE): DGP GARCH(1,1), ε ~ Skew t(0,1,5,1.5); Procedure A: GARCH(1,1), ε ~ Skew t(0,1,ν,γ), known conditional variance; Procedure B: GARCH(1,1), ε ~ Skew t(0,1,ν,γ), unknown conditional variance

Table 5.2: Models for the three scenarios (parameter estimation error, model misspecification and nonnested information sets) in the simulation study for the finite-sample power property of the D-M test. Models are used in making quantile and expected shortfall forecasts and evaluated using selected consistent scoring functions.

Parameter estimation error
  (FPE): DGP AR(1)-GARCH(1,1), ε ~ Skew t(0,1,3,3.5); Procedure A: GARCH(1,1), ε ~ Skew t(0,1,ν,γ); Procedure B: AR(1)-GARCH(1,1), ε ~ N(0,1)
Model misspecification
  (FPE): DGP AR(1)-GARCH(1,1), ε ~ Skew t(0,1,4,3.5); Procedure A: GARCH(1,1), ε ~ t(0,1,ν); Procedure B: ARCH(1), ε ~ N(0,1)
Nonnested information sets
  (FPE): DGP GARCH(1,1), ε ~ Skew t(0,1,5,3); Procedure A: GARCH(1,1), ε ~ Skew t(0,1,ν,γ), known conditional variance; Procedure B: ARCH(1), ε ~ N(0,1), unknown conditional variance

5.1 Diebold-Mariano test

When predictions are made by competing forecasting procedures, differences are observed in these predictions, and based on the observed values of the underlying process we would like to assess which forecast is the most accurate. There is a need for formal tests to compare predictive accuracy. Diebold and Mariano [1995] proposed the Diebold-Mariano test, which is now widely used in the field of econometrics for comparing the predictive accuracy of competing forecasts. The D-M test makes use of (consistent) scoring functions in assessing the accuracy of forecasts. The scores for point forecasts from competing forecasting procedures are obtained from a scoring function, and the differences in the score values are used in the computation of the test statistic. This indicates that the ability of the scoring function to accurately distinguish between forecasting procedures is key to the outcome of the test. Patton and Sheppard [2009] assess the size and power properties of the D-M test for homogeneous scoring functions used in evaluating volatility proxies and identify the 0-homogeneous scoring function as a desirable scoring function for the evaluation of volatility forecasts.

The D-M test is for pairwise comparison of forecasts, with a null hypothesis of equal expected score value against a one-sided or a two-sided alternative hypothesis. The D-M test is defined under the following null hypothesis:

H_0 : E[S(\hat{R}^A_t, Y_t)] = E[S(\hat{R}^B_t, Y_t)] \quad \text{for all } t, \qquad (5.1)

where \hat{R}^A_t and \hat{R}^B_t are point forecasts from two competing forecasting procedures at time t, Y_t is the observed value at time t, and S : R^k \times R \to R is a consistent scoring function. For the computation of the test statistic, we first define the series of score differences {d_t : t = 1, 2, ..., T} as

d_t = S(\hat{R}^A_t, Y_t) - S(\hat{R}^B_t, Y_t), \qquad (5.2)

and the test statistic is given as

DM_T = \bar{d}_T / \sqrt{\widehat{avar}(\bar{d}_T)}, \qquad \bar{d}_T \equiv \frac{1}{T} \sum_{t=1}^{T} d_t, \qquad (5.3)

where \widehat{avar}(\bar{d}_T) is a consistent estimator of the asymptotic variance of the average difference. The asymptotic variance of the average difference can be computed using the Newey-West variance estimator with the number of lags set to T^{1/3} for h-step ahead forecasts where h > 1 [Patton and Sheppard, 2009]. For h = 1 it is possible to show that the score differential series is covariance stationary, and hence we can estimate the asymptotic variance of the average difference using the sample variance.
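For the h = 1 case just described, a minimal sketch of the statistic in (5.3), assuming numpy and scipy and two illustrative score series (this is not the exact implementation used for the simulation studies):

```python
import numpy as np
from scipy.stats import norm

def dm_test(scores_A, scores_B):
    """Diebold-Mariano test for one-step-ahead forecasts (h = 1): the asymptotic
    variance of the mean score difference is estimated by the sample variance of d_t."""
    d = np.asarray(scores_A) - np.asarray(scores_B)     # score differences d_t in (5.2)
    T = d.size
    dm_stat = d.mean() / np.sqrt(d.var(ddof=1) / T)     # statistic (5.3)
    p_value = 2 * norm.sf(abs(dm_stat))                 # two-sided p-value from the normal limit
    return dm_stat, p_value

# Illustrative call with two hypothetical score series of essentially equal quality:
rng = np.random.default_rng(5)
s_A = rng.exponential(1.0, size=1000)
s_B = s_A + rng.normal(0.0, 0.05, size=1000)
print(dm_test(s_A, s_B))
```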
Under the null hypothesis, the test statistic is asymptotically normallydistributed [Diebold and Mariano, 1995].255.2 Simulation-based result for size propertiesWe begin by assessing the size properties of the D-M test when the respectivehomogeneous consistent scoring functions for VAR and pair (VAR, ES) are usedin assessing forecasts under realistic scenarios.One-step ahead forecasts for the 0.90, 0.95 and 0.99 quantiles using a movingwindow of w = 1000 observations are obtained for four different out-of-samplesizes, T = {500,1000,1500,2000}. The D-M test is repeated 2000 times for eachT at a 5% level of significance. For the homogeneous GPL scoring functions forVAR, we consider b ∈ {0,0.1,0.5,1,1.5,2,3,5}.We test the hypothesesH0 : E[S(RˆAt ,Yt)] = E[S(RˆBt ,Yt)] ∀tH1 : E[S(RˆAt ,Yt)] 6= E[S(RˆBt ,Yt)] ∀t.The first scenario considered is the situation of parameter estimation error. For thisexample, we generate data {Yt}t∈Z from a GARCH (1, 1) model first with skewedt innovations and then with t-distributed innovations:Yt = σtεt ,σ2t = 0.05+0.10Y2t−1+0.85σ2t−1.Case 1: εtiid∼ Skew t(0,1,5,1.5)Case 2: εtiid∼ t(0,1,5)(5.4)The two forecasting procedures compared are based on correctly specified modelsthat are subject to estimation error. The fully parametric estimation (FPE) andfiltered historic simulation (FHS) procedures 2 are used in parameter estimations.forecasting procedure A uses a GARCH (1, 1) model while forecasting procedureB uses an AR (1)-GARCH (1, 1) model with zero mean:V̂ARα(Y At |Ft−1) = σ̂tV̂ARα(εt)V̂ARα(Y Bt |Ft−1) = µ̂t + σ̂tV̂ARα(εt)(5.5)2description of the estimation methods is given in the Appendix.26where V̂ARα(εt) = F−1ε (α), with Fε (.) denoting the cumulative distribution func-tion of the distribution of the innovations. For case 1, both forecasting proceduresspecify a skewed t distribution for the innovations while both forecasting proce-dures also specify a t-distribution with mean zero and variance one for the innova-tions in the second case.Table 5.3 shows the results of the size of the D-M test for the case where thefully parametric approach is used in the estimation of model parameters. The sizeof the test when using the homogeneous GPL scoring function with b less than 2is close to the theoretical value of 0.05 for the chosen α levels and out-of-samplesizes (T ). For higher values of b, e.g., b = 5, the size of the test is higher thanthe theoretical value. 
Table 5.4 shows the size of the test when the forecasting procedures use FHS to estimate parameters; once again, homogeneous scoring functions of lower homogeneity order perform better than those of higher homogeneity order.

         α = 0.90                          α = 0.95
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.059   0.054   0.063   0.058     0.046   0.060   0.053   0.057
b=0.1    0.059   0.057   0.063   0.059     0.047   0.061   0.054   0.056
b=0.5    0.061   0.055   0.059   0.058     0.050   0.065   0.055   0.059
b=1.0    0.061   0.056   0.059   0.058     0.056   0.064   0.057   0.062
b=1.5    0.063   0.066   0.063   0.057     0.062   0.072   0.059   0.067
b=2.0    0.063   0.073   0.071   0.063     0.075   0.083   0.074   0.080
b=3.0    0.073   0.086   0.083   0.078     0.097   0.125   0.106   0.116
b=5.0    0.077   0.104   0.092   0.096     0.118   0.150   0.142   0.150

         α = 0.99
         T=500   T=1000  T=1500  T=2000
b=0      0.076   0.072   0.076   0.076
b=0.1    0.078   0.072   0.074   0.075
b=0.5    0.079   0.075   0.078   0.076
b=1.0    0.080   0.079   0.080   0.079
b=1.5    0.093   0.090   0.091   0.093
b=2.0    0.113   0.115   0.111   0.110
b=3.0    0.159   0.183   0.177   0.174
b=5.0    0.259   0.312   0.308   0.303

Table 5.3: Parameter estimation error: (Case 1) Size values of the D-M test for the one-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

We now consider the case where the innovations follow a t-distribution with mean zero and variance one rather than a skewed t-distribution, using the same models in (5.5) for forecasting procedures A and B, respectively.

         α = 0.90                          α = 0.95
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.071   0.078   0.094   0.092     0.086   0.091   0.109   0.131
b=0.1    0.070   0.078   0.093   0.094     0.088   0.089   0.108   0.131
b=0.5    0.077   0.076   0.084   0.089     0.091   0.091   0.105   0.124
b=1.0    0.077   0.076   0.082   0.090     0.092   0.092   0.106   0.118
b=1.5    0.084   0.086   0.081   0.083     0.098   0.095   0.106   0.114
b=2.0    0.088   0.092   0.084   0.088     0.106   0.109   0.114   0.117
b=3.0    0.091   0.105   0.092   0.107     0.144   0.141   0.138   0.137
b=5.0    0.102   0.112   0.102   0.119     0.178   0.184   0.164   0.172

         α = 0.99
         T=500   T=1000  T=1500  T=2000
b=0      0.145   0.124   0.141   0.152
b=0.1    0.145   0.121   0.140   0.152
b=0.5    0.152   0.123   0.146   0.147
b=1.0    0.167   0.134   0.148   0.155
b=1.5    0.181   0.149   0.159   0.155
b=2.0    0.202   0.169   0.181   0.165
b=3.0    0.248   0.216   0.227   0.216
b=5.0    0.339   0.340   0.359   0.345

Table 5.4: Parameter estimation error: (Case 1) Size values of the D-M test for the one-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles at various out-of-sample sizes. Filtered historic simulation is used in the estimation of the model parameters in this case.

Estimation of parameters is done using the fully parametric approach. The results presented in Table 5.5 show that the size of the test is close to the theoretical value for the homogeneous GPL scoring function with b of 2 or less for the three α levels. For higher values of b (e.g., b = 3), the size is slightly lower than 0.05 for the 0.90 and 0.95 quantiles but is close to the theoretical value for the 0.99 quantile. Overall, the homogeneous GPL scoring functions with b less than 2 perform well in the cases considered under the parameter estimation error scenario.

Next, we examine the size of the D-M test when the two forecasting procedures are based on similar but misspecified models. We look at the ability of the scoring function to assess similar forecasts.
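The scores fed into the D-M test throughout this chapter come from the homogeneous GPL family. A minimal sketch of one standard parametrization of that family is given below; it is assumed here to coincide, up to irrelevant positive constants, with the family defined in Chapter 2, which remains the authoritative reference.

```python
import numpy as np

def gpl_score(r, y, b=1.0, alpha=0.95):
    """Degree-b positively homogeneous GPL score for the alpha-quantile (VaR_alpha).

    Uses g(x) = sign(x) |x|^b / b for b > 0; b = 1 gives the familiar piecewise-linear
    ("pinball") loss (1{y <= r} - alpha)(r - y). The b = 0 member is logarithmic and
    requires positive arguments, so it is omitted from this sketch.
    """
    r, y = np.asarray(r, dtype=float), np.asarray(y, dtype=float)
    g = lambda x: np.sign(x) * np.abs(x) ** b / b
    return ((y <= r).astype(float) - alpha) * (g(r) - g(y))
```

The per-period scores of two competing VAR forecast series computed this way can be passed directly to the D-M statistic sketched earlier.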
Just as in the first scenario, we generate data {Y_t}_{t∈Z} from a GARCH(1,1) model with skewed t innovations:

    Y_t = σ_t ε_t,
    σ_t^2 = 0.05 + 0.10 Y_{t−1}^2 + 0.85 σ_{t−1}^2,
    ε_t iid∼ Skew t(0, 1, 5, 1.5).                                                              (5.6)

         α = 0.90                          α = 0.95
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.044   0.055   0.062   0.060     0.051   0.061   0.060   0.061
b=0.1    0.045   0.056   0.063   0.061     0.048   0.055   0.061   0.064
b=0.5    0.046   0.059   0.060   0.061     0.048   0.057   0.060   0.061
b=1.0    0.047   0.056   0.056   0.061     0.049   0.059   0.058   0.058
b=1.5    0.049   0.056   0.055   0.060     0.044   0.055   0.058   0.056
b=2.0    0.049   0.054   0.054   0.055     0.041   0.055   0.056   0.049
b=3.0    0.046   0.047   0.049   0.052     0.037   0.050   0.049   0.044
b=5.0    0.040   0.040   0.038   0.036     0.038   0.040   0.044   0.038

         α = 0.99
         T=500   T=1000  T=1500  T=2000
b=0      0.051   0.060   0.051   0.062
b=0.1    0.050   0.059   0.051   0.062
b=0.5    0.049   0.053   0.048   0.062
b=1.0    0.046   0.052   0.047   0.061
b=1.5    0.043   0.049   0.047   0.055
b=2.0    0.042   0.047   0.042   0.053
b=3.0    0.044   0.046   0.046   0.052
b=5.0    0.054   0.046   0.052   0.055

Table 5.5: Parameter estimation error: (Case 2) Size values of the D-M test for the one-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

For the first case, the forecasting procedures specify a t-distribution with mean zero and variance one for the innovations of their models, while they both specify a standard normal distribution for the innovations in the second case. Forecasting procedure A uses a GARCH(1,1) model while forecasting procedure B uses an AR(1)-GARCH(1,1) model.

The results presented in Table 5.6 show that the size of the test is close to the theoretical value for lower values of b of the homogeneous scoring function. The size of the test is, however, higher for larger values of b (e.g., b = 5) as we get closer to the upper tail. Table 5.7 shows the results of the second example, where both forecasting procedures specify a standard normal distribution for the innovations of their models. The size of the test is generally close to the theoretical value for most values of b. Some higher size values are observed for the homogeneous scoring function when b > 2 and T increases. Overall, the homogeneous GPL scoring functions with b < 2 perform quite well in assessing the similar forecasts.

         α = 0.90                          α = 0.95
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.064   0.061   0.065   0.063     0.061   0.065   0.067   0.059
b=0.1    0.063   0.060   0.065   0.063     0.059   0.065   0.065   0.059
b=0.5    0.063   0.060   0.061   0.064     0.059   0.064   0.066   0.057
b=1.0    0.064   0.068   0.063   0.062     0.060   0.066   0.067   0.058
b=1.5    0.064   0.074   0.066   0.063     0.060   0.066   0.067   0.064
b=2.0    0.063   0.074   0.068   0.066     0.061   0.071   0.072   0.067
b=3.0    0.068   0.082   0.071   0.066     0.074   0.083   0.076   0.072
b=5.0    0.064   0.083   0.070   0.069     0.081   0.090   0.080   0.085

         α = 0.99
         T=500   T=1000  T=1500  T=2000
b=0      0.045   0.060   0.063   0.061
b=0.1    0.046   0.060   0.060   0.060
b=0.5    0.047   0.061   0.061   0.061
b=1.0    0.050   0.057   0.058   0.058
b=1.5    0.050   0.056   0.057   0.057
b=2.0    0.054   0.064   0.062   0.062
b=3.0    0.073   0.076   0.065   0.065
b=5.0    0.119   0.116   0.107   0.107

Table 5.6: Model misspecification: (Case 1) Size values of the D-M test for the one-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

The last scenario we consider for the size of the D-M test is when the forecasting procedures have nonnested information sets. Forecasting procedure A has information about the conditional variance of the returns while forecasting procedure
B has no information on the conditional variance of the process. Both forecasting procedures use a GARCH(1,1) model with skewed-t innovations. The data is generated from the same model used for the first case of the parameter estimation example. Table 5.8 shows that the size of the test is close to the theoretical value for lower values of b of the homogeneous GPL scoring function across the different out-of-sample sizes. The size values are higher for all homogeneous GPL scoring functions for the case where α = 0.99. The homogeneous GPL scoring function with b = 0 seems to perform well amongst all the scoring functions, with size values relatively close to the theoretical value of the test in most of the cases considered.

         α = 0.90                          α = 0.95
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.062   0.063   0.063   0.066     0.058   0.066   0.077   0.069
b=0.1    0.063   0.064   0.063   0.067     0.059   0.068   0.077   0.067
b=0.5    0.062   0.065   0.067   0.067     0.056   0.071   0.079   0.071
b=1.0    0.062   0.068   0.068   0.065     0.058   0.070   0.077   0.073
b=1.5    0.061   0.074   0.070   0.069     0.061   0.072   0.080   0.074
b=2.0    0.061   0.085   0.077   0.078     0.065   0.078   0.085   0.080
b=3.0    0.071   0.090   0.083   0.079     0.072   0.084   0.085   0.084
b=5.0    0.063   0.087   0.080   0.077     0.077   0.089   0.082   0.095

         α = 0.99
         T=500   T=1000  T=1500  T=2000
b=0      0.054   0.068   0.069   0.078
b=0.1    0.054   0.070   0.071   0.078
b=0.5    0.053   0.070   0.068   0.074
b=1.0    0.052   0.070   0.067   0.079
b=1.5    0.051   0.070   0.064   0.073
b=2.0    0.055   0.068   0.059   0.073
b=3.0    0.055   0.065   0.057   0.067
b=5.0    0.064   0.069   0.066   0.069

Table 5.7: Model misspecification: (Case 2) Size values of the D-M test for the one-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

We now assess the performance of consistent scoring functions used in evaluating forecasts of the pair (VAR, ES), using the same scenarios presented for VAR. A one-step ahead forecast is estimated for the ES at levels ν ∈ {0.754, 0.875, 0.975}, which should yield a similar magnitude of risk as VAR_0.90, VAR_0.95 and VAR_0.99, respectively [Nolde and Ziegel, 2016].

For the size of the D-M test, the (1/2)-homogeneous and 0-homogeneous scoring functions presented in (2.10) and (2.11), respectively, produce size values which are close to the theoretical value of 0.05 for the different values of T at the different levels of ν. Comparing the size values for the pair (VAR, ES) at levels ν ∈ {0.754, 0.875, 0.975} with the size values obtained for VAR at levels α ∈ {0.90, 0.95, 0.99}, the values obtained for the 0-homogeneous and (1/2)-homogeneous scoring functions used in assessing the pair (VAR, ES) forecasts are similar to those of the 0-homogeneous and (1/2)-homogeneous GPL scoring functions used in assessing the VAR forecasts. Overall, the size values for the D-M test in the evaluation of forecasts of the pair (VAR, ES) are close to the theoretical value of 0.05.
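The joint (VAR, ES) scores used here are the homogeneous consistent scoring functions (2.10) and (2.11) of Chapter 2, which are not reproduced in this chapter. Purely as a hedged illustration, the sketch below implements one well-known 0-homogeneous strictly consistent scoring function for the pair, written for the loss convention with positive ES; the parametrization in (2.11) may differ by additive constants or sign conventions, which do not affect forecast rankings or the D-M test.

```python
import numpy as np

def joint_var_es_score_0hom(var_f, es_f, y, nu=0.975):
    """A 0-homogeneous strictly consistent score for (VaR_nu, ES_nu) of a loss Y.

    Assumes es_f > 0 (upper-tail losses, ES_nu = E[Y | Y >= VaR_nu]). Score
    *differences* between two forecast series are invariant to rescaling the data,
    which is the property exploited when ranking forecasts and running the D-M test.
    """
    var_f, es_f, y = (np.asarray(x, dtype=float) for x in (var_f, es_f, y))
    exceed = (y >= var_f).astype(float)
    return exceed * (y - var_f) / ((1.0 - nu) * es_f) + var_f / es_f + np.log(es_f) - 1.0
```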
The results for the size of the D-M test for the pair (VAR_ν, ES_ν) under the different scenarios are presented in Tables 5.9, 5.10, 5.11, 5.12 and 5.13.

         α = 0.90                          α = 0.95
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.058   0.064   0.074   0.073     0.059   0.064   0.073   0.077
b=0.1    0.056   0.064   0.074   0.074     0.060   0.066   0.071   0.078
b=0.5    0.054   0.066   0.067   0.076     0.060   0.061   0.072   0.078
b=1.0    0.052   0.064   0.066   0.076     0.065   0.064   0.071   0.078
b=1.5    0.056   0.066   0.065   0.079     0.071   0.070   0.072   0.077
b=2.0    0.060   0.076   0.065   0.086     0.083   0.083   0.083   0.082
b=3.0    0.079   0.103   0.094   0.108     0.131   0.130   0.123   0.124
b=5.0    0.159   0.179   0.165   0.162     0.243   0.243   0.237   0.244

         α = 0.99
         T=500   T=1000  T=1500  T=2000
b=0      0.130   0.108   0.097   0.109
b=0.1    0.133   0.108   0.100   0.112
b=0.5    0.138   0.111   0.105   0.114
b=1.0    0.144   0.122   0.112   0.117
b=1.5    0.164   0.136   0.120   0.136
b=2.0    0.185   0.156   0.144   0.155
b=3.0    0.259   0.225   0.227   0.235
b=5.0    0.439   0.426   0.437   0.441

Table 5.8: Nonnested information sets: Size values of the D-M test for the one-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

         ν = 0.754                         ν = 0.875
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.061   0.059   0.070   0.063     0.054   0.060   0.054   0.052
b=0.5    0.059   0.064   0.071   0.067     0.059   0.057   0.061   0.056

         ν = 0.975
         T=500   T=1000  T=1500  T=2000
b=0      0.053   0.064   0.066   0.063
b=0.5    0.059   0.061   0.059   0.062

Table 5.9: Parameter estimation error: (Case 1) Size values of the D-M test for the one-step ahead forecast of (VAR, ES) for ν values 0.754, 0.875 and 0.975 at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

         ν = 0.754                         ν = 0.875
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.060   0.063   0.070   0.080     0.050   0.063   0.067   0.075
b=0.5    0.055   0.066   0.069   0.079     0.049   0.058   0.068   0.075

         ν = 0.975
         T=500   T=1000  T=1500  T=2000
b=0      0.042   0.048   0.058   0.058
b=0.5    0.041   0.046   0.057   0.056

Table 5.10: Parameter estimation error: (Case 2) Size values of the D-M test for the one-step ahead forecast of (VAR, ES) for ν values 0.754, 0.875 and 0.975 at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

         ν = 0.754                         ν = 0.875
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.056   0.067   0.067   0.064     0.059   0.061   0.064   0.061
b=0.5    0.056   0.069   0.068   0.065     0.059   0.063   0.070   0.060

         ν = 0.975
         T=500   T=1000  T=1500  T=2000
b=0      0.051   0.058   0.069   0.069
b=0.5    0.058   0.061   0.064   0.069

Table 5.11: Model misspecification: (Case 1) Size values of the D-M test for the one-step ahead forecast of (VAR, ES) for ν values 0.754, 0.875 and 0.975 at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

5.3 Simulation-based results for power properties

For the power of the D-M test, we look at the ability of the scoring function to distinguish between forecasts and to assign smaller score values to forecasts close to the realized value. We test the hypotheses

    H_0 : E[S(\hat{R}_t^A, Y_t)] = E[S(\hat{R}_t^B, Y_t)]   for all t
    H_1 : E[S(\hat{R}_t^A, Y_t)] < E[S(\hat{R}_t^B, Y_t)]   for all t.

We examine the three scenarios considered for the size of the D-M test. We begin with the parameter estimation error scenario for VAR forecasts, where we generate data from an AR(1)-GARCH(1,1) model with a zero mean
and a right-skewed t-distribution for the innovations:

    Y_t = μ_t + σ_t ε_t,   μ_t = 0.05 Y_{t−1},
    σ_t^2 = 0.05 + 0.10 Y_{t−1}^2 + 0.85 σ_{t−1}^2,
    ε_t iid∼ Skew t(0, 1, 3, 3.5).                                                              (5.7)

Forecasting procedure A uses a GARCH(1,1) model with innovations following a skewed-t distribution, while forecasting procedure B uses an AR(1)-GARCH(1,1) model with standard normal innovations. The results in Table 5.14 show that the homogeneous GPL scoring functions with lower b values are able to discern superior forecast performance at all quantile levels, recording higher power values. Figure 5.1 shows the pattern plot of the power values for all T ∈ {500, 1000, 1500, 2000} across the values of b. The plots for the 0.90, 0.95 and 0.99 quantiles show a general drop in the power of the test as the value of b increases, for all levels of T. The highest power is mostly obtained when b = 0.

         ν = 0.754                         ν = 0.875
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.058   0.068   0.075   0.068     0.060   0.063   0.071   0.065
b=0.5    0.061   0.077   0.073   0.078     0.064   0.068   0.068   0.069

         ν = 0.975
         T=500   T=1000  T=1500  T=2000
b=0      0.056   0.070   0.080   0.076
b=0.5    0.058   0.072   0.071   0.075

Table 5.12: Model misspecification: (Case 2) Size values of the D-M test for the one-step ahead forecast of (VAR, ES) for ν values 0.754, 0.875 and 0.975 at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

         ν = 0.754                         ν = 0.875
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.063   0.062   0.073   0.074     0.066   0.070   0.082   0.081
b=0.5    0.056   0.054   0.073   0.073     0.057   0.066   0.075   0.075

         ν = 0.975
         T=500   T=1000  T=1500  T=2000
b=0      0.108   0.099   0.105   0.121
b=0.5    0.092   0.090   0.097   0.108

Table 5.13: Nonnested information sets: Size values of the D-M test for the one-step ahead forecast of (VAR, ES) for ν values 0.754, 0.875 and 0.975 at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

For the second scenario, which looks at model misspecification, data is generated from an AR(1)-GARCH(1,1) model with a zero mean and a right-skewed t-distribution for the innovations:

    Y_t = μ_t + σ_t ε_t,   μ_t = 0.05 Y_{t−1},
    σ_t^2 = 0.05 + 0.10 Y_{t−1}^2 + 0.85 σ_{t−1}^2,
    ε_t iid∼ Skew t(0, 1, 4, 3.5).                                                              (5.8)

Forecasting procedure A uses a GARCH(1,1) model but wrongly specifies the distribution of the innovations by choosing a t-distribution. Forecasting procedure B, on the other hand, wrongly specifies both the model and the innovations, using an ARCH(1) model with standard normal innovations. In this case, we expect forecasting procedure A to perform better than forecasting procedure B. Table 5.15 shows high power values for lower b of the homogeneous GPL scoring function, with the lowest power obtained at b = 5. The higher power values at lower values of b indicate the ability of these homogeneous GPL scoring functions to distinguish between forecasting procedures.
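The power (and size) entries reported in the tables of this chapter are rejection frequencies of the D-M test across simulation replications. The schematic loop below shows how such an entry can be estimated. It reuses the helper functions sketched earlier in this chapter (simulate_garch11, var_forecast, gpl_score, dm_statistic) and substitutes a crude rolling-quantile forecaster for procedure B, so all names and modelling choices are illustrative rather than the exact design used in the thesis; for a size experiment, both procedures would instead be (asymptotically) correctly specified.

```python
import numpy as np
from scipy import stats

def rejection_rate(n_rep=2000, T=500, w=1000, alpha_level=0.95, b=1.0,
                   one_sided=True, level=0.05, seed=1):
    """Schematic Monte Carlo estimate of the D-M rejection frequency."""
    rng = np.random.default_rng(seed)
    crit = stats.norm.ppf(level) if one_sided else stats.norm.ppf(level / 2)
    rejections = 0
    for _ in range(n_rep):
        y, sigma2 = simulate_garch11(w + T, seed=int(rng.integers(1 << 31)))
        # procedure A: model-based VaR with the true variance dynamics;
        # procedure B: a rolling empirical quantile as a simple stand-in
        var_a = np.array([var_forecast(y[t - 1], sigma2[t - 1], alpha_level)
                          for t in range(w, w + T)])
        var_b = np.array([np.quantile(y[t - w:t], alpha_level)
                          for t in range(w, w + T)])
        y_out = y[w:w + T]
        stat = dm_statistic(gpl_score(var_a, y_out, b, alpha_level),
                            gpl_score(var_b, y_out, b, alpha_level))
        if (one_sided and stat < crit) or (not one_sided and abs(stat) > abs(crit)):
            rejections += 1
    return rejections / n_rep
```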
         α = 0.90                          α = 0.95
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.389   0.513   0.634   0.701     0.210   0.276   0.347   0.409
b=0.1    0.368   0.511   0.635   0.698     0.189   0.264   0.339   0.397
b=0.5    0.379   0.522   0.638   0.694     0.187   0.255   0.336   0.385
b=1.0    0.389   0.521   0.646   0.668     0.188   0.242   0.322   0.363
b=1.5    0.386   0.499   0.579   0.625     0.192   0.242   0.292   0.331
b=2.0    0.403   0.478   0.542   0.576     0.203   0.227   0.263   0.297
b=3.0    0.394   0.436   0.467   0.471     0.206   0.221   0.222   0.243
b=5.0    0.370   0.350   0.344   0.340     0.202   0.190   0.205   0.206

         α = 0.99
         T=500   T=1000  T=1500  T=2000
b=0      0.477   0.715   0.836   0.888
b=0.1    0.459   0.710   0.830   0.884
b=0.5    0.423   0.681   0.801   0.863
b=1.0    0.360   0.612   0.742   0.813
b=1.5    0.269   0.490   0.628   0.714
b=2.0    0.171   0.336   0.461   0.547
b=3.0    0.046   0.109   0.139   0.171
b=5.0    0.009   0.016   0.012   0.018

Table 5.14: Parameter estimation error: Power values of the D-M test for the one-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

Figure 5.2 shows the pattern plot of the power values for all T ∈ {500, 1000, 1500, 2000} across the values of b. The plots for the 0.90 and 0.95 quantiles show a peak around b = 1.5, with a general drop in the power of the test as the value of b increases.

For the third scenario, we look at the case of forecasting procedures with nonnested information sets, with data generated from a GARCH(1,1) model with innovations from a right-skewed t-distribution with mean zero and unit variance:

    Y_t = μ_t + σ_t ε_t,   μ_t = 0.05 Y_{t−1},
    σ_t^2 = 0.05 + 0.10 Y_{t−1}^2 + 0.85 σ_{t−1}^2,
    ε_t iid∼ Skew t(0, 1, 5, 3).                                                                (5.9)

Forecasting procedure A uses a GARCH(1,1) model with skewed t-distributed innovations while forecasting procedure B uses an ARCH(1) model with standard normal innovations. Forecasting procedure A has knowledge of the conditional variance while forecasting procedure B does not. The results presented in Table 5.16 show high power values for lower values of b of the homogeneous GPL scoring function, with the power value decreasing as b increases. From Figure 5.3, we see a general drop in the power values for all T ∈ {500, 1000, 1500, 2000} as the value of b increases. The highest power values of the D-M test are obtained when the homogeneous GPL scoring function with b = 0 is used in assessing the forecasting procedures.

[Figure 5.1 here: three panels (α = 0.90, 0.95, 0.99) of power plotted against homogeneity order b, with curves for T = 500, 1000, 1500, 2000.]
Figure 5.1: Plot of power of the D-M test against homogeneity order b of the homogeneous GPL for the different out-of-sample sizes.

For the power of the D-M test for the pair (VAR, ES) forecasts, the 0-homogeneous and (1/2)-homogeneous scoring functions perform well under the three cases presented. The power values increase as T gets larger for the different ν levels.
         α = 0.90                          α = 0.95
         T=500   T=1000  T=1500  T=2000    T=500   T=1000  T=1500  T=2000
b=0      0.394   0.520   0.604   0.652     0.287   0.327   0.351   0.358
b=0.1    0.385   0.520   0.612   0.659     0.261   0.293   0.330   0.341
b=0.5    0.408   0.546   0.636   0.695     0.273   0.318   0.363   0.383
b=1.0    0.430   0.560   0.654   0.710     0.283   0.351   0.404   0.437
b=1.5    0.441   0.543   0.648   0.700     0.305   0.374   0.449   0.485
b=2.0    0.460   0.533   0.620   0.658     0.332   0.389   0.461   0.503
b=3.0    0.439   0.486   0.535   0.562     0.349   0.398   0.463   0.473
b=5.0    0.326   0.346   0.361   0.360     0.295   0.328   0.348   0.346

         α = 0.99
         T=500   T=1000  T=1500  T=2000
b=0      0.494   0.613   0.708   0.773
b=0.1    0.476   0.604   0.698   0.763
b=0.5    0.468   0.604   0.706   0.775
b=1.0    0.460   0.598   0.706   0.775
b=1.5    0.443   0.589   0.706   0.767
b=2.0    0.427   0.574   0.689   0.742
b=3.0    0.399   0.523   0.622   0.688
b=5.0    0.297   0.360   0.417   0.431

Table 5.15: Model misspecification: Power values of the D-M test for the one-step ahead forecast of the 0.90, 0.95 and 0.99 quantiles at various out-of-sample sizes. The fully parametric approach is used in the estimation of the model parameters in this case.

Comparing the power of (VAR, ES) at levels ν ∈ {0.754, 0.875, 0.975} to the power obtained for VAR at levels α ∈ {0.90, 0.95, 0.99}, it is seen that the power values of the D-M test for the pair (VAR, ES) forecasts are higher than those for VAR in most cases. Results are presented in Tables 5.17, 5.18 and 5.19. We also present plots of the power of the test against the out-of-sample size for the two homogeneous scoring functions considered (Figures 5.4-5.6). The plots indicate that the power of the test increases as T increases. The (1/2)-homogeneous scoring function has higher power values for ν values closer to the centre of the distribution; however, the 0-homogeneous scoring function produces higher power values as we get closer to the upper tail.

From the simulation results, we observe that the homogeneous consistent scoring functions with lower homogeneity order have desirable size and power properties for the D-M test. In particular, the 0-homogeneous scoring functions for evaluating the VAR forecasts and the pair (VAR, ES) forecasts, respectively, have better finite-sample size and power properties in the presence of model uncertainty, parameter estimation error and nonnested information sets. It is also observed that
It is also observed thatthe when looking at the extreme tails of the distribution, the size values tend toincrease while the performance of the scoring function for the power of the testswitches for the pair (VAR, ES) forecasts.370 1 2 3 4 50.00.20.40.60.8α=0.90bpowerT=500 T=1000 T=1500 T=20000 1 2 3 4 50.00.20.40.60.8α=0.95bpowerT=500 T=1000 T=1500 T=20000 1 2 3 4 50.00.20.40.60.8α=0.99bpowerT=500 T=1000 T=1500 T=2000Figure 5.2: Plot of power of the D-M test against homogeneity order b of the homogeneous GPL for thedifferent out-of-sample sizes.38α = 0.90 α = 0.95T= 500 T=1000 T=1500 T=2000 T= 500 T=1000 T=1500 T=2000b=0 0.402 0.570 0.702 0.779 0.649 0.871 0.938 0.970b=0.1 0.395 0.566 0.701 0.778 0.638 0.869 0.936 0.968b=0.5 0.395 0.555 0.694 0.771 0.619 0.850 0.927 0.963b=1.0 0.381 0.526 0.657 0.742 0.578 0.812 0.904 0.952b=1.5 0.371 0.488 0.593 0.682 0.511 0.753 0.860 0.927b=2.0 0.360 0.441 0.526 0.600 0.422 0.656 0.784 0.862b=3.0 0.333 0.369 0.415 0.462 0.240 0.401 0.504 0.576b=5.0 0.263 0.266 0.274 0.286 0.094 0.117 0.115 0.120α = 0.99T= 500 T=1000 T=1500 T=2000b=0 0.999 0.999 0.999 0.999b=0.1 0.996 0.999 0.999 0.999b=0.5 0.993 0.999 0.999 0.999b=1.0 0.988 0.999 0.999 0.999b=1.5 0.979 0.998 0.999 0.999b=2.0 0.941 0.994 0.999 0.999b=3.0 0.672 0.873 0.923 0.937b=5.0 0.002 0.007 0.014 0.017Table 5.16: Nonnested information sets: Power values of the D-M test for the one-step ahead forecast ofthe 0.90, 0.95 and 0.99 quantiles at various out-of-sample sizes. The fully parametric approach is usedin the estimation of the model parameters in this case.ν = 0.754 ν = 0.875T= 500 T=1000 T=1500 T=2000 T= 500 T=1000 T=1500 T=2000b=0 0.890 0.989 0.995 0.998 0.682 0.909 0.972 0.994b=0.5 0.921 0.994 0.995 0.998 0.726 0.912 0.978 0.995ν = 0.975T= 500 T=1000 T=1500 T=2000b=0 0.411 0.676 0.802 0.872b=0.5 0.356 0.604 0.741 0.815Table 5.17: Parameter estimation error: Power values of the D-M test for the one-step ahead forecast of(VAR, ES) for ν values 0.754, 0.875 and 0.975 at various out-of-sample sizes. The fully parametricapproach is used in the estimation of the model parameters in this case.ν = 0.754 ν = 0.875T= 500 T=1000 T=1500 T=2000 T= 500 T=1000 T=1500 T=2000b=0 0.529 0.725 0.823 0.892 0.378 0.491 0.587 0.656b=0.5 0.697 0.874 0.942 0.976 0.455 0.609 0.723 0.790ν = 0.975T= 500 T=1000 T=1500 T=2000b=0 0.413 0.531 0.630 0.689b=0.5 0.375 0.487 0.581 0.635Table 5.18: Model misspecification: Power values of the D-M test for the one-step ahead forecast of(VAR, ES) for ν values 0.754, 0.875 and 0.975 at various out-of-sample sizes. The fully parametricapproach is used in the estimation of the model parameters in this case.390 1 2 3 4 50.20.40.60.8α=0.90bpowerT=500 T=1000 T=1500 T=20000 1 2 3 4 50.20.40.60.81.0α=0.95bpowerT=500 T=1000 T=1500 T=20000 1 2 3 4 50.00.20.40.60.81.0α=0.99bpowerT=500 T=1000 T=1500 T=2000Figure 5.3: Plot of power of the D-M test against homogeneity order b of the homogeneous GPL for thedifferent out-of-sample sizes.ν = 0.754 ν = 0.875T= 500 T=1000 T=1500 T=2000 T= 500 T=1000 T=1500 T=2000b=0 0.708 0.922 0.983 0.998 0.560 0.819 0.925 0.975b=0.5 0.751 0.943 0.992 0.998 0.537 0.774 0.903 0.952ν = 0.975T= 500 T=1000 T=1500 T=2000b=0 0.541 0.777 0.900 0.946b=0.5 0.489 0.728 0.854 0.924Table 5.19: Nonnested information sets: Power values of the D-M test for the one-step ahead forecast of(VAR, ES) for ν values 0.754, 0.875 and 0.975 at various out-of-sample sizes. 
The fully parametric approach is used in the estimation of the model parameters in this case.

[Figure 5.4 here: three panels (ν = 0.754, 0.875, 0.975) of power plotted against out-of-sample size, with curves for b = 0 and b = 0.5.]
Figure 5.4: Parameter estimation error: Plot of power of the D-M test against out-of-sample size (T) for the homogeneous scoring functions for the pair (VAR_ν, ES_ν).

[Figure 5.5 here: three panels (ν = 0.754, 0.875, 0.975) of power plotted against out-of-sample size, with curves for b = 0 and b = 0.5.]
Figure 5.5: Model misspecification: Plot of power of the D-M test against out-of-sample size (T) for the homogeneous scoring functions for the pair (VAR_ν, ES_ν).

[Figure 5.6 here: three panels (ν = 0.754, 0.875, 0.975) of power plotted against out-of-sample size, with curves for b = 0 and b = 0.5.]
Figure 5.6: Nonnested information sets: Plot of power of the D-M test against out-of-sample size (T) for the homogeneous scoring functions for the pair (VAR_ν, ES_ν).

Chapter 6
Over-prediction and Under-prediction in finance

Financial institutions usually set aside an amount of money to deal with worst-case scenarios in the financial markets. VAR and ES are often used as the underlying risk measures to calculate the amount of capital to set aside to cover potential losses. Since institutions lose interest on the capital stored, they want to make accurate predictions of the VAR and ES. If financial institutions over-predict the VAR or ES, they end up storing more capital than needed and hence lose potential interest income that they could have earned on the extra amount. However, under-prediction of the risk measures may leave banks without enough capital to absorb large losses, making them crisis-prone.

In this chapter, we look at how the homogeneous scoring functions penalize over-prediction and under-prediction. Since firms only have to deal with the loss of profit that would have been made on the extra money stored for an over-predicted VAR or ES, it is desirable for a scoring function to penalize more for under-prediction than for over-prediction.

6.1 Simulation Study

We begin by illustrating how the homogeneous scoring functions penalize over-prediction and under-prediction of VAR_α of the same magnitude. We assume our returns follow a standard normal distribution and select the optimal forecast (R_opt) as the 0.95 quantile of the underlying distribution. The 0.99 and 0.8325 quantiles are set as the over-predicted and under-predicted VAR_α forecasts, respectively. The over-predicted forecast (\hat{R}_ov) and under-predicted forecast (\hat{R}_ud) are chosen such that

    \hat{R}_ov − R_opt = R_opt − \hat{R}_ud.

We compute the expected score E_F[S(\hat{R}, Y)] for chosen values of the homogeneity order b of our scoring function and rank the expected scores for the optimal, over-predicted and under-predicted VAR_α forecasts. Table 6.1 below shows the ranks for different values of the homogeneity order. The optimal forecast has the lowest expected score and is hence ranked as the best forecast for all values of b. The over-predicted VAR_α forecast has the second lowest expected score for the homogeneous scoring functions with lower homogeneity order. The ranking of the over-predicted and under-predicted forecasts, however, changes when b = 3 or higher.
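These expected scores can be obtained by numerical integration. The sketch below does so for the degree-b GPL parametrization assumed earlier (shown for b > 0 only; the b = 0 member and the exact family of Chapter 2 may differ), with the under-predicted forecast defined by the symmetry condition above, which corresponds approximately to the 0.8325 quantile.

```python
import numpy as np
from scipy import stats, integrate

ALPHA = 0.95

def gpl_score(r, y, b=1.0, alpha=ALPHA):
    # degree-b positively homogeneous GPL score for the alpha-quantile (b > 0)
    g = lambda x: np.sign(x) * np.abs(x) ** b / b
    return ((y <= r) - alpha) * (g(r) - g(y))

def expected_score(r, b):
    # E_F[S_b(r, Y)] for Y ~ N(0,1); split the integral at the kink y = r
    f = lambda y: gpl_score(r, y, b) * stats.norm.pdf(y)
    lower, _ = integrate.quad(f, -np.inf, r)
    upper, _ = integrate.quad(f, r, np.inf)
    return lower + upper

r_opt = stats.norm.ppf(0.95)     # optimal forecast: the true 0.95 quantile
r_ov = stats.norm.ppf(0.99)      # over-predicted forecast
r_ud = 2 * r_opt - r_ov          # under-predicted forecast, the same distance below r_opt

for b in (0.5, 1.0, 2.0, 3.0, 5.0):
    es = {lab: expected_score(r, b) for lab, r in
          [("optimal", r_opt), ("over", r_ov), ("under", r_ud)]}
    print(b, es)  # by consistency, the optimal forecast attains the smallest expected score
```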
[Figure 6.1 here: simulated standard normal return series with horizontal lines marking the optimal, over-predicted and under-predicted VAR forecasts.]
Figure 6.1: Plot depicting the optimal forecast value, over-predicted and under-predicted VAR forecasts for a case where returns are assumed to follow a standard normal distribution.

         E(S_op) [Rank]   E(S_ov) [Rank]   E(S_ud) [Rank]
b=0      0.0372 [1]       0.0434 [2]       0.0668 [3]
b=0.1    0.5368 [1]       0.5455 [2]       0.5670 [3]
b=0.5    0.1431 [1]       0.1546 [2]       0.1750 [3]
b=1.0    0.1031 [1]       0.1197 [2]       0.1375 [3]
b=1.5    0.0999 [1]       0.1238 [2]       0.1370 [3]
b=2.0    0.1098 [1]       0.1442 [2]       0.1501 [3]
b=3.0    0.1618 [1]       0.2337 [3]       0.2098 [2]
b=5.0    0.5392 [1]       0.8592 [3]       0.6112 [2]

Table 6.1: Expected scores and corresponding ranks for selected values of the homogeneity order b used in the scoring function for VAR_0.95 forecasts. S_op, S_ov and S_ud indicate the scores when the optimal, over-predicted and under-predicted forecasts, respectively, are used.

We further explore how the homogeneous scoring functions penalize over-prediction and under-prediction as the magnitude of the difference (d) between the optimal forecast and the over-/under-predicted forecasts increases. We assess the behaviour of the scoring functions using data generated from a skewed-normal distribution with skewness parameter 3 and from a skewed t-distribution with 5 degrees of freedom and skewness parameter 3. The expectation of the homogeneous scoring function under the given distribution of the random variable Y is computed at various magnitudes of the difference.

Figure 6.2 and Figure 6.3 display plots of the expected score against the magnitude of the difference. For the two cases studied, it is seen that for lower homogeneity orders the expected score of the under-predicted value is greater than that of the over-predicted value. However, for higher values of b (e.g., b = 3), over-prediction is penalized more than under-prediction as the magnitude of the difference (d) increases.

For the pair (VAR, ES) forecasts, we look at the 0-homogeneous and the (1/2)-homogeneous scoring functions and assess how they penalize over-prediction and under-prediction of the same magnitude. The homogeneous scoring functions for the pair (VAR, ES) assign negative score values, and hence the assessment of how they penalize under-prediction and over-prediction is done by measuring the distance from the expected score at the optimal forecast to the expected score of the under-predicted and over-predicted values at a given d. Figure 6.4 illustrates how the distance from the expected score at the optimal forecast to the expected score of the under-predicted value is higher than that of the over-predicted value as d increases.

[Figure 6.2 here: four panels (skewed-normal data; b = 0.5, 1, 3, 4) of expected score against the magnitude of the difference, with the optimal forecast, under-prediction and over-prediction marked.]
Figure 6.2: Plot of the expectation of the homogeneous scoring function for VAR_0.95 forecasts against the magnitude of the difference for over-prediction and under-prediction.
Top panel shows the case of lower homogeneity orders b, where under-prediction is penalized more than over-prediction. The lower panel shows the case of higher values of b, where over-prediction is penalized more than under-prediction. Data is generated from a skewed-normal distribution with mean zero, unit variance and skewness parameter of 3.

[Figure 6.3 here: four panels (skewed-t data; b = 0.5, 1, 3, 4) of expected score against the magnitude of the difference, with the optimal forecast, under-prediction and over-prediction marked.]
Figure 6.3: Plot of the expectation of the homogeneous scoring function for VAR_0.95 forecasts against the magnitude of the difference for over-prediction and under-prediction. Top panel shows the case of lower homogeneity orders b, where under-prediction is penalized more than over-prediction. The lower panel shows the case of higher values of b, where over-prediction is penalized more than under-prediction. Data is generated from a skewed-t distribution with mean zero, unit variance, 5 degrees of freedom and skewness parameter of 3.

[Figure 6.4 here: four panels (b = 0 and b = 0.5 under skewed-t and skewed-normal data) of expected joint (VAR_0.95, ES_0.95) score against the magnitude of the difference.]
Figure 6.4: Plot of the expectation of the homogeneous scoring function for (VAR_0.95, ES_0.95) forecasts against the magnitude of the difference for over-prediction and under-prediction. Top panel shows the case where data is generated from a skewed-t distribution with mean zero, unit variance, 5 degrees of freedom and skewness parameter of 3. The lower panel shows the case where data is generated from a skewed-normal distribution with mean zero, unit variance and skewness parameter of 3.

Chapter 7
Conclusion

This thesis investigates criteria for choosing desirable scoring functions for evaluating forecasting procedures for Value-at-Risk and Expected Shortfall. Gneiting [2011a] argues that consistent scoring functions lead to reasonable rankings of forecasts. Patton illustrates that in the face of model misspecification, parameter estimation error and nonnested information sets, the ranking of forecasts may change depending on the consistent scoring function used, and hence care should be taken in selecting a consistent scoring function for the evaluation of forecasts. We identify homogeneity of scoring functions as the first criterion, since the ranking of forecasts is invariant under a change of units when homogeneous scoring functions are used. We concentrate on the family of homogeneous scoring functions for evaluating VAR and pair (VAR, ES) forecasts. We show that for a heavy-tailed GARCH(1,1) model with tail index 2κ, the expectation of the homogeneous scoring functions exists as long as the homogeneity order (b) is less than the tail index.
We assess the finite-sample properties of the Diebold-Mariano test [Diebold and Mariano, 1995], which is used to test the significance of the difference between competing forecasts. With the aid of simulations, we show that homogeneous scoring functions with lower homogeneity order have better size and power properties for the D-M test and hence should be considered in evaluating forecasts of VAR and the pair (VAR, ES). Lastly, we show that the homogeneous scoring functions with lower homogeneity order penalize more for under-prediction than for over-prediction of the same magnitude, which is a desirable property of a scoring function for financial institutions when predicting risk measures.

Some future work can be done building on the criteria we have established for choosing scoring functions. Firstly, further research can be done to study the behaviour of the homogeneous scoring functions as we examine the extreme upper tails of a given distribution, to help understand why the size and power values for the extreme upper tails differ from the less extreme cases. Secondly, research can be done on other criteria to consider in choosing a consistent scoring function for evaluating the VAR and the pair (VAR, ES) forecasts. Furthermore, research can be done on criteria for choosing among non-homogeneous consistent scoring functions for evaluating VAR and the pair (VAR, ES). Lastly, this work concentrates on criteria for selecting scoring functions for evaluating forecasts of quantiles and expected shortfall in risk management. This can be extended to identify criteria for the selection of consistent scoring functions for the evaluation of forecasts of functionals such as the mean and expectiles, in different applications and fields where forecasts of these functionals are issued and evaluated.

Bibliography

C. Acerbi and B. Szekely. Backtesting expected shortfall. Risk Magazine, 2014.

C. Acerbi and D. Tasche. On the coherence of expected shortfall. Journal of Banking and Finance, 26(7):1487–1503, 2002.

P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Math. Finance, 9:203–228, 1999.

Bank for International Settlements. Consultative Document: Fundamental review of the trading book: A revised market risk framework. 2013.

F. X. Diebold and R. S. Mariano. Comparing predictive accuracy. J. Bus. Econ. Stat., 13:253–263, 1995.

F. X. Diebold, T. Schuermann, and J. D. Stroughair. Pitfalls and opportunities in the use of extreme value theory in risk management. Journal of Risk Finance, 1:30–35, 2000.

W. Ehm, T. Gneiting, A. Jordan, and F. Krüger. Of quantiles and expectiles: Consistent scoring functions, Choquet representations, and forecast rankings. J. Roy. Statist. Soc. Ser. B, 2016. To appear.

T. Fissler and J. F. Ziegel. Higher order elicitability and Osband's principle. Ann. Statist., 2016. To appear.

T. Fissler, J. F. Ziegel, and T. Gneiting. Expected shortfall is jointly elicitable with value at risk – implications for backtesting. Risk Magazine, December 2015.

T. Gneiting. Making and evaluating point forecasts. J. Am. Statist. Assoc., 106:746–762, 2011a.

T. Gneiting. Quantiles as optimal point forecasts. Journal of Banking and Finance, 27:197–207, 2011b.

H. Holzmann and M. Eulert. The role of the information set for forecasting – with applications to risk management. Ann. Stat., 8:595–621, 2014.

K. Kuester, S. Mittnik, and M. S. Paolella.
Value-at-risk prediction: a comparison of alternative strategies. J. Financial Econometrics, 4:53–89, 2006.

S. Kusuoka. On law invariant coherent risk measures. Advances in Mathematical Economics, 3:83–95, 2001.

N. Lambert, D. M. Pennock, and Y. Shoham. Eliciting properties of probability distributions. In Proceedings of the 9th ACM Conference on Electronic Commerce, pages 129–138, Chicago, IL, USA, 2008. Extended abstract.

A. J. McNeil and R. Frey. Estimation of tail-related risk measures for heteroscedastic financial time series: an extreme value approach. J. Empir. Financ., 7:271–300, 2000.

A. J. McNeil, R. Frey, and P. Embrechts. Quantitative Risk Management: Concepts, Techniques, Tools. Princeton University Press, Princeton, 2005.

T. Mikosch and C. Stărică. Limit theory for the sample autocorrelations and extremes of a GARCH(1,1) process. Ann. Stat., pages 1427–1451, 2000.

N. Nolde and J. F. Ziegel. Elicitability and backtesting. Ann. Appl. Statist., 2016. To appear.

A. J. Patton. Evaluating and comparing possibly misspecified forecasts, https://public.econ.duke.edu/~ap172/Patton bregman comparison 27mar15.pdf. Accessed: 2017-04-20.

A. J. Patton. Volatility forecast comparison using imperfect volatility proxies. J. Econometrics, 160:246–256, 2011.

A. J. Patton and K. Sheppard. Evaluating volatility and correlation forecasts. In T. Mikosch, J.-P. Kreiss, R. A. Davis, and T. G. Andersen, editors, Handbook of Financial Time Series, pages 801–838. Springer, Berlin, 2009.

M. Saerens. Building cost functions minimizing to some summary statistics. IEEE Trans. Neural Netw., 11:1263–1271, 2000.

P. Sun and C. Zhou. Diagnosing the distribution of GARCH innovations. J. Empir. Financ., 29:287–303, 2014.

W. Thomson. Eliciting production possibilities from a well-informed manager. J. Econ. Theory, 20(3):360–380, 1979.

J. F. Ziegel. Coherence and elicitability. Math. Finance, 26(4):901–918, 2016.

Appendix A
Forecasting Risk Measures

The aim of this report is to examine the performance of consistent scoring functions used in assessing VAR and pair (VAR, ES) forecasts. There are various methods for producing point forecasts of risk measures. Kuester et al. [2006] review a number of approaches, including fully parametric estimation, historical simulation and the quantile regression approach, among other methods. To illustrate how the estimation methods for risk measures are used, we assume the series of negated log-returns {Y_t}_{t∈N} can be modeled as

    Y_t = μ_t + σ_t ε_t,                                                                        (A.1)

where {ε_t}_{t∈N} is a sequence of independent and identically distributed (i.i.d.) random variables with zero mean and unit variance, and μ_t and σ_t are measurable with respect to the sigma algebra F_{t−1}, which represents information about the process {Y_t} available up to time t−1. To capture the time dynamics of financial time series, we can assume that the conditional mean μ_t follows an ARMA process. An ARMA process of order (p, q) is given as

    μ_t = c + \sum_{i=1}^{p} a_i Y_{t−i} + z_t + \sum_{j=1}^{q} b_j z_{t−j},   t ∈ Z,            (A.2)

where (z_t) is the linear innovation process of (Y_t). The conditional variance σ_t^2 is assumed to evolve according to a GARCH(p, q) model specification given as

    σ_t^2 = ω + \sum_{i=1}^{q} α_i Y_{t−i}^2 + \sum_{j=1}^{p} β_j σ_{t−j}^2,   t ∈ Z.            (A.3)

Let R be a generic risk measure, a real-valued map from a space of random variables.
Then, based on (A.1), the conditional one-step-ahead forecast of a risk measure R is

    R(Y_t | F_{t−1}) = μ_t + σ_t R(ε),                                                          (A.4)

where ε denotes a generic random variable with the same distribution as the ε_t's. We estimate μ_t and σ_t via maximum likelihood under a specific assumption on the distribution of the innovations ε_t in (A.1), and use fully parametric estimation (FPE) or (filtered) historic simulation (FHS) to estimate R(ε) based on the sample of standardized residuals

    {ε̂_t = (y_t − μ̂_t)/σ̂_t}.                                                                   (A.5)

This two-stage estimation procedure follows that of McNeil and Frey [2000] and Diebold et al. [2000].

A.1 Fully parametric estimation

For the fully parametric approach, a parametric model is assumed for the innovations, with the parameters of the distribution estimated from the standardized residuals of the model in (A.1). From the fitted distribution, we estimate the given risk measure. For example, if the innovations follow a standardized t-distribution, then VAR_α(ε) is given by t_{d̂}^{−1}(α), where d̂ is the estimated degrees of freedom and t_{d̂}^{−1}(α) is the α-quantile of that distribution. The ES can be computed as

    ES_ν(ε) = E(ε | ε ≥ VAR_ν(ε)),

where numerical integration is used to evaluate the conditional expectation [Nolde and Ziegel, 2016].

A.2 Filtered historic simulation

The historic simulation approach makes use of a non-parametric estimate of VAR_α based on the standardized residuals {ε̂_t} in (A.5), which can be seen as representing a filtered time series. A sample {ε̂*_i ; 1 ≤ i ≤ m} of a large size m (e.g., m = 10,000) is drawn from the estimated standardized residuals {ε̂_t ; 1 ≤ t ≤ n}, and the empirical estimate of the α-quantile of this sample gives the VAR estimate, \widehat{VAR}^{FHS}_α(ε). The ES is estimated using the empirical version of the conditional expectation given that the residual exceeds the corresponding VAR estimate:

    \widehat{ES}^{FHS}_ν(ε) = \frac{1}{\#\{i : i = 1, ..., m, \; ε̂*_i > \widehat{VAR}^{FHS}_ν(ε)\}} \sum_{i=1}^{m} ε̂*_i \, 1\{ε̂*_i > \widehat{VAR}^{FHS}_ν(ε)\}.
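A minimal sketch of the FHS step, assuming the standardized residuals have already been obtained from the fitted filter (the function name and the resampling details are illustrative):

```python
import numpy as np

def fhs_var_es(std_residuals, alpha=0.95, m=10_000, seed=0):
    """Filtered historic simulation estimates of VaR_alpha(eps) and ES_alpha(eps).

    std_residuals: standardized residuals eps_hat_t = (y_t - mu_hat_t) / sigma_hat_t
    from a fitted ARMA-GARCH filter, as in (A.5).
    """
    rng = np.random.default_rng(seed)
    sample = rng.choice(np.asarray(std_residuals), size=m, replace=True)  # bootstrap sample eps*_i
    var_hat = np.quantile(sample, alpha)            # empirical alpha-quantile
    es_hat = sample[sample > var_hat].mean()        # empirical conditional expectation beyond VaR
    return var_hat, es_hat

# One-step-ahead forecasts then follow (A.4): R(Y_t | F_{t-1}) = mu_t + sigma_t * R(eps).
```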
