A Flexible Inference Method for an Autoregressive Stochastic Volatility Model with an Application to Risk Management

by

Yijun Xie

B.Sc., University of Notre Dame du Lac, 2015

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Statistics)

The University of British Columbia (Vancouver)

April 2017

© Yijun Xie, 2017

Abstract

The Autoregressive Stochastic Volatility (ARSV) model is a discrete-time stochastic volatility model for financial return time series and their volatilities, and it is relevant for risk management. However, existing inference methods place various restrictions on the model assumptions. In this report we discuss a new inference method that allows flexible distributional assumptions for the innovations of the ARSV model. We also present the application of the ARSV model to risk management, and compare the ARSV model with the GARCH model, another commonly used model for financial time series.

Preface

This dissertation is original, unpublished work by the author, Yijun Xie. The dataset used in this thesis was downloaded from Yahoo Finance (https://finance.yahoo.com/).

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Algorithms
Acknowledgments
1 Introduction
2 Background
  2.1 Measures of Tail Dependence
  2.2 Volatility Models
    2.2.1 GARCH Process
    2.2.2 ARSV Process
3 Inference for ARSV Model
  3.1 Review of Existing Methods
  3.2 New Inference Method
    3.2.1 Proposing Step
    3.2.2 Classic Metropolis-Hastings Algorithm
    3.2.3 Metropolis-within-Gibbs
    3.2.4 Parameter Estimation
    3.2.5 Discussion on the Flexibility
    3.2.6 Algorithm
  3.3 Comparison of Inference Methods
    3.3.1 Parameter Estimation for Simulated Data
    3.3.2 Parameter Estimation for the S&P 500 Index
4 CoVaR with ARSV
  4.1 Value-at-Risk Forecasting under the ARSV Model
  4.2 CoVaR Forecasting: GARCH Model vs ARSV Model
    4.2.1 First Definition of CoVaR
    4.2.2 Second Definition of CoVaR
  4.3 Simulation Methods to Find CoVaR
  4.4 Comparison of VaR and CoVaR Forecasts
    4.4.1 Simulation Study
    4.4.2 Data Example
5 Discussion
Bibliography

List of Tables

Table 3.1 An example of estimating parameters of an ARSV(1) process with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ standardized $t_5$.
Table 3.2 Median of estimated parameters for simulated data with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim N(0,1)$.
Table 3.3 Median of estimated parameters for simulated data with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ standardized $t_5$.
Table 3.4 Median of estimated parameters for simulated data with $\varepsilon_t \sim$ standardized $t_5$ and $\eta_t \sim$ standardized $t_5$.
Table 3.5 First example of incorrect model specification.
Table 3.6 Second example of incorrect model specification.
Table 3.7 Summary statistics of estimated parameters under the model assumption that both innovations are normally distributed.
Table 3.8 Summary statistics of estimated parameters under the model assumption that the first innovation follows a Student t distribution.
Table 4.1 Violation rates and corresponding p-values of likelihood-ratio tests for VaR$_\alpha$ forecasts at the 95% and 99% levels for simulated data.
Table 4.2 Mean piece-wise linear scores and corresponding p-values of conditional predictive ability tests for VaR$_\alpha$ forecasts at the 95% and 99% levels for simulated data.
Table 4.3 Violation rates and corresponding p-values of likelihood-ratio tests for VaR$_\alpha$ forecasts at the 95% and 99% levels for daily log-returns of the S&P 500 Index.
Table 4.4 Mean piece-wise linear scores and corresponding p-values of conditional predictive ability tests for VaR$_\alpha$ forecasts at the 95% and 99% levels for daily log-returns of the S&P 500 Index.

List of Figures

Figure 1.1 Stylized facts of financial data – 1.
Figure 1.2 Stylized facts of financial data – 2.
Figure 2.1 Structure of GARCH process.
Figure 2.2 Structure of ARSV process.
Figure 2.3 $\eta$ plot for simulated processes.
Figure 2.4 $\chi/\bar\chi$-plot for simulated processes.
Figure 2.5 Quantile plots and $\chi/\bar\chi$ plots of the negative daily log-returns of the S&P 500 Index from 2000-01-03 to 2016-10-26.
Figure 3.1 Path of $\beta_0$ estimates when only implementing the Gibbs sampler.
Figure 3.2 Illustration of Markov chain values of parameters after each iteration.
Figure 3.3 Distribution of estimated parameters based on simulated data with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim N(0,1)$.
Figure 3.4 Distribution of estimated parameters based on simulated data with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ standardized $t_5$.
Figure 4.1 Estimated 95% and 99% VaR forecasts for the simulated ARSV(1) process.
Figure 4.2 Estimated 95% and 99% VaR forecasts for the simulated GARCH(1,1) process.
Figure 4.3 Estimated 95% and 99% CoVaR forecasts for the simulated ARSV(1) process.
Figure 4.4 Estimated 95% and 99% CoVaR forecasts for the simulated GARCH(1,1) process.
Figure 4.5 Estimated 95% and 99% VaR forecasts for daily log-returns of the S&P 500 Index.
Figure 4.6 Estimated 95% and 99% CoVaR forecasts for daily log-returns of the S&P 500 Index.

List of Algorithms

1 Flexible Inference for the ARSV Model
2 Estimating CoVaR by Simulation under a GARCH(1,1) Process
3 Simulating CoVaR under the ARSV Process

Acknowledgments

First and foremost, I would like to express my sincere gratitude to my supervisor, Professor Natalia Nolde, for her patience, insightfulness, and immense knowledge. Without her guidance I would never have been able to finish this thesis. It is my fortune and honor to work under her tutelage.

Besides my advisor, I would like to thank the Department of Statistics at UBC for the supportive environment. Thank you to everyone in my office, for the office dinners and all those wonderful times we spent together. Thank you to Ho Yin Ho and Tingting Zhao for your help and great advice when I was writing the thesis, and for being my lunch/dinner buddies. Special thanks to my best friend Mengtian Zhao and her cat King for being my emotional support every time I feel depressed, even though you are literally 2000 miles away.

My sincere thanks also goes to my second reader, Professor Harry Joe, who generously spent time reading and critiquing my work.

Last but not least, I would like to thank my parents, Zhaoliang Xie and Huaping Xie, for their unconditional love.

Chapter 1

Introduction

Modeling financial return time series is an essential component of a wide range of problems in finance, including risk management, portfolio optimization, and the pricing and hedging of financial risks. In order to develop a realistic model for time series of financial returns, it is important to first identify key features that such data tend to exhibit. These key features are usually referred to as the stylized facts (McNeil, Frey, and Embrechts, 2015a). The most widely acknowledged among these stylized facts are the following:

1. The conditional expectation of financial returns is close to zero (see Fig. 1.1, left panel);
2. The conditional standard deviation, known as the volatility, varies over time, with large values having a tendency to cluster (see Fig. 1.1, left panel);
3. The marginal distribution of the return series has a heavier tail than that of the normal distribution (see Fig. 1.1, right panel);
4. There is little serial correlation between returns (see Fig. 1.2, left panel). However, the squared or absolute values of returns show strong correlation (see Fig. 1.2, right panel).

As the conditional expectation is usually close to zero, it is the volatility that has the dominant effect on the dynamics of a financial return series. For this reason, models for financial returns are often called volatility models.

Figure 1.1: The left panel shows the daily log-returns for the S&P 500 Index from January 1, 2000 to October 26, 2016. Large values of log-returns are clustered. The right panel shows the normal quantile plot of this log-return series, which has heavier tails than a normal distribution.

Several types of volatility models have been proposed in the literature.
Among these models, arguably the most famous and widely used is the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) process (Engle (1982), Bollerslev (1986)). However, in this report we discuss another type of volatility model, the Autoregressive Stochastic Volatility (ARSV) process. Both of these processes can capture the stylized facts mentioned above, but they exhibit different extremal dependence properties for consecutive observations. Returns modeled by a GARCH process have the property of tail dependence, while those modeled by an ARSV process are tail independent. A detailed discussion of tail dependence and independence, as well as of these two volatility models, is provided in Chapter 2.

Figure 1.2: The left and right panels show the autocorrelation function (ACF) of the log-return series and of the squared log-returns for the same dataset as in Figure 1.1.

There is empirical evidence (Drees, Segers, and Warchoł, 2015) suggesting that for some financial time series, the returns over consecutive periods are likely to be tail independent. That is, extreme values at consecutive time points are independent and hence will tend not to co-occur. However, there should be stronger dependence among large but not extremal observations than the classic ARSV model implies. Janssen and Drees (2016) propose a variation of the classic ARSV model, and they show that this new class of ARSV models, which has a special form of heavy-tailed second innovation, has stronger tail dependence than the classic ARSV model, while remaining tail independent for extremal observations.

Inspired by their ideas, in this report we consider another extension of the classic ARSV model, which also has a heavy-tailed second innovation and a light-tailed first innovation. We conjecture that this model is also tail independent for extremal observations, but has stronger dependence for sub-extremal observations than the classic ARSV model. The conjecture is supported via simulations.

However, inference for ARSV models is notoriously difficult. Most of the current inference methods for ARSV models require the assumption that both innovations have normal distributions. Therefore, they can only be applied to the classic ARSV models but not to the extended model we focus on in this report. We propose a new inference method for ARSV models which allows arbitrary choices of the distributions of both innovations. We show that this new inference method works as well as traditional methods for the inference of the classic ARSV model, while it can accurately estimate parameters when traditional methods fail. This new approach is discussed in Chapter 3.

One of the most important applications of volatility models is measuring risk for risk management purposes. Volatility models are also used by financial market regulators to set capital requirements for financial institutions. A correctly specified and flexible volatility model can help both practitioners and regulators to accurately capture features of the market data, and hence to measure risk more precisely. This is paramount for the stability of the financial system. In Chapter 4 we discuss two risk measures: Value-at-Risk (VaR) and Conditional Value-at-Risk (CoVaR). Estimation methods for VaR under the GARCH model are already well studied. In this report, we develop estimation methods for CoVaR under the GARCH model as well as for VaR and CoVaR under the proposed ARSV model. We then compare risk measure estimates under the GARCH and ARSV models.

In Chapter 5, we provide a discussion of the results and an outlook for future research.
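Stylized fact 4 above is easy to check numerically. The sketch below (Python; an illustrative stand-in, not code from this thesis) computes the sample ACF of a return series and of its squares; for real financial returns the first is near zero at all lags while the second decays slowly.

```python
import numpy as np

def acf(x, max_lag=20):
    """Sample autocorrelation of a 1-D series at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

# Placeholder i.i.d. returns; substitute real log-returns (e.g. S&P 500).
rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_t(df=5, size=2500)

print("ACF of returns:        ", np.round(acf(returns, 5), 3))
print("ACF of squared returns:", np.round(acf(returns**2, 5), 3))
# With i.i.d. data both rows are near zero; with real returns the second
# row shows the strong positive correlation described in stylized fact 4.
```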
Chapter 2

Background

In this chapter, we first discuss the property of tail dependence. We also briefly introduce two volatility models, namely the GARCH process and the ARSV process. Our project is motivated by the different tail dependence properties of these two models.

2.1 Measures of Tail Dependence

A natural question for practitioners of risk management is: given a large loss today, how likely is it to experience a large loss tomorrow, or $h$ days into the future?

To answer this question, we need to understand the ideas of tail dependence and tail independence. Different models for financial time series have different tail dependence properties, and it is important to choose a model whose tail dependence property is close to that of the real data.

Consider two random variables $Y$ and $Z$ with continuous distribution functions $F_Y$ and $F_Z$, respectively. The upper tail dependence coefficient between $Y$ and $Z$ is defined as (Joe, 1997)

$$
\chi = \lim_{u\to1} P\!\left(Z > F_Z^{-1}(u) \mid Y > F_Y^{-1}(u)\right)
     = \lim_{u\to1} \frac{P\!\left(Z > F_Z^{-1}(u),\, Y > F_Y^{-1}(u)\right)}{P\!\left(Y > F_Y^{-1}(u)\right)}, \tag{2.1}
$$

provided the limit exists. It is the limit of the ratio between the probability of jointly extremal observations of $Y$ and $Z$ and the probability of an extreme observation in one of the variables. The lower tail dependence coefficient can be defined similarly.

If $\chi = 0$, then $Y$ and $Z$ are said to be tail independent. On the other hand, when $0 < \chi \le 1$, $Y$ and $Z$ are called (upper) tail dependent, and the value of $\chi$ measures how strong the tail dependence is.

However, $\chi$ only tells us how strong the tail dependence is when it is greater than zero. When $\chi = 0$, we cannot tell how fast the conditional probability in (2.1) approaches zero as $u \to 1$, and we need a more refined measure of this rate of decay.

First, a function $L$ on $(0,\infty)$ is called slowly varying at $\infty$ if

$$
\lim_{x\to\infty} \frac{L(tx)}{L(x)} = 1, \quad t > 0,
$$

while a function $h$ on $(0,\infty)$ is called regularly varying at $\infty$ with index $p$ if

$$
\lim_{x\to\infty} \frac{h(tx)}{h(x)} = t^{p}, \quad t > 0.
$$

Let us define $p(t) = P\!\left(Z > F_Z^{-1}(1-1/t),\, Y > F_Y^{-1}(1-1/t)\right)$, and suppose $p(t)$ is regularly varying with index $-1/\eta$ for some $\eta \in (0,1]$. Then $\eta$ is called the residual tail dependence coefficient (Ledford and Tawn, 1996).

Ledford and Tawn (1996) show that if $\eta < 1$, then $\chi = 0$ and hence $Y$ and $Z$ are tail independent, and $\eta$ measures the speed of convergence to tail independence. We can further define (Coles, Heffernan, and Tawn, 1999)

$$
\bar\chi = 2\eta - 1,
$$

where $-1 < \bar\chi \le 1$. When $1/2 < \eta \le 1$, or equivalently $0 < \bar\chi \le 1$, $Y$ and $Z$ are non-negatively dependent, and when $\eta = 1/2$, or $\bar\chi = 0$, $Y$ and $Z$ are exactly tail independent (Coles, Heffernan, and Tawn, 1999; Ledford and Tawn, 2003).

Now we have the pair $(\chi, \bar\chi)$ to describe extremal dependence. When $\chi > 0$ and $\bar\chi = 1$, the two random variables are tail dependent, and the value of $\chi$ measures the strength of the dependence of this pair. If $\chi = 0$ and $-1 < \bar\chi < 1$, the two random variables are tail independent, and the value of $\bar\chi$ measures the strength of dependence of this pair.
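The thesis later estimates these quantities with the evd R package. As a plain-language illustration, the Python sketch below computes the finite-threshold empirical versions $\chi(u)$ and $\bar\chi(u)$ implied by the definitions above; it is an illustrative stand-in, not the thesis code, and uses ranks to obtain approximate uniform margins.

```python
import numpy as np
from scipy.stats import rankdata

def chi_chibar(y, z, u=0.95):
    """Empirical chi(u) and chibar(u) at threshold quantile u
    (Coles, Heffernan and Tawn, 1999).  chi(u) -> chi and
    chibar(u) -> 2*eta - 1 as u -> 1."""
    n = len(y)
    uy = rankdata(y) / (n + 1)            # approximate uniform margins
    uz = rankdata(z) / (n + 1)
    joint = np.mean((uy > u) & (uz > u))  # P(both margins exceed u)
    if joint == 0.0:
        return 0.0, -1.0
    chi = joint / (1.0 - u)
    chibar = 2.0 * np.log(1.0 - u) / np.log(joint) - 1.0
    return chi, chibar

# Lag-one extremal dependence of a return series x: pairs (X_{t-1}, X_t).
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)             # i.i.d.: expect chi ~ 0, chibar ~ 0
print(chi_chibar(x[:-1], x[1:], u=0.95))
```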
Understanding the type of tail dependence structure is very important for choosing good models. Let $\{X_t\}$ denote a process of financial log-returns with losses in the upper tail. As discussed in Chapter 1, financial data often suggest tail independence between consecutive returns. In other words, although high volatilities tend to be persistent, extremal levels of returns should be tail independent (Laurini and Tawn, 2008), so that $P\!\left(X_t > F_{X_t}^{-1}(u) \mid X_{t-1} > F_{X_{t-1}}^{-1}(u)\right) \to 0$ as $u \to 1$. A tail dependent process may lead to an overestimation of the potential loss. On the other hand, Janssen and Drees (2016) suggest that a good model for returns should be tail independent, while retaining stronger dependence at sub-extremal levels than the exactly tail independent case. In other words, we are looking for a stochastic volatility process such that, for the pair $(X_{t-1}, X_t)$, $\chi = 0$ and $0 < \bar\chi < 1$.

2.2 Volatility Models

2.2.1 GARCH Process

One of the most commonly discussed volatility models is the generalized autoregressive conditionally heteroscedastic (GARCH) model, first proposed by Engle (1982) and Bollerslev (1986). A GARCH(p,q) process for $\{X_t, t = 1,2,\ldots,T\}$ is defined as follows:

$$
X_t = \sigma_t \varepsilon_t, \qquad
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \beta_j X_{t-j}^2,
$$

where $\{\varepsilon_t\}$ is a strict white noise process with zero mean and unit variance, $\alpha_0 > 0$, $\alpha_i \ge 0$ for $i = 1,\ldots,p$, and $\beta_j \ge 0$ for $j = 1,\ldots,q$. In order to achieve stationarity, we also assume $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j < 1$. In financial statistics, $X_t$ is usually the return on the financial asset observed at time $t$ and $\sigma_t$ is the conditional standard deviation, or volatility, at time $t$, which generally cannot be directly observed.

Let $\mathcal{F}_t$ denote the sigma algebra of all available information up to time $t$. Then, for the GARCH model, the volatility $\sigma_t$ is $\mathcal{F}_{t-1}$ measurable. In other words, given the information up to time $t-1$, $\sigma_t$ is known.

The empirical evidence suggests that volatilities are clustered: when high volatility occurs at time $t-1$, it is more likely that large volatility occurs again at time $t$. For a GARCH(p,q) model, the conditional squared volatility is a linear combination of previous squared returns and squared volatilities, so a large absolute value of $X_{t-h}$, $h = 1,\ldots,q$, or a large value of $\sigma_{t-h}$, $h = 1,\ldots,p$, leads to a large value of $\sigma_t$. Therefore, we can observe the clustering of large volatilities in a GARCH model. For simplicity, we only consider the GARCH(1,1) process, which in practice is believed to be sufficient for modeling the volatility process most of the time.

Mikosch and Starica (2000) and Basrak, Davis, and Mikosch (2002) show that for GARCH models,

$$
\lim_{x\to\infty} P(X_h > x \mid X_0 > x) > 0,
$$

i.e., $\eta = 1$ and $(X_h, X_0)$ are tail dependent. Laurini and Tawn (2008) suggest declustering returns over a high threshold to remove the tail dependence. However, we can also look for an alternative model such that the series $\{X_t\}$ is tail independent while volatility clustering is preserved.

2.2.2 ARSV Process

An alternative to the GARCH model is the autoregressive stochastic volatility (ARSV) model, first studied by Harvey, Ruiz, and Shephard (1994) and Jacquier, Polson, and Rossi (1994), among many others. We focus on the simplest ARSV(1) model, which is defined as

$$
X_t = \sigma_t \varepsilon_t, \qquad
\log(\sigma_t^2) = \beta_0 + \beta_1 \log(\sigma_{t-1}^2) + \delta \eta_t,
$$

where $\{\varepsilon_t\}$ and $\{\eta_t\}$ are two independent strict white noise processes with zero mean and unit variance, and $\delta > 0$ is a constant that adjusts the standard deviation of the second innovation. $\beta_0$ and $\beta_1$ are the coefficients of the log-volatility process. For the ARSV(1) model, $0 < \beta_1 < 1$ is required for the process to be stationary.
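For concreteness, here is a minimal Python simulation of the two processes with the parameter values used later for Figures 2.3 and 2.4; it is a sketch under the thesis's definitions, where "standardized" Student t means scaled to unit variance.

```python
import numpy as np

rng = np.random.default_rng(2)

def sim_arsv1(T, beta0=-0.5, beta1=0.95, delta=0.35, nu_eta=None):
    """ARSV(1) path; nu_eta=None gives the classic model (Gaussian eta_t),
    a finite nu_eta the extended model with standardized Student-t eta_t."""
    if nu_eta is None:
        eta = rng.standard_normal(T)
    else:
        eta = rng.standard_t(nu_eta, T) / np.sqrt(nu_eta / (nu_eta - 2.0))
    h = np.empty(T)
    h[0] = beta0 / (1.0 - beta1)                 # stationary mean of h_t
    for t in range(1, T):
        h[t] = beta0 + beta1 * h[t - 1] + delta * eta[t]
    return np.exp(h / 2.0) * rng.standard_normal(T)  # X_t = sigma_t * eps_t

def sim_garch11(T, a0=1e-6, a1=0.8, b1=0.1):
    """GARCH(1,1) path in the thesis's parameterization:
    sigma_t^2 = a0 + a1 * sigma_{t-1}^2 + b1 * X_{t-1}^2."""
    x = np.empty(T)
    s2 = a0 / (1.0 - a1 - b1)                    # stationary variance
    for t in range(T):
        x[t] = np.sqrt(s2) * rng.standard_normal()
        s2 = a0 + a1 * s2 + b1 * x[t] ** 2
    return x

x_arsv = sim_arsv1(2500, nu_eta=5)               # extended ARSV(1)
x_garch = sim_garch11(2500)
```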
The difference between a GARCH model and an ARSV model lies in the volatility process. For a GARCH model, $\sigma_t$ is $\mathcal{F}_{t-1}$ measurable, as mentioned above. However, the volatility process of an ARSV model contains a second innovation term $\eta_t$. Thus, even after conditioning on all information up to time $t-1$, $\sigma_t$ is still a random variable.

The tail dependence properties of the ARSV(1) model are studied by Breidt and Davis (1998) and Hill (2011). They show that for either normally distributed or heavy-tailed $\varepsilon_t$, the extremes of $X_t$ are independent. Liu and Tawn (2013) suggest that the difference comes from the source of clustering. For the GARCH model, the previous volatilities have a negligible effect on the current volatility when the $\alpha_i$'s ($i \ge 1$) are small, but the return process and the volatility process are interconnected. Therefore, an extremal return observed at time $t-1$ leads to a large volatility value at time $t$, and hence a large probability of observing an extremal return at time $t$. This is illustrated in Figure 2.1.

Figure 2.1: Structure of GARCH process.

However, for the ARSV model, the log-volatility process evolves independently of the observed values of the return process, while the return is the realization of the current volatility times a noise term. Therefore, $X_{t-1}$ and $\sigma_t$ are independent given $\sigma_{t-1}$, and a large volatility value at time $t-1$ does not necessarily lead to a large return value at time $t$ (see Figure 2.2).

Figure 2.2: Structure of ARSV process.

The tail behaviors of the GARCH process and the ARSV process can also be illustrated with the residual tail dependence coefficient $\eta$, or the measures of tail dependence $\chi$ and $\bar\chi$ discussed in Section 2.1. In Figure 2.3 we present estimates of the residual tail dependence coefficient $\eta$ for a GARCH(1,1) process with $\alpha_0 = 1\times10^{-6}$, $\alpha_1 = 0.8$, and $\beta_1 = 0.1$; a classic ARSV(1) process with $\beta_0 = -0.5$, $\beta_1 = 0.95$, and $\delta = 0.35$; and an extension of the classic ARSV(1) process with $\beta_0 = -0.5$, $\beta_1 = 0.95$, $\delta = 0.35$, and a standardized second innovation $\eta_t$ that is Student's t distributed with 5 degrees of freedom. The corresponding values of the $(\chi, \bar\chi)$ pairs are presented in Figure 2.4. (The empirical values of $\chi$ and $\bar\chi$ are estimated using the evd package, which computes both values using an approximation method.) Janssen and Drees (2016) suggest that an ARSV(1) process with a heavy-tailed second innovation distribution has stronger tail dependence at sub-extremal levels than the classic ARSV model while remaining tail independent. Figures 2.3 and 2.4 both support this suggestion. As the quantile approaches 1, we have evidence to support the claims $\eta = 1$ (tail dependent), $\eta = 0.5$ (exactly tail independent), and $0.5 < \eta < 1$ (tail independent, but not exactly tail independent) for the three scenarios, respectively. Similarly, in Figure 2.4, for the GARCH(1,1) process, when the quantile approaches 1 the empirical value of $\chi$ is significantly larger than 0, which implies that $(X_{t-1}, X_t)$ are tail dependent. For the classic ARSV(1) process, neither the empirical value of $\chi$ nor that of $\bar\chi$ is significantly larger than 0, which implies that $(X_{t-1}, X_t)$ are exactly tail independent. For the extended ARSV(1) process, the empirical value of $\chi$ is not significantly larger than 0 but the empirical value of $\bar\chi$ is, which implies that $(X_{t-1}, X_t)$ have stronger tail dependence than the product of the margins.

We further compare the simulated data with an actual financial time series. We estimate the volatilities of the S&P 500 Index from 2000-01-03 to 2016-10-26 with some unbiased estimators, and then fit the log squared volatilities to an AR(1) process.
The top left panel of Figure 2.5 shows the quantile plot of the residuals of this AR(1) process against a normal distribution, and the top right panel shows the quantile plot against a Student's t distribution with 3 degrees of freedom. These two plots hint that a heavy-tailed second innovation $\eta_t$ might be a more suitable choice. We also present the $\chi/\bar\chi$ plots of the negative returns of this dataset. Comparing the bottom two panels of Figure 2.5 with the panels in Figure 2.4, we can conjecture that the $\chi/\bar\chi$ plots from the extended ARSV(1) process with a Student's t second innovation are the closest to those from the real data. Thus we might prefer to model the financial return series using the extended ARSV(1) model.

Figure 2.3: Plot of $\eta$ for a simulated GARCH(1,1) process (top panel), a simulated classic ARSV(1) process (middle panel), and a simulated extended ARSV(1) process with a normal first innovation and a Student's t second innovation (bottom panel).

Figure 2.4: $\chi/\bar\chi$-plot for a simulated GARCH(1,1) process (top panels), a simulated classic ARSV(1) process (middle panels), and a simulated extended ARSV(1) process with a normal first innovation and a Student's t second innovation (bottom panels).

Figure 2.5: Quantile plots and $\chi/\bar\chi$ plots of the negative daily log-returns of the S&P 500 Index from 2000-01-03 to 2016-10-26.

Chapter 3

Inference for the Autoregressive Stochastic Volatility Model

Many efforts have been made to develop inference methods for the traditional ARSV model with both innovation distributions normal. In this chapter, we first briefly review the methods that inspired our new inference method. These existing methods cannot be used to estimate parameters for the extension of the ARSV model discussed at the end of the last chapter. Therefore, we propose a new approach that allows flexible choices of both innovation distributions of the ARSV model. This new method works as well as the existing methods for the traditional ARSV model, and can provide good parameter estimates when existing methods fail. We describe the details of the new method in this chapter, and compare its results with those of existing inference methods.

3.1 Review of Existing Methods

In this section we review inference methods implementing the full Bayesian approach with pre-specified prior distributions. Jacquier, Polson, and Rossi (1994) propose a cyclic MCMC approach for the classic ARSV(1) model with both innovations normally distributed. Consider the model

$$
X_t = \sigma_t \varepsilon_t, \qquad
\log\sigma_t^2 = \beta_0 + \beta_1 \log\sigma_{t-1}^2 + \delta \eta_t, \qquad
(\varepsilon_t, \eta_t) \sim N(0, I_2),
$$

where $I_2$ is the two-dimensional identity matrix. Let $\omega$ denote the set of parameters $(\beta_0, \beta_1, \delta)$. An inverse gamma distribution $p(\delta) \propto \exp\!\left(-\nu_0 s_0^2 / (2\delta^2)\right) / \delta^{\nu_0+1}$ is chosen as the prior distribution for $\delta$, where $(\nu_0, s_0)$ are two hyperparameters. The prior distributions $\beta_0 \sim N(0,100)$ and $\beta_1 \sim N(0,10)$ are independent and essentially flat. The algorithm proposed by Jacquier, Polson, and Rossi (1994) includes two stages:

1. Sample the parameters $\omega$: $p(\omega \mid \log\sigma_t^2)$ is the posterior from a linear regression, and therefore direct draws can be made.
2. Sample $\sigma_t^2$ from

$$
p(\sigma_t^2 \mid \sigma_{t-1}^2, \sigma_{t+1}^2, \omega, x_t)
\propto f_X(x_t \mid \sigma_t^2)\, f_a(\sigma_t^2 \mid \sigma_{t-1}^2)\, f_b(\sigma_{t+1}^2 \mid \sigma_t^2)
\propto \frac{1}{\sigma_t} \exp\!\left(-\frac{x_t^2}{2\sigma_t^2}\right) \times \frac{1}{\sigma_t^2} \exp\!\left(-\frac{(\log\sigma_t^2 - \mu_t)^2}{2\delta^2}\right),
$$

where $\mu_t = \left(\beta_0(1-\beta_1) + \beta_1(\log\sigma_{t+1}^2 + \log\sigma_{t-1}^2)\right)/(1+\beta_1^2)$ and $\delta^2 = \sigma_\eta^2/(1+\beta_1^2)$; $f_X$ is the probability density function of $X_t$ conditioned on $\sigma_t^2$, and $f_a$ and $f_b$ are the probability density functions of $\sigma_t^2$ conditioned on $\sigma_{t-1}^2$ and of $\sigma_{t+1}^2$ conditioned on $\sigma_t^2$, respectively.

Jacquier, Polson, and Rossi (2004) extend the method above to the case where the first innovation $\varepsilon_t$ follows a Student t distribution with $\nu$ degrees of freedom. They treat the heavy-tailed $\varepsilon_t$ as a scale mixture of the inverse gamma distribution and the normal distribution. This allows us to write $X_t$ as $X_t = \sigma_t \sqrt{\lambda_t} Z_t$, where $Z_t$ follows the standard normal distribution and $\lambda_t$ follows an inverse gamma distribution $\mathrm{IG}(\nu/2, 2/\nu)$, or $\nu/\lambda_t \sim \chi^2_\nu$. Let us denote $X_t^* = X_t/\sqrt{\lambda_t}$. Then we can replace $\{x_t, t = 1,\ldots,T\}$ with $\{x_t^*, t = 1,\ldots,T\}$ and sample $\sigma^2$ and $\omega$ in a similar way as described above. Jacquier, Polson, and Rossi (2004) choose a uniform discrete prior on $[3,40]$ for $\nu$; the prior distributions of the other parameters are the same as above. The algorithm includes three stages:

1. Sample from the posterior distribution $p(\sigma^2, \omega \mid \lambda, \nu, x^*)$ using the algorithm of Jacquier, Polson, and Rossi (1994).

2. Sample $\lambda$ from the posterior distribution

$$
p(\lambda_t \mid x_t, \sigma_t^2, \nu) \sim \mathrm{IG}\!\left(\frac{\nu+1}{2},\; \frac{2}{(x_t^{*2}/\log\sigma_t^2) + \nu}\right).
$$

3. Sample $\nu$ from the posterior distribution

$$
p(\nu \mid \sigma^2, x, \omega) = p(\nu) \prod_{t=1}^{T} \frac{\nu^{\nu/2}\, \Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\Gamma\!\left(\frac{1}{2}\right)} \left(\nu + x_t^2/\sigma_t^2\right)^{-(\nu+1)/2},
$$

where $p(\nu)$ is the prior distribution of $\nu$.

Kastner and Frühwirth-Schnatter (2014) propose another method based on $\log X_t^2$ instead of $X_t$. Squaring and then taking the logarithm of $X_t$ gives

$$
\log X_t^2 = \log \sigma_t^2 + \log \varepsilon_t^2,
$$

where $\log \varepsilon_t^2$ can be approximated by a mixture of normal distributions. They further implement the ancillarity-sufficiency interweaving strategy to sample in the $[\log X^2, \log\sigma^2, \omega]$ space. This method is implemented in the stochvol package.

Broto and Ruiz (2004) provide a more comprehensive and detailed review of other approaches for the classic ARSV model.

3.2 New Inference Method

The methods discussed in the preceding section assume that the second innovation distribution in the ARSV model is normal, and that the first innovation distribution is either normal or Student t. When the second innovation in the ARSV model is non-Gaussian, it is very difficult to derive the corresponding posterior distribution, and hence very difficult to sample from it. In order to carry out inference for the extension of the classic ARSV model discussed earlier, we propose a new inference method which does not require pre-specified prior distributions, and hence does not rely on the normality of the innovation distributions.

For the rest of the report, we consider the model

$$
X_t = \sigma_t \varepsilon_t, \qquad
\log\sigma_t^2 = \beta_0 + \beta_1 \log\sigma_{t-1}^2 + \delta \eta_t,
$$

where $\varepsilon_t$ follows a standard normal distribution and $\eta_t$ follows a standardized Student t distribution with zero mean, unit variance, and $\nu_\eta$ degrees of freedom.

To start, we need an initial guess of the volatilities and parameters. Popular model-free estimates of volatility include a simple moving average of squared returns and the Exponentially Weighted Moving Average (EWMA), among many others. Let us define $h_t := \log\sigma_t^2$, $t = 1,\ldots,T$. With estimated volatilities $\{\sigma_0^{(0)}, \ldots, \sigma_T^{(0)}\}$, we can calculate $h_t$, $t = 1,\ldots,T$, and obtain an initial guess of the parameters $\theta^{(1)}$ by fitting an AR process as described in Section 3.2.4. In this report, we choose the 5-day moving average of squared returns as our initial guess of the volatility, as in the sketch below.
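As a concrete illustration of this initialization (a minimal Python sketch; the helper name initial_h is our own):

```python
import numpy as np

def initial_h(x, window=5):
    """Initial guess h_t^{(0)} = log sigma_t^2 from a moving average of
    squared returns, the proxy chosen in this report; an EWMA of squared
    returns would serve equally well."""
    x2 = np.asarray(x, dtype=float) ** 2
    sigma2 = np.convolve(x2, np.ones(window) / window, mode="same")
    return np.log(np.maximum(sigma2, 1e-12))   # guard against log(0)
```

The initial parameter set $\theta^{(1)}$ is then obtained by fitting an AR(1) process to this series, as described in Section 3.2.4.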
Assume that after the $(i-1)$th iteration, we have sampled the sequence $h_t^{(i-1)}$, $t = 1,\ldots,T$, and have updated the parameters $\theta^{(i)} = [\beta_0^{(i)}, \beta_1^{(i)}, \delta^{(i)}, \nu_\eta^{(i)}]$ in the way discussed in Section 3.2.4. Then in the $i$th iteration, we sample $\{h_t^{(i)}, t = 2,\ldots,T-1\}$ as described in the following sections.

3.2.1 Proposing Step

The sequence $\{h_t^{(i)}, t = 2,\ldots,T-1\}$ is sampled sequentially. Suppose that after the $(t-1)$th step in the $i$th iteration, we have already sampled $h_{t-1}^{(i)}$. Since the volatility process is independent of the observed returns, the distribution of $h_t$ only depends on the value of $h_{t-1}$ and the parameters $\theta = [\beta_0, \beta_1, \delta, \nu_\eta]$. Therefore, by plugging in the estimated parameters $\theta^{(i)} = [\beta_0^{(i)}, \beta_1^{(i)}, \delta^{(i)}, \nu_\eta^{(i)}]$ and the sampled $h_{t-1}^{(i)}$ from the previous step, we can sample $h_t'$ from the Student t distribution with mean $\beta_0^{(i)} + \beta_1^{(i)} h_{t-1}^{(i)}$, standard deviation $\delta^{(i)}$, and $\nu_\eta^{(i)}$ degrees of freedom. We can continue this sequence by sampling $h_{t+1}^{(i)}$ based on $h_t^{(i)}$ and $\theta^{(i)}$ for all $t = 2,3,\ldots,T-1$.

This step is similar to the commonly used MCMC technique called the Gibbs sampler. However, the Gibbs sampler is more often applied when the sampling space is small. Our case is different from the regular setting, as we want to sample a sequence of $T-2$ random variables, where $T$ can be as large as 1000 to 2000. Also, $h_t$ depends only on $h_{t-1}$, so the process is analogous to a random walk. Therefore, the classic Gibbs sampling method is very inefficient in this case, as it takes a very long time to explore all regions with high probability. In Figure 3.1 we show the values of the estimated parameter $\hat\beta_0$ in each iteration when we only use the classic Gibbs sampler together with the parameter estimation method discussed in Section 3.2.4. The true value is $-0.5$; the estimated values do not approach the true value even after nearly 2000 iterations.

Figure 3.1: Path of $\beta_0$ estimates when only implementing the Gibbs sampler. The red line represents the true value.

3.2.2 Classic Metropolis-Hastings Algorithm

To overcome the problem of slow convergence, we need to regulate the acceptance rate of each new sample. Instead of accepting all newly proposed samples, we adopt the Metropolis-Hastings algorithm to adjust the acceptance rate. For the rest of this section, we consider sampling a generic sequence $x^{(i)}$ from a distribution $\pi$, where the functional form of $\pi$ is generally known except for a normalizing constant. The classic Metropolis-Hastings algorithm contains the following three steps (Murphy, 2012).

First, in the $i$th iteration, we draw a new sample $x'$ from a proposal distribution $q(\cdot \mid x^{(i-1)}, D)$, where $x^{(i-1)}$ is the value obtained in the previous iteration and $D$ is the set of parameters.

Second, we calculate the acceptance rate

$$
r = \min(1, \alpha), \quad \text{with} \quad
\alpha = \frac{q(x^{(i-1)} \mid x')\, \pi(x')}{q(x' \mid x^{(i-1)})\, \pi(x^{(i-1)})}, \tag{3.1}
$$

where $\pi(\cdot)$ is known up to a normalizing constant. The idea behind the acceptance rate is that, in order to reveal the true distribution of $x$, we want the algorithm to explore the whole space without getting stuck at one point, and to visit the regions with higher probability more often. The factor $q(x^{(i-1)} \mid x') / q(x' \mid x^{(i-1)})$ ensures that the sampler retains the possibility of revisiting the previous point. At the same time, the factor $\pi(x') / \pi(x^{(i-1)})$ ensures that regions with higher probability are more likely to be visited.

The final step is to accept or reject the proposal made in the first step. We accept the newly sampled proposal with probability $r$. This is often done by generating a uniformly distributed random variable $u$ between 0 and 1. Then

$$
x^{(i)} =
\begin{cases}
x', & u < r, \\
x^{(i-1)}, & u \ge r.
\end{cases}
$$
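In Python, the three steps read as follows; this is a generic sketch in log space for numerical stability, and the Gaussian target with random-walk proposal at the end is an illustrative assumption, not part of the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)

def metropolis_hastings(log_pi, propose, log_q, x0, n_iter):
    """Classic Metropolis-Hastings.  log_pi: log target (up to a constant);
    propose(x): draw x' ~ q(.|x);  log_q(a, b): log q(a | b)."""
    x, chain = x0, [x0]
    for _ in range(n_iter):
        x_new = propose(x)
        # log of alpha in eq. (3.1)
        log_alpha = (log_q(x, x_new) + log_pi(x_new)
                     - log_q(x_new, x) - log_pi(x))
        if np.log(rng.uniform()) < min(0.0, log_alpha):  # accept w.p. r
            x = x_new
        chain.append(x)
    return np.array(chain)

# Illustration: sample N(0,1) with a symmetric random-walk proposal
# (the q terms then cancel; they are kept for generality).
draws = metropolis_hastings(
    log_pi=lambda x: -0.5 * x ** 2,
    propose=lambda x: x + rng.normal(),
    log_q=lambda a, b: -0.5 * (a - b) ** 2,
    x0=0.0, n_iter=5000)
print(draws.mean(), draws.std())   # roughly 0 and 1
```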
3.2.3 Metropolis-within-Gibbs

The classic Metropolis-Hastings algorithm cannot be directly applied to our problem: we would need the full joint probability density function of the newly sampled $h_t'$ defined in Section 3.2.1 and of $h_t^{(i-1)}$ from the previous iteration, which is very difficult to find. Instead, we implement an algorithm in the spirit of the Metropolis-within-Gibbs algorithm (Roberts and Rosenthal, 2006). The details of our algorithm are as follows.

In the first step, for $t \ge 2$, we propose the new $h_t'$ conditioned on $h_{t-1}^{(i)}$ rather than on $h_t^{(i-1)}$. (Note that in the Gibbs sampling scheme, a new value is drawn by conditioning on the value from the previous iteration; here we condition on the value from the previous time step within the same iteration.) As discussed in Section 3.2.1, the distribution of $h_t^{(i)}$ is determined only by $h_{t-1}^{(i)}$ and $\theta^{(i)}$. Therefore, the proposal distribution is a Student t distribution with mean $\beta_0^{(i)} + \beta_1^{(i)} h_{t-1}^{(i)}$, standard deviation $\delta^{(i)}$, and $\nu_\eta^{(i)}$ degrees of freedom. We denote this distribution by $G_t^{(i)}$.

In the second step, since $h_t'$ and $h_t^{(i-1)}$ are independent given the parameter vector $\theta^{(i)}$, the first part of the acceptance probability is

$$
\frac{q(h_t^{(i-1)} \mid h_t', \theta^{(i)})}{q(h_t' \mid h_t^{(i-1)}, \theta^{(i)})}
= \frac{G_t^{(i)}(h_t^{(i-1)})}{G_t^{(i)}(h_t')}.
$$

The full joint density function of the $h_t$'s cannot be easily derived. However, the Markovian structure of the process $\{h_t\}$ implies that we only need to focus on the joint density functions in the Markov blankets of $h_t'$ and $h_t^{(i-1)}$. A Markov blanket is the smallest set of variables that grants $h_t$ conditional independence from all other variables. The structure of the ARSV process implies that the Markov blanket of $h_t$ is $\{h_{t-1}, h_{t+1}, x_t\}$. That is, conditioned on the observed return at time $t$, $x_t$, the log-volatility sampled in the previous step, $h_{t-1}$, and the log-volatility of the next step, $h_{t+1}$, the value $h_t$ is independent of all other observed returns and volatilities. In our case, the Markov blanket of $h_t'$ is $\{h_{t-1}^{(i)}, h_{t+1}^{(i-1)}, x_t\}$: the log-volatility sampled in the previous step of this iteration, the log-volatility of the next step from the previous iteration, and the known return value.
Therefore,

$$
\begin{aligned}
\pi(h_t') &= f_{h_t}(h_t' \mid x_t, h_{t-1}^{(i)}, h_{t+1}^{(i-1)}, \theta^{(i)}) \\
&\propto f_1(h_t', x_t, h_{t-1}^{(i)}, h_{t+1}^{(i-1)}, \theta^{(i)}) \\
&= f_X(x_t \mid h_{t-1}^{(i)}, h_t', h_{t+1}^{(i-1)}, \theta^{(i)})\, f_2(h_{t-1}^{(i)}, h_t', h_{t+1}^{(i-1)}, \theta^{(i)}) \\
&= f_X(x_t \mid h_t')\, f_{h_{t+1}}(h_{t+1}^{(i-1)} \mid h_t', h_{t-1}^{(i)}, \theta^{(i)})\, f_3(h_t', h_{t-1}^{(i)}, \theta^{(i)}) \\
&= f_X(x_t \mid h_t')\, f_{h_{t+1}}(h_{t+1}^{(i-1)} \mid h_t', \theta^{(i)})\, f_{h_t}(h_t' \mid h_{t-1}^{(i)}, \theta^{(i)})\, f_{h_{t-1}}(h_{t-1}^{(i)} \mid \theta^{(i)})\, f_p(\theta^{(i)}),
\end{aligned}
$$

where $f_X$ is the conditional probability density function of the returns (here we consider the simple case $\varepsilon_t \sim N(0,1)$, so the distribution of $X_t$ depends only on the value of $\sigma_t$; under further assumptions about $\varepsilon_t$, $f_X(x_t \mid h_t')$ should be replaced by $f_X(x_t \mid h_t', \theta^{(i)})$); $f_{h_t}$ and $f_{h_{t+1}}$ are the conditional probability density functions of $h_t$ and $h_{t+1}$, respectively; $f_{h_{t-1}}$ and $f_p$ are the marginal probability density functions of $h_{t-1}$ and $\theta$, respectively; and $f_1$, $f_2$, and $f_3$ are the joint probability density functions of $\{h_t, X_t, h_{t-1}, h_{t+1}, \theta\}$, $\{h_t, h_{t-1}, h_{t+1}, \theta\}$, and $\{h_t, h_{t-1}, \theta\}$, respectively.

Similarly,

$$
\pi(h_t^{(i-1)}) \propto f_X(x_t \mid h_t^{(i-1)})\, f_{h_{t+1}}(h_{t+1}^{(i-1)} \mid h_t^{(i-1)}, \theta^{(i)})\, f_{h_t}(h_t^{(i-1)} \mid h_{t-1}^{(i)}, \theta^{(i)})\, f_{h_{t-1}}(h_{t-1}^{(i)} \mid \theta^{(i)})\, f_p(\theta^{(i)}).
$$

We know that $f_{h_t}(h_t' \mid h_{t-1}^{(i)}, \theta^{(i)}) = G_t^{(i)}(h_t')$, and we can assume that $f_{h_t}(h_t^{(i-1)} \mid h_{t-1}^{(i)}, \theta^{(i)}) \approx G_t^{(i)}(h_t^{(i-1)})$. Then $\alpha$ in (3.1) can be written as

$$
\begin{aligned}
\alpha &= \frac{q(h_t^{(i-1)} \mid h_t')}{q(h_t' \mid h_t^{(i-1)})} \times \frac{\pi(h_t')}{\pi(h_t^{(i-1)})} \\
&= \frac{G_t^{(i)}(h_t^{(i-1)})}{G_t^{(i)}(h_t')} \times
\frac{f_X(x_t \mid h_t')\, f_{h_{t+1}}(h_{t+1}^{(i-1)} \mid h_t', \theta^{(i)})\, f_{h_t}(h_t' \mid h_{t-1}^{(i)}, \theta^{(i)})\, f_{h_{t-1}}(h_{t-1}^{(i)} \mid \theta^{(i)})\, f_p(\theta^{(i)})}
{f_X(x_t \mid h_t^{(i-1)})\, f_{h_{t+1}}(h_{t+1}^{(i-1)} \mid h_t^{(i-1)}, \theta^{(i)})\, f_{h_t}(h_t^{(i-1)} \mid h_{t-1}^{(i)}, \theta^{(i)})\, f_{h_{t-1}}(h_{t-1}^{(i)} \mid \theta^{(i)})\, f_p(\theta^{(i)})} \\
&= \frac{f_X(x_t \mid h_t')\, f_{h_{t+1}}(h_{t+1}^{(i-1)} \mid h_t', \theta^{(i)})}
{f_X(x_t \mid h_t^{(i-1)})\, f_{h_{t+1}}(h_{t+1}^{(i-1)} \mid h_t^{(i-1)}, \theta^{(i)})}. \tag{3.2}
\end{aligned}
$$

Expression (3.2) can be evaluated easily for $t = 2,3,\ldots,T-1$ and $i \ge 1$. It can be understood intuitively as a ratio of partial conditional likelihoods between $h_t'$ and $h_t^{(i-1)}$. Instead of comparing the full likelihood times the inverse of the proposal kernel, as in the classic Metropolis-Hastings algorithm, the acceptance rate here is determined by comparing the likelihood of observing the current return value and the log-volatility of the next step under the newly proposed $h_t'$ and under the value from the last iteration, $h_t^{(i-1)}$. The decision to keep the new value or retain the old one is made in the same way as in the classic Metropolis-Hastings algorithm.

When $t = 1$, $h_0$ is unknown, so we cannot sample $h_1$ and need to make an arbitrary guess for it. Similarly, when $t = T$, $h_{T+1}$ is unknown, so the method described above cannot be used to sample $h_T$, and we need an arbitrary guess for $h_T$ as well. One way to minimize the impact of $h_1$ and $h_T$ in the $i$th iteration is to discard $\{h_t^{(i)}\}$ for $t < T_1$ and $t > T_2$, and only retain the subset $\{h_t^{(i)} : T_1 \le t \le T_2\}$, where $T_1$ and $T_2$ are two numbers selected prior to running the algorithm.

Note that we do not have a theoretical proof of the convergence of this algorithm. However, empirical results such as those in Figure 3.2 make it reasonable to believe that our method enjoys the same convergence property as the classic Metropolis-Hastings algorithm.
3.2.4 Parameter Estimation

Previous methods implement the fully Bayesian approach to estimate the parameters of the ARSV model. With carefully chosen prior distributions for the parameters and the normality assumption on both innovations, one can sample the parameters from their posterior distributions. However, this is very difficult if the innovation distributions, especially the second innovation distribution of the ARSV(1) model, are not conjugate with each other or with the prior distribution chosen for the parameters.

In our approach, we estimate the parameters separately. After each iteration we obtain $\{h_t^{(i)} : T_1 \le t \le T_2\}$, and we want to estimate the parameters $(\beta_0, \beta_1, \delta, \nu_\eta)$ of the AR(1) process

$$
h_t = \beta_0 + \beta_1 h_{t-1} + \delta \eta_t,
$$

where $T_1 \le t \le T_2$ and $\eta_t$ is a strict white noise process with zero mean and unit variance. Parameter estimation is thus reduced to estimating the parameters of an AR(1) process with non-Gaussian innovations, which is well studied and easily applied (see, for example, Grunwald, Hyndman, Tedesco, and Tweedie (2000)). In this study, we use the arfimafit function in the rugarch package, which estimates the parameters of an autoregressive model with Student t innovations by maximum likelihood. If no program is available for a particular choice of innovation distribution, one can use other methods, such as maximum likelihood or Bayesian estimation, to estimate the parameters.

Figure 3.2 illustrates the Markov chain values of the estimated parameters. With a starting value of $-1.2$, the Markov chain values of $\beta_0$ move close to the true value $-0.5$ within a few hundred iterations, and remain around the true value as the number of iterations increases. Notice also that the Markov chain values of the degrees of freedom are not very stable; however, most iterations return estimates reasonably close to the true value. Therefore, we discard the first $K$ iterations as the burn-in period, where $K$ is determined by inspection, and use a robust statistic, such as the median of the remaining Markov chain values of each parameter, as our estimator of that parameter. (For simplicity we use the median as the estimator for all parameters; for the degrees of freedom in particular, the mode is also recommended.) In Table 3.1 we present an example of the results of our algorithm. Here the length of the Markov chain is 3000 iterations, and the length of the burn-in period is 500 iterations. Figure 3.2 also presents the Markov chain values after each iteration.

Table 3.1: An example of estimating parameters of an ARSV(1) process with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ standardized $t_5$. Estimates are medians of Markov chain values after discarding the burn-in iterations. The values in parentheses are the standard error of each estimator.

True Value   β0 = −0.5        β1 = 0.95       δ = 0.35        νη = 5
Estimate     −0.544 (0.080)   0.944 (0.008)   0.360 (0.039)   5.46 (20.7)

Figure 3.2: Illustration of Markov chain values of parameters after each iteration. Each dot represents the estimated value of the corresponding parameter after each iteration. The red lines represent the true values.
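The report uses rugarch::arfimafit for this step. As a self-contained stand-in, the following Python sketch fits the AR(1) with standardized Student t innovations by maximum likelihood; the function name fit_ar1_t and the reparameterization enforcing $\delta > 0$ and $\nu_\eta > 2$ are our own choices.

```python
import numpy as np
from scipy import optimize, stats

def fit_ar1_t(h):
    """MLE for h_t = b0 + b1 * h_{t-1} + delta * eta_t with eta_t a
    standardized (unit-variance) Student t; returns (b0, b1, delta, nu)."""
    h = np.asarray(h, dtype=float)
    y, ylag = h[1:], h[:-1]

    def nll(par):
        b0, b1, log_delta, log_numinus2 = par
        delta = np.exp(log_delta)
        nu = 2.0 + np.exp(log_numinus2)           # keeps nu > 2
        scale = delta / np.sqrt(nu / (nu - 2.0))  # t scale giving sd = delta
        resid = y - b0 - b1 * ylag
        return -np.sum(stats.t.logpdf(resid, df=nu, scale=scale))

    res = optimize.minimize(nll, x0=[0.0, 0.9, np.log(h.std()), np.log(3.0)],
                            method="Nelder-Mead")
    b0, b1, log_delta, log_numinus2 = res.x
    return b0, b1, np.exp(log_delta), 2.0 + np.exp(log_numinus2)
```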
3.2.5 Discussion on the Flexibility

During the process of writing this report, we found a paper by Fridman and Harris (1998) which also discusses a flexible inference method for the ARSV process, based on the maximum likelihood approach. The new method discussed in this report and their method achieve the flexibility of allowing arbitrary choices of innovation distributions in different ways. In the method proposed by Fridman and Harris (1998), inference for the ARSV(1) process is reduced to finding a good numerical approximation for integrals of a given probability density function, whereas our new method reduces inference for the ARSV(1) process to inference for an AR(1) process with a given distribution of the error terms. Furthermore, it is easy for the method of Fridman and Harris (1998) to include an ARCH term in the volatility process, while increasing the number of lags in the volatility process leads to exponential growth in its complexity. Our method, on the other hand, can handle inference for an ARSV(p) process with trivial modifications, but it is more difficult for our method to estimate parameters for an ARSV model with an ARCH term in the volatility process. The two methods are therefore complementary, and the choice should be made depending on the assumptions about the underlying process.

3.2.6 Algorithm

Before starting the iterations, we first obtain an initial estimate of $\{h_t^{(0)}\}$ using a proxy such as the moving average of squared returns. We then initialize the parameter set $\theta^{(1)} = [\beta_0^{(1)}, \beta_1^{(1)}, \delta^{(1)}, \nu_\eta^{(1)}]$ using the methods discussed in Section 3.2.4. The algorithm contains two nested loops. In the $i$th iteration of the outer loop, we follow the sampling method discussed in Sections 3.2.1 and 3.2.3 to generate the sequence $\{h_t^{(i)}\}$ for $t = 1,2,\ldots,T$. After the inner loop ends, we update the parameter set based on the newly sampled $\{h_t^{(i)}\}$, $t = 1,2,\ldots,T$, using the methods discussed in Section 3.2.4. The detailed algorithm is given below, and simulation results are presented in the next section.

Algorithm 1: Flexible Inference for the ARSV Model
1:  initialize ite, $\{h_t^{(0)}\}$, $\theta^{(1)}$, $T$, $T_1$, and $T_2$
2:  while $i <$ ite do
3:    for $t$ from 2 to $T-1$ do
4:      sample $h_t'$ from $f_{h_t}(\cdot \mid h_{t-1}^{(i)}, \theta^{(i)})$
5:      set $\alpha = \dfrac{f_X(x_t \mid h_t')\, f_{h_{t+1}}(h_{t+1}^{(i-1)} \mid h_t', \theta^{(i)})}{f_X(x_t \mid h_t^{(i-1)})\, f_{h_{t+1}}(h_{t+1}^{(i-1)} \mid h_t^{(i-1)}, \theta^{(i)})}$
6:      calculate the acceptance rate $A = \min\{1, \alpha\}$
7:      generate $u$ from unif(0,1)
8:      if $u < A$ then $h_t^{(i)} = h_t'$
9:      else $h_t^{(i)} = h_t^{(i-1)}$
10:     end if
11:   end for
12:   keep $\{h_t^{(i)}, T_1 \le t \le T_2\}$
13:   update $\theta^{(i+1)}$ by fitting an AR(1) process to $\{h_t^{(i)}, T_1 \le t \le T_2\}$
14: end while
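A direct Python transcription of Algorithm 1 follows; it is a sketch under the report's running assumptions $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ standardized $t_{\nu_\eta}$, fit_ar1 is the AR(1) estimator sketched in Section 3.2.4, and the trimming simply takes $T_1 = 2$ and $T_2 = T-1$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def flexible_arsv_inference(x, h, theta, fit_ar1, n_iter=3000, burn=500):
    """x: observed returns; h: initial log sigma_t^2 (float array);
    theta = (b0, b1, delta, nu_eta).  Returns the median of the
    post-burn-in Markov chain values of theta."""
    T, draws = len(x), []
    for _ in range(n_iter):
        b0, b1, delta, nu = theta
        scale = delta / np.sqrt(nu / (nu - 2.0))   # t scale with sd = delta

        def log_fx(ht):                            # f_X(x_t | h_t)
            return stats.norm.logpdf(x[t], scale=np.exp(ht / 2.0))

        def log_fh(h_next, ht):                    # f_{h_{t+1}}(. | h_t)
            return stats.t.logpdf(h_next, df=nu,
                                  loc=b0 + b1 * ht, scale=scale)

        for t in range(1, T - 1):
            h_prop = b0 + b1 * h[t - 1] + scale * rng.standard_t(nu)
            # log of alpha in eq. (3.2)
            log_alpha = (log_fx(h_prop) + log_fh(h[t + 1], h_prop)
                         - log_fx(h[t]) - log_fh(h[t + 1], h[t]))
            if np.log(rng.uniform()) < min(0.0, log_alpha):
                h[t] = h_prop                      # accept the proposal
        theta = fit_ar1(h[1:T - 1])                # theta^{(i+1)}
        draws.append(theta)
    return np.median(np.array(draws)[burn:], axis=0)
```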
3.3 Comparison of Inference Methods

In this section we present the results of model inference. We show that our new method can estimate the parameters of the classic ARSV(1) model as well as previous methods, and can be applied to the extension of the classic ARSV(1) model where previous methods may fail.

3.3.1 Parameter Estimation for Simulated Data

First we show that our new approach works as well as previous methods. The simulated data are generated from the process

$$
X_t = \sigma_t \varepsilon_t, \tag{3.3}
$$
$$
\log\sigma_t^2 = -0.5 + 0.95 \log\sigma_{t-1}^2 + 0.35\, \eta_t, \tag{3.4}
$$

where $\varepsilon_t \sim N(0,1)$, $\eta_t \sim N(0,1)$, and $\eta_t$ and $\varepsilon_t$ are independent.

We generate 200 datasets of length $T = 2500$ from (3.3) and (3.4) with different random seeds. We then run our new method for 5000 iterations and discard the first 2000 runs. We compare our results with the stochvol package (Kastner, 2016). This package is one of the latest packages for ARSV(1) model inference, and we regard it as representative of existing methods. The stochvol package assumes that the second innovation distribution $\{\eta_t\}$ of the ARSV(1) model is normal, and can work either with both innovation distributions normal, or with a Student t first innovation and a normal second innovation.

Figure 3.3: Distribution of estimated parameters based on simulated data generated from (3.3) and (3.4) with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim N(0,1)$. Estimates are from our new method and the stochvol package.

Table 3.2: Median of estimated parameters for the simulated data in Figure 3.3. The values in parentheses are the standard deviations of the 200 estimated values of each parameter.

Method       β0 = −0.5        β1 = 0.95       δ = 0.35
stochvol     −0.527 (0.113)   0.947 (0.011)   0.359 (0.033)
New Method   −0.495 (0.177)   0.950 (0.017)   0.348 (0.045)

The example above illustrates that our new method works as well as previous methods for inference in the classic ARSV(1) model. Although the standard deviations of our estimated parameters are slightly larger, in general the two methods are very close to each other.

Next we show that our method is more flexible than previous methods. We consider two different scenarios: (1) $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ standardized $t_5$; (2) $\varepsilon_t \sim$ standardized $t_5$ and $\eta_t \sim$ standardized $t_5$.

In the first scenario, 300 datasets of length $T = 2500$ are generated. We compare the results in the same way as above.

Figure 3.4: Distribution of estimated parameters based on simulated data generated from (3.3) and (3.4) with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ standardized $t_5$. Estimates are from our new method and the stochvol package.

Table 3.3: Median of estimated parameters for the simulated data in Figure 3.4. The values in parentheses are the standard deviations of the 300 estimated values of each parameter.

Method       β0 = −0.5        β1 = 0.95       δ = 0.35        νη = 5
stochvol     −0.531 (0.110)   0.947 (0.011)   0.359 (0.033)   NA
New Method   −0.506 (0.138)   0.950 (0.014)   0.357 (0.045)   6.54 (5.15)

From Figure 3.4 and Table 3.3 we see that the estimated parameters from our new method and from the stochvol package are very close. However, our new method estimates the degrees of freedom of the second innovation distribution reasonably well, while the stochvol package has to assume that the second innovation is normally distributed.

We perform a similar simulation study for the second scenario, and the results are presented below. Note that we estimate the parameters of the ARSV model using the stochvol package under two different model specifications, namely assuming $\varepsilon_t \sim N(0,1)$ or $\varepsilon_t \sim$ standardized $t_{\nu_\varepsilon}$, to show how poor the performance can be when the model is specified incorrectly. The results are presented in Table 3.4.

When both innovations follow a Student t distribution, our new method performs much better than the stochvol package. The difference is especially significant for the shape parameter of $\varepsilon_t$, where the result from our method is less biased and has a much smaller standard deviation. We can also see that when the model specification is incorrect, the performance of the method implemented in the stochvol package can be very poor.

Next, we show that our new method is robust against incorrect model specification. First, consider the case where the true model has both innovations normally distributed. We estimate the parameters by specifying both innovations as Student t distributed. The results are presented in Table 3.5.

We also consider the case where the first innovation of the true model is normally distributed and the second innovation follows a skewed Student t distribution.
The skewness parameter is $\gamma = 1.5$, and to amplify the impact of the incorrect model specification we set the standard deviation of $\eta_t$ to 0.5. We estimate the parameters by assuming that the first innovation distribution is normal while the second innovation distribution is Student t. The results are presented in Table 3.6.

Table 3.4: Median of estimated parameters for simulated data with $\varepsilon_t \sim$ standardized $t_5$ and $\eta_t \sim$ standardized $t_5$. The values in parentheses are standard deviations of the estimated values of each parameter.

Method                     β0 = −0.5        β1 = 0.95       δ = 0.35        νε = 5        νη = 5
stochvol (t-Normal)        −0.561 (0.156)   0.946 (0.015)   0.363 (0.054)   5.44 (4.58)   NA
stochvol (Normal-Normal)   −1.098 (0.312)   0.893 (0.031)   0.575 (0.064)   NA            NA
New Method                 −0.502 (0.154)   0.950 (0.015)   0.357 (0.059)   4.98 (1.02)   5.37 (4.78)

Table 3.5: Median of estimated parameters for simulated data with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim N(0,1)$, estimated under the assumption that $\varepsilon_t \sim$ standardized $t_{\nu_\varepsilon}$ and $\eta_t \sim$ standardized $t_{\nu_\eta}$. The values in parentheses are standard deviations of the estimated values of each parameter.

True Value        β0 = −0.5        β1 = 0.95       δ = 0.35        νε = ∞        νη = ∞
Estimated Value   −0.493 (0.139)   0.951 (0.014)   0.344 (0.042)   9.39 (5.96)   92.22 (44.6)

Table 3.6: Median of estimated parameters for simulated data with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ skewed $t(0, 1, 5, 1.5)$, estimated under the assumption that $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ standardized $t_{\nu_\eta}$. The values in parentheses are standard deviations of the estimated values of each parameter.

True Value        β0 = −0.5        β1 = 0.95       δ = 0.5         νη = 5        γ = 1.5
Estimated Value   −0.545 (0.129)   0.946 (0.013)   0.521 (0.051)   4.93 (5.19)   NA

The two examples above suggest that our new method is robust against incorrect model specification. Note that our method works for arbitrary choices of innovation distributions, so we can always "over-specify" the model with more general innovation distributions to achieve robustness. For example, in the setting of Table 3.5, we estimate the degrees of freedom of $\varepsilon_t$ as 9.39 with standard deviation 5.96 and the degrees of freedom of $\eta_t$ as 92.22 with standard deviation 44.6, while these two innovations are actually normally distributed. However, a Student t distribution with such a large number of degrees of freedom behaves practically like a normal distribution. Therefore, our method allows a more general model specification, which brings robustness, while previous methods do not enjoy this flexibility.

3.3.2 Parameter Estimation for the S&P 500 Index

In Fridman and Harris (1998), inference results are compared across several methods utilizing a Bayesian approach, a semi-maximum likelihood approach, and a maximum likelihood approach. Here we compare the parameter estimates from the three methods mentioned above with the results from our proposed method and from the stochvol package.

We fit an ARSV(1) model to the daily log-returns of the S&P 500 Index from 1980 to 1987. In total, 2022 observations are used for model inference. We first estimate the parameters under the traditional model assumption that both innovation distributions are Gaussian. The results are presented in Table 3.7. Owing to their different approach to achieving flexible model assumptions, Fridman and Harris (1998) do not fit an ARSV(1) process with a light-tailed first innovation and a heavy-tailed second innovation.
They consider the scenario where the first innovation is Student t distributed while the second innovation is normally distributed, which can also be fitted using the stochvol package. We fit the data under the same model assumption, as well as under the more general assumption that both innovations are Student t distributed. The results are presented in Table 3.8.

Table 3.7: Summary statistics of estimated parameters under the model assumption that both innovations are normally distributed. The "n" in the column names stands for "normal". The values in parentheses are asymptotic standard deviations for the SML and ML methods, posterior standard deviations for the Bayes method and the method implemented in the stochvol package, and the interquartile range (IQR) for the new approach.

      Bayes          SML            ML (n-n)        stochvol       New Method (n-n)
β0    −.002 (.004)   −.002 (.004)   −.002 (.0004)   −.270 (.007)   −.010 (.022)
β1    .970 (.008)    .958 (.014)    .959 (.005)     .971 (.001)    .989 (.003)
δ     .150 (.017)    .161 (.026)    .159 (.009)     .153 (.002)    .071 (.014)

Table 3.8: Summary statistics of estimated parameters under the model assumption that the first innovation distribution follows a Student t distribution. The values in parentheses are asymptotic standard deviations for the ML method, posterior standard deviations for the method implemented in the stochvol package, and the interquartile range (IQR) for the new approach.

      ML (t-n)          stochvol          New Method (t-n)   New Method (t-t)
β0    −.0038 (.0013)    −.1384 (.0108)    −.1020 (.0473)     −.1718 (.2979)
β1    .9813 (.0056)     .9855 (.0011)     .9897 (.0054)      .9821 (.0312)
δ     .0942 (.0199)     .1016 (.0042)     .0799 (.01853)     .1153 (.0958)
νε    10.39 (5.88)      11.42 (1.76)      10.11 (1.07)       10.47 (1.75)
νη    NA                NA                NA                 2.50 (1.96)

Note that our new approach achieves the flexibility of arbitrary choices of innovations at the cost of stability and efficiency, so outliers appear occasionally when estimating the parameters, and more robust summary statistics such as the median and the IQR are preferred over the mean and standard deviation. The results presented in Tables 3.7 and 3.8 show that the estimates from our method are close to those from existing methods, except that the estimated $\delta$ from our method in Table 3.7 is significantly lower than those from the other methods. The reason requires further study.

Chapter 4

Conditional Risk Measurement with the ARSV Model

One of the applications of volatility models is to estimate the potential risk given known information. In general, there are two approaches to risk measures, namely conditional and unconditional risk measures. In discussing the GARCH and ARSV processes in Chapter 2, we assumed that $\mathcal{F}_{t-1}$, the sigma algebra of all available information up to time $t-1$, is known. If we further assume that the distribution of the return $X_t$ is conditioned on $\mathcal{F}_{t-1}$, then our measure of risk at time $t$ should also be conditioned on $\mathcal{F}_{t-1}$. On the other hand, we can also measure the unconditional risk from the unconditional distribution of $X_t$. One can choose whether to use the conditional or the unconditional distribution for risk measure forecasting. In this chapter we adopt the conditional estimation approach, and focus on the Value-at-Risk and Conditional Value-at-Risk measures under the ARSV process.

4.1 Value-at-Risk Forecasting under the ARSV Model

The Value-at-Risk (VaR) is arguably one of the most widely used risk measures. For a confidence level $\alpha \in (0,1)$ it is defined as

$$
\mathrm{VaR}_\alpha(X) = \inf\{x : P(X \ge x) \le 1-\alpha\}, \tag{4.1}
$$

where typically $\alpha$ is close to 1 (e.g., 0.9 or 0.95). For an ARSV(1) model, $X_t = \sigma_t \varepsilon_t$, where $\sigma_t$ and $\varepsilon_t$ are independent random variables.
For an ARSV(1) model, X_t = σ_t ε_t, where σ_t and ε_t are independent random variables. Since σ_t > 0 almost surely, we have

  P(X_t ≥ x | F_{t−1}) = P(σ_t ε_t ≥ x | F_{t−1})
                       = ∫_0^∞ P(ε_t ≥ x/s) f_{σ_t|F_{t−1}}(s | F_{t−1}) ds,    (4.2)

where f_{σ_t|F_{t−1}} is the conditional probability density function of σ_t given F_{t−1}, the information set at time t−1.

Since log σ_t² = β_0 + β_1 log σ_{t−1}² + δη_t,

  P(σ_t ≤ s | F_{t−1}) = P(log σ_t² ≤ log s² | F_{t−1})
                       = P(β_0 + β_1 log σ_{t−1}² + δη_t ≤ log s² | F_{t−1})
                       = P(η_t ≤ (log s² − β_0 − β_1 log σ_{t−1}²)/δ | F_{t−1})
                       = F_η((log s² − β_0 − β_1 log σ_{t−1}²)/δ),

where F_η is the cumulative distribution function of a standardized Student t distribution with zero mean, unit variance and ν_η degrees of freedom, and we assume that F_{t−1} contains σ_{t−1}. Hence, the conditional density of σ_t is given by

  f_{σ_t|F_{t−1}}(s | F_{t−1}) = f_η((log s² − β_0 − β_1 log σ_{t−1}²)/δ) · 2/(δs),  s > 0.

Therefore, (4.2) can be written as

  P(X_t ≥ x | F_{t−1}) = (2/δ) ∫_0^∞ P(ε_t ≥ x/s) f_η((log s² − β_0 − β_1 log σ_{t−1}²)/δ) (1/s) ds.    (4.3)

With estimated parameters θ̂ = [β̂_0, β̂_1, δ̂, ν̂_η] and estimated volatility σ̂_{t−1}, we can plug (4.3) into (4.1) and solve for VaR_α(X_t) using numerical methods. The parameters θ̂ can be estimated by the method described in Chapter 3. The volatility σ̂_{t−1} can be estimated separately, either with unbiased volatility estimators such as average squared returns or EWMA, or with inference methods for volatility models that provide volatility estimates.

4.2 CoVaR forecasting: GARCH Model vs ARSV Model

Adrian and Brunnermeier (2011) were the first to define the Conditional Value-at-Risk (CoVaR):

  CoVaR^(1)_α(X_{t+1}) = inf{x : P(X_{t+1} ≥ x | X_t = VaR_{α′}(X_t), F_{t−1}) ≤ 1 − α},    (4.4)

where α and α′ are two constants close to 1 and VaR_α is defined in Section 4.1. For simplicity, we assume α = α′.

Girardi and Ergün (2013) modify the definition of CoVaR to

  CoVaR^(2)_α(X_{t+1}) = inf{x : P(X_{t+1} ≥ x | X_t ≥ VaR_α(X_t), F_{t−1}) ≤ 1 − α}.    (4.5)

One-step-ahead forecasts of VaR_α are based on an estimate of the α-quantile of the distribution of X_t given the history F_{t−1}. CoVaR_α forecasting, on the other hand, makes a two-step-ahead estimate of the conditional quantile of the distribution of X_{t+1} given X_t and the history F_{t−1}. So instead of measuring the potential large loss or gain on the next day, as VaR does, CoVaR measures the potential consecutive large gains over two days. In this section we discuss CoVaR estimation under both the GARCH and the ARSV model, for both CoVaR definitions.

4.2.1 First Definition of CoVaR

For the GARCH(1,1) process, given all information up to time t−1 and σ_{t−1}, the volatility at time t is fixed at √(α_0 + α_1 σ_{t−1}² + β_1 X_{t−1}²). Suppose that X_t = VaR_α(X_t) =: v. Then σ_{t+1} is also fixed and equals √(α_0 + α_1 σ_t² + β_1 v²). Then

  P(X_{t+1} ≥ x | X_t = v, F_{t−1}) = P(σ_{t+1} ε_{t+1} ≥ x | X_t = v, F_{t−1})
    = P(ε_{t+1} ≥ x/σ_{t+1} | X_t = v, F_{t−1})    (4.6)
    = P(ε_{t+1} ≥ x/√(α_0 + α_1 σ_t² + β_1 v²) | F_{t−1})
    = P(ε_{t+1} ≥ x/√(α_0(1 + α_1) + α_1² σ_{t−1}² + α_1 β_1 X_{t−1}² + β_1 v²)).    (4.7)

Substituting (4.7) into (4.4), we can solve for CoVaR^(1)_α(X_{t+1}) numerically when {X_t} follows a GARCH(1,1) process.
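In fact, under (4.4) the GARCH case is essentially closed-form: since σ_{t+1} is fixed once X_t = v, the conditional α-quantile is just σ_{t+1} F_ε^{−1}(α). A minimal sketch, assuming normal innovations and illustrative parameter and state values (the parameters are those of Scenario 2 in Section 4.4; the state values are assumed):

    import numpy as np
    from scipy import stats

    # Assumed GARCH(1,1) parameters and state, for illustration only
    a0, a1, b1 = 5e-6, 0.85, 0.1
    sigma_tm1, x_tm1, alpha = 0.01, 0.015, 0.95

    sigma_t = np.sqrt(a0 + a1 * sigma_tm1**2 + b1 * x_tm1**2)
    v = sigma_t * stats.norm.ppf(alpha)                    # VaR_alpha(X_t), normal eps
    sigma_t1 = np.sqrt(a0 + a1 * sigma_t**2 + b1 * v**2)   # fixed once X_t = v
    covar1 = sigma_t1 * stats.norm.ppf(alpha)              # solves (4.7) inside (4.4)
    print(covar1)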
For the ARSV(1) process, given the same information and assuming X_t = v, the value of σ_{t+1} is no longer fixed. Therefore, expression (4.6) becomes

  P(X_{t+1} ≥ x | X_t = v, F_{t−1})
    = ∫_0^∞ P(ε_{t+1} ≥ x/s | X_t = v, F_{t−1}, σ_{t+1} = s) f_{σ_{t+1}|X_t=v,F_{t−1}}(s | X_t = v, F_{t−1}) ds
    = ∫_0^∞ P(ε_{t+1} ≥ x/s) f_{σ_{t+1}|X_t=v,F_{t−1}}(s | X_t = v, F_{t−1}) ds,    (4.8)

where f_{σ_{t+1}|X_t=v,F_{t−1}} is the conditional probability density function of σ_{t+1} given X_t and F_{t−1}. In order to evaluate (4.8), we need to find this conditional density. Since for s > 0

  P(σ_{t+1} ≤ s | X_t = v, F_{t−1}) = P(log σ_{t+1}² ≤ log s² | X_t = v, F_{t−1}),

we have

  f_{σ_{t+1}|X_t=v,F_{t−1}}(s | X_t = v, F_{t−1}) = f_{h_{t+1}|X_t=v,F_{t−1}}(log s² | X_t = v, F_{t−1}) · 2/s,  s > 0.

Writing h_{t+1} = log σ_{t+1}² and using σ_t = |X_t/ε_t|, so that log σ_t² = 2 log|X_t/ε_t|, the conditional cumulative distribution function of h_{t+1} is

  F_{h_{t+1}|X_t=v,F_{t−1}}(u | X_t = v, F_{t−1}) = P(h_{t+1} ≤ u | X_t = v, F_{t−1})
    = P(β_0 + 2β_1 log|X_t/ε_t| + δη_{t+1} ≤ u | X_t = v, F_{t−1})    (4.9)
    = P(η_{t+1} ≤ (u − β_0 − 2β_1 log|v/ε_t|)/δ)
    = ∫_{−∞}^∞ P(η_{t+1} ≤ (u − β_0 − 2β_1 log|v/z|)/δ) f_ε(z) dz.

Therefore,

  f_{h_{t+1}|X_t=v,F_{t−1}}(log s² | X_t = v, F_{t−1}) = (1/δ) ∫_{−∞}^∞ f_η((log s² − β_0 − 2β_1 log|v/z|)/δ) f_ε(z) dz,

where f_η is the probability density function of η_t. Then (4.8) can be expressed as

  P(X_{t+1} ≥ x | X_t = v, F_{t−1})
    = (2/δ) ∫_0^∞ ∫_{−∞}^∞ P(ε_{t+1} ≥ x/s) f_η((log s² − β_0 − 2β_1 log|v/z|)/δ) f_ε(z) (1/s) dz ds.    (4.10)

Substituting (4.10) into (4.4), we can solve for CoVaR^(1)_α(X_{t+1}) numerically when {X_t} follows an ARSV(1) process.

4.2.2 Second Definition of CoVaR

We can also find the CoVaR under the second definition of Girardi and Ergün (2013) (see eq. (4.5)) in a similar way as in Section 4.2.1. However, the conditioning event is now X_t ≥ v instead of X_t = v.

For the GARCH(1,1) process, σ_t is still fixed at √(α_0 + α_1 σ_{t−1}² + β_1 X_{t−1}²). However, σ_{t+1} = √(α_0 + α_1 σ_t² + β_1 X_t²) is now a random variable. We have

  P(X_{t+1} ≥ x | X_t ≥ v, F_{t−1})
    = P(ε_{t+1} ≥ x/√(α_0 + α_1 σ_t² + β_1 X_t²) | X_t ≥ v, F_{t−1})
    = ∫_v^∞ P(ε_{t+1} ≥ x/√(α_0 + α_1 σ_t² + β_1 w²)) f_{X_t|X_t≥v,F_{t−1}}(w | X_t ≥ v, F_{t−1}) dw.    (4.11)

Notice that for w ≥ v,

  P(X_t ≤ w | X_t ≥ v, F_{t−1}) = P(X_t ≤ w, X_t ≥ v | F_{t−1}) / P(X_t ≥ v | F_{t−1})
    = (1/(1−α)) P(v ≤ X_t ≤ w | F_{t−1})   [since P(X_t ≥ v | F_{t−1}) = 1 − α by the definition of VaR_α]
    = (1/(1−α)) P(v ≤ σ_t ε_t ≤ w | F_{t−1})
    = (1/(1−α)) P(v/σ_t ≤ ε_t ≤ w/σ_t | F_{t−1})
    = (1/(1−α)) (F_{ε_t}(w/σ_t) − F_{ε_t}(v/σ_t)).    (4.12)

From (4.12) we obtain

  f_{X_t|X_t≥v,F_{t−1}}(w | X_t ≥ v, F_{t−1}) = 1/((1−α)σ_t) f_ε(w/σ_t),  w ≥ v.    (4.13)

With (4.13), (4.11) can be expressed as

  P(X_{t+1} ≥ x | X_t ≥ v, F_{t−1})
    = 1/((1−α)σ_t) ∫_v^∞ P(ε_{t+1} ≥ x/√(α_0 + α_1 σ_t² + β_1 w²)) f_ε(w/σ_t) dw.    (4.14)

Substituting (4.14) into (4.5), we can solve for CoVaR^(2)_α(X_{t+1}) numerically when {X_t} follows a GARCH(1,1) process.
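Expression (4.14) is a one-dimensional integral, so this case is still comfortably numerical. A minimal sketch, assuming normal innovations and illustrative parameter and state values (anything not in the derivation above, such as the bracket for the root search, is an assumption):

    import numpy as np
    from scipy import stats, integrate, optimize

    # Assumed GARCH(1,1) parameters and state, for illustration only
    a0, a1, b1 = 5e-6, 0.85, 0.1
    sigma_t, alpha = 0.01, 0.95
    v = sigma_t * stats.norm.ppf(alpha)        # VaR_alpha(X_t), normal eps

    def exceed_prob(x):
        # Right-hand side of (4.14) with normal innovations
        def integrand(w):
            s1 = np.sqrt(a0 + a1 * sigma_t**2 + b1 * w**2)  # sigma_{t+1} given X_t = w
            return stats.norm.sf(x / s1) * stats.norm.pdf(w / sigma_t)
        val, _ = integrate.quad(integrand, v, np.inf)
        return val / ((1.0 - alpha) * sigma_t)

    # CoVaR^(2) solves exceed_prob(x) = 1 - alpha; bracket chosen generously
    covar2 = optimize.brentq(lambda x: exceed_prob(x) - (1.0 - alpha), 0.0, 0.2)
    print(covar2)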
In order to find CoVaR^(2)_α(X_{t+1}) under an ARSV(1) process, we can start from (4.9). Under the condition that X_t ≥ v, (4.9) becomes

  P(h_{t+1} ≤ u | X_t ≥ v, F_{t−1})
    = P(β_0 + 2β_1 log|X_t/ε_t| + δη_{t+1} ≤ u | X_t ≥ v, F_{t−1})
    = P(η_{t+1} ≤ (u − β_0 − 2β_1 log|X_t/ε_t|)/δ | X_t ≥ v, F_{t−1})
    = ∫_v^∞ P(η_{t+1} ≤ (u − β_0 − 2β_1 log|w/ε_t|)/δ) f_{X_t|X_t≥v,F_{t−1}}(w | X_t ≥ v, F_{t−1}) dw.    (4.15)

In (4.15), the conditional probability density function f_{X_t|X_t≥v,F_{t−1}}(w | X_t ≥ v, F_{t−1}) is unknown. However,

  f_{X_t|X_t≥v,F_{t−1}}(w | X_t ≥ v, F_{t−1}) = (1/(1−α)) f_{X_t|F_{t−1}}(w | F_{t−1}),  w ≥ v.

Then (4.15) becomes

  P(h_{t+1} ≤ u | X_t ≥ v, F_{t−1})
    = (1/(1−α)) ∫_{−∞}^∞ ∫_v^∞ P(η_{t+1} ≤ (u − β_0 − 2β_1 log|w/z|)/δ) f_{X_t|F_{t−1}}(w | F_{t−1}) f_ε(z) dw dz.

Therefore,

  f_{h_{t+1}|X_t≥v,F_{t−1}}(u | X_t ≥ v, F_{t−1})
    = 1/((1−α)δ) ∫_{−∞}^∞ ∫_v^∞ f_η((u − β_0 − 2β_1 log|w/z|)/δ) f_{X_t|F_{t−1}}(w | F_{t−1}) f_ε(z) dw dz.    (4.16)

The conditional cumulative distribution function of X_t is

  P(X_t ≤ w | F_{t−1}) = P(σ_t ε_t ≤ w | F_{t−1}) = ∫_0^∞ P(ε_t ≤ w/r) f_{σ_t|F_{t−1}}(r | F_{t−1}) dr,

where f_{σ_t|F_{t−1}}(r | F_{t−1}) = (2/(δr)) f_η((log r² − β_0 − β_1 log σ_{t−1}²)/δ), since for r > 0,

  P(σ_t ≤ r | F_{t−1}) = P(log σ_t² ≤ log r² | F_{t−1})
                       = P(β_0 + β_1 log σ_{t−1}² + δη_t ≤ log r² | F_{t−1})
                       = P(η_t ≤ (log r² − β_0 − β_1 log σ_{t−1}²)/δ).

So

  f_{X_t|F_{t−1}}(w | F_{t−1}) = (1/δ) ∫_0^∞ f_ε(w/r) f_η((log r² − β_0 − β_1 log σ_{t−1}²)/δ) (2/r²) dr,

and (4.16) can be expressed as

  f_{h_{t+1}|X_t≥v,F_{t−1}}(u | X_t ≥ v, F_{t−1})
    = 2/((1−α)δ²) ∫_{−∞}^∞ ∫_v^∞ ∫_0^∞ f_η((u − β_0 − 2β_1 log|w/z|)/δ) f_ε(w/r) f_η((log r² − β_0 − β_1 log σ_{t−1}²)/δ) f_ε(z) (1/r²) dr dw dz.

Now (4.8) becomes

  P(X_{t+1} ≥ x | X_t ≥ v, F_{t−1})
    = 4/((1−α)δ²) ∫_0^∞ ∫_{−∞}^∞ ∫_v^∞ ∫_0^∞ P(ε_{t+1} ≥ x/s) f_η((log s² − β_0 − 2β_1 log|w/z|)/δ) f_ε(w/r) f_η((log r² − β_0 − β_1 log σ_{t−1}²)/δ) f_ε(z) (1/(r²s)) dr dw dz ds.    (4.17)

Substituting (4.17) into (4.5), we can solve for CoVaR^(2)_α(X_{t+1}) numerically when {X_t} follows an ARSV(1) process.

However, given the numerical complexity of this analytic expression, which involves a four-dimensional integral, we do not follow this approach. Instead, we propose a simulation-based computation, introduced in the next section.

4.3 Simulation Methods to Find CoVaR

The tail dependence properties of different models have a strong impact on the joint distribution of the pair (X_t, X_{t+1}). To illustrate the difference in the estimation of CoVaR under the GARCH model and the ARSV model, we focus only on the second definition of CoVaR, as modified by Girardi and Ergün (2013). It can be difficult to find the CoVaR by numerical methods, especially under the second definition, so we introduce a simulation-based method as an alternative. Since CoVaR can be seen as the conditional quantile of X_{t+1} given F_{t−1}, our goal is to sample X̂_{t+1} using information up to time t−1; the empirical quantile of {X̂_{t+1}} then serves as the estimated CoVaR.

For the GARCH process, given F_{t−1} and the estimated volatility σ̂_{t−1} at time t−1, we have σ̂_t² = α_0 + α_1 σ̂_{t−1}² + β_1 X_{t−1}² and X̂_t = σ̂_t ε_t ∼ N(0, σ̂_t²). Therefore, the condition X̂_t ≥ VaR_α(X̂_t) is equivalent to ε_t ≥ VaR_α(ε_t). So we can sample ε_t first, and for those ε_t that are greater than VaR_α(ε_t) we further calculate σ̂_{t+1} as

  σ̂_{t+1} = √(α_0 + α_1 σ̂_t² + β_1 X̂_t²) = √(α_0 + α_1 σ̂_t² + (β_1 σ̂_t²) ε_t²).

Then X̂_{t+1} = σ̂_{t+1} ε_{t+1} can be calculated by sampling another ε_{t+1} from the distribution of {ε_t}. The details of estimating CoVaR under the GARCH process are described in Algorithm 2.

Algorithm 2 Estimating CoVaR by simulation under a GARCH(1,1) process
 1: Initialize α̂_0, α̂_1, β̂_1, σ̂_{t−1}, X_{t−1}, N ≫ 1/(1−α), the cumulative distribution function F_ε of ε, and α ∈ (0,1)
 2: Calculate σ̂_t = √(α̂_0 + α̂_1 σ̂_{t−1}² + β̂_1 X_{t−1}²)
 3: for i = 1 to N do
 4:   Generate p_t from Uniform(α, 1)
 5:   Find ε′_t such that F_ε(ε′_t) = p_t
 6:   Let X̂_t = σ̂_t ε′_t
 7:   Let σ̂_{t+1} = √(α̂_0 + α̂_1 σ̂_t² + β̂_1 X̂_t²)
 8:   Generate p_{t+1} from Uniform(0, 1) and find ε_{t+1} = F_ε^{−1}(p_{t+1})
 9:   Calculate X̂_{t+1,i} = σ̂_{t+1} ε_{t+1}
10:   Save X̂_{t+1,i}
11: end for
12: Find CoVaR_α(X̂_{t+1}) as the α-th quantile of {X̂_{t+1,i}, i = 1, ..., N}
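Algorithm 2 translates almost line for line into code. A vectorized sketch, assuming normal innovations; the parameter and state values passed in the call are illustrative:

    import numpy as np
    from scipy import stats

    def covar_garch_sim(a0, a1, b1, sigma_tm1, x_tm1, alpha, n=200_000, seed=1):
        # Algorithm 2, vectorized: all N draws of X_{t+1} at once
        rng = np.random.default_rng(seed)
        sigma_t = np.sqrt(a0 + a1 * sigma_tm1**2 + b1 * x_tm1**2)
        p_t = rng.uniform(alpha, 1.0, size=n)     # forces eps_t >= VaR_alpha(eps_t)
        eps_t = stats.norm.ppf(p_t)
        x_t = sigma_t * eps_t
        sigma_t1 = np.sqrt(a0 + a1 * sigma_t**2 + b1 * x_t**2)
        x_t1 = sigma_t1 * rng.standard_normal(n)
        return np.quantile(x_t1, alpha)           # alpha-quantile of simulated X_{t+1}

    # Assumed parameters (Scenario 2 of Section 4.4) and an illustrative state
    print(covar_garch_sim(5e-6, 0.85, 0.1, sigma_tm1=0.01, x_tm1=0.015, alpha=0.95))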
For the ARSV(1) model, since the AR process of log σ_t² does not depend on the value of X_t, we need to simulate X_{t+1} in a different way. The idea is to sample η_t and calculate the corresponding σ̂_t following the AR process, and then sample ε_t to calculate X̂_t = σ̂_t ε_t. We repeat these two steps until X̂_t is greater than VaR_α(X_t), which can be estimated from a rolling window of historical data. Then, with this σ̂_t and a newly sampled η_{t+1}, we can calculate σ̂_{t+1} and sample ε_{t+1} to find X̂_{t+1}. The details are described in Algorithm 3.

Algorithm 3 Simulating CoVaR under the ARSV process
 1: Initialize β̂_0, β̂_1, δ̂, the distribution F_ε of ε_t, the distribution F_η of η_t, σ̂_{t−1}, X_{t−1}, α ∈ (0,1), v̂ = VaR_α estimated from historical data, and N ≫ 1/(1−α)
 2: for i = 1 to N do
 3:   repeat
 4:     Sample η_t ∼ F_η
 5:     Let σ̂_t = √exp(β̂_0 + β̂_1 log σ̂_{t−1}² + δ̂η_t)
 6:     Sample ε_t ∼ F_ε
 7:     Calculate X̂_t = σ̂_t ε_t
 8:   until X̂_t ≥ v̂
 9:   Sample η_{t+1} ∼ F_η
10:   Let σ̂_{t+1} = √exp(β̂_0 + β̂_1 log σ̂_t² + δ̂η_{t+1})
11:   Sample ε_{t+1} ∼ F_ε
12:   Calculate X̂_{t+1,i} = σ̂_{t+1} ε_{t+1}
13:   Save X̂_{t+1,i}
14: end for
15: Find CoVaR_α(X_{t+1}) as the α-th quantile of {X̂_{t+1,i} : i = 1, ..., N}
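A direct sketch of Algorithm 3 in code, assuming ε_t ∼ N(0,1) and η_t ∼ t_5 as in Scenario 1 below; the state values in the call are illustrative, and the accept-reject loop makes this noticeably slower than the GARCH version:

    import numpy as np

    def covar_arsv_sim(b0, b1, delta, sigma_tm1, v_hat, alpha,
                       nu_eta=5, n=20_000, seed=2):
        # Algorithm 3: eps ~ N(0,1), eta ~ t_nu; n should be >> 1/(1 - alpha)
        rng = np.random.default_rng(seed)
        h_tm1 = np.log(sigma_tm1**2)
        x_t1 = np.empty(n)
        for i in range(n):
            x_t = -np.inf
            while x_t < v_hat:                 # repeat until X_t >= VaR_alpha
                eta_t = rng.standard_t(nu_eta)
                sigma_t = np.exp(0.5 * (b0 + b1 * h_tm1 + delta * eta_t))
                x_t = sigma_t * rng.standard_normal()
            eta_t1 = rng.standard_t(nu_eta)
            sigma_t1 = np.exp(0.5 * (b0 + b1 * np.log(sigma_t**2) + delta * eta_t1))
            x_t1[i] = sigma_t1 * rng.standard_normal()
        return np.quantile(x_t1, alpha)

    # Assumed parameters (Scenario 1 of Section 4.4) and an illustrative state
    print(covar_arsv_sim(-0.5, 0.95, 0.35, sigma_tm1=0.01, v_hat=0.02, alpha=0.95))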
The results of comparisons of the estimated VaR and CoVaR under the second definition are shown in the next section.

4.4 Comparison of VaR and CoVaR Forecasts under GARCH and ARSV Processes

In this section we present the results of risk forecasts from both simulated data and real data. There are two data generating processes for the simulation study:

1. An ARSV(1) process:

  X_t = σ_t ε_t,
  log σ_t² = −0.5 + 0.95 log σ_{t−1}² + 0.35 η_t,

where ε_t ∼ N(0,1), η_t ∼ t_5, and η_t and ε_t are independent.

2. A GARCH(1,1) process:

  X_t = σ_t ε_t,
  σ_t² = 5×10⁻⁶ + 0.85 σ_{t−1}² + 0.1 X_{t−1}²,

where ε_t ∼ N(0,1).

For the data example, we use the daily log-returns of the S&P 500 Index from 1980 to 1987 to estimate parameters, and forecast risk measures of the daily log-returns from 1988 to 2003.

4.4.1 Simulation Study

Value-at-Risk

In the simulation study, we generate data from both scenarios. The length of each generated dataset is 4500 after the burn-in period. We use the first 1000 observations to estimate the parameters of an ARSV(1) process with a normally distributed first innovation and a Student t distributed second innovation. We then estimate the VaR for the rest of the data using a rolling window of size 1000. Note that when estimating the VaR at time t, we first need to know the volatility at time t−1. However, our inference method cannot estimate σ_{t−1}, since we need to discard the last part of the estimated volatilities in our algorithm. Therefore, we need an external estimator of σ_{t−1} when estimating the VaR. In this project we use the σ_{t−1} estimated by fitting the data in the rolling window to the GARCH model, for two reasons. First, it is more convenient, since we are also estimating the VaR under the GARCH process. Second, based on some empirical analysis, we found that σ_{t−1} estimated by fitting a GARCH(1,1) model is more accurate than estimates from some unbiased but noisy model-free estimators. For example, in this simulated dataset, the root MSE of the estimated σ_{t−1} from the GARCH(1,1) model is √(3.7×10⁻⁵), compared with a root MSE of √(5.0×10⁻⁵) for the 5-day close-to-close estimator.

We first compare the estimated VaR under the ARSV(1) model and the GARCH(1,1) model. The VaR forecasts under the GARCH(1,1) model are estimated using the model-based method (McNeil, Frey, and Embrechts, 2015b), while the VaR forecasts under the ARSV(1) model are estimated by solving the equations of Section 4.1 numerically. The results are shown in Figure 4.1 and Figure 4.2. From the plots it is hard to observe obvious differences between the forecasts estimated under the ARSV(1) model and those under the GARCH(1,1) model. We also present the results of traditional backtests in Table 4.1, and the results of conditional predictive ability tests (Giacomini and White, 2006) in Table 4.2. The scoring function used is the piece-wise linear scoring function suggested by Gneiting (2011).

From Table 4.1 we can see that, at both the 95% level and the 99% level, the VaR forecasts estimated under the ARSV(1) model and those under the GARCH(1,1) model pass the traditional backtests no matter what the true underlying process is. However, Table 4.2 shows that at the 99% level the GARCH(1,1) model has stronger conditional predictive ability than the ARSV(1) model when the true data generating process is a GARCH(1,1) process. Otherwise, there is no significant difference between the forecasts from an ARSV(1) model and those from a GARCH(1,1) model.

CoVaR

The difference is more obvious when we forecast CoVaR under the two models. We first generate a dataset from the ARSV(1) process specified in Scenario 1 at the beginning of Section 4.4. We then estimate the CoVaR under the second definition, modified by Girardi and Ergün (2013), using the simulation methods described in Algorithm 2 and Algorithm 3. The CoVaR forecasts under the GARCH process and the ARSV process are shown in Figure 4.3 and Figure 4.4. It can be observed that the CoVaR forecasts estimated from the GARCH(1,1) model are in general higher than those from the ARSV(1) model. We can also see that the largest differences between the two estimated CoVaR series are highly correlated with large squared returns.

Figure 4.1: Estimated 95% and 99% VaR forecasts for the simulated ARSV(1) process. Black lines represent the simulated daily returns, red lines represent the VaR estimated under the GARCH model, and blue lines represent the VaR under the ARSV model. The left panel illustrates the 95% level; the right panel illustrates the 99% level.

Figure 4.2: Estimated 95% and 99% VaR forecasts for the simulated GARCH(1,1) process. Black lines represent the simulated daily returns, red lines represent the VaR estimated under the GARCH model, and blue lines represent the VaR under the ARSV model. The left panel illustrates the 95% level; the right panel illustrates the 99% level.

Table 4.1: Violation rates and corresponding p-values of likelihood-ratio tests for VaR_α forecasts at the 95% and 99% levels for simulated data. Column names give the underlying process and the confidence level α; row names give the model used to forecast the risk measure.

              ARSV(1)-95%     ARSV(1)-99%     GARCH(1,1)-95%   GARCH(1,1)-99%
  ARSV(1)     4.35% (0.153)   1.19% (0.386)   4.24% (0.074)    0.64% (0.053)
  GARCH(1,1)  5.35% (0.452)   1.05% (0.806)   5.24% (0.585)    1.04% (0.842)

Table 4.2: Mean piece-wise linear scores and corresponding p-values of conditional predictive ability tests for VaR_α forecasts at the 95% and 99% levels for simulated data. Column names give the underlying process and the confidence level α; row names give the model used to forecast the risk measure; the p-values are for the test comparing the two forecast methods.

              ARSV(1)-95%   ARSV(1)-99%   GARCH(1,1)-95%   GARCH(1,1)-99%
  ARSV(1)     9.93×10⁻⁴     2.85×10⁻⁴     1.03×10⁻³        2.763×10⁻⁴
  GARCH(1,1)  9.97×10⁻⁴     2.95×10⁻⁴     1.033×10⁻³       2.756×10⁻⁴
  p-value     0.835         0.237         0.144            0.001
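The piece-wise linear score behind Table 4.2 (and Table 4.4 below) is the standard quantile scoring function, which is consistent for the α-quantile in the sense of Gneiting (2011). A minimal sketch with assumed forecast and return values:

    import numpy as np

    def pinball_score(q, x, alpha):
        # Piece-wise linear score of quantile forecast q for outcome x;
        # consistent for the alpha-quantile, lower mean score is better
        q, x = np.asarray(q, float), np.asarray(x, float)
        return ((x <= q).astype(float) - alpha) * (q - x)

    # Illustrative VaR forecasts scored against realized returns
    scores = pinball_score(q=[0.021, 0.019], x=[0.004, 0.025], alpha=0.95)
    print(scores, scores.mean())

With α = 0.95 and daily-return magnitudes, mean scores of order 10⁻³ to 10⁻⁴ are exactly the scale reported in the tables.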
Figure 4.3: Estimated 95% and 99% CoVaR forecasts for the simulated ARSV(1) process. In the left column, black lines represent the simulated daily returns, red lines represent the CoVaR estimated under the GARCH model, and blue lines represent the CoVaR under the ARSV model. In the right column, the black lines represent the squared returns, and blue beams indicate the top 5% largest differences in estimated CoVaR between the GARCH(1,1) model and the ARSV(1) model. The top panels illustrate the 95% level; the bottom panels illustrate the 99% level.

Figure 4.4: Estimated 95% and 99% CoVaR forecasts for the simulated GARCH(1,1) process. In the left column, black lines represent the simulated daily returns, red lines represent the CoVaR estimated under the GARCH model, and blue lines represent the CoVaR under the ARSV model. In the right column, the black lines represent the squared returns, and blue beams indicate the top 5% largest differences in estimated CoVaR between the GARCH(1,1) model and the ARSV(1) model. The top panels illustrate the 95% level; the bottom panels illustrate the 99% level.

4.4.2 Data Example

To further compare the VaR and CoVaR forecasts under the GARCH model and the ARSV model, we apply the forecasting methods to the daily log-returns of the S&P 500 Index from 1988 to 2003 with a rolling window of size 1000. The forecasting process is the same as in Section 4.4.1, and the results are presented below.

For the VaR forecasts, Figure 4.5 shows no significant difference between the values estimated from the ARSV(1) model and those from the GARCH(1,1) model. Both models provide good forecasts of VaR and pass the traditional backtests (Table 4.3). There is also no significant difference in conditional predictive ability (Table 4.4).

Table 4.3: Violation rates and corresponding p-values of likelihood-ratio tests for VaR_α forecasts at the 95% and 99% levels for daily log-returns of the S&P 500 Index. Column names give the confidence level α; row names give the model used to forecast the risk measure.

              S&P 500-95%     S&P 500-99%
  ARSV(1)     4.96% (0.929)   1.31% (0.109)
  GARCH(1,1)  5.24% (0.345)   1.34% (0.077)

Table 4.4: Mean piece-wise linear scores and corresponding p-values of conditional predictive ability tests for VaR_α forecasts at the 95% and 99% levels for daily log-returns of the S&P 500 Index. Column names give the confidence level α; row names give the model used to forecast the risk measure; the p-values are for the test comparing the two forecast methods.

              S&P 500-95%   S&P 500-99%
  ARSV(1)     1.06×10⁻³     2.85×10⁻⁴
  GARCH(1,1)  1.05×10⁻³     2.79×10⁻⁴
  p-value     0.461         0.506

In Figure 4.6 we show the estimated CoVaR forecasts from the ARSV(1) model and the GARCH(1,1) model. We observe a more obvious difference between the estimated CoVaR forecasts in the data example, as the values estimated under the GARCH(1,1) model are consistently higher than those under the ARSV(1) model. A possible explanation for this consistent difference is that the GARCH(1,1) process is asymptotically tail dependent, whereas the ARSV(1) process with a heavy-tailed second innovation, although it has stronger tail dependence than the classic ARSV(1) model, is still asymptotically tail independent. Recall that CoVaR is the conditional quantile of X_{t+1}. Therefore, when conditioned on X_t ≥ VaR_α(X_t) for some α close to 1, X_{t+1} is more likely to also be large under the GARCH process than under the ARSV process, and so the α-th quantile of the forecasted X_{t+1} is greater under the GARCH process than under the ARSV process.
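The thesis does not spell out which likelihood-ratio test underlies the violation-rate backtests in Tables 4.1 and 4.3; a standard choice is the unconditional-coverage (Kupiec-type) test, sketched here with assumed inputs:

    import numpy as np
    from scipy import stats

    def lr_coverage_test(violations, n, alpha):
        # LR test of H0: violation probability equals 1 - alpha
        p0 = 1.0 - alpha
        p_hat = min(max(violations / n, 1e-12), 1.0 - 1e-12)  # guard log(0)
        def loglik(p):
            return violations * np.log(p) + (n - violations) * np.log(1.0 - p)
        lr = -2.0 * (loglik(p0) - loglik(p_hat))
        return violations / n, stats.chi2.sf(lr, df=1)   # violation rate, p-value

    # e.g. 60 violations over 1000 one-day forecasts at the 95% level
    print(lr_coverage_test(violations=60, n=1000, alpha=0.95))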
Figure 4.5: Estimated 95% and 99% VaR forecasts for daily log-returns of the S&P 500 Index. Black lines represent the daily log-returns, red lines represent the VaR estimated under the GARCH model, and blue lines represent the VaR under the ARSV model. The top panel illustrates the 95% level; the bottom panel illustrates the 99% level.

Figure 4.6: Estimated 95% and 99% CoVaR forecasts for daily log-returns of the S&P 500 Index. In the left column, black lines represent the daily log-returns, red lines represent the CoVaR estimated under the GARCH model, and blue lines represent the CoVaR under the ARSV model. In the right column, the black lines represent the squared returns, and blue beams indicate the top 5% largest differences in estimated CoVaR between the GARCH(1,1) model and the ARSV(1) model. The top panels illustrate the 95% level; the bottom panels illustrate the 99% level.

Chapter 5

Discussion

Empirical evidence suggests that consecutive log-return values may be tail independent. Thus the GARCH process might not be a suitable choice for modeling consecutive losses, as it may over-estimate the potential risk at extremal levels. However, the tail dependence at sub-extremal levels is stronger than what can be modelled by the classic ARSV process with a light-tailed second innovation. Therefore, modelling log-returns using the classic ARSV process may lead to under-estimation of the probability of consecutive large losses.

In this thesis we propose an extension of the ARSV model obtained by taking the second innovation to be Student t distributed. We conjecture that this model exhibits stronger tail dependence at sub-extremal levels than the classic ARSV model with a normally distributed second innovation. Our conjecture is supported via simulation.
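One simple way to probe this conjecture (a sketch only; the thesis's actual evidence rests on the χ̄ measure of Chapter 2) is to simulate long paths under the two choices of second innovation and compare the frequency of consecutive exceedances of a high quantile; under the conjecture, the Student t version should show the higher conditional exceedance rate.

    import numpy as np

    def consec_exceed_rate(eta_sampler, n=200_000, q=0.99,
                           b0=-0.5, b1=0.95, delta=0.35, seed=3):
        # Empirical P(X_{t+1} > u | X_t > u) for a simulated ARSV(1) path,
        # where u is the q-quantile of the path itself
        rng = np.random.default_rng(seed)
        eta = eta_sampler(rng, n)
        h = np.empty(n)
        h[0] = b0 / (1.0 - b1)                 # start at the stationary mean of h_t
        for t in range(1, n):
            h[t] = b0 + b1 * h[t - 1] + delta * eta[t]
        x = np.exp(h / 2.0) * rng.standard_normal(n)
        hit = x > np.quantile(x, q)
        return hit[1:][hit[:-1]].mean()

    print(consec_exceed_rate(lambda rng, n: rng.standard_normal(n)))     # normal eta
    print(consec_exceed_rate(lambda rng, n: rng.standard_t(5, size=n)))  # t_5 eta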
However, most existing inference methods for the ARSV process have limited flexibility in the choice of innovation distributions. Most of these methods require the second innovation to be normal, and the first innovation to be either normal or Student t. Only a few methods allow the second innovation to be heavy-tailed. In this report, we develop a new inference method for the extended ARSV(1) process which allows flexible distributional assumptions on both innovations. This new method works as well as existing methods in estimating the parameters of the classic ARSV(1) model. It can also successfully estimate the parameters of the extended ARSV(1) process we consider, which is out of the reach of most existing methods. Furthermore, it has the potential to be adjusted to other extensions of the classic ARSV(1) process. For example, the same scheme can be applied to estimating the parameters of an ARSV(p) process for any integer p ≥ 1 with a non-Gaussian second innovation, as long as we know how to estimate the parameters of an AR(p) process with the desired non-Gaussian error distribution. We hope this new inference method will provide a useful tool for future study of ARSV models.

We also study the VaR and CoVaR risk measures under the ARSV process and compare them with those under the GARCH process. We show that there is no significant difference between the VaR estimates under the GARCH process and the ARSV process. However, we observe a large difference between the CoVaR estimates under these two processes. This difference reveals the impact of the tail dependence properties implied by a chosen model.

For future work, we first need to prove that χ̄ covers the full spectrum of sub-extremal tail dependence, i.e., 0 < χ̄ < 1, for the extended model. With the support of a more grounded theory, we also need to improve the computational efficiency. Currently our new inference method is much slower than existing inference methods that similarly require an MCMC scheme; in many cases, the computing speed of the proposed method is up to 100 times slower. Another shortcoming of the new method is that it only provides parameter estimates, while volatility estimates can only be obtained for a sub-period of the historical record.

Bibliography

T. Adrian and M. K. Brunnermeier. CoVaR. Technical report, National Bureau of Economic Research, 2011.

B. Basrak, R. A. Davis, and T. Mikosch. Regular Variation of GARCH Processes. Stochastic Processes and Their Applications, 99:95–115, 2002.

T. Bollerslev. Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31:307–327, 1986.

F. J. Breidt and R. A. Davis. Extremes of Stochastic Volatility Models. Annals of Applied Probability, 8:664–675, 1998.

C. Broto and E. Ruiz. Estimation Methods for Stochastic Volatility Models: A Survey. Journal of Economic Surveys, 18:613–649, 2004.

S. Coles, J. Heffernan, and J. A. Tawn. Dependence Measures for Extreme Value Analyses. Extremes, 2:339–365, 1999.

H. Drees, J. Segers, and M. Warchoł. Statistics for Tail Processes of Markov Chains. Extremes, 18:369–402, 2015.

R. F. Engle. Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica: Journal of the Econometric Society, 50:987–1007, 1982.

M. Fridman and L. Harris. A Maximum Likelihood Approach for Non-Gaussian Stochastic Volatility Models. Journal of Business & Economic Statistics, 16:284–291, 1998.

R. Giacomini and H. White. Tests of Conditional Predictive Ability. Econometrica, 74:1545–1578, 2006.

G. Girardi and A. T. Ergün. Systemic Risk Measurement: Multivariate GARCH Estimation of CoVaR. Journal of Banking & Finance, 37:3169–3180, 2013.

T. Gneiting. Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106:746–762, 2011.

G. K. Grunwald, R. J. Hyndman, L. Tedesco, and R. L. Tweedie. Theory & Methods: Non-Gaussian Conditional Linear AR(1) Models. Australian & New Zealand Journal of Statistics, 42:479–495, 2000.

A. Harvey, E. Ruiz, and N. Shephard. Multivariate Stochastic Variance Models. The Review of Economic Studies, 61:247–264, 1994.

J. B. Hill. Extremal Memory of Stochastic Volatility with an Application to Tail Shape Inference. Journal of Statistical Planning and Inference, 141:663–676, 2011.

E. Jacquier, N. G. Polson, and P. E. Rossi. Bayesian Analysis of Stochastic Volatility Models. Journal of Business & Economic Statistics, 12:69–87, 1994.

E. Jacquier, N. G. Polson, and P. E. Rossi. Bayesian Analysis of Stochastic Volatility Models with Fat-tails and Correlated Errors. Journal of Econometrics, 122:185–212, 2004.

A. Janssen and H. Drees. A Stochastic Volatility Model with Flexible Extremal Dependence Structure. Bernoulli, 22:1448–1490, 2016.

H. Joe. Multivariate Models and Multivariate Dependence Concepts. CRC Press, 1997.

G. Kastner. stochvol: Efficient Bayesian Inference for Stochastic Volatility (SV) Models, 2016. URL https://CRAN.R-project.org/package=stochvol. R package version 1.3.1.

G. Kastner and S. Frühwirth-Schnatter. Ancillarity-sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models. Computational Statistics & Data Analysis, 76:408–423, 2014.

F. Laurini and J. A. Tawn. Regular Variation and Extremal Dependence of GARCH Residuals with Application to Market Risk Measures. Econometric Reviews, 28:146–169, 2008.
A. W. Ledford and J. A. Tawn. Statistics for Near Independence in Multivariate Extreme Values. Biometrika, 83:169–187, 1996.

A. W. Ledford and J. A. Tawn. Diagnostics for Dependence within Time Series Extremes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):521–543, 2003.

Y. Liu and J. A. Tawn. Volatility Model Selection for Extremes of Financial Time Series. Journal of Statistical Planning and Inference, 143:520–530, 2013.

A. J. McNeil, R. Frey, and P. Embrechts. Quantitative Risk Management, 2015a. p. 133.

A. J. McNeil, R. Frey, and P. Embrechts. Quantitative Risk Management, 2015b. p. 48.

T. Mikosch and C. Starica. Limit Theory for the Sample Autocorrelations and Extremes of a GARCH(1,1) Process. Annals of Statistics, 28:1427–1451, 2000.

K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

G. O. Roberts and J. S. Rosenthal. Harris Recurrence of Metropolis-within-Gibbs and Trans-dimensional Markov Chains. The Annals of Applied Probability, 16:2123–2139, 2006.
