"Science, Faculty of"@en . "Statistics, Department of"@en . "DSpace"@en . "UBCV"@en . "Xie, Yijun"@en . "2017-04-20T22:24:44Z"@en . "2017"@en . "Master of Science - MSc"@en . "University of British Columbia"@en . "The Autoregressive Stochastic Volatility (ARSV) model is a discrete-time stochastic volatility model that can model the financial returns time series and volatilities. This model is relevant for risk management. However, existing inference methods have various limitations on model assumptions. In this report we discuss a new inference method that allows flexible model assumption for innovation of the ARSV\r\n model. We also present the application of ARSV model to risk management, and\r\n compare the ARSV model with another commonly used model for financial time\r\n series, namely the GARCH model."@en . "https://circle.library.ubc.ca/rest/handle/2429/61313?expand=metadata"@en . "A Flexible Inference Method for an AutoregressiveStochastic Volatility Model with an Application to RiskManagementbyYijun XieB.Sc., University of Notre Dame du Lac, 2015A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Statistics)The University of British Columbia(Vancouver)April 2017\u00C2\u00A9 Yijun Xie, 2017AbstractThe Autoregressive Stochastic Volatility (ARSV) model is a discrete-time stochas-tic volatility model that can model the financial returns time series and volatilities.This model is relevant for risk management. However, existing inference methodshave various limitations on model assumptions. In this report we discuss a new in-ference method that allows flexible model assumption for innovation of the ARSVmodel. We also present the application of ARSV model to risk management, andcompare the ARSV model with another commonly used model for financial timeseries, namely the GARCH model.iiPrefaceThis dissertation is original, unpublished work by the author, Yijun Xie. Thedataset used in this thesis is downloaded from Yahoo Finance (https://finance.yahoo.com/).iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.1 Measures of Tail Dependence . . . . . . . . . . . . . . . . . . . . 52.2 Volatility Models . . . . . . . . . . . . . . . . . . . . . . . . . . 72.2.1 GARCH Process . . . . . . . . . . . . . . . . . . . . . . 72.2.2 ARSV Process . . . . . . . . . . . . . . . . . . . . . . . 83 Inference for ARSV Model . . . . . . . . . . . . . . . . . . . . . . . 153.1 Review of Existing Methods . . . . . . . . . . . . . . . . . . . . 153.2 New Inference Method . . . . . . . . . . . . . . . . . . . . . . . 173.2.1 Proposing Step . . . . . . . . . . . . . . . . . . . . . . . 18iv3.2.2 Classic Metropolis-Hastings Algorithm . . . . . . . . . . 193.2.3 Metropolis-within-Gibbs . . . . . . . . . . . . . . . . . . 
    3.2.4 Parameter Estimation
    3.2.5 Discussion on the Flexibility
    3.2.6 Algorithm
  3.3 Comparison of Inference Methods
    3.3.1 Parameter Estimation for Simulated Data
    3.3.2 Parameter Estimation for the S&P 500 Index
4 CoVaR with ARSV
  4.1 Value-at-Risk Forecasting under the ARSV Model
  4.2 CoVaR Forecasting: GARCH Model vs ARSV Model
    4.2.1 First Definition of CoVaR
    4.2.2 Second Definition of CoVaR
  4.3 Simulation Methods to Find CoVaR
  4.4 Comparison of VaR and CoVaR Forecasts
    4.4.1 Simulation Study
    4.4.2 Data Example
5 Discussion
Bibliography

List of Tables

Table 3.1  An example of estimating parameters of an ARSV(1) process with εt ∼ N(0,1) and ηt ∼ standardized t5.
Table 3.2  Median of estimated parameters for simulated data with εt ∼ N(0,1) and ηt ∼ N(0,1).
Table 3.3  Median of estimated parameters for simulated data with εt ∼ N(0,1) and ηt ∼ standardized t5.
Table 3.4  Median of estimated parameters for simulated data with εt ∼ standardized t5 and ηt ∼ standardized t5.
Table 3.5  First example of incorrect model specification.
Table 3.6  Second example of incorrect model specification.
Table 3.7  Summary statistics of estimated parameters with the model assumption that both innovations are normally distributed.
Table 3.8  Summary statistics of estimated parameters with the model assumption that the first innovation distribution follows a Student t distribution.
Table 4.1  Violation rates and corresponding p-values of likelihood-ratio tests for VaRα forecasts at 95% and 99% levels for simulated data.
Table 4.2  Mean piece-wise linear scores and corresponding p-values of conditional predictive ability tests for VaRα forecasts at 95% and 99% levels for simulated data.
Table 4.3  Violation rates and corresponding p-values of likelihood-ratio tests for VaRα forecasts at 95% and 99% levels for daily log-returns of the S&P 500 Index.
Table 4.4  Mean piece-wise linear scores and corresponding p-values of conditional predictive ability tests for VaRα forecasts at 95% and 99% levels for daily log-returns of the S&P 500 Index.

List of Figures

Figure 1.1  Stylized facts of financial data – 1.
Figure 1.2  Stylized facts of financial data – 2.
Figure 2.1  Structure of GARCH process.
Figure 2.2  Structure of ARSV process.
Figure 2.3  η plot for simulated processes.
Figure 2.4  χ/χ̄-plot for simulated processes.
Figure 2.5  Quantile plots and χ/χ̄ plots of the negative daily log-returns of the S&P 500 Index from 2000-01-03 to 2016-10-26.
Figure 3.1  Path of β0 estimates when only implementing the Gibbs sampler.
Figure 3.2  Illustration of Markov chain values of parameters after each iteration.
Figure 3.3  Distribution of estimated parameters based on simulated data with εt ∼ N(0,1) and ηt ∼ N(0,1).
Figure 3.4  Distribution of estimated parameters based on simulated data with εt ∼ N(0,1) and ηt ∼ standardized t5.
Figure 4.1  Estimated 95% and 99% VaR forecasts for the simulated ARSV(1) process.
Figure 4.2  Estimated 95% and 99% VaR forecasts for the simulated GARCH(1,1) process.
Figure 4.3  Estimated 95% and 99% CoVaR forecasts for the simulated ARSV(1) process.
Figure 4.4  Estimated 95% and 99% CoVaR forecasts for the simulated GARCH(1,1) process.
Figure 4.5  Estimated 95% and 99% VaR forecasts for daily log-returns of the S&P 500 Index.
Figure 4.6  Estimated 95% and 99% CoVaR forecasts for daily log-returns of the S&P 500 Index.

List of Algorithms

1  Flexible Inference for the ARSV Model
2  Estimating CoVaR Using Simulation under a GARCH(1,1) Process
3  Simulating CoVaR under the ARSV Process

Acknowledgments

First and foremost, I would like to express my sincere gratitude to my supervisor, Professor Natalia Nolde, for her patience, insightfulness, and immense knowledge. Without her guidance I would never have been able to finish this thesis. It is my fortune and honor to work under her tutelage.

Besides my advisor, I would like to thank the Department of Statistics at UBC for the supportive environment. Thank you to everyone in my office for the office dinners and all the wonderful times we spent together. Thank you to Ho Yin Ho and Tingting Zhao for your help and great advice while I was writing the thesis, and for being my lunch and dinner buddies. Special thanks to my best friend Mengtian Zhao and her cat King for being my emotional support whenever I felt depressed, even though you are literally 2000 miles away.

My sincere thanks also go to my second reader, Professor Harry Joe, who generously spent time reading and critiquing my work.

Last but not least, I would like to thank my parents, Zhaoliang Xie and Huaping Xie, for their unconditional love.

Chapter 1  Introduction

Modeling financial return time series data is an essential component in a wide range of problems in finance, including risk management, portfolio optimization, and the pricing and hedging of financial risks.
In order to develop a realistic model for time series of financial returns, it is important to first identify key features that such data tend to exhibit. These key features are usually referred to as the stylized facts (McNeil, Frey, and Embrechts, 2015a). The most widely acknowledged among them are the following:

1. The conditional expectation of financial returns is close to zero (see Fig. 1.1, left panel);
2. The conditional standard deviation, known as the volatility, varies over time, with large values having a tendency to cluster (see Fig. 1.1, left panel);
3. The marginal distribution of the return series has heavier tails than the normal distribution (see Fig. 1.1, right panel);
4. There is little serial correlation between returns (see Fig. 1.2, left panel), but the squared or absolute values of returns show strong correlation (see Fig. 1.2, right panel).

As the conditional expectation is usually close to zero, it is the volatility that has the dominant effect on the dynamics of a financial return series. For this reason, models for financial returns are often called volatility models. Several types of volatility models have been proposed in the literature. Among these, arguably the most famous and widely used is the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) process (Engle (1982), Bollerslev (1986)). In this report, however, we discuss another type of volatility model, the Autoregressive Stochastic Volatility (ARSV) process. Both of these processes can capture the stylized facts above, but they exhibit different extremal dependence properties for consecutive observations: returns modeled by a GARCH process have the property of tail dependence, while those modeled by an ARSV process are tail independent. A detailed discussion of tail dependence and independence, as well as of these two volatility models, is provided in Chapter 2.

Figure 1.1: The left panel shows the daily log-returns for the S&P 500 Index from January 1, 2000 to October 26, 2016. Large values of log-returns are clustered. The right panel shows the normal quantile plot of this log-return series, which has heavier tails than a normal distribution.

There is empirical evidence (Drees, Segers, and Warchoł, 2015) suggesting that for some financial time series, returns over consecutive periods are likely to be tail independent. That is, extreme values at consecutive time points are independent and hence tend not to co-occur. However, there should be stronger dependence among large but not extremal observations than the classic ARSV model implies. Janssen and Drees (2016) propose a variation of the classic ARSV model, and show that this new class of ARSV model, which has a special form of heavy-tailed second innovation, has stronger tail dependence than the classic ARSV model while remaining tail independent for extremal observations.

Figure 1.2: The left and right panels show the autocorrelation function (ACF) for the log-return series and the squared log-returns of the same dataset as in Figure 1.1.

Inspired by their ideas, in this report we consider another extension of the classic ARSV model, which also has a heavy-tailed second innovation and a light-tailed first innovation. We conjecture that this model is also tail independent for extremal observations, but has stronger dependence for sub-extremal observations than the classic ARSV model. The conjecture is supported via simulations.

However, the inference for ARSV models is notoriously difficult.
Most of the current inference methods for ARSV models require the assumption that both innovations are normally distributed. They can therefore be applied to the classic ARSV model, but not to the extended model we focus on in this report. We propose a new inference method for ARSV models which allows arbitrary choices of the distributions of both innovations. We show that this new method works as well as traditional methods for inference in the classic ARSV model, while it can accurately estimate parameters where traditional methods fail. This new approach is discussed in Chapter 3.

One of the most important applications of volatility models is measuring risk for risk management purposes. Volatility models are also used by financial market regulators to set capital requirements for financial institutions. A correctly specified and flexible volatility model can help both practitioners and regulators accurately capture features of market data, and hence measure risk more precisely. This is paramount for the stability of the financial system. In Chapter 4 we discuss two risk measures: Value-at-Risk (VaR) and Conditional Value-at-Risk (CoVaR). Estimation methods for VaR under the GARCH model are already well studied. In this report, we develop estimation methods for CoVaR under the GARCH model, as well as for VaR and CoVaR under the proposed ARSV model. We then compare risk measure estimates under the GARCH and ARSV models.

In Chapter 5, we provide a discussion of the results and an outlook for future research.

Chapter 2  Background

In this chapter, we first discuss the property of tail dependence. We also briefly introduce two volatility models, namely the GARCH process and the ARSV process. Our project is motivated by the different tail dependence properties of these two models.

2.1 Measures of Tail Dependence

A question for practitioners of risk management is the following: given a large loss today, how likely is it to experience a large loss tomorrow, or h days into the future? To answer this question, we need to understand the ideas of tail dependence and tail independence. Different models for financial time series have different tail dependence properties, and it is important to choose a model whose tail dependence property is close to that of the real data.

Consider two random variables Y and Z with continuous distribution functions F_Y and F_Z, respectively. The upper tail dependence coefficient between Y and Z is defined as (Joe, 1997)

\[
\chi = \lim_{u\to 1} P\big(Z > F_Z^{-1}(u) \mid Y > F_Y^{-1}(u)\big)
     = \lim_{u\to 1} \frac{P\big(Z > F_Z^{-1}(u),\ Y > F_Y^{-1}(u)\big)}{P\big(Y > F_Y^{-1}(u)\big)}, \qquad (2.1)
\]

provided the limit exists. It is the limit of the ratio between the probability of jointly extreme observations of Y and Z and the probability of an extreme observation in one of the variables. The lower tail dependence coefficient is defined similarly.

If χ = 0, then Y and Z are said to be tail independent. On the other hand, when 0 < χ ≤ 1, Y and Z are called (upper) tail dependent, and the value of χ measures how strong the tail dependence is.

However, χ is informative about the strength of the tail dependence only when it is greater than zero. When χ = 0, we cannot tell how fast the conditional probability in (2.1) approaches zero as u → 1.
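The definition in (2.1) can be examined empirically at levels u close to 1. The following is a minimal R sketch (not part of the original text; `y` and `z` are assumed to be observed samples of Y and Z, for example consecutive returns):

    # Empirical version of (2.1): relative frequency of joint exceedances
    chi_hat <- function(y, z, u) {
      qy <- quantile(y, u)                 # empirical u-quantile of Y
      qz <- quantile(z, u)                 # empirical u-quantile of Z
      mean(y > qy & z > qz) / mean(y > qy)
    }
    # e.g. lagged pairs of a return series x:
    # chi_hat(x[-length(x)], x[-1], u = 0.99)

For a tail independent pair, chi_hat should decay towards zero as u increases towards 1.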
We need a more refined measure that captures the speed at which this probability approaches zero. First, we call a function L on (0,∞) slowly varying at ∞ if

\[
\lim_{x\to\infty}\frac{L(tx)}{L(x)} = 1, \qquad t > 0,
\]

while a function h on (0,∞) is called regularly varying at ∞ with index p if

\[
\lim_{x\to\infty}\frac{h(tx)}{h(x)} = t^{p}, \qquad t > 0.
\]

Let us define p(t) = P(Z > F_Z^{-1}(1 − 1/t), Y > F_Y^{-1}(1 − 1/t)), and suppose p(t) is regularly varying with index −1/η for some η ∈ (0,1]. Then η is called the residual tail dependence coefficient (Ledford and Tawn, 1996).

Ledford and Tawn (1996) show that if η < 1, then χ = 0 and hence Y and Z are tail independent; η measures the speed of convergence to tail independence. We can further define (Coles, Heffernan, and Tawn, 1999)

\[
\bar\chi = 2\eta - 1,
\]

where −1 < χ̄ ≤ 1. When 1/2 < η ≤ 1, or 0 < χ̄ ≤ 1, Y and Z are non-negatively dependent, and when η = 1/2, or χ̄ = 0, Y and Z are exactly tail independent (Coles, Heffernan, and Tawn, 1999; Ledford and Tawn, 2003).

Now we have the pair (χ, χ̄) to describe the extremal dependence. When χ > 0 and χ̄ = 1, the two random variables are tail dependent, and the value of χ measures the strength of the dependence of this pair. If χ = 0 and −1 < χ̄ < 1, the two random variables are tail independent, and the value of χ̄ measures the strength of dependence of this pair.

Understanding the type of tail dependence structure is very important for choosing good models. Let {Xt} denote a process of financial log-returns with losses in the upper tail. As discussed in Chapter 1, financial data often suggest tail independence between consecutive returns. In other words, although high volatilities tend to be persistent, extremal levels of returns should be tail independent (Laurini and Tawn, 2008), so that P(X_t > F_{X_t}^{-1}(u) | X_{t-1} > F_{X_{t-1}}^{-1}(u)) → 0 as u → 1. A tail dependent process may lead to an overestimation of the potential loss. On the other hand, Janssen and Drees (2016) suggest that a good model for returns should be tail independent, while retaining stronger tail dependence at sub-extremal levels than the exactly tail independent case. In other words, we are looking for a stochastic volatility process such that, for the pair (X_{t-1}, X_t), χ = 0 and 0 < χ̄ < 1.
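Empirical estimates of the pair (χ, χ̄) over a range of quantile levels can be obtained with the evd package, which is also the package used for the χ/χ̄ plots later in this chapter (see the footnote in Section 2.2.2). A minimal sketch, assuming `x` is a vector of (negative) log-returns:

    library(evd)
    pairs_lag1 <- cbind(x[-length(x)], x[-1])  # consecutive pairs (X_{t-1}, X_t)
    chiplot(pairs_lag1)                        # empirical chi(u) and chi-bar(u)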
2.2 Volatility Models

2.2.1 GARCH Process

One of the most commonly discussed volatility models is the generalized autoregressive conditionally heteroscedastic (GARCH) model. The GARCH family was first proposed by Engle (1982) and Bollerslev (1986). A GARCH(p,q) process for {X_t, t = 1, 2, ..., T} is defined as follows:

\[
X_t = \sigma_t \varepsilon_t, \qquad
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i \sigma_{t-i}^2 + \sum_{j=1}^{q} \beta_j X_{t-j}^2,
\]

where {εt} is a strict white noise process with zero mean and unit variance, α0 > 0, αi ≥ 0 for i = 1, ..., p, and βj ≥ 0 for j = 1, ..., q. In order to achieve stationarity, we also assume that the αi and βj sum to less than 1. In financial statistics, X_t is usually the return on the financial asset observed at time t, and σt is the conditional standard deviation, or the volatility, at time t, which generally cannot be directly observed.

Let F_t denote the sigma algebra of all available information up to time t. Then, for the GARCH model, the volatility σt is F_{t-1} measurable. In other words, given the information up to time t−1, σt is known.

The empirical evidence suggests that volatilities are clustered: when high volatility occurs at time t−1, large volatility is more likely to occur again at time t. For a GARCH(p,q) model, the conditional squared volatility is a linear combination of previous squared returns and squared volatilities, so a large absolute value of X_{t-h}, h = 1, ..., q, or a large value of σ_{t-h}, h = 1, ..., p, leads to a large value of σt. Therefore, we can observe the clustering of large volatilities in a GARCH model. For simplicity, we only consider the GARCH(1,1) process, which in practice is believed to be sufficient to model the volatility process most of the time.

Mikosch and Starica (2000) and Basrak, Davis, and Mikosch (2002) show that for GARCH models,

\[
\lim_{x\to\infty} P(X_h > x \mid X_0 > x) > 0,
\]

i.e., η = 1 and (X_h, X_0) are tail dependent. Laurini and Tawn (2008) suggest declustering returns over a high threshold to remove the tail dependence. However, we can also look for an alternative model such that the series {X_t} is tail independent while volatility clustering is preserved.
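For concreteness, the following is a minimal R sketch of simulating a GARCH(1,1) path under the convention used above (α1 multiplies the lagged squared volatility and β1 the lagged squared return); the default parameter values are those used later for Figures 2.3–2.4, and the helper name is an illustrative assumption:

    sim_garch11 <- function(T, alpha0 = 1e-6, alpha1 = 0.8, beta1 = 0.1) {
      x <- numeric(T); sig2 <- numeric(T)
      sig2[1] <- alpha0 / (1 - alpha1 - beta1)  # start at the stationary variance
      x[1] <- sqrt(sig2[1]) * rnorm(1)
      for (t in 2:T) {
        sig2[t] <- alpha0 + alpha1 * sig2[t - 1] + beta1 * x[t - 1]^2
        x[t] <- sqrt(sig2[t]) * rnorm(1)
      }
      list(x = x, sigma2 = sig2)
    }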
2.2.2 ARSV Process

An alternative to the GARCH model is the autoregressive stochastic volatility (ARSV) model, first studied by Harvey, Ruiz, and Shephard (1994) and Jacquier, Polson, and Rossi (1994), among many others. We focus on the simplest ARSV(1) model, which is defined as

\[
X_t = \sigma_t \varepsilon_t, \qquad
\log(\sigma_t^2) = \beta_0 + \beta_1 \log(\sigma_{t-1}^2) + \delta \eta_t,
\]

where {εt} and {ηt} are two independent strict white noise processes with zero mean and unit variance, and δ > 0 is a constant that adjusts the standard deviation of the second innovation. β0 and β1 are the coefficients of the log-volatility process. For the ARSV(1) model, 0 < β1 < 1 is required for the process to be stationary.

The difference between a GARCH model and an ARSV model lies in the volatility process. For a GARCH model, σt is F_{t-1} measurable, as mentioned above. However, the volatility process of an ARSV model contains a second innovation term ηt. Thus, after conditioning on all information up to t−1, σt is still a random variable.

The tail dependence properties of the ARSV(1) model are studied by Breidt and Davis (1998) and Hill (2011). They show that for either normally distributed or heavy-tailed εt, the extremes of X_t are independent. Liu and Tawn (2013) suggest that the difference comes from the source of clustering. For the GARCH model, the components of previous volatilities have a negligible effect on the current volatility when the αi (i ≥ 1) are small; however, the return process and the volatility process are interconnected. Therefore, an extremal return observed at time t−1 leads to a large volatility value at time t, and hence a large probability of observing an extremal return at time t. This is illustrated in Figure 2.1.

Figure 2.1: Structure of GARCH process.

However, for the ARSV model, the log-volatility process evolves independently of the observed values of the return process, while the return is the realization of the current volatility times a noise term. Therefore, X_{t-1} and σt are independent given σ_{t-1}, and a large volatility value at time t−1 does not necessarily lead to a large return value at time t.

Figure 2.2: Structure of ARSV process.
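A matching R sketch of simulating the (extended) ARSV(1) process is given below; the default nu = 5 gives the standardized Student t second innovation used for Figures 2.3–2.4, while nu = Inf recovers the classic model with a normal second innovation. The helper name and defaults are illustrative assumptions.

    sim_arsv1 <- function(T, beta0 = -0.5, beta1 = 0.95, delta = 0.35, nu = 5) {
      h <- numeric(T)
      h[1] <- beta0 / (1 - beta1)                  # stationary mean of log sigma_t^2
      eta <- if (is.finite(nu)) rt(T, df = nu) / sqrt(nu / (nu - 2)) else rnorm(T)
      for (t in 2:T) h[t] <- beta0 + beta1 * h[t - 1] + delta * eta[t]
      x <- exp(h / 2) * rnorm(T)                   # X_t = sigma_t * eps_t
      list(x = x, h = h)
    }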
For the extended ARSV(1) processes,the empirical value of \u00CF\u0087 is not significantly larger than 0 but the empirical value1Here the empirical values of \u00CF\u0087 and \u00CF\u0087\u00C2\u00AF are estimated using the evd package, which calculatesboth values using approximation method.10of \u00CF\u0087\u00C2\u00AF is significantly larger than 0, which implies that (Xt\u00E2\u0088\u00921, Xt) have stronger taildependence than product of margins.We further compare the simulated data with an actual financial time series. Weestimate volatilities of the S&P 500 Index from 2000-01-03 to 2016-10-26 withsome unbiased estimators, and then fit the log squared volatilities to an AR(1)process. The top left panel of Figure 2.5 illustrates the quantile plot of residuals ofthis AR(1) process against a normal distribution, and the top right panel illustratesthe quantile plot against a Student\u00E2\u0080\u0099s t distribution with 3 degrees of freedom. Thesetwo plots hint that a heavy-tailed second innovation \u00CE\u00B7t might be a more suitablechoice. Also, we present the \u00CF\u0087/\u00CF\u0087\u00C2\u00AF plots of the negative returns of this dataset.Comparing the bottom two panels of Figure 2.5 with all panels in Figure 2.4, wecan also conjecture that the \u00CF\u0087/\u00CF\u0087\u00C2\u00AF plots from extended ARSV(1) process with thesecond innovation Student\u00E2\u0080\u0099s t distributed are the closest ones to those from the realdata. Thus we might prefer to model the financial return series using the extendedARSV(1) model.11Figure 2.3: Plot of \u00CE\u00B7 for a simulated GARCH(1,1) process (top panel), asimulated classic ARSV(1) process (middle panel), and a simulated ex-tended ARSV(1) process with first innovation distribution normal andsecond innovation distribution Student\u00E2\u0080\u0099s t (bottom panel).12Figure 2.4: \u00CF\u0087/\u00CF\u0087\u00C2\u00AF-plot for a simulated GARCH(1,1) process (top panels), asimulated classic ARSV(1) process (middle panels), and a simulatedextended ARSV(1) process with first innovation distribution normal andsecond innovation distribution Student\u00E2\u0080\u0099s t (bottom panels).13Figure 2.5: Quantile plots and \u00CF\u0087/\u00CF\u0087\u00C2\u00AF plots of the negative daily log-returns ofS&P 500 Index from 2000-01-03 to 2016-10-26.14Chapter 3Inference for AutoregressiveStochastic Volatility ModelMany efforts have been made to develop inference methods for the traditionalARSV model with both innovation distributions as normal. In this section, wefirst briefly review the methods that inspire our new inference methods.However, these existing methods cannot be used to estimate parameters forthe extension of the ARSV model discussed at the end of last section. Therefore,we propose a new approach that allows flexible choices of both innovation distri-butions of the ARSV model. This new method can work as well as the existingmethods for the traditional ARSV model, and can provide good parameter esti-mates when existing methods fail. We describe the details of the new method inthis section. We also compare the results from this new method with those fromthe existing inference method.3.1 Review of Existing MethodsIn this section we review inference methods implementing the full Bayesian ap-proach with pre-specified prior distributions. Jacquier, Polson, and Rossi (1994)propose a cyclic MCMC approach for the classic ARSV(1) model with both inno-15vations normally distributed. 
Chapter 3  Inference for the Autoregressive Stochastic Volatility Model

Many efforts have been made to develop inference methods for the traditional ARSV model with both innovation distributions normal. In this chapter, we first briefly review the methods that inspired our new inference method.

These existing methods, however, cannot be used to estimate parameters for the extension of the ARSV model discussed at the end of the last chapter. We therefore propose a new approach that allows flexible choices of both innovation distributions of the ARSV model. This new method works as well as the existing methods for the traditional ARSV model, and can provide good parameter estimates when existing methods fail. We describe the details of the new method in this chapter, and compare its results with those of an existing inference method.

3.1 Review of Existing Methods

In this section we review inference methods implementing the full Bayesian approach with pre-specified prior distributions. Jacquier, Polson, and Rossi (1994) propose a cyclic MCMC approach for the classic ARSV(1) model with both innovations normally distributed. Consider the model

\[
X_t = \sigma_t \varepsilon_t, \qquad
\log \sigma_t^2 = \beta_0 + \beta_1 \log \sigma_{t-1}^2 + \delta \eta_t, \qquad
(\varepsilon_t, \eta_t) \sim N(0, I_2),
\]

where I_2 is a two-dimensional identity matrix. Let ω denote the set of parameters (β0, β1, δ). An inverse gamma distribution p(δ) ∝ exp(−ν0 s0²/(2δ²))/δ^{ν0+1} is chosen as the prior distribution for δ, where (ν0, s0) are two hyperparameters. The prior distributions β0 ∼ N(0,100) and β1 ∼ N(0,10) are independent and essentially flat. The algorithm proposed by Jacquier, Polson, and Rossi (1994) includes two stages:

1. Sample the parameters ω: p(ω | log σ²) is the posterior from a linear regression, so direct draws can be made.

2. Sample σ²_t from

\[
p(\sigma_t^2 \mid \sigma_{t-1}^2, \sigma_{t+1}^2, \omega, x_t)
\propto f_X(x_t \mid \sigma_t^2)\, f_a(\sigma_t^2 \mid \sigma_{t-1}^2)\, f_b(\sigma_{t+1}^2 \mid \sigma_t^2)
\propto \frac{1}{\sigma_t}\exp\!\left(-\frac{x_t^2}{2\sigma_t^2}\right) \times \frac{1}{\sigma_t^2}\exp\!\left(-\frac{(\log\sigma_t^2-\mu_t)^2}{2\,\delta^2/(1+\beta_1^2)}\right),
\]

where μ_t = (β0(1−β1) + β1(log σ²_{t+1} + log σ²_{t−1}))/(1+β1²); f_X is the probability density function of X_t conditioned on σ²_t; and f_a and f_b are the conditional probability density functions of σ²_t given σ²_{t−1} and of σ²_{t+1} given σ²_t, respectively.

Jacquier, Polson, and Rossi (2004) extend the method above to deal with the case where the first innovation εt follows a Student t distribution with ν degrees of freedom. They treat the heavy-tailed εt as a scale mixture of an inverse gamma distribution and a normal distribution. This allows us to write X_t = σ_t √λ_t Z_t, where Z_t follows the standard normal distribution and λ_t follows an inverse gamma distribution IG(ν/2, 2/ν), or equivalently ν/λ_t ∼ χ²_ν. Let us denote X*_t = X_t/√λ_t. Then we can replace {x_t, t = 1, ..., T} with {x*_t, t = 1, ..., T}, and sample σ² and ω in a similar way as described above.
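A quick numerical sanity check of this representation (a sketch, not part of the original algorithm): drawing λ_t via ν/λ_t ∼ χ²_ν and forming √λ_t Z_t should reproduce a Student t distribution with ν degrees of freedom.

    nu <- 5; n <- 1e5
    lambda <- nu / rchisq(n, df = nu)   # equivalent to the inverse gamma draws
    x_mix <- sqrt(lambda) * rnorm(n)    # scale mixture of normals
    qqplot(rt(n, df = nu), x_mix); abline(0, 1)  # points should lie on the diagonal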
Jacquier, Polson, and Rossi (2004) choose a uniform discrete prior on [3,40] for ν, and the prior distributions of the other parameters are the same as above. The algorithm includes three stages:

1. Sample from the posterior distribution p(σ², ω | λ, ν, x*) using the algorithm in Jacquier, Polson, and Rossi (1994).

2. Sample λ from the posterior distribution

\[
\lambda_t \mid x_t, \sigma_t^2, \nu \;\sim\; IG\!\left(\frac{\nu+1}{2},\; \frac{2}{x_t^2/\sigma_t^2+\nu}\right).
\]

3. Sample ν from the posterior distribution

\[
p(\nu \mid \sigma^2, x, \omega) \propto p(\nu)\prod_{t=1}^{T}\frac{\nu^{\nu/2}\,\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\Gamma\!\left(\frac{1}{2}\right)}\left(\nu+\frac{x_t^2}{\sigma_t^2}\right)^{-(\nu+1)/2},
\]

where p(ν) is the prior distribution of ν.

Kastner and Frühwirth-Schnatter (2014) propose another method based on log X²_t instead of X_t. Taking squares and then logarithms of X_t gives

\[
\log X_t^2 = \log \sigma_t^2 + \log \varepsilon_t^2,
\]

where log ε²_t can be approximated by a mixture of normal distributions. They further implement the ancillarity-sufficiency interweaving strategy to sample in the [log X², log σ², ω] space. This method is implemented in the stochvol package. Broto and Ruiz (2004) provide a more comprehensive and detailed review of other approaches for the classic ARSV model.

3.2 New Inference Method

The methods discussed in the preceding section assume that the second innovation distribution in the ARSV model is normal, and the first innovation distribution can only be normal or Student t. However, when the second innovation in the ARSV model is non-Gaussian, it is very difficult to derive, and hence to sample from, the corresponding posterior distribution. In order to make model inference for the extension of the classic ARSV model discussed earlier, we propose a new inference method which does not require pre-specified prior distributions, and hence does not rely on the normality of the innovation distributions.

For the rest of the report, we consider the model

\[
X_t = \sigma_t \varepsilon_t, \qquad
\log \sigma_t^2 = \beta_0 + \beta_1 \log \sigma_{t-1}^2 + \delta \eta_t,
\]

where εt follows a standard normal distribution and ηt follows a standardized Student t distribution with zero mean, unit variance, and ν_η degrees of freedom.

To start, we first need to make an initial guess of the volatilities and parameters. Popular model-free estimates of volatility include a simple moving average of squared returns and the Exponentially Weighted Moving Average (EWMA), among many others. Let us define h_t := log σ²_t, t = 1, ..., T. With estimated volatilities, we can calculate initial values h_t^{(0)}, t = 1, ..., T, and obtain an initial guess of the parameters θ^{(1)} by fitting an AR process as described in Section 3.2.4. In this report, we choose the 5-day moving average of squared returns as our initial guess of volatility.
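A minimal sketch of this initialization (assuming `x` holds the observed returns; the small floor guards against zero returns):

    ma5 <- stats::filter(x^2, rep(1 / 5, 5), sides = 1)  # trailing 5-day average
    h0  <- log(pmax(as.numeric(ma5), 1e-10))
    h0[1:4] <- h0[5]   # the first four values are undefined for a 5-day window
    # theta^(1) is then obtained by fitting an AR(1) process to h0 (Section 3.2.4)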
Assume that after the (i−1)th iteration, we have sampled the sequence h_t^{(i-1)}, t = 1, ..., T, and have updated the parameters θ^{(i)} = [β0^{(i)}, β1^{(i)}, δ^{(i)}, ν_η^{(i)}] in the way discussed in Section 3.2.4. Then in the ith iteration, we sample {h_t^{(i)}, t = 2, ..., T−1} in the way described in the following sections.

3.2.1 Proposing Step

The sequence {h_t^{(i)}, t = 2, ..., T−1} is sampled sequentially. Suppose that after the (t−1)th step in the ith iteration, we have already sampled h_{t-1}^{(i)}. Since the volatility process is independent of the observed returns, the distribution of h_t depends only on the value of h_{t-1} and the parameters θ = [β0, β1, δ, ν_η]. Therefore, by plugging in the estimated parameters θ^{(i)} and the sampled h_{t-1}^{(i)} from the previous step, we can sample h′_t from the Student t distribution with mean β0^{(i)} + β1^{(i)} h_{t-1}^{(i)}, standard deviation δ^{(i)}, and degrees of freedom ν_η^{(i)}. We continue this sequence by sampling h_{t+1}^{(i)} based on h_t^{(i)} and θ^{(i)}, for all t = 2, 3, ..., T−1.

This step is similar to a commonly used MCMC technique, the Gibbs sampler. However, the Gibbs sampler is more often applied in scenarios with a much smaller sampling space. Our case differs from the regular setting in that we want to sample a sequence of T−2 random variables, where T can be as large as 1000 to 2000. Also, h_t depends only on h_{t-1}, so the process is analogous to a random walk. Therefore, the classic Gibbs sampling method would be very inefficient in this case, as it would take a very long time to explore all regions of high probability. Figure 3.1 shows the values of the estimated parameter β̂0 in each iteration when we use only the classic Gibbs sampler together with the parameter estimation method discussed in Section 3.2.4. The true value is −0.5; the estimated values do not approach the true value even after nearly 2000 iterations.

Figure 3.1: Path of β0 estimates when only implementing the Gibbs sampler. The red line represents the true value.

3.2.2 Classic Metropolis-Hastings Algorithm

To overcome the problem of slow convergence, we need to regulate the acceptance rate of each new sample. Instead of accepting all newly proposed samples, we adopt the Metropolis-Hastings algorithm to adjust the acceptance rate. For the rest of this section, we consider sampling a generic sequence x^{(i)} from a distribution π, where generally the functional form of π is known except for a normalizing constant. The classic Metropolis-Hastings algorithm contains the following three steps (Murphy, 2012).

First, in the ith iteration, we sample a new value x′ from a proposal distribution q(x^{(i)} | x^{(i-1)}, D), where x^{(i-1)} is the value obtained in the previous iteration and D is the set of parameters.

Second, we calculate the acceptance rate r = min(1, α), with

\[
\alpha = \frac{q(x^{(i-1)} \mid x')\,\pi(x')}{q(x' \mid x^{(i-1)})\,\pi(x^{(i-1)})}, \qquad (3.1)
\]

where π(·) is known up to a normalizing constant. The idea behind the acceptance rate is that, in order to reveal the true distribution of x, we want the algorithm to be able to explore the whole space without getting stuck at one point, and to visit the regions with higher probability more often. The q(x^{(i-1)}|x′)/q(x′|x^{(i-1)}) factor preserves the possibility that the sampler revisits the previous point. At the same time, the π(x′)/π(x^{(i-1)}) factor ensures that regions with higher probability are more likely to be visited.

The final step is to accept or reject the proposal made in the first step. We accept the newly sampled proposal with probability r. This is often done by generating a uniformly distributed random variable u between 0 and 1; then

\[
x^{(i)} = \begin{cases} x', & u < r \\ x^{(i-1)}, & u \ge r. \end{cases}
\]
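As a self-contained illustration of these three steps (not the sampler of Section 3.2.3, which modifies them), the following R sketch runs a random-walk Metropolis-Hastings chain for a target known up to a constant; with a symmetric proposal, the q-ratio in (3.1) cancels:

    mh_sample <- function(log_pi, n_iter, x0, sd_prop = 1) {
      x <- numeric(n_iter); x[1] <- x0
      for (i in 2:n_iter) {
        x_new <- rnorm(1, mean = x[i - 1], sd = sd_prop)  # symmetric proposal
        log_r <- log_pi(x_new) - log_pi(x[i - 1])         # log acceptance ratio
        x[i] <- if (log(runif(1)) < log_r) x_new else x[i - 1]
      }
      x
    }
    # e.g. draws from a standard normal: mh_sample(function(z) -z^2 / 2, 5000, 0)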
3.2.3 Metropolis-within-Gibbs

However, the classic Metropolis-Hastings algorithm cannot be directly applied to our problem: we would need the full joint probability density function of the newly sampled h′_t defined in Section 3.2.1 and of h_t^{(i-1)} from the previous iteration, which can be very difficult to find. Instead, we implement an algorithm with an idea similar to the Metropolis-within-Gibbs algorithm (Roberts and Rosenthal, 2006). The details of our algorithm are described below.

In the first step, when t ≥ 2, we propose the new h′_t conditioned on h_{t-1}^{(i)} rather than on h_t^{(i-1)}.¹ As discussed in Section 3.2.1, the distribution of h_t^{(i)} is determined only by h_{t-1}^{(i)} and θ^{(i)}. Therefore, our proposal distribution is a Student t distribution with mean β0^{(i)} + β1^{(i)} h_{t-1}^{(i)}, standard deviation δ^{(i)}, and degrees of freedom ν_η^{(i)}. We denote this distribution by G_t^{(i)}.

In the second step, since h′_t and h_t^{(i-1)} are independent given the parameter vector θ^{(i)}, the first part of the acceptance probability is

\[
\frac{q(h_t^{(i-1)} \mid h_t', \theta^{(i)})}{q(h_t' \mid h_t^{(i-1)}, \theta^{(i)})}
= \frac{G_t^{(i)}(h_t^{(i-1)})}{G_t^{(i)}(h_t')}.
\]

The full joint density function of the h_t's cannot be easily derived. However, the Markovian structure of the process {h_t} implies that we only need the joint density functions over the Markov blankets of h′_t and h_t^{(i-1)}. A Markov blanket is the smallest set of variables that grants h_t conditional independence from all other variables. The structure of the ARSV process implies that the Markov blanket of h_t is {h_{t-1}, h_{t+1}, x_t}. That is, conditioned on the observed return x_t, the log-volatility of the previous step h_{t-1}, and the log-volatility of the next step h_{t+1}, h_t is independent of all other observed returns and volatility values. In our case, the Markov blanket of h′_t is {h_{t-1}^{(i)}, h_{t+1}^{(i-1)}, x_t}: the log-volatility sampled in the previous step of this iteration, the log-volatility of the next step from the previous iteration, and the known return value.

¹Note that in the Gibbs sampling scheme, a new value is drawn by conditioning on the value from the previous iteration. However, here we condition on the value from the previous time step in the same iteration.
Therefore,

\[
\begin{aligned}
\pi(h_t') &= f_{h_t}\big(h_t' \mid x_t, h_{t-1}^{(i)}, h_{t+1}^{(i-1)}, \theta^{(i)}\big)
\;\propto\; f_1\big(h_t', x_t, h_{t-1}^{(i)}, h_{t+1}^{(i-1)}, \theta^{(i)}\big) \\
&= f_X\big(x_t \mid h_{t-1}^{(i)}, h_t', h_{t+1}^{(i-1)}, \theta^{(i)}\big)\, f_2\big(h_{t-1}^{(i)}, h_t', h_{t+1}^{(i-1)}, \theta^{(i)}\big) \\
&= f_X\big(x_t \mid h_t'\big)\, f_{h_{t+1}}\big(h_{t+1}^{(i-1)} \mid h_t', h_{t-1}^{(i)}, \theta^{(i)}\big)\, f_3\big(h_t', h_{t-1}^{(i)}, \theta^{(i)}\big) \\
&= f_X\big(x_t \mid h_t'\big)\, f_{h_{t+1}}\big(h_{t+1}^{(i-1)} \mid h_t', \theta^{(i)}\big)\, f_{h_t}\big(h_t' \mid h_{t-1}^{(i)}, \theta^{(i)}\big)\, f_{h_{t-1}}\big(h_{t-1}^{(i)} \mid \theta^{(i)}\big)\, f_p\big(\theta^{(i)}\big),
\end{aligned}
\]

where f_X is the conditional probability density function of the returns²; f_{h_t} and f_{h_{t+1}} are the conditional probability density functions of h_t and h_{t+1}, respectively; f_{h_{t-1}} and f_p are the marginal probability density functions of h_{t-1} and θ, respectively; and f_1, f_2, and f_3 are the joint probability density functions of {h_t, X_t, h_{t-1}, h_{t+1}, θ}, {h_t, h_{t-1}, h_{t+1}, θ}, and {h_{t-1}, h_t, θ}, respectively.

Similarly,

\[
\pi\big(h_t^{(i-1)}\big) \propto
f_X\big(x_t \mid h_t^{(i-1)}\big)\, f_{h_{t+1}}\big(h_{t+1}^{(i-1)} \mid h_t^{(i-1)}, \theta^{(i)}\big)\, f_{h_t}\big(h_t^{(i-1)} \mid h_{t-1}^{(i)}, \theta^{(i)}\big)\, f_{h_{t-1}}\big(h_{t-1}^{(i)} \mid \theta^{(i)}\big)\, f_p\big(\theta^{(i)}\big).
\]

We know that f_{h_t}(h′_t | h_{t-1}^{(i)}, θ^{(i)}) = G_t^{(i)}(h′_t), and we can assume that f_{h_t}(h_t^{(i-1)} | h_{t-1}^{(i)}, θ^{(i)}) ≈ G_t^{(i)}(h_t^{(i-1)}).

²Note that in this example we are considering the simple case where εt ∼ N(0,1), so the distribution of X_t depends only on the value of σt. Under further assumptions about εt, f_X(x_t | h′_t) should be replaced with f_X(x_t | h′_t, θ^{(i)}).
Then, \u00CE\u00B1 in (3.2) can be written as\u00CE\u00B1 =q(h(i\u00E2\u0088\u00921)t |h\u00E2\u0080\u00B2t)q(h\u00E2\u0080\u00B2t |h(i\u00E2\u0088\u00921)t )\u00C3\u0097 pi(h\u00E2\u0080\u00B2t)pi(h(i\u00E2\u0088\u00921)t )=G(i)t (h(i\u00E2\u0088\u00921)t )G(i)t (h\u00E2\u0080\u00B2t)\u00C3\u0097fX(xt |h\u00E2\u0080\u00B2t) fht+1(h(i\u00E2\u0088\u00921)t+1 |h\u00E2\u0080\u00B2t ,\u00CE\u00B8(i)) fht (h\u00E2\u0080\u00B2t |h(i)t\u00E2\u0088\u00921,\u00CE\u00B8 (i)) fht\u00E2\u0088\u00921(h(i)t\u00E2\u0088\u00921|\u00CE\u00B8 (i)) fp(\u00CE\u00B8 (i))fX(xt |h(i\u00E2\u0088\u00921)t ) fht+1(h(i\u00E2\u0088\u00921)t+1 |h(i\u00E2\u0088\u00921)t ,\u00CE\u00B8 (i)) fht (h(i\u00E2\u0088\u00921)t |h(i)t\u00E2\u0088\u00921,\u00CE\u00B8 (i)) fht\u00E2\u0088\u00921(h(i)t\u00E2\u0088\u00921|\u00CE\u00B8 (i)) fp(\u00CE\u00B8 (i))=fX(xt |h\u00E2\u0080\u00B2t) fht+1(h(i\u00E2\u0088\u00921)t+1 |h\u00E2\u0080\u00B2t ,\u00CE\u00B8(i)) fht\u00E2\u0088\u00921(h(i)t\u00E2\u0088\u00921|\u00CE\u00B8 (i)) fp(\u00CE\u00B8 (i))fX(xt |h(i\u00E2\u0088\u00921)t ) fht+1(h(i\u00E2\u0088\u00921)t+1 |h(i\u00E2\u0088\u00921)t ,\u00CE\u00B8 (i)) fht\u00E2\u0088\u00921(h(i)t\u00E2\u0088\u00921|\u00CE\u00B8 (i)) fp(\u00CE\u00B8 (i))=fX(xt |h\u00E2\u0080\u00B2t) fht+1(h(i\u00E2\u0088\u00921)t+1 |h\u00E2\u0080\u00B2t ,\u00CE\u00B8(i))fX(xt |h(i\u00E2\u0088\u00921)t ) fht+1(h(i\u00E2\u0088\u00921)t+1 |h(i\u00E2\u0088\u00921)t ,\u00CE\u00B8 (i)). (3.2)Expression (3.3) can be evaluated easily for t = 2,3, ...,(T \u00E2\u0088\u0092 1) and i \u00E2\u0089\u00A5 1. Itcan be understood intuitively as the ratio of partial conditional likelihoods betweenh\u00E2\u0080\u00B2t and h(i\u00E2\u0088\u00921)t . Instead of comparing the full likelihood times the inverse of proposalkernel like the classic Metropolis-Hastings\u00E2\u0080\u0099 algorithm, the acceptance rate here isdetermined by comparing the likelihood of observing the current return value andthe log-volatility value of the next step between the newly proposed h\u00E2\u0080\u00B2t and thevalue from the last iteration h(i\u00E2\u0088\u00921)t . The decision of either keeping the new valueor retaining the old value is made in the same way as classic Metropolis-Hastingsalgorithm. However, when t = 1, h0 is unknown and therefore we cannot sampleh1, and need to make an arbitrary guess about it. Similarly, when t = T , hT+1 isunknown and the method described above cannot be applied to sample hT . Weneed to make an arbitrary guess about hT as well. One possible way to minimizethe impact of h1 and hT in the ith iteration is to discard a number of {h(i)t } for t < T1and t > T2, and only retain a subset of {h(i)t } as {h(i)t : T1 \u00E2\u0089\u00A4 t \u00E2\u0089\u00A4 T2}, where T1 andT2 are two arbitrary numbers to be selected prior to running the algorithm.Note that we do not have theoretical proof for the convergence of this algo-rithm. However, it is reasonable to believe that our method enjoys the same con-vergence property as the classic Metropolis-Hastings\u00E2\u0080\u0099 algorithm from the empiricalresults such as in Figure 3.2.233.2.4 Parameter EstimationPrevious methods implement the fully Bayesian approach to estimate the param-eters of the ARSV model. With carefully chosen prior distributions for the pa-rameters and the normality assumption of both innovations one can sample theparameters from their posterior distributions. 
3.2.4 Parameter Estimation

Previous methods implement a fully Bayesian approach to estimate the parameters of the ARSV model. With carefully chosen prior distributions for the parameters and the normality assumption on both innovations, one can sample the parameters from their posterior distributions. However, this becomes very difficult if the innovation distributions, especially the second innovation distribution of the ARSV(1) model, are not conjugate with each other or with the prior distributions chosen for the parameters.

In our approach, we estimate the parameters separately. After each iteration we obtain {h_t^{(i)} : T1 ≤ t ≤ T2}, and we want to estimate the parameters (β0, β1, δ, ν_η) of an AR(1) process

\[
h_t = \beta_0 + \beta_1 h_{t-1} + \delta \eta_t, \qquad T_1 \le t \le T_2,
\]

where ηt is a strict white noise process with zero mean and unit variance. Parameter estimation is thus simplified to the problem of estimating the parameters of an AR(1) process with non-Gaussian innovations, which is well studied and can be easily applied (see, for example, Grunwald, Hyndman, Tedesco, and Tweedie (2000)). In this study, we use the arfimafit function in the rugarch package, which provides maximum likelihood estimation of autoregressive models with Student t innovations. If no program is available for a given specification, one can try different methods, such as maximum likelihood or Bayesian methods, to estimate the parameters.

Figure 3.2 illustrates the Markov chain values of the estimated parameters. It can be seen that with a starting value of −1.2, the Markov chain values of β0 soon move close to the true value −0.5 after a few hundred iterations, and remain around the true value as the number of iterations increases. Notice also that the Markov chain values of the degrees of freedom are not very stable; however, most iterations return estimates that are reasonably close to the true value. Therefore, we discard the first K iterations as the burn-in period, where K is determined by observation, and use a robust statistic such as the median of the remaining Markov chain values of each parameter as our estimator.³ This estimator helps us obtain a good estimate of each parameter. In Table 3.1 we present an example of the results of our algorithm; here the length of the Markov chain is 3000 iterations and the length of the burn-in period is 500 iterations. Figure 3.2 presents the Markov chain values after each iteration.

Table 3.1: An example of estimating parameters of an ARSV(1) process with εt ∼ N(0,1) and ηt ∼ standardized t5. Entries are medians of Markov chain values after discarding the burn-in iterations; the values in parentheses are standard errors of each estimator.

True Value  | β0 = −0.5      | β1 = 0.95     | δ = 0.35      | ν_η = 5
Estimation  | −0.544 (0.080) | 0.944 (0.008) | 0.360 (0.039) | 5.46 (20.7)

Figure 3.2: Illustration of Markov chain values of parameters after each iteration. Each dot represents the estimated value of the corresponding parameter after an iteration. The red lines represent the true values.

³For simplicity we use the median as the estimator for all parameters. However, particularly for the estimator of the degrees of freedom, the mode is also recommended.
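A sketch of this parameter-update step with rugarch (assuming `h_kept` holds the retained {h_t : T1 ≤ t ≤ T2}); note that rugarch reports the process level mu rather than the intercept, so the intercept is recovered as β0 = mu(1 − ar1):

    library(rugarch)
    spec <- arfimaspec(
      mean.model = list(armaOrder = c(1, 0), include.mean = TRUE),
      distribution.model = "std")            # "std" = standardized Student t
    fit <- arfimafit(spec = spec, data = h_kept)
    cf <- coef(fit)                          # mu, ar1, sigma, shape
    c(b0 = unname(cf["mu"] * (1 - cf["ar1"])), b1 = unname(cf["ar1"]),
      delta = unname(cf["sigma"]), nu = unname(cf["shape"]))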
3.2.5 Discussion on the Flexibility

During the process of writing this report, we found a paper by Fridman and Harris (1998) which also discusses a flexible inference method for the ARSV process, based on a maximum likelihood approach. The new method discussed in this report and their method achieve the flexibility of allowing arbitrary choices of innovation distribution in different ways. In the method proposed by Fridman and Harris (1998), inference for the ARSV(1) process is simplified to finding a good numerical approximation for integrals of a given probability density function. Our new method, by contrast, simplifies inference for the ARSV(1) process to inference for an AR(1) process with a given distribution of the error terms. Furthermore, it is easy for the method of Fridman and Harris (1998) to include an ARCH term in the volatility process, while increasing the number of lags in the volatility process leads to exponential growth in the complexity of their method. Our method, on the other hand, can handle inference for an ARSV(p) process with trivial modifications, but might have more difficulty estimating parameters for an ARSV model with an ARCH term in the volatility process. The two methods are therefore complementary to each other, and the choice should be made depending on the assumptions about the underlying process.

3.2.6 Algorithm

The algorithm for our new method contains two layers of loops. Before starting the iterations, we first obtain an initial estimate of {h_t^{(0)}} using a proxy such as the moving average of squared returns. We then initialize the parameter set θ^{(1)} = [β0^{(1)}, β1^{(1)}, δ^{(1)}, ν_η^{(1)}] based on the methods discussed in Section 3.2.4. In the ith iteration of the external loop, we follow the sampling method discussed in Sections 3.2.1 and 3.2.3 to generate the sequence {h_t^{(i)}}, t = 1, 2, ..., T. After the internal loop ends, we update the parameter set based on the newly sampled {h_t^{(i)} : T1 ≤ t ≤ T2} with the methods discussed in Section 3.2.4. The detailed algorithm is given below, and simulation results are presented in the next section.

Algorithm 1 Flexible Inference for the ARSV Model
 1: initialize ite, {h_t^{(0)}}, θ^{(1)}, T, T1, and T2
 2: while i < ite do
 3:   for t from 2 to T−1 do
 4:     sample h′_t from f_{h_t}(· | h_{t-1}^{(i)}, θ^{(i)})
 5:     set α = [f_X(x_t | h′_t) f_{h_{t+1}}(h_{t+1}^{(i-1)} | h′_t, θ^{(i)})] / [f_X(x_t | h_t^{(i-1)}) f_{h_{t+1}}(h_{t+1}^{(i-1)} | h_t^{(i-1)}, θ^{(i)})]
 6:     calculate acceptance rate A = min{1, α}
 7:     generate u from Unif(0,1)
 8:     if u < A then h_t^{(i)} = h′_t
 9:     else h_t^{(i)} = h_t^{(i-1)}
10:     end if
11:   end for
12:   keep {h_t^{(i)}, T1 ≤ t ≤ T2}
13:   update θ^{(i+1)} by fitting an AR(1) process to {h_t^{(i)}, T1 ≤ t ≤ T2}
14: end while
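Below is a compact R sketch of Algorithm 1 under the working assumptions of this chapter. It reuses `log_alpha` from the Section 3.2.3 sketch and wraps the rugarch call from the Section 3.2.4 sketch in a hypothetical helper `update_theta`; it is illustrative, not a reference implementation.

    update_theta <- function(h) {            # AR(1) fit, as in Section 3.2.4
      cf <- coef(arfimafit(spec = spec, data = h))
      c(b0 = unname(cf["mu"] * (1 - cf["ar1"])), b1 = unname(cf["ar1"]),
        delta = unname(cf["sigma"]), nu = unname(cf["shape"]))
    }
    flexible_arsv <- function(x, n_iter, T1, T2) {
      T_len <- length(x)
      # initialization: 5-day moving average of squared returns (Section 3.2)
      h <- log(pmax(as.numeric(stats::filter(x^2, rep(1/5, 5), sides = 1)), 1e-10))
      h[1:4] <- h[5]
      theta <- update_theta(h[T1:T2])
      draws <- matrix(NA, n_iter, 4, dimnames = list(NULL, c("b0","b1","delta","nu")))
      for (i in 1:n_iter) {
        h_new <- h
        for (t in 2:(T_len - 1)) {
          # propose h'_t given h_{t-1} of the current iteration (Section 3.2.1)
          mu <- theta["b0"] + theta["b1"] * h_new[t - 1]
          h_prop <- mu + theta["delta"] *
            rt(1, df = theta["nu"]) / sqrt(theta["nu"] / (theta["nu"] - 2))
          # accept/reject with the partial likelihood ratio (3.2)
          la <- log_alpha(x[t], h_prop, h[t], h[t + 1],
                          theta["b0"], theta["b1"], theta["delta"], theta["nu"])
          h_new[t] <- if (log(runif(1)) < la) h_prop else h[t]
        }
        h <- h_new
        theta <- update_theta(h[T1:T2])      # parameter update (Section 3.2.4)
        draws[i, ] <- theta
      }
      draws   # discard a burn-in and summarize, e.g., by column medians
    }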
3.3 Comparison of Inference Methods

In this section we present the results of model inference. We show that our new method can estimate the parameters of the classic ARSV(1) model as well as previous methods can, and that it can be applied to the extension of the classic ARSV(1) model where previous methods may fail.

3.3.1 Parameter Estimation for Simulated Data

First we want to show that our new approach works as well as previous methods. The simulated data are generated from the following process:

\[
X_t = \sigma_t \varepsilon_t, \qquad (3.3)
\]
\[
\log \sigma_t^2 = -0.5 + 0.95\,\log \sigma_{t-1}^2 + 0.35\,\eta_t, \qquad (3.4)
\]

where εt ∼ N(0,1), ηt ∼ N(0,1), and ηt and εt are independent.

We generate 200 datasets of length T = 2500 based on (3.3) and (3.4) with different random seeds. We then run our new method for 5000 iterations and discard the first 2000 runs. We compare our results with the stochvol package (Kastner, 2016). This package is one of the latest packages for ARSV(1) model inference, and we regard it as representative of existing methods. The stochvol package assumes the second innovation distribution {ηt} of the ARSV(1) model to be normal, and can work with either both innovation distributions normal or a Student t first innovation and a normal second innovation.
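For reference, the benchmark fits here are obtained along the lines of the following sketch (`x` is one simulated series; the argument values match the chain lengths above):

    library(stochvol)
    fit_sv <- svsample(x, draws = 5000, burnin = 2000)
    summary(fit_sv)   # posterior summaries of mu, phi, sigma
    # stochvol parametrizes h_t = mu + phi (h_{t-1} - mu) + sigma eta_t, so the
    # estimates map to beta0 = mu (1 - phi), beta1 = phi, delta = sigma.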
We conduct a similar simulation study for the second scenario, with results presented below. Note that we estimate the parameters using the stochvol package under two different model specifications, namely assuming $\varepsilon_t \sim N(0,1)$ or $\varepsilon_t \sim$ standardized $t_{\nu_\varepsilon}$, to show how poor the performance can be when the model is specified incorrectly. The results are presented in Table 3.4.

When both innovations follow a Student $t$ distribution, our new method performs much better than the stochvol package. The difference is especially pronounced for the shape parameter of $\varepsilon_t$, where our estimate is less biased and has a much smaller standard deviation. We can also see that when the model specification is incorrect, the performance of the method implemented in the stochvol package can be very poor.

Next, we show that our new method is robust against incorrect model specification. First, consider the case where the true model has both innovations normally distributed, but we estimate the parameters by specifying both innovations as Student $t$ distributed. The results are presented in Table 3.5.

We also consider the case where the first innovation of the true model is normally distributed and the second innovation distribution is a skewed Student $t$ distribution. The skewing parameter is $\gamma = 1.5$, and to amplify the impact of the incorrect model specification we set the standard deviation of $\eta_t$ to 0.5. We estimate the parameters by assuming that the first innovation distribution is normal while the second innovation distribution is Student $t$. The results are presented in Table 3.6.

Table 3.4: Median of estimated parameters for simulated data with $\varepsilon_t \sim$ standardized $t_5$ and $\eta_t \sim$ standardized $t_5$. The values in parentheses are standard deviations of the estimated values of each parameter.

Method                   | $\beta_0 = -0.5$ | $\beta_1 = 0.95$ | $\delta = 0.35$ | $\nu_\varepsilon = 5$ | $\nu_\eta = 5$
stochvol (t-Normal)      | -0.561 (0.156)   | 0.946 (0.015)    | 0.363 (0.054)   | 5.44 (4.58)           | NA
stochvol (Normal-Normal) | -1.098 (0.312)   | 0.893 (0.031)    | 0.575 (0.064)   | NA                    | NA
New Method               | -0.502 (0.154)   | 0.950 (0.015)    | 0.357 (0.059)   | 4.98 (1.02)           | 5.37 (4.78)

Table 3.5: Median of estimated parameters for simulated data with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim N(0,1)$, estimated under the assumption that $\varepsilon_t \sim$ standardized $t_{\nu_\varepsilon}$ and $\eta_t \sim$ standardized $t_{\nu_\eta}$. The values in parentheses are standard deviations of the estimated values of each parameter.

True Value      | $\beta_0 = -0.5$ | $\beta_1 = 0.95$ | $\delta = 0.35$ | $\nu_\varepsilon = \infty$ | $\nu_\eta = \infty$
Estimated Value | -0.493 (0.139)   | 0.951 (0.014)    | 0.344 (0.042)   | 9.39 (5.96)                | 92.22 (44.6)
Table 3.6: Median of estimated parameters for simulated data with $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ skewed-$t(0, 1, 5, 1.5)$, estimated under the assumption that $\varepsilon_t \sim N(0,1)$ and $\eta_t \sim$ standardized $t_{\nu_\eta}$. The values in parentheses are standard deviations of the estimated values of each parameter.

True Value      | $\beta_0 = -0.5$ | $\beta_1 = 0.95$ | $\delta = 0.5$ | $\nu_\eta = 5$ | $\gamma = 1.5$
Estimated Value | -0.545 (0.129)   | 0.946 (0.013)    | 0.521 (0.051)  | 4.93 (5.19)    | NA

The two examples above suggest that our new method is robust against incorrect model specification. Since our method works for arbitrary choices of innovation distributions, we can always "over-specify" the model with more general innovation distributions to achieve robustness. For example, in the case illustrated in Table 3.5, we estimate the degrees of freedom for $\varepsilon_t$ as 9.39 with standard deviation 5.96 and the degrees of freedom for $\eta_t$ as 92.22 with standard deviation 44.6, while both innovations are actually normally distributed. However, a Student $t$ distribution with such large degrees of freedom behaves, in practice, very much like a normal distribution. Our method therefore allows a more general model specification, which brings robustness, while previous methods do not enjoy this flexibility.

3.3.2 Parameter Estimation for the S&P 500 Index

In Fridman and Harris (1998), inference results are compared across several methods: a Bayesian approach, a semi-maximum likelihood (SML) approach, and a maximum likelihood (ML) approach. Here we compare the estimated parameters from those three methods with the results from our proposed method and from the stochvol package.

We fit an ARSV(1) model to the daily log-returns of the S&P 500 Index from 1980 to 1987; in total, 2022 observations are used for model inference. We first estimate the parameters under the traditional model assumption that both innovation distributions are Gaussian. The results are presented in Table 3.7. Owing to their different approach to achieving a flexible model assumption, Fridman and Harris (1998) do not fit an ARSV(1) process with a light-tailed first innovation and a heavy-tailed second innovation. They consider the scenario where the first innovation is Student $t$ distributed while the second innovation is normally distributed, which can also be fitted using the stochvol package. We fit the data under the same model assumption, as well as under the more general assumption that both innovations are Student $t$ distributed. The results are presented in Table 3.8.

Table 3.7: Summary statistics of estimated parameters under the model assumption that both innovations are normally distributed. The "n" in the column names stands for "normal". The values in parentheses are asymptotic standard deviations for the SML and ML methods, posterior standard deviations for the Bayes method and the method implemented in the stochvol package, and interquartile ranges (IQR) for the new approach.

          | Bayes        | SML          | ML (n-n)      | stochvol     | New Method (n-n)
$\beta_0$ | -.002 (.004) | -.002 (.004) | -.002 (.0004) | -.270 (.007) | -.010 (.022)
$\beta_1$ | .970 (.008)  | .958 (.014)  | .959 (.005)   | .971 (.001)  | .989 (.003)
$\delta$  | .150 (.017)  | .161 (.026)  | .159 (.009)   | .153 (.002)  | .071 (.014)

Table 3.8: Summary statistics of estimated parameters under the model assumption that the first innovation distribution follows a Student $t$ distribution.
The values in parentheses are asymptotic standard deviations for the ML method, posterior standard deviations for the method implemented in the stochvol package, and interquartile ranges (IQR) for the new approach.

                  | ML (t-n)       | stochvol       | New Method (t-n) | New Method (t-t)
$\beta_0$         | -.0038 (.0013) | -.1384 (.0108) | -.1020 (.0473)   | -.1718 (.2979)
$\beta_1$         | .9813 (.0056)  | .9855 (.0011)  | .9897 (.0054)    | .9821 (.0312)
$\delta$          | .0942 (.0199)  | .1016 (.0042)  | .0799 (.0185)    | .1153 (.0958)
$\nu_\varepsilon$ | 10.39 (5.88)   | 11.42 (1.76)   | 10.11 (1.07)     | 10.47 (1.75)
$\nu_\eta$        | NA             | NA             | NA               | 2.50 (1.96)

Note that our new approach achieves the flexibility of arbitrary innovation choices at the cost of stability and efficiency, so outliers appear occasionally when estimating the parameters, and robust summary statistics such as the median and IQR are preferred over the mean and standard deviation. The results in Table 3.7 and Table 3.8 show that our method is close to the existing methods, except that the estimated $\delta$ from our method in Table 3.7 is significantly lower than those from the other methods. The reason requires further study.

Chapter 4

Conditional Risk Measurement with the ARSV Model

One application of volatility models is to estimate potential risk given known information. In general, there are two approaches to risk measures, namely conditional risk measures and unconditional risk measures. When discussing the GARCH and ARSV processes in Chapter 2, we assume that $\mathcal{F}_{t-1}$, the sigma algebra of all available information up to time $t-1$, is known. If we further assume that the distribution of the return $X_t$ is conditioned on $\mathcal{F}_{t-1}$, then our measure of risk at time $t$ should also be conditioned on $\mathcal{F}_{t-1}$. On the other hand, we can measure unconditional risk from the unconditional distribution of $X_t$. One can choose either the conditional or the unconditional distribution for risk-measure forecasting. In this chapter we adopt the conditional estimation approach, and focus on the Value-at-Risk and Conditional Value-at-Risk measures under the ARSV process.

4.1 Value-at-Risk forecasting under the ARSV Model

The Value-at-Risk (VaR) is arguably one of the most widely used risk measures. For a confidence level $\alpha \in (0,1)$ it is defined as

\[ \mathrm{VaR}_\alpha(X) = \inf\{x : P(X \ge x) \le 1-\alpha\}, \tag{4.1} \]

and typically $\alpha$ is close to 1 (e.g., 0.9 or 0.95).
For an ARSV(1) model, $X_t = \sigma_t \varepsilon_t$, where $\sigma_t$ and $\varepsilon_t$ are independent random variables. Since $\sigma_t > 0$ almost surely, we have

\[ P(X_t \ge x \mid \mathcal{F}_{t-1}) = P(\sigma_t \varepsilon_t \ge x \mid \mathcal{F}_{t-1}) = \int_0^\infty P(\varepsilon_t \ge x/s)\, f_{\sigma_t \mid \mathcal{F}_{t-1}}(s \mid \mathcal{F}_{t-1})\, ds, \tag{4.2} \]

where $f_{\sigma_t \mid \mathcal{F}_{t-1}}$ is the conditional probability density function of $\sigma_t$ given $\mathcal{F}_{t-1}$, the information set at time $t-1$.

Since $\log \sigma_t^2 = \beta_0 + \beta_1 \log \sigma_{t-1}^2 + \delta \eta_t$,

\[
P(\sigma_t \le s \mid \mathcal{F}_{t-1}) = P(\log \sigma_t^2 \le \log s^2 \mid \mathcal{F}_{t-1})
= P(\beta_0 + \beta_1 \log \sigma_{t-1}^2 + \delta \eta_t \le \log s^2 \mid \mathcal{F}_{t-1})
= F_\eta\big( (\log s^2 - \beta_0 - \beta_1 \log \sigma_{t-1}^2)/\delta \big),
\]

where $F_\eta$ is the cumulative distribution function of a standardized Student $t$ distribution with zero mean, unit variance and $\nu_\eta$ degrees of freedom, and we assume that $\mathcal{F}_{t-1}$ contains $\sigma_{t-1}$. Hence, the conditional density of $\sigma_t$ is given by

\[ f_{\sigma_t \mid \mathcal{F}_{t-1}}(s \mid \mathcal{F}_{t-1}) = f_\eta\big( (\log s^2 - \beta_0 - \beta_1 \log \sigma_{t-1}^2)/\delta \big)\, \frac{2}{\delta s}, \quad s > 0. \]

Therefore, (4.2) can be written as

\[ P(X_t \ge x \mid \mathcal{F}_{t-1}) = \frac{2}{\delta} \int_0^\infty P(\varepsilon_t \ge x/s)\, f_\eta\big( (\log s^2 - \beta_0 - \beta_1 \log \sigma_{t-1}^2)/\delta \big)\, \frac{1}{s}\, ds. \tag{4.3} \]

With estimated parameters $\hat\theta = [\hat\beta_0, \hat\beta_1, \hat\delta, \hat\nu_\eta]$ and an estimated volatility $\hat\sigma_{t-1}$, we can plug (4.3) into (4.1) and solve for $\mathrm{VaR}_\alpha(X_t)$ using numerical methods. The parameters $\hat\theta$ can be estimated by the method described in Chapter 3. The volatility $\hat\sigma_{t-1}$ can be estimated separately, using unbiased volatility estimators such as average squared returns or EWMA, or using inference methods for volatility models that provide volatility estimates.
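A sketch of this numerical solution is given below, assuming a Gaussian $\varepsilon_t$ and a standardized $t$ $\eta_t$; the quadrature cutoff and the root-finding bracket are pragmatic choices rather than part of the derivation.

import numpy as np
from scipy import stats, integrate, optimize

def var_arsv(alpha, beta0, beta1, delta, nu_eta, sigma_prev):
    """Solve VaR_alpha(X_t) from (4.1) using the tail probability (4.3)."""
    f_eta = stats.t(df=nu_eta, scale=np.sqrt((nu_eta - 2.0) / nu_eta)).pdf
    mu = beta0 + beta1 * np.log(sigma_prev**2)      # conditional mean of h_t

    def tail_prob(x):
        # P(X_t >= x | F_{t-1}) as in (4.3); quad tolerance may need tuning
        integrand = lambda s: stats.norm.sf(x / s) * \
            f_eta((np.log(s**2) - mu) / delta) / s
        val, _ = integrate.quad(integrand, 1e-12, np.inf)
        return 2.0 / delta * val

    # VaR_alpha solves P(X_t >= x) = 1 - alpha; bracket assumes a daily-return scale
    return optimize.brentq(lambda x: tail_prob(x) - (1.0 - alpha),
                           1e-6, 10.0 * sigma_prev)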
4.2 CoVaR forecasting: GARCH Model vs ARSV Model

Adrian and Brunnermeier (2011) were the first to define the Conditional Value-at-Risk (CoVaR), as

\[ \mathrm{CoVaR}^{(1)}_\alpha(X_{t+1}) = \inf\{x : P(X_{t+1} \ge x \mid X_t = \mathrm{VaR}_{\alpha'}(X_t), \mathcal{F}_{t-1}) \le 1-\alpha\}, \tag{4.4} \]

where $\alpha$ and $\alpha'$ are both constants close to 1 and $\mathrm{VaR}_\alpha$ is defined in Section 4.1. For simplicity, we assume $\alpha = \alpha'$.

Girardi and Ergün (2013) modify the definition of CoVaR to

\[ \mathrm{CoVaR}^{(2)}_\alpha(X_{t+1}) = \inf\{x : P(X_{t+1} \ge x \mid X_t \ge \mathrm{VaR}_\alpha(X_t), \mathcal{F}_{t-1}) \le 1-\alpha\}. \tag{4.5} \]

The one-step-ahead forecast of $\mathrm{VaR}_\alpha$ is based on an estimate of the $\alpha$-quantile of the distribution of $X_t$ given the history $\mathcal{F}_{t-1}$. CoVaR forecasting, on the other hand, makes a two-step-ahead estimate of the quantile of the distribution of $X_{t+1}$ conditioned on $X_t$ and the history $\mathcal{F}_{t-1}$. So instead of measuring a potential large loss or gain on the next day, as VaR does, CoVaR measures potential consecutive large values on two successive days. In this section we discuss CoVaR estimation under both the GARCH and ARSV models, for both definitions of CoVaR.

4.2.1 First Definition of CoVaR

For the GARCH(1,1) process, given all information up to time $t-1$ and $\sigma_{t-1}$, the volatility at time $t$ is fixed at $\sqrt{\alpha_0 + \alpha_1 \sigma_{t-1}^2 + \beta_1 X_{t-1}^2}$. Suppose that $X_t = \mathrm{VaR}_\alpha(X_t) =: v$; then $\sigma_{t+1}$ is also fixed, and equals $\sqrt{\alpha_0 + \alpha_1 \sigma_t^2 + \beta_1 v^2}$. Then

\[
P(X_{t+1} \ge x \mid X_t = v, \mathcal{F}_{t-1}) = P(\sigma_{t+1} \varepsilon_{t+1} \ge x \mid X_t = v, \mathcal{F}_{t-1})
= P(\varepsilon_{t+1} \ge x/\sigma_{t+1} \mid X_t = v, \mathcal{F}_{t-1}) \tag{4.6}
\]
\[
= P\big(\varepsilon_{t+1} \ge x/\sqrt{\alpha_0 + \alpha_1 \sigma_t^2 + \beta_1 v^2} \,\big|\, \mathcal{F}_{t-1}\big)
= P\left(\varepsilon_{t+1} \ge \frac{x}{\sqrt{\alpha_0(1+\alpha_1) + \alpha_1^2 \sigma_{t-1}^2 + \alpha_1 \beta_1 X_{t-1}^2 + \beta_1 v^2}}\right). \tag{4.7}
\]

Substituting (4.7) into (4.4), we can solve for $\mathrm{CoVaR}^{(1)}_\alpha(X_{t+1})$ numerically when $\{X_t\}$ follows a GARCH(1,1) process.
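In fact, under the first definition the GARCH case requires no numerical search at all: with $\sigma_{t+1}$ fixed by the conditioning event, (4.7) yields the CoVaR in closed form as the scaled $\alpha$-quantile of $\varepsilon$. A minimal sketch, assuming a standard normal $\varepsilon_t$ (the function name is illustrative):

import numpy as np
from scipy import stats

def covar1_garch(alpha, a0, a1, b1, sigma_t):
    """CoVaR under definition (4.4) for a GARCH(1,1) with eps ~ N(0, 1).

    Conditioning on X_t = VaR_alpha(X_t) fixes sigma_{t+1}, so the CoVaR is
    the alpha-quantile of eps scaled by that known volatility.
    """
    q = stats.norm.ppf(alpha)                  # F_eps^{-1}(alpha)
    v = sigma_t * q                            # VaR_alpha(X_t)
    sigma_next = np.sqrt(a0 + a1 * sigma_t**2 + b1 * v**2)
    return sigma_next * q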
For the ARSV(1) process, given the same information and assuming $X_t = v$, the value of $\sigma_{t+1}$ is no longer fixed. Expression (4.6) therefore becomes

\[
P(X_{t+1} \ge x \mid X_t = v, \mathcal{F}_{t-1})
= \int_0^\infty P(\varepsilon_{t+1} \ge x/s \mid X_t = v, \mathcal{F}_{t-1}, \sigma_{t+1} = s)\, f_{\sigma_{t+1} \mid X_t = v, \mathcal{F}_{t-1}}(s \mid X_t = v, \mathcal{F}_{t-1})\, ds
\]
\[
= \int_0^\infty P(\varepsilon_{t+1} \ge x/s)\, f_{\sigma_{t+1} \mid X_t = v, \mathcal{F}_{t-1}}(s \mid X_t = v, \mathcal{F}_{t-1})\, ds, \tag{4.8}
\]

where $f_{\sigma_{t+1} \mid X_t = v, \mathcal{F}_{t-1}}$ is the conditional probability density function of $\sigma_{t+1}$ given $X_t$ and $\mathcal{F}_{t-1}$. In order to evaluate (4.8), we need to find this conditional density. Since for $s > 0$,

\[ P(\sigma_{t+1} \le s \mid X_t = v, \mathcal{F}_{t-1}) = P(\log \sigma_{t+1}^2 \le \log s^2 \mid X_t = v, \mathcal{F}_{t-1}), \]

we have

\[ f_{\sigma_{t+1} \mid X_t = v, \mathcal{F}_{t-1}}(s \mid X_t = v, \mathcal{F}_{t-1}) = f_{h_{t+1} \mid X_t = v, \mathcal{F}_{t-1}}(\log s^2 \mid X_t = v, \mathcal{F}_{t-1})\, \frac{2}{s}, \quad s > 0. \]

The conditional cumulative distribution function of $h_{t+1}$ is

\[
F_{h_{t+1} \mid X_t = v, \mathcal{F}_{t-1}}(u \mid X_t = v, \mathcal{F}_{t-1}) = P(h_{t+1} \le u \mid X_t = v, \mathcal{F}_{t-1})
= P\big(\beta_0 + 2\beta_1 \log (X_t/\varepsilon_t) + \delta \eta_{t+1} \le u \,\big|\, X_t = v, \mathcal{F}_{t-1}\big) \tag{4.9}
\]
\[
= P\big(\eta_{t+1} \le (u - \beta_0 - 2\beta_1 \log (v/\varepsilon_t))/\delta\big)
= \int_{-\infty}^\infty P\big(\eta_{t+1} \le (u - \beta_0 - 2\beta_1 \log (v/z))/\delta\big)\, f_\varepsilon(z)\, dz.
\]

Therefore,

\[ f_{h_{t+1} \mid X_t = v, \mathcal{F}_{t-1}}(\log s^2 \mid X_t = v, \mathcal{F}_{t-1}) = \frac{1}{\delta} \int_{-\infty}^\infty f_\eta\big( (\log s^2 - \beta_0 - 2\beta_1 \log (v/z))/\delta \big)\, f_\varepsilon(z)\, dz, \]

where $f_\eta$ is the probability density function of $\eta_t$. Then, (4.8) can be expressed as

\[ P(X_{t+1} \ge x \mid X_t = v, \mathcal{F}_{t-1}) = \frac{2}{\delta} \int_0^\infty \int_{-\infty}^\infty P(\varepsilon_{t+1} \ge x/s)\, f_\eta\big( (\log s^2 - \beta_0 - 2\beta_1 \log (v/z))/\delta \big)\, f_\varepsilon(z)\, \frac{1}{s}\, dz\, ds. \tag{4.10} \]

Substituting (4.10) into (4.4), we can solve for $\mathrm{CoVaR}^{(1)}_\alpha(X_{t+1})$ numerically when $\{X_t\}$ follows an ARSV(1) process.
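Alternatively, the double integral (4.10) can be estimated by Monte Carlo: drawing $z$ from $f_\varepsilon$ and $\eta_{t+1}$ from $f_\eta$ mirrors the two mixture densities in (4.10) exactly. The sketch below (Gaussian $\varepsilon_t$, standardized $t$ second innovation) is an illustration of this idea, not the implementation used in the thesis; the sample size and root-finding bracket are arbitrary choices.

import numpy as np
from scipy import stats, optimize

def covar1_arsv(alpha, v, beta0, beta1, delta, nu_eta, n=200_000, seed=0):
    """CoVaR under definition (4.4) for an ARSV(1), via Monte Carlo on (4.10)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                      # eps_t ~ N(0, 1)
    eta = np.sqrt((nu_eta - 2.0) / nu_eta) * \
        stats.t.rvs(df=nu_eta, size=n, random_state=rng)
    # h_{t+1} = beta0 + beta1 * log sigma_t^2 + delta * eta,
    # with sigma_t^2 = (v / eps_t)^2 implied by X_t = v
    h_next = beta0 + beta1 * np.log((v / z)**2) + delta * eta
    s_next = np.exp(h_next / 2.0)

    def tail_prob(x):                               # MC estimate of (4.10)
        return stats.norm.sf(x / s_next).mean()

    return optimize.brentq(lambda x: tail_prob(x) - (1.0 - alpha),
                           1e-6, 50.0 * abs(v))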
4.2.2 Second Definition of CoVaR

We can also find the CoVaR under the second definition of Girardi and Ergün (2013) (see eq. (4.5)), in a similar way to Section 4.2.1, but with the different conditioning event $X_t \ge v$ instead of $X_t = v$.

For the GARCH(1,1) process, $\sigma_t$ is still fixed at $\sqrt{\alpha_0 + \alpha_1 \sigma_{t-1}^2 + \beta_1 X_{t-1}^2}$; however, $\sigma_{t+1}$ is now the random variable $\sqrt{\alpha_0 + \alpha_1 \sigma_t^2 + \beta_1 X_t^2}$. We have

\[
P(X_{t+1} \ge x \mid X_t \ge v, \mathcal{F}_{t-1})
= P\big(\varepsilon_{t+1} \ge x/\sqrt{\alpha_0 + \alpha_1 \sigma_t^2 + \beta_1 X_t^2} \,\big|\, X_t \ge v, \mathcal{F}_{t-1}\big)
\]
\[
= \int_v^\infty P\big(\varepsilon_{t+1} \ge x/\sqrt{\alpha_0 + \alpha_1 \sigma_t^2 + \beta_1 w^2}\big)\, f_{X_t \mid X_t \ge v, \mathcal{F}_{t-1}}(w \mid X_t \ge v, \mathcal{F}_{t-1})\, dw. \tag{4.11}
\]

Notice that for $w \ge v$,

\[
P(X_t \le w \mid X_t \ge v, \mathcal{F}_{t-1}) = \frac{P(X_t \le w, X_t \ge v \mid \mathcal{F}_{t-1})}{P(X_t \ge v \mid \mathcal{F}_{t-1})}
= \frac{1}{1-\alpha}\, P(v \le X_t \le w \mid \mathcal{F}_{t-1}) \quad \text{by the definition of } \mathrm{VaR}_\alpha
\]
\[
= \frac{1}{1-\alpha}\, P(v \le \sigma_t \varepsilon_t \le w \mid \mathcal{F}_{t-1})
= \frac{1}{1-\alpha}\, P(v/\sigma_t \le \varepsilon_t \le w/\sigma_t \mid \mathcal{F}_{t-1})
= \frac{1}{1-\alpha} \big( F_\varepsilon(w/\sigma_t) - F_\varepsilon(v/\sigma_t) \big). \tag{4.12}
\]

From (4.12) we obtain

\[ f_{X_t \mid X_t \ge v, \mathcal{F}_{t-1}}(w \mid X_t \ge v, \mathcal{F}_{t-1}) = \frac{1}{(1-\alpha)\,\sigma_t}\, f_\varepsilon(w/\sigma_t), \quad w \ge v. \tag{4.13} \]

With (4.13), (4.11) can be expressed as

\[ P(X_{t+1} \ge x \mid X_t \ge v, \mathcal{F}_{t-1}) = \frac{1}{(1-\alpha)\,\sigma_t} \int_v^\infty P\big(\varepsilon_{t+1} \ge x/\sqrt{\alpha_0 + \alpha_1 \sigma_t^2 + \beta_1 w^2}\big)\, f_\varepsilon(w/\sigma_t)\, dw. \tag{4.14} \]

Substituting (4.14) into (4.5), we can solve for $\mathrm{CoVaR}^{(2)}_\alpha(X_{t+1})$ numerically when $\{X_t\}$ follows a GARCH(1,1) process.
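Equation (4.14) involves only a one-dimensional integral, so the GARCH case of the second definition can still be solved by direct quadrature. A minimal sketch, assuming a standard normal $\varepsilon_t$ (the function name and root-finding bracket are illustrative):

import numpy as np
from scipy import stats, integrate, optimize

def covar2_garch(alpha, a0, a1, b1, sigma_t):
    """CoVaR under definition (4.5) for a GARCH(1,1) with eps ~ N(0,1), via (4.14)."""
    v = sigma_t * stats.norm.ppf(alpha)             # VaR_alpha(X_t)

    def tail_prob(x):                               # P(X_{t+1} >= x | X_t >= v)
        integrand = lambda w: stats.norm.sf(
            x / np.sqrt(a0 + a1 * sigma_t**2 + b1 * w**2)) * \
            stats.norm.pdf(w / sigma_t)
        val, _ = integrate.quad(integrand, v, np.inf)
        return val / ((1.0 - alpha) * sigma_t)

    return optimize.brentq(lambda x: tail_prob(x) - (1.0 - alpha),
                           1e-6, 50.0 * sigma_t)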
In order to find $\mathrm{CoVaR}^{(2)}_\alpha(X_{t+1})$ under an ARSV(1) process, we can start from (4.9). Under the condition $X_t \ge v$, (4.9) becomes

\[
P(h_{t+1} \le u \mid X_t \ge v, \mathcal{F}_{t-1})
= P\big(\beta_0 + 2\beta_1 \log (X_t/\varepsilon_t) + \delta \eta_{t+1} \le u \,\big|\, X_t \ge v, \mathcal{F}_{t-1}\big)
\]
\[
= P\big(\eta_{t+1} \le (u - \beta_0 - 2\beta_1 \log (X_t/\varepsilon_t))/\delta \,\big|\, X_t \ge v, \mathcal{F}_{t-1}\big)
= \int_v^\infty P\big(\eta_{t+1} \le (u - \beta_0 - 2\beta_1 \log (w/\varepsilon_t))/\delta\big)\, f_{X_t \mid X_t \ge v, \mathcal{F}_{t-1}}(w \mid X_t \ge v, \mathcal{F}_{t-1})\, dw. \tag{4.15}
\]

In (4.15), the conditional probability density function of $X_t$, $f_{X_t \mid X_t \ge v, \mathcal{F}_{t-1}}(w \mid X_t \ge v, \mathcal{F}_{t-1})$, is unknown. However,

\[ f_{X_t \mid X_t \ge v, \mathcal{F}_{t-1}}(w \mid X_t \ge v, \mathcal{F}_{t-1}) = \frac{1}{1-\alpha}\, f_{X_t \mid \mathcal{F}_{t-1}}(w \mid \mathcal{F}_{t-1}), \quad w \ge v. \]

Then, (4.15) becomes

\[
P(h_{t+1} \le u \mid X_t \ge v, \mathcal{F}_{t-1})
= \int_v^\infty P\big(\eta_{t+1} \le (u - \beta_0 - 2\beta_1 \log (w/\varepsilon_t))/\delta\big)\, \frac{1}{1-\alpha}\, f_{X_t \mid \mathcal{F}_{t-1}}(w \mid \mathcal{F}_{t-1})\, dw
\]
\[
= \frac{1}{1-\alpha} \int_{-\infty}^\infty \int_v^\infty P\big(\eta_{t+1} \le (u - \beta_0 - 2\beta_1 \log (w/z))/\delta\big)\, f_{X_t \mid \mathcal{F}_{t-1}}(w \mid \mathcal{F}_{t-1})\, f_\varepsilon(z)\, dw\, dz.
\]

Therefore,

\[ f_{h_{t+1} \mid X_t \ge v, \mathcal{F}_{t-1}}(u \mid X_t \ge v, \mathcal{F}_{t-1}) = \frac{1}{(1-\alpha)\,\delta} \int_{-\infty}^\infty \int_v^\infty f_\eta\big( (u - \beta_0 - 2\beta_1 \log (w/z))/\delta \big)\, f_{X_t \mid \mathcal{F}_{t-1}}(w \mid \mathcal{F}_{t-1})\, f_\varepsilon(z)\, dw\, dz. \tag{4.16} \]

The conditional cumulative distribution function of $X_t$ is

\[ P(X_t \le w \mid \mathcal{F}_{t-1}) = P(\sigma_t \varepsilon_t \le w \mid \mathcal{F}_{t-1}) = \int_0^\infty P(\varepsilon_t \le w/r)\, f_{\sigma_t \mid \mathcal{F}_{t-1}}(r \mid \mathcal{F}_{t-1})\, dr, \]

where $f_{\sigma_t \mid \mathcal{F}_{t-1}}(r \mid \mathcal{F}_{t-1}) = f_\eta\big( (\log r^2 - \beta_0 - \beta_1 \log \sigma_{t-1}^2)/\delta \big)\, \frac{2}{\delta r}$, since for $r > 0$,

\[
P(\sigma_t \le r \mid \mathcal{F}_{t-1}) = P(\log \sigma_t^2 \le \log r^2 \mid \mathcal{F}_{t-1})
= P(\beta_0 + \beta_1 \log \sigma_{t-1}^2 + \delta \eta_t \le \log r^2 \mid \mathcal{F}_{t-1})
= P\big(\eta_t \le (\log r^2 - \beta_0 - \beta_1 \log \sigma_{t-1}^2)/\delta\big).
\]

So

\[ f_{X_t \mid \mathcal{F}_{t-1}}(w \mid \mathcal{F}_{t-1}) = \frac{2}{\delta} \int_0^\infty f_\varepsilon(w/r)\, f_\eta\big( (\log r^2 - \beta_0 - \beta_1 \log \sigma_{t-1}^2)/\delta \big)\, \frac{1}{r^2}\, dr, \]

and (4.16) can be expressed as

\[ f_{h_{t+1} \mid X_t \ge v, \mathcal{F}_{t-1}}(u \mid X_t \ge v, \mathcal{F}_{t-1}) = \frac{2}{(1-\alpha)\,\delta^2} \int_{-\infty}^\infty \int_v^\infty \int_0^\infty f_\eta\big( (u - \beta_0 - 2\beta_1 \log (w/z))/\delta \big)\, f_\varepsilon(w/r)\, f_\eta\big( (\log r^2 - \beta_0 - \beta_1 \log \sigma_{t-1}^2)/\delta \big)\, f_\varepsilon(z)\, \frac{1}{r^2}\, dr\, dw\, dz. \]
Now (4.8) becomes

\[
P(X_{t+1} \ge x \mid X_t \ge v, \mathcal{F}_{t-1}) = \frac{4}{(1-\alpha)\,\delta^2} \int_0^\infty \int_{-\infty}^\infty \int_v^\infty \int_0^\infty P(\varepsilon_{t+1} \ge x/s)\, f_\eta\big( (\log s^2 - \beta_0 - 2\beta_1 \log (w/z))/\delta \big)\, f_\varepsilon(w/r)\, f_\eta\big( (\log r^2 - \beta_0 - \beta_1 \log \sigma_{t-1}^2)/\delta \big)\, f_\varepsilon(z)\, \frac{1}{r^2 s}\, dr\, dw\, dz\, ds. \tag{4.17}
\]

Substituting (4.17) into (4.5), we can solve for $\mathrm{CoVaR}^{(2)}_\alpha(X_{t+1})$ numerically when $\{X_t\}$ follows an ARSV(1) process. However, given the numerical complexity of this analytic expression, which involves a four-dimensional integral, we do not follow this approach. Instead, we propose a simulation-based computation, introduced in the next section.

4.3 Simulation Methods to Find CoVaR

The tail dependence properties of different models have a strong impact on the joint distribution of the pair $(X_t, X_{t+1})$. To illustrate the difference in CoVaR estimation under the GARCH and ARSV models, we focus only on the second definition of CoVaR, as modified by Girardi and Ergün (2013). Since it can be difficult to find the CoVaR by numerical integration, especially under the second definition, we introduce a simulation-based method as an alternative. Since CoVaR can be seen as a conditional quantile of $X_{t+1}$ given $\mathcal{F}_{t-1}$, our goal is to sample $\hat X_{t+1}$ using information up to time $t-1$; the estimated CoVaR is then simply an empirical quantile of $\{\hat X_{t+1}\}$.

For the GARCH process, given $\mathcal{F}_{t-1}$ and the estimated volatility $\hat\sigma_{t-1}$ at time $t-1$, we have $\hat\sigma_t^2 = \alpha_0 + \alpha_1 \hat\sigma_{t-1}^2 + \beta_1 X_{t-1}^2$ and $\hat X_t = \hat\sigma_t \varepsilon_t \sim N(0, \hat\sigma_t^2)$. Therefore, the condition $\hat X_t \ge \mathrm{VaR}_\alpha(\hat X_t)$ is equivalent to $\varepsilon_t \ge \mathrm{VaR}_\alpha(\varepsilon_t)$. So we can sample $\varepsilon_t$ first, and for those $\varepsilon_t$'s that are greater than $\mathrm{VaR}_\alpha(\varepsilon_t)$ we further calculate $\hat\sigma_{t+1}$ as

\[ \hat\sigma_{t+1} = \sqrt{\alpha_0 + \alpha_1 \hat\sigma_t^2 + \beta_1 \hat X_t^2} = \sqrt{\alpha_0 + \alpha_1 \hat\sigma_t^2 + (\beta_1 \hat\sigma_t^2)\, \varepsilon_t^2}. \]

Then $\hat X_{t+1} = \hat\sigma_{t+1} \varepsilon_{t+1}$ can be calculated by sampling another $\varepsilon_{t+1}$ from the distribution of $\{\varepsilon_t\}$.
Details of the CoVaR estimation under the GARCH process are described in Algorithm 2.

Algorithm 2 Estimating CoVaR by Simulation under a GARCH(1,1) Process
1: initialize $\hat\alpha_0$, $\hat\alpha_1$, $\hat\beta_1$, $\hat\sigma_{t-1}$, $X_{t-1}$, $N \gg \frac{1}{1-\alpha}$, the cumulative distribution function of $\varepsilon$ ($F_\varepsilon$), and $\alpha \in (0,1)$
2: calculate $\hat\sigma_t = \sqrt{\hat\alpha_0 + \hat\alpha_1 \hat\sigma_{t-1}^2 + \hat\beta_1 X_{t-1}^2}$
3: for $i$ from 1 to $N$ do
4:   generate $p_t$ from $\mathrm{Uniform}(\alpha, 1)$
5:   find $\varepsilon_t'$ such that $F_\varepsilon(\varepsilon_t') = p_t$
6:   let $\hat X_t = \hat\sigma_t \varepsilon_t'$
7:   let $\hat\sigma_{t+1} = \sqrt{\hat\alpha_0 + \hat\alpha_1 \hat\sigma_t^2 + \hat\beta_1 \hat X_t^2}$
8:   generate $p_{t+1}$ from $\mathrm{Uniform}(0,1)$ and find $\varepsilon_{t+1} = F_\varepsilon^{-1}(p_{t+1})$
9:   calculate $\hat X_{t+1,i} = \hat\sigma_{t+1} \varepsilon_{t+1}$
10:  save $\hat X_{t+1,i}$
11: end for
12: find $\mathrm{CoVaR}_\alpha(\hat X_{t+1})$ as the $\alpha$th quantile of $\{\hat X_{t+1,i},\ i = 1, \ldots, N\}$
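One possible transcription of Algorithm 2 into Python, vectorized over the $N$ draws and assuming Gaussian innovations for the $F_\varepsilon$ of the algorithm:

import numpy as np
from scipy import stats

def covar_garch_sim(alpha, a0, a1, b1, sigma_prev, x_prev, N=100_000, seed=0):
    """Simulation-based CoVaR under a GARCH(1,1), following Algorithm 2."""
    rng = np.random.default_rng(seed)
    sigma_t = np.sqrt(a0 + a1 * sigma_prev**2 + b1 * x_prev**2)
    p_t = rng.uniform(alpha, 1.0, size=N)      # restrict eps_t to its upper tail
    eps_t = stats.norm.ppf(p_t)
    x_t = sigma_t * eps_t
    sigma_next = np.sqrt(a0 + a1 * sigma_t**2 + b1 * x_t**2)
    x_next = sigma_next * rng.standard_normal(N)
    return np.quantile(x_next, alpha)          # alpha-quantile of simulated X_{t+1}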
For the ARSV(1) model, since the AR process of the $\log \sigma_t^2$'s does not depend on the value of $X_t$, we need to simulate $X_{t+1}$ in a different way. The idea is to sample $\eta_t$ and calculate the corresponding $\hat\sigma_t$ following the AR process; we then sample $\varepsilon_t$ to calculate $\hat X_t = \hat\sigma_t \varepsilon_t$, and repeat these two steps until $\hat X_t$ is greater than $\mathrm{VaR}_\alpha(X_t)$, which can be estimated from a rolling window of historical data. Then, with this $\hat\sigma_t$ and a newly sampled $\eta_{t+1}$, we calculate $\hat\sigma_{t+1}$ and sample $\varepsilon_{t+1}$ to find $\hat X_{t+1}$. The details are described in Algorithm 3.

Algorithm 3 Simulating CoVaR under the ARSV Process
1: initialize $\hat\beta_0$, $\hat\beta_1$, $\hat\delta$, the distribution of $\varepsilon_t$ ($F_\varepsilon$), the distribution of $\eta_t$ ($F_\eta$), $\hat\sigma_{t-1}$, $X_{t-1}$, $\alpha \in (0,1)$, $\hat v = \mathrm{VaR}_\alpha$ from historical data, and $N \gg \frac{1}{1-\alpha}$
2: for $i$ from 1 to $N$ do
3:   initialize $\hat X_t = -2|\hat v|$
4:   while $\hat X_t < \hat v$ do
5:     sample $\eta_t \sim F_\eta$
6:     let $\hat\sigma_t = \sqrt{\exp(\hat\beta_0 + \hat\beta_1 \log \hat\sigma_{t-1}^2 + \hat\delta \eta_t)}$
7:     sample $\varepsilon_t \sim F_\varepsilon$
8:     calculate $\hat X_t = \hat\sigma_t \varepsilon_t$
9:   end while
10:  sample $\eta_{t+1} \sim F_\eta$
11:  let $\hat\sigma_{t+1} = \sqrt{\exp(\hat\beta_0 + \hat\beta_1 \log \hat\sigma_t^2 + \hat\delta \eta_{t+1})}$
12:  sample $\varepsilon_{t+1} \sim F_\varepsilon$
13:  calculate $\hat X_{t+1,i} = \hat\sigma_{t+1} \varepsilon_{t+1}$
14:  save $\hat X_{t+1,i}$
15: end for
16: find $\mathrm{CoVaR}_\alpha(X_{t+1})$ as the $\alpha$th quantile of $\{\hat X_{t+1,i} : i = 1, \ldots, N\}$

The results of comparing estimated VaR and CoVaR under the second definition are shown in the next section.
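Algorithm 3 can be transcribed similarly; the rejection step is written below as an explicit per-draw loop for clarity (not speed), with $F_\varepsilon$ Gaussian and $F_\eta$ a standardized Student $t$, as elsewhere in this chapter. Initializing $\hat X_t$ to $-\infty$ simply forces at least one pass through the rejection loop.

import numpy as np
from scipy import stats

def covar_arsv_sim(alpha, beta0, beta1, delta, nu_eta, sigma_prev, v_hat,
                   N=20_000, seed=0):
    """Simulation-based CoVaR under an ARSV(1), following Algorithm 3."""
    rng = np.random.default_rng(seed)
    t_scale = np.sqrt((nu_eta - 2.0) / nu_eta)
    h_prev = np.log(sigma_prev**2)
    x_next = np.empty(N)
    for i in range(N):
        x_t = -np.inf                              # force at least one pass
        while x_t < v_hat:                         # reject until X_t >= VaR_alpha
            eta = t_scale * stats.t.rvs(df=nu_eta, random_state=rng)
            h_t = beta0 + beta1 * h_prev + delta * eta
            x_t = np.exp(h_t / 2.0) * rng.standard_normal()
        eta_next = t_scale * stats.t.rvs(df=nu_eta, random_state=rng)
        h_next = beta0 + beta1 * h_t + delta * eta_next
        x_next[i] = np.exp(h_next / 2.0) * rng.standard_normal()
    return np.quantile(x_next, alpha)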
4.4 Comparison of VaR and CoVaR Forecasts Under GARCH and ARSV Processes

In this section we present risk-forecast results for both simulated data and real data. There are two data-generating scenarios for the simulation study:

1. An ARSV(1) process:
\[ X_t = \sigma_t \varepsilon_t, \qquad \log \sigma_t^2 = -0.5 + 0.95 \log \sigma_{t-1}^2 + 0.35\, \eta_t, \]
where $\varepsilon_t \sim N(0,1)$, $\eta_t \sim t_5$, and $\eta_t$ and $\varepsilon_t$ are independent.

2. A GARCH(1,1) process:
\[ X_t = \sigma_t \varepsilon_t, \qquad \sigma_t^2 = 5 \times 10^{-6} + 0.85\, \sigma_{t-1}^2 + 0.1\, X_{t-1}^2, \]
where $\varepsilon_t \sim N(0,1)$.

For the data example, we use the daily log-returns of the S&P 500 Index from 1980 to 1987 to estimate parameters, and forecast risk measures for the daily log-returns from 1988 to 2003.

4.4.1 Simulation Study

Value-at-Risk

In the simulation study, we generate data from both scenarios. The length of each generated dataset is 4500 after the burn-in period. We use the first 1000 observations to estimate the parameters of an ARSV(1) process with a normally distributed first innovation and a Student $t$ distributed second innovation, and then estimate the VaR for the rest of the data using a rolling window of size 1000. Note that to estimate the VaR at time $t$, we first need to know the volatility at time $t-1$. Our inference method cannot estimate $\sigma_{t-1}$, since the last part of the estimated volatilities must be discarded in our algorithm, so an external estimator of $\sigma_{t-1}$ is required. In this project we use the $\sigma_{t-1}$'s obtained when fitting a GARCH model to the data in the rolling window. There are two reasons for doing so. First, it is convenient, since we are also estimating the VaR under the GARCH process. Second, based on some empirical analysis, we found that the $\sigma_{t-1}$'s estimated by fitting a GARCH(1,1) model are more accurate than those from some unbiased but noisy model-free estimators. For example, in this simulated dataset, the root MSE of the estimated $\sigma_{t-1}$ from the GARCH(1,1) model is $\sqrt{3.7 \times 10^{-5}}$, compared with a root MSE of $\sqrt{5.0 \times 10^{-5}}$ for the 5-day close-to-close estimator.

We first compare the estimated VaR under the ARSV(1) and GARCH(1,1) models. The VaR forecasts under the GARCH(1,1) model are obtained by the model-based method (McNeil, Frey, and Embrechts, 2015b), while the VaR forecasts under the ARSV(1) model are obtained from the numerical solution discussed in Section 4.1. The results are shown in Figure 4.1 and Figure 4.2. From the plots it is hard to observe obvious differences between the forecasts under the ARSV(1) model and those under the GARCH(1,1) model. We also present the results of traditional backtests in Table 4.1, and the results of conditional predictive ability tests (Giacomini and White, 2006) in Table 4.2, using the piece-wise linear scoring function suggested by Gneiting (2011). Table 4.1 shows that at both the 95% and the 99% level, the VaR forecasts estimated under both models pass the traditional backtests no matter what the true underlying process is. However, Table 4.2 shows that at the 99% level the GARCH(1,1) model has stronger conditional predictive ability than the ARSV(1) model when the true data-generating process is GARCH(1,1); otherwise, there is no significant difference between the forecasts from the two models.

CoVaR

The difference is more obvious when we forecast CoVaR with the two different model filters. We first generate a dataset from the ARSV(1) process specified in Scenario 1 at the beginning of Section 4.4. We then estimate CoVaR under the second definition of Girardi and Ergün (2013), using the simulation methods described in Algorithm 2 and Algorithm 3. The forecasted CoVaR under the GARCH process and under the ARSV process are shown in Figure 4.3 and Figure 4.4. The CoVaR forecasts from the GARCH(1,1) model are in general higher than those from the ARSV(1) model, and large differences between the two estimated CoVaR series are highly correlated with large squared returns.

Figure 4.1: Estimated 95% and 99% VaR forecasts for the simulated ARSV(1) process. Black lines represent the simulated daily returns, red lines represent the VaR estimated under the GARCH model, and blue lines represent the VaR under the ARSV model. The left panel illustrates the 95% level; the right panel the 99% level.

Figure 4.2: Estimated 95% and 99% VaR forecasts for the simulated GARCH(1,1) process. Black lines represent the simulated daily returns, red lines represent the VaR estimated under the GARCH model, and blue lines represent the VaR under the ARSV model. The left panel illustrates the 95% level; the right panel the 99% level.
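The violation-rate backtests reported below (and in Table 4.3 for the data example) are standard unconditional-coverage likelihood-ratio tests. The following sketch shows how such a violation rate and p-value can be computed; whether this binomial likelihood-ratio form matches the exact test used for the tables is an assumption.

import numpy as np
from scipy import stats

def lr_coverage_test(violations, alpha):
    """Unconditional-coverage LR test for VaR_alpha violation indicators.

    violations : boolean array, True when the realized return exceeds the
                 forecast VaR_alpha; nominal violation probability is 1 - alpha.
    Assumes at least one violation and at least one non-violation.
    """
    n, k = len(violations), int(np.sum(violations))
    p0, p1 = 1.0 - alpha, k / n
    loglik = lambda p: k * np.log(p) + (n - k) * np.log(1.0 - p)
    lr = -2.0 * (loglik(p0) - loglik(p1))
    return p1, stats.chi2.sf(lr, df=1)             # violation rate, p-value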
Table 4.1: Violation rates and corresponding p-values of likelihood-ratio tests for $\mathrm{VaR}_\alpha$ forecasts at the 95% and 99% levels for simulated data. Column names give the underlying process and the confidence level $\alpha$; row names give the model used to forecast the risk measure.

           | ARSV(1)-95%   | ARSV(1)-99%   | GARCH(1,1)-95% | GARCH(1,1)-99%
ARSV(1)    | 4.35% (0.153) | 1.19% (0.386) | 4.24% (0.074)  | 0.64% (0.053)
GARCH(1,1) | 5.35% (0.452) | 1.05% (0.806) | 5.24% (0.585)  | 1.04% (0.842)

Table 4.2: Mean piece-wise linear scores and corresponding p-values of conditional predictive ability tests for $\mathrm{VaR}_\alpha$ forecasts at the 95% and 99% levels for simulated data. Column names give the underlying process and the confidence level $\alpha$; row names give the model used to forecast the risk measure; the p-value in parentheses compares the two forecast methods at the given level.

           | ARSV(1)-95%                   | ARSV(1)-99%                   | GARCH(1,1)-95%                | GARCH(1,1)-99%
ARSV(1)    | $9.93 \times 10^{-4}$ (0.835) | $2.85 \times 10^{-4}$ (0.237) | $1.03 \times 10^{-3}$ (0.144) | $2.763 \times 10^{-4}$ (0.001)
GARCH(1,1) | $9.97 \times 10^{-4}$         | $2.95 \times 10^{-4}$         | $1.033 \times 10^{-3}$        | $2.756 \times 10^{-4}$

Figure 4.3: Estimated 95% and 99% CoVaR forecasts for the simulated ARSV(1) process. In the left column, black lines represent the simulated daily returns, red lines represent the CoVaR estimated under the GARCH model, and blue lines represent the CoVaR under the ARSV model. In the right column, black lines represent squared returns, and blue beams indicate the top 5% largest differences between the CoVaR estimated under the GARCH(1,1) model and the ARSV(1) model. The top panels illustrate the 95% level; the bottom panels the 99% level.

Figure 4.4: Estimated 95% and 99% CoVaR forecasts for the simulated GARCH(1,1) process. In the left column, black lines represent the simulated daily returns, red lines represent the CoVaR estimated under the GARCH model, and blue lines represent the CoVaR under the ARSV model. In the right column, black lines represent squared returns, and blue beams indicate the top 5% largest differences between the CoVaR estimated under the GARCH(1,1) model and the ARSV(1) model. The top panels illustrate the 95% level; the bottom panels the 99% level.

4.4.2 Data Example

To further compare the VaR and CoVaR forecasts under the GARCH model and the ARSV model, we apply the forecasting methods to the daily log-returns of the S&P 500 Index from 1988 to 2003 with a rolling window of size 1000. The forecasting process is the same as in Section 4.4.1, and the results are presented below.

For the VaR forecasts, Figure 4.5 shows no significant difference between the values estimated from the ARSV(1) model and those from the GARCH(1,1) model. Both models provide good forecasts of VaR and pass the traditional backtests (Table 4.3); there is also no significant difference in conditional predictive ability (Table 4.4).
Table 4.3: Violation rates and corresponding p-values of likelihood-ratio tests for $\mathrm{VaR}_\alpha$ forecasts at the 95% and 99% levels for the daily log-returns of the S&P 500 Index. Column names give the confidence level $\alpha$; row names give the model used to forecast the risk measure.

           | S&P 500-95%   | S&P 500-99%
ARSV(1)    | 4.96% (0.929) | 1.31% (0.109)
GARCH(1,1) | 5.24% (0.345) | 1.34% (0.077)

Table 4.4: Mean piece-wise linear scores and corresponding p-values of conditional predictive ability tests comparing the two forecast methods, for $\mathrm{VaR}_\alpha$ forecasts at the 95% and 99% levels for the daily log-returns of the S&P 500 Index. Column names give the confidence level $\alpha$; row names give the model used to forecast the risk measure.

           | S&P 500-95%                   | S&P 500-99%
ARSV(1)    | $1.06 \times 10^{-3}$ (0.461) | $2.85 \times 10^{-4}$ (0.506)
GARCH(1,1) | $1.05 \times 10^{-3}$         | $2.79 \times 10^{-4}$

In Figure 4.6 we show the estimated CoVaR forecasts from the ARSV(1) and GARCH(1,1) models. The difference between the estimated CoVaR forecasts is more obvious in the data example, as the values estimated under the GARCH(1,1) model are consistently higher than those under the ARSV(1) model. A possible explanation for this consistent difference is that the GARCH(1,1) process is asymptotically tail dependent, while the ARSV(1) process with a heavy-tailed second innovation, although it has stronger tail dependence than the classic ARSV(1) model, is still asymptotically tail independent. Recall that CoVaR is a conditional quantile of $X_{t+1}$: conditioned on $X_t \ge \mathrm{VaR}_\alpha(X_t)$ for some $\alpha$ close to 1, $X_{t+1}$ is more likely to also be large under the GARCH process than under the ARSV process, so the $\alpha$th quantile of the forecasted $X_{t+1}$ is greater under the GARCH process than under the ARSV process.

Figure 4.5: Estimated 95% and 99% VaR forecasts for the daily log-returns of the S&P 500 Index. Black lines represent the daily returns, red lines represent the VaR estimated under the GARCH model, and blue lines represent the VaR under the ARSV model. The top panel illustrates the 95% level; the bottom panel the 99% level.

Figure 4.6: Estimated 95% and 99% CoVaR forecasts for the daily log-returns of the S&P 500 Index. In the left column, black lines represent the daily returns, red lines represent the CoVaR estimated under the GARCH model, and blue lines represent the CoVaR under the ARSV model. In the right column, black lines represent squared returns, and blue beams indicate the top 5% largest differences between the CoVaR estimated under the GARCH(1,1) model and the ARSV(1) model. The top panels illustrate the 95% level; the bottom panels the 99% level.

Chapter 5

Discussion

Empirical evidence suggests that consecutive log-return values may be tail independent. The GARCH process might therefore not be a suitable choice for modeling consecutive losses, as it may over-estimate the potential risk at extremal levels. On the other hand, the tail dependence at sub-extremal levels should be stronger than what the classic ARSV process with a light-tailed second innovation can capture, so modelling log-returns using the classic ARSV process may lead to under-estimation of the probability of consecutive large losses.

In this report we propose an extension of the ARSV model in which the second innovation is Student $t$ distributed. We conjecture that this model exhibits stronger tail dependence at sub-extremal levels than the classic ARSV model with a normally distributed second innovation; the conjecture is supported by simulation.
However, most existing inference methods for the ARSV process have limited flexibility in the choice of innovation distributions. Most of them require the second innovation to be normal, and the first innovation to be either normal or Student $t$; only a few methods allow the second innovation to be heavy-tailed. In this report, we develop a new inference method for the extended ARSV(1) process which allows flexible distributional assumptions on both innovations. The new method works as well as existing methods in estimating the parameters of the classic ARSV(1) model, and it can also successfully estimate the parameters of the extended ARSV(1) process we consider, which is out of the reach of most existing methods. Furthermore, it has the potential to be adapted to other extensions of the classic ARSV(1) process. For example, the same scheme can be applied to estimating the parameters of an ARSV(p) process, for any integer $p \ge 1$ and a non-Gaussian second innovation, as long as we know how to estimate the parameters of an AR(p) process with the desired non-Gaussian error distribution. We hope this new inference method will provide a useful tool for future studies of ARSV models.

We also study the VaR and CoVaR risk measures under the ARSV process and compare them with those under the GARCH process. We show that there is no significant difference between the VaR estimated under the GARCH and ARSV processes, but that there is a large difference between the CoVaR estimated under the two processes. This difference reveals the impact of the tail dependence properties implied by the chosen model.

For future work, we first need to prove that $\bar\chi$ covers the full spectrum of sub-extremal tail dependence, i.e. $0 < \bar\chi < 1$, for the extended model. With the support of a more grounded theory, we also need to improve the computational efficiency: the new inference method is currently much slower than existing inference methods that similarly require an MCMC scheme, in many cases up to 100 times slower. Another shortcoming of the new method is that it only provides parameter estimates, while volatility estimates can only be obtained for a sub-period of the historical record.

Bibliography

T. Adrian and M. K. Brunnermeier. CoVaR. Technical report, National Bureau of Economic Research, 2011.

B. Basrak, R. A. Davis, and T. Mikosch. Regular Variation of GARCH Processes. Stochastic Processes and Their Applications, 99:95-115, 2002.

T. Bollerslev. Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31:307-327, 1986.

F. J. Breidt and R. A. Davis. Extremes of Stochastic Volatility Models. Annals of Applied Probability, 8:664-675, 1998.

C. Broto and E. Ruiz. Estimation Methods for Stochastic Volatility Models: A Survey. Journal of Economic Surveys, 18:613-649, 2004.

S. Coles, J. Heffernan, and J. A. Tawn. Dependence Measures for Extreme Value Analyses. Extremes, 2:339-365, 1999.
H. Drees, J. Segers, and M. Warchoł. Statistics for Tail Processes of Markov Chains. Extremes, 18:369-402, 2015.

R. F. Engle. Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica: Journal of the Econometric Society, 50:987-1007, 1982.

M. Fridman and L. Harris. A Maximum Likelihood Approach for Non-Gaussian Stochastic Volatility Models. Journal of Business & Economic Statistics, 16:284-291, 1998.

R. Giacomini and H. White. Tests of Conditional Predictive Ability. Econometrica, 74:1545-1578, 2006.

G. Girardi and A. T. Ergün. Systemic Risk Measurement: Multivariate GARCH Estimation of CoVaR. Journal of Banking & Finance, 37:3169-3180, 2013.

T. Gneiting. Making and Evaluating Point Forecasts. Journal of the American Statistical Association, 106:746-762, 2011.

G. K. Grunwald, R. J. Hyndman, L. Tedesco, and R. L. Tweedie. Theory & Methods: Non-Gaussian Conditional Linear AR(1) Models. Australian & New Zealand Journal of Statistics, 42:479-495, 2000.

A. Harvey, E. Ruiz, and N. Shephard. Multivariate Stochastic Variance Models. The Review of Economic Studies, 61:247-264, 1994.

J. B. Hill. Extremal Memory of Stochastic Volatility with an Application to Tail Shape Inference. Journal of Statistical Planning and Inference, 141:663-676, 2011.

E. Jacquier, N. G. Polson, and P. E. Rossi. Bayesian Analysis of Stochastic Volatility Models. Journal of Business & Economic Statistics, 12:69-87, 1994.

E. Jacquier, N. G. Polson, and P. E. Rossi. Bayesian Analysis of Stochastic Volatility Models with Fat-tails and Correlated Errors. Journal of Econometrics, 122:185-212, 2004.

A. Janssen and H. Drees. A Stochastic Volatility Model with Flexible Extremal Dependence Structure. Bernoulli, 22:1448-1490, 2016.

H. Joe. Multivariate Models and Multivariate Dependence Concepts. CRC Press, 1997.

G. Kastner. stochvol: Efficient Bayesian Inference for Stochastic Volatility (SV) Models, 2016. URL https://CRAN.R-project.org/package=stochvol. R package version 1.3.1.

G. Kastner and S. Frühwirth-Schnatter. Ancillarity-sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models. Computational Statistics & Data Analysis, 76:408-423, 2014.

F. Laurini and J. A. Tawn. Regular Variation and Extremal Dependence of GARCH Residuals with Application to Market Risk Measures. Econometric Reviews, 28:146-169, 2008.

A. W. Ledford and J. A. Tawn. Statistics for Near Independence in Multivariate Extreme Values. Biometrika, 83:169-187, 1996.

A. W. Ledford and J. A. Tawn. Diagnostics for Dependence within Time Series Extremes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(2):521-543, 2003.
Y. Liu and J. A. Tawn. Volatility Model Selection for Extremes of Financial Time Series. Journal of Statistical Planning and Inference, 143:520-530, 2013.

A. J. McNeil, R. Frey, and P. Embrechts. Quantitative Risk Management, 2015a. pg. 133.

A. J. McNeil, R. Frey, and P. Embrechts. Quantitative Risk Management, 2015b. pg. 48.

T. Mikosch and C. Starica. Limit Theory for the Sample Autocorrelations and Extremes of a GARCH(1,1) Process. Annals of Statistics, 28:1427-1451, 2000.

K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

G. O. Roberts and J. S. Rosenthal. Harris Recurrence of Metropolis-within-Gibbs and Trans-dimensional Markov Chains. The Annals of Applied Probability, 16:2123-2139, 2006.