Extreme Value Approach to CoVaR EstimationbyMenglin ZhouB.Sc., Sun Yat-sen University, 2017A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Statistics)The University of British Columbia(Vancouver)August 2019c©Menglin Zhou, 2019The following individuals certify that they have read, and recommend to the Faculty of Grad-uate and Postdoctoral Studies for acceptance, the thesis entitled:Extreme Value Approach to CoVaR Estimationsubmitted by Menglin Zhou in partial fulfillment of the requirements for the degree of Masterof Science in Statistics.Examining Committee:Natalia Nolde, StatisticsSupervisorHarry JoeAdditional ExamineriiAbstractThe global financial crisis of 2007-2009 revealed the importance of systemic risk: the risk thatmay destabilize the global economy due to financial contagion. Accurate assessment of sys-temic risk would not only enable regulators to introduce suitable policies to mitigate the risk,but also allow individual institutions to monitor and mitigate their vulnerability. An effectivemeasure of systemic risk should be able to capture the co-movements between a financial sys-tem (or market) and individual financial institutions. One popular measure of systemic riskis CoVaR. In this thesis, a methodology is proposed to compute dynamic forecasts of Co-VaR semi-parametrically within the classical framework of multivariate extreme value theory(EVT). According to the definition, CoVaR can be viewed as a high quantile of a conditionaldistribution where the conditioning event corresponds to large losses of an institution. The ideaof our methodology is to relate this conditional distribution to the tail dependence function.We develop an EVT-based framework to estimate CoVaR statically by combining paramet-ric modelling of the tail dependence function to address the issue of data sparsity in the jointtail regions and semi-parametric univariate tail estimation techniques. The performance of themethodology is illustrated via simulation studies and real data examples.iiiLay SummaryOne of the popular measures of systemic risk is Conditional Value-at-Risk (CoVaR), whichcan capture co-movements between a financial system (or market) and individual financialinstitutions. CoVaR is defined as a high quantile of the conditional distribution of systemproxy such as a market index conditional on the event that an institution is experiencing alarge loss in excess of a high quantile or the so-called Value-at-Risk (VaR). In view of datasparsity and model uncertainty when dealing with extreme events, empirical estimates andfully parametric statistical inference are not reliable and an effective way is to rely on theasymptotic approximations in the spirit of extreme value theory (EVT). In this sense, thisthesis develops a flexible EVT-based framework to estimate CoVaR semi-parametrically bycombining parametric modelling of the tail dependence function to address the issue of datasparsity in the joint tail regions and nonparametric univariate tail estimation techniques.ivPrefaceThis thesis is original, unpublished work by the author, Menglin Zhou, under the supervisionof Professor Natalia Nolde. The research idea in Chapter 3 was proposed by Professors NataliaNolde and Chen Zhou. All simulations and analyses were designed and carried out by theauthor.vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5vi2.1 Multivariate Extreme Value Theory . . . . . . . . . . . . . . . . . . . . . . . . 52.1.1 Univariate EVT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.2 Hill estimator for the tail index . . . . . . . . . . . . . . . . . . . . . . 92.1.3 Extreme quantile estimation . . . . . . . . . . . . . . . . . . . . . . . 132.1.4 Multivariate extreme value distributions . . . . . . . . . . . . . . . . . 142.2 Tail Dependence Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.2.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.2.3 Estimation of the tail dependence function . . . . . . . . . . . . . . . . 203 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233.1 CoVaR Estimation in a Stationary Setting . . . . . . . . . . . . . . . . . . . . 233.2 Simulation Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293.2.1 Performance of the M-estimator of the TD Function . . . . . . . . . . . 293.2.2 Performance of the CoVaR Estimator . . . . . . . . . . . . . . . . . . 364 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.1 Backtesting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51vii5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63A Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67A.1 Regular Variation of Skew-t Distribution . . . . . . . . . . . . . . . . . . . . . 67A.2 Proof of Proposition 2.2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68A.3 Proof of Proposition 2.2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69A.4 Peaks-Over-Threshold Method . . . . . . . . . . . . . . . . . . . . . . . . . . 71B Tables and Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73B.1 Time series plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73B.2 χ(u) and χ¯(u) plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77B.3 Plots of CoVaR estimates against the sample fraction for other seven institutions 78viiiList of TablesTable 2.1 Domain of attraction of the Fre´chet distribution . . . . . . . . . . . . . . . . 9Table 3.1 Summary statistics of proposed CoVaR estimates at level pn = 0.05. Themargins of the first four distributions are all standard Fre´chet distributionand the margins of the bivariate t distribution are all student t distribution. . 37Table 3.2 Summary statistics of proposed CoVaR estimates at level pn = 0.01. Themargins of the first four distributions are all standard Fre´chet distributionand the margins of the bivariate t distribution are all student t distribution. . 45Table 4.1 Unconditional coverage tests for VaR of institutions and CoVaR based onraw data. The X and Y variables for VaR and CoVaR are 100 × log return.“Log, HR, Bilog, Alog, t” stands for logistic, Hu¨sler-Reiss, bilogistic, asym-mertric and t distribution, respectively. En is the number of exceedances ofthe VaR estimate, and Ebn is the number of joint exceedances of VaR andCoVaR estimates. Moreover, T represents the test statistic value in (4.3). . . 55Table 4.2 Unconditional coverage tests for VaR of institutions and CoVaR based onrealized residuals at level pn = (0.02, 0.05). “Log, HR, Bilog, Alog, t”stands for logistic, Hu¨sler-Reiss, bilogistic, asymmertric and t distribution,respectively. En is the number of exceedances of the VaR estimate, and Ebnis the number of joint exceedances of VaR and CoVaR estimates. Moreover,T represents the test statistic value in (4.3). . . . . . . . . . . . . . . . . . . 59ixTable 4.3 The average quantile scores S¯ of CoVaR estimates based on realized resid-uals at level pn = (0.02, 0.05). . . . . . . . . . . . . . . . . . . . . . . . . 60xList of FiguresFigure 3.1 Plot of R(1, η) as a function of η for the bivariate logistic distribution fordifferent values of θ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25Figure 3.2 Plot of R(1, η) as a function of η for the bivariate Hu¨sler-Reiss distributionfor different values of θ. . . . . . . . . . . . . . . . . . . . . . . . . . . . 26Figure 3.3 Plot of R(1, η) as a function of η for the bivariate bilogistic distribution fordifferent values of α and β. . . . . . . . . . . . . . . . . . . . . . . . . . . 27Figure 3.4 Plots of R(1, η) as a function of η for the bivariate asymmetric logisticdistribution for different values of θ, ψ1, ψ2. . . . . . . . . . . . . . . . . . 27Figure 3.5 Plot ofR(1, η) as a function of η for the bivariate t distribution for differentvalues of ν and ρ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28Figure 3.6 The bias and RMSE of M-estimator of θ based on 100 samples of size 2000simulated from the bivariate logistic model with parameter θ = 0.6. . . . . 30Figure 3.7 The bias and RMSE of M-estimator based on 100 samples of size 2000simulated from the bivariate HR model with parameter θ = 2.5. . . . . . . 31Figure 3.8 The bias and RMSE of M-estimator based on 100 samples of size 2000simulated from the bivariate bilogistic model with parameter α = 0.4, β =0.7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32xiFigure 3.9 The bias and RMSE of M-estimator based on 100 replications of size 2500simulated for the asymmetric logistic model with parameter θ = 0.6, ψ1 =0.5, ψ2 = 0.8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33Figure 3.10 The bias and RMSE of simultaneous M-estimator based on 100 samplesof size 3000 simulated from the bivariate t model with parameter ν = 6,ρ = 0.6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34Figure 3.11 The sampling densities of estimated parameters based on 100 samples forcorresponding parametric models. . . . . . . . . . . . . . . . . . . . . . . 36Figure 3.12 The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en atlevel pn = 0.05 based on 100 samples of size 2000 for the logistic modelwith parameter θ = 0.6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Figure 3.13 The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en atlevel pn = 0.05 based on 100 samples of size 2000 for the HR model withparameter θ = 2.5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Figure 3.14 The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en atlevel pn = 0.05 based on 100 samples of size 2000 for the bilogistic modelwith parameters α = 0.4, β = 0.7. . . . . . . . . . . . . . . . . . . . . . . 39Figure 3.15 The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en atlevel pn = 0.05 based on 100 samples of size 2500 for the asymmetriclogistic model with parameters θ = 0.6, ψ1 = 0.5, ψ2 = 0.8. . . . . . . . . 40Figure 3.16 The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en atlevel pn = 0.05 based on 100 samples of size 3000 for t distribution withparameters ν = 5, ρ = 0.6. . . . . . . . . . . . . . . . . . . . . . . . . . . 40Figure 3.17 The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and enat level pn = 0.01 based on 200 samples of size 5000 for the logistic modelwith parameter θ = 0.6. Threshold u is chosen as: u = Yn,n−k with k = 450. 42xiiFigure 3.18 The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and enat level pn = 0.01 based on 200 samples of size 5000 for the HR modelwith parameter θ = 2.5. Threshold u is chosen as: u = Yn,n−k with k = 700. 43Figure 3.19 The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and enat level pn = 0.01 based on 200 samples of size 5000 for the bilogisticmodel with parameter α = 0.4 and β = 0.7. Threshold u is chosen as:u = Yn,n−k with k = 450. . . . . . . . . . . . . . . . . . . . . . . . . . . . 43Figure 3.20 The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and enat level pn = 0.01 based on 200 samples of size 5000 for the asymmetriclogistic model with parameter θ = 0.6, ψ1 = 0.5 and ψ2 = 0.8. Thresholdu is chosen as: u = Yn,n−k with k = 400. . . . . . . . . . . . . . . . . . . 44Figure 3.21 The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and enat level pn = 0.01 based on 200 samples of size 5000 for t distribution withparameter ν = 5 and ρ = 0.6. Threshold u is chosen as: u = Yn,n−k withk = 150. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44Figure 4.1 Scatter plots of standardized daily losses (%) for time series introduced inSection 4.2 over the period from June 27, 2000 to May 9, 2019. . . . . . . 49Figure 4.2 χ(u) and χ¯(u) plots for the BAC-DJUSFN data. . . . . . . . . . . . . . . . 50Figure 4.3 Maximum likelihood estimates of χ¯ with 95% confidence bands based onthe profile likelihood for the ALL-DJUSFN data. . . . . . . . . . . . . . . 51Figure 4.4 Estimates of CoVaR as a function of k at level pn = (0.05, 0.05) for differ-ent values of m for original BAC-DJUSFN data. . . . . . . . . . . . . . . . 53Figure 4.5 Estimates of CoVaR as a function of k at level pn = (0.02, 0.05) for differ-ent values of m for original BAC-DJUSFN data. . . . . . . . . . . . . . . . 53Figure 4.6 Scatter plots of realized residuals from time series introduced in Section4.2 over the period from June 27, 2000 to May 9, 2019. . . . . . . . . . . . 56xiiiFigure 4.7 Estimates of CoVaR as a function of ks2 at level pn = (0.02, 0.05) forestimated residuals from BAC-DJUSFN data. The dotted vertical line rep-resent the ks2 = ks1 = 230. . . . . . . . . . . . . . . . . . . . . . . . . . . . 57Figure B.1 Time series plots of daily losses for institutions and financial system. . . . . 74Figure B.2 Time series plots of realized residuals for institutions and financial system. . 76Figure B.3 χ(u) and χ¯(u) plots for other seven institutions . . . . . . . . . . . . . . . 77Figure B.4 Estimates of CoVaR as a function of k with raw data at level pn = (0.05, 0.05)for different values of m. . . . . . . . . . . . . . . . . . . . . . . . . . . . 81Figure B.5 Estimates of CoVaR as a function of k with raw data at level pn = (0.02, 0.05)for different values of m. . . . . . . . . . . . . . . . . . . . . . . . . . . . 85Figure B.6 Estimates of CoVaR as a function of ks2 with realized residuals at levelpn = (0.02, 0.05). The vertical line represents the ks2 = ks1 = 230. . . . . . 89xivGlossaryABiasAFLALLAMSEAVarAXPBACBENBKCEVCLTCoVaRdfDJUSFNDOAESTEVDEVTGPGSHRAsymptotic BiasAFLAC INCALLSTATE CORPAsymptotic Mean Squared ErrorAsymptotic VarianceAMERICAN EXPRESS COBANK OF AMERICA CORPFRANKLIN RESOURCES INCBANK NEWYORK INCConditional Extreme ValueCentral Limit TheoremConditional Value-at-Riskdistribution functionDow Jones US Financials IndexDomain of AttractionExtended Skew-t DistributionExtreme Value DistributionExtreme Value TheoryGeneralized Pareto DistributionGOLDMAN SACHS GROUP INCHu¨sler-ReissxvMESMEVPOTRMSESESSTTDTROWVaRMarginal Expected ShortfallMultivariate Extreme ValuePeaks Over ThresholdRoot Mean Squared ErrorSystemic Expected ShortfallSkew-t DistributionTail DependenceT ROWE PRICE GROUP INCValue-at-RiskxviAcknowledgmentsFirst and foremost I would like to thank my supervisor Natalia Nolde. It is quite lucky to be herstudent, as she is not only knowledgable in statistics, but also patient with students. I appreciateall her contributions of time, ideas, and funding to my research. The joy and enthusiasm shehas for her research inspire me to further pursue a Ph.D. degree. Natalia, many thanks for yourkind support and guidance.I would also love to thank my family for all their love. I am grateful for my mom. Herunfailing love and unconditional support in all of my pursuits are invaluable. I would like tothank my boyfriend Xiaotian Zhan for taking care of me when I suffer. His understandingencourage me to move forward.Finally, I am grateful to my friends in UBC and SYSU. Thanks for their inspirations whenI came cross research problems. Thanks for their accompany when I felt lonely in this foreignland. I have been very fortunate to be friend with so many incredible people.xviiChapter 1IntroductionThe global financial crisis of 2007-2009 alerted us to the importance of systemic risk: the riskor the possibility of breakdowns in an entire system, as opposed to breakdowns in individualparts or components, that can be contained without harming the entire system (Kaufman andScott [2003]). Systemic risk is argued to be a particular feature of financial systems and mayhave significant adverse effects on the global economy due to financial contagion (De Bandtand Hartmann [2000]). The spillover losses to the U.S. and some foreign banks when theHerstatt Bank in Germany failed and was closed by the authorities in 1974 are typical examples.To mitigate the risk spillovers and keep the stability of the financial system, there is a clear needfor effective measurement of systemic risk.Kaufman and Scott [2003] point out that systemic risk is evidenced by the co-movements(correlation) among most or all the components in an entire system. It is followed that an ef-fective measure of systemic risk should be able to capture co-movements between a financialsystem (or market) and individual financial institutions. Benoit et al. [2017] give a comprehen-sive survey of systemic risk, reviewing measures of systemic risk and connecting them to thecurrent regulatory debate. Acharya et al. [2017] calculate Marginal Expected Shortfall (MES)and Systemic Expected Shortfall (SES) by using equity returns of a financial institution. MESis an institution’s average loss when the financial system is in its left tail, and SES extendsMES by calculating the weighted average of the institution’s MES and its leverage. Acharyaet al. [2012] propose SRISK measure, which corresponds to the expected capital shortfall ofa given financial institution, conditional on a system crisis. The firms with the largest capitalshortfall are considered the most systemically risky from the SRISK prospective.1In addition to the measures introduced above, this report considers another popular measureof systemic risk – the Conditional Value-at-Risk (CoVaR). Adrian and Brunnermeier [2011]define CoVaR as a high quantile of the conditional distribution of one institution conditionalon the event that another institution is experiencing a large loss being at a high quantile or theso-called Value-at-Risk VaR. VaR is a widely-used risk measure by financial institutions. Fora random variable X , VaR at confidence level 1 − p is defined as the (1 − p)-quantile of theunderlying distribution:VaRX(1− p) = infx{P(X ≤ x) ≥ 1− p} , p ∈ (0, 1). (1.1)By considering a specific case where the first institution is the financial system, it is able toassess the impact of a financial institution’s large losses to systemic risk.Girardi and Ergu¨n [2013] modify the definition of CoVaR by specifying the conditionalevent as a loss in excess of the VaR, rather than being exactly at the VaR level. LetX ∼ F1 andY ∼ F2 denote, respectively, losses for a financial institution and a system proxy. Followingthe modified definition, the CoVaRY |X at confidence level 1− p is defined as:P{Y ≥ CoVaRY |X(1− p)|X ≥ VaRX(1− p)}= p, p ∈ (0, 1). (1.2)This modification allows us to consider more severe distress events of institutions and to back-test CoVaR estimates with the tests used for VaR (Girardi and Ergu¨n [2013]). In addition,Mainik and Schaanning [2014] show that this change leads to the dependence consistency ofCoVaR, as it allows CoVaR to hold a monotonic relationship with the Pearson correlation co-efficient for elliptical distributions. In other words, as the losses of an institution become morecorrelated with the financial system, its systemic risk will increase.CoVaR is actually a high quantile of the conditional distribution where the conditioningevent corresponds to large losses. From the definition of CoVaR, one possible way to esti-mate CoVaR is to estimate the conditional probability P{Y ≥ y|X ≥ x} directly. However,when x is large, empirical estimates of this conditional probability might not be sensible sincethere are too few, even no, observations falling in the region of interest. Moreover, the fullyparametric statistical inference (see e.g. Girardi and Ergu¨n [2013]) focuses on the central partof the data whereas actual interest is in the tails. In situations dealing with extreme events,such as risk management in insurance, finance, and hydrology, an effective approach is to relyon asymptotic models in the spirit of extreme value theory (EVT); see e.g., Embrechts et al.[1997].2A conditional extreme value (CEV) model was proposed by Heffernan and Tawn [2004],followed by Heffernan and Resnick [2007] and Das and Resnick [2011]. Classical multivari-ate extreme value (MEV) models have limitations in the application when interest is in a tailregion rather than the joint tail region. The CEV model addresses this issue by conditioningon one component of the random vector and finding the limiting conditional distribution of theremaining components as the conditioning variable becomes large. Moreover, the CEV modelrelaxes the assumption of MEV model that the distribution of (X, Y ) is in a multivariate do-main of attraction (DOA) to that when only one of the marginal distributions is in the univariateDOA. Abdous et al. [2005] estimate P{Y < y|X ≥ x} when x is large for the class of ellip-tical distribution; Nolde and Zhang [2018] extend their methodology to a more general classof skew-elliptical distributions, which can describe the asymmetry for asset pricing and riskmanagement in finance and insurance, and put it into a semi-parametric EVT-based frameworkfor CoVaR estimation.Some limitations of the approach in Nolde and Zhang [2018] are the requirement of multi-variate regular variation, which in particular imposes the restriction of the same tail index forboth the institutional and system losses, and a somewhat restrictive parametric assumption onthe extremal dependence structure. While in some cases these assumptions may be viable, ingeneral, system losses tend to be lighter-tailed than those of individual institutions based onour empirical analysis, and exhibit a greater variety of dependence structures. Our aim is toexplore a more flexible framework to address CoVaR estimation.The idea of our methodology is to link the definition of CoVaR in (1.2) with the tail depen-dence function introduced in Nikoloulopoulos et al. [2009] and Joe et al. [2010]. Assume thatthe distribution function of the random pair (X, Y ) is in the DOA of a bivariate extreme valuedistribution and random variable Y has a positive tail index. Following de Haan and Ferreira[2006], we havelimu→0P{Y > F−12 (1− uy), X > F−11 (1− ux)}u= R(x, y), x, y > 0. (1.3)where R is the upper tail dependence function. The background information on multivariateEVT including concept of domain of attraction and tail dependence function will be providedin Chapter 3. Moreover, we assume that there exists a constant η such thatCoVaRY |X(1− p) = VaRY (1− pη), (1.4)3which allows us to rewrite equation (1.2) as follows:P {Y > VaRY (1− pη), X > VaRX(1− p)}p= p. (1.5)With the positive dependence relationship between X and Y , we have 0 < η < 1 and η = ηpdepends on the value of p. In an extreme value setting where the confidence level p is close tozero, combining (1.3) and (1.5) impliesP {Y > VaRY (1− pη), X > VaRX(1− p)}p=P {F2(Y ) > 1− pη, F1(X) > 1− p}p≈ R(1, η).Hence, we are looking for an η∗ such that R(1, η∗) = p. An estimator of CoVaR can then beobtained based on equation (1.4). Due to data sparsity, efficiency can be gained by assuming aparametric model for the tail dependence function R(x, y) = R(x, y;θ). We apply a methodof moment to estimate the unknown parameters in R(x, y;θ); see e.g., Einmahl et al. [2008]and Einmahl et al. [2012]. This moment estimator is shown to be consistent and asymptoticallynormal under weak conditions. The tail index parameter γ > 0 for the system proxy Y can beestimated using the Hill estimator (Hill [1975]), in which the sampling fraction is automaticallyselected with a two-step subsample bootstrap method of Danielsson et al. [2001].This report is organized as follows. Chapter 2 reviews relevant definitions and results thatwill be used as a basis for the proposed methodology. Chapter 3 give details of CoVaR es-timation, and illustrate performance of the proposed methods via several simulation studies.Chapter 4 gives an application of the developed estimator to financial data. Finally, Chapter5 presents some concluding remarks and outlines directions for future research. Proofs andadditional figures are delegated to the Appendixes.4Chapter 2BackgroundThe aim of this chapter is to review relevant definitions and results from the literature that willbe used in the sequel as a basis for the proposed methodology for CoVaR estimation, includingextreme value theory, Hill estimation for the tail index and extreme quantile estimation.2.1 Multivariate Extreme Value TheoryMultivariate extreme value theory (EVT) provides the main probabilistic framework that wewill adopt. Intuitively, EVT deals with extreme events which occur with very small proba-bility. The central result of EVT is the Fisher-Tippett theorem (see Theorem 2.1.1), which isdeveloped in parallel with the central limit theorem (CLT). Suppose we have a sequence of in-dependent and identically distributed (i.i.d.) non-degenerated random variables X1, X2, ...Xn.The general CLT concerns the limit laws of sample sums X1 + X2 + ... + Xn when properlynormalized and centered, whereas the Fisher-Tippett theorem investigates the limit laws ofsample maxima max{X1, X2, ..., Xn} or min{X1, X2, ..., Xn} when properly normalized andcentered (for more details, see Embrechts et al. [1997]).The EVT has been widely used in many fields, especially in financial risk management,which is closely related to tail probabilities and quantiles. The ability to manage extremefinancial risks (e.g., currency crises, stock market crashes and large bond defaults) effectivelyis in fact the ability to assess the extreme probabilities and quantiles accurately (Diebold et al.5[2000]). For example, in the credit or operational risk management, the goal is to determinethe risk capital we require as a cushion against irregular losses. In the problem of portfolioselection, a safe criterion is to select the assets which minimize the probability of a returnbelow a prespecified threshold return level. Traditional statistical methods that based on theentire dataset produce a good fit in central region but is biased to the assessment of tail regions,where fewer or even no observations fall. In this sense, it seems that EVT can help us to buildmore appropriate statistical models describing extreme events in financial risk management.2.1.1 Univariate EVTWe begin by recalling the key results from univariate EVT. For more details, please refer toResnick [1987] or de Haan and Ferreira [2006].Definition 1. Suppose X1, ..., Xn are i.i.d. random variables with distribution function (df) F .Let Mn := max{X1, ..., Xn} denotes the sample maximum. The df F is said to belong to the(maximum) domain of attraction of df G (written F ∈ D(G)) if(i) G is non-degenerate, and(ii) There exist an > 0 and bn ∈ R such thatP{Mn − bnan≤ x}= F n(anx+ bn)→G(x), n→∞, for all x ∈ C(G),where C(G) denotes the set of all continuity points of G.In the CLT, with finite variance, the non-degenerate limit distribution is found to be thenormal distribution. In the EVT, Fisher-Tippett theorem help us to identify the class of limit dfG.Theorem 2.1.1 (Fisher and Tippett [1928], Gnedenko [1943]). Suppose F ∈ D(G). Then G isone of the following types1:1. Type I, Gumbel: Λ(x) = exp{−e−x}, x ∈ R;2. Type II, Fre´chet: Φα(x) ={0, x < 0exp{−x−α}, x ≥ 0, α > 0;1Two dfs U(x) and V (x) are of the same type if for some A > 0, B ∈ R, we have V (x) = U(Ax + B) forall x.63. Type III, Weibull: Ψα(x) ={exp{−(−x)α}, x < 01, x ≥ 0, α > 0.These three types of distributions are called the class of Extreme Value Distributions (EVD).For statistical purposes, Von Mises [1936] and Jenkinson [1955] generalize and unify theabove three types of extreme value distributions into one family, named generalized extremevalue distribution (GEV (µ, σ, ξ)), whose df is given byGξ(x− µσ)=exp{−e−(x−µ)/σ}, ξ = 0,exp{−(1 + ξx− µσ)−1/ξ}, ξ 6= 0 1 + ξ x− µσ> 0.where µ ∈ R is the location parameter, σ > 0 is the scale parameter and ξ ∈ R is termed as theshape parameter or tail index. It is clear that when ξ > 0, Gξ is of the same type as the Fre´chetdf with ξ = 1/α; when ξ < 0, Gξ is of the same type as the Weibull df with ξ = −1/α. AndG0 is of the same type as the Gumbel df.Definition 2. A non-degenerate df F is max-stable if for X1, ..., Xn i.i.d. with df F , there existan > 0 and bn ∈ R such that Mn d= anX1 + bn; i.e., for each n, Mn and X1 are of the sametype.Clearly, the Gumbel df is max-stable with an = 1 and bn = − log n; the Fre´chet df is max-stable with an = n1/α(α > 0) and bn = 0; the Weibull df is max-stable with an = n1/α(α < 0)and bn = 0.In the application of risk management, the data often show a pattern of heavy tails, suchas record-breaking insurance losses and financial log-returns. Heavy tail is a characteristic ofphenomena where the probability of a huge value is relatively big. An effective tool for dealingwith heavy-tail phenomena is the theory of regularly varying functions, so here we give somebrief introductions about regular variation (for more details, see Resnick [2007]).Definition 3. A positive measurable function h on (0,∞) is regularly varying at infinity withindex ρ ∈ R (written h ∈ RVρ) if for all x > 0,limt→∞h(tx)h(t)= xρ,where ρ is the exponent of variation.7If ρ = 0, we call the function slowly varying. For example, function L(x) = log x is aslowly varying function, as limx→∞ log(tx)/ log x = 1. Moreover, if function h ∈ RVρ, thenh(x) = L(x) · xρ, where L is a slowly varying function.With the definition above, we can see that the distributions we concerned in application offinancial risk management are actually the distributions whose tails are regularly varying.Definition 4. A random variable X with df F is said to be regularly varying with index γ > 0(written 1− F ∈ RV−γ) if for every x > 0,limt→∞P(X > tx)P(X > t)= x−γ.Intuitively, a random variable is regularly varying when its associated distribution has aheavy tail which decays according to a power law with exponent−γ. In probability theory, theparameter γ > 0 is often called the tail index.Example 2.1.1. The Student’s t distribution with degree of freedom ν has a density function:f(x) =Γ(ν+12)√νpiΓ(ν/2)(1 +x2ν)− ν+12, x ∈ R.It is easy to verify that f ∈ RV−ν−1. For every x ∈ R, we havelimt→∞1− FT (tx; ν)1− FT (t; ν) = limt→∞xf(tx)f(t)= x−ν ,which means that the tail of Student’s t distribution with degree of freedom ν is regularlyvarying with tail index ν.Theorem 2.1.2 (Gnedenko [1943]). F ∈ D(Φα) if and only if 1 − F ∈ RV−α. In this case,F n(anx)→Φα(x) with an = (1/(1− F ))−1 (n).Theorem 2.1.2 identifies that Frchet domain of attraction characterizes heavy-tailed dis-tributions with a regularly-varying upper tail. To be specific, a random variable that has aheavy-tailed df F should be regularly varying and the df F is in the Fre´chet domain of at-traction. This connection will help us to construct our semi-parametric framework of CoVaRestimation. Here, we give four examples that are in the Fre´chet domain of attraction in Table2.1[Embrechts et al. [1997]].8Table 2.1: Domain of attraction of the Fre´chet distributionCauchy distributionf(x) = (pi(1 + x2))−1, x ∈ R.an = n/pi.Pareto distributionf(x) =βkβxβ+1, x ≥ k; k, β > 0.an = (kn)1/β.Burr distributionf(x) =ckxc−1(1 + xc)k+1, , x > 0; c, k > 0.an = (n1/k − 1)1/c.Loggamma distributionf(x) =αβΓ(β)(lnx)β−1x−α−1, x > 1; α, β > 0.an =((Γ(β))−1(lnn)β−1n)1/α.2.1.2 Hill estimator for the tail indexIn this section, we introduce the widely used Hill estimator (Hill [1975]) of the tail index in thequantile view (for detail, see Beirlant et al. [2006]). Let X1, X2, ..., Xn be independent randomvariables with common df F .Consider the mean excess function defined as:e(t) = E(X − t|X > t).The mean excess function can be estimated empirically by:eˆn(t) =∑ni=1Xi1(t,∞)(Xi)∑ni=1 1(t,∞)(Xi)− t,where 1(t,∞)(Xi) equals 1 if Xi > t, and 0 otherwise. Let Xn,1 ≤ Xn,2 ≤ ... ≤ Xn,n be theorder statistics and take t = Xn,n−k, k = 1, .., n−1. The estimator of the mean excess functionat Xn,n−k can be rewritten asek,n = eˆn(Xn,n−k) =1kk∑i=1Xn,n−i+1 −Xn,n−k. (2.1)In addition to being the estimator of e(t) at some specific values of t, ek,n can also be interpretedas an estimator of the slope of the exponential Q-Q plot to the right of a reference point with9coordinates (− log(k/n), Xn−k,n).Suppose the df F has a regularly varying tail, that is1− F (x) = x−1/γL1(x), γ > 0, (2.2)where L1 is a slowly varying function and 1/γ is the tail index. In this case, the distribution isidentified as Pareto-type distribution, which includes Burr distribution, Fre´chet distribution andlog-gamma distribution. Define Q(p) as the quantile function: Q(p) := inf{x : F (x) ≥ p}.An equivalent formula of (2.2) is:Q(1− 1/x) = xγL2(x), γ > 0, (2.3)where L2(x) is also a slowly varying function and is linked with L1 via de Bruyn conjugation2(see Proposition 2.5 in Beirlant et al. [2006]). Since for every slowly varying function L2(x),we have logL2(1/p)/ log(1/p)→0 as p→0 (see Proposition 2.4 in Beirlant et al. [2006]), thenthe Pareto-type distribution would satisfylimp→0logQ(1− p)− log p = γ.It means that the Pareto Q-Q plot, which is the exponential Q-Q plot based on the log-transformeddata, is ultimately linear with slope γ near the largest observations and we could use the meanexcess values introduced in equation (2.1) of the log-transformed data to estimate the slope.Then we could obtain the well-known Hill estimator as:γn(k) =1kk∑i=1logXn,n−i+1 − logXn,n−k, (2.4)where k = kn ∈ {1, ..., n} is an intermediate sequence; that is, kn→∞ and kn/n→0 as n→∞.Assume the df F satisfies the second-order condition (de Haan and Resnick [1996]): thereexists a function A∗, not changing sign near infinity, such that for x > 0limt→∞((1− F (tx)1− F (t) − x−1/γ)/A∗(t))= x−1/γxβ/γ − 1β/γ,2If l(x) is slowly varying function, then there exists an slowly varying function l∗(x), the de Bruyn conjugateof l, such that l(x)l∗ (xl(x))→1, as x→∞. The de Bruyn conjugate is asymptotically unique in the sense that ifalso is slowly varying and l(x)l˜ (xl(x))→1, then l∗ ∼ l˜. Furthermore (l∗)∗ ∼ l.10where β ≤ 0 is the second-order parameter. A reformulated version of this condition with theinverse function U of 1/(1 − F ) is: there exists a function A, not changing sign near infinity,such thatlimt→∞U(tx)/U(t)− xγA(t)= xγxβ − 1β, (2.5)where function |A| is regularly varying at infinity with index β.Using notation An,k = A(n+1k+1 ), the asymptotic bias of the Hill estimator can be expressedas:ABias(γn(k)) ∼ An,k 1kk∑i=1( ik + 1)−β∼ An,k1− β , k, n→∞, k/n→0.The asymptotic variance of the Hill estimator isAVar(γn(k)) ∼ γ2k, k, n→∞, k/n→0.Notice that the bias will be small only if An,k is small, which in turn is k to be small, andthe variance will be small if k is large. The asymptotic normality of the Hill estimator can beexpected when k, n→∞, k/n→0 and if√kAn,k→0,√k(γn(k)/γ − 1) d→ N(0, 1).An important step in Hill estimation is to choose the value of k. An ideal method is toselect k by minimizing the asymptotic mean squared error of γn(k), which is defined ask0(n) := arg minkAMSE(n, k) = arg minkAsyE(γn(k)− γ)2. (2.6)The main problem is that there is an unknown parameter γ in (2.6). Due to this considera-tion, Danielsson et al. [2001] propose to select k by minimizing AsyE (Mn(k)− 2(γn(k))2)2,where Mn(k) = 1k∑ki=1(logXn,n−i+1 − logXn,n−k)2. They show that under some conditions(see Theorem 1 and 2 in Danielsson et al. [2001]), the k-value that minimizes AMSE(n, k)and the k-value that minimizes Asy E (Mn(k)− 2(γn(k))2)2 are of the same general order(with respect to n). Moreover, in order to yield an AMSE estimator which is asymptotic toAMSE(n, k), they use a two-step subsample bootstrap algorithm, which is shown in Algorithm1 below.11Suppose drawing A∗ni = {X∗1 , ..., X∗ni}(ni < n, i = 1, 2) from An = {X1, ..., Xn} withreplacement and X∗ni,1 ≤ ... ≤ X∗ni,ni denote the order statistics of A∗ni . Defineγ∗ni(ki) =1kik1∑j=1logX∗ni,ni−j+1 − logX∗ni,ni−ki ;M∗ni(ki) =1kiki∑j=1(logX∗ni,ni−j+1 − logX∗ni,ni−ki)2;L(ni, ki) = E((M∗ni(ki)− 2(γ∗ni(ki))2)2 |An) .Suppose k∗i,0(ni) minimizes L(ni, ki). Then the optimal choice of k in Danielsson et al.[2001] is defined askˆ0(n) =(k∗1,0(n1))2k∗2,0(n2)((log k∗1,0(n1))2(2 log n1 − log k∗1,0(n1))2)logn1−log k∗1,0(n1)/ logn1. (2.7)Algorithm 1 Two-step Subsample Bootstrap Method1: Input step size h and the number of bootstrap samples B; set N = d(n−√n)/he;2: for each integer j ∈ [1, N ] do3: set n1,j =√n+ (j − 1)h, and draw B bootstrap samples of size n1,j from An;4: calculate L(n1,j, k1) at each integer k1 ∈ [1, n1,j], and find the k∗1,0(n1,j) that minimizesL(n1,j, k1);5: set n2,j = (n1,j)2/n, and draw B bootstrap samples of size n2,j from An;6: calculate L(n2,j, k2) at each integer k2 ∈ [1, n2,j], and find the k∗2,0(n2,j) that minimizesL(n2,j, k2);7: Calculate R(n1,j) = L(n1,j, k∗1,0)2/L(n2,j, k∗2,0).8: end for9: set n1 = arg minn1,j R(n1,j), and repeat step 4, 5, 6 to get k∗1,0(n1) and k∗2,0(n2);10: Calculate kˆ0(n) with equation (2.7).From Danielsson et al. [2001], the tail estimator γn(kˆ0) based on kˆ0(n) above will have thesame asymptotic efficiency as γn(k0) based on k0(n) defined in (2.6). All the details are shownin Theorems 4, 6 and Corollaries 5, 7 in Danielsson et al. [2001],122.1.3 Extreme quantile estimationRecall equation (1.1) in Section 1, the VaR at a confidence level 1− p is defined as the (1− p)-quantile, so estimating VaR is actually estimating a quantile. Following Beirlant et al. [2006],by assuming that the ultimate linearity of the Pareto quantile plot persists from the largest kobservations till infinity, we can summarize the quantile plot(− log i−1n, logXn,n−i+1), i =1, ..., k + 1 with line:y = logXn,n−k + γˆn(k)(x+ logkn),where γˆn(k) is the Hill estimator of γ (see equation (2.4)).To get an estimator of Q(1 − p), take x = − log p, and the estimator (first proposed byWeissman [1978]) is given byQˆk,1−p = exp{logXn,n−k + γˆn(k)(− log p+ log kn)}= Xn,n−k( knp)γˆn(k). (2.8)Denote the asymptotic expectation operator by E∞. When p = pn→0 and npn→c > 0Beirlant et al. [2006] conclude that,E∞(logQˆk,1−pQ(1− p))∼ An,k1− β logknp+ An,k1− (npk)−ββandAVar(log Qˆk,1−p) ∼ γ2k(1 + log2knp), k, n→∞, k/n→0.Furthermore, when k, n→∞ and k/n→0 such that√kE∞(logQˆk,1−pQ(1−p))→0,√k(1 + log2knp)−1/2( Qˆk,1−pQ(1− p) − 1)d→ N(0, γ2).132.1.4 Multivariate extreme value distributionsThe problem of CoVaR estimation is inherently bivariate, so it is necessary for us to extendunivariate EVT in Section 2.1.1 to the multivariate setting. For the univariate case, the orderingprinciple is clear and unambiguous, but ordering is not unique in multivariate setting. Barnet[1976] discusses several different categories of order relations for multivariate data, and themost popular one is called marginal ordering: for d-dimensional vectors x = (x1, ..., xd) andy = (y1, ..., yd), the relation x ≤ y is defined as xj ≤ yj for all j = 1, ..., d. With thisordering principle, the component-wise maximum of random vectors x and y is defined asx ∨ y := (x1 ∨ y1, ..., xd ∨ yd).Definition 5. Consider a sample of i.i.d d-dimensional observations Yi ∈ Rd(i = 1, ..., n)with df F and margins Fj, j = 1, .., d. The sample maximum Mn = (Mn,1, ...Mn,d) is definedas a vector of component-wise maxima (i.e. Mn =∨ni=1 Yi). The multivariate df F is said tobelong to the (maximum) domain of attraction of a multivariate df G (written as F ∈ Dd(G)),if there exist Rd-sequences an > 0 and bn such that for all continuity points y of G,limn→∞P(Mn − bnan≤ y)= limn→∞F n(any + bn) = G(y), (2.9)where G is the limit distribution with non-degenerate margins Gj , any = (an,1y1, ..., an,dyd)and (Mn − bn)/an = ((Mn,1 − bn,1)/an,1, ..., (Mn,d − bn,d)/an,d). Then, G is called multi-variate extreme value df.By setting yi =∞, ∀i 6= j in (2.9), we have for j = 1, ..., d,limn→∞F nj (an,jyj + bn,j) = Gj(yj).This indicates that each margins Gj of G is a univariate extreme value df and Fj ∈ D(Gj).Example 2.1.2. A d-dimensional random vector Y follows a multivariate logistic distributionwith dependence parameter θ ∈ (0, 1] if its joint df isG(y) = exp{−( d∑j=1z1/θj)θ}, zj ≥ 0,where zj = {1 + ξj(yi − µi)/σi}−1/ξi (see Gumbel [1960]).Example 2.1.3. Let B be the nonempty subsets of {1, 2, ..., d}. Let B1 = {b ∈ B : |b| = 1},where |b| is the number of elements in the set b, and let B(j) = {b ∈ B : j ∈ b}. The14d-dimensional multivariate asymmetric logistic df (see Tawn [1990]) is given byG(y) = exp{−∑b∈B[∑j∈b(ψj,bzj)1/θb]θb},where zj = {1 + ξj(yi − µi)/σi}−1/ξi , the dependence parameters θb ∈ (0, 1] for all b ∈B \ B1, and the asymmetry parameters ψj,b ∈ [0, 1] for all b ∈ B and j ∈ b. The constraints∑j∈B(j) ψj,b = 1 ensure that the marginal distributions are generalized extreme value. Themodel contains 2d − d− 1 dependence parameters and d(2d−1 − 1) asymmetry parameters.2.2 Tail Dependence FunctionIn addition to characterizing the marginal distributions of multivariate extremes, another im-portant aspect of studying multivariate extremes is their dependence structure. There existsa great variety of equivalent descriptions of extreme value dependence structures (for moredetails, refer to Beirlant et al. [2006]), and the tail dependence (TD) function is one of the pop-ular ways. As the CoVaR estimation problem involves bivariate random vectors, we restrictour attention to the TD function in the bivariate setting.2.2.1 DefinitionConsider a random vector (X, Y ) with df F and margins F1, F2. Let U = (U1, U2) =(F1(X), F2(Y )). Obviously, the random vector U has standard uniform margins.Definition 6. The df F is said to have the upper TD function R(x, y) if for all x, y > 0, thefollowing limit exists:limu→0P{U1 ≥ 1− ux, U2 ≥ 1− uy}u= limu→0P{1− F1(X) ≤ ux, 1− F2(Y ) ≤ uy}u=R(x, y).(2.10)In addition to the upper TD function we use in this study, another popular TD function isthe stable TD function introduced by Huang [1992].15Definition 7. The df F is said to have a stable TD function l(x, y) if for all x, y > 0, thefollowing limits exists:limu→0P{U1 ≥ 1− ux or U2 ≥ 1− uy}u= limu→0P{1− F1(X) ≤ ux or 1− F2(Y ) ≤ uy}u=l(x, y).(2.11)It is easy to find the stable TD function for EVDs. Suppose the random vector (X, Y ) hasa joint df G. Then the stable TD function can be expressed as:l(x, y) = − logG (G−11 (e−x), G−12 (e−y)) , (2.12)where G1 and G2 are the margins of G. Therefore, we can obtain the upper TD function forthe bivariate extreme value distribution function with formula:R(x, y) = x+ y − l(x, y). (2.13)The upper TD function is differentiable almost surely and homogeneous of degree one3 (seee.g., Nikoloulopoulos et al. [2009]). With Euler’s theorem4 (Wilson [1912]) on homogeneousfunctions, the upper TD functions can be written as:R(x, y) = x∂R(x, y)∂x+ y∂R(x, y)∂y,where partial derivatives ∂R/∂x, ∂R/∂y are homogeneous of order 0 and bounded. Withthe sufficient condition of continuous second-order partial derivatives, the order of limits anddifferentiation can be exchanged. Then we have (see (2.3), Nikoloulopoulos et al. [2009])∂R(x, y)∂x= limu→0P{U2 > 1− uy|U1 = 1− ux},and∂R(x, y)∂y= limu→0P{U1 > 1− ux|U2 = 1− uy}.3If f : V→W is a function between two vector spaces on a field F , and k is an integer, then f is said to behomogeneous of degree k if f(αv) = αkf(v) for all α ∈ F and v ∈ V .4Let f : Rn+→R be continuous and also differentiable on Rn+. Then f is homogeneous of degree k if and onlyif for all x ∈ Rn+, kf(x) =∑ni=1 xi∂f∂xi.16Therefore, the upper TD function can be rewritten asR(x, y) = x limu→0P{U2 > 1− uy|U1 = 1− ux}+ y limu→0P{U1 > 1− ux|U2 = 1− uy}= x limu→0P{Y > Q1(1− uy)|X = Q2(1− ux)}+ y limu→0P{X > Q1(1− ux)|Y = Q2(1− uy)},(2.14)where Q1 and Q2 are the lower quantile functions of the marginal distributions for X and Y ,respectively.2.2.2 ExamplesIn this section, we give examples of the TD function for five popular distributions. These fiveexample models will be used later in the simulation and empirical studies.Example 2.2.1. The bivariate logistic df with standard Fre´chet margins is given byG(x, y; θ) = exp{−(x−1/θ + (y−1/θ)θ} , (2.15)where x, y > 0 and θ ∈ (0, 1] is the dependence parameter. The dependence increases as θdecreases and θ = 1 stands for independence while θ = 0 represents comonotonicity (per-fect dependence). The corresponding margins are G1(x) = G2(x) = exp(−1/x), so thatG−11 (e−x) = G−12 (e−x) = 1/x. Using equation (2.12), the stable TD function is given by(Gumbel [1960])l(x, y; θ) = − logG(1/x, 1/y; θ) = (x1/θ + y1/θ)θ.And then the upper TD function of the bivariate logistic distribution has the form:R(x, y; θ) = x+ y − (x1/θ + y1/θ)θ. (2.16)Example 2.2.2. The bivariate Hu¨sler-Reiss df (Hu¨sler and Reiss [1989]) with standard Fre´chetmargins isG(x, y; θ) = exp{−x−1Φ(θ−1 +θ2log(y/x))− y−1Φ(θ−1 +θ2log(x/y))}, (2.17)where x, y > 0, θ > 0 and Φ(·) is the standard normal df. As θ increases, the dependence will17increase, that is when θ = 0, X and Y are independent and when θ→∞, X and Y are perfectlydependent. Following the same procedure as above, the corresponding stable TD function isexpressed asl(x, y; θ) = xΦ(θ−1 +θ2log(x/y))+ yΦ(θ−1 +θ2log(y/x)),and the tail dependence function is given byR(x, y; θ) = x+ y − xΦ(θ−1 +θ2log(x/y))− yΦ(θ−1 +θ2log(y/x)). (2.18)Example 2.2.3. The bilogistic df (Smith [1990]) with standard Fre´chet margins is written asG(x, y;α, β) = exp{−x−1q1−α − y−1(1− q)1−β} , x, y > 0, (2.19)where q is the root of the equation (1−α)x−1(1− q)β − (1− β)y−1qα = 0, and 0 < α, β < 1.The dependence becomes stronger as each of α, β decreases. As α = β = 0, X and Y areindependent and α = β→1 represents comonotonicity. The stable TD function of the bilogisticmodel isl(x, y;α, β) =∫ 10max{(1− α)t−αx, (1− β)(1− t)−βy} dt,and the upper TD function is expressed asR(x, y;α, β) = x+ y −∫ 10max{(1− α)t−αx, (1− β)(1− t)−βy} dt. (2.20)Example 2.2.4. The bivariate asymmetric logistic distribution with standard Fre´chet marginshas df of the formG(x, y;ψ1, ψ2, θ) = exp{−(1− ψ1)/x− (1− ψ2)/y −((ψ1/x)1/θ + (ψ2/y)1/θ)θ}, (2.21)where x, y > 0, θ is the dependence parameter and ψ1, ψ2 ∈ [0, 1] are asymmetry parameters.We should note that when the asymmetry parameters are zero, the distribution becomes thebivariate logistic distribution (see Example 2.2.1). The stable TD function of the bivariateasysmmetric logistic distribution is introduced as an extension of that of the logistic distribution(Tawn [1988]) and has the form:l(x, y;ψ1, ψ2, θ) = (1− ψ1)x+ (1− ψ2)y +((xψ1)1/θ + (yψ2)1/θ)θ.18Therefore, we could get the upper TD function asR(x, y;ψ1, ψ2, θ) = ψ1x+ ψ2y −((xψ1)1/θ + (yψ2)1/θ)θ. (2.22)Example 2.2.5. Suppose W = (X, Y ) has a bivariate t distribution with location parameterµ = (0, 0)T , scale parameter Ω =(1 ρρ 1), ν degree of freedom and equal margins FX(·) =FY (·) = FT(·; ν), where FT(·; ν) is the df of a univariate Stundet’s t random variable with νdegrees of freedom. Then the joint density function of W is written asfT(w; ρ, ν) =Γ((ν + 2)/2)√1− ρ2νpiΓ(ν/2)(1 +1νwTΩ−1w)−(ν+2)/2,w ∈ R2. (2.23)The dependence strength increases as ν decrease or ρ increases. And X and Y are independentas ν→∞ and ρ = 0. They are perfectly dependent as ν→0 and ρ = 1.Proposition 2.2.1. (Demarta and McNeil [2005]) Suppose a random vector (X, Y )T has thejoint density function in (2.23). Then its upper TD function is given by (see A.2 for proof.)R(x, y; ρ, ν) = xFT(√ ν + 11− ρ2(ρ− (y/x)−1/ν); ν + 1)+ yFT(√ ν + 11− ρ2(ρ− (x/y)−1/ν); ν + 1). (2.24)Example 2.2.6. Suppose W = (X, Y )T has a bivariate skew-t distribution with location pa-rameter µ = (0, 0)T , shape parameter α = (α1, α2)T , scale parameter Ω =(1 ρρ 1), andν degree of freedom, denoted as ST2(0,Ω,α, ν). Then its joint density is given by (Azzaliniand Capitanio [2003])fST(w) = 2fT (w; ρ, ν)FT(αTw√ν + 2ν + wTΩ−1w; ν + 2). (2.25)When α1 = α2 = 0, the bivariate skew-t distribution reduces to a bivariate t distribution.The dependence strength is controlled by the parameters ν and ρ, which is the same as thatin the bivariate t distribution. Before we introduce the TD function of the bivariate skew-tdistribution, we need to give a definition of the extended skew-t distribution, which will beused to specify the upper TD function.Definition 8. A random variable Y ∈ R follows a univariate extended skew-t distributionwith location parameter µ, scale parameter ω, shape parameter α, extended parameter τ and19degree of freedom ν > 0, denoted by Y ∼ EST1(µ, ω, α, τ, ν), if its density function isfEST(y) =fT(y;µ, ω, ν)FT(τ/√1 + α2; ν)FT(√ν + 1ν + z2(αz + τ); ν + 1),where z = (y−µ)/ω. When τ = 0, the distribution becomes the univariate skew-t distribution.Proposition 2.2.2. Suppose a random vector (X, Y )T has joint density function in (2.25). Thenits upper TD function is given by (see A.3 for proof)R(x, y; ρ, α1, α2, ν) = x · F¯EST( √ν + 1√1− ρ2((x¯/y¯)1/ν − ρ); 0, 1,√1− ρ2α2, τ1, ν + 1)+ y · F¯EST( √ν + 1√1− ρ2((y¯/x¯)1/ν − ρ); 0, 1,√1− ρ2α1, τ2, ν + 1),(2.26)where x¯ = xFT (α¯2√ν + 1; ν), y¯ = xFT (α¯1√ν + 1; ν), α¯1 =α1 + ρα2√1 + (1− ρ2)α2, α¯2 =α2 + ρα1√1 + (1− ρ2)α1, τ1 =√ν + 1(α1 + α2ρ), τ2 =√ν + 1(α2 + α1ρ) and F¯EST is the sur-vival function of the extended skew-t distribution in Definition 8.2.2.3 Estimation of the tail dependence functionThere exists a number of methods that can be used to estimate the TD function, however, dueto data sparsity, efficiency can be gained by assuming a parametric model for the TD function.Coles and Tawn [1991] and Joe et al. [1992] apply maximum likelihood estimation to estimate,while Ledford and Tawn [1996] and Smith [1994] use a censored likelihood approach to im-plement the estimation. Einmahl et al. [2008] point out that these likelihood-based estimationmethods require the smoothness (or even existence) of the partial derivatives of the TD func-tion. Therefore, they propose an estimator based on the method-of-moments for dimensiontwo, which requires smaller set of conditions. In the simulation studies and subsequent dataanalysis, we adopt the method-of-moments (M-estimator) proposed in Einmahl et al. [2008]and later extended in Einmahl et al. [2012]. This extended M-estimator can be used in arbitrarydimension d and its consistency and asymptotic normality hold under weak conditions.Let (X1, Y1), ..., (Xn, Yn) be independent random vectors in R2 with a common continuousdf F and margins F1 and F2. Let RXi and RYi denote the rank of Xi among X1, ..., Xn and20the rank of Yi among Y1, .., Yn, respectively, where i ∈ {1, ..., n}. Then for 1 ≤ m ≤ n, anonparametric estimator for the bivariate upper TD function R is defined as:Rˆn(x, y) :=1mn∑i=11{RXi ≥ n+12−mx,RYi ≥ n+12−my}, (2.27)where m = mn ∈ {1, ..., n} is an intermediate sequence.We suppose that the function R belongs to some parametric family {R(·, ·;θ) : θ ∈ Θ},where Θ ⊂ Rp(p ≥ 1) is the parameter space. Let g = (g1, ..., gp)T : [0, 1]2→Rp be a vectorof integrable functions. Define function ϕ : Θ→Rp as:ϕ(θ) :=∫ ∫[0,1]2g(x, y)R(x, y;θ)dxdy, (2.28)where ϕ is a homeomorphism5 between Θ and its image. For example, in the logistic andHu¨sler-Reiss distribution, ϕ(θ) is a 1-1 mapping of the dependence parameter. In the asym-metric distribution, one component of ϕ(θ) is the mapping of the dependence parameter andthe other two components are mappings of the asymmetry parameters.Let θ0 denote the true value of parameter θ. The M-estimator θˆn of θ0 is defined as aminimizer of the functionSm,n(θ) =∣∣∣∣∣∣∣∣ϕ(θ)− ∫ ∫[0,1]2g(x, y)Rˆn(x, y)dxdy∣∣∣∣∣∣∣∣2 , (2.29)where || · || is the Euclidean norm. However, we should note that the choice of function ϕ isnot unique, which means that the M-estimation is not unique. How to choose a proper ϕ willbe discussed in Section 3.2.1.Theorem 2.2.3. (Existence, uniqueness and consistency of θˆn) Define Θˆn as the set of mini-mizers of Sm,n in (2.29). Let g : [0, 1]2→Rp be integrable.(i) If ϕ is a homeomorphism from Θ→ϕ(Θ) and if there exists 0 > 0 such that the set{θ ∈ Θ : ||θ − θ0|| ≤ 0} is closed, then for every such that 0 > > 0, as n→∞,P(Θˆn 6= ∅ and Θˆn ⊂ {θ ∈ Θ : ||θ − θ0|| ≤ })→1.5A function f : X→Y between two topological space (X, τX) and (Y, τY ) is homeomorphism if: (1) fis bijection (one-to-one and onto); (2)f is continuous; (3) the inverse function f−1 is continuous (f is openmapping).21(ii) If in addition to the assumptions of (i), θ0 is in the interior of the parameter space,ϕ is twice continuously differentiable and its derivative matrix Dϕ(θ0) is of full rank,then, with probability tending to one, Sm,n in equation (2.29) has a unique minimizer θˆn.Hence,θˆnP→ θ0 as n→∞.Denote W as a mean-zero Wiener process on [0,∞]2 \ {(∞,∞)} with covariance functionE(W (x1, y1)W (x2, y2))= R(x1 ∧ x2, y1 ∧ y2)and for x, y ∈ [0,∞), let W1(x) := W (x,∞), W2(x) := W (∞, y). Further, for (x, y) ∈[0,∞)2, let R1(x, y) and R2(x, y) be the right-hand partial derivatives of R at the point (x, y)with respect to the first and second coordinate, respectively. WriteB(x, y) = W (x, y)−R1(x, y)W1(x)−R2(x, y)W2(y),B˜ =∫ ∫[0,1]2g(x, y)B(x, y)dxdy.Theorem 2.2.4. (Asymptotic normality of θˆn) In addition to the assumptions of Theorem2.2.3(ii), if the following two conditions also hold:(i) t−1P{1 − FX(X) ≤ ux, 1 − FY (Y ) ≤ uy} − R(x, y) = O(tα), uniformly on the set{(x, y) : x+ y = 1, x ≥ 0, y ≥ 0} as u→0, for some α > 0;(ii) m = mn→∞ and m = o(n2α/(1+2α)) as n→∞,then as n→∞, √m(θˆn − θ0) d→ Dϕ(θ0)−1B˜. (2.30)Theorems 2.2.3 and 2.2.4 establish the existence and asymptotic normality of M-estimator.For proofs of these two theorems, please refer to Einmahl et al. [2012].22Chapter 3MethodologyIn this chapter, we focus on the so-called conditional Value-at-Risk or CoVaR as a way tocapture systemic risk contributions, and explore how the proposed EVT-based semi-parametricmethodology can be utilized in the stationary setting to produce estimates of CoVaR. We alsofurther examine the performance with several simulation studies.3.1 CoVaR Estimation in a Stationary SettingIn this study, we adopt a modified definition of CoVaR proposed by Girardi and Ergu¨n [2013],where the financial distress is specified by a loss in excess of the VaR, rather than being at theVaR level. Mainik and Schaanning [2014] show that, under this definition, CoVaR is depen-dence consistent in the sense that an increase in the strength of dependence does lead to anincrease in systemic risk as measured by CoVaR.Let X and and Y denote, respectively, losses for a financial institution and a system proxy.The VaR at a confidence level p ∈ (0, 1) forX is defined as the 1−p-quantile of the underlyingdistribution (assuming the quantile is single-valued):P{X ≥ VaRX(1− p)} = p,23and CoVaRY |X(1− p) is defined by the (1− p)-quantile of the conditional distribution:P{Y ≥ CoVaRY |X(1− p)|X ≥ VaRX(1− p)} = p. (3.1)The idea of our methodology is to link the definition of CoVaR to the tail dependencefunction. As CoVaR is actually a high-quantile of the conditional distribution of Y , it seemssensible to make the following assumption: there exists a constant η such thatCoVaRY |X(1− p) = VaRY (1− pη). (3.2)Under this assumption, we can re-write equation (3.1) asP {Y > VaRY (1− pη), X > VaRX(1− p)}p= p. (3.3)Suppose the two-dimensional random vector (X, Y ) has joint distribution function F andcontinuous margins F1 and F2. Assume that(i) F ∈ D2(G), where G is a bivariate extreme value distribution, and(ii) F2 ∈ D(Φ1/γ) for some γ > 0.From (ii), we have (see Chapter 6 in de Haan and Ferreira [2006])limu→0P{F1(X) > 1− ux, F2(Y ) > 1− uy}u= R(x, y), x, y > 0, (3.4)where R is known as the upper TD function discussed in Section 2.2.Combining equation (3.3) and (3.4), we have for p close to zero:P {Y > VaRY (1− pη), X > VaRX(1− p)}p=P {F2(Y ) > 1− pη, F1(X) > 1− p}p≈ R(1, η).(3.5)Due to the fact that the distribution function of Y is in the domain of attraction with positiveindex 1/γ, we have 1 − F2 ∈ RV−1/γ (see Theorem 2.1.2), which means VaRY (1 − pη) ≈η−γ VaRY (1− p) (see Proposition 0.8 (v) in Resnick [1987]). Hence, if we can find an η∗ such24that R(1, η∗) = p, we obtain the following approximation for CoVaR valid for p close to zero:CoVaRY |X(1− p) = VaRY (1− pη∗) ≈ VaRY (1− p)η−γ∗ .In order to find η∗, it is necessary for us to estimate the TD function first. There are variousways to estimate the upper TD function. However, due to data sparsity, efficiency can be gainedby assuming a flexible parametric model for the TD function R(x, y) = R(x, y;θ), where θdenotes the parameter vector. Note that the functionR is the df of a measure (see Chapter 6.1.5in de Haan and Ferreira [2006]), so R(x, y;θ) is monotone at x and y. Moreover, we can seethat 0 ≤ R(x, y) ≤ x∧ y, so 0 ≤ R(1, η;θ) ≤ η when η ≤ 1. In this sense, there always existsan η∗ > p such that R(1, η∗;θ) = p. If we can find such η∗, then we should have η∗ = g(θ, p),that is, η∗ is a function of CoVaR level p and model parameter θ. We next illustrate plots ofR(1, η∗;θ) as a function of η∗ for some parametric models and at various parameter values toshow how η∗ is influenced by the parameter θ.Example 3.1.1 (Bivariate logistic distribution). The function R(1, η) of the bivariate logisticdistribution is given byR(1, η; θ) = 1 + η − (1 + η1/θ)θ, 0 < θ ≤ 1.From Figure 3.1, we can see that there is a negative monotone relationship between R(1, η)and θ, which indicates that for a fixed level p, as the strength of dependence increases (i.e., θis smaller), η∗ becomes samller. Moreover,R(1, η; 1) = 0 for all η. As θ→0, R(1, η; θ)→η forη < 1 and R(1, η; θ)→1 for η ≥ 1.Figure 3.1: Plot of R(1, η) as a function of η for the bivariate logistic distribution for differentvalues of θ.25Example 3.1.2 (Bivariate Hu¨sler-Reiss distribution). The function R(1, η) of the bivariateHu¨sler-Reiss distribution is expressed asR(1, η; θ) = 1 + η − Φ(θ−1 − θ2log η)− ηΦ(θ−1 +θ2log η), θ > 0.From Figure 3.2, we can see that at a given level p, as θ increases, that is the strength of taildependence increases, η∗ will becomes smaller. Moreover, as θ→0, R(1, η; θ)→0 for all η. Asθ→∞, R(1, η; θ)→η for η < 1 and R(1, η; θ)→1 for η > 1.Figure 3.2: Plot of R(1, η) as a function of η for the bivariate Hu¨sler-Reiss distribution fordifferent values of θ.Example 3.1.3 (Bivariate bilogistic distribution). The function R(1, η) of the bivariate bilogis-tic distribution is written asR(1, η;α, β) = 1 + η −∫ 10max{(1− α)t−α, (1− β)(1− t)−βη} dt, 0 < α, β < 1.From Figure 3.3, there is a positive monotone relationship between α and β. That is to say, fora given level p, η decreases as the dependence becomes stronger (i.e., each of α, β decreases).26Figure 3.3: Plot ofR(1, η) as a function of η for the bivariate bilogistic distribution for differentvalues of α and β.Example 3.1.4 (Bivariate asymmetric logistic distribution). The function R(1, η) of the bivari-ate asymmetric logistic distribution is given byR(1, η;ψ1, ψ2, θ) = ψ1 + ψ2η −(ψ1/θ1 + (ηψ2)1/θ)θ, 0 < θ ≤ 1 and 0 ≤ ψ1, ψ2 ≤ 1.Obviously, the effect of dependence strength (the value of θ) on the value of η∗ will be the sameas for the bivariate logistic distribution. Furthermore, in Figure 3.4, we can see that η∗ willincreases as ψ1 or ψ2 decreases. However, when we increase three parameters simultaneously,the change in the value of η∗ for a given value of p may increase or decrease.Figure 3.4: Plots of R(1, η) as a function of η for the bivariate asymmetric logistic distributionfor different values of θ, ψ1, ψ2.Example 3.1.5 (Bivariate t distribution). The function R(1, η) of the bivariate t distribution27with joint density function (2.23) is given byR(1, η; ρ, ν) = FT(√ ν + 11− ρ2 (ρ− η−1/ν); 0, 1, ν + 1)+ ηFT(√ ν + 11− ρ2 (ρ− η1/ν); 0, 1, ν + 1), 0 ≤ ρ ≤ 1 and ν > 0.In Figure 3.5, it is obvious that, η increases with stronger dependence (i.e., ν decreases or ρincreases). Moreover, for every ν, R(1, η; 1, ν) = η. As ν→∞, the bivariate t distributionbecomes the bivariate Gaussian distribution and then R(1, η; 0, ν)→0.Figure 3.5: Plot of R(1, η) as a function of η for the bivariate t distribution for different valuesof ν and ρ.Let {(X1, Y1), ..., (Xn, Yn)} be a random sample. In an extreme value setting, supposep = pn is small relative to sample size n. We estimate the parameter vector θ in R(1, η;θ)with the M-estimator in Section 2.2.3. And then, the estimator of η∗ is expressed asηˆ∗ = g(θˆ, pn), (3.6)where θˆ is the M-estimator. Combining the Hill estimator for the tail index γ in Section 2.1.2, anonparametric estimator of a high-quantile for VaRY (1− p) in Section 2.1.3 and the estimatorfor parameter η∗ in (3.6), we obtain the following estimator for CoVaR at level pn:ĈoVaRY |X(1− pn) = Yn,n−k( knpn)γˆηˆ−γˆ∗ , (3.7)where Yn,1 ≤ ... ≤ Yn,n are the order statistics. Algorithm 2 summarizes the procedure of28CoVaR estimation for the random sample {(X1, Y1), ..., (Xn, Yn)}.Algorithm 2 CoVaR Estimation in Stationary Setting1: Select a parametric model for the tail dependence function, and estimate parameter θ usingM-estimator;2: Obtain ηˆ∗ by solving equation R(1, η∗; θˆ) = pn for a given pn;3: Obtain Hill estimator of γ with equation (2.4), where k is choosing with the two-stepsubsample bootstrap method described in Algorithm 1;4: Estimate VaRY (1− pn) by utilizing γˆ with equation (2.8);5: Input all the estimators into equation (3.7) to get the estimator of CoVaRY |X(1− pn).3.2 Simulation StudiesIn this section, we report results of five simulation studies to show the performance of theproposed method. These studies are based on the following distributions: bivariate logisticdistribution, bivariate asymmetric logistic distribution, bivariate Hu¨sler-Reiss distribution, bi-variate bilogistic distribution and bivariate t distribution. The first four models all in the classof bivariate EVDs. Due to the max-stability property of EVDs, they are in the domain of at-traction. Moreover, the bivariate t distribution has regularly varying tail, hence is also in thedomain of attraction (see e.g., Example 5.21 in Resnick [1987]).3.2.1 Performance of the M-estimator of the TD FunctionThe M-estimator discussed in Section 2.2.3 depends on the choices of m in equation (2.27)and the function vector g in equation (2.28). However, what is a proper choice of m and g? Toanswer this question, we simulated samples with different sizes from several bivariate distribu-tions to explore how the values of m and g influence the M-estimator of model parameters. Toillustrate the performance of estimators, bias and root mean squared error (RMSE) are plottedfor a range of values of m. For each example, we look at 100 replications of samples and setm ∈ {80, 130, 180, 230, 280, 330} for the first four examples, while for bivariate t distribu-tion, we try m ∈ {50, 100, 150, 200, 250, 300, 350}. The sample sizes vary according to thedimension of parameter space Θ.Example 1. Bivariate logistic distribution29The bivariate logistic distribution has the upper TD function given in equation (2.16). Fol-lowing the analysis in Einmahl et al. [2012], we take (∂/∂θ)R(x, y; θ) as the optimal choice ofg. Note that the optimal function g depends on the true values of model parameters and hencewould not be available in practice. We also consider other choices of g, given by low orderpolynomials: g0(x, y) = 1, g1(x, y) = x1. We simulate random samples of size n = 2000 froma bivariate logistic model with θ = 0.6. Results are shown in Figure 3.6.Figure 3.6: The bias and RMSE of M-estimator of θ based on 100 samples of size 2000 simu-lated from the bivariate logistic model with parameter θ = 0.6.In Figure 3.6, it is observed that, for all choices of function g, the bias increases as m be-comes bigger, while RMSE is minimized around m = 180. Compared with the optimal choiceof g, the simpler options have smaller biases and RMSEs, and the simplest one g0(x, y) = 1shows the best performance with the smallest bias and RMSE. However, compared with m,the differences among the three choices of g are quite small, which means that in practice, thefunction g does not affect the estimation much and we can use the simplest form of g directly.This is consistent with findings reported in Einmahl et al. [2012]. However, the value of mappears to matter considerably here. From Figure 3.6, it seems that choosing m between 150and 250 is reasonable when the sample size is 2000.Example 2. Bivariate Hu¨sler-Reiss (HR) distributionThe bivariate HR distribution defined in equation (2.17) has the upper TD function as spec-ified in (2.18). Similarly, we simulate random samples of size n = 2000 from a bivariate HRmodel with θ = 2.5. Based on the observation above regarding the choice of g, in this case, a30simple form is assigned to function g: g(x, y) = x.Figure 3.7: The bias and RMSE of M-estimator based on 100 samples of size 2000 simulatedfrom the bivariate HR model with parameter θ = 2.5.Based on Figure 3.7, the results for the HR distribution show a similar behavior as thosefor the logistic distribution. To be specific, the bias increases with m, while the RMSE attainsa minimal point around m = 280. It means that for the HR distribution, when n = 2000, achoice of m between 250 and 300 could be considered in practice.Example 3. Bivariate bilogistic distributionThe upper TD function of the bivariate bilogistic distribution is given in equation (2.18).Following the same steps as in the previous two examples, we simulate from the bivariatebilogistic model with α = 0.4, β = 0.7 and n = 2000. In this case, two parameters need tobe estimated, so we set the function vector as g(x, y) = (1, x)T . The biases and RMSEs areplotted for α and β separately in Figure 3.8.31Figure 3.8: The bias and RMSE of M-estimator based on 100 samples of size 2000 simulatedfrom the bivariate bilogistic model with parameter α = 0.4, β = 0.7.In Figure 3.8, we can see that as m increases, the absolute value of bias also increases forboth α and β. Moreover, RMSE’s are both minimized at aroundm = 180, which is the same asin Example 1 for the bivariate logistic distribution. These results indicate that for the bivariatebilogistic model, when we have 2000 observations, a value of m between 150 and 200 wouldbe a good choice.Example 4. Bivariate asymmetric logistic distributionIn Section 2.2, equation (2.22) gives the upper TD function of the bivariate asymmetriclogistic distribution, where three parameters need to be estimated. In this simulation, we alsoonly concentrate on the choices of m, and set the function vector g as in Einmahl et al. [2012].We simulate from a bivariate asymmetric logistical model with θ = 0.6, ψ1 = 0.5 andψ2 = 0.8 and set g(x, y) =(1, x, 2(x+ y))T . As the dimension of the parameter space here islarger than in the previous examples, we increase the sample size to 2500 to keep the accuracyof estimators.32Figure 3.9: The bias and RMSE of M-estimator based on 100 replications of size 2500 simu-lated for the asymmetric logistic model with parameter θ = 0.6, ψ1 = 0.5, ψ2 = 0.8.In Figure 3.9, results show an increasing trend in the absolute value of the bias overm and aminimal point of RMSE at m = 180 for all three parameters. These observations indicate thata reasonable choice for m is around 150 – 200 when n = 2500 for the bivariate asymmetriclogistic model.Example 5. Bivariate t distributionWe give the upper TD function of the standard bivariate t distribution in equation (2.24).There are two parameters ρ and ν that need to be estimated. We simulate from a bivariatet distribution with ρ = 0.6, ν = 5 and set g(x, y) = (x, x + y)T . Due to the difficulty ofestimating the tail parameter, we increase the sample size to 3000 to make the estimators moreaccurate.Figure 3.10 shows the bias and RMSE of M-estimators of the two parameters. We can seethat although the M-estimator of ν has a good performance and displays a similar behaviorwith respect to values of m as seen in Examples 1 – 4, most of the bias and RMSE of ρ arebigger than 0.1, accounting for 25% of the true real value and indicating poor performance ofthe estimator of ρ. Moreover, while the bias of ρˆ increases as m increases, the RMSE alsobecomes bigger, indicating the dominance of bias. However, based on the performance of νˆ,m = 100 here is a reasonable choice.33Figure 3.10: The bias and RMSE of simultaneous M-estimator based on 100 samples of size3000 simulated from the bivariate t model with parameter ν = 6, ρ = 0.6.In summary, compared with choosing function g, selection of a suitable value of m is lessstraightforward. A good choice of m depends on the model, the dimension of the parameterspace and also the sample size. It is hard to make a general recommendation. Simulation stud-ies provide some guidance with regard to selection of m for a given model based on bias andRMSE of estimators in finite sample settings. For example, from the results for the bivariatelogistic model, we find that a value of m ∈ [150, 200] for sample size n = 2000 may be agood choice, that is about 8% – 10% of the sample size. In our later analysis, including simu-lation and empirical studies, we would let m equal to 9% of the sample size when estimatingparameters of the TD function under the bivariate logistic model.Figure 3.11 displays sampling densities for model parameters using values of m, whichminimized RMSE.34(a) logistic distribution: n = 2000, m = 180 (b) HR distribution: n = 2000, m = 280(c) Bilogistic distribution: n = 2000, m = 180(d) Asysmmetric logisitic distribution: n = 2500, m = 18035(e) t distribution: n = 3000, m = 100Figure 3.11: The sampling densities of estimated parameters based on 100 samples for corre-sponding parametric models.From the sampling densities of the model parameter(s), it is clear that the estimated param-eters in the first four examples are roughly distributed around the true value. For the bivariatet distribution, the estimated ρ has a bigger bias compared with other parameters, but the es-timator of ν performs well with small variability. Einmahl et al. [2008] and Einmahl et al.[2012] show that under some conditions (for details, see Theorem 2.2.4 in Section 2.2.3), theM-estimators will be asymptotically normal, which is reflected in our plots. Due to the finitesample sizes, estimators exhibit minor biases. Overall, M-estimators of parameters of the TDfunction perform well in the considered examples, and will hence used in subsequent studies.3.2.2 Performance of the CoVaR EstimatorRecall equation (3.7) for estimating CoVaR in Section 3.1. For a given pn, in order to estimateCoVaR, we need to estimate γ, η∗ and VaRY (1 − pn). Tail index γ is estimated with theHill estimator; VaRY (1 − pn) is estimated with nonparametric extreme quantile estimator byutilizing γˆ; η∗ is estimated by solving equation R(1, η∗; θˆ) = pn, where R(x, y,θ) is theupper TD function for an assumed parametric model and θˆ is the M-estimator of θ. All theseestimators are then used to produce an estimator of CoVaR. As accuracy of CoVaR estimationis influenced by accuracy of estimators for the three components mentioned above, our initialanalysis aims to separate estimation errors attributed to each of the three components.36In this section, we explore the performance of our CoVaR estimator together with ηˆ∗, γˆ andV̂aRY (1 − pn). Monte Carlo simulations are carried with the same five example distributionsas in Section 3.2.1. At first, the parameters follow the same setting as in Figure 3.11 so thatwe can also see how the M-estimators affect our CoVaR estimation. To be specific, for logisticdistribution, we put θ = 0.6, n = 2000 and m = 180. For HR distribution, we let θ = 2.5,n = 2000 and m = 280. For bilogistic distribution, we generate samples with α = 0.4,β = 0.7, n = 2000 and estimate with m = 180. For asymmetric logistic distribution, we haveθ = 0.6, ψ1 = 0.5, ψ2 = 0.8, N = 2500 and m = 180. For t distribution, we let ν = 5,ρ = 0.6, n = 3000 and m = 100. For each model, we let pn = 0.05 and do 100 replications ofthe CoVaR estimation procedure.When exploring the performance of ĈoVaRY |X(1 − pn), the true CoVaRY |X(1 − pn) iscomputed by finding the quantile of the conditional distribution, which is given as the root ofh(y) =∫{(u,v)∈R2:u>c,v>y}f(u, v)dudv = p2n, (3.8)where c = VaRX(1 − pn) is the true (1 − pn)-quantile of the marginal distribution of X ,and f(x, y) is the joint density function of (X, Y ). Table 3.1 gives the summary statistics ofproposed CoVaR estimates with ηˆ∗.Table 3.1: Summary statistics of proposed CoVaR estimates at level pn = 0.05. The margins ofthe first four distributions are all standard Fre´chet distribution and the margins of the bivariatet distribution are all student t distribution.logistic HR bilogistic asymmetric logistic tTrue CoVaRY |X 367.3064 399.4755 341.5227 281.4862 4.4215Mean 446.3422 463.4036 460.9443 327.7476 4.5000Median 425.4983 456.3283 434.5008 314.4263 4.4763Standard deviation 127.9184 130.7512 149.8606 83.3997 0.5457To further explore the performance, for each model, we also compute a CoVaR∗Y |X(1−pn)asCoVaR∗Y |X(1− pn) = VaRY (1− pnη∗), (3.9)where VaRY (1 − pnη∗) is the true (1 − pnη∗)-quantile of the marginal distribution of Y andη∗ is the value solving equation R(1, η∗;θ) = pn with true θ. This η∗ is also used to examinethe performance of ηˆ∗. In order to further see how the estimated η∗ would affect the estimationof CoVaR, we also make a comparison between ĈoVaRY |X(1 − pn) in (3.7) with estimator37ĈoVaR∗Y |X(1 − pn) = Yn,n−k (k/(npn))γˆ η−γˆ∗ . Moreover, we also compute the real value of ηin (3.2) asη0 =P{Y > CoVaRY |X(1− pn)}pn, (3.10)where CoVaRY |X(1 − pn) is the true value. Apart from exploring the performance of estima-tors, the distances between η0 and η∗, CoVaRY |X and CoVaR∗Y |X can show us how well theupper TD function approximates the conditional tail probability when pn is relatively small.Furthermore, for each replication, we compute the empirical estimate of the conditionalexceedance probability asen :=#{Xi > V̂aRX(1− pn), Yi > ĈoVaRY |X(1− pn)}#{Xi > V̂aRX(1− pn)}, (3.11)where (X1, Y1), ..., (Xn, Yn) are simulated vectors, and V̂aRX(1 − pn) is obtained from themethod given in Section 2.1.3. When estimation of VaR and CoVaR is accurate, the ratioen should be close to the probability level pn. It is analogy to the VaR backtesting (Kuesteret al. [2006]) based on the exceedances of VaR, which will be discussed in Section 4.1. Thisstatistic is useful in practice as a check on the accuracy of CoVaR estimator, provided pn isnot too extreme (small) relative to the sample size. All the results are displayed by smoothedhistograms of estimators in Figures 3.12, 3.13, 3.14, 3.15 and 3.16.Figure 3.12: The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en at levelpn = 0.05 based on 100 samples of size 2000 for the logistic model with parameter θ = 0.6.38Figure 3.13: The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en at levelpn = 0.05 based on 100 samples of size 2000 for the HR model with parameter θ = 2.5.Figure 3.14: The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en at levelpn = 0.05 based on 100 samples of size 2000 for the bilogistic model with parameters α = 0.4,β = 0.7.39Figure 3.15: The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en at levelpn = 0.05 based on 100 samples of size 2500 for the asymmetric logistic model with parame-ters θ = 0.6, ψ1 = 0.5, ψ2 = 0.8.Figure 3.16: The sampling densities of estimates of γ, η∗, VaRY , CoVaRY |X and en at levelpn = 0.05 based on 100 samples of size 3000 for t distribution with parameters ν = 5, ρ = 0.6.Based on the sampling densities, firstly we can observe that the Hill estimator of γ exhibitsa small positive bias in all situations. Secondly, except the case of the bivariate t distribution,the estimators of η∗ are nearly unbiased and densities are symmetric over the true η∗, especiallyin the cases of the HR distribution and the bilogistic distribution. This also indicates that the40M-estimator of θ performs well and will not cause a large bias to the estimator of η∗ for thefirst four examples. The value of η∗ are close to the values of η0 in those four cases, whileηˆ∗ in bilogistic distribution underestimate η0 considerably from Figure 3.14. This may be dueto the fact that the range of estimate values in the bilogistic model is much narrower thanthat in the logistic distribution and the asymmetric logistic distribution. In the case of the tdistribution, although the difference between η0 and η∗ is bigger and ηˆ∗ is more biased, aninteresting phenomenon is that the density of ηˆ∗ is centered and symmetric over η0, which isexpected to lead to more accurate CoVaR estimation.Thirdly, the estimates of VaRY (1 − pn) are distributed around the true value in all ex-amples. However, the standard errors of V̂aRY (1 − pn) are big, which indicates that thisnonparametric estimator is not quite stable and is more likely to fail in practice. Moreover,for the estimation of CoVaR, all the estimates of CoVaRY |X(1 − pn) are roughly distributedaround CoVaR∗Y |X(1 − pn) = VaRY (1 − pnη∗), while most of them over-estimate the truevalue of CoVaRY |X(1 − pn). This is sensible, as under the key assumption in our methodol-ogy, ĈoVaR(1− pn) is in fact estimating VaRY (1− pnη∗). Furthermore, there is no differencebetween the densities of ĈoVaRY |X(1−pn) and those of ĈoVaR∗Y |X(1−pn), indicating that theestimated η∗ does not have a big influence on the CoVaR estimation. The shapes of densitiesof ĈoVaRY |X(1 − pn) are similar to those of V̂aRY (1 − pn), showing that CoVaR estimationis dominated by the estimation of VaRY . In logistic model and HR model, the true CoVaRY |Xand CoVaR∗Y |X overlap with each other, which is resulted from the tiny distance between η0and η∗, implying that for these two distribution, R(1, η) approximates the conditional proba-bility P{Y > VaRY (1 − pη)|X > VaRX(1 − p)} very well. Finally, from the histograms ofthe empirical estimates of the conditional exceedance probabilities, we can confirm the earlierobservation that there is a tendancy for the proposed CoVaR estimator to over-estimate the truevalue of CoVaR.From the results above, it seems that most of the CoVaR estimation error can be attributedto the estimation of VaRY (1 − pn). To further explore the influence of V̂aRY (1 − pn), forall distributions, we do another round of simulations by increasing the sample sizes to 5000,replication number to 200 and decrease the level pn to 0.01. These changes allow us to producemore accurate estimates of each component. In addition to applying the nonparametric estima-tion of VaRY (1 − pn) introduced in Section 2.1.3, we also estimate the VaRY (1 − pn) withthe Peaks-Over-Threshold (POT) method (see e.g., Embrechts et al. [1997]), where a general-ized Pareto (GP) distribution is used to approximate the distribution function of exceedancesabove threshold u (Pickands [1975]). The details of this method are given in Appendix A.4.41Paramters of the GP distribution are fitted via maximum likelihood estimation, implemented infunction “gdp.fit” of the R package “ismev”.An important aspect of the POT method is the selection of threshold u. Due to the largenumber of datasets, it is impossible for us to select u manually for each sample with the meanexcess plot (Davison and Smith [1990]) or parameter stability plots. In our simulations, weadopt a pragmatic approach and set the threshold consistent with the value of k in the Hillestimator of γ: u = Yn,n−k. For brevity of presentation, we only illustrate the samplingdensities of the two estimator of VaRY (1 − pn) together with the corresponding estimatorsof CoVaRY |X(1 − pn). We also plot the histograms of en to further see the performance ofCoVaRY |X(1− pn) estimation based on the semi-parametric of the quantile. The denominatorand nominator of en in (3.11) are also computed and plotted in Figure 3.17 – 3.21. They arethe number of exceedances of VaRX :En := #{Xi > V̂aRX(1− pn)}(3.12)and the number of joint exceedances:Ebn := #{Xi > V̂aRX(1− pn), Yi > ĈoVaRY |X(1− pn)}(3.13)Figure 3.17: The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and en at levelpn = 0.01 based on 200 samples of size 5000 for the logistic model with parameter θ = 0.6.Threshold u is chosen as: u = Yn,n−k with k = 450.42Figure 3.18: The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and en at levelpn = 0.01 based on 200 samples of size 5000 for the HR model with parameter θ = 2.5.Threshold u is chosen as: u = Yn,n−k with k = 700.Figure 3.19: The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and en at levelpn = 0.01 based on 200 samples of size 5000 for the bilogistic model with parameter α = 0.4and β = 0.7. Threshold u is chosen as: u = Yn,n−k with k = 450.43Figure 3.20: The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and en at levelpn = 0.01 based on 200 samples of size 5000 for the asymmetric logistic model with parameterθ = 0.6, ψ1 = 0.5 and ψ2 = 0.8. Threshold u is chosen as: u = Yn,n−k with k = 400.Figure 3.21: The sampling densities of estimates of VaRY , CoVaRY |X , En, Ebn and en at levelpn = 0.01 based on 200 samples of size 5000 for t distribution with parameter ν = 5 andρ = 0.6. Threshold u is chosen as: u = Yn,n−k with k = 150.According to the sampling densities of estimated VaRY (1− pn), it seems that the two esti-44mators of VaRY (1−pn) are quite similar to each other, especially in the cases of the HR modeland bilogistic model. Both of them have large variability that may lead to prediction inaccu-racies in practice. For all example distributions, the sampling densities of ĈoVaRY |X(1− pn)share similar shape of those of V̂aRY (1 − pn), which is consistent with the results given inFigure 3.12 – 3.16, where simulations are generated with smaller sample sizes and at levelpn = 0.05. This confirms that the key to improve the performance of CoVaR estimation inour methodology is to improve the accuracy of the estimator of VaRY (1 − pn). From the his-tograms of En, most of the empirical estimators are bigger smaller than the nominal value:5000 × 0.01 = 50, indicating that the estimator of VaRX(1 − pn) overestimate the true valueslightly. The histograms of conditional probability shows that the CoVaR estimation based onlarger sample sizes and smaller pn is better and more acceptable, because most of the proba-bility values are between 0 and 0.02. This is close our nominal value from the histograms ofEbn. As by letting pn = 0.01 and n = 5000, the nominal nominal value of Ebn is n× p2n = 0.5.However, the number of exceedances can only be integer, which means that Ebn is expected tobe 0 or 1. When Ebn = 0, en = 0 for every En, and when En = 1, en = 0.02 when En = 50.The summary statistics of proposed estimates are given in Table 3.2.Table 3.2: Summary statistics of proposed CoVaR estimates at level pn = 0.01. The margins ofthe first four distributions are all standard Fre´chet distribution and the margins of the bivariatet distribution are all student t distribution.logistic HR bilogistic asymmetric logistic tTrue CoVaRY |X 9719.46 9999.50 7660.29 281.4862 9.2215Mean 1313.98 463.4036 13772.85 10887.50 11.04Median 12158.84 456.3283 13166.82 10500.52 10.84Standard deviation 4986.23 4694.61 3644.54 83.3997 2.6445Chapter 4Empirical StudyIn this chapter, we use the proposed CoVaR estimation method on financial time series in astationary setting by utilizing the five parametric models discussed in Section 3.2. Due to thevolatility of financial time series, we apply the methodology to the realized innovations froma AR(1)-GARCH(1,1) filter. However, Sun and Zhou [2014] demonstrate that the realizedinnovations from GARCH filter for a sample with normal innovations may follow a heavy-tailed distribution, we also consider to perform the CoVaR estimation on the raw financialtime series. A backtesting procedure in Girardi and Ergu¨n [2013] is applied to assess theperformance.Remark 1. Mainik and Schaanning [2014] give a more general definition of CoVaR than pre-sented in equation (3.1). Let (X, Y ) be a random vector with joint df F and continuous marginsF1 and F2. The CoVaR at confidence level 1 − p2 is defined as the (1 − p2)-quantile of theconditional distribution:P{Y ≥ CoVaRY |X(p1, p2)|X ≥ VaRX(1− p1)}= p2, (4.1)where risk levels p1 and p2 can be different. Unlike the definition in Girardi and Ergu¨n [2013],this CoVaR definition allows us to consider different levels of CoVaR when conditioning on thesame distress event, and the same level of CoVaR when conditioning on a more severe distressevent. It is straightforward to adapt our methodology to this more general definition of CoVaR.Under the same assumptions as stated in Section 3.1, we can re-write equation (4.1) as46p2 =P{Y > VaRY (1− p2η), X > VaRX(1− p1)}p1=P{F2(Y ) > 1− p2p1η × p1, F1(X) > 1− p1}p1.(4.2)When p1 is close to 0, (4.2) can be approximated with R(1,p2p1η), where R(x, y) is the upperTD function of (X, Y ). Then, if we can find η∗ such that R(1,p2p1η∗)= p2, we can also esti-mate CoVaRY |X(p1, p2) with VaRY (1− p2)η−γ∗ , where γ is the tail index for random variableY .4.1 BacktestingIn this section, we review the backtesting procedure to assess and compare different CoVaRestimates from various parametric model assumptions on TD function of R. By noting thatCoVaR is actually the quantile of a conditional distribution, its evaluation can be achievedwith the standard Kupiec [1995] and Christoffersen [1998] tests. The test is first designedfor evaluating the accuracy of interval forecasts, and has been applied to assess the predictiveperformance of VaR (Kuester et al. [2006]) and CoVaR (Girardi and Ergu¨n [2013]).Let {Xt}t∈N and {Yt}t∈N denote losses (negative log-returns) for an institution and a systemproxy. The VaR at confidence level 1− p1 ∈ (0, 1) for Xt is written as VaRXt(1− p1) and theCoVaR at confidence level 1−p2 ∈ (0, 1) for Yt conditional on Xt > VaRXt(1−p1) is writtenas CoVaRYt|Xt(p1, p2).Suppose our sample include n observations with t = 1, ..., n. For each institution, definethe “hit sequence” of violation asIXt ={1, if Xt > VaRXt(1− p1)0, if Xt ≤ VaRXt(1− p1).For a sub-sample (assumed to have n1 observations), when the institution is in financial distress{Xt > VaRXt(1− p1)}, construct the second “hit sequence” of violation asIYt|Xt ={1, if Yt > CoVaRYt|Xt(p1, p2)0, if Yt ≤ CoVaRYt|Xt(p1, p2).47We test the correct unconditional coverage of VaR forecasts at level 1 − p1 by specifyingthe following hypotheses:H0 : E[IXt ] ≡ λ = p1 versus HA : E[IXt ] ≡ λ 6= p1.Under the null hypothesis, the likelihood-ratio test statistic is given by (for more details, seeKupiec [1995]):LRuc = 2[L(λˆ; IX1 , IX2 , ..., IXn)− L(p1; IX1 , IX2 , ..., IXn)] ·∼ χ21, for large n, (4.3)where L(x; IX1 , IX2 , ..., IXn) = log [xn1(1− x)n−n1 ]. The maximum-likelihood estimator λˆ isn1/T1, where n1 is the number of violations, that is #{t : IXt = 1}.Similarly, the null hypothesis to test the unconditional coverage property of CoVaR fore-casts at confidence level 1− p2 isH0 : E[IYt|Xt ] ≡ λ = p2,and the maximum-likelihood estimation λˆ under alternative hypothesis is n2/n1, where n2 isthe number of violations in the second “hit sequence”, that is #{t : IYt|Xt = 1}.In addition to the unconditional coverage test based on the binomial distribution, furthercalibration tests for risk measures are discussed in Nolde and Ziegel [2017]. They proposea two-stage procedures for assessment of risk measure forecasts, which can not only assesscalibration of risk measure forecasts, but also allow for a comparison of several methods bysupplementing traditional backtests with comparative backtests.4.2 DataWe consider a partial panel of financial institutions studied in Acharya et al. [2017] with amarket capitalization in excess of 5 billion USD as of the end of June 2007. The total num-ber of firms in our sample is 8, including Bank OF AMERICA CORP (BAC), BANK NEWYORK INC (BK), AFLAC INC (AFL), ALLSTATE CORP (ALL), AMERICAN EXPRESSCO (AXP), FRANKLIN RESOURCES INC (BEN), GOLDMAN SACHS GROUP INC (GS)and T ROWE PRICE GROUP INC (TROW). The sample period is from June 26, 2000 to May489, 2019, consisting of 4747 daily price observations for each institution. The Dow Jones USFinancials Index (DJUSFN) is used as a proxy for the aggregate financial system. The dailyprices and capitalization information are extracted from Yahoo Finance. Daily losses (%) arecalculated as negative log returns. Figure B.1 in Appendix B.1 gives the time series plots ofdaily losses for institutions and DJUSFN.Before we do the estimation, it is essential for us to validate the tail dependence assumptionof our samples. Figure 4.1displays scatter plots of losses for each institution and the market in-dex. For presentation purpose, we standardized the daily losses by subtracting the sample meanand dividing the sample standard deviation. From the plots, for each institution, its dependencewith DJUSFN seems to be persistent in both the upper and lower joint tails, especially for theBAC-DJUSFN and BEN-DJUSFN data pairs. This is indicative of the variables being taildependent.Figure 4.1: Scatter plots of standardized daily losses (%) for time series introduced in Section4.2 over the period from June 27, 2000 to May 9, 2019.One way to test for tail dependence is via estimating the extremal dependence measures χand χ¯ (Coles et al. [1999]). Let random variables X and Y denote, respectively, losses of afinancial institution and a system proxy with joint df F and margins F1 and F2. The extremaldependence measures χ and χ¯ are defined asχ = limt→∞P{X > F−11 (1− 1/t), Y > F−12 (1− 1/t)}P{X > F−11 (1− 1/t)} ,49χ¯ = limt→∞2 logP{X > F−11 (1− 1/t)}logP{X > F−11 (1− 1/t), Y > F−12 (1− 1/t)} − 1,where 0 ≤ χ ≤ 1 and −1 < χ¯ ≤ 1 provided the limits exist. If (X, Y ) is tail independent,then χ = 0 and −1 < χ¯ < 1; if (X, Y ) is tail dependent, then χ¯ = 1 and 0 < χ < 1. Defineχ(u) := 2− logC(u, u)log uand χ¯(u) :=2 log(1− u)log C¯(u, u)− 1,where C(u1, u2) := F(F−11 (u1), F−11 (u2))for (u1, u2) ∈ [0, 1]2 is the copula of bivariate dfF and C¯(u1, u2) = 1− u1 − u2 + C(u1, u2) is the survival copula. It is shown (see Coles andTawn [1991]) thatχ = limu→1χ(u) and χ¯ = limu→1χ¯(u).By replacing C and C¯ with their empirical estimators, we can obtain simple estimators of(χ, χ¯). Figure 4.2 illustrates the χ(u) and χ¯(u) plots for the BAC-DJUSFN data, which areimplemented with the function “chi.plot” in R package “evd”. The χ(u) and χ¯(u) plots forother institutions are given in Appendix B.2.Figure 4.2: χ(u) and χ¯(u) plots for the BAC-DJUSFN data.From theses plots, when taking sampling variability into account, the χ¯(u) estimates ap-pear to be converging to one for u→1, especially in BEN-DJUSFN and DS-DJUSFN data.This justifies the assumption of tail dependence. However, the estimates of χ for the ALL-DJUSFN pair have a decreasing trend towards 0 when u→1, which suggests the possibility oftail independence. In order to further confirm the tail dependence assumption for this dataset,we re-estimate χ¯ with an alternative method.50Ledford and Tawn [1996] established that under weak conditionsP{X > F−11 (1− 1/t), Y > F−12 (1− 1/t)} ∈ RV−1/ξ, (4.4)where ξ is called the residual dependence coefficient. It follows that χ¯ = 2ξ − 1. Trans-form the original random variables X and Y to unit Fre´chet: Z1 = −1/ logF1(X) andZ2 = −1/ logF (Y ) and define T = min{Z1, Z2}. We note that (4.4) is equivalent toP(Z1 > z, Z2 > z) = P(T > z) ∈ RV−1/ξ.Hence, we can estimate ξ (and χ¯) as the shape parameter for random variable T with themaximum likelihood estimator in the POT method. Figure 4.3 gives the plot of estimated χ¯versus threshold in the POT method for the ALL-DJUSFN data. From the graph, the 95%confidence band covers the value of one for thresholds exceeding roughly the 70%-quantile.This gives support for the earlier claim that the data could be assumed to be tail dependent.Figure 4.3: Maximum likelihood estimates of χ¯ with 95% confidence bands based on theprofile likelihood for the ALL-DJUSFN data.4.3 ResultsFirstly, we show the results of CoVaR estimation by treating observations in time series of dailylosses for firms and a market proxy as i.i.d.. We use the whole dataset to do the estimation51and then compare the CoVaR estimates under different parametric models by utilizing theunconditional coverage test in Section 4.1.Rather than to selecting k in Hill estimator for tail index automatically with the two-stepsubsample bootstrap algorithm, in these studies, we want to see how the k value will affect theestimator of CoVaR. Due to this consideration, we assess stability of estimates to values of kby plotting k versus CoVaR estimates. All the five parametric models in Section 3.2 are usedto do the estimation, and for each model, we consider three choices of m for the M-estimator:m/n ∈ {9%, 20%, 30%} for logistic distribution, m/n ∈ {14%, 25%, 35%} for Hu¨sler-Reissdistribution, m/n ∈ {9%, 20%, 30%} for bilogistic distribution, m/n ∈ {8%, 20%, 30%}for asymmetric logistic distribution and m/n ∈ {3%, 15%, 20%} for t distribution. For alldatasets, n = 4746 is the sample size. The first value of m in each considered set correspondsto the best values in terms of RMSE, as it comes from the results of simulation studies in Sec-tion 3.2.1. We consider p = pn = (p1,n, p2,n) = (0.05, 0.05) and (0.02, 0.05). For brevityof presentation, here we just show the graphs of BAC company in Figures 4.4 and 4.5. Theadditional graphs for the other seven institutions are given in Appendix B.3.From the graphs, we can see that the lines for different m values nearly coincide with eachother, especially for the logistic, Hu¨sler-Reiss and bilogistic distributions, indicating that thechoice of m will not have a large effect on CoVaR estimation. Furthermore, for all datasetsand parametric models, the CoVaR estimates tend to increase as k increases and have largevariability for k values below 100. This is similar to the effect of k in the tail index estimationas we point out in Section 2.1.2: if k is large, the bias will be large, and if k is small, thevariance will be large. In this situation, we hope to select a k value that balances bias andvariance. From the graphs, it seems that the estimators are much more stable from aroundk = 280 to k = 380, thus choosing a k inside [280, 380] seems to be suitable. This correspondsto k/n ∈ [6%, 8%].52Figure 4.4: Estimates of CoVaR as a function of k at level pn = (0.05, 0.05) for differentvalues of m for original BAC-DJUSFN data.Figure 4.5: Estimates of CoVaR as a function of k at level pn = (0.02, 0.05) for differentvalues of m for original BAC-DJUSFN data.53The next step is to evaluate the proposed estimator of CoVaR via the backtesting procedureusing the unconditional coverage test. Based on the results above, for each parametric modeland institution, we compute the CoVaR estimates with the first value of m in each consideredset and k = 350. Before we perform the unconditional coverage test, we need to estimate theVaR of institutions, where we also use the nonparametric extreme quantile estimation methodand let k = 300 (the k values are selected by plotting k versus estimates of VaR). We alsobacktest these VaR estimates to see how this nonparametric extreme quantile estimation methodperforms. Results are given in Table 4.1.From the p-values of VaR backtesting, we can see that for all institutions, the unconditionalcoverage property for the VaR measure is satisfied at 5% significance level, indicating a goodperformance of the nonparametric extreme quantile estimation method. For both risk levels,the CoVaR estimates for the logistic, Hu¨sler-Reiss, bilogistic and t distribution over-estimateCoVaR as the values of Ebn are smaller than the nominal value: En × 0.05. From the p-valuesat risk level pn = (0.05, 0.05), for all institutions, the CoVaR measure under the logistic dis-tribution, the asymmetric logistic distribution and the t distribution satisfies the unconditionalcoverage property at 5% significance level, while for some institutions, the unconditional cov-erage property is rejected under the Hu¨sler-Reiss and the bilogistic distribution. Moreover, theasymmetric logistic distribution seems to produce more accurate estimates, as its Ebn’s are theclosest to the nominal value. From the p-values at risk level pn = (0.02, 0.05), only the Co-VaR estimates under asymmetric logistic distribution satisfy the unconditional coverage prop-erty, indicating that for the datasets, asymmetric logistic distribution is the most suitable modelamong the considered five models, which is in line with the results at level pn = (0.05, 0.05).We should note that the financial time series we plot in Figure B.1 show volatility clusteringand are dependent, which violates the i.i.d. assumption. We next design to remove the volatil-ity and time dependence via an AR(1)-GARCH(1,1) filter and see how the proposed CoVaRestimator would perform on the resulting residuals. Let Xt denote the losses for an institutionor the market index. By assuming the AR(1)-GARCH(1,1) model, the losses satisfyXt = µt + σtZt, µt = α0 + α1Xt−1, σ2t = β0 + β1(σt−1Zt−1)2 + β2σ2t−1,where the standardized vectors {Zt}t∈N are i.i.d. with zero mean and variance one. Param-eters in the AR(1)-GARCH(1,1) filter are estimated using maximum likelihood assuming astandardized skew-t distribution (Ferna´ndez and Steel [1998]) for the innovations {Zt}.54Table 4.1: Unconditional coverage tests for VaR of institutions and CoVaR based on raw data. The X and Y variables for VaR andCoVaR are 100 × log return. “Log, HR, Bilog, Alog, t” stands for logistic, Hu¨sler-Reiss, bilogistic, asymmertric and t distribution,respectively. En is the number of exceedances of the VaR estimate, and Ebn is the number of joint exceedances of VaR and CoVaRestimates. Moreover, T represents the test statistic value in (4.3).pn = (0.05, 0.05) pn = (0.02, 0.05)VaR CoVaR VaR CoVaRLog HR Bilog Alog t Log HR Bilog Alog tBACestimate 3.486 10.659 10.699 10.662 9.414 10.406 5.675 17.003 17.067 17.008 15.019 16.599En/Ebn 237 7 6 6 12 7 92 1 1 1 3 1T 0.001 2.434 3.684 3.684 0.002 2.434 0.093 4.294 4.294 4.294 0.664 4.294p-value 0.984 0.119 0.055 0.055 0.964 0.119 0.761 0.038 0.038 0.038 0.415 0.038BKestimate 3.213 10.598 10.696 10.604 9.192 10.192 4.729 16.905 17.063 16.915 14.664 16.258En/Ebn 237 7 6 7 14 9 96 1 1 1 3 1T 0.001 2.434 3.684 2.434 0.389 0.784 0.013 4.619 4.619 4.619 0.815 4.619p-value 0.984 0.119 0.055 0.119 0.533 0.376 0.911 0.032 0.032 0.032 0.367 0.032AFLestimate 2.718 10.503 10.677 10.509 8.841 10.074 4.436 16.754 17.032 16.764 14.103 16.070En/Ebn 249 7 6 7 15 10 98 1 1 1 3 1T 0.598 2.963 4.315 2.963 0.518 0.543 0.751 4.783 4.783 4.783 0.895 4.783p-value 0.439 0.085 0.038 0.085 0.472 0.461 0.751 0.029 0.029 0.029 0.344 0.029ALLestimate 2.279 10.383 10.656 10.395 8.639 9.974 3.733 16.563 16.999 16.582 13.780 15.911En/Ebn 239 7 7 7 15 10 102 1 1 1 3 1T 0.013 2.520 2.520 2.520 0.761 0.354 0.526 5.113 5.1136 5.113 1.061 5.113p-value 0.910 0.112 0.112 0.112 0.383 0.552 0.468 0.024 0.024 0.024 0.303 0.024AXPestimate 3.196 10.624 10.698 10.629 9.282 10.222 4.810 16.947 17.066 16.955 14.806 16.307En/Ebn 246 7 6 7 13 8 99 1 1 1 3 1T 0.332 2.828 4.154 2.828 0.041 1.796 0.177 4.865 4.865 4.865 0.936 4.865p-value 0.565 0.093 0.042 0.093 0.839 0.180 0.674 0.027 0.027 0.027 0.333 0.027BENestimate 3.157 10.644 10.699 10.649 9.254 10.296 4.589 16.979 17.067 16.987 14.763 16.424En/Ebn 248 7 6 7 12 7 103 1 1 1 3 1T 0.501 2.918 4.261 2.918 0.014 2.918 0.683 5.196 5.196 5.196 1.105 5.196p-value 0.479 0.087 0.039 0.088 0.907 0.088 0.409 0.023 0.023 0.023 0.293 0.023GSestimate 3.228 10.567 10.693 10.572 9.039 9.956 4.709 16.856 17.057 16.864 14.419 15.881En/Ebn 257 7 6 6 12 7 97 1 1 1 3 1T 1.678 3.335 4.751 4.751 0.061 3.335 0.046 4.701 4.701 4.701 0.855 4.701p-value 0.195 0.068 0.029 0.029 0.806 0.068 0.830 0.030 0.030 0.030 0.355 0.030TROWestimate 3.227 10.658 10.702 10.661 9.408 10.220 4.781 17.002 17.071 17.006 15.008 16.302En/Ebn 246 7 6 6 12 8 103 1 1 1 3 1T 0.332 2.828 4.154 4.154 0.008 1.796 0.683 5.196 5.196 5.196 1.105 5.196p-value 0.565 0.093 0.042 0.042 0.930 0.180 0.409 0.023 0.023 0.023 0.293 0.02355With the estimates of µt and σt, we can obtain the sequence of residuals as:{Zˆt = (Xt −µˆt)/σˆt}. The following estimation is based on these realized residuals {Zˆt}. We give the timeseries plots of realized residuals for institutions and market index in Figure B.2 of AppendixB.1.Before we perform the estimation, we produce the scatter plots of institution’s residualsversus market’s residuals to validate the tail dependence assumption. Graphs are shown inFigure 4.6. We can see that the upper tail dependence is obvious for each pair.Figure 4.6: Scatter plots of realized residuals from time series introduced in Section 4.2 overthe period from June 27, 2000 to May 9, 2019.Remark 2. Throughout the nonparametric extreme quantile estimation we discussed in Section2.1.3 and the analysis we constructed before, we estimate the VaR at a confidence level 1 − pfor the i.i.d. random variables Y1, Y2, ..., Yn via formula:V̂aRY (1− p) = Yn,n−k( knp)γˆn(k),where Yn,1 ≤ Yn,2 ≤ ... ≤ Yn,n are the order statistics and γˆn(k) is the Hill estimator of the tailindex with sample fraction k. However, a more general way to estimate the VaR isV̂aRY (1− p) = Yn,n−k2( k2np)γˆn(k1), (4.5)where the intermediate sequences k1 = k1,n, k2 = k2,n can be different. From this formula, toestimate the VaR, we first select a proper k1 for the tail index using Hill plot or the bootstrap-56based method. Then, we put the estimates γˆn(k1) into (4.5) and do sensitivity analysis betweenk2 and VaR estimates to select a proper value of k2 and get an accurate estimate of VaR. In thefollowing estimation, we will apply this procedure to estimate the VaR of market index andinstitutions. We denote ki1 and ki2 as the sample fractions of the tail index estimate and VaRestimate for institution i (i ∈ {AFL, ALL, AXP, BAC, BEN, BK, GS, TROW}), respectively.The parameter ks1 and ks2 are denoted as the sample fractions of the tail index estimate and VaRestimate for the market index, respectively.For each institution, we consider pn = (0.02, 0.05) and the first value of m in each con-sidered set. We select the ki1’s and ks1 through the stability of Hill plot and the results are:ks1 = 230, kAFL1 = 230, kALL1 = 230, kAXP1 = 240, kBAC1 = 240, kBEN1 = 240, kBK1 = 245,kGS1 = 250 and kTROW1 = 240. To select ks2, we plot VaR estimates as well as the plots ofCoVaR estimates under different parametric model assumptions as a function of ks2. For thesake of brevity, we also just show the graphs of BAC company here in Figure 4.7. The othersare shown in Figure B.6 in Appendix B.3.Figure 4.7: Estimates of CoVaR as a function of ks2 at level pn = (0.02, 0.05) for estimatedresiduals from BAC-DJUSFN data. The dotted vertical line represent the ks2 = ks1 = 230.From the plots, firstly we can see that the curves of CoVaR estimates under different modelassumptions have the same shape as the curve of VaR estimates. This is sensible, as the quotient57between CoVaR estimator and VaR estimator is ηˆ−γˆ∗ , which is uncorrelated with the value ofks2. Moreover, the curves seem to be stable from around 100 to 250, and then decrease linearly.Thus, it seems that selecting ks2 = ks1 = 230 is reasonable for all pairs. Compared withthe graphs in Figures 4.4 and 4.5, these plots are more informative to select a proper valueof sample fraction, as the stable region seems to be more obvious. This may indicate thatconstructing sensitivity analysis following procedures in Remark 2 is more suitable for theproposed CoVaR estimation methodology.To select the value of ki2, we plot the VaR estimates against sample fraction ki2 and thegraphs suggest selecting ki2 = 200 for all i. The next step is to assess the CoVaR estimates withthe unconditional coverage test. Based on the results above, we put the CoVaR estimates withks2 = 230 into backtesting. The results are given in Table 4.2.From the test results of estimated VaR of institutions, all VaR estimates perform well andsatisfy the unconditional coverage property, although the VaR estimates for company GS ap-pear to be slightly overestimate. This indicates that ki2 = 200 for all i is a good choice here.From the estimates of CoVaR, it seems that the logistic, Hu¨sler-Reiss and bilogistic distribu-tions have similar performance, as their estimates for each institution are close to each other.However, from the results of Ebn and p-value of the unconditional coverage test, we can seethat the estimates under these three parametric models over-estimate CoVaR and the uncon-ditional coverage test is rejected at 5% significance level, especially under the Hu¨sler-Reissdistribution. However, the estimates under asymmetric logistic and t distributions have Esb thatare much closer to the nominal value: En × 0.05, and all pass the unconditional coverage testat 5% significance level, indicating more accurate estimation of CoVaR. These are similar tothe backtesting results of the raw data in Table 4.1. However, for the asymmetric logistic andt distributions, although they seem to have the same Ebn value for some institutions, their esti-mates are quite different from each other. This means that although this traditional calibrationbacktesting procedure allows us to assess a particular risk measure estimation method, it mayfail to provide an adequate comparison of the results among several methods.To make further comparisons across these five parametric models, we then calculate anaverage quantile score for each institution under each model. Following the notations in Section4.1, define I = {t : IXt = 1} and n1 = |I|. The average quantile score S¯ is expressed asS¯ =1n1∑t∈IS(ĈoVaRYt|Xt(p1, p2), Yt),58Table 4.2: Unconditional coverage tests for VaR of institutions and CoVaR based on realizedresiduals at level pn = (0.02, 0.05). “Log, HR, Bilog, Alog, t” stands for logistic, Hu¨sler-Reiss, bilogistic, asymmertric and t distribution, respectively. En is the number of exceedancesof the VaR estimate, and Ebn is the number of joint exceedances of VaR and CoVaR estimates.Moreover, T represents the test statistic value in (4.3).VaR CoVaRLog HR Bilog Alog tBACestimate 2.239 5.220 5.257 5.223 4.791 4.989En/Ebn 90 1 1 1 6 3T 0.265 4.133 4.133 4.133 0.479 0.593p-value 0.607 0.042 0.042 0.042 0.489 0.441BKestimate 2.178 5.168 5.251 5.173 4.638 4.879En/Ebn 98 1 1 1 6 4T 0.101 4.782 4.782 4.782 0.243 0.185p-value 0.751 0.029 0.029 0.029 0.622 0.667AFLestimate 2.147 5.028 5.206 5.252 4.374 4.579En/Ebn 97 1 0 0 4 4T 0.046 4.701 1.373 1.373 0.166 0.166p-value 0.830 0.030 NaN NaN 0.683 0.683ALLestimate 2.088 4.960 5.197 4.974 4.271 4.403En/Ebn 94 2 0 2 4 4T 0.009 2.063 1.370 2.063 0.115 0.115p-value 0.924 0.151 NaN 0.151 0.734 0.734AXPestimate 2.200 5.162 5.249 5.166 4.606 4.885En/Ebn 96 1 1 1 6 4T 0.012 4.619 4.619 4.619 0.294 0.148p-value 0.911 0.032 0.032 0.032 0.588 0.700BENestimate 2.221 5.192 5.253 5.195 4.693 4.902En/Ebn 93 1 1 1 6 3T 0.040 4.375 4.375 4.375 0.379 0.701p-value 0.842 0.037 0.037 0.037 0.538 0.402GSestimate 2.181 5.197 5.252 5.200 4.7078 4.908En/Ebn 85 0 0 0 5 2T 1.096 1.361 1.361 1.361 0.132 1.547p-value 0.295 NaN NaN NaN 0.716 0.214TROWestimate 2.146 5.181 5.253 5.183 4.686 4.907En/Ebn 97 1 1 1 6 3T 0.046 4.701 4.701 4.701 0.268 0.855p-value 0.830 0.030 0.030 0.030 0.605 0.35559where S(r, x) is a consistent scoring function for VaR at level 1 − p2, which is expressed as(for more details, see Nolde and Ziegel [2017]):S(r, x) = (p2 − 1{x > r}) r + 1{x > r}x.A lower score indicates better performance. The results are reported in Table 4.3. From the val-ues of S¯, the asymmetric logistic distribution and the t distributions has superior performanceon CoVaR estimation, compared with the other three distributions. In addition, among the firstfour extreme value distributions, it seems that the multi-parameter model (asymmetric logisticdistribution) has the better performance than the one-parameter models (logistic and HR distri-butions). This is making sense, as from the M-estimates of model parameters, for each dataset,ψˆ1 is around 0.1 while ψˆ2 is around 0.6, showing big difference and thus obvious asymme-try. Furthermore, the HR distribution seems to have the worst performance among these fivedistributions for the most of institutions. These results are consistent with the results of uncon-ditional coverage tests in Table 4.2. Finally, the model with the best performance seems to varyacross different institutions, but can only be either the asymmetric logistic distribution or the tdistribution.Table 4.3: The average quantile scores S¯ of CoVaR estimates based on realized residuals atlevel pn = (0.02, 0.05).BAC BK AFL ALL AXP BEN GS TROWLog 0.262 0.256 0.253 0.251 0.259 0.260 0.259 0.260HR 0.263 0.263 0.260 0.260 0.263 0.263 0.263 0.263Bilog 0.262 0.260 0.263 0.251 0.259 0.261 0.260 0.260Alog 0.253 0.254 0.244 0.227 0.255 0.254 0.249 0.254t 0.255 0.252 0.243 0.244 0.252 0.253 0.249 0.25360Chapter 5ConclusionThis thesis develops a semi-parametric method for estimating a high quantile of a conditionaldistribution given that one of the components of a bivariate random vector is extreme, and thenapplies this method to systemic risk estimation with CoVaR as a risk measure. The method-ology rests on asymptotic results from multivariate EVT, combining parametric modelling ofthe tail dependence function to address the issue of data sparsity in the joint tail regions andnonparametric univariate tail estimation techniques.The main advantage of EVT-based estimators is that they capture useful information in thetail of the data without restricting the behaviour of the central part. We note that there is an-other EVT-based estimation method for CoVaR, which makes use of the limit expression forthe conditional probability given that one of the components of a bivariate random vector isextreme; see Nolde and Zhang [2018]. Although this method is shown to provide a compet-itive alternative to a flexible fully-parametric method, its requirement of multivariate regularvariation imposes the restriction of the same tail index for both the institutional and systemlosses, and a somewhat restrictive parametric assumption on the extremal dependence struc-ture. Thus, another advantage of the framework in our methodology is that it allows to balancemodel uncertainty under a milder condition of multivariate domain of attraction.In the simulation studies in Section 3.2, it was shown that the variance of the proposedCoVaR estimator is dominated by the variability in the estimation of the high quantile. We alsoobserved a minor positive bias under all models used in data simulation. The methodology hasbeen applied to real data including log-returns on eight US financial firms with returns on the61Dow Jones US Financials Index used as market proxy. A test for unconditional coverage wascarried out on CoVaR estimates under several parametric model assumptions on the tail depen-dence function. It was found that there were minor differences among the CoVaR estimatesunder symmetric models, and the asymmetric model showed the best performance, since thebacktest for unconditional coverage was passed under the asymmetric logistic distribution forall financial firms. A similar analysis was conducted after applying an AR(1)-GARCH(1,1)filter to the individual time series. Through the backtesting results, the asymmetric logistic dis-tribution and t distribution were found to have superior performance on the resulting CoVaRestimates and passed the unconditional coverage test for financial institutions. The other threemodels also resulted in similar CoVaR estimates and the unconditional coverage property wererejected under these three models.As the next step, we would like to conduct a more rigorous backtesting of the proposedCoVaR estimation procedure in order to be able to identify the best performing model forthe tail dependence function, as well as to compare our methodology with other availableapproaches. Through the simulation studies, we were able to assess finite sample propertiesof the proposed estimator. In the follow-up work, we aim to study asymptotic properties ofthe estimator, including consistency and normality. Finally, we are interested in extendingour methodology to the case where the losses of an institution and of the system exhibit tailindependence as the current framework only applies to situations in which the tail dependencefunction is not identically zero, i.e., only in the presence of tail dependence.62BibliographyB. Abdous, A.L. Fouge`res, and K. Ghoudi. Extreme behaviour for bivariate elliptical distribu-tions. Canadian Journal of Statistics, 33:317–334, 2005. → page 3V.V. Acharya, R. Engle, and M. Richardson. Capital shortfall: A new approach to ranking andregulating systemic risks. American Economic Review, 102:59–64, 2012. → page 1V.V. Acharya, L.H. Pedersen, T. Philippon, and M. Richardson. Measuring systemic risk. TheReview of Financial Studies, 30:2–47, 2017. → pages 1, 48T. Adrian and M.K. Brunnermeier. CoVaR. Technical report, National Bureau of EconomicResearch, 2011. → page 2A. Azzalini and A. Capitanio. Distributions generated by perturbation of symmetry with em-phasis on a multivariate skew-t distribution. Journal of the Royal Statistical Society: SeriesB, 65:367–389, 2003. → page 19V. Barnet. The ordering of multivariate data (with discussion). Journal of the Royal StatisticsSociety, Series A, 139:318–354, 1976. → page 14J. Beirlant, Y. Goegebeur, J. Segers, and J.L. Teugels. Statistics of Extremes: Theory andApplications. John Wiley & Sons, 2006. → pages 9, 10, 13, 15S. Benoit, J.E. Colliard, C. Hurlin, and C. Pe´rignon. Where the risks lie: A survey on systemicrisk. Review of Finance, 21:109–152, 2017. → page 1P.F. Christoffersen. Evaluating interval forecasts. International Economic Review, pages 841–862, 1998. → page 47S.G. Coles and J.A. Tawn. Modelling extreme multivariate events. Journal of the Royal Statis-tical Society: Series B, 53:377–392, 1991. → pages 20, 50S.G. Coles, J. Heffernan, and J.A. Tawn. Dependence measures for multivariate extremes.Extremes, 2:339–365, 1999. → page 49J. Danielsson, L. de Haan, L. Peng, and C.G. de Vries. Using a bootstrap method to choosethe sample fraction in tail index estimation. Journal of Multivariate Analysis, 76:226–248,2001. → pages 4, 11, 1263B. Das and S.I. Resnick. Conditioning on an extreme component: Model consistency withregular variation on cones. Bernoulli, 17:226–252, 2011. → page 3Anthony C Davison and Richard L Smith. Models for exceedances over high thresholds. Jour-nal of the Royal Statistical Society: Series B (Methodological), 52(3):393–425, 1990. →page 42O. De Bandt and P. Hartmann. Systemic risk: a survey. 2000. → page 1L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer Science &Business Media, 2006. → pages 3, 6, 24, 25L. de Haan and S.I. Resnick. Second-order regular variation and rates of convergence inextreme-value theory. The Annals of Probability, 24:97–124, 1996. → page 10Stefano Demarta and Alexander J McNeil. The t copula and related copulas. Internationalstatistical review, 73(1):111–129, 2005. → page 19F.X. Diebold, T. Schuermann, and J.D. Stroughair. Pitfalls and opportunities in the use ofextreme value theory in risk management. The Journal of Risk Finance, 1:30–35, 2000. →page 5J.H. Einmahl, A. Krajina, and J. Segers. A method of moments estimator of tail dependence.Bernoulli, 14:1003–1026, 2008. → pages 4, 20, 36J.H. Einmahl, A. Krajina, and J. Segers. An m-estimator for tail dependence in arbitrary di-mensions. The Annals of Statistics, 40:1764–1793, 2012. → pages 4, 20, 22, 30, 32, 36P. Embrechts, C. Klu¨ppelberg, and T. Mikosch. Modelling Extremal Events: for Insurance andFinance. Springer Berlin Heidelberg, 1997. → pages 2, 5, 8, 41C. Ferna´ndez and M. Steel. On Bayesian modeling of fat tails and skewness. Journal of theAmerican Statistical Association, 93:359–371, 1998. → page 54R.A. Fisher and L.H.C. Tippett. Limiting forms of the frequency distribution of the largest orsmallest member of a sample. In Mathematical Proceedings of the Cambridge PhilosophicalSociety, volume 24, pages 180–190. Cambridge University Press, 1928. → page 6G. Girardi and A.T. Ergu¨n. Systemic risk measurement: Multivariate GARCH estimation ofCoVaR. Journal of Banking & Finance, 37:3169–3180, 2013. → pages 2, 23, 46, 47B. Gnedenko. Sur la distribution limite du terme maximum d’une serie ale´atoire. Annals ofMathematics, pages 423–453, 1943. → pages 6, 8E.J. Gumbel. Distributions des valeurs extremeˆs en plusiers dimensions. Publ. Inst. Statist.Univ. Paris, 9:171–173, 1960. → pages 14, 17J.E. Heffernan and S.I. Resnick. Limit laws for random vectors with an extreme component.The Annals of Applied Probability, 17:537–571, 2007. → page 364J.E. Heffernan and J.A. Tawn. A conditional approach for multivariate extreme values (withdiscussion). Journal of the Royal Statistical Society: Series B, 66:497–546, 2004. → page 3B.M. Hill. A simple general approach to inference about the tail of a distribution. The Annalsof Statistics, pages 1163–1174, 1975. → pages 4, 9X. Huang. Statistics of bivariate extremes. Tinbergen Institute Research Series, 22, 1992. →page 15J. Hu¨sler and R.D. Reiss. Maxima of normal random vectors: between independence andcomplete dependence. Statistics & Probability Letters, 7:283–286, 1989. → page 17A.F. Jenkinson. The frequency distribution of the annual maximum (or minimum) values ofmeteorological elements. Quarterly Journal of the Royal Meteorological Society, 81:158–171, 1955. → page 7H. Joe, R.L. Smith, and I. Weissman. Bivariate threshold methods for extremes. Journal of theRoyal Statistical Society: Series B (Methodological), 54:171–183, 1992. → page 20H. Joe, H. Li, and A.K. Nikoloulopoulos. Tail dependence functions and vine copulas. Journalof Multivariate Analysis, 101:252–270, 2010. → page 3G.G. Kaufman and K.E. Scott. What is systemic risk, and do bank regulators retard or con-tribute to it? The Independent Review, 7:371–391, 2003. → page 1S. Kotz and S. Nadarajah. Multivariate t-distributions and Their Applications. CambridgeUniversity Press, 2004. → page 68K. Kuester, S. Mittnik, and M.S. Paolella. Value-at-risk prediction: A comparison of alternativestrategies. Journal of Financial Econometrics, 4:53–89, 2006. → pages 38, 47P. Kupiec. Techniques for verifying the accuracy of risk measurement models. The Journal ofDerivatives, 3, 1995. → pages 47, 48A.W. Ledford and J.A. Tawn. Statistics for near independence in multivariate extreme values.Biometrika, 83:169–187, 1996. → pages 20, 50G. Mainik and E. Schaanning. On dependence consistency of CoVaR and some other systemicrisk measures. Statistics & Risk Modeling, 31:49–77, 2014. → pages 2, 23, 46A.K. Nikoloulopoulos, H. Joe, and H. Li. Extreme value properties of multivariate t copulas.Extremes, 12:129–148, 2009. → pages 3, 16, 68N. Nolde and J. Zhang. Conditional extremes in asymmetric financial markets. Journal ofBusiness & Economic Statistics, pages 1–29, 2018. → pages 3, 61N. Nolde and J. Ziegel. Elicitability and backtesting: Perspectives for banking regulation. Theannals of applied statistics, 11:1833–1874, 2017. → pages 48, 6065S.A. Padoan. Multivariate extreme models based on underlying skew-t and skew-normal dis-tributions. Journal of Multivariate Analysis, 102:977–991, 2011. → pages 68, 69I.J. Pickands. Statistical inference using extreme order statistics. The Annals of Statistics, 3:119–131, 1975. → page 41S.I. Resnick. Extreme Values, Regular Variation and Point Processes. Springer, 1987.→ pages6, 24, 29S.I. Resnick. Heavy-tail Phenomena: Probabilistic and Statistical Modeling. Springer Science& Business Media, 2007. → page 7S.I. Resnick. Extreme Values, Regular Variation and Point Processes. Springer, 2013. → page67R.L. Smith. Extreme value theory. Handbook of Applicable Mathematics, 7:437–471, 1990.→ page 18R.L. Smith. Multivariate threshold methods. In Extreme Value Theory and Applications, pages225–248. Springer, 1994. → page 20P. Sun and C. Zhou. Diagnosing the distribution of GARCH innovations. Journal of EmpiricalFinance, 29:287–303, 2014. → page 46J.A. Tawn. Bivariate extreme value theory: models and estimation. Biometrika, 75:397–415,1988. → page 18J.A. Tawn. Modelling multivariate extreme value distributions. Biometrika, 77:245–253, 1990.→ page 15R. Von Mises. La distribution de la plus grande de n valuers. Rev. Math. Union Interbalcanique,1:141–160, 1936. → page 7I. Weissman. Estimation of parameters and large quantiles based on the k largest observations.Journal of the American Statistical Association, 73:812–815, 1978. → page 13E.B. Wilson. Advanced Calculus. Ginn, 1912. → page 1666Appendix AProofsIn this appendix, we collect several supplementaty theoretical results and derivations referredto in the main body of the thesis.A.1 Regular Variation of Skew-t DistributionLet Y ∼ ST1(0, 1, α, ν) be a random variable with density functionfST (y) = 2fT (y; ν)FT(αy√ν + 1ν + y2; ν + 1),where fT (·; ν) and FT (·; ν) are the density function and df of a univariate Student’s t randomvariable with ν degree of freedom.It can be verified that the tail of the standard skew-t distribution, for ν > 1 and α ∈ R, isa regularly varying function with index −ν (see p 13-17, Resnick [2013]), and 1 − FST(y) ∼ν−1y−νs(y;α, ν) as y→∞, wheres(y;α, ν) =2Γ{(ν + 1)/2}Γ(ν/2)(1y2+1ν)−(ν+1)/2FT(α√ν + 1; 0, 1, ν)is a slowly varying function.67Define Qα,ν(·) as the lower quantile function of distribution ST1(0, 1, α, ν). Then we canget as u→ 0 (see (4), Padoan [2011]),Qα,ν(1− u) ∼ (s(α, ν)/u)1/ν , (A.1)wheres(α, ν) =Γ{(ν + 1)/2}νν/2FT(α√ν + 1; 0, 1, ν)Γ(ν/2)√pi.A.2 Proof of Proposition 2.2.1The proof is following Nikoloulopoulos et al. [2009], where the TD function is derived inmultivariate version.Lemma A.2.1 (Kotz and Nadarajah [2004]). Let X = (XT1 ,XT2 )T ∈ Rd follows the multivari-ate t distribution with parameter = 0, scale parameter Σ and ν degree of freedom, whereΣ =(Σ11 Σ12Σ21 Σ22), Σ22;1 = Σ22 − Σ21Σ−111 Σ12,If X1 and X2 are k and d− k dimensional sub-vectors, respectively, thenP{X2 ≤ x2|X1 = x1} = FT(√ν + kν + xT1 Σ−111 x1x2;1; 0,Σ22;1, ν + k),where x2;1 = x2 − Σ21Σ−111 x1.We know that the student’s t distribution is a special case of the standard skew-t distributionwith α = 0. So from equation (A.1), the Q1(1− ux) and Q2(1− uy) in (2.14) satisfy:Q1(1− ux) ∼ (s(0, ν)/ux)1/ν and Q2(1− uy) ∼ (s(0, ν)/uy)1/ν , u→0.Then the upper tail dependence function of the bivariate t distribution can be written asR(x, y; ρ, ν) = x limu→0P{Y > (s(0, ν)/(uy))1/ν |X = (s(0, ν)/(ux))1/ν}+ y limu→0P{X > (s(0, ν)/(ux))1/ν |Y = (s(0, ν)/(uy))1/ν}.(A.2)68From Lemma A.2.1, we obtainP{Y ≤ {s(0, ν)/(uy)}1/ν |X = (s(0, ν)/(ux))1/ν}= FT(√(ν + 1)(s(0, ν)/(ux))2/ν(1− ρ2)(v + (s(0, ν)/(ux))2/ν)((yx)−1/ν− ρ); 0, 1, ν + 1).(A.3)Observing thatlimu→0(s(0, ν)/ux)2/νν + (s(0, ν)/ux)2/ν= 1,we obtain the final expression for the upper tail function of the bivariate t distribution:R(x, y; ρ, ν) = xFT(√ν + 11− ρ2(ρ−(yx)−1/ν); ν + 1)+ yFT(√ν + 11− ρ2(ρ−(xy)−1/ν); ν + 1).(A.4)A.3 Proof of Proposition 2.2.2The proof is following Padoan [2011], where the TD function is derived in multivariate version.Lemma A.3.1 (Padoan [2011]). Let X = (X1, X2)T ∼ ST2(0,Ω,α, ν), where Ω =(1 ρρ 1).Then for i 6= j ∈ {1, 2},• Xj ∼ ST1(0, 1, α¯j, ν), where α¯j = αj+ραi√1+αi(1−ρ2).• The conditional distribution is univariate extended skew-t distribution:P(Xi ≤ xi|Xj = xj) = FEST((xi − ρxj)/√αQj(1− ρ2); 0, 1,√1− ρ2αi, τi|j, ν + 1),where αQj = (ν + x2j)/(ν + 1), τi|j =√(ν + 1/(ν + x2j))(αiρ+ αj)xj .Then from Lemma A.3.1, for the bivariate skew-t distribution in Example 2.2.6, we can69rewrite the conditional probability in equation (2.14) aslimu→0P{U2 > 1− uy|U1 = 1− ux}=limu→0P{Y > Q2(1− uy)|X = Q1(1− ux)}=limu→0F¯EST Q2(1−uy)Q1(1−ux) − ρ√1− ρ2√ν + 1√νQ1(1−ux)2 + 1; 0, 1,√1− ρ2α2, τ1, ν + 1 ,where τ1 =√ν+1ν/qα¯1,ν(1−uy)2+1(α2ρ + α1) and F¯EST is the survival function of FEST. Fur-thermore, Q1, Q2 are the lower quantile functions for ST1(0, 1, α¯1, ν) and ST1(0, 1, α¯2, ν),respectively. The parameters α¯1 and α¯2 are given in Lemma A.3.1.Applying regular variation of the tail of the skew-t distribution, we havelimu→0Q2(1− uy)Q1(1− ux) =( y¯x¯)−1/νandlimu→0ν/Q1(1− ux)2 = 0where x¯ = xFT(α¯2√ν + 1; 0, 1, ν) and y¯ = yFT(α¯1√ν + 1; 0, 1, ν). Similarly, we can getthe form of limu→0 P{U2 > 1 − uy|U1 = 1 − ux}. Combining all the results, we have thefinal expression for the the upper tail dependence function of the bivariate skew-t distributionfunction asR(x, y; ρ, α1, α2, ν) = x · F¯EST( √ν + 1√1− ρ2((x¯y¯)1/ν− ρ); 0, 1,√1− ρ2α2, τ1, ν + 1)+ y · F¯EST( √ν + 1√1− ρ2(( y¯x¯)1/v− ρ); 0, 1,√1− ρ2α1, τ2, ν + 1),(A.5)where x¯ = xTν(α¯2√ν + 1), y¯ = yTν(α¯1√ν + 1), α¯1 = α1+ρα2√1+(1−ρ2)α2, α¯2 = α2+ρα1√1+(1−ρ2)α1,τ1 =√ν + 1(α1 + α2ρ), τ2 =√ν + 1(α2 + α1ρ).70A.4 Peaks-Over-Threshold MethodOne way to define an extreme event is when the process exceeds a high threshold; i.e., eventsof the form {X > u} for large threshold u. The question is can we find a limiting distributionfor excesses over u, X − u, conditional on the event that X has exceeded u, i.e., X > u.Definition 9. The distribution with df of the formHσ,ξ(y) =1−(1 + ξyσ)−1/ξ+, ξ 6= 01− e−y/σ, ξ = 0is called a generalized Pareto distribution with scale parameter σ > 0 andd shape parameterξ ∈ R, written as GP (σ, ξ).Suppose X1, ..., Xn are i.i.d random variables with df F , where F ∈ D(Gξ). That is to say,when n is large, we haveF n(anx+ bn) ≈ Gξ(x) = exp{−(1 + ξx)−1/ξ+}. (A.6)Define xF = sup{x : F (x) < 1}. In particular, we can assume that there exist u, near xF , suchthat (A.6) holds for all x such that anx+ bn > u. Let anx+ bn = y, bn = µ and an = σ. Thenwe haveF (y) ≈ exp{−(1 + ξy − uσ)−1/ξ+}≈ 1−(1 + ξy − uσ)−1/ξ+, y > u.Now consider the conditional df of excess over threshold u:P(X − u ≥ y|X > u) = P(X > u+ y)P(X > u)=1− F (u+ y)1− F (u)≈[(1 + ξ u+y−µσ)+(1 + ξ uµσ)+]−1/ξ=(1 + ξyσu)−1/ξ+, y > 0,where σu = σ + ξ(u− µ). From Definition 9, we can see that X − u|X > u ∼ GP (σu, ξ) foru large.Theorem A.4.1 (Pickands-Balkema-de Haan). For ξ ∈ R, the following are equivalent:711. F ∈ D(Gξ).2. There exists a positive function σ(·) such thatlimu↑xFsupy∈(0,xF−u)|Fu(y)−Hσ(u),ξ(y)| = 0,where Fu(y) := P(X − u ≤ y|X > u), y ≤ 0, u < xF ≤ ∞.The above theorem reveals duality between the limiting distribution of maxima and ex-cesses over a high threshold; if the normalized maxima have aGEV (µ, σ, ξ) distribution as thelimit, then conditional excesses over a limiting threshold have a generalized Pareto distributionwith the same shape parameter ξ.Once we obtain the maximum likelihood estimates of ξ and σu, the (1− p)-th quantile canbe estimated with:Qˆ1−p = u+σˆuξˆ((p/φˆu)−ξˆ − 1), ξˆ 6= 0,where φˆu = #{Xi > u}/n is the empirical estimate of the probability of exceedance overthreshold u.72Appendix BTables and FiguresB.1 Time series plots(a) AFL (b) ALL(c) AXP (d) BAC73(e) BEN (f) BK(g) GS (h) TROW(i) DJUSFNFigure B.1: Time series plots of daily losses for institutions and financial system.74(a) AFL (b) ALL(c) AXP (d) BAC(e) BEN (f) BK(g) GS (h) TROW75(i) DJUSFNFigure B.2: Time series plots of realized residuals for institutions and financial system.76B.2 χ(u) and χ¯(u) plots(a) BK-DJUSFN (b) AFL-DJUSFN(c) ALL-DJUSFN (d) AXP-DJUSFN(e) BEN-DJUSFN (f) GS-DJUSFN(g) TROW-DJUSFNFigure B.3: χ(u) and χ¯(u) plots for other seven institutions77B.3 Plots of CoVaR estimates against the sample fractionfor other seven institutions(a) BK-DJUSFN(b) AFL-DJUSFN78(c) ALL-DJUSFN(d) AXP-DJUSFN79(e) BEN-DJUSFN(f) GS-DJUSFN80(g) TROW-DJUSFNFigure B.4: Estimates of CoVaR as a function of k with raw data at level pn = (0.05, 0.05) fordifferent values of m.81(a) BK-DJUSFN(b) AFL-DJUSFN82(c) ALL-DJUSFN(d) AXP-DJUSFN83(e) BEN-DJUSFN(f) GS-DJUSFN84(g) TROW-DJUSFNFigure B.5: Estimates of CoVaR as a function of k with raw data at level pn = (0.02, 0.05) fordifferent values of m.85(a) BK-DJUSFN(b) AFL-DJUSFN86(c) ALL-DJUSFN(d) AXP-DJUSFN87(e) BEN-DJUSFN(f) GS-DJUSFN88(g) TROW-DJUSFNFigure B.6: Estimates of CoVaR as a function of ks2 with realized residuals at level pn =(0.02, 0.05). The vertical line represents the ks2 = ks1 = 230.89