{"http:\/\/dx.doi.org\/10.14288\/1.0380425":{"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool":[{"value":"Science, Faculty of","type":"literal","lang":"en"},{"value":"Statistics, Department of","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider":[{"value":"DSpace","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeCampus":[{"value":"UBCV","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/creator":[{"value":"Zhou, Menglin","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/issued":[{"value":"2019-08-14T16:05:36Z","type":"literal","lang":"en"},{"value":"2019","type":"literal","lang":"en"}],"http:\/\/vivoweb.org\/ontology\/core#relatedDegree":[{"value":"Master of Science - MSc","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeGrantor":[{"value":"University of British Columbia","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/description":[{"value":"The global financial crisis of 2007-2009 revealed the importance of systemic risk: the risk that may destabilize the global economy due to financial contagion. Accurate assessment of systemic risk would not only enable regulators to introduce suitable policies to mitigate the risk, but also allow individual institutions to monitor and mitigate their vulnerability. An effective measure of systemic risk should be able to capture the co-movements between a financial system (or market) and individual financial institutions. One popular measure of systemic risk is CoVaR. In this thesis, a methodology is proposed to compute dynamic forecasts of CoVaR semi-parametrically within the classical framework of multivariate extreme value theory (EVT). According to the definition, CoVaR can be viewed as a high quantile of a conditional distribution where the conditioning event corresponds to large losses of an institution. 
The idea of our methodology is to relate this conditional distribution to the tail dependence function. We develop an EVT-based framework to estimate CoVaR statically by combining parametric modelling of the tail dependence function to address the issue of data sparsity in the joint tail regions and semi-parametric univariate tail estimation techniques. The performance of the methodology is illustrated via simulation studies and real data examples.","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO":[{"value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/71285?expand=metadata","type":"literal","lang":"en"}],"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note":[{"value":"Extreme Value Approach to CoVaR Estimation
by Menglin Zhou
B.Sc., Sun Yat-sen University, 2017
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Statistics)
The University of British Columbia (Vancouver)
August 2019
\u00a9 Menglin Zhou, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled: Extreme Value Approach to CoVaR Estimation, submitted by Menglin Zhou in partial fulfillment of the requirements for the degree of Master of Science in Statistics.
Examining Committee:
Natalia Nolde, Statistics, Supervisor
Harry Joe, Additional Examiner

Abstract

The global financial crisis of 2007-2009 revealed the importance of systemic risk: the risk that may destabilize the global economy due to financial contagion. Accurate assessment of systemic risk would not only enable regulators to introduce suitable policies to mitigate the risk, but also allow individual institutions to monitor and mitigate their vulnerability. An effective measure of systemic risk should be able to capture the co-movements between a financial system (or market) and individual financial institutions.
One popular measure of systemic risk is CoVaR. In this thesis, a methodology is proposed to compute dynamic forecasts of CoVaR semi-parametrically within the classical framework of multivariate extreme value theory (EVT). By definition, CoVaR can be viewed as a high quantile of a conditional distribution in which the conditioning event corresponds to large losses of an institution. The idea of our methodology is to relate this conditional distribution to the tail dependence function. We develop an EVT-based framework to estimate CoVaR statically. It combines two components: parametric modelling of the tail dependence function, which addresses data sparsity in the joint tail region, and semi-parametric univariate tail estimation techniques. The performance of the methodology is illustrated via simulation studies and real data examples.

Lay Summary

One popular measure of systemic risk is Conditional Value-at-Risk (CoVaR), which can capture co-movements between a financial system (or market) and individual financial institutions. CoVaR is defined as a high quantile of the conditional distribution of a system proxy, such as a market index. The conditioning event is that an institution is experiencing a large loss in excess of a high quantile, the so-called Value-at-Risk (VaR). Extreme events bring data sparsity and model uncertainty, so empirical estimates and fully parametric statistical inference are not reliable. A more effective approach is to rely on asymptotic approximations in the spirit of extreme value theory (EVT). To this end, this thesis develops a flexible EVT-based framework to estimate CoVaR semi-parametrically. The framework combines parametric modelling of the tail dependence function, which addresses data sparsity in the joint tail region, with nonparametric univariate tail estimation techniques.

Preface

This thesis is original, unpublished work by the author, Menglin Zhou, under the supervision of Professor Natalia Nolde.
The research idea in Chapter 3 was proposed by Professors Natalia Nolde and Chen Zhou. All simulations and analyses were designed and carried out by the author.

Table of Contents
Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
2 Background
2.1 Multivariate Extreme Value Theory
2.1.1 Univariate EVT
2.1.2 Hill estimator for the tail index
2.1.3 Extreme quantile estimation
2.1.4 Multivariate extreme value distributions
2.2 Tail Dependence Function
2.2.1 Definition
2.2.2 Examples
2.2.3 Estimation of the tail dependence function
3 Methodology
3.1 CoVaR Estimation in a Stationary Setting
3.2 Simulation Studies
3.2.1 Performance of the M-estimator of the TD Function
3.2.2 Performance of the CoVaR Estimator
4 Empirical Study
4.1 Backtesting
4.2 Data
4.3 Results
5 Conclusion
Bibliography
A Proofs
A.1 Regular Variation of Skew-t Distribution
A.2 Proof of Proposition 2.2.1
A.3 Proof of Proposition 2.2.2
A.4 Peaks-Over-Threshold Method
B Tables and Figures
B.1 Time series plots
B.2 \u03c7(u) and \u03c7\u0304(u) plots
B.3 Plots of CoVaR estimates against the sample fraction for the other seven institutions

List of Tables
Table 2.1 Domain of attraction of the Fr\u00e9chet distribution
Table 3.1 Summary statistics of proposed CoVaR estimates at level p_n = 0.05. The margins of the first four distributions are standard Fr\u00e9chet, and those of the bivariate t distribution are Student t.
Table 3.2 Summary statistics of proposed CoVaR estimates at level p_n = 0.01. The margins of the first four distributions are standard Fr\u00e9chet, and those of the bivariate t distribution are Student t.
Table 4.1 Unconditional coverage tests for VaR of institutions and CoVaR based on raw data. The X and Y variables for VaR and CoVaR are 100 \u00d7 log return. \u201cLog, HR, Bilog, Alog, t\u201d stand for the logistic, H\u00fcsler-Reiss, bilogistic, asymmetric logistic and t distributions, respectively. E_n is the number of exceedances of the VaR estimate, and Ebn is the number of joint exceedances of the VaR and CoVaR estimates. T is the value of the test statistic in (4.3).
Table 4.2 Unconditional coverage tests for VaR of institutions and CoVaR based on realized residuals at level p_n = (0.02, 0.05). \u201cLog, HR, Bilog, Alog, t\u201d stand for the logistic, H\u00fcsler-Reiss, bilogistic, asymmetric logistic and t distributions, respectively. E_n is the number of exceedances of the VaR estimate, and Ebn is the number of joint exceedances of the VaR and CoVaR estimates. T is the value of the test statistic in (4.3).
Table 4.3 The average quantile scores S\u0304 of CoVaR estimates based on realized residuals at level p_n = (0.02, 0.05).

List of Figures
Figure 3.1 Plot of R(1, \u03b7) as a function of \u03b7 for the bivariate logistic distribution for different values of \u03b8.
Figure 3.2 Plot of R(1, \u03b7) as a function of \u03b7 for the bivariate H\u00fcsler-Reiss distribution for different values of \u03b8.
Figure 3.3 Plot of R(1, \u03b7) as a function of \u03b7 for the bivariate bilogistic distribution for different values of \u03b1 and \u03b2.
Figure 3.4 Plots of R(1, \u03b7) as a function of \u03b7 for the bivariate asymmetric logistic distribution for different values of \u03b8, \u03c81, \u03c82.
Figure 3.5 Plot of R(1, \u03b7) as a function of \u03b7 for the bivariate t distribution for different values of \u03bd and \u03c1.
Figure 3.6 The bias and RMSE of the M-estimator of \u03b8 based on 100 samples of size 2000 simulated from the bivariate logistic model with parameter \u03b8 = 0.6.
Figure 3.7 The bias and RMSE of the M-estimator based on 100 samples of size 2000 simulated from the bivariate HR model with parameter \u03b8 = 2.5.
Figure 3.8 The bias and RMSE of the M-estimator based on 100 samples of size 2000 simulated from the bivariate bilogistic model with parameters \u03b1 = 0.4, \u03b2 = 0.7.
Figure 3.9 The bias and RMSE of the M-estimator based on 100 replications of size 2500 simulated from the asymmetric logistic model with parameters \u03b8 = 0.6, \u03c81 = 0.5, \u03c82 = 0.8.
Figure 3.10 The bias and RMSE of the simultaneous M-estimator based on 100 samples of size 3000 simulated from the bivariate t model with parameters \u03bd = 6, \u03c1 = 0.6.
Figure 3.11 The sampling densities of estimated parameters based on 100 samples for the corresponding parametric models.
Figure 3.12 The sampling densities of estimates of \u03b3, \u03b7*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 2000 for the logistic model with parameter \u03b8 = 0.6.
Figure 3.13 The sampling densities of estimates of \u03b3, \u03b7*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 2000 for the HR model with parameter \u03b8 = 2.5.
Figure 3.14 The sampling densities of estimates of \u03b3, \u03b7*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 2000 for the bilogistic model with parameters \u03b1 = 0.4, \u03b2 = 0.7.
Figure 3.15 The sampling densities of estimates of \u03b3, \u03b7*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 2500 for the asymmetric logistic model with parameters \u03b8 = 0.6, \u03c81 = 0.5, \u03c82 = 0.8.
Figure 3.16 The sampling densities of estimates of \u03b3, \u03b7*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 3000 for the t distribution with parameters \u03bd = 5, \u03c1 = 0.6.
Figure 3.17 The sampling densities of estimates of VaR_Y, CoVaR_{Y|X}, E_n, Ebn and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the logistic model with parameter \u03b8 = 0.6. The threshold is chosen as u = Y_{n,n\u2212k} with k = 450.
Figure 3.18 The sampling densities of estimates of VaR_Y, CoVaR_{Y|X}, E_n, Ebn and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the HR model with parameter \u03b8 = 2.5. The threshold is chosen as u = Y_{n,n\u2212k} with k = 700.
Figure 3.19 The sampling densities of estimates of VaR_Y, CoVaR_{Y|X}, E_n, Ebn and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the bilogistic model with parameters \u03b1 = 0.4 and \u03b2 = 0.7. The threshold is chosen as u = Y_{n,n\u2212k} with k = 450.
Figure 3.20 The sampling densities of estimates of VaR_Y, CoVaR_{Y|X}, E_n, Ebn and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the asymmetric logistic model with parameters \u03b8 = 0.6, \u03c81 = 0.5 and \u03c82 = 0.8. The threshold is chosen as u = Y_{n,n\u2212k} with k = 400.
Figure 3.21 The sampling densities of estimates of VaR_Y, CoVaR_{Y|X}, E_n, Ebn and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the t distribution with parameters \u03bd = 5 and \u03c1 = 0.6. The threshold is chosen as u = Y_{n,n\u2212k} with k = 150.
Figure 4.1 Scatter plots of standardized daily losses (%) for the time series introduced in Section 4.2 over the period from June 27, 2000 to May 9, 2019.
Figure 4.2 \u03c7(u) and \u03c7\u0304(u) plots for the BAC-DJUSFN data.
Figure 4.3 Maximum likelihood estimates of \u03c7\u0304 with 95% confidence bands based on the profile likelihood for the ALL-DJUSFN data.
Figure 4.4 Estimates of CoVaR as a function of k at level p_n = (0.05, 0.05) for different values of m for the original BAC-DJUSFN data.
Figure 4.5 Estimates of CoVaR as a function of k at level p_n = (0.02, 0.05) for different values of m for the original BAC-DJUSFN data.
Figure 4.6 Scatter plots of realized residuals from the time series introduced in Section 4.2 over the period from June 27, 2000 to May 9, 2019.
Figure 4.7 Estimates of CoVaR as a function of k_{s2} at level p_n = (0.02, 0.05) for estimated residuals from the BAC-DJUSFN data. The dotted vertical line represents k_{s2} = k_{s1} = 230.
Figure B.1 Time series plots of daily losses for institutions and the financial system.
Figure B.2 Time series plots of realized residuals for institutions and the financial system.
Figure B.3 \u03c7(u) and \u03c7\u0304(u) plots for the other seven institutions.
Figure B.4 Estimates of CoVaR as a function of k with raw data at level p_n = (0.05, 0.05) for different values of m.
Figure B.5 Estimates of CoVaR as a function of k with raw data at level p_n = (0.02, 0.05) for different values of m.
Figure B.6 Estimates of CoVaR as a function of k_{s2} with realized residuals at level p_n = (0.02, 0.05). The vertical line represents k_{s2} = k_{s1} = 230.

Glossary
ABias: Asymptotic Bias
AFL: AFLAC INC
ALL: ALLSTATE CORP
AMSE: Asymptotic Mean Squared Error
AVar: Asymptotic Variance
AXP: AMERICAN EXPRESS CO
BAC: BANK OF AMERICA CORP
BEN: FRANKLIN RESOURCES INC
BK: BANK NEW YORK INC
CEV: Conditional Extreme Value
CLT: Central Limit Theorem
CoVaR: Conditional Value-at-Risk
df: distribution function
DJUSFN: Dow Jones US Financials Index
DOA: Domain of Attraction
EST: Extended Skew-t Distribution
EVD: Extreme Value Distribution
EVT: Extreme Value Theory
GP: Generalized Pareto Distribution
GS: GOLDMAN SACHS GROUP INC
HR: H\u00fcsler-Reiss
MES: Marginal Expected Shortfall
MEV: Multivariate Extreme Value
POT: Peaks Over Threshold
RMSE: Root Mean Squared Error
SES: Systemic Expected Shortfall
ST: Skew-t Distribution
TD: Tail Dependence
TROW: T ROWE PRICE GROUP INC
VaR: Value-at-Risk

Acknowledgments

First and foremost, I would like to thank my supervisor Natalia Nolde. I am very lucky to be her student, as she is not only knowledgeable in statistics but also patient with students. I appreciate all her contributions of time, ideas, and funding to my research. The joy and enthusiasm she has for her research inspire me to pursue a Ph.D. degree.
Natalia, many thanks for your kind support and guidance.
I would also like to thank my family for all their love. I am grateful to my mom, whose unfailing love and unconditional support in all of my pursuits are invaluable. I would like to thank my boyfriend Xiaotian Zhan for taking care of me when I was struggling. His understanding encourages me to move forward.
Finally, I am grateful to my friends at UBC and SYSU. Thanks for their inspiration when I came across research problems, and for their company when I felt lonely in this foreign land. I have been very fortunate to be friends with so many incredible people.

Chapter 1 Introduction

The global financial crisis of 2007-2009 alerted us to the importance of systemic risk. Systemic risk is the risk or possibility of a breakdown in an entire system. It is contrasted with breakdowns in individual parts or components, which can be contained without harming the entire system (Kaufman and Scott [2003]). Systemic risk is argued to be a particular feature of financial systems. It may have significant adverse effects on the global economy through financial contagion (De Bandt and Hartmann [2000]). A typical example is the spillover losses suffered by U.S. and some foreign banks when the Herstatt Bank in Germany failed and was closed by the authorities in 1974. To mitigate risk spillovers and maintain the stability of the financial system, there is a clear need for effective measurement of systemic risk.

Kaufman and Scott [2003] point out that systemic risk is evidenced by co-movements (correlation) among most or all of the components of an entire system. It follows that an effective measure of systemic risk should be able to capture co-movements between a financial system (or market) and individual financial institutions. Benoit et al. [2017] give a comprehensive survey of systemic risk, reviewing measures of systemic risk and connecting them to the current regulatory debate. Acharya et al.
[2017] calculate Marginal Expected Shortfall (MES) and Systemic Expected Shortfall (SES) using the equity returns of a financial institution. MES is an institution\u2019s average loss when the financial system is in its left tail. SES extends MES by taking a weighted average of the institution\u2019s MES and its leverage. Acharya et al. [2012] propose the SRISK measure, which corresponds to the expected capital shortfall of a given financial institution conditional on a system crisis. Firms with the largest capital shortfall are considered the most systemically risky from the SRISK perspective.

In addition to the measures introduced above, this thesis considers another popular measure of systemic risk: the Conditional Value-at-Risk (CoVaR). Adrian and Brunnermeier [2011] define CoVaR as a high quantile of the conditional distribution of one institution\u2019s losses. The conditioning event is that another institution is experiencing a large loss at a high quantile, the so-called Value-at-Risk (VaR). VaR is a risk measure widely used by financial institutions. For a random variable X, VaR at confidence level 1 \u2212 p is defined as the (1 \u2212 p)-quantile of the underlying distribution:
$$\\mathrm{VaR}_X(1-p) = \\inf_x \\{x : P(X \\le x) \\ge 1-p\\}, \\quad p \\in (0, 1). \\qquad (1.1)$$
Taking the first institution to be the financial system makes it possible to assess the impact of a financial institution\u2019s large losses on systemic risk.

Girardi and Erg\u00fcn [2013] modify the definition of CoVaR by specifying the conditioning event as a loss in excess of the VaR, rather than a loss exactly at the VaR level. Let $X \\sim F_1$ and $Y \\sim F_2$ denote, respectively, the losses of a financial institution and of a system proxy. Under the modified definition, $\\mathrm{CoVaR}_{Y|X}$ at confidence level 1 \u2212 p is defined by
$$P\\{Y \\ge \\mathrm{CoVaR}_{Y|X}(1-p) \\mid X \\ge \\mathrm{VaR}_X(1-p)\\} = p, \\quad p \\in (0, 1). \\qquad (1.2)$$
This modification allows us to consider more severe distress events of institutions. It also allows CoVaR estimates to be backtested with the tests used for VaR (Girardi and Erg\u00fcn [2013]). In addition, Mainik and Schaanning [2014] show that this change makes CoVaR dependence consistent: for elliptical distributions, CoVaR becomes monotonic in the Pearson correlation coefficient. In other words, as the losses of an institution become more correlated with the financial system, its systemic risk increases.

CoVaR is thus a high quantile of a conditional distribution in which the conditioning event corresponds to large losses. From the definition, one possible way to estimate CoVaR is to estimate the conditional probability $P\\{Y \\ge y \\mid X \\ge x\\}$ directly. However, when x is large, empirical estimates of this conditional probability may not be sensible, since too few (or even no) observations fall in the region of interest. Moreover, fully parametric statistical inference (see e.g. Girardi and Erg\u00fcn [2013]) focuses on the central part of the data, whereas the actual interest is in the tails. Extreme events arise in risk management in insurance, finance, and hydrology. In such situations, an effective approach is to rely on asymptotic models in the spirit of extreme value theory (EVT); see e.g. Embrechts et al. [1997].

A conditional extreme value (CEV) model was proposed by Heffernan and Tawn [2004], followed by Heffernan and Resnick [2007] and Das and Resnick [2011]. Classical multivariate extreme value (MEV) models have limitations in applications where interest lies in a tail region other than the joint tail region. The CEV model addresses this issue by conditioning on one component of the random vector. It then finds the limiting conditional distribution of the remaining components as the conditioning variable becomes large.
Moreover, the CEV model relaxes an MEV assumption. MEV models require the distribution of (X, Y) to lie in a multivariate domain of attraction (DOA); the CEV model requires only that one of the marginal distributions lie in a univariate DOA. Abdous et al. [2005] estimate $P\\{Y < y \\mid X \\ge x\\}$ for large x for the class of elliptical distributions. Nolde and Zhang [2018] extend this methodology to the more general class of skew-elliptical distributions, which can capture the asymmetry relevant for asset pricing and risk management in finance and insurance. They also embed it in a semi-parametric EVT-based framework for CoVaR estimation.

The approach in Nolde and Zhang [2018] has two main limitations. First, it requires multivariate regular variation, which in particular forces the institutional and system losses to share the same tail index. Second, it makes a somewhat restrictive parametric assumption on the extremal dependence structure. These assumptions may be viable in some cases. However, our empirical analysis suggests that system losses generally tend to be lighter-tailed than those of individual institutions and exhibit a greater variety of dependence structures. Our aim is to explore a more flexible framework for CoVaR estimation.

The idea of our methodology is to link the definition of CoVaR in (1.2) with the tail dependence function introduced in Nikoloulopoulos et al. [2009] and Joe et al. [2010]. Assume that the distribution function of the random pair (X, Y) is in the DOA of a bivariate extreme value distribution and that Y has a positive tail index. Following de Haan and Ferreira [2006], we have
$$\\lim_{u \\to 0} \\frac{P\\{Y > F_2^{-1}(1-uy),\\ X > F_1^{-1}(1-ux)\\}}{u} = R(x, y), \\quad x, y > 0, \\qquad (1.3)$$
where R is the upper tail dependence function. Background on multivariate EVT, including the concepts of domain of attraction and tail dependence function, is provided in Chapter 2.
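The quantities in (1.1), (1.2) and (1.3) all have direct empirical counterparts, and computing them makes the data-sparsity problem easy to see. The Python sketch below is illustrative only and is not the estimator developed in this thesis. It assumes a hypothetical max-linear test model, X = max(aZ0, (1 \u2212 a)Z1) and Y = max(aZ0, (1 \u2212 a)Z2) with Z0, Z1, Z2 i.i.d. unit Fr\u00e9chet. This model is chosen only because both margins are unit Fr\u00e9chet and its tail dependence function has the closed form R(x, y) = a min(x, y). The function names (empirical_var, empirical_covar, empirical_tdf, and so on) are hypothetical. The tail dependence estimate uses the usual rank-based form of (1.3) with u = k/n.

```python
import math
import random

# Illustrative sketch only (not the estimator developed in this thesis);
# all function names are hypothetical.


def unit_frechet(rng):
    # Inverse-transform draw from the unit Frechet df exp(-1/x), x > 0.
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -1.0 / math.log(u)


def simulate_max_linear(n, a, seed=0):
    # Hypothetical test model: X = max(a*Z0, (1-a)*Z1), Y = max(a*Z0, (1-a)*Z2)
    # with Z0, Z1, Z2 i.i.d. unit Frechet and 0 < a < 1.
    # Both margins are unit Frechet and R(x, y) = a * min(x, y).
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z0, z1, z2 = (unit_frechet(rng) for _ in range(3))
        xs.append(max(a * z0, (1 - a) * z1))
        ys.append(max(a * z0, (1 - a) * z2))
    return xs, ys


def empirical_var(sample, p):
    # Empirical version of (1.1): smallest order statistic x with F_n(x) >= 1 - p.
    xs = sorted(sample)
    k = math.ceil(len(xs) * (1.0 - p) - 1e-9)  # tolerance for rounding error
    return xs[max(k, 1) - 1]


def empirical_covar(xs, ys, p):
    # Naive empirical version of (1.2): the (1 - p)-quantile of Y among the
    # observations with X >= VaR_X(1 - p). Only about n * p points are used.
    var_x = empirical_var(xs, p)
    tail_ys = [y for x, y in zip(xs, ys) if x >= var_x]
    return empirical_var(tail_ys, p), len(tail_ys)


def ranks(values):
    # Ranks 1..n; ties (probability zero for continuous data) broken by position.
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        r[i] = pos
    return r


def empirical_tdf(xs, ys, x, y, k):
    # Rank-based estimate of R(x, y) in (1.3) with u = k / n:
    # (1/k) * #{i : rank(X_i) > n - k*x and rank(Y_i) > n - k*y}.
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    hits = sum(1 for ri, si in zip(rx, ry) if ri > n - k * x and si > n - k * y)
    return hits / k


if __name__ == '__main__':
    xs, ys = simulate_max_linear(n=20000, a=0.5, seed=1)
    print('VaR_X(0.95):', empirical_var(xs, 0.05), 'exact:', -1 / math.log(0.95))
    for p in (0.05, 0.01):
        covar, m = empirical_covar(xs, ys, p)
        print('p =', p, 'empirical CoVaR:', covar, 'from', m, 'points')
    print('R(1, 1) estimate:', empirical_tdf(xs, ys, 1.0, 1.0, k=400), 'true:', 0.5)
```

With n = 20000 and p = 0.01, the naive CoVaR estimate rests on only about 200 conditioning observations. It is therefore determined by a handful of the largest Y values in that subsample, which is exactly the sparsity issue described above. The rank-based estimate of R(1, 1), by contrast, pools information from all k largest observations of each margin.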
Moreover, we assume that there exists a constant \u03b7 such that
$$\\mathrm{CoVaR}_{Y|X}(1-p) = \\mathrm{VaR}_Y(1-p\\eta), \\qquad (1.4)$$
which allows us to rewrite equation (1.2) as follows:
$$\\frac{P\\{Y > \\mathrm{VaR}_Y(1-p\\eta),\\ X > \\mathrm{VaR}_X(1-p)\\}}{p} = p. \\qquad (1.5)$$
Under positive dependence between X and Y, we have 0 < \u03b7 < 1, and \u03b7 = \u03b7_p depends on the value of p. In an extreme value setting where p is close to zero, combining (1.3) and (1.5) gives
$$\\frac{P\\{Y > \\mathrm{VaR}_Y(1-p\\eta),\\ X > \\mathrm{VaR}_X(1-p)\\}}{p} = \\frac{P\\{F_2(Y) > 1-p\\eta,\\ F_1(X) > 1-p\\}}{p} \\approx R(1, \\eta).$$
Hence we look for \u03b7* such that R(1, \u03b7*) = p, and an estimator of CoVaR is then obtained from (1.4) as $\\mathrm{VaR}_Y(1-p\\eta^*)$. Data are sparse in the joint tail, so efficiency can be gained by assuming a parametric model R(x, y) = R(x, y; \u03b8) for the tail dependence function. We apply a method of moments to estimate the unknown parameters in R(x, y; \u03b8); see e.g. Einmahl et al. [2008] and Einmahl et al. [2012]. This moment estimator is shown to be consistent and asymptotically normal under weak conditions. The tail index \u03b3 > 0 of the system proxy Y can be estimated using the Hill estimator (Hill [1975]). The sample fraction is selected automatically by the two-step subsample bootstrap method of Danielsson et al. [2001].

This thesis is organized as follows. Chapter 2 reviews the definitions and results that serve as a basis for the proposed methodology. Chapter 3 gives details of CoVaR estimation and illustrates the performance of the proposed methods via several simulation studies. Chapter 4 applies the developed estimator to financial data. Finally, Chapter 5 presents some concluding remarks and outlines directions for future research. Proofs and
Proofs andadditional figures are delegated to the Appendixes.4Chapter 2BackgroundThe aim of this chapter is to review relevant definitions and results from the literature that willbe used in the sequel as a basis for the proposed methodology for CoVaR estimation, includingextreme value theory, Hill estimation for the tail index and extreme quantile estimation.2.1 Multivariate Extreme Value TheoryMultivariate extreme value theory (EVT) provides the main probabilistic framework that wewill adopt. Intuitively, EVT deals with extreme events which occur with very small proba-bility. The central result of EVT is the Fisher-Tippett theorem (see Theorem 2.1.1), which isdeveloped in parallel with the central limit theorem (CLT). Suppose we have a sequence of in-dependent and identically distributed (i.i.d.) non-degenerated random variables X1, X2, ...Xn.The general CLT concerns the limit laws of sample sums X1 + X2 + ... + Xn when properlynormalized and centered, whereas the Fisher-Tippett theorem investigates the limit laws ofsample maxima max{X1, X2, ..., Xn} or min{X1, X2, ..., Xn} when properly normalized andcentered (for more details, see Embrechts et al. [1997]).The EVT has been widely used in many fields, especially in financial risk management,which is closely related to tail probabilities and quantiles. The ability to manage extremefinancial risks (e.g., currency crises, stock market crashes and large bond defaults) effectivelyis in fact the ability to assess the extreme probabilities and quantiles accurately (Diebold et al.5[2000]). For example, in the credit or operational risk management, the goal is to determinethe risk capital we require as a cushion against irregular losses. In the problem of portfolioselection, a safe criterion is to select the assets which minimize the probability of a returnbelow a prespecified threshold return level. 
Traditional statistical methods that based on theentire dataset produce a good fit in central region but is biased to the assessment of tail regions,where fewer or even no observations fall. In this sense, it seems that EVT can help us to buildmore appropriate statistical models describing extreme events in financial risk management.2.1.1 Univariate EVTWe begin by recalling the key results from univariate EVT. For more details, please refer toResnick [1987] or de Haan and Ferreira [2006].Definition 1. Suppose X1, ..., Xn are i.i.d. random variables with distribution function (df) F .Let Mn := max{X1, ..., Xn} denotes the sample maximum. The df F is said to belong to the(maximum) domain of attraction of df G (written F \u2208 D(G)) if(i) G is non-degenerate, and(ii) There exist an > 0 and bn \u2208 R such thatP{Mn \u2212 bnan\u2264 x}= F n(anx+ bn)\u2192G(x), n\u2192\u221e, for all x \u2208 C(G),where C(G) denotes the set of all continuity points of G.In the CLT, with finite variance, the non-degenerate limit distribution is found to be thenormal distribution. In the EVT, Fisher-Tippett theorem help us to identify the class of limit dfG.Theorem 2.1.1 (Fisher and Tippett [1928], Gnedenko [1943]). Suppose F \u2208 D(G). Then G isone of the following types1:1. Type I, Gumbel: \u039b(x) = exp{\u2212e\u2212x}, x \u2208 R;2. Type II, Fre\u00b4chet: \u03a6\u03b1(x) ={0, x < 0exp{\u2212x\u2212\u03b1}, x \u2265 0, \u03b1 > 0;1Two dfs U(x) and V (x) are of the same type if for some A > 0, B \u2208 R, we have V (x) = U(Ax + B) forall x.63. 
3. Type III, Weibull: $\Psi_\alpha(x) = \exp\{-(-x)^\alpha\}$ for $x < 0$ and $\Psi_\alpha(x) = 1$ for $x \ge 0$, where $\alpha > 0$.

These three types of distributions form the class of extreme value distributions (EVD).

For statistical purposes, Von Mises [1936] and Jenkinson [1955] generalize and unify the three types into one family, the generalized extreme value distribution $\mathrm{GEV}(\mu, \sigma, \xi)$, whose df is
$$G_\xi\left(\frac{x - \mu}{\sigma}\right) = \begin{cases} \exp\left\{-e^{-(x - \mu)/\sigma}\right\}, & \xi = 0, \\ \exp\left\{-\left(1 + \xi\,\dfrac{x - \mu}{\sigma}\right)^{-1/\xi}\right\}, & \xi \ne 0,\ 1 + \xi\,\dfrac{x - \mu}{\sigma} > 0, \end{cases}$$
where $\mu \in \mathbb{R}$ is the location parameter, $\sigma > 0$ is the scale parameter and $\xi \in \mathbb{R}$ is the shape parameter or tail index. When $\xi > 0$, $G_\xi$ is of the same type as the Fréchet df with $\xi = 1/\alpha$; when $\xi < 0$, $G_\xi$ is of the same type as the Weibull df with $\xi = -1/\alpha$; and $G_0$ is of the same type as the Gumbel df.

Definition 2. A non-degenerate df $F$ is max-stable if, for $X_1, \ldots, X_n$ i.i.d. with df $F$, there exist $a_n > 0$ and $b_n \in \mathbb{R}$ such that $M_n \overset{d}{=} a_n X_1 + b_n$; i.e., for each $n$, $M_n$ and $X_1$ are of the same type.

For example, the Gumbel df is max-stable with $a_n = 1$ and $b_n = \log n$, since $\Lambda^n(x) = \Lambda(x - \log n)$; the Fréchet df is max-stable with $a_n = n^{1/\alpha}$ and $b_n = 0$, since $\Phi_\alpha^n(x) = \Phi_\alpha(n^{-1/\alpha}x)$; and the Weibull df is max-stable with $a_n = n^{-1/\alpha}$ and $b_n = 0$, since $\Psi_\alpha^n(x) = \Psi_\alpha(n^{1/\alpha}x)$.

In risk management applications, data often exhibit heavy tails, for example record-breaking insurance losses and financial log-returns. A heavy tail characterizes phenomena in which the probability of a very large value is relatively high. An effective tool for dealing with heavy-tailed phenomena is the theory of regularly varying functions, so we give a brief introduction to regular variation here (for more details, see Resnick [2007]).

Definition 3.
A positive measurable function $h$ on $(0, \infty)$ is regularly varying at infinity with index $\rho \in \mathbb{R}$ (written $h \in RV_\rho$) if for all $x > 0$,
$$\lim_{t \to \infty} \frac{h(tx)}{h(t)} = x^\rho,$$
where $\rho$ is called the exponent of variation.

If $\rho = 0$, the function is called slowly varying. For example, $L(x) = \log x$ is slowly varying, since $\lim_{t \to \infty} \log(tx)/\log t = 1$. Moreover, if $h \in RV_\rho$, then $h(x) = L(x)\,x^\rho$ for some slowly varying function $L$.

In financial risk management applications, the distributions of interest are those whose tails are regularly varying.

Definition 4. A random variable $X$ with df $F$ is said to be regularly varying with index $\gamma > 0$ (written $1 - F \in RV_{-\gamma}$) if for every $x > 0$,
$$\lim_{t \to \infty} \frac{P(X > tx)}{P(X > t)} = x^{-\gamma}.$$

Intuitively, a random variable is regularly varying when its distribution has a heavy tail that decays according to a power law with exponent $-\gamma$. In probability theory, the parameter $\gamma > 0$ is often called the tail index.

Example 2.1.1. The Student's t distribution with $\nu$ degrees of freedom has density
$$f(x) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu + 1}{2}}, \quad x \in \mathbb{R}.$$
It is easy to verify that $f \in RV_{-\nu-1}$. Hence, by L'Hôpital's rule, for every $x > 0$,
$$\lim_{t \to \infty} \frac{1 - F_T(tx; \nu)}{1 - F_T(t; \nu)} = \lim_{t \to \infty} \frac{x f(tx)}{f(t)} = x^{-\nu},$$
which means that the tail of the Student's t distribution with $\nu$ degrees of freedom is regularly varying with tail index $\nu$.

Theorem 2.1.2 (Gnedenko [1943]). $F \in D(\Phi_\alpha)$ if and only if $1 - F \in RV_{-\alpha}$. In this case, $F^n(a_n x) \to \Phi_\alpha(x)$ with $a_n = \left(1/(1 - F)\right)^{-1}(n)$, the inverse of $1/(1 - F)$ evaluated at $n$.

Theorem 2.1.2 shows that the Fréchet domain of attraction consists exactly of the heavy-tailed distributions with a regularly varying upper tail.
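The following minimal Python sketch (our own illustration, not code from the thesis) checks Theorem 2.1.2 numerically for the Pareto df $F(x) = 1 - x^{-\alpha}$, $x \ge 1$, for which the norming constant is $a_n = n^{1/\alpha}$. It evaluates $F^n(a_n x)$ for increasing $n$ and compares it with $\Phi_\alpha(x)$; the function names are hypothetical.

```python
import math


def pareto_cdf(x: float, alpha: float) -> float:
    """Pareto df F(x) = 1 - x^(-alpha) for x >= 1, and 0 otherwise."""
    return 1.0 - x ** (-alpha) if x >= 1.0 else 0.0


def frechet_cdf(x: float, alpha: float) -> float:
    """Frechet df Phi_alpha(x) = exp(-x^(-alpha)) for x > 0, and 0 otherwise."""
    return math.exp(-(x ** (-alpha))) if x > 0.0 else 0.0


def normalized_max_cdf(x: float, alpha: float, n: int) -> float:
    """P(M_n / a_n <= x) = F^n(a_n x) for a Pareto sample, with a_n = n^(1/alpha)."""
    a_n = n ** (1.0 / alpha)
    return pareto_cdf(a_n * x, alpha) ** n


if __name__ == "__main__":
    alpha, x = 2.0, 1.5
    for n in (10, 1_000, 1_000_000):
        approx = normalized_max_cdf(x, alpha, n)
        print(f"n={n:>9}: F^n(a_n x)={approx:.8f}  Phi_alpha(x)={frechet_cdf(x, alpha):.8f}")
```

Here $F^n(a_n x) = (1 - x^{-\alpha}/n)^n$, so for $\alpha = 2$ and $x = 1.5$ the gap to $\Phi_\alpha(1.5) = e^{-4/9}$ shrinks roughly like $1/n$.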
To be specific, a random variable with a heavy-tailed df $F$ (in the sense of Definition 4) is regularly varying, and $F$ is then in the Fréchet domain of attraction. This connection will help us construct our semi-parametric framework for CoVaR estimation. Table 2.1 gives four examples of distributions in the Fréchet domain of attraction (Embrechts et al. [1997]).

Table 2.1: Domain of attraction of the Fréchet distribution
Distribution | Density $f$ | Norming constant $a_n$
Cauchy | $f(x) = (\pi(1 + x^2))^{-1}$, $x \in \mathbb{R}$ | $a_n = n/\pi$
Pareto | $f(x) = \beta k^\beta x^{-(\beta + 1)}$, $x \ge k$; $k, \beta > 0$ | $a_n = (k^\beta n)^{1/\beta} = k\,n^{1/\beta}$
Burr | $f(x) = \dfrac{c k x^{c-1}}{(1 + x^c)^{k+1}}$, $x > 0$; $c, k > 0$ | $a_n = (n^{1/k} - 1)^{1/c}$
Loggamma | $f(x) = \dfrac{\alpha^\beta}{\Gamma(\beta)}(\ln x)^{\beta - 1}x^{-\alpha - 1}$, $x > 1$; $\alpha, \beta > 0$ | $a_n = \left((\Gamma(\beta))^{-1}(\ln n)^{\beta - 1}n\right)^{1/\alpha}$

2.1.2 Hill estimator for the tail index

In this section, we introduce the widely used Hill estimator (Hill [1975]) of the tail index from the quantile point of view (for details, see Beirlant et al. [2006]). Let $X_1, X_2, \ldots, X_n$ be independent random variables with common df $F$. Consider the mean excess function
$$e(t) = E(X - t \mid X > t).$$
It can be estimated empirically by
$$\hat e_n(t) = \frac{\sum_{i=1}^n X_i\,1_{(t,\infty)}(X_i)}{\sum_{i=1}^n 1_{(t,\infty)}(X_i)} - t,$$
where $1_{(t,\infty)}(X_i)$ equals 1 if $X_i > t$ and 0 otherwise. Let $X_{n,1} \le X_{n,2} \le \cdots \le X_{n,n}$ be the order statistics and take $t = X_{n,n-k}$, $k = 1, \ldots, n-1$. The estimator of the mean excess function at $X_{n,n-k}$ can then be written as
$$e_{k,n} = \hat e_n(X_{n,n-k}) = \frac{1}{k}\sum_{i=1}^k X_{n,n-i+1} - X_{n,n-k}.$$
(2.1)

In addition to being an estimator of $e(t)$ at specific values of $t$, $e_{k,n}$ can also be interpreted as an estimator of the slope of the exponential Q-Q plot to the right of a reference point with coordinates $(-\log(k/n), X_{n,n-k})$.

Suppose the df $F$ has a regularly varying tail, that is,
$$1 - F(x) = x^{-1/\gamma}L_1(x), \quad \gamma > 0, \quad (2.2)$$
where $L_1$ is a slowly varying function and $1/\gamma$ is the tail index. In this case, the distribution is called a Pareto-type distribution; this class includes the Burr, Fréchet and log-gamma distributions. Define the quantile function $Q(p) := \inf\{x : F(x) \ge p\}$. An equivalent formulation of (2.2) is
$$Q(1 - 1/x) = x^\gamma L_2(x), \quad \gamma > 0, \quad (2.3)$$
where $L_2$ is also a slowly varying function and is linked with $L_1$ via de Bruyn conjugation² (see Proposition 2.5 in Beirlant et al. [2006]). Since every slowly varying function $L_2$ satisfies $\log L_2(1/p)/\log(1/p) \to 0$ as $p \to 0$ (see Proposition 2.4 in Beirlant et al.
[2006]), every Pareto-type distribution satisfies
$$\lim_{p \to 0}\frac{\log Q(1 - p)}{-\log p} = \gamma.$$
This means that the Pareto Q-Q plot, i.e., the exponential Q-Q plot of the log-transformed data, is ultimately linear with slope $\gamma$ near the largest observations, so the mean excess values (2.1) of the log-transformed data can be used to estimate this slope. This yields the well-known Hill estimator
$$\gamma_n(k) = \frac{1}{k}\sum_{i=1}^k \log X_{n,n-i+1} - \log X_{n,n-k}, \quad (2.4)$$
where $k = k_n \in \{1, \ldots, n\}$ is an intermediate sequence, that is, $k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$.

Assume the df $F$ satisfies the second-order condition (de Haan and Resnick [1996]): there exists a function $A^*$, not changing sign near infinity, such that for $x > 0$
$$\lim_{t \to \infty}\frac{1}{A^*(t)}\left(\frac{1 - F(tx)}{1 - F(t)} - x^{-1/\gamma}\right) = x^{-1/\gamma}\,\frac{x^{\beta/\gamma} - 1}{\beta/\gamma},$$
where $\beta \le 0$ is the second-order parameter.

²If $l(x)$ is a slowly varying function, then there exists a slowly varying function $l^*(x)$, the de Bruyn conjugate of $l$, such that $l(x)\,l^*(x\,l(x)) \to 1$ as $x \to \infty$. The de Bruyn conjugate is asymptotically unique in the sense that if $\tilde l$ is also slowly varying and $l(x)\,\tilde l(x\,l(x)) \to 1$, then $l^* \sim \tilde l$. Furthermore, $(l^*)^* \sim l$.
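A minimal sketch of the estimators (2.1) and (2.4), assuming a sample of positive observations with no further preprocessing. The helper names `mean_excess` and `hill` are hypothetical. The Hill estimator is computed literally as the mean excess of the log-data at the threshold $X_{n,n-k}$, as described above.

```python
import math
import random
from typing import Sequence


def mean_excess(sample: Sequence[float], k: int) -> float:
    """Empirical mean excess e_{k,n} of (2.1) at the threshold X_{n,n-k}."""
    xs = sorted(sample)                  # X_{n,1} <= ... <= X_{n,n}
    n = len(xs)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    threshold = xs[n - k - 1]            # X_{n,n-k} (0-based index n-k-1)
    top_k = xs[n - k:]                   # X_{n,n-k+1}, ..., X_{n,n}
    return sum(top_k) / k - threshold


def hill(sample: Sequence[float], k: int) -> float:
    """Hill estimator gamma_n(k) of (2.4): the mean excess (2.1) of the log-data."""
    if min(sample) <= 0.0:
        raise ValueError("the Hill estimator requires positive observations")
    return mean_excess([math.log(x) for x in sample], k)


if __name__ == "__main__":
    rng = random.Random(1)
    gamma = 0.5
    # Exact Pareto sample with 1 - F(x) = x^(-1/gamma): X = U^(-gamma), U in (0, 1].
    data = [(1.0 - rng.random()) ** (-gamma) for _ in range(10_000)]
    print(hill(data, 500))
```

On an exact Pareto sample there is no second-order bias, so `hill(data, 500)` should land close to 0.5. Its standard deviation is roughly $\gamma/\sqrt{k} \approx 0.02$, according to the asymptotic variance of the Hill estimator.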
A reformulated version of this condition, in terms of the inverse function $U$ of $1/(1 - F)$, is: there exists a function $A$, not changing sign near infinity, such that
$$\lim_{t \to \infty}\frac{U(tx)/U(t) - x^\gamma}{A(t)} = x^\gamma\,\frac{x^\beta - 1}{\beta}, \quad (2.5)$$
where $|A|$ is regularly varying at infinity with index $\beta$.

Using the notation $A_{n,k} = A\left(\frac{n+1}{k+1}\right)$, the asymptotic bias of the Hill estimator can be expressed as
$$\mathrm{ABias}(\gamma_n(k)) \sim A_{n,k}\,\frac{1}{k}\sum_{i=1}^k\left(\frac{i}{k+1}\right)^{-\beta} \sim \frac{A_{n,k}}{1 - \beta}, \quad k, n \to \infty,\ k/n \to 0.$$
The asymptotic variance of the Hill estimator is
$$\mathrm{AVar}(\gamma_n(k)) \sim \frac{\gamma^2}{k}, \quad k, n \to \infty,\ k/n \to 0.$$
Note that the bias is small only if $A_{n,k}$ is small, which in turn requires $k$ to be small, whereas the variance is small only if $k$ is large. Asymptotic normality of the Hill estimator holds when $k, n \to \infty$, $k/n \to 0$ and $\sqrt{k}A_{n,k} \to 0$:
$$\sqrt{k}\left(\gamma_n(k)/\gamma - 1\right) \xrightarrow{d} N(0, 1).$$

An important step in Hill estimation is the choice of $k$. An ideal method is to select $k$ by minimizing the asymptotic mean squared error (AMSE) of $\gamma_n(k)$:
$$k_0(n) := \arg\min_k \mathrm{AMSE}(n, k) = \arg\min_k \mathrm{AsyE}\left(\gamma_n(k) - \gamma\right)^2. \quad (2.6)$$
The main difficulty is that (2.6) involves the unknown parameter $\gamma$. For this reason, Danielsson et al. [2001] propose to select $k$ by minimizing $\mathrm{AsyE}\left(M_n(k) - 2(\gamma_n(k))^2\right)^2$, where $M_n(k) = \frac{1}{k}\sum_{i=1}^k\left(\log X_{n,n-i+1} - \log X_{n,n-k}\right)^2$. They show that, under some conditions (see Theorems 1 and 2 in Danielsson et al. [2001]), the value of $k$ that minimizes $\mathrm{AMSE}(n, k)$ and the value that minimizes $\mathrm{AsyE}\left(M_n(k) - 2(\gamma_n(k))^2\right)^2$ are of the same order (with respect to $n$).
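The criterion of Danielsson et al. [2001] depends only on the log-excesses over $X_{n,n-k}$, so it is cheap to evaluate for every $k$ at once. The sketch below rests on our own simplifying assumptions, and its function names are hypothetical. It computes $(M_n(k) - 2\gamma_n(k)^2)^2$ for $k = 1, \ldots, n-1$ and averages it over $B$ resamples of size $n_{\mathrm{sub}}$ drawn with replacement, which is the bootstrap average minimized in their subsample procedure. It does not implement the full two-step algorithm.

```python
import math
import random
from typing import List, Sequence


def criterion_curve(sample: Sequence[float]) -> List[float]:
    """(M_n(k) - 2 gamma_n(k)^2)^2 for k = 1, ..., n - 1.

    gamma_n(k) is the Hill estimator (2.4); M_n(k) is the mean squared log-excess.
    """
    logs = sorted((math.log(x) for x in sample), reverse=True)  # largest first
    n = len(logs)
    out = []
    for k in range(1, n):
        threshold = logs[k]                          # log X_{n,n-k}
        excess = [v - threshold for v in logs[:k]]   # top k log-excesses
        gamma_k = sum(excess) / k
        m_k = sum(e * e for e in excess) / k
        out.append((m_k - 2.0 * gamma_k ** 2) ** 2)
    return out


def bootstrap_L(sample: Sequence[float], n_sub: int, B: int,
                rng: random.Random) -> List[float]:
    """Monte Carlo average of the criterion over B resamples of size n_sub."""
    totals = [0.0] * (n_sub - 1)
    for _ in range(B):
        resample = rng.choices(sample, k=n_sub)      # draw with replacement
        for j, value in enumerate(criterion_curve(resample)):
            totals[j] += value
    return [t / B for t in totals]


def argmin_k(values: Sequence[float]) -> int:
    """Return the k (1-based) at which the criterion curve is smallest."""
    return min(range(len(values)), key=values.__getitem__) + 1


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(1.0 - rng.random()) ** (-0.5) for _ in range(2000)]
    print(argmin_k(bootstrap_L(data, n_sub=100, B=20, rng=rng)))
```

For an exact Pareto sample the log-excesses are exponential, so both $M_n(k)$ and $2\gamma_n(k)^2$ estimate $2\gamma^2$ and the criterion stays small. The minimizer is therefore noisy in that idealized case.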
Moreover, to obtain an AMSE estimator that is asymptotically equivalent to $\mathrm{AMSE}(n, k)$, they use a two-step subsample bootstrap procedure, shown in Algorithm 1 below.

Suppose we draw $A^*_{n_i} = \{X^*_1, \ldots, X^*_{n_i}\}$ ($n_i < n$, $i = 1, 2$) from $A_n = \{X_1, \ldots, X_n\}$ with replacement, and let $X^*_{n_i,1} \le \cdots \le X^*_{n_i,n_i}$ denote the order statistics of $A^*_{n_i}$. Define
$$\gamma^*_{n_i}(k_i) = \frac{1}{k_i}\sum_{j=1}^{k_i}\log X^*_{n_i,n_i-j+1} - \log X^*_{n_i,n_i-k_i},$$
$$M^*_{n_i}(k_i) = \frac{1}{k_i}\sum_{j=1}^{k_i}\left(\log X^*_{n_i,n_i-j+1} - \log X^*_{n_i,n_i-k_i}\right)^2,$$
$$L(n_i, k_i) = E\left(\left(M^*_{n_i}(k_i) - 2(\gamma^*_{n_i}(k_i))^2\right)^2 \,\middle|\, A_n\right).$$
Suppose $k^*_{i,0}(n_i)$ minimizes $L(n_i, k_i)$. Then the optimal choice of $k$ in Danielsson et al. [2001] is defined as
$$\hat k_0(n) = \frac{\left(k^*_{1,0}(n_1)\right)^2}{k^*_{2,0}(n_2)}\left(\frac{\left(\log k^*_{1,0}(n_1)\right)^2}{\left(2\log n_1 - \log k^*_{1,0}(n_1)\right)^2}\right)^{\frac{\log n_1 - \log k^*_{1,0}(n_1)}{\log n_1}}. \quad (2.7)$$

Algorithm 1: Two-step subsample bootstrap method
1: Input the step size $h$ and the number of bootstrap samples $B$; set $N = \lceil (n - \sqrt{n})/h \rceil$;
2: for each integer $j \in [1, N]$ do
3: set $n_{1,j} = \sqrt{n} + (j - 1)h$, and draw $B$ bootstrap samples of size $n_{1,j}$ from $A_n$;
4: compute $L(n_{1,j}, k_1)$ at each integer $k_1 \in [1, n_{1,j}]$, and find $k^*_{1,0}(n_{1,j})$ that minimizes $L(n_{1,j}, k_1)$;
5: set $n_{2,j} = (n_{1,j})^2/n$, and draw $B$ bootstrap samples of size $n_{2,j}$ from $A_n$;
6: compute $L(n_{2,j}, k_2)$ at each integer $k_2 \in [1, n_{2,j}]$, and find $k^*_{2,0}(n_{2,j})$ that minimizes $L(n_{2,j}, k_2)$;
7: compute $R(n_{1,j}) = L(n_{1,j}, k^*_{1,0})^2 / L(n_{2,j}, k^*_{2,0})$;
8: end for
9: set $n_1 = \arg\min_{n_{1,j}} R(n_{1,j})$, and repeat steps 4, 5 and 6 to obtain $k^*_{1,0}(n_1)$ and $k^*_{2,0}(n_2)$;
10: compute $\hat k_0(n)$ with equation (2.7).

According to Danielsson et al. [2001], the tail index estimator $\gamma_n(\hat k_0)$ based on $\hat k_0(n)$ above has the same asymptotic efficiency as $\gamma_n(k_0)$ based on $k_0(n)$ defined in (2.6). All the details are given in Theorems 4 and 6 and Corollaries 5 and 7 in Danielsson et al.
[2001].

2.1.3 Extreme quantile estimation

Recall from equation (1.1) in Chapter 1 that the VaR at confidence level $1 - p$ is defined as the $(1 - p)$-quantile, so estimating VaR amounts to estimating a quantile. Following Beirlant et al. [2006], by assuming that the ultimate linearity of the Pareto quantile plot persists from the largest $k$ observations to infinity, we can summarize the quantile plot $\left(-\log\frac{i-1}{n}, \log X_{n,n-i+1}\right)$, $i = 1, \ldots, k+1$, by the line
$$y = \log X_{n,n-k} + \hat\gamma_n(k)\left(x + \log\frac{k}{n}\right),$$
where $\hat\gamma_n(k)$ is the Hill estimator of $\gamma$ (see equation (2.4)).

To obtain an estimator of $Q(1 - p)$, take $x = -\log p$; the resulting estimator (first proposed by Weissman [1978]) is
$$\hat Q_{k,1-p} = \exp\left\{\log X_{n,n-k} + \hat\gamma_n(k)\left(-\log p + \log\frac{k}{n}\right)\right\} = X_{n,n-k}\left(\frac{k}{np}\right)^{\hat\gamma_n(k)}. \quad (2.8)$$

Denote the asymptotic expectation operator by $E_\infty$. When $p = p_n \to 0$ and $np_n \to c > 0$, Beirlant et al. [2006] conclude that
$$E_\infty\left(\log\frac{\hat Q_{k,1-p}}{Q(1 - p)}\right) \sim \frac{A_{n,k}}{1 - \beta}\log\frac{k}{np} + A_{n,k}\,\frac{1 - \left(\frac{np}{k}\right)^{-\beta}}{\beta}$$
and
$$\mathrm{AVar}\left(\log\hat Q_{k,1-p}\right) \sim \frac{\gamma^2}{k}\left(1 + \log^2\frac{k}{np}\right), \quad k, n \to \infty,\ k/n \to 0.$$
Furthermore, when $k, n \to \infty$ and $k/n \to 0$ such that $\sqrt{k}\,E_\infty\left(\log\frac{\hat Q_{k,1-p}}{Q(1 - p)}\right) \to 0$,
$$\sqrt{k}\left(1 + \log^2\frac{k}{np}\right)^{-1/2}\left(\frac{\hat Q_{k,1-p}}{Q(1 - p)} - 1\right) \xrightarrow{d} N(0, \gamma^2).$$

2.1.4 Multivariate extreme value distributions

The problem of CoVaR estimation is inherently bivariate, so we need to extend the univariate EVT of Section 2.1.1 to the multivariate setting. In the univariate case, the ordering principle is clear and unambiguous, but in the multivariate setting ordering is not unique. Barnett [1976] discusses several categories of order relations for multivariate data. The most popular one is marginal ordering: for $d$-dimensional vectors $\mathbf{x} = (x_1, \ldots, x_d)$ and $\mathbf{y} = (y_1, \ldots, y_d)$, the relation $\mathbf{x} \le \mathbf{y}$ is defined as $x_j \le y_j$ for all $j = 1, \ldots, d$.
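Returning to the Weissman estimator (2.8) of Section 2.1.3: once the Hill estimate is available, the estimator is essentially one line. The following sketch is our own illustration with a hypothetical function name. It assumes a positive sample and $1 \le k \le n - 1$, and it recomputes the Hill estimate (2.4) inline so that the block is self-contained.

```python
import math
import random
from typing import Sequence


def weissman_quantile(sample: Sequence[float], k: int, p: float) -> float:
    """Weissman estimator (2.8) of Q(1 - p), built on the Hill estimate (2.4)."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    threshold = xs[n - k - 1]                                       # X_{n,n-k}
    gamma_hat = sum(math.log(v) for v in xs[n - k:]) / k - math.log(threshold)
    return threshold * (k / (n * p)) ** gamma_hat


if __name__ == "__main__":
    rng = random.Random(3)
    gamma = 0.5                                                     # true Pareto index
    data = [(1.0 - rng.random()) ** (-gamma) for _ in range(10_000)]
    p = 0.001
    print(weissman_quantile(data, 500, p), p ** (-gamma))          # estimate vs. truth
```

The estimator extrapolates beyond the sample: with $n = 10{,}000$ and $p = 0.001$, only about 10 observations exceed the target quantile.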
With this ordering principle, the component-wise maximum of $\mathbf{x}$ and $\mathbf{y}$ is defined as $\mathbf{x} \vee \mathbf{y} := (x_1 \vee y_1, \ldots, x_d \vee y_d)$.

Definition 5. Consider a sample of i.i.d. $d$-dimensional observations $\mathbf{Y}_i \in \mathbb{R}^d$ ($i = 1, \ldots, n$) with df $F$ and margins $F_j$, $j = 1, \ldots, d$. The sample maximum $\mathbf{M}_n = (M_{n,1}, \ldots, M_{n,d})$ is defined as the vector of component-wise maxima, i.e., $\mathbf{M}_n = \bigvee_{i=1}^n \mathbf{Y}_i$. The multivariate df $F$ is said to belong to the (maximum) domain of attraction of a multivariate df $G$ (written $F \in D_d(G)$) if there exist $\mathbb{R}^d$-valued sequences $\mathbf{a}_n > 0$ and $\mathbf{b}_n$ such that for all continuity points $\mathbf{y}$ of $G$,
$$\lim_{n \to \infty} P\left(\frac{\mathbf{M}_n - \mathbf{b}_n}{\mathbf{a}_n} \le \mathbf{y}\right) = \lim_{n \to \infty} F^n(\mathbf{a}_n\mathbf{y} + \mathbf{b}_n) = G(\mathbf{y}), \quad (2.9)$$
where $G$ is a limit distribution with non-degenerate margins $G_j$, $\mathbf{a}_n\mathbf{y} = (a_{n,1}y_1, \ldots, a_{n,d}y_d)$ and $(\mathbf{M}_n - \mathbf{b}_n)/\mathbf{a}_n = \left((M_{n,1} - b_{n,1})/a_{n,1}, \ldots, (M_{n,d} - b_{n,d})/a_{n,d}\right)$. Then $G$ is called a multivariate extreme value df.

By setting $y_i = \infty$ for all $i \ne j$ in (2.9), we have, for $j = 1, \ldots, d$,
$$\lim_{n \to \infty} F_j^n(a_{n,j}y_j + b_{n,j}) = G_j(y_j).$$
This shows that each margin $G_j$ of $G$ is a univariate extreme value df and that $F_j \in D(G_j)$.

Example 2.1.2. A $d$-dimensional random vector $\mathbf{Y}$ follows a multivariate logistic distribution with dependence parameter $\theta \in (0, 1]$ if its joint df is
$$G(\mathbf{y}) = \exp\left\{-\left(\sum_{j=1}^d z_j^{1/\theta}\right)^\theta\right\}, \quad z_j \ge 0,$$
where $z_j = \{1 + \xi_j(y_j - \mu_j)/\sigma_j\}^{-1/\xi_j}$ (see Gumbel [1960]).

Example 2.1.3. Let $\mathcal{B}$ be the set of nonempty subsets of $\{1, 2, \ldots, d\}$. Let $\mathcal{B}_1 = \{b \in \mathcal{B} : |b| = 1\}$, where $|b|$ is the number of elements in the set $b$, and let $\mathcal{B}_{(j)} = \{b \in \mathcal{B} : j \in b\}$.
The $d$-dimensional multivariate asymmetric logistic df (see Tawn [1990]) is given by
$$G(\mathbf{y}) = \exp\left\{-\sum_{b \in \mathcal{B}}\left[\sum_{j \in b}(\psi_{j,b}z_j)^{1/\theta_b}\right]^{\theta_b}\right\},$$
where $z_j = \{1 + \xi_j(y_j - \mu_j)/\sigma_j\}^{-1/\xi_j}$, the dependence parameters satisfy $\theta_b \in (0, 1]$ for all $b \in \mathcal{B} \setminus \mathcal{B}_1$, and the asymmetry parameters satisfy $\psi_{j,b} \in [0, 1]$ for all $b \in \mathcal{B}$ and $j \in b$. The constraints $\sum_{b \in \mathcal{B}_{(j)}}\psi_{j,b} = 1$, $j = 1, \ldots, d$, ensure that the marginal distributions are generalized extreme value. The model contains $2^d - d - 1$ dependence parameters and $d(2^{d-1} - 1)$ asymmetry parameters.

2.2 Tail Dependence Function

Besides the marginal distributions of multivariate extremes, another important aspect of multivariate extremes is their dependence structure. There is a great variety of equivalent descriptions of extreme value dependence structures (for more details, refer to Beirlant et al. [2006]), and the tail dependence (TD) function is one of the popular ones. As the CoVaR estimation problem involves bivariate random vectors, we restrict our attention to the TD function in the bivariate setting.

2.2.1 Definition

Consider a random vector $(X, Y)$ with df $F$ and margins $F_1$, $F_2$. Let $\mathbf{U} = (U_1, U_2) = (F_1(X), F_2(Y))$. When $F_1$ and $F_2$ are continuous, the random vector $\mathbf{U}$ has standard uniform margins.

Definition 6. The df $F$ is said to have the upper TD function $R(x, y)$ if for all $x, y > 0$ the following limit exists:
$$\lim_{u \to 0}\frac{P\{U_1 \ge 1 - ux, U_2 \ge 1 - uy\}}{u} = \lim_{u \to 0}\frac{P\{1 - F_1(X) \le ux, 1 - F_2(Y) \le uy\}}{u} = R(x, y). \quad (2.10)$$

In addition to the upper TD function used in this study, another popular TD function is the stable TD function introduced by Huang [1992].

Definition 7.
The df $F$ is said to have a stable TD function $l(x, y)$ if for all $x, y > 0$ the following limit exists:
$$\lim_{u \to 0}\frac{P\{U_1 \ge 1 - ux \text{ or } U_2 \ge 1 - uy\}}{u} = \lim_{u \to 0}\frac{P\{1 - F_1(X) \le ux \text{ or } 1 - F_2(Y) \le uy\}}{u} = l(x, y). \quad (2.11)$$

The stable TD function of an EVD is easy to find. Suppose the random vector $(X, Y)$ has joint df $G$. Then the stable TD function can be expressed as
$$l(x, y) = -\log G\left(G_1^{-1}(e^{-x}), G_2^{-1}(e^{-y})\right), \quad (2.12)$$
where $G_1$ and $G_2$ are the margins of $G$. Since $P(A \cap B) = P(A) + P(B) - P(A \cup B)$, comparing (2.10) and (2.11) gives the upper TD function of the bivariate extreme value distribution:
$$R(x, y) = x + y - l(x, y). \quad (2.13)$$

The upper TD function is differentiable almost everywhere and homogeneous of degree one³ (see, e.g., Nikoloulopoulos et al. [2009]). By Euler's theorem⁴ on homogeneous functions (Wilson [1912]), the upper TD function can be written as
$$R(x, y) = x\frac{\partial R(x, y)}{\partial x} + y\frac{\partial R(x, y)}{\partial y},$$
where the partial derivatives $\partial R/\partial x$ and $\partial R/\partial y$ are homogeneous of order 0 and bounded. Under the sufficient condition of continuous second-order partial derivatives, the order of limits and differentiation can be exchanged. Then we have (see (2.3) in Nikoloulopoulos et al. [2009])
$$\frac{\partial R(x, y)}{\partial x} = \lim_{u \to 0}P\{U_2 > 1 - uy \mid U_1 = 1 - ux\}, \qquad \frac{\partial R(x, y)}{\partial y} = \lim_{u \to 0}P\{U_1 > 1 - ux \mid U_2 = 1 - uy\}.$$

³If $f : V \to W$ is a function between two vector spaces over a field $F$, and $k$ is an integer, then $f$ is said to be homogeneous of degree $k$ if $f(\alpha\mathbf{v}) = \alpha^k f(\mathbf{v})$ for all $\alpha \in F$ and $\mathbf{v} \in V$.
⁴Let $f : \mathbb{R}^n_+ \to \mathbb{R}$ be continuous and also differentiable on $\mathbb{R}^n_+$.
Then $f$ is homogeneous of degree $k$ if and only if $kf(\mathbf{x}) = \sum_{i=1}^n x_i\frac{\partial f}{\partial x_i}$ for all $\mathbf{x} \in \mathbb{R}^n_+$.

Therefore, the upper TD function can be rewritten as
$$\begin{aligned} R(x, y) &= x\lim_{u \to 0}P\{U_2 > 1 - uy \mid U_1 = 1 - ux\} + y\lim_{u \to 0}P\{U_1 > 1 - ux \mid U_2 = 1 - uy\} \\ &= x\lim_{u \to 0}P\{Y > Q_2(1 - uy) \mid X = Q_1(1 - ux)\} + y\lim_{u \to 0}P\{X > Q_1(1 - ux) \mid Y = Q_2(1 - uy)\}, \end{aligned} \quad (2.14)$$
where $Q_1$ and $Q_2$ are the lower quantile functions of the marginal distributions of $X$ and $Y$, respectively.

2.2.2 Examples

In this section, we give examples of the TD function for six popular distributions. These example models will be used later in the simulation and empirical studies.

Example 2.2.1. The bivariate logistic df with standard Fréchet margins is given by
$$G(x, y; \theta) = \exp\left\{-\left(x^{-1/\theta} + y^{-1/\theta}\right)^\theta\right\}, \quad (2.15)$$
where $x, y > 0$ and $\theta \in (0, 1]$ is the dependence parameter. The dependence increases as $\theta$ decreases: $\theta = 1$ corresponds to independence, while $\theta \to 0$ corresponds to comonotonicity (perfect dependence). The corresponding margins are $G_1(x) = G_2(x) = \exp(-1/x)$, so that $G_1^{-1}(e^{-x}) = G_2^{-1}(e^{-x}) = 1/x$. Using equation (2.12), the stable TD function is given by (Gumbel [1960])
$$l(x, y; \theta) = -\log G(1/x, 1/y; \theta) = \left(x^{1/\theta} + y^{1/\theta}\right)^\theta.$$
The upper TD function of the bivariate logistic distribution then has the form
$$R(x, y; \theta) = x + y - \left(x^{1/\theta} + y^{1/\theta}\right)^\theta. \quad (2.16)$$

Example 2.2.2. The bivariate Hüsler-Reiss df (Hüsler and Reiss [1989]) with standard Fréchet margins is
$$G(x, y; \theta) = \exp\left\{-x^{-1}\Phi\left(\theta^{-1} + \frac{\theta}{2}\log(y/x)\right) - y^{-1}\Phi\left(\theta^{-1} + \frac{\theta}{2}\log(x/y)\right)\right\}, \quad (2.17)$$
where $x, y > 0$, $\theta > 0$ and $\Phi(\cdot)$ is the standard normal df. As $\theta$ increases, the dependence increases: as $\theta \to 0$, $X$ and $Y$ become independent, and as $\theta \to \infty$, $X$ and $Y$ become perfectly dependent.
Following the same procedure as above, the corresponding stable TD function is
$$l(x, y; \theta) = x\Phi\left(\theta^{-1} + \frac{\theta}{2}\log(x/y)\right) + y\Phi\left(\theta^{-1} + \frac{\theta}{2}\log(y/x)\right),$$
and the upper TD function is
$$R(x, y; \theta) = x + y - x\Phi\left(\theta^{-1} + \frac{\theta}{2}\log(x/y)\right) - y\Phi\left(\theta^{-1} + \frac{\theta}{2}\log(y/x)\right). \quad (2.18)$$

Example 2.2.3. The bilogistic df (Smith [1990]) with standard Fréchet margins is
$$G(x, y; \alpha, \beta) = \exp\left\{-x^{-1}q^{1-\alpha} - y^{-1}(1 - q)^{1-\beta}\right\}, \quad x, y > 0, \quad (2.19)$$
where $q$ is the root of the equation $(1 - \alpha)x^{-1}(1 - q)^\beta - (1 - \beta)y^{-1}q^\alpha = 0$ and $0 < \alpha, \beta < 1$. The dependence becomes stronger as $\alpha$ and $\beta$ decrease: $\alpha = \beta \to 1$ corresponds to independence, and $\alpha = \beta \to 0$ corresponds to comonotonicity. The stable TD function of the bilogistic model is
$$l(x, y; \alpha, \beta) = \int_0^1 \max\left\{(1 - \alpha)t^{-\alpha}x, (1 - \beta)(1 - t)^{-\beta}y\right\}dt,$$
and the upper TD function is
$$R(x, y; \alpha, \beta) = x + y - \int_0^1 \max\left\{(1 - \alpha)t^{-\alpha}x, (1 - \beta)(1 - t)^{-\beta}y\right\}dt. \quad (2.20)$$

Example 2.2.4. The bivariate asymmetric logistic distribution with standard Fréchet margins has df
$$G(x, y; \psi_1, \psi_2, \theta) = \exp\left\{-\frac{1 - \psi_1}{x} - \frac{1 - \psi_2}{y} - \left(\left(\frac{\psi_1}{x}\right)^{1/\theta} + \left(\frac{\psi_2}{y}\right)^{1/\theta}\right)^\theta\right\}, \quad (2.21)$$
where $x, y > 0$, $\theta \in (0, 1]$ is the dependence parameter and $\psi_1, \psi_2 \in [0, 1]$ are asymmetry parameters. Note that when both asymmetry parameters equal one, the distribution reduces to the bivariate logistic distribution (see Example 2.2.1).
The stable TD function of the bivariate asymmetric logistic distribution extends that of the logistic distribution (Tawn [1988]) and has the form
$$l(x, y; \psi_1, \psi_2, \theta) = (1 - \psi_1)x + (1 - \psi_2)y + \left((x\psi_1)^{1/\theta} + (y\psi_2)^{1/\theta}\right)^\theta.$$
Therefore, the upper TD function is
$$R(x, y; \psi_1, \psi_2, \theta) = \psi_1 x + \psi_2 y - \left((x\psi_1)^{1/\theta} + (y\psi_2)^{1/\theta}\right)^\theta. \quad (2.22)$$

Example 2.2.5. Suppose $\mathbf{W} = (X, Y)^T$ has a bivariate t distribution with location parameter $\boldsymbol\mu = (0, 0)^T$, scale matrix $\Omega = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, $\nu$ degrees of freedom and equal margins $F_X(\cdot) = F_Y(\cdot) = F_T(\cdot; \nu)$, where $F_T(\cdot; \nu)$ is the df of a univariate Student's t random variable with $\nu$ degrees of freedom. Then the joint density of $\mathbf{W}$ is
$$f_T(\mathbf{w}; \rho, \nu) = \frac{\Gamma((\nu + 2)/2)}{\sqrt{1 - \rho^2}\,\nu\pi\,\Gamma(\nu/2)}\left(1 + \frac{1}{\nu}\mathbf{w}^T\Omega^{-1}\mathbf{w}\right)^{-(\nu + 2)/2}, \quad \mathbf{w} \in \mathbb{R}^2. \quad (2.23)$$
The dependence strength increases as $\nu$ decreases or $\rho$ increases. $X$ and $Y$ are independent as $\nu \to \infty$ and $\rho = 0$. They are perfectly dependent as $\nu \to 0$ and $\rho = 1$.

Proposition 2.2.1 (Demarta and McNeil [2005]). Suppose a random vector $(X, Y)^T$ has the joint density (2.23). Then its upper TD function is given by (see A.2 for the proof)
$$R(x, y; \rho, \nu) = x F_T\left(\sqrt{\frac{\nu + 1}{1 - \rho^2}}\left(\rho - (y/x)^{-1/\nu}\right); \nu + 1\right) + y F_T\left(\sqrt{\frac{\nu + 1}{1 - \rho^2}}\left(\rho - (x/y)^{-1/\nu}\right); \nu + 1\right). \quad (2.24)$$

Example 2.2.6. Suppose $\mathbf{W} = (X, Y)^T$ has a bivariate skew-t distribution with location parameter $\boldsymbol\mu = (0, 0)^T$, shape parameter $\boldsymbol\alpha = (\alpha_1, \alpha_2)^T$, scale matrix $\Omega = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ and $\nu$ degrees of freedom, denoted $ST_2(\mathbf{0}, \Omega, \boldsymbol\alpha, \nu)$.
Then its joint density is given by (Azzalini and Capitanio [2003])
$$f_{ST}(\mathbf{w}) = 2f_T(\mathbf{w}; \rho, \nu)\,F_T\left(\boldsymbol\alpha^T\mathbf{w}\sqrt{\frac{\nu + 2}{\nu + \mathbf{w}^T\Omega^{-1}\mathbf{w}}}; \nu + 2\right). \quad (2.25)$$
When $\alpha_1 = \alpha_2 = 0$, the bivariate skew-t distribution reduces to a bivariate t distribution. As in the bivariate t distribution, the dependence strength is controlled by the parameters $\nu$ and $\rho$. Before introducing the TD function of the bivariate skew-t distribution, we need the definition of the extended skew-t distribution, which is used to specify the upper TD function.

Definition 8. A random variable $Y \in \mathbb{R}$ follows a univariate extended skew-t distribution with location parameter $\mu$, scale parameter $\omega$, shape parameter $\alpha$, extension parameter $\tau$ and $\nu > 0$ degrees of freedom, denoted $Y \sim EST_1(\mu, \omega, \alpha, \tau, \nu)$, if its density is
$$f_{EST}(y) = \frac{f_T(y; \mu, \omega, \nu)}{F_T\left(\tau/\sqrt{1 + \alpha^2}; \nu\right)}\,F_T\left(\sqrt{\frac{\nu + 1}{\nu + z^2}}\,(\alpha z + \tau); \nu + 1\right),$$
where $z = (y - \mu)/\omega$. When $\tau = 0$, the distribution becomes the univariate skew-t distribution.

Proposition 2.2.2. Suppose a random vector $(X, Y)^T$ has the joint density (2.25).
Then its upper TD function is given by (see A.3 for the proof)
$$\begin{aligned} R(x, y; \rho, \alpha_1, \alpha_2, \nu) &= x\,\bar F_{EST}\left(\frac{\sqrt{\nu + 1}}{\sqrt{1 - \rho^2}}\left((\bar x/\bar y)^{1/\nu} - \rho\right); 0, 1, \sqrt{1 - \rho^2}\,\alpha_2, \tau_1, \nu + 1\right) \\ &\quad + y\,\bar F_{EST}\left(\frac{\sqrt{\nu + 1}}{\sqrt{1 - \rho^2}}\left((\bar y/\bar x)^{1/\nu} - \rho\right); 0, 1, \sqrt{1 - \rho^2}\,\alpha_1, \tau_2, \nu + 1\right), \end{aligned} \quad (2.26)$$
where $\bar x = x F_T(\bar\alpha_2\sqrt{\nu + 1}; \nu)$, $\bar y = y F_T(\bar\alpha_1\sqrt{\nu + 1}; \nu)$, $\bar\alpha_1 = \dfrac{\alpha_1 + \rho\alpha_2}{\sqrt{1 + (1 - \rho^2)\alpha_2^2}}$, $\bar\alpha_2 = \dfrac{\alpha_2 + \rho\alpha_1}{\sqrt{1 + (1 - \rho^2)\alpha_1^2}}$, $\tau_1 = \sqrt{\nu + 1}\,(\alpha_1 + \alpha_2\rho)$, $\tau_2 = \sqrt{\nu + 1}\,(\alpha_2 + \alpha_1\rho)$, and $\bar F_{EST}$ is the survival function of the extended skew-t distribution in Definition 8.

2.2.3 Estimation of the tail dependence function

A number of methods can be used to estimate the TD function; however, due to data sparsity, efficiency can be gained by assuming a parametric model for the TD function. Coles and Tawn [1991] and Joe et al. [1992] apply maximum likelihood estimation, while Ledford and Tawn [1996] and Smith [1994] use a censored likelihood approach. Einmahl et al. [2008] point out that these likelihood-based methods require smoothness (or even existence) of the partial derivatives of the TD function. Therefore, they propose an estimator based on the method of moments for dimension two, which requires a smaller set of conditions. In the simulation studies and subsequent data analysis, we adopt the method-of-moments estimator (M-estimator) proposed in Einmahl et al. [2008] and later extended in Einmahl et al. [2012].
This extended M-estimator can be used in arbitrary dimension $d$, and its consistency and asymptotic normality hold under weak conditions.

Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be independent random vectors in $\mathbb{R}^2$ with common continuous df $F$ and margins $F_1$ and $F_2$. Let $R^X_i$ and $R^Y_i$ denote the rank of $X_i$ among $X_1, \ldots, X_n$ and the rank of $Y_i$ among $Y_1, \ldots, Y_n$, respectively, $i \in \{1, \ldots, n\}$. Then, for $1 \le m \le n$, a nonparametric estimator of the bivariate upper TD function $R$ is defined as
$$\hat R_n(x, y) := \frac{1}{m}\sum_{i=1}^n 1\left\{R^X_i \ge n + \tfrac{1}{2} - mx,\ R^Y_i \ge n + \tfrac{1}{2} - my\right\}, \quad (2.27)$$
where $m = m_n \in \{1, \ldots, n\}$ is an intermediate sequence.

We suppose that $R$ belongs to some parametric family $\{R(\cdot, \cdot; \boldsymbol\theta) : \boldsymbol\theta \in \Theta\}$, where $\Theta \subset \mathbb{R}^p$ ($p \ge 1$) is the parameter space. Let $\mathbf{g} = (g_1, \ldots, g_p)^T : [0, 1]^2 \to \mathbb{R}^p$ be a vector of integrable functions. Define the function $\boldsymbol\varphi : \Theta \to \mathbb{R}^p$ as
$$\boldsymbol\varphi(\boldsymbol\theta) := \iint_{[0,1]^2}\mathbf{g}(x, y)R(x, y; \boldsymbol\theta)\,dx\,dy, \quad (2.28)$$
where $\boldsymbol\varphi$ is a homeomorphism⁵ between $\Theta$ and its image. For example, for the logistic and Hüsler-Reiss distributions, $\varphi(\theta)$ is a one-to-one mapping of the dependence parameter. For the asymmetric logistic distribution, one component of $\boldsymbol\varphi(\boldsymbol\theta)$ is a mapping of the dependence parameter and the other two components are mappings of the asymmetry parameters.

Let $\boldsymbol\theta_0$ denote the true value of the parameter $\boldsymbol\theta$. The M-estimator $\hat{\boldsymbol\theta}_n$ of $\boldsymbol\theta_0$ is defined as a minimizer of the function
$$S_{m,n}(\boldsymbol\theta) = \left\|\boldsymbol\varphi(\boldsymbol\theta) - \iint_{[0,1]^2}\mathbf{g}(x, y)\hat R_n(x, y)\,dx\,dy\right\|^2, \quad (2.29)$$
where $\|\cdot\|$ is the Euclidean norm. Note, however, that the choice of the function $\boldsymbol\varphi$ is not unique, which means that the M-estimator is not unique either. How to choose a proper $\boldsymbol\varphi$ is discussed in Section 3.2.1.

Theorem 2.2.3.
(Existence, uniqueness and consistency of $\hat{\boldsymbol\theta}_n$) Define $\hat\Theta_n$ as the set of minimizers of $S_{m,n}$ in (2.29). Let $\mathbf{g} : [0, 1]^2 \to \mathbb{R}^p$ be integrable.
(i) If $\boldsymbol\varphi$ is a homeomorphism from $\Theta$ to $\boldsymbol\varphi(\Theta)$ and there exists $\epsilon_0 > 0$ such that the set $\{\boldsymbol\theta \in \Theta : \|\boldsymbol\theta - \boldsymbol\theta_0\| \le \epsilon_0\}$ is closed, then for every $\epsilon$ with $0 < \epsilon < \epsilon_0$, as $n \to \infty$,
$$P\left(\hat\Theta_n \ne \emptyset \text{ and } \hat\Theta_n \subset \{\boldsymbol\theta \in \Theta : \|\boldsymbol\theta - \boldsymbol\theta_0\| \le \epsilon\}\right) \to 1.$$
(ii) If, in addition to the assumptions of (i), $\boldsymbol\theta_0$ is in the interior of the parameter space, $\boldsymbol\varphi$ is twice continuously differentiable and its derivative matrix $D\boldsymbol\varphi(\boldsymbol\theta_0)$ is of full rank, then, with probability tending to one, $S_{m,n}$ in (2.29) has a unique minimizer $\hat{\boldsymbol\theta}_n$. Hence, $\hat{\boldsymbol\theta}_n \xrightarrow{P} \boldsymbol\theta_0$ as $n \to \infty$.

⁵A function $f : X \to Y$ between two topological spaces $(X, \tau_X)$ and $(Y, \tau_Y)$ is a homeomorphism if (1) $f$ is a bijection (one-to-one and onto), (2) $f$ is continuous, and (3) the inverse function $f^{-1}$ is continuous ($f$ is an open mapping).

Denote by $W$ a mean-zero Wiener process on $[0, \infty]^2 \setminus \{(\infty, \infty)\}$ with covariance function
$$E\left(W(x_1, y_1)W(x_2, y_2)\right) = R(x_1 \wedge x_2, y_1 \wedge y_2),$$
and for $x, y \in [0, \infty)$, let $W_1(x) := W(x, \infty)$ and $W_2(y) := W(\infty, y)$. Further, for $(x, y) \in [0, \infty)^2$, let $R_1(x, y)$ and $R_2(x, y)$ be the right-hand partial derivatives of $R$ at the point $(x, y)$ with respect to the first and second coordinates, respectively. Write
$$B(x, y) = W(x, y) - R_1(x, y)W_1(x) - R_2(x, y)W_2(y), \qquad \tilde{\mathbf{B}} = \iint_{[0,1]^2}\mathbf{g}(x, y)B(x, y)\,dx\,dy.$$

Theorem 2.2.4.
(Asymptotic normality of $\hat{\boldsymbol\theta}_n$) In addition to the assumptions of Theorem 2.2.3(ii), suppose the following two conditions hold:
(i) $u^{-1}P\{1 - F_X(X) \le ux, 1 - F_Y(Y) \le uy\} - R(x, y) = O(u^\alpha)$ uniformly on the set $\{(x, y) : x + y = 1, x \ge 0, y \ge 0\}$ as $u \to 0$, for some $\alpha > 0$;
(ii) $m = m_n \to \infty$ and $m = o\left(n^{2\alpha/(1 + 2\alpha)}\right)$ as $n \to \infty$.
Then, as $n \to \infty$,
$$\sqrt{m}\left(\hat{\boldsymbol\theta}_n - \boldsymbol\theta_0\right) \xrightarrow{d} D\boldsymbol\varphi(\boldsymbol\theta_0)^{-1}\tilde{\mathbf{B}}. \quad (2.30)$$

Theorems 2.2.3 and 2.2.4 establish the existence, consistency and asymptotic normality of the M-estimator. For proofs of these two theorems, please refer to Einmahl et al. [2012].

Chapter 3
Methodology

In this chapter, we focus on the so-called conditional Value-at-Risk, or CoVaR, as a way to capture systemic risk contributions, and explore how the proposed EVT-based semi-parametric methodology can be used in the stationary setting to produce estimates of CoVaR. We also examine its performance in several simulation studies.

3.1 CoVaR Estimation in a Stationary Setting

In this study, we adopt a modified definition of CoVaR proposed by Girardi and Ergün [2013], where financial distress is specified by a loss in excess of the VaR, rather than a loss exactly at the VaR level. Mainik and Schaanning [2014] show that, under this definition, CoVaR is dependence consistent in the sense that an increase in the strength of dependence does lead to an increase in systemic risk as measured by CoVaR.

Let $X$ and $Y$ denote, respectively, the losses of a financial institution and of a system proxy. For $p \in (0, 1)$, the VaR of $X$ at confidence level $1 - p$ is defined as the $(1 - p)$-quantile of the underlying distribution (assuming the quantile is single-valued):
$$P\{X \ge \mathrm{VaR}_X(1 - p)\} = p,$$
and $\mathrm{CoVaR}_{Y|X}(1 - p)$ is defined as the $(1 - p)$-quantile of the conditional distribution:
$$P\left\{Y \ge \mathrm{CoVaR}_{Y|X}(1 - p) \mid X \ge \mathrm{VaR}_X(1 - p)\right\} = p.$$
(3.1)

The idea of our methodology is to link the definition of CoVaR to the tail dependence function. Since CoVaR is a high quantile of the conditional distribution of Y, it seems sensible to make the following assumption: there exists a constant η such that
CoVaR_{Y|X}(1 − p) = VaR_Y(1 − pη). (3.2)
Under this assumption, we can rewrite equation (3.1) as
P{Y > VaR_Y(1 − pη), X > VaR_X(1 − p)} / p = p. (3.3)
Suppose the two-dimensional random vector (X, Y) has joint distribution function F and continuous margins F_1 and F_2. Assume that
(i) F ∈ D_2(G), where G is a bivariate extreme value distribution, and
(ii) F_2 ∈ D(Φ_{1/γ}) for some γ > 0.
From (i), we have (see Chapter 6 in de Haan and Ferreira [2006])
lim_{u→0} P{F_1(X) > 1 − ux, F_2(Y) > 1 − uy} / u = R(x, y), x, y > 0, (3.4)
where R is the upper TD function discussed in Section 2.2. Combining equations (3.3) and (3.4), we have, for p close to zero,
P{Y > VaR_Y(1 − pη), X > VaR_X(1 − p)} / p = P{F_2(Y) > 1 − pη, F_1(X) > 1 − p} / p ≈ R(1, η). (3.5)
Since the distribution function of Y is in the domain of attraction of the Fréchet distribution with positive index 1/γ, we have 1 − F_2 ∈ RV_{−1/γ} (see Theorem 2.1.2), which implies VaR_Y(1 − pη) ≈ η^{−γ} VaR_Y(1 − p) (see Proposition 0.8 (v) in Resnick [1987]). Hence, if we can find an η_* such that R(1, η_*) = p, we obtain the following approximation for CoVaR, valid for p close to zero:
CoVaR_{Y|X}(1 − p) = VaR_Y(1 − pη_*) ≈ VaR_Y(1 − p) η_*^{−γ}.
To find η_*, we first need to estimate the TD function. There are various ways to estimate the upper TD function. However, because data in the joint tail are sparse, efficiency can be gained by assuming a flexible parametric model R(x, y) = R(x, y; θ) for the TD function, where θ denotes the parameter vector.
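The root-finding step for η_* and the resulting CoVaR approximation can be sketched in Python as below. This is a minimal illustration, not the thesis' implementation (which used R). It assumes the logistic and Hüsler-Reiss forms of R(1, η; θ) stated in Examples 3.1.1 and 3.1.2 and uses plain bisection. The function names (r_logistic, r_husler_reiss, solve_eta_star, covar_approx) are hypothetical. The illustration assumes unit Fréchet margins, so that γ = 1 and the exact quantile VaR_Y(1 − q) = −1/log(1 − q) is available for comparison.

```python
import math


def r_logistic(eta, theta):
    # R(1, eta; theta) for the bivariate logistic model, 0 < theta <= 1
    return 1.0 + eta - (1.0 + eta ** (1.0 / theta)) ** theta


def norm_cdf(z):
    # standard normal distribution function via the complementary error function
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def r_husler_reiss(eta, theta):
    # R(1, eta; theta) for the bivariate Husler-Reiss model, theta > 0
    a = 1.0 / theta
    b = 0.5 * theta * math.log(eta)
    return 1.0 + eta - norm_cdf(a - b) - eta * norm_cdf(a + b)


def solve_eta_star(r_func, p, max_iter=200):
    # Bisection for R(1, eta) = p. R(1, eta) <= min(1, eta) and R is continuous
    # and nondecreasing in eta, so any root lies in [p, infinity).
    lo, hi = p, 1.0
    while r_func(hi) < p:
        hi *= 2.0
        if hi > 1e12:
            raise ValueError('R(1, eta) never reaches p: tail dependence too weak')
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if r_func(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def covar_approx(var_y, eta_star, gamma):
    # CoVaR_{Y|X}(1 - p) ~ VaR_Y(1 - p) * eta_star ** (-gamma)
    return var_y * eta_star ** (-gamma)


# Illustration: unit Frechet margins (gamma = 1), logistic theta = 0.6, p = 0.05
p = 0.05
eta = solve_eta_star(lambda e: r_logistic(e, 0.6), p)
var_y = -1.0 / math.log(1.0 - p)           # VaR_Y(1 - p) for unit Frechet
approx = covar_approx(var_y, eta, 1.0)
exact = -1.0 / math.log(1.0 - p * eta)     # VaR_Y(1 - p * eta) for unit Frechet
print(eta, approx, exact)
```

Bisection is used because R(1, η; θ) is continuous and monotone in η, so no derivative is needed. In this unit Fréchet illustration, the approximation and the exact marginal quantile agree to within a few percent at p = 0.05.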
Note that the function R is the distribution function of a measure (see Section 6.1.5 in de Haan and Ferreira [2006]), so R(x, y; θ) is monotone in x and in y. Moreover, 0 ≤ R(x, y) ≤ x ∧ y, so 0 ≤ R(1, η; θ) ≤ min(1, η). Hence, as long as R(1, η; θ) exceeds p for some η (which requires sufficiently strong tail dependence), continuity and monotonicity of R guarantee an η_* ≥ p with R(1, η_*; θ) = p. Such an η_* is a function of the CoVaR level p and the model parameter θ, which we write as η_* = g(θ, p). We next plot R(1, η; θ) as a function of η for several parametric models and parameter values to show how η_* is influenced by θ.

Example 3.1.1 (Bivariate logistic distribution). The function R(1, η) of the bivariate logistic distribution is given by
R(1, η; θ) = 1 + η − (1 + η^{1/θ})^θ, 0 < θ ≤ 1.
Figure 3.1 shows that R(1, η) is decreasing in θ. This means that, for a fixed level p, η_* becomes smaller as the strength of dependence increases (i.e., as θ becomes smaller). Moreover, R(1, η; 1) = 0 for all η. As θ → 0, R(1, η; θ) → η for η < 1 and R(1, η; θ) → 1 for η ≥ 1.
Figure 3.1: Plot of R(1, η) as a function of η for the bivariate logistic distribution for different values of θ.

Example 3.1.2 (Bivariate Hüsler-Reiss distribution). The function R(1, η) of the bivariate Hüsler-Reiss distribution is
R(1, η; θ) = 1 + η − Φ(θ^{−1} − (θ/2) log η) − η Φ(θ^{−1} + (θ/2) log η), θ > 0.
Figure 3.2 shows that, at a given level p, η_* becomes smaller as θ increases, that is, as the strength of tail dependence increases.
Moreover, as θ → 0, R(1, η; θ) → 0 for all η. As θ → ∞, R(1, η; θ) → η for η < 1 and R(1, η; θ) → 1 for η > 1.
Figure 3.2: Plot of R(1, η) as a function of η for the bivariate Hüsler-Reiss distribution for different values of θ.

Example 3.1.3 (Bivariate bilogistic distribution). The function R(1, η) of the bivariate bilogistic distribution is
R(1, η; α, β) = 1 + η − ∫_0^1 max{(1 − α) t^{−α}, (1 − β)(1 − t)^{−β} η} dt, 0 < α, β < 1.
Figure 3.3 shows that R(1, η) increases as α or β decreases. That is, for a given level p, η_* decreases as the dependence becomes stronger (i.e., as either α or β decreases).
Figure 3.3: Plot of R(1, η) as a function of η for the bivariate bilogistic distribution for different values of α and β.

Example 3.1.4 (Bivariate asymmetric logistic distribution). The function R(1, η) of the bivariate asymmetric logistic distribution is given by
R(1, η; ψ_1, ψ_2, θ) = ψ_1 + ψ_2 η − (ψ_1^{1/θ} + (ηψ_2)^{1/θ})^θ, 0 < θ ≤ 1 and 0 ≤ ψ_1, ψ_2 ≤ 1.
The effect of the dependence strength (the value of θ) on η_* is the same as for the bivariate logistic distribution. Furthermore, Figure 3.4 shows that η_* increases as ψ_1 or ψ_2 decreases. However, when all three parameters are increased simultaneously, η_* for a given p may either increase or decrease.
Figure 3.4: Plots of R(1, η) as a function of η for the bivariate asymmetric logistic distribution for different values of θ, ψ_1, ψ_2.

Example 3.1.5 (Bivariate t distribution).
The function R(1, η) of the bivariate t distribution with joint density function (2.23) is given by
R(1, η; ρ, ν) = F_T(√((ν + 1)/(1 − ρ²)) (ρ − η^{−1/ν}); 0, 1, ν + 1) + η F_T(√((ν + 1)/(1 − ρ²)) (ρ − η^{1/ν}); 0, 1, ν + 1), 0 ≤ ρ ≤ 1 and ν > 0.
Figure 3.5 shows that R(1, η) increases with stronger dependence (i.e., as ν decreases or ρ increases), so η_* decreases. Moreover, for every ν, R(1, η; 1, ν) = min(1, η). As ν → ∞, the bivariate t distribution converges to the bivariate Gaussian distribution, which is asymptotically independent for ρ < 1, so R(1, η; ρ, ν) → 0.
Figure 3.5: Plot of R(1, η) as a function of η for the bivariate t distribution for different values of ν and ρ.

Let {(X_1, Y_1), ..., (X_n, Y_n)} be a random sample. In an extreme value setting, suppose p = p_n is small relative to the sample size n. We estimate the parameter vector θ in R(1, η; θ) with the M-estimator of Section 2.2.3. The estimator of η_* is then
η̂_* = g(θ̂, p_n), (3.6)
where θ̂ is the M-estimator. We combine three ingredients: the Hill estimator of the tail index γ (Section 2.1.2), the nonparametric estimator of the high quantile VaR_Y(1 − p) (Section 2.1.3), and the estimator (3.6) of η_*. Together they give the following estimator of CoVaR at level p_n:
ĈoVaR_{Y|X}(1 − p_n) = Y_{n,n−k} (k/(n p_n))^{γ̂} η̂_*^{−γ̂}, (3.7)
where Y_{n,1} ≤ ... ≤ Y_{n,n} are the order statistics.
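The sketch below implements estimator (3.7). It assumes that the Hill estimator (2.4) and the quantile estimator (2.8) take their usual forms, γ̂ = k^{−1} Σ_{i=0}^{k−1} log Y_{n,n−i} − log Y_{n,n−k} and Y_{n,n−k}(k/(np))^{γ̂}, which is what the form of (3.7) suggests. The sample size, k and the plugged-in η_* are illustrative choices only. In Algorithm 2, k comes from the bootstrap of Algorithm 1 and η̂_* from the fitted TD model. All function names are hypothetical.

```python
import math
import random


def hill(ys, k):
    # Hill estimator of gamma based on the k largest order statistics
    s = sorted(ys)
    n = len(s)
    y_nk = s[n - k - 1]                      # Y_{n,n-k}
    return sum(math.log(s[n - 1 - i]) for i in range(k)) / k - math.log(y_nk)


def extreme_quantile(ys, k, p, gamma):
    # Weissman-type extrapolation Y_{n,n-k} * (k / (n p)) ** gamma
    s = sorted(ys)
    n = len(s)
    return s[n - k - 1] * (k / (n * p)) ** gamma


def covar_estimate(ys, k, p, eta_star):
    # Equation (3.7): Y_{n,n-k} (k / (n p))^gamma_hat * eta_star^(-gamma_hat)
    gamma_hat = hill(ys, k)
    return extreme_quantile(ys, k, p, gamma_hat) * eta_star ** (-gamma_hat)


# Exact Pareto sample with P(Y > y) = y ** (-2), y >= 1, so gamma = 0.5 and
# VaR_Y(1 - q) = q ** (-0.5) for every q in (0, 1]
rng = random.Random(12345)
ys = [(1.0 - rng.random()) ** -0.5 for _ in range(20000)]
k, p, eta_star = 1000, 0.01, 0.0547
gamma_hat = hill(ys, k)
var_hat = extreme_quantile(ys, k, p, gamma_hat)
covar_hat = covar_estimate(ys, k, p, eta_star)
print(gamma_hat, var_hat, covar_hat, (p * eta_star) ** -0.5)
```

On an exact Pareto sample, VaR_Y(1 − q) = q^{−γ} for every q. Estimator (3.7) then targets (pη_*)^{−γ} with no approximation error from (3.2), which makes this a convenient sanity check before applying it to data.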
Algorithm 2 summarizes the procedure of CoVaR estimation for the random sample {(X_1, Y_1), ..., (X_n, Y_n)}.

Algorithm 2 CoVaR Estimation in a Stationary Setting
1: Select a parametric model for the tail dependence function, and estimate the parameter θ using the M-estimator;
2: Obtain η̂_* by solving the equation R(1, η_*; θ̂) = p_n for a given p_n;
3: Obtain the Hill estimator of γ with equation (2.4), where k is chosen with the two-step subsample bootstrap method described in Algorithm 1;
4: Estimate VaR_Y(1 − p_n) using γ̂ in equation (2.8);
5: Plug all the estimators into equation (3.7) to obtain the estimator of CoVaR_{Y|X}(1 − p_n).

3.2 Simulation Studies

In this section, we report the results of five simulation studies to show the performance of the proposed method. These studies are based on the following distributions: the bivariate logistic, bivariate asymmetric logistic, bivariate Hüsler-Reiss, bivariate bilogistic and bivariate t distributions. The first four models are all in the class of bivariate EVDs; by the max-stability property of EVDs, each lies in its own domain of attraction. Moreover, the bivariate t distribution has a regularly varying tail and hence is also in the domain of attraction of a bivariate EVD (see e.g., Example 5.21 in Resnick [1987]).

3.2.1 Performance of the M-estimator of the TD Function

The M-estimator discussed in Section 2.2.3 depends on the choice of m in equation (2.27) and of the function vector g in equation (2.28). What is a proper choice of m and g? To answer this question, we simulate samples of different sizes from several bivariate distributions and explore how the values of m and g influence the M-estimator of the model parameters. To illustrate the performance of the estimators, the bias and root mean squared error (RMSE) are plotted for a range of values of m.
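To make the roles of m and g concrete, the following is a minimal sketch of the M-estimator for the bivariate logistic model with the single function g_0(x, y) = 1. It assumes the rank-based empirical TD function of Einmahl et al. [2012], R̂_{m,n}(x, y) = m^{−1} #{i : R^X_i > n + 1/2 − mx, R^Y_i > n + 1/2 − my}, where R^X_i and R^Y_i are ranks. With one moment condition, minimizing S_{m,n} amounts to matching ∫∫_{[0,1]²} R̂_{m,n} with ∫∫_{[0,1]²} R(·; θ). Data are simulated with a positive stable frailty (Kanter's representation); this technique is chosen for the sketch and is not taken from the thesis. All function names are hypothetical.

```python
import math
import random


def simulate_logistic(n, theta, seed=0):
    # Bivariate logistic EVD with unit Frechet margins, 0 < theta < 1.
    # S is positive stable with E[exp(-t S)] = exp(-t ** theta) (Kanter), and
    # X_j = (S / E_j) ** theta for independent Exp(1) variables E_j.
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        w = math.pi * (1.0 - rng.random())    # uniform on (0, pi]
        e = rng.expovariate(1.0)
        s = (math.sin(theta * w) / math.sin(w) ** (1.0 / theta)
             * (math.sin((1.0 - theta) * w) / e) ** ((1.0 - theta) / theta))
        x = (s / rng.expovariate(1.0)) ** theta
        y = (s / rng.expovariate(1.0)) ** theta
        sample.append((x, y))
    return sample


def ranks(values):
    # ranks 1..n (ties occur with probability zero for continuous data)
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for pos, idx in enumerate(order, start=1):
        r[idx] = pos
    return r


def empirical_moment(sample, m):
    # Integral over [0,1]^2 of the rank-based empirical TD function; each point
    # contributes (1 - a)(1 - b) when both thresholds a, b lie below 1.
    n = len(sample)
    rx = ranks([pt[0] for pt in sample])
    ry = ranks([pt[1] for pt in sample])
    total = 0.0
    for a_rank, b_rank in zip(rx, ry):
        a = (n + 0.5 - a_rank) / m
        b = (n + 0.5 - b_rank) / m
        if a < 1.0 and b < 1.0:
            total += (1.0 - a) * (1.0 - b)
    return total / m


def model_moment(theta, grid=60):
    # Midpoint-rule integral over [0,1]^2 of the logistic R(x, y; theta)
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) * h
        for j in range(grid):
            y = (j + 0.5) * h
            total += x + y - (x ** (1.0 / theta) + y ** (1.0 / theta)) ** theta
    return total * h * h


def fit_logistic_theta(sample, m, lo=0.05, hi=1.0, iters=40):
    # With g = 1 the criterion is zero where the two moments match; the model
    # moment decreases in theta, so bisection applies.
    target = empirical_moment(sample, m)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if model_moment(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = simulate_logistic(2000, 0.6, seed=1)
theta_hat = fit_logistic_theta(sample, m=180)
print(theta_hat)
```

The empirical integral has a closed form, so only the model integral needs quadrature. Rerunning the fit over a grid of m values and replicated samples reproduces the kind of bias and RMSE curves discussed in this section.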
For each example, we use 100 replicated samples. We set m ∈ {80, 130, 180, 230, 280, 330} for the first four examples and m ∈ {50, 100, 150, 200, 250, 300, 350} for the bivariate t distribution. The sample sizes vary according to the dimension of the parameter space Θ.

Example 1. Bivariate logistic distribution
The bivariate logistic distribution has the upper TD function given in equation (2.16). Following the analysis in Einmahl et al. [2012], we take (∂/∂θ)R(x, y; θ) as the optimal choice of g. Note that the optimal function g depends on the true values of the model parameters and hence would not be available in practice. We also consider other choices of g given by low-order polynomials: g_0(x, y) = 1 and g_1(x, y) = x. We simulate random samples of size n = 2000 from a bivariate logistic model with θ = 0.6. Results are shown in Figure 3.6.
Figure 3.6: The bias and RMSE of the M-estimator of θ based on 100 samples of size 2000 simulated from the bivariate logistic model with parameter θ = 0.6.
In Figure 3.6, for all choices of g, the bias increases as m becomes larger, while the RMSE is minimized around m = 180. Compared with the optimal choice of g, the simpler options have smaller biases and RMSEs. The simplest one, g_0(x, y) = 1, performs best, with the smallest bias and RMSE. However, the differences among the three choices of g are small compared with the effect of m. In practice, therefore, g does not affect the estimation much, and the simplest form of g can be used directly. This is consistent with findings reported in Einmahl et al. [2012]. The value of m, on the other hand, appears to matter considerably. From Figure 3.6, choosing m between 150 and 250 seems reasonable when the sample size is 2000.

Example 2.
Bivariate Hüsler-Reiss (HR) distribution
The bivariate HR distribution defined in equation (2.17) has the upper TD function specified in (2.18). Similarly, we simulate random samples of size n = 2000 from a bivariate HR model with θ = 2.5. Based on the observation above regarding the choice of g, a simple form is used in this case: g(x, y) = x.
Figure 3.7: The bias and RMSE of the M-estimator based on 100 samples of size 2000 simulated from the bivariate HR model with parameter θ = 2.5.
Based on Figure 3.7, the results for the HR distribution behave similarly to those for the logistic distribution. Specifically, the bias increases with m, while the RMSE attains its minimum around m = 280. This suggests that, for the HR distribution with n = 2000, a choice of m between 250 and 300 could be considered in practice.

Example 3. Bivariate bilogistic distribution
The upper TD function of the bivariate bilogistic distribution is given in equation (2.18). Following the same steps as in the previous two examples, we simulate from the bivariate bilogistic model with α = 0.4, β = 0.7 and n = 2000. In this case, two parameters need to be estimated, so we set the function vector to g(x, y) = (1, x)^T. The biases and RMSEs for α and β are plotted separately in Figure 3.8.
Figure 3.8: The bias and RMSE of the M-estimator based on 100 samples of size 2000 simulated from the bivariate bilogistic model with parameters α = 0.4, β = 0.7.
In Figure 3.8, the absolute value of the bias increases with m for both α and β. Moreover, both RMSEs are minimized at around m = 180, as in Example 1 for the bivariate logistic distribution. These results indicate that, for the bivariate bilogistic model with 2000 observations, a value of m between 150 and 200 would be a good choice.

Example 4.
Bivariate asymmetric logistic distribution
In Section 2.2, equation (2.22) gives the upper TD function of the bivariate asymmetric logistic distribution, for which three parameters need to be estimated. In this simulation, we again concentrate only on the choice of m and set the function vector g as in Einmahl et al. [2012]. We simulate from a bivariate asymmetric logistic model with θ = 0.6, ψ_1 = 0.5 and ψ_2 = 0.8 and set g(x, y) = (1, x, 2(x + y))^T. The parameter space here has a higher dimension than in the previous examples, so we increase the sample size to 2500 to maintain the accuracy of the estimators.
Figure 3.9: The bias and RMSE of the M-estimator based on 100 replications of size 2500 simulated from the asymmetric logistic model with parameters θ = 0.6, ψ_1 = 0.5, ψ_2 = 0.8.
In Figure 3.9, the absolute value of the bias increases with m, and the RMSE attains its minimum at m = 180 for all three parameters. These observations indicate that a reasonable choice of m is around 150–200 when n = 2500 for the bivariate asymmetric logistic model.

Example 5. Bivariate t distribution
The upper TD function of the standard bivariate t distribution is given in equation (2.24). There are two parameters, ρ and ν, to be estimated. We simulate from a bivariate t distribution with ρ = 0.6 and ν = 5 and set g(x, y) = (x, x + y)^T. Because the tail parameter is difficult to estimate, we increase the sample size to 3000 to make the estimators more accurate.
Figure 3.10 shows the bias and RMSE of the M-estimators of the two parameters. The M-estimator of ν performs well and behaves, with respect to m, much as in Examples 1–4. By contrast, most of the biases and RMSEs of ρ̂ exceed 0.1, more than one sixth of the true value ρ = 0.6, indicating poor performance of the estimator of ρ.
Moreover, as m increases, both the bias and the RMSE of ρ̂ grow, indicating that the bias dominates. Based on the performance of ν̂, however, m = 100 is a reasonable choice here.
Figure 3.10: The bias and RMSE of the simultaneous M-estimator based on 100 samples of size 3000 simulated from the bivariate t model with parameters ν = 5, ρ = 0.6.

In summary, selecting a suitable value of m is less straightforward than choosing the function g. A good choice of m depends on the model, the dimension of the parameter space and the sample size, so it is hard to make a general recommendation. Simulation studies provide some guidance for selecting m for a given model, based on the bias and RMSE of the estimators in finite samples. For example, the results for the bivariate logistic model suggest that m ∈ [150, 200] may be a good choice for sample size n = 2000, that is, about 8%–10% of the sample size. In our later analysis, including simulation and empirical studies, we set m equal to 9% of the sample size when estimating the parameters of the TD function under the bivariate logistic model.
Figure 3.11 displays the sampling densities of the model parameters using the values of m that minimize the RMSE.
(a) logistic distribution: n = 2000, m = 180; (b) HR distribution: n = 2000, m = 280; (c) bilogistic distribution: n = 2000, m = 180; (d) asymmetric logistic distribution: n = 2500, m = 180; (e) t distribution: n = 3000, m = 100.
Figure 3.11: The sampling densities of the estimated parameters based on 100 samples for the corresponding parametric models.
From these sampling densities, the estimated parameters in the first four examples are roughly centered around the true values. For the bivariate t distribution, the estimator of ρ has a larger bias than those of the other parameters, but the estimator of ν performs well with small variability. Einmahl et al.
[2008] and Einmahl et al. [2012] show that, under some conditions (for details, see Theorem 2.2.4 in Section 2.2.3), the M-estimators are asymptotically normal, which is reflected in our plots. Because the sample sizes are finite, the estimators exhibit minor biases. Overall, the M-estimators of the TD function parameters perform well in the considered examples and are therefore used in subsequent studies.

3.2.2 Performance of the CoVaR Estimator

Recall equation (3.7) for estimating CoVaR in Section 3.1. For a given p_n, estimating CoVaR requires estimates of γ, η_* and VaR_Y(1 − p_n):
- the tail index γ is estimated with the Hill estimator;
- VaR_Y(1 − p_n) is estimated with the nonparametric extreme quantile estimator using γ̂;
- η_* is estimated by solving R(1, η_*; θ̂) = p_n, where R(x, y; θ) is the upper TD function of an assumed parametric model and θ̂ is the M-estimator of θ.
These estimators are then combined to produce an estimator of CoVaR. Since the accuracy of CoVaR estimation depends on the accuracy of these three components, our initial analysis aims to separate the estimation errors attributable to each of them.

In this section, we explore the performance of our CoVaR estimator together with η̂_*, γ̂ and V̂aR_Y(1 − p_n). Monte Carlo simulations are carried out with the same five example distributions as in Section 3.2.1. First, the parameters follow the same settings as in Figure 3.11, so that we can also see how the M-estimators affect the CoVaR estimation. Specifically, for the logistic distribution, we set θ = 0.6, n = 2000 and m = 180. For the HR distribution, we set θ = 2.5, n = 2000 and m = 280. For the bilogistic distribution, we generate samples with α = 0.4, β = 0.7 and n = 2000 and estimate with m = 180.
For the asymmetric logistic distribution, we use θ = 0.6, ψ_1 = 0.5, ψ_2 = 0.8, n = 2500 and m = 180. For the t distribution, we set ν = 5, ρ = 0.6, n = 3000 and m = 100. For each model, we set p_n = 0.05 and run 100 replications of the CoVaR estimation procedure.

To evaluate ĈoVaR_{Y|X}(1 − p_n), we compute the true CoVaR_{Y|X}(1 − p_n) as the quantile of the conditional distribution. This quantile is the root y of
h(y) = ∫∫_{(u,v) ∈ R²: u > c, v > y} f(u, v) du dv = p_n², (3.8)
where c = VaR_X(1 − p_n) is the true (1 − p_n)-quantile of the marginal distribution of X, and f(x, y) is the joint density function of (X, Y). Table 3.1 gives the summary statistics of the proposed CoVaR estimates with η̂_*.

Table 3.1: Summary statistics of the proposed CoVaR estimates at level p_n = 0.05. The margins of the first four distributions are standard Fréchet and the margins of the bivariate t distribution are Student t.
                      logistic   HR         bilogistic   asymmetric logistic   t
True CoVaR_{Y|X}      367.3064   399.4755   341.5227     281.4862              4.4215
Mean                  446.3422   463.4036   460.9443     327.7476              4.5000
Median                425.4983   456.3283   434.5008     314.4263              4.4763
Standard deviation    127.9184   130.7512   149.8606     83.3997               0.5457

To further explore the performance, for each model we also compute CoVaR*_{Y|X}(1 − p_n) as
CoVaR*_{Y|X}(1 − p_n) = VaR_Y(1 − p_n η_*), (3.9)
where VaR_Y(1 − p_n η_*) is the true (1 − p_n η_*)-quantile of the marginal distribution of Y and η_* solves R(1, η_*; θ) = p_n at the true θ. This η_* is also used to examine the performance of η̂_*.
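For the logistic model with unit Fréchet margins, the true CoVaR in (3.8) and the benchmark CoVaR* in (3.9) can be computed without numerical integration. The reason is that h(y) = P(X > c, Y > y) = 1 − F_1(c) − F_2(y) + F(c, y) has a closed form. The following is a minimal Python sketch using bisection. The function names are hypothetical, and θ is assumed to lie in (0, 1), so that a root of R(1, η; θ) = p exists.

```python
import math


def frechet_quantile(q):
    # q-quantile of the unit Frechet distribution F(x) = exp(-1/x)
    return -1.0 / math.log(q)


def logistic_cdf(x, y, theta):
    # bivariate logistic EVD with unit Frechet margins
    return math.exp(-((x ** (-1.0 / theta) + y ** (-1.0 / theta)) ** theta))


def joint_survival(c, y, theta):
    # h(y) in (3.8): P(X > c, Y > y) = 1 - F1(c) - F2(y) + F(c, y)
    return 1.0 - math.exp(-1.0 / c) - math.exp(-1.0 / y) + logistic_cdf(c, y, theta)


def true_covar(theta, p, iters=200):
    # Root of h(y) = p ** 2 with c = VaR_X(1 - p); h is decreasing in y
    c = frechet_quantile(1.0 - p)
    lo, hi = 1e-6, c
    while joint_survival(c, hi, theta) > p * p:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if joint_survival(c, mid, theta) > p * p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def r_logistic(eta, theta):
    # R(1, eta; theta) for the bivariate logistic model
    return 1.0 + eta - (1.0 + eta ** (1.0 / theta)) ** theta


def eta_star(theta, p, iters=200):
    # Bisection for R(1, eta; theta) = p on [p, infinity)
    lo, hi = p, 1.0
    while r_logistic(hi, theta) < p:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r_logistic(mid, theta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def covar_star(theta, p):
    # (3.9): CoVaR* = VaR_Y(1 - p * eta_star) using the true theta
    return frechet_quantile(1.0 - p * eta_star(theta, p))


tc = true_covar(0.6, 0.05)
cs = covar_star(0.6, 0.05)
print(tc, cs)
```

With θ = 0.6 and p_n = 0.05, true_covar should land close to the logistic entry 367.3064 of Table 3.1. The value of covar_star should lie within a few percent of it, consistent with the overlap reported later for this model.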
To see how the estimation of η_* affects the estimation of CoVaR, we also compare ĈoVaR_{Y|X}(1 − p_n) in (3.7) with the estimator
ĈoVaR*_{Y|X}(1 − p_n) = Y_{n,n−k} (k/(n p_n))^{γ̂} η_*^{−γ̂},
which uses the true η_*. Moreover, we compute the true value of η in (3.2) as
η_0 = P{Y > CoVaR_{Y|X}(1 − p_n)} / p_n, (3.10)
where CoVaR_{Y|X}(1 − p_n) is the true value. Beyond evaluating the estimators, two distances show how well the upper TD function approximates the conditional tail probability when p_n is relatively small: the distance between η_0 and η_*, and the distance between CoVaR_{Y|X} and CoVaR*_{Y|X}.
Furthermore, for each replication, we compute the empirical estimate of the conditional exceedance probability as
e_n := #{X_i > V̂aR_X(1 − p_n), Y_i > ĈoVaR_{Y|X}(1 − p_n)} / #{X_i > V̂aR_X(1 − p_n)}, (3.11)
where (X_1, Y_1), ..., (X_n, Y_n) are the simulated vectors and V̂aR_X(1 − p_n) is obtained with the method given in Section 2.1.3. When VaR and CoVaR are estimated accurately, the ratio e_n should be close to the probability level p_n. This is analogous to VaR backtesting based on exceedances of VaR (Kuester et al. [2006]), which is discussed in Section 4.1. The statistic is useful in practice as a check on the accuracy of the CoVaR estimator, provided p_n is not too extreme (small) relative to the sample size.
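The counts behind e_n are simple to compute once V̂aR_X(1 − p_n) and ĈoVaR_{Y|X}(1 − p_n) are available. The sketch below takes both estimates as inputs and also returns the denominator and numerator of (3.11), the quantities denoted E_n and E^b_n in (3.12) and (3.13). The function name is hypothetical.

```python
def exceedance_summary(xs, ys, var_x, covar):
    # E_n (3.12): number of exceedances of the VaR_X estimate
    e_count = sum(1 for x in xs if x > var_x)
    # E^b_n (3.13): number of joint exceedances of VaR_X and CoVaR_{Y|X}
    joint = sum(1 for x, y in zip(xs, ys) if x > var_x and y > covar)
    # e_n (3.11): their ratio, undefined when there is no X-exceedance
    e_n = joint / e_count if e_count > 0 else float('nan')
    return e_count, joint, e_n


# Toy comonotone data: ten X-exceedances of 90, one joint exceedance of 99
xs = [float(i) for i in range(1, 101)]
ys = list(xs)
print(exceedance_summary(xs, ys, 90.0, 99.0))  # (10, 1, 0.1)
```

The function returns NaN rather than dividing by zero when no X-exceedance occurs. As noted above, e_n is only informative when n p_n is reasonably large.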
All the results are displayed as smoothed histograms of the estimators in Figures 3.12, 3.13, 3.14, 3.15 and 3.16.
Figure 3.12: The sampling densities of the estimates of γ, η_*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 2000 for the logistic model with parameter θ = 0.6.
Figure 3.13: The sampling densities of the estimates of γ, η_*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 2000 for the HR model with parameter θ = 2.5.
Figure 3.14: The sampling densities of the estimates of γ, η_*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 2000 for the bilogistic model with parameters α = 0.4, β = 0.7.
Figure 3.15: The sampling densities of the estimates of γ, η_*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 2500 for the asymmetric logistic model with parameters θ = 0.6, ψ_1 = 0.5, ψ_2 = 0.8.
Figure 3.16: The sampling densities of the estimates of γ, η_*, VaR_Y, CoVaR_{Y|X} and e_n at level p_n = 0.05 based on 100 samples of size 3000 for the t distribution with parameters ν = 5, ρ = 0.6.
The sampling densities lead to several observations. First, the Hill estimator of γ exhibits a small positive bias in all situations. Second, except in the case of the bivariate t distribution, the estimators of η_* are nearly unbiased and their densities are symmetric about the true η_*, especially for the HR and bilogistic distributions. This also indicates that, for the first four examples, the M-estimator of θ performs well and does not induce a large bias in the estimator of η_*. In these four cases, the values of η_* are close to those of η_0. However, Figure 3.14 shows that η̂_* for the bilogistic distribution underestimates η_0 considerably.
This may be because the range of estimated values in the bilogistic model is much narrower than in the logistic and asymmetric logistic models. For the t distribution, the difference between η_0 and η_* is larger and η̂_* is more biased. Interestingly, however, the density of η̂_* is centered and symmetric about η_0, which is expected to lead to more accurate CoVaR estimation.
Third, the estimates of VaR_Y(1 − p_n) are distributed around the true value in all examples. However, the standard errors of V̂aR_Y(1 − p_n) are large, which indicates that this nonparametric estimator is not very stable and may be unreliable in practice. Moreover, the estimates of CoVaR_{Y|X}(1 − p_n) are roughly distributed around CoVaR*_{Y|X}(1 − p_n) = VaR_Y(1 − p_n η_*), and most of them overestimate the true value of CoVaR_{Y|X}(1 − p_n). This is to be expected: under the key assumption of our methodology, ĈoVaR_{Y|X}(1 − p_n) in fact estimates VaR_Y(1 − p_n η_*). Furthermore, there is no noticeable difference between the densities of ĈoVaR_{Y|X}(1 − p_n) and those of ĈoVaR*_{Y|X}(1 − p_n), so estimating η_* has little influence on the CoVaR estimation. The densities of ĈoVaR_{Y|X}(1 − p_n) have shapes similar to those of V̂aR_Y(1 − p_n), showing that CoVaR estimation is dominated by the estimation of VaR_Y. In the logistic and HR models, the true CoVaR_{Y|X} and CoVaR*_{Y|X} overlap because the distance between η_0 and η_* is tiny. For these two distributions, therefore, R(1, η) approximates the conditional probability P{Y > VaR_Y(1 − pη) | X > VaR_X(1 − p)} very well.
Finally, the histograms of the empirical conditional exceedance probabilities confirm the earlier observation that the proposed CoVaR estimator tends to overestimate the true value of CoVaR.

From the results above, most of the CoVaR estimation error appears to come from the estimation of VaR_Y(1 − p_n). To further explore the influence of V̂aR_Y(1 − p_n), we run another round of simulations for all distributions. This time we increase the sample size to 5000 and the number of replications to 200, and decrease the level p_n to 0.01. These changes allow us to produce more accurate estimates of each component. In addition to the nonparametric estimator of VaR_Y(1 − p_n) introduced in Section 2.1.3, we also estimate VaR_Y(1 − p_n) with the Peaks-Over-Threshold (POT) method (see e.g., Embrechts et al. [1997]). In this method, a generalized Pareto (GP) distribution approximates the distribution function of exceedances above a threshold u (Pickands [1975]). The details of this method are given in Appendix A.4. The parameters of the GP distribution are fitted by maximum likelihood, as implemented in the function “gpd.fit” of the R package “ismev”.
An important aspect of the POT method is the selection of the threshold u. Given the large number of datasets, we cannot select u manually for each sample using the mean excess plot (Davison and Smith [1990]) or parameter stability plots. Instead, we adopt a pragmatic approach and set the threshold consistently with the value of k in the Hill estimator of γ: u = Y_{n,n−k}. For brevity, we only show the sampling densities of the two estimators of VaR_Y(1 − p_n) together with the corresponding estimators of CoVaR_{Y|X}(1 − p_n). We also plot histograms of e_n to further assess the CoVaR_{Y|X}(1 − p_n) estimates that use the semi-parametric (POT) quantile estimator.
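Suppose GP parameters (σ̂, ξ̂) have been fitted to the exceedances above u. The thesis fits them with ismev’s gpd.fit in R; that fitting step is not reproduced here. The POT estimate of VaR_Y(1 − p) then follows from the standard tail approximation P(Y > y) ≈ (N_u/n)(1 + ξ(y − u)/σ)^{−1/ξ}, where N_u is the number of exceedances (see e.g., Embrechts et al. [1997]). The following is a minimal sketch with hypothetical function names. It uses the threshold choice u = Y_{n,n−k}, for which N_u = k.

```python
import math


def pot_threshold(ys, k):
    # threshold u = Y_{n,n-k}; exactly k observations exceed it (no ties)
    s = sorted(ys)
    return s[len(s) - k - 1]


def pot_quantile(u, sigma, xi, n, n_u, p):
    # Solve (n_u / n) * (1 + xi * (y - u) / sigma) ** (-1 / xi) = p for y
    ratio = n * p / n_u
    if abs(xi) < 1e-12:
        return u - sigma * math.log(ratio)   # exponential-tail limit xi -> 0
    return u + sigma / xi * (ratio ** (-xi) - 1.0)


# Exact Pareto tail P(Y > y) = y ** (-2), y >= 1: above u the excesses are GP
# with xi = 0.5 and sigma = 0.5 * u, and n_u / n = u ** (-2)
u, p = 10.0, 0.001
print(pot_quantile(u, 0.5 * u, 0.5, 100000, 1000, p), p ** -0.5)
```

The check uses an exact Pareto tail, for which the GP approximation above any threshold u ≥ 1 is exact, so the POT formula must reproduce p^{−γ}.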
The denominator and numerator of e_n in (3.11) are also computed and plotted in Figures 3.17–3.21. They are, respectively, the number of exceedances of VaR_X:
E_n := #{X_i > V̂aR_X(1 − p_n)}, (3.12)
and the number of joint exceedances:
E^b_n := #{X_i > V̂aR_X(1 − p_n), Y_i > ĈoVaR_{Y|X}(1 − p_n)}. (3.13)
Figure 3.17: The sampling densities of the estimates of VaR_Y, CoVaR_{Y|X}, E_n, E^b_n and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the logistic model with parameter θ = 0.6. The threshold is chosen as u = Y_{n,n−k} with k = 450.
Figure 3.18: The sampling densities of the estimates of VaR_Y, CoVaR_{Y|X}, E_n, E^b_n and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the HR model with parameter θ = 2.5. The threshold is chosen as u = Y_{n,n−k} with k = 700.
Figure 3.19: The sampling densities of the estimates of VaR_Y, CoVaR_{Y|X}, E_n, E^b_n and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the bilogistic model with parameters α = 0.4 and β = 0.7. The threshold is chosen as u = Y_{n,n−k} with k = 450.
Figure 3.20: The sampling densities of the estimates of VaR_Y, CoVaR_{Y|X}, E_n, E^b_n and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the asymmetric logistic model with parameters θ = 0.6, ψ_1 = 0.5 and ψ_2 = 0.8. The threshold is chosen as u = Y_{n,n−k} with k = 400.
Figure 3.21: The sampling densities of the estimates of VaR_Y, CoVaR_{Y|X}, E_n, E^b_n and e_n at level p_n = 0.01 based on 200 samples of size 5000 for the t distribution with parameters ν = 5 and ρ = 0.6. The threshold is chosen as u = Y_{n,n−k} with k = 150.
The sampling densities show that the two estimators of VaR_Y(1 − p_n) are quite similar, especially for the HR and bilogistic models. Both have large variability, which may lead to prediction inaccuracies in practice.
For all example distributions, the sampling densities of ĈoVaR_{Y|X}(1 − p_n) have shapes similar to those of V̂aR_Y(1 − p_n). This is consistent with the results in Figures 3.12–3.16, which used smaller sample sizes and level p_n = 0.05. It confirms that, in our methodology, the key to better CoVaR estimation is a more accurate estimator of VaR_Y(1 − p_n). In the histograms of E_n, most of the empirical counts are smaller than the nominal value 5000 × 0.01 = 50, indicating that the estimator of VaR_X(1 − p_n) slightly overestimates the true value. The histograms of the conditional exceedance probability show that CoVaR estimation performs better with larger sample sizes and smaller p_n, since most of the probability values lie between 0 and 0.02. The histograms of E^b_n show that the joint exceedance counts are close to their nominal value: with p_n = 0.01 and n = 5000, the nominal value of E^b_n is n × p_n² = 0.5. Since the number of exceedances must be an integer, E^b_n is expected to be 0 or 1. When E^b_n = 0, e_n = 0 for every E_n; when E^b_n = 1 and E_n = 50, e_n = 0.02.
The summary statistics of the proposed estimates are given in Table 3.2.

Table 3.2: Summary statistics of the proposed CoVaR estimates at level p_n = 0.01. The margins of the first four distributions are standard Fréchet and the margins of the bivariate t distribution are Student t.
                      logistic   HR         bilogistic   asymmetric logistic   t
True CoVaR_{Y|X}      9719.46    9999.50    7660.29      281.4862              9.2215
Mean                  1313.98    463.4036   13772.85     10887.50              11.04
Median                12158.84   456.3283   13166.82     10500.52              10.84
Standard deviation    4986.23    4694.61    3644.54      83.3997               2.6445

Chapter 4 Empirical Study

In this chapter, we apply the proposed CoVaR estimation method to financial time series in a stationary setting, using the five parametric models discussed in Section 3.2.
Because financial time series exhibit time-varying volatility, we apply the methodology to the realized innovations from an AR(1)-GARCH(1,1) filter. However, Sun and Zhou [2014] demonstrate that the realized innovations from a GARCH filter applied to a sample with normal innovations may follow a heavy-tailed distribution. We therefore also perform the CoVaR estimation on the raw financial time series. The backtesting procedure of Girardi and Ergün [2013] is applied to assess the performance.

Remark 1. Mainik and Schaanning [2014] give a more general definition of CoVaR than the one in equation (3.1). Let (X, Y) be a random vector with joint df F and continuous margins F_1 and F_2. The CoVaR at confidence level 1 − p_2 is defined as the (1 − p_2)-quantile of the conditional distribution:
P{Y ≥ CoVaR_{Y|X}(p_1, p_2) | X ≥ VaR_X(1 − p_1)} = p_2, (4.1)
where the risk levels p_1 and p_2 can differ. This generalizes the definition of Girardi and Ergün [2013] in two ways. It allows different levels of CoVaR when conditioning on the same distress event, and the same level of CoVaR when conditioning on a more severe distress event. Our methodology adapts directly to this more general definition. Under the same assumptions as in Section 3.1, we can rewrite equation (4.1) as
p_2 = P{Y > VaR_Y(1 − p_2 η), X > VaR_X(1 − p_1)} / p_1 = P{F_2(Y) > 1 − (p_2/p_1) η × p_1, F_1(X) > 1 − p_1} / p_1. (4.2)
When p_1 is close to 0, the right-hand side of (4.2) can be approximated by R(1, (p_2/p_1)η), where R(x, y) is the upper TD function of (X, Y). If we can find η_* such that R(1, (p_2/p_1)η_*) = p_2, we can then estimate CoVaR_{Y|X}(p_1, p_2) by VaR_Y(1 − p_2) η_*^{−γ}, where γ is the tail index of Y.

4.1 Backtesting

In this section, we review the backtesting procedure used to assess and compare CoVaR estimates obtained under various parametric model assumptions for the TD function R.
Since CoVaR is a quantile of a conditional distribution, it can be evaluated with the standard tests of Kupiec [1995] and Christoffersen [1998]. These tests were originally designed to evaluate the accuracy of interval forecasts. They have since been applied to assess the predictive performance of VaR (Kuester et al. [2006]) and CoVaR (Girardi and Erg\u00fcn [2013]).

Let $\\{X_t\\}_{t\\in\\mathbb{N}}$ and $\\{Y_t\\}_{t\\in\\mathbb{N}}$ denote the losses (negative log-returns) of an institution and a system proxy, respectively. The VaR of $X_t$ at confidence level $1-p_1 \\in (0, 1)$ is written as $\\mathrm{VaR}_{X_t}(1-p_1)$. The CoVaR of $Y_t$ at confidence level $1-p_2 \\in (0, 1)$, conditional on $X_t > \\mathrm{VaR}_{X_t}(1-p_1)$, is written as $\\mathrm{CoVaR}_{Y_t|X_t}(p_1, p_2)$.

Suppose our sample includes $n$ observations, $t = 1, \\ldots, n$. For each institution, define the \u201chit sequence\u201d of violations as
$$I_{X_t} = \\begin{cases} 1, & \\text{if } X_t > \\mathrm{VaR}_{X_t}(1-p_1),\\\\ 0, & \\text{if } X_t \\le \\mathrm{VaR}_{X_t}(1-p_1). \\end{cases}$$
Next, take the sub-sample of days on which the institution is in financial distress, $\\{X_t > \\mathrm{VaR}_{X_t}(1-p_1)\\}$, assumed to contain $n_1$ observations. On this sub-sample, construct the second \u201chit sequence\u201d of violations as
$$I_{Y_t|X_t} = \\begin{cases} 1, & \\text{if } Y_t > \\mathrm{CoVaR}_{Y_t|X_t}(p_1, p_2),\\\\ 0, & \\text{if } Y_t \\le \\mathrm{CoVaR}_{Y_t|X_t}(p_1, p_2). \\end{cases}$$
We test the correct unconditional coverage of VaR forecasts at level $1-p_1$ by specifying the hypotheses
$$H_0: E[I_{X_t}] \\equiv \\lambda = p_1 \\quad \\text{versus} \\quad H_A: E[I_{X_t}] \\equiv \\lambda \\ne p_1.$$
Under the null hypothesis, the likelihood-ratio test statistic is (for details, see Kupiec [1995])
$$LR_{uc} = 2\\left[L(\\hat\\lambda; I_{X_1}, \\ldots, I_{X_n}) - L(p_1; I_{X_1}, \\ldots, I_{X_n})\\right] \\overset{\\cdot}{\\sim} \\chi^2_1 \\quad \\text{for large } n, \\qquad (4.3)$$
where $L(x; I_{X_1}, \\ldots, I_{X_n}) = \\log\\left[x^{n_1}(1-x)^{n-n_1}\\right]$.
The maximum-likelihood estimator $\\hat\\lambda$ is $n_1\/n$, where $n_1$ is the number of violations, that is, $\\#\\{t : I_{X_t} = 1\\}$.

Similarly, the null hypothesis for testing the unconditional coverage property of CoVaR forecasts at confidence level $1-p_2$ is
$$H_0: E[I_{Y_t|X_t}] \\equiv \\lambda = p_2.$$
Under the alternative hypothesis, the maximum-likelihood estimator $\\hat\\lambda$ is $n_2\/n_1$, where $n_2$ is the number of violations in the second \u201chit sequence\u201d, that is, $\\#\\{t : I_{Y_t|X_t} = 1\\}$. The statistic (4.3) is then computed with $n$ and $n_1$ replaced by $n_1$ and $n_2$, respectively.

In addition to the unconditional coverage test based on the binomial distribution, further calibration tests for risk measures are discussed in Nolde and Ziegel [2017]. They propose a two-stage procedure for assessing risk measure forecasts. It supplements traditional backtests with comparative backtests, so that it not only assesses the calibration of the forecasts but also allows several methods to be compared.

4.2 Data

We consider a subset of the panel of financial institutions studied in Acharya et al. [2017] with a market capitalization in excess of 5 billion USD as of the end of June 2007. Our sample consists of 8 firms: BANK OF AMERICA CORP (BAC), BANK NEW YORK INC (BK), AFLAC INC (AFL), ALLSTATE CORP (ALL), AMERICAN EXPRESS CO (AXP), FRANKLIN RESOURCES INC (BEN), GOLDMAN SACHS GROUP INC (GS) and T ROWE PRICE GROUP INC (TROW). The sample period runs from June 26, 2000 to May 9, 2019, giving 4747 daily price observations for each institution. The Dow Jones US Financials Index (DJUSFN) is used as a proxy for the aggregate financial system. The daily prices and capitalization information are extracted from Yahoo Finance. Daily losses (%) are calculated as negative log returns. Figure B.1 in Appendix B.1 shows the time series plots of daily losses for the institutions and DJUSFN.

Before estimation, we need to validate the tail dependence assumption for our samples. Figure 4.1 displays scatter plots of the losses of each institution against those of the market index.
For presentation purposes, we standardize the daily losses by subtracting the sample mean and dividing by the sample standard deviation. In the plots, the dependence between each institution and DJUSFN appears to persist in both the upper and lower joint tails, especially for the BAC-DJUSFN and BEN-DJUSFN pairs. This suggests that the variables are tail dependent.

Figure 4.1: Scatter plots of standardized daily losses (%) for the time series introduced in Section 4.2 over the period from June 27, 2000 to May 9, 2019.

One way to test for tail dependence is to estimate the extremal dependence measures $\\chi$ and $\\bar\\chi$ (Coles et al. [1999]). Let the random variables $X$ and $Y$ denote, respectively, the losses of a financial institution and a system proxy, with joint df $F$ and margins $F_1$ and $F_2$. The extremal dependence measures $\\chi$ and $\\bar\\chi$ are defined as
$$\\chi = \\lim_{t\\to\\infty}\\frac{P\\{X > F_1^{-1}(1-1\/t),\\; Y > F_2^{-1}(1-1\/t)\\}}{P\\{X > F_1^{-1}(1-1\/t)\\}},$$
$$\\bar\\chi = \\lim_{t\\to\\infty}\\frac{2\\log P\\{X > F_1^{-1}(1-1\/t)\\}}{\\log P\\{X > F_1^{-1}(1-1\/t),\\; Y > F_2^{-1}(1-1\/t)\\}} - 1,$$
where $0 \\le \\chi \\le 1$ and $-1 < \\bar\\chi \\le 1$, provided the limits exist. If $(X, Y)$ is tail independent, then $\\chi = 0$ and $-1 < \\bar\\chi < 1$. If $(X, Y)$ is tail dependent, then $\\bar\\chi = 1$ and $0 < \\chi \\le 1$. Define
$$\\chi(u) := 2 - \\frac{\\log C(u, u)}{\\log u} \\quad \\text{and} \\quad \\bar\\chi(u) := \\frac{2\\log(1-u)}{\\log \\bar C(u, u)} - 1.$$
Here $C(u_1, u_2) := F(F_1^{-1}(u_1), F_2^{-1}(u_2))$, $(u_1, u_2) \\in [0, 1]^2$, is the copula of the bivariate df $F$, and $\\bar C(u_1, u_2) = 1 - u_1 - u_2 + C(u_1, u_2)$ is the joint survival function of the copula. It can be shown (see Coles and Tawn [1991]) that
$$\\chi = \\lim_{u\\to 1}\\chi(u) \\quad \\text{and} \\quad \\bar\\chi = \\lim_{u\\to 1}\\bar\\chi(u).$$
Replacing $C$ and $\\bar C$ with their empirical estimators gives simple estimators of $(\\chi, \\bar\\chi)$.
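The Python sketch below computes the empirical estimators just described at a single level $u$. It is a minimal sketch, not the R routine used for Figure 4.2. It assumes that the margins are replaced by rank-based pseudo-observations $U_i = \\mathrm{rank}(X_i)\/(n+1)$, and that $C(u,u)$ and $\\bar C(u,u)$ are estimated by the corresponding empirical frequencies. The function names are hypothetical.

```python
import numpy as np


def pseudo_obs(x):
    # Rank-based pseudo-observations U_i = rank(x_i) / (n + 1), strictly inside (0, 1).
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)


def chi_u(x, y, u):
    # Empirical chi(u) = 2 - log C(u, u) / log u, where C(u, u) is estimated
    # by the fraction of pairs with U <= u and V <= u.
    U, V = pseudo_obs(x), pseudo_obs(y)
    c_uu = np.mean((U <= u) & (V <= u))
    return 2.0 - np.log(c_uu) / np.log(u)


def chibar_u(x, y, u):
    # Empirical chibar(u) = 2 log(1 - u) / log Cbar(u, u) - 1, where
    # Cbar(u, u) = P(U > u, V > u) is estimated by the joint exceedance frequency.
    U, V = pseudo_obs(x), pseudo_obs(y)
    cbar_uu = np.mean((U > u) & (V > u))
    return 2.0 * np.log(1.0 - u) / np.log(cbar_uu) - 1.0


# Comonotone data: both measures should be close to 1.
x = np.arange(999.0)
print(chi_u(x, x, 0.9), chibar_u(x, x, 0.9))

# Independent data: both measures should be close to 0.
rng = np.random.default_rng(0)
a, b = rng.standard_normal(20000), rng.standard_normal(20000)
print(chi_u(a, b, 0.9), chibar_u(a, b, 0.9))
```

Evaluating these functions over a grid of $u$ values close to 1 gives curves of the kind discussed for Figure 4.2. Confidence bands would need an additional step, which is omitted here.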
Figure 4.2 shows the $\\chi(u)$ and $\\bar\\chi(u)$ plots for the BAC-DJUSFN data, produced with the function \u201cchiplot\u201d in the R package \u201cevd\u201d. The $\\chi(u)$ and $\\bar\\chi(u)$ plots for the other institutions are given in Appendix B.2.

Figure 4.2: $\\chi(u)$ and $\\bar\\chi(u)$ plots for the BAC-DJUSFN data.

Allowing for sampling variability, the $\\bar\\chi(u)$ estimates in these plots appear to converge to one as $u \\to 1$, especially for the BEN-DJUSFN and DS-DJUSFN data. This supports the assumption of tail dependence. However, the estimates of $\\chi$ for the ALL-DJUSFN pair decrease towards 0 as $u \\to 1$, which suggests the possibility of tail independence. To check the tail dependence assumption for this dataset further, we re-estimate $\\bar\\chi$ with an alternative method.

Ledford and Tawn [1996] established that, under weak conditions,
$$P\\{X > F_1^{-1}(1-1\/t),\\; Y > F_2^{-1}(1-1\/t)\\} \\in RV_{-1\/\\xi}, \\qquad (4.4)$$
where $\\xi$ is called the residual dependence coefficient. It follows that $\\bar\\chi = 2\\xi - 1$. Transform the original random variables $X$ and $Y$ to unit Fr\u00e9chet margins, $Z_1 = -1\/\\log F_1(X)$ and $Z_2 = -1\/\\log F_2(Y)$, and define $T = \\min\\{Z_1, Z_2\\}$. Then (4.4) is equivalent to
$$P(Z_1 > z, Z_2 > z) = P(T > z) \\in RV_{-1\/\\xi}.$$
Hence, we can estimate $\\xi$ (and thus $\\bar\\chi$) as the shape parameter of $T$, using the maximum likelihood estimator in the POT method. Figure 4.3 plots the estimated $\\bar\\chi$ against the threshold used in the POT method for the ALL-DJUSFN data.
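The Python sketch below illustrates the idea behind (4.4). It transforms both margins to unit Fr\u00e9chet, forms $T = \\min\\{Z_1, Z_2\\}$, and estimates $\\xi$ as the tail index of $T$, so that $\\bar\\chi = 2\\xi - 1$. It is simplified in three ways:

- it swaps in the Hill estimator for the POT maximum likelihood fit used for Figure 4.3;
- it replaces $F_1$ and $F_2$ by rank-based pseudo-observations;
- it uses an arbitrary number $k$ of upper order statistics.

The function names are hypothetical, and no confidence band is computed.

```python
import numpy as np


def pseudo_obs(x):
    # Rank-based estimate of F(x_i): rank / (n + 1), strictly inside (0, 1).
    x = np.asarray(x, dtype=float)
    return (np.argsort(np.argsort(x)) + 1) / (len(x) + 1.0)


def hill(sample, k):
    # Hill estimator based on the k largest values of a positive sample.
    s = np.sort(np.asarray(sample, dtype=float))
    return np.mean(np.log(s[-k:] / s[-k - 1]))


def chibar_via_min(x, y, k):
    # Unit Frechet margins: Z = -1 / log F(X), with F replaced by ranks.
    z1 = -1.0 / np.log(pseudo_obs(x))
    z2 = -1.0 / np.log(pseudo_obs(y))
    t = np.minimum(z1, z2)
    # P(T > z) is regularly varying with index -1/xi, so xi is the tail index of T.
    xi = hill(t, k)
    return 2.0 * xi - 1.0, xi


n = 50000
x_dep = np.arange(n, dtype=float)  # comonotone pair: xi = 1, chibar = 1
rng = np.random.default_rng(0)
a, b = rng.standard_normal(n), rng.standard_normal(n)  # independent: xi = 1/2, chibar = 0
chibar_dep, xi_dep = chibar_via_min(x_dep, x_dep, k=500)
chibar_ind, xi_ind = chibar_via_min(a, b, k=500)
print(chibar_dep, chibar_ind)
```

The two benchmark pairs bracket the cases of interest. Values of the estimate near 1 are compatible with tail dependence, and values clearly below 1 point to tail independence.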
In Figure 4.3, the 95% confidence band covers the value one for thresholds above roughly the 70% quantile. This supports the earlier claim that the data can be assumed to be tail dependent.

Figure 4.3: Maximum likelihood estimates of $\\bar\\chi$ with 95% confidence bands based on the profile likelihood for the ALL-DJUSFN data.

4.3 Results

We first present the results of CoVaR estimation when the observations in the daily loss series for the firms and the market proxy are treated as i.i.d. We use the whole dataset for estimation. We then compare the CoVaR estimates under the different parametric models using the unconditional coverage test of Section 4.1.

Rather than selecting $k$ in the Hill estimator of the tail index automatically with the two-step subsample bootstrap algorithm, here we examine how the value of $k$ affects the CoVaR estimator. To this end, we assess the stability of the estimates by plotting the CoVaR estimates against $k$. All five parametric models of Section 3.2 are used for estimation. For each model, we consider three choices of $m$ for the M-estimator:

- $m\/n \\in \\{9\\%, 20\\%, 30\\%\\}$ for the logistic distribution;
- $m\/n \\in \\{14\\%, 25\\%, 35\\%\\}$ for the H\u00fcsler-Reiss distribution;
- $m\/n \\in \\{9\\%, 20\\%, 30\\%\\}$ for the bilogistic distribution;
- $m\/n \\in \\{8\\%, 20\\%, 30\\%\\}$ for the asymmetric logistic distribution;
- $m\/n \\in \\{3\\%, 15\\%, 20\\%\\}$ for the t distribution.

For all datasets, the sample size is $n = 4746$. The first value of $m\/n$ in each set is the one that gave the smallest RMSE in the simulation studies of Section 3.2.1. We consider $p = p_n = (p_{1,n}, p_{2,n}) = (0.05, 0.05)$ and $(0.02, 0.05)$. For brevity, we only show the graphs for BAC in Figures 4.4 and 4.5.
The graphs for the other seven institutions are given in Appendix B.3.

In the graphs, the lines for different values of $m$ nearly coincide, especially for the logistic, H\u00fcsler-Reiss and bilogistic distributions, indicating that the choice of $m$ has little effect on CoVaR estimation. Furthermore, for all datasets and parametric models, the CoVaR estimates tend to increase with $k$ and vary widely for $k$ below 100. This mirrors the effect of $k$ in tail index estimation discussed in Section 2.1.2: a large $k$ leads to a large bias, and a small $k$ leads to a large variance. We therefore seek a value of $k$ that balances bias and variance. The estimators are much more stable from around $k = 280$ to $k = 380$, so choosing $k$ in $[280, 380]$ seems suitable. This corresponds to $k\/n \\in [6\\%, 8\\%]$.

Figure 4.4: Estimates of CoVaR as a function of $k$ at level $p_n = (0.05, 0.05)$ for different values of $m$ for the original BAC-DJUSFN data.

Figure 4.5: Estimates of CoVaR as a function of $k$ at level $p_n = (0.02, 0.05)$ for different values of $m$ for the original BAC-DJUSFN data.

The next step is to evaluate the proposed CoVaR estimator with the unconditional coverage test. Based on the results above, for each parametric model and institution, we compute the CoVaR estimates with the first value of $m$ in each set and $k = 350$. Before performing the unconditional coverage test, we need to estimate the VaR of the institutions. For this we also use the nonparametric extreme quantile estimation method, with $k = 300$ (the $k$ values are selected by plotting the VaR estimates against $k$). We also backtest these VaR estimates to see how this nonparametric extreme quantile estimation method performs.
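The hit sequences and the statistic (4.3) used in these backtests can be sketched in Python with the standard library only. The function names are hypothetical. The sketch adopts the convention $0 \\log 0 = 0$. This is an assumption about how the case without violations is handled; Table 4.2 reports NaN p-values in that case. For the CoVaR backtest, (4.3) is applied to the $n_1$ distress days with $n_2$ violations.

As a check, take the BAC counts under the logistic model at $p_n = (0.05, 0.05)$ in Table 4.1 ($E_n = 237$, $\\widehat{E}_n = 7$). They give $T \\approx 2.434$ and a p-value of about 0.119, matching the tabulated values.

```python
import math


def hit_sequences(x, y, var_x, covar_y):
    # First hit sequence: 1 if the loss X_t exceeds its VaR forecast, else 0.
    hits_x = [1 if xt > vt else 0 for xt, vt in zip(x, var_x)]
    # Second hit sequence, on distress days only: 1 if Y_t exceeds its CoVaR forecast.
    hits_y = [1 if yt > ct else 0
              for xt, vt, yt, ct in zip(x, var_x, y, covar_y) if xt > vt]
    return hits_x, hits_y


def xlogy(a, b):
    # a * log(b), using the convention 0 * log(0) = 0.
    return 0.0 if a == 0 else a * math.log(b)


def kupiec_lr(n_obs, n_viol, p):
    # LR statistic (4.3) for H0: violation probability = p, with
    # L(x) = log[x^n_viol (1 - x)^(n_obs - n_viol)] and MLE n_viol / n_obs.
    lam = n_viol / n_obs
    loglik_hat = xlogy(n_viol, lam) + xlogy(n_obs - n_viol, 1.0 - lam)
    loglik_null = xlogy(n_viol, p) + xlogy(n_obs - n_viol, 1.0 - p)
    lr = max(2.0 * (loglik_hat - loglik_null), 0.0)  # guard against rounding below 0
    # Chi-square(1) survival function: P(chi2_1 > lr) = erfc(sqrt(lr / 2)).
    return lr, math.erfc(math.sqrt(lr / 2.0))


# VaR backtest: n_obs = n days, n_viol = E_n.
# CoVaR backtest: n_obs = E_n distress days, n_viol = joint exceedances.
lr, p_value = kupiec_lr(237, 7, 0.05)
print(round(lr, 3), round(p_value, 3))  # prints 2.434 0.119
```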
Results are given in Table 4.1.

For the VaR backtests, the p-values show that unconditional coverage of the VaR forecasts is not rejected at the 5% significance level for any institution. This indicates good performance of the nonparametric extreme quantile estimation method. At both risk levels, the logistic, H\u00fcsler-Reiss, bilogistic and t models overestimate CoVaR, as their values of $\\widehat{E}_n$ are smaller than the nominal value $E_n \\times 0.05$.

At risk level $p_n = (0.05, 0.05)$, unconditional coverage of the CoVaR forecasts is not rejected at the 5% significance level for any institution under the logistic, asymmetric logistic and t distributions. Under the H\u00fcsler-Reiss and bilogistic distributions, it is rejected for some institutions. The asymmetric logistic distribution appears to produce the most accurate estimates, as its values of $\\widehat{E}_n$ are the closest to the nominal value. At risk level $p_n = (0.02, 0.05)$, only the CoVaR estimates under the asymmetric logistic distribution pass the unconditional coverage test. For these datasets, the asymmetric logistic distribution is therefore the most suitable of the five models considered, in line with the results at level $p_n = (0.05, 0.05)$.

We note that the financial time series plotted in Figure B.1 show volatility clustering and serial dependence, which violates the i.i.d. assumption. We therefore remove the volatility clustering and time dependence with an AR(1)-GARCH(1,1) filter and examine how the proposed CoVaR estimator performs on the resulting residuals. Let $X_t$ denote the losses of an institution or the market index.
Under the AR(1)-GARCH(1,1) model, the losses satisfy
$$X_t = \\mu_t + \\sigma_t Z_t, \\quad \\mu_t = \\alpha_0 + \\alpha_1 X_{t-1}, \\quad \\sigma_t^2 = \\beta_0 + \\beta_1(\\sigma_{t-1}Z_{t-1})^2 + \\beta_2\\sigma_{t-1}^2,$$
where the standardized innovations $\\{Z_t\\}_{t\\in\\mathbb{N}}$ are i.i.d. with zero mean and unit variance. The filter parameters are estimated by maximum likelihood, assuming a standardized skew-t distribution (Fern\u00e1ndez and Steel [1998]) for the innovations $\\{Z_t\\}$.

Table 4.1: Unconditional coverage tests for VaR of institutions and CoVaR based on raw data. The X and Y variables for VaR and CoVaR are daily losses (100 \u00d7 negative log returns). \u201cLog, HR, Bilog, Alog, t\u201d stand for the logistic, H\u00fcsler-Reiss, bilogistic, asymmetric logistic and t distributions, respectively. $E_n$ is the number of exceedances of the VaR estimate, and $\\widehat{E}_n$ is the number of joint exceedances of the VaR and CoVaR estimates. T is the value of the test statistic in (4.3). Columns marked (a) refer to $p_n = (0.05, 0.05)$ and columns marked (b) to $p_n = (0.02, 0.05)$. The Log, HR, Bilog, Alog and t columns are CoVaR results.

| Firm | Row | VaR (a) | Log (a) | HR (a) | Bilog (a) | Alog (a) | t (a) | VaR (b) | Log (b) | HR (b) | Bilog (b) | Alog (b) | t (b) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BAC | estimate | 3.486 | 10.659 | 10.699 | 10.662 | 9.414 | 10.406 | 5.675 | 17.003 | 17.067 | 17.008 | 15.019 | 16.599 |
| | $E_n$ \/ $\\widehat{E}_n$ | 237 | 7 | 6 | 6 | 12 | 7 | 92 | 1 | 1 | 1 | 3 | 1 |
| | T | 0.001 | 2.434 | 3.684 | 3.684 | 0.002 | 2.434 | 0.093 | 4.294 | 4.294 | 4.294 | 0.664 | 4.294 |
| | p-value | 0.984 | 0.119 | 0.055 | 0.055 | 0.964 | 0.119 | 0.761 | 0.038 | 0.038 | 0.038 | 0.415 | 0.038 |
| BK | estimate | 3.213 | 10.598 | 10.696 | 10.604 | 9.192 | 10.192 | 4.729 | 16.905 | 17.063 | 16.915 | 14.664 | 16.258 |
| | $E_n$ \/ $\\widehat{E}_n$ | 237 | 7 | 6 | 7 | 14 | 9 | 96 | 1 | 1 | 1 | 3 | 1 |
| | T | 0.001 | 2.434 | 3.684 | 2.434 | 0.389 | 0.784 | 0.013 | 4.619 | 4.619 | 4.619 | 0.815 | 4.619 |
| | p-value | 0.984 | 0.119 | 0.055 | 0.119 | 0.533 | 0.376 | 0.911 | 0.032 | 0.032 | 0.032 | 0.367 | 0.032 |
| AFL | estimate | 2.718 | 10.503 | 10.677 | 10.509 | 8.841 | 10.074 | 4.436 | 16.754 | 17.032 | 16.764 | 14.103 | 16.070 |
| | $E_n$ \/ $\\widehat{E}_n$ | 249 | 7 | 6 | 7 | 15 | 10 | 98 | 1 | 1 | 1 | 3 | 1 |
| | T | 0.598 | 2.963 | 4.315 | 2.963 | 0.518 | 0.543 | 0.751 | 4.783 | 4.783 | 4.783 | 0.895 | 4.783 |
| | p-value | 0.439 | 0.085 | 0.038 | 0.085 | 0.472 | 0.461 | 0.751 | 0.029 | 0.029 | 0.029 | 0.344 | 0.029 |
| ALL | estimate | 2.279 | 10.383 | 10.656 | 10.395 | 8.639 | 9.974 | 3.733 | 16.563 | 16.999 | 16.582 | 13.780 | 15.911 |
| | $E_n$ \/ $\\widehat{E}_n$ | 239 | 7 | 7 | 7 | 15 | 10 | 102 | 1 | 1 | 1 | 3 | 1 |
| | T | 0.013 | 2.520 | 2.520 | 2.520 | 0.761 | 0.354 | 0.526 | 5.113 | 5.1136 | 5.113 | 1.061 | 5.113 |
| | p-value | 0.910 | 0.112 | 0.112 | 0.112 | 0.383 | 0.552 | 0.468 | 0.024 | 0.024 | 0.024 | 0.303 | 0.024 |
| AXP | estimate | 3.196 | 10.624 | 10.698 | 10.629 | 9.282 | 10.222 | 4.810 | 16.947 | 17.066 | 16.955 | 14.806 | 16.307 |
| | $E_n$ \/ $\\widehat{E}_n$ | 246 | 7 | 6 | 7 | 13 | 8 | 99 | 1 | 1 | 1 | 3 | 1 |
| | T | 0.332 | 2.828 | 4.154 | 2.828 | 0.041 | 1.796 | 0.177 | 4.865 | 4.865 | 4.865 | 0.936 | 4.865 |
| | p-value | 0.565 | 0.093 | 0.042 | 0.093 | 0.839 | 0.180 | 0.674 | 0.027 | 0.027 | 0.027 | 0.333 | 0.027 |
| BEN | estimate | 3.157 | 10.644 | 10.699 | 10.649 | 9.254 | 10.296 | 4.589 | 16.979 | 17.067 | 16.987 | 14.763 | 16.424 |
| | $E_n$ \/ $\\widehat{E}_n$ | 248 | 7 | 6 | 7 | 12 | 7 | 103 | 1 | 1 | 1 | 3 | 1 |
| | T | 0.501 | 2.918 | 4.261 | 2.918 | 0.014 | 2.918 | 0.683 | 5.196 | 5.196 | 5.196 | 1.105 | 5.196 |
| | p-value | 0.479 | 0.087 | 0.039 | 0.088 | 0.907 | 0.088 | 0.409 | 0.023 | 0.023 | 0.023 | 0.293 | 0.023 |
| GS | estimate | 3.228 | 10.567 | 10.693 | 10.572 | 9.039 | 9.956 | 4.709 | 16.856 | 17.057 | 16.864 | 14.419 | 15.881 |
| | $E_n$ \/ $\\widehat{E}_n$ | 257 | 7 | 6 | 6 | 12 | 7 | 97 | 1 | 1 | 1 | 3 | 1 |
| | T | 1.678 | 3.335 | 4.751 | 4.751 | 0.061 | 3.335 | 0.046 | 4.701 | 4.701 | 4.701 | 0.855 | 4.701 |
| | p-value | 0.195 | 0.068 | 0.029 | 0.029 | 0.806 | 0.068 | 0.830 | 0.030 | 0.030 | 0.030 | 0.355 | 0.030 |
| TROW | estimate | 3.227 | 10.658 | 10.702 | 10.661 | 9.408 | 10.220 | 4.781 | 17.002 | 17.071 | 17.006 | 15.008 | 16.302 |
| | $E_n$ \/ $\\widehat{E}_n$ | 246 | 7 | 6 | 6 | 12 | 8 | 103 | 1 | 1 | 1 | 3 | 1 |
| | T | 0.332 | 2.828 | 4.154 | 4.154 | 0.008 | 1.796 | 0.683 | 5.196 | 5.196 | 5.196 | 1.105 | 5.196 |
| | p-value | 0.565 | 0.093 | 0.042 | 0.042 | 0.930 | 0.180 | 0.409 | 0.023 | 0.023 | 0.023 | 0.293 | 0.023 |

With the estimates of $\\mu_t$ and $\\sigma_t$, we obtain the sequence of residuals $\\{\\hat Z_t = (X_t - \\hat\\mu_t)\/\\hat\\sigma_t\\}$. The following estimation is based on these realized residuals $\\{\\hat Z_t\\}$. Time series plots of the realized residuals for the institutions and the market index are given in Figure B.2 of Appendix B.1.

Before estimation, we produce scatter plots of each institution\u2019s residuals against the market\u2019s residuals to validate the tail dependence assumption. The graphs are shown in Figure 4.6.
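Once the AR(1)-GARCH(1,1) parameters have been estimated, the realized residuals $\\hat Z_t = (X_t - \\hat\\mu_t)\/\\hat\\sigma_t$ follow from a simple recursion. The Python sketch below takes the parameters as given; the skew-t maximum likelihood fit itself is not shown. It starts the variance recursion at the unconditional variance $\\beta_0\/(1-\\beta_1-\\beta_2)$. That initialization, the parameter values and the function name are illustrative assumptions. As a check, the sketch simulates a path with standard normal innovations and recovers them exactly when filtering with the true parameters.

```python
import numpy as np


def ar1_garch11_residuals(x, alpha0, alpha1, beta0, beta1, beta2, sigma2_init=None):
    # Standardized residuals of an AR(1)-GARCH(1,1) model with given parameters:
    #   X_t = mu_t + sigma_t Z_t,  mu_t = alpha0 + alpha1 X_{t-1},
    #   sigma_t^2 = beta0 + beta1 (sigma_{t-1} Z_{t-1})^2 + beta2 sigma_{t-1}^2.
    # Residuals are returned for t = 2, ..., n (the first loss only serves as X_{t-1}).
    x = np.asarray(x, dtype=float)
    if sigma2_init is None:
        # Assumption: start the recursion at the unconditional variance.
        sigma2_init = beta0 / (1.0 - beta1 - beta2)
    mu = alpha0 + alpha1 * x[:-1]
    eps = x[1:] - mu  # eps_t = sigma_t Z_t
    sigma2 = np.empty_like(eps)
    sigma2[0] = sigma2_init
    for t in range(1, len(eps)):
        sigma2[t] = beta0 + beta1 * eps[t - 1] ** 2 + beta2 * sigma2[t - 1]
    return eps / np.sqrt(sigma2)


# Simulate a path with hypothetical parameters and N(0, 1) innovations.
params = dict(alpha0=0.05, alpha1=0.1, beta0=0.02, beta1=0.08, beta2=0.9)
rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal(n)
x = np.zeros(n)
sigma2 = params['beta0'] / (1.0 - params['beta1'] - params['beta2'])
eps = 0.0
for t in range(1, n):
    if t > 1:
        sigma2 = params['beta0'] + params['beta1'] * eps ** 2 + params['beta2'] * sigma2
    eps = np.sqrt(sigma2) * z[t]
    x[t] = params['alpha0'] + params['alpha1'] * x[t - 1] + eps

resid = ar1_garch11_residuals(x, **params)
print(np.max(np.abs(resid - z[1:])))  # essentially zero
```

With estimated rather than true parameters, the same recursion yields the realized residuals on which the subsequent tail estimation is run.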
Upper tail dependence is evident for each pair in Figure 4.6.

Figure 4.6: Scatter plots of realized residuals from the time series introduced in Section 4.2 over the period from June 27, 2000 to May 9, 2019.

Remark 2. In the nonparametric extreme quantile estimation of Section 2.1.3 and in the analysis above, we estimate the VaR at confidence level $1-p$ for i.i.d. random variables $Y_1, Y_2, \\ldots, Y_n$ via the formula
$$\\widehat{\\mathrm{VaR}}_Y(1-p) = Y_{n,n-k}\\left(\\frac{k}{np}\\right)^{\\hat\\gamma_n(k)},$$
where $Y_{n,1} \\le Y_{n,2} \\le \\cdots \\le Y_{n,n}$ are the order statistics and $\\hat\\gamma_n(k)$ is the Hill estimator of the tail index with sample fraction $k$. A more general estimator of the VaR is
$$\\widehat{\\mathrm{VaR}}_Y(1-p) = Y_{n,n-k_2}\\left(\\frac{k_2}{np}\\right)^{\\hat\\gamma_n(k_1)}, \\qquad (4.5)$$
where the intermediate sequences $k_1 = k_{1,n}$ and $k_2 = k_{2,n}$ may differ. To use this formula, we first select a suitable $k_1$ for the tail index using the Hill plot or the bootstrap-based method. We then plug the estimate $\\hat\\gamma_n(k_1)$ into (4.5) and study the sensitivity of the VaR estimates to $k_2$, in order to select a suitable $k_2$ and obtain an accurate VaR estimate. In the following, we apply this procedure to estimate the VaR of the market index and the institutions. For institution $i$ ($i \\in \\{\\mathrm{AFL}, \\mathrm{ALL}, \\mathrm{AXP}, \\mathrm{BAC}, \\mathrm{BEN}, \\mathrm{BK}, \\mathrm{GS}, \\mathrm{TROW}\\}$), we denote by $k^i_1$ and $k^i_2$ the sample fractions for the tail index estimate and the VaR estimate, respectively. For the market index, $k^s_1$ and $k^s_2$ denote the corresponding sample fractions.

For each institution, we consider $p_n = (0.02, 0.05)$ and the first value of $m$ in each set. We select the $k^i_1$\u2019s and $k^s_1$ from the stability of the Hill plot, which gives $k^s_1 = 230$, $k^{\\mathrm{AFL}}_1 = 230$, $k^{\\mathrm{ALL}}_1 = 230$, $k^{\\mathrm{AXP}}_1 = 240$, $k^{\\mathrm{BAC}}_1 = 240$, $k^{\\mathrm{BEN}}_1 = 240$, $k^{\\mathrm{BK}}_1 = 245$, $k^{\\mathrm{GS}}_1 = 250$ and $k^{\\mathrm{TROW}}_1 = 240$.
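The Python sketch below combines the two-fraction estimator (4.5) with the representation $\\mathrm{CoVaR}_{Y|X}(p_1, p_2) \\approx \\mathrm{VaR}_Y(1-p_2)(\\eta^*)^{-\\gamma}$ from Remark 1. It rests on explicit assumptions:

- the TD function is taken to be the logistic one, $R(x, y) = x + y - (x^{1\/\\theta} + y^{1\/\\theta})^{\\theta}$, with a hypothetical fixed $\\theta \\in (0, 1)$ rather than an M-estimate (the parametrization of Section 3.2 may differ);
- $\\eta^*$ is found by bisection, using the fact that $R(1, \\cdot)$ is nondecreasing;
- the data are exact Pareto quantiles, used only for illustration.

All function names are hypothetical.

```python
import numpy as np


def hill(y, k):
    # Hill estimator of the tail index from the k largest observations.
    s = np.sort(np.asarray(y, dtype=float))
    return np.mean(np.log(s[-k:])) - np.log(s[-k - 1])


def var_two_fraction(y, p, k1, k2):
    # Estimator (4.5): Y_{n,n-k2} * (k2 / (n p)) ** gamma_hat(k1).
    s = np.sort(np.asarray(y, dtype=float))
    n = len(s)
    gamma = hill(s, k1)
    # Y_{n,n-k2} is the (n - k2)-th smallest value, i.e. index n - k2 - 1.
    return s[n - k2 - 1] * (k2 / (n * p)) ** gamma, gamma


def logistic_R(x, y, theta):
    # Assumed logistic TD function: R(x, y) = x + y - (x^(1/theta) + y^(1/theta))^theta.
    return x + y - (x ** (1.0 / theta) + y ** (1.0 / theta)) ** theta


def solve_eta(p1, p2, theta, tol=1e-12):
    # Find eta* with R(1, p2 * eta / p1) = p2 by bisection (R(1, .) is nondecreasing).
    if not 0.0 < theta < 1.0:
        raise ValueError('theta must lie in (0, 1) so that R is not identically zero')

    def gap(eta):
        return logistic_R(1.0, p2 * eta / p1, theta) - p2

    lo, hi = 0.0, 1.0
    while gap(hi) < 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def covar_estimate(y, p1, p2, theta, k1, k2):
    # CoVaR_{Y|X}(p1, p2) approximated by VaR_Y(1 - p2) * eta* ** (-gamma), as in Remark 1.
    var_y, gamma = var_two_fraction(y, p2, k1, k2)
    eta = solve_eta(p1, p2, theta)
    return var_y * eta ** (-gamma), var_y, eta


# Illustration on exact Pareto quantiles with tail index 0.5 (not the thesis data).
n = 10000
u = np.arange(1, n + 1) / (n + 1.0)
y = (1.0 - u) ** (-0.5)
var_est, gamma_est = var_two_fraction(y, 0.001, k1=200, k2=200)
print(var_est, 0.001 ** (-0.5))  # estimate vs true VaR_Y(0.999)
covar_est, var_y, eta = covar_estimate(y, p1=0.02, p2=0.05, theta=0.5, k1=230, k2=230)
print(covar_est, var_y, eta)
```

Keeping $k_1$ fixed while varying $k_2$ in `var_two_fraction` reproduces the kind of sensitivity analysis described in Remark 2. The factor $(\\eta^*)^{-\\gamma}$ does not depend on $k_2$.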
To select $k^s_2$, we plot the VaR estimates, together with the CoVaR estimates under the different parametric model assumptions, as functions of $k^s_2$. For brevity, we again show only the graphs for BAC, in Figure 4.7. The others are shown in Figure B.6 in Appendix B.3.

Figure 4.7: Estimates of CoVaR as a function of $k^s_2$ at level $p_n = (0.02, 0.05)$ for the estimated residuals from the BAC-DJUSFN data. The dotted vertical line marks $k^s_2 = k^s_1 = 230$.

In the plots, the curves of CoVaR estimates under the different model assumptions have the same shape as the curve of VaR estimates. This is expected, because the ratio of the CoVaR estimator to the VaR estimator is $(\\hat\\eta^*)^{-\\hat\\gamma}$, which does not depend on $k^s_2$. Moreover, the curves appear stable from around 100 to 250 and then decrease linearly. Selecting $k^s_2 = k^s_1 = 230$ therefore seems reasonable for all pairs. Compared with the graphs in Figures 4.4 and 4.5, these plots are more informative for selecting the sample fraction, as the stable region is more evident. This suggests that a sensitivity analysis following the procedure in Remark 2 is better suited to the proposed CoVaR estimation methodology.

To select $k^i_2$, we plot the VaR estimates against the sample fraction $k^i_2$. The graphs suggest $k^i_2 = 200$ for all $i$. The next step is to assess the CoVaR estimates with the unconditional coverage test. Based on the results above, we backtest the CoVaR estimates obtained with $k^s_2 = 230$. The results are given in Table 4.2.

All estimated VaRs of the institutions perform well and pass the unconditional coverage test, although the VaR estimates for GS appear to be slightly overestimated.
This indicates that $k^i_2 = 200$ for all $i$ is a good choice here. Among the CoVaR estimates, the logistic, H\u00fcsler-Reiss and bilogistic distributions appear to perform similarly, as their estimates for each institution are close to each other. However, the values of $\\widehat{E}_n$ and the p-values show that these three parametric models overestimate CoVaR. For most institutions, unconditional coverage is rejected at the 5% significance level under these models, especially under the H\u00fcsler-Reiss distribution.

By contrast, the estimates under the asymmetric logistic and t distributions have values of $\\widehat{E}_n$ much closer to the nominal value $E_n \\times 0.05$. They all pass the unconditional coverage test at the 5% significance level, indicating more accurate CoVaR estimation. These findings are similar to the backtesting results for the raw data in Table 4.1. However, although the asymmetric logistic and t distributions have the same value of $\\widehat{E}_n$ for some institutions, their estimates differ noticeably. Thus, this traditional calibration backtest lets us assess a particular risk measure estimation method, but it may fail to provide an adequate comparison among several methods.

Table 4.2: Unconditional coverage tests for VaR of institutions and CoVaR based on realized residuals at level $p_n = (0.02, 0.05)$. \u201cLog, HR, Bilog, Alog, t\u201d stand for the logistic, H\u00fcsler-Reiss, bilogistic, asymmetric logistic and t distributions, respectively. $E_n$ is the number of exceedances of the VaR estimate, and $\\widehat{E}_n$ is the number of joint exceedances of the VaR and CoVaR estimates. T is the value of the test statistic in (4.3). The Log, HR, Bilog, Alog and t columns are CoVaR results.

| Firm | Row | VaR | Log | HR | Bilog | Alog | t |
|---|---|---|---|---|---|---|---|
| BAC | estimate | 2.239 | 5.220 | 5.257 | 5.223 | 4.791 | 4.989 |
| | $E_n$ \/ $\\widehat{E}_n$ | 90 | 1 | 1 | 1 | 6 | 3 |
| | T | 0.265 | 4.133 | 4.133 | 4.133 | 0.479 | 0.593 |
| | p-value | 0.607 | 0.042 | 0.042 | 0.042 | 0.489 | 0.441 |
| BK | estimate | 2.178 | 5.168 | 5.251 | 5.173 | 4.638 | 4.879 |
| | $E_n$ \/ $\\widehat{E}_n$ | 98 | 1 | 1 | 1 | 6 | 4 |
| | T | 0.101 | 4.782 | 4.782 | 4.782 | 0.243 | 0.185 |
| | p-value | 0.751 | 0.029 | 0.029 | 0.029 | 0.622 | 0.667 |
| AFL | estimate | 2.147 | 5.028 | 5.206 | 5.252 | 4.374 | 4.579 |
| | $E_n$ \/ $\\widehat{E}_n$ | 97 | 1 | 0 | 0 | 4 | 4 |
| | T | 0.046 | 4.701 | 1.373 | 1.373 | 0.166 | 0.166 |
| | p-value | 0.830 | 0.030 | NaN | NaN | 0.683 | 0.683 |
| ALL | estimate | 2.088 | 4.960 | 5.197 | 4.974 | 4.271 | 4.403 |
| | $E_n$ \/ $\\widehat{E}_n$ | 94 | 2 | 0 | 2 | 4 | 4 |
| | T | 0.009 | 2.063 | 1.370 | 2.063 | 0.115 | 0.115 |
| | p-value | 0.924 | 0.151 | NaN | 0.151 | 0.734 | 0.734 |
| AXP | estimate | 2.200 | 5.162 | 5.249 | 5.166 | 4.606 | 4.885 |
| | $E_n$ \/ $\\widehat{E}_n$ | 96 | 1 | 1 | 1 | 6 | 4 |
| | T | 0.012 | 4.619 | 4.619 | 4.619 | 0.294 | 0.148 |
| | p-value | 0.911 | 0.032 | 0.032 | 0.032 | 0.588 | 0.700 |
| BEN | estimate | 2.221 | 5.192 | 5.253 | 5.195 | 4.693 | 4.902 |
| | $E_n$ \/ $\\widehat{E}_n$ | 93 | 1 | 1 | 1 | 6 | 3 |
| | T | 0.040 | 4.375 | 4.375 | 4.375 | 0.379 | 0.701 |
| | p-value | 0.842 | 0.037 | 0.037 | 0.037 | 0.538 | 0.402 |
| GS | estimate | 2.181 | 5.197 | 5.252 | 5.200 | 4.7078 | 4.908 |
| | $E_n$ \/ $\\widehat{E}_n$ | 85 | 0 | 0 | 0 | 5 | 2 |
| | T | 1.096 | 1.361 | 1.361 | 1.361 | 0.132 | 1.547 |
| | p-value | 0.295 | NaN | NaN | NaN | 0.716 | 0.214 |
| TROW | estimate | 2.146 | 5.181 | 5.253 | 5.183 | 4.686 | 4.907 |
| | $E_n$ \/ $\\widehat{E}_n$ | 97 | 1 | 1 | 1 | 6 | 3 |
| | T | 0.046 | 4.701 | 4.701 | 4.701 | 0.268 | 0.855 |
| | p-value | 0.830 | 0.030 | 0.030 | 0.030 | 0.605 | 0.355 |

To compare the five parametric models further, we calculate an average quantile score for each institution under each model. Following the notation of Section 4.1, define $\\mathcal{I} = \\{t : I_{X_t} = 1\\}$ and $n_1 = |\\mathcal{I}|$. The average quantile score $\\bar S$ is
$$\\bar S = \\frac{1}{n_1}\\sum_{t\\in\\mathcal{I}} S\\big(\\widehat{\\mathrm{CoVaR}}_{Y_t|X_t}(p_1, p_2), Y_t\\big),$$
where $S(r, x)$ is a consistent scoring function for VaR at level $1-p_2$, given by (for details, see Nolde and Ziegel [2017])
$$S(r, x) = \\big(p_2 - \\mathbb{1}\\{x > r\\}\\big)\\, r + \\mathbb{1}\\{x > r\\}\\, x.$$
A lower score indicates better performance. The results are reported in Table 4.3. Judged by $\\bar S$, the asymmetric logistic and t distributions perform better in CoVaR estimation than the other three distributions.
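The average quantile score $\\bar S$ can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the CoVaR forecasts and system losses have already been restricted to the distress days $t \\in \\mathcal{I}$. The toy numbers are illustrative only. On each day, the score equals $p_2 r$ when there is no exceedance and $x - (1-p_2) r$ otherwise. The second example shows that a constant forecast at the empirical $(1-p_2)$-quantile of the losses scores better than forecasts above or below it.

```python
import numpy as np


def quantile_score(r, x, p2):
    # S(r, x) = (p2 - 1{x > r}) r + 1{x > r} x, a consistent scoring function
    # for the (1 - p2)-quantile; lower is better.
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    hit = (x > r).astype(float)
    return (p2 - hit) * r + hit * x


def average_score(covar_forecasts, y_distress, p2):
    # Average of S over the n1 distress days in the set I.
    return float(np.mean(quantile_score(covar_forecasts, y_distress, p2)))


# Toy example: constant forecast 5.0 on four distress days, one exceedance.
print(average_score([5.0, 5.0, 5.0, 5.0], [1.0, 2.0, 6.0, 0.5], 0.05))

# Consistency check: losses 1, ..., 100; the empirical 95% quantile is 95.
x_demo = np.arange(1.0, 101.0)
for r in (80.0, 95.0, 100.0):
    print(r, average_score(np.full_like(x_demo, r), x_demo, 0.05))
```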
In addition, among the first four extreme value distributions, the multi-parameter model (the asymmetric logistic distribution) appears to perform better than the one-parameter models (the logistic and HR distributions). This makes sense: the M-estimates of the model parameters give $\\hat\\psi_1$ around 0.1 and $\\hat\\psi_2$ around 0.6 for each dataset, a large difference that indicates clear asymmetry. Furthermore, the HR distribution appears to perform worst of the five distributions for most institutions. These results are consistent with the unconditional coverage tests in Table 4.2. Finally, the best-performing model varies across institutions, but it is always either the asymmetric logistic or the t distribution.

Table 4.3: Average quantile scores $\\bar S$ of CoVaR estimates based on realized residuals at level $p_n = (0.02, 0.05)$.

| | BAC | BK | AFL | ALL | AXP | BEN | GS | TROW |
|---|---|---|---|---|---|---|---|---|
| Log | 0.262 | 0.256 | 0.253 | 0.251 | 0.259 | 0.260 | 0.259 | 0.260 |
| HR | 0.263 | 0.263 | 0.260 | 0.260 | 0.263 | 0.263 | 0.263 | 0.263 |
| Bilog | 0.262 | 0.260 | 0.263 | 0.251 | 0.259 | 0.261 | 0.260 | 0.260 |
| Alog | 0.253 | 0.254 | 0.244 | 0.227 | 0.255 | 0.254 | 0.249 | 0.254 |
| t | 0.255 | 0.252 | 0.243 | 0.244 | 0.252 | 0.253 | 0.249 | 0.253 |

Chapter 5 Conclusion

This thesis develops a semi-parametric method for estimating a high quantile of a conditional distribution, given that one component of a bivariate random vector is extreme. The method is then applied to systemic risk estimation with CoVaR as the risk measure. The methodology rests on asymptotic results from multivariate EVT. It combines parametric modelling of the tail dependence function, which addresses data sparsity in the joint tail regions, with nonparametric univariate tail estimation techniques.

The main advantage of EVT-based estimators is that they capture useful information in the tail of the data without restricting the behaviour of the central part.
There is another EVT-based estimation method for CoVaR, which uses the limit expression for the conditional probability given that one component of a bivariate random vector is extreme; see Nolde and Zhang [2018]. This method has been shown to provide a competitive alternative to a flexible fully parametric method. However, its requirement of multivariate regular variation forces the institutional and system losses to share the same tail index. It also imposes a somewhat restrictive parametric assumption on the extremal dependence structure. Another advantage of our framework is therefore that it allows model uncertainty to be balanced under the milder condition of a multivariate domain of attraction.

The simulation studies in Section 3.2 showed that the variance of the proposed CoVaR estimator is dominated by the variability in the estimation of the high quantile. We also observed a minor positive bias under all models used in data simulation. The methodology has been applied to real data consisting of log-returns on eight US financial firms, with returns on the Dow Jones US Financials Index used as the market proxy. A test for unconditional coverage was carried out on the CoVaR estimates under several parametric assumptions on the tail dependence function. The CoVaR estimates differed only slightly across the symmetric models, and the asymmetric model performed best: the backtest for unconditional coverage was passed under the asymmetric logistic distribution for all financial firms. A similar analysis was conducted after applying an AR(1)-GARCH(1,1) filter to the individual time series. In these backtests, the asymmetric logistic and t distributions performed best on the resulting CoVaR estimates and passed the unconditional coverage test for all financial institutions.
The other three models yielded CoVaR estimates close to one another, and the unconditional coverage property was rejected under these three models for most institutions.

As a next step, we would like to backtest the proposed CoVaR estimation procedure more rigorously, in order to identify the best-performing model for the tail dependence function and to compare our methodology with other available approaches. The simulation studies allowed us to assess the finite-sample properties of the proposed estimator. In follow-up work, we aim to study its asymptotic properties, including consistency and asymptotic normality. Finally, we are interested in extending our methodology to the case where the losses of an institution and of the system are tail independent. The current framework applies only when the tail dependence function is not identically zero, i.e., only in the presence of tail dependence.

Bibliography

B. Abdous, A.L. Foug\u00e8res, and K. Ghoudi. Extreme behaviour for bivariate elliptical distributions. Canadian Journal of Statistics, 33:317\u2013334, 2005. \u2192 page 3
V.V. Acharya, R. Engle, and M. Richardson. Capital shortfall: A new approach to ranking and regulating systemic risks. American Economic Review, 102:59\u201364, 2012. \u2192 page 1
V.V. Acharya, L.H. Pedersen, T. Philippon, and M. Richardson. Measuring systemic risk. The Review of Financial Studies, 30:2\u201347, 2017. \u2192 pages 1, 48
T. Adrian and M.K. Brunnermeier. CoVaR. Technical report, National Bureau of Economic Research, 2011. \u2192 page 2
A. Azzalini and A. Capitanio. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew-t distribution. Journal of the Royal Statistical Society: Series B, 65:367\u2013389, 2003. \u2192 page 19
V. Barnett. The ordering of multivariate data (with discussion). Journal of the Royal Statistical Society, Series A, 139:318\u2013354, 1976. \u2192 page 14
J. Beirlant, Y. Goegebeur, J. Segers, and J.L. Teugels. Statistics of Extremes: Theory and Applications. John Wiley & Sons, 2006. \u2192 pages 9, 10, 13, 15
S. Benoit, J.E. Colliard, C. Hurlin, and C. P\u00e9rignon. Where the risks lie: A survey on systemic risk. Review of Finance, 21:109\u2013152, 2017. \u2192 page 1
P.F. Christoffersen. Evaluating interval forecasts. International Economic Review, pages 841\u2013862, 1998. \u2192 page 47
S.G. Coles and J.A. Tawn. Modelling extreme multivariate events. Journal of the Royal Statistical Society: Series B, 53:377\u2013392, 1991. \u2192 pages 20, 50
S.G. Coles, J. Heffernan, and J.A. Tawn. Dependence measures for multivariate extremes. Extremes, 2:339\u2013365, 1999. \u2192 page 49
J. Danielsson, L. de Haan, L. Peng, and C.G. de Vries. Using a bootstrap method to choose the sample fraction in tail index estimation. Journal of Multivariate Analysis, 76:226\u2013248, 2001. \u2192 pages 4, 11, 12
B. Das and S.I. Resnick. Conditioning on an extreme component: Model consistency with regular variation on cones. Bernoulli, 17:226\u2013252, 2011. \u2192 page 3
A.C. Davison and R.L. Smith. Models for exceedances over high thresholds. Journal of the Royal Statistical Society: Series B (Methodological), 52(3):393\u2013425, 1990. \u2192 page 42
O. De Bandt and P. Hartmann. Systemic risk: a survey. 2000. \u2192 page 1
L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer Science & Business Media, 2006. \u2192 pages 3, 6, 24, 25
L. de Haan and S.I. Resnick. Second-order regular variation and rates of convergence in extreme-value theory. The Annals of Probability, 24:97\u2013124, 1996. \u2192 page 10
S. Demarta and A.J. McNeil. The t copula and related copulas. International Statistical Review, 73(1):111\u2013129, 2005. \u2192 page 19
F.X. Diebold, T. Schuermann, and J.D. Stroughair. Pitfalls and opportunities in the use of extreme value theory in risk management. The Journal of Risk Finance, 1:30\u201335, 2000. \u2192 page 5
J.H. Einmahl, A. Krajina, and J. Segers. A method of moments estimator of tail dependence. Bernoulli, 14:1003\u20131026, 2008. \u2192 pages 4, 20, 36
J.H. Einmahl, A. Krajina, and J. Segers. An M-estimator for tail dependence in arbitrary dimensions. The Annals of Statistics, 40:1764\u20131793, 2012. \u2192 pages 4, 20, 22, 30, 32, 36
P. Embrechts, C. Kl\u00fcppelberg, and T. Mikosch. Modelling Extremal Events: for Insurance and Finance. Springer Berlin Heidelberg, 1997. \u2192 pages 2, 5, 8, 41
C. Fern\u00e1ndez and M. Steel. On Bayesian modeling of fat tails and skewness. Journal of the American Statistical Association, 93:359\u2013371, 1998. \u2192 page 54
R.A. Fisher and L.H.C. Tippett. Limiting forms of the frequency distribution of the largest or smallest member of a sample. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 24, pages 180\u2013190. Cambridge University Press, 1928. \u2192 page 6
G. Girardi and A.T. Erg\u00fcn. Systemic risk measurement: Multivariate GARCH estimation of CoVaR. Journal of Banking & Finance, 37:3169\u20133180, 2013. \u2192 pages 2, 23, 46, 47
B. Gnedenko. Sur la distribution limite du terme maximum d\u2019une s\u00e9rie al\u00e9atoire. Annals of Mathematics, pages 423\u2013453, 1943. \u2192 pages 6, 8
E.J. Gumbel. Distributions des valeurs extr\u00eames en plusieurs dimensions. Publ. Inst. Statist. Univ. Paris, 9:171\u2013173, 1960. \u2192 pages 14, 17
J.E. Heffernan and S.I. Resnick. Limit laws for random vectors with an extreme component. The Annals of Applied Probability, 17:537\u2013571, 2007. \u2192 page 3
J.E. Heffernan and J.A. Tawn. A conditional approach for multivariate extreme values (with discussion). Journal of the Royal Statistical Society: Series B, 66:497\u2013546, 2004. \u2192 page 3
B.M. Hill. A simple general approach to inference about the tail of a distribution. The Annals of Statistics, pages 1163\u20131174, 1975. \u2192 pages 4, 9
X. Huang. Statistics of bivariate extremes. Tinbergen Institute Research Series, 22, 1992. \u2192 page 15
J. H\u00fcsler and R.D. Reiss. Maxima of normal random vectors: between independence and complete dependence. Statistics & Probability Letters, 7:283\u2013286, 1989. \u2192 page 17
A.F. Jenkinson. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Quarterly Journal of the Royal Meteorological Society, 81:158\u2013171, 1955. \u2192 page 7
H. Joe, R.L. Smith, and I. Weissman. Bivariate threshold methods for extremes. Journal of the Royal Statistical Society: Series B (Methodological), 54:171\u2013183, 1992. \u2192 page 20
H. Joe, H. Li, and A.K. Nikoloulopoulos. Tail dependence functions and vine copulas. Journal of Multivariate Analysis, 101:252\u2013270, 2010. \u2192 page 3
G.G. Kaufman and K.E. Scott. What is systemic risk, and do bank regulators retard or contribute to it? The Independent Review, 7:371\u2013391, 2003. \u2192 page 1
S. Kotz and S. Nadarajah. Multivariate t-distributions and Their Applications. Cambridge University Press, 2004. \u2192 page 68
K. Kuester, S. Mittnik, and M.S. Paolella. Value-at-risk prediction: A comparison of alternative strategies. Journal of Financial Econometrics, 4:53\u201389, 2006. \u2192 pages 38, 47
P. Kupiec. Techniques for verifying the accuracy of risk measurement models. The Journal of Derivatives, 3, 1995. \u2192 pages 47, 48
A.W. Ledford and J.A. Tawn. Statistics for near independence in multivariate extreme values. Biometrika, 83:169\u2013187, 1996. \u2192 pages 20, 50
G. Mainik and E. Schaanning. On dependence consistency of CoVaR and some other systemic risk measures. Statistics & Risk Modeling, 31:49\u201377, 2014. \u2192 pages 2, 23, 46
A.K. Nikoloulopoulos, H. Joe, and H. Li. Extreme value properties of multivariate t copulas. Extremes, 12:129\u2013148, 2009. \u2192 pages 3, 16, 68
N. Nolde and J. Zhang. Conditional extremes in asymmetric financial markets. Journal of Business & Economic Statistics, pages 1\u201329, 2018. \u2192 pages 3, 61
N. Nolde and J. Ziegel. Elicitability and backtesting: Perspectives for banking regulation. The Annals of Applied Statistics, 11:1833\u20131874, 2017. \u2192 pages 48, 60
S.A. Padoan. Multivariate extreme models based on underlying skew-t and skew-normal distributions. Journal of Multivariate Analysis, 102:977\u2013991, 2011. \u2192 pages 68, 69
J. Pickands III. Statistical inference using extreme order statistics. The Annals of Statistics, 3:119\u2013131, 1975. \u2192 page 41
S.I. Resnick. Extreme Values, Regular Variation and Point Processes. Springer, 1987. \u2192 pages 6, 24, 29
S.I. Resnick. Heavy-tail Phenomena: Probabilistic and Statistical Modeling. Springer Science & Business Media, 2007. \u2192 page 7
S.I. Resnick. Extreme Values, Regular Variation and Point Processes. Springer, 2013. \u2192 page 67
R.L. Smith. Extreme value theory. Handbook of Applicable Mathematics, 7:437\u2013471, 1990. \u2192 page 18
R.L. Smith. Multivariate threshold methods. In Extreme Value Theory and Applications, pages 225\u2013248. Springer, 1994. \u2192 page 20
P. Sun and C. Zhou. Diagnosing the distribution of GARCH innovations. Journal of Empirical Finance, 29:287\u2013303, 2014. \u2192 page 46
J.A. Tawn. Bivariate extreme value theory: models and estimation. Biometrika, 75:397\u2013415, 1988. \u2192 page 18
J.A. Tawn. Modelling multivariate extreme value distributions. Biometrika, 77:245\u2013253, 1990. \u2192 page 15
R. von Mises. La distribution de la plus grande de n valeurs. Rev. Math. Union Interbalcanique, 1:141\u2013160, 1936. \u2192 page 7
I. Weissman. Estimation of parameters and large quantiles based on the k largest observations. Journal of the American Statistical Association, 73:812\u2013815, 1978. \u2192 page 13
E.B. Wilson. Advanced Calculus. Ginn, 1912.
\u2192 page 1666Appendix AProofsIn this appendix, we collect several supplementaty theoretical results and derivations referredto in the main body of the thesis.A.1 Regular Variation of Skew-t DistributionLet Y \u223c ST1(0, 1, \u03b1, \u03bd) be a random variable with density functionfST (y) = 2fT (y; \u03bd)FT(\u03b1y\u221a\u03bd + 1\u03bd + y2; \u03bd + 1),where fT (\u00b7; \u03bd) and FT (\u00b7; \u03bd) are the density function and df of a univariate Student\u2019s t randomvariable with \u03bd degree of freedom.It can be verified that the tail of the standard skew-t distribution, for \u03bd > 1 and \u03b1 \u2208 R, isa regularly varying function with index \u2212\u03bd (see p 13-17, Resnick [2013]), and 1 \u2212 FST(y) \u223c\u03bd\u22121y\u2212\u03bds(y;\u03b1, \u03bd) as y\u2192\u221e, wheres(y;\u03b1, \u03bd) =2\u0393{(\u03bd + 1)\/2}\u0393(\u03bd\/2)(1y2+1\u03bd)\u2212(\u03bd+1)\/2FT(\u03b1\u221a\u03bd + 1; 0, 1, \u03bd)is a slowly varying function.67Define Q\u03b1,\u03bd(\u00b7) as the lower quantile function of distribution ST1(0, 1, \u03b1, \u03bd). Then we canget as u\u2192 0 (see (4), Padoan [2011]),Q\u03b1,\u03bd(1\u2212 u) \u223c (s(\u03b1, \u03bd)\/u)1\/\u03bd , (A.1)wheres(\u03b1, \u03bd) =\u0393{(\u03bd + 1)\/2}\u03bd\u03bd\/2FT(\u03b1\u221a\u03bd + 1; 0, 1, \u03bd)\u0393(\u03bd\/2)\u221api.A.2 Proof of Proposition 2.2.1The proof is following Nikoloulopoulos et al. [2009], where the TD function is derived inmultivariate version.Lemma A.2.1 (Kotz and Nadarajah [2004]). 
Let X = (X₁ᵀ, X₂ᵀ)ᵀ ∈ ℝ^d follow the multivariate t distribution with location parameter 0, scale parameter Σ and ν degrees of freedom, where
Σ = ( Σ₁₁ Σ₁₂ ; Σ₂₁ Σ₂₂ ),   Σ_{22;1} = Σ₂₂ − Σ₂₁Σ₁₁⁻¹Σ₁₂.
If X₁ and X₂ are k- and (d − k)-dimensional sub-vectors, respectively, then
P{X₂ ≤ x₂ | X₁ = x₁} = F_T( √((ν + k)/(ν + x₁ᵀΣ₁₁⁻¹x₁)) x_{2;1}; 0, Σ_{22;1}, ν + k),
where x_{2;1} = x₂ − Σ₂₁Σ₁₁⁻¹x₁.
The Student's t distribution is the special case α = 0 of the standard skew-t distribution. So, from equation (A.1), Q₁(1 − ux) and Q₂(1 − uy) in (2.14) satisfy
Q₁(1 − ux) ∼ (s(0, ν)/(ux))^{1/ν} and Q₂(1 − uy) ∼ (s(0, ν)/(uy))^{1/ν}, u → 0.
Then the upper tail dependence function of the bivariate t distribution can be written as
R(x, y; ρ, ν) = x lim_{u→0} P{Y > (s(0, ν)/(uy))^{1/ν} | X = (s(0, ν)/(ux))^{1/ν}} + y lim_{u→0} P{X > (s(0, ν)/(ux))^{1/ν} | Y = (s(0, ν)/(uy))^{1/ν}}.   (A.2)
From Lemma A.2.1, we obtain
P{Y ≤ (s(0, ν)/(uy))^{1/ν} | X = (s(0, ν)/(ux))^{1/ν}}
= F_T( √[(ν + 1)(s(0, ν)/(ux))^{2/ν} / {(1 − ρ²)(ν + (s(0, ν)/(ux))^{2/ν})}] ((y/x)^{−1/ν} − ρ); 0, 1, ν + 1).   (A.3)
Observing that
lim_{u→0} (s(0, ν)/(ux))^{2/ν} / {ν + (s(0, ν)/(ux))^{2/ν}} = 1,
and using the symmetry 1 − F_T(z; ν + 1) = F_T(−z; ν + 1), we obtain the final expression for the upper tail dependence function of the bivariate t distribution:
R(x, y; ρ, ν) = x F_T( √((ν + 1)/(1 − ρ²)) (ρ − (y/x)^{−1/ν}); ν + 1) + y F_T( √((ν + 1)/(1 − ρ²)) (ρ − (x/y)^{−1/ν}); ν + 1).   (A.4)
A.3 Proof of Proposition 2.2.2
The proof follows Padoan [2011], where the TD function is derived in the multivariate case.
Lemma A.3.1 (Padoan [2011]).
Let X = (X₁, X₂)ᵀ ∼ ST₂(0, Ω, α, ν), where Ω = ( 1 ρ ; ρ 1 ). Then for i ≠ j ∈ {1, 2},
• X_j ∼ ST₁(0, 1, ᾱ_j, ν), where ᾱ_j = (α_j + ρα_i)/√(1 + α_i²(1 − ρ²)).
• The conditional distribution is a univariate extended skew-t distribution:
P(X_i ≤ x_i | X_j = x_j) = F_EST( (x_i − ρx_j)/√(α_{Q_j}(1 − ρ²)); 0, 1, √(1 − ρ²) α_i, τ_{i|j}, ν + 1),
where α_{Q_j} = (ν + x_j²)/(ν + 1) and τ_{i|j} = √((ν + 1)/(ν + x_j²)) (α_iρ + α_j) x_j.
Then, from Lemma A.3.1, for the bivariate skew-t distribution in Example 2.2.6, we can rewrite the conditional probability in equation (2.14) as
lim_{u→0} P{U₂ > 1 − uy | U₁ = 1 − ux}
= lim_{u→0} P{Y > Q₂(1 − uy) | X = Q₁(1 − ux)}
= lim_{u→0} F̄_EST( [Q₂(1 − uy)/Q₁(1 − ux) − ρ]/√(1 − ρ²) · √(ν + 1)/√(ν/Q₁(1 − ux)² + 1); 0, 1, √(1 − ρ²) α₂, τ₁, ν + 1),
where τ₁ = √((ν + 1)/(ν/Q₁(1 − ux)² + 1)) (α₂ρ + α₁) and F̄_EST is the survival function of F_EST. Furthermore, Q₁ and Q₂ are the lower quantile functions of ST₁(0, 1, ᾱ₁, ν) and ST₁(0, 1, ᾱ₂, ν), respectively; the parameters ᾱ₁ and ᾱ₂ are given in Lemma A.3.1.
Applying the regular variation of the skew-t tail, we have
lim_{u→0} Q₂(1 − uy)/Q₁(1 − ux) = (ȳ/x̄)^{−1/ν} and lim_{u→0} ν/Q₁(1 − ux)² = 0,
where x̄ = x F_T(ᾱ₂√(ν + 1); 0, 1, ν) and ȳ = y F_T(ᾱ₁√(ν + 1); 0, 1, ν). Similarly, we can obtain the form of lim_{u→0} P{U₁ > 1 − ux | U₂ = 1 − uy}.
Combining all the results, we obtain the final expression for the upper tail dependence function of the bivariate skew-t distribution:
R(x, y; ρ, α₁, α₂, ν) = x · F̄_EST( √(ν + 1)/√(1 − ρ²) · ((x̄/ȳ)^{1/ν} − ρ); 0, 1, √(1 − ρ²) α₂, τ₁, ν + 1) + y · F̄_EST( √(ν + 1)/√(1 − ρ²) · ((ȳ/x̄)^{1/ν} − ρ); 0, 1, √(1 − ρ²) α₁, τ₂, ν + 1),   (A.5)
where x̄ = x F_T(ᾱ₂√(ν + 1); 0, 1, ν), ȳ = y F_T(ᾱ₁√(ν + 1); 0, 1, ν), ᾱ₁ = (α₁ + ρα₂)/√(1 + (1 − ρ²)α₂²), ᾱ₂ = (α₂ + ρα₁)/√(1 + (1 − ρ²)α₁²), τ₁ = √(ν + 1)(α₁ + α₂ρ) and τ₂ = √(ν + 1)(α₂ + α₁ρ).
A.4 Peaks-Over-Threshold Method
One way to define an extreme event is as an exceedance of a high threshold, i.e., an event of the form {X > u} for a large threshold u. The question is whether we can find a limiting distribution for the excess over u, X − u, conditional on the event X > u.
Definition 9. The distribution with df of the form
H_{σ,ξ}(y) = 1 − (1 + ξy/σ)_+^{−1/ξ} for ξ ≠ 0, and H_{σ,0}(y) = 1 − e^{−y/σ} for ξ = 0,
is called a generalized Pareto distribution with scale parameter σ > 0 and shape parameter ξ ∈ ℝ, written GP(σ, ξ).
Suppose X₁, ..., X_n are i.i.d. random variables with df F, where F ∈ D(G_ξ). That is, when n is large, we have
Fⁿ(a_n x + b_n) ≈ G_ξ(x) = exp{−(1 + ξx)_+^{−1/ξ}}.   (A.6)
Define x_F = sup{x : F(x) < 1}. In particular, we can assume that there exists u near x_F such that (A.6) holds for all x with a_n x + b_n > u. Let a_n x + b_n = y, b_n = μ and a_n = σ.
Then we have
F(y) = {Fⁿ(y)}^{1/n} ≈ exp{−n⁻¹(1 + ξ(y − μ)/σ)_+^{−1/ξ}} ≈ 1 − n⁻¹(1 + ξ(y − μ)/σ)_+^{−1/ξ}, y > u.
Now consider the conditional distribution of the excess over the threshold u:
P(X − u > y | X > u) = P(X > u + y)/P(X > u) = {1 − F(u + y)}/{1 − F(u)}
≈ [ (1 + ξ(u + y − μ)/σ)_+ / (1 + ξ(u − μ)/σ)_+ ]^{−1/ξ}
= (1 + ξy/σ_u)_+^{−1/ξ}, y > 0,
where σ_u = σ + ξ(u − μ) and the factor n⁻¹ cancels in the ratio. By Definition 9, X − u | X > u is approximately GP(σ_u, ξ) for large u.
Theorem A.4.1 (Pickands–Balkema–de Haan). For ξ ∈ ℝ, the following are equivalent:
1. F ∈ D(G_ξ).
2. There exists a positive function σ(·) such that
lim_{u↑x_F} sup_{y∈(0, x_F − u)} |F_u(y) − H_{σ(u),ξ}(y)| = 0,
where F_u(y) := P(X − u ≤ y | X > u), y ≥ 0, u < x_F ≤ ∞.
The above theorem reveals a duality between the limiting distribution of maxima and that of excesses over a high threshold: if the normalized maxima have a GEV(μ, σ, ξ) limit, then the conditional excesses over a sufficiently high threshold approximately follow a generalized Pareto distribution with the same shape parameter ξ.
Once we obtain the maximum likelihood estimates of ξ and σ_u, the (1 − p)-th quantile can be estimated by
Q̂_{1−p} = u + (σ̂_u/ξ̂)((p/φ̂_u)^{−ξ̂} − 1), ξ̂ ≠ 0,
where φ̂_u = #{X_i > u}/n is the empirical estimate of the probability of exceeding the threshold u.
Appendix B
Tables and Figures
B.1 Time series plots
Figure B.1: Time series plots of daily losses for institutions and the financial system. Panels: (a) AFL, (b) ALL, (c) AXP, (d) BAC, (e) BEN, (f) BK, (g) GS, (h) TROW, (i) DJUSFN.
Figure B.2: Time series plots of realized residuals for institutions and the financial system. Panels: (a) AFL, (b) ALL, (c) AXP, (d) BAC, (e) BEN, (f) BK, (g) GS, (h) TROW, (i) DJUSFN.
B.2 χ(u) and χ̄(u) plots
Figure B.3: χ(u) and χ̄(u) plots for the other seven institutions. Panels: (a) BK-DJUSFN, (b) AFL-DJUSFN, (c) ALL-DJUSFN, (d) AXP-DJUSFN, (e) BEN-DJUSFN, (f) GS-DJUSFN, (g) TROW-DJUSFN.
B.3 Plots of CoVaR estimates against the sample fraction for the other seven institutions
Figure B.4: Estimates of CoVaR as a function of k with raw data at level p_n = (0.05, 0.05) for different values of m. Panels: (a) BK-DJUSFN, (b) AFL-DJUSFN, (c) ALL-DJUSFN, (d) AXP-DJUSFN, (e) BEN-DJUSFN, (f) GS-DJUSFN, (g) TROW-DJUSFN.
Figure B.5: Estimates of CoVaR as a function of k with raw data at level p_n = (0.02, 0.05) for different values of m. Panels (a)–(g) as in Figure B.4.
Figure B.6: Estimates of CoVaR as a function of k_{s2} with realized residuals at level p_n = (0.02, 0.05). Panels (a)–(g) as in Figure B.4. The vertical line represents k_{s2} = k_{s1} = 230.","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/hasType":[{"value":"Thesis\/Dissertation","type":"literal","lang":"en"}],"http:\/\/vivoweb.org\/ontology\/core#dateIssued":[{"value":"2019-09","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt":[{"value":"10.14288\/1.0380425","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/language":[{"value":"eng","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline":[{"value":"Statistics","type":"literal","lang":"en"}],"http:\/\/www.europeana.eu\/schemas\/edm\/provider":[{"value":"Vancouver : University of British Columbia Library","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/publisher":[{"value":"University of British Columbia","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/rights":[{"value":"Attribution-NonCommercial-NoDerivatives 4.0 
International","type":"literal","lang":"*"}],"https:\/\/open.library.ubc.ca\/terms#rightsURI":[{"value":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/","type":"literal","lang":"*"}],"https:\/\/open.library.ubc.ca\/terms#scholarLevel":[{"value":"Graduate","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/title":[{"value":"Extreme value approach to CoVaR estimation","type":"literal","lang":"en"}],"http:\/\/purl.org\/dc\/terms\/type":[{"value":"Text","type":"literal","lang":"en"}],"https:\/\/open.library.ubc.ca\/terms#identifierURI":[{"value":"http:\/\/hdl.handle.net\/2429\/71285","type":"literal","lang":"en"}]}}